CN110969112B

CN110969112B - Pedestrian identity alignment method for cross-camera scenes

Info

Publication number: CN110969112B
Application number: CN201911189515.XA
Authority: CN
Inventors: Yu Chunyan (余春艳); Zhong Shijun (钟诗俊); Lai Qirong (赖奇嵘)
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2019-11-28
Filing date: 2019-11-28
Publication date: 2022-08-16
Anticipated expiration: 2039-11-28
Also published as: CN110969112A

Abstract

The invention proposes a pedestrian identity alignment method for cross-camera scenes that solves the problem of associating pedestrians across cameras. It builds on continuous, correct tracking of each target pedestrian within a single camera. When a target leaves a camera's field of view and enters a blind spot, tracking is interrupted; when the target reappears under a camera, the method re-identifies the pedestrian, keeps its identity unchanged and continues tracking. When the detector detects a new pedestrian, the pedestrian is added to a candidate pool awaiting association. Pairs of pedestrians are then taken from the candidate pool and preprocessed with the F-CCT model, and the processed images are fed to the SAM-Dets model to obtain the appearance fit of the two pedestrians. Once the appearance fit has been computed for every pair in the candidate pool, a minimum-cost flow graph model is built from these pairwise fits together with spatiotemporal relationships and solved for the optimal pedestrian association. Finally, according to the association result, each pedestrian either keeps its original identity or is assigned a new one.

Description

Alignment of pedestrian identities across camera scenes

Technical field

The invention belongs to the fields of machine vision and intelligent security, and in particular relates to a pedestrian identity alignment method for cross-camera scenes.

Background art

With continuing economic and social development, demand for public safety keeps growing. Intelligent security has therefore been widely discussed and developed, its range of applications keeps expanding, and technologies such as pedestrian detection and tracking have become active research topics. For pedestrian tracking, it is research in the cross-camera setting that has practical significance, and it requires handling multiple pedestrians at once. Although single-camera pedestrian tracking is relatively mature, with multiple cameras, and especially with non-overlapping fields of view, blind spots make a target's spatiotemporal information unreliable. This greatly complicates recognizing, tracking and retrieving the same target when it appears in different cameras at different times and places. Research on cross-camera pedestrian tracking is therefore emerging, and its most important part is matching pedestrian identities across different cameras.

Cross-camera pedestrian identity alignment takes pedestrians as its object of study and addresses multi-camera, multi-target tracking across non-overlapping fields of view. The common current approach has two steps. First, detection and tracking algorithms obtain each target's trajectory within a single camera. Second, an association algorithm links and merges the independent per-camera trajectories to obtain each target's complete trajectory. The limitation of this approach is that it can only process offline data: it is essentially suited to retrieval and cannot support online tracking. The reason is that after a target pedestrian leaves the current camera's field of view, the blind spot means that its spatiotemporal information is missing when it enters the next camera's field of view, which makes it harder to hand the target over correctly from one camera to the next. The approach also has a side effect: cross-camera tracking results depend heavily on the quality of single-camera tracking.

The key to cross-camera identity alignment is to correctly associate the same target pedestrian across different fields of view. Most current cross-camera pedestrian tracking algorithms have limited ability to learn pedestrian features and cannot learn sufficiently robust ones. This reduces the accuracy of the subsequent pedestrian similarity measurement and leads to unsatisfactory data association, so these algorithms adapt poorly to the complex environments of cross-camera pedestrian tracking.

Although existing research on cross-camera identity alignment handles pedestrian tracking on offline data reasonably well, it cannot meet the requirements of real-time online tracking, nor can it effectively track unknown pedestrians entering and leaving the monitored area.

Summary of the invention

To fill the gaps and overcome the shortcomings of the prior art, the present invention addresses pedestrian association across cameras. Its main function is to assign a new identity to a newly entering target or, if the target is successfully associated with a target that left earlier, to give it that earlier target's identity. The method builds on continuous, correct tracking of each target pedestrian within a single camera. After a target leaves a camera's field of view it enters a blind spot and tracking is interrupted, but when it reappears under a camera the pedestrian is re-identified, its identity is kept unchanged, and tracking continues. When the detector detects a new pedestrian, the pedestrian is added to a candidate pool awaiting association. Pairs of pedestrians are then selected from the candidate pool and preprocessed with the F-CCT model, and the processed images are fed to the SAM-Dets model to obtain the appearance fit of the two pedestrians. After the appearance fit has been computed for every pair in the candidate pool, a minimum-cost flow graph model is built from these fits together with spatiotemporal relationships and solved for the optimal pedestrian association. Finally, according to the association result, each pedestrian keeps its original identity or receives a new one, and is handed to the tracker for continued tracking.

The present invention specifically adopts the following technical solution:

A pedestrian identity alignment method for cross-camera scenes, characterized in that it comprises the following steps:

Step S1: each of a plurality of cameras adds the pedestrian images detected by its detector to a candidate pool awaiting association;

Step S2: for two pedestrian images in the candidate pool that belong to different cameras, compute their appearance fit;

Step S3: after the appearance fit has been computed for every pair of pedestrian images in the candidate pool, build a minimum-cost flow graph model from the pairwise appearance fits combined with spatiotemporal relationships, and solve it for the optimal pedestrian association;

Step S4: according to the association result of step S3, either keep each pedestrian's original identity or assign it a new identity.

Preferably, in step S1, the detector is Faster R-CNN.

Preferably, in step S2, computing the appearance fit specifically comprises the following steps:

Step A21: preprocess the images with the fuzzy C-means clustering F-CCT model, and let the holistic feature of pedestrian image A be $X=\{x_1,x_2,\dots,x_N\}$ and that of pedestrian image B be $Y=\{y_1,y_2,\dots,y_N\}$;

Step A22: use the images processed in step A21 as input to the pedestrian association model SAM-Dets, which fuses fine-grained representations: with X as the input vector and Y as the weight vector, SAM-Dets encodes the local fine-grained feature $f_1$ of pedestrian A; with Y as the input vector and X as the weight vector, it encodes the local fine-grained feature $f_2$ of pedestrian B;

Step A23: feed $f_s=(f_1-f_2)^2$ into a convolutional layer C with two kernels of size 1×1×4096, with softmax as the output function, to produce a two-dimensional vector $(q_1,q_2)$ giving the probability that the two input objects are the same real-world person; this probability is used as the appearance fit.
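A minimal NumPy sketch of step A23, assuming $f_1$ and $f_2$ are 4096-dimensional vectors and reading layer C as two 1×1×4096 kernels; on a 1×1×4096 input such a convolution reduces to a dense layer with two outputs. The weights `w_c` and biases `b_c` are hypothetical placeholders for trained parameters.

```python
import numpy as np

def appearance_fit(f1, f2, w_c, b_c):
    """Square layer + layer C + softmax.

    f1, f2 : (4096,) local fine-grained features of pedestrians A and B
    w_c    : (2, 4096) hypothetical weights of the two 1x1x4096 kernels of layer C
    b_c    : (2,) hypothetical biases
    """
    f_s = (f1 - f2) ** 2                 # parameter-free Square layer
    logits = w_c @ f_s + b_c             # 1x1 conv on a 1x1x4096 map == dense layer with 2 outputs
    logits = logits - logits.max()       # shift for a numerically stable softmax
    e = np.exp(logits)
    return e / e.sum()                   # (q1, q2), sums to 1

rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=4096), rng.normal(size=4096)
w_c, b_c = rng.normal(scale=0.01, size=(2, 4096)), np.zeros(2)
q = appearance_fit(f1, f2, w_c, b_c)
```

Which component of $(q_1,q_2)$ stands for "same person" is fixed by the training labels. Identical features give $f_s=0$, so the output then depends only on the biases.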

Preferably, in step A22, the pedestrian association model SAM-Dets comprises K attention branches and a concatenation layer; each attention branch comprises the following six layers:

The first layer is convolutional layer A, which extracts high-level features from the input holistic pedestrian features;

The second layer is an activation layer whose activation function is softmax;

The third layer is a dimension expansion layer;

The fourth layer is a summation layer, which adds the holistic pedestrian features to the output of the third layer;

The fifth layer is a global average pooling layer, which reduces the feature dimension;

The sixth layer is a fully connected layer, which computes the inner products of the input vector with the weight vectors of the weight matrix;

The concatenation layer concatenates the outputs of the K attention branches along the channel dimension and outputs the local fine-grained pedestrian feature.

Preferably, convolutional layer A has a 1×1 kernel with stride 1; the dimension expansion layer expands the channel dimension to 512; and the global average pooling layer has size 1×1 with stride 1.
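The six-layer branch can be sketched in NumPy as follows. Assumptions not stated in the text: the input is a (C, H, W) feature map with C = 512, convolutional layer A outputs a single attention map, softmax is taken over spatial positions, "dimension expansion" broadcasts that map to 512 channels, global average pooling averages over H and W, and the embedding size and random weights are placeholders. The sketch encodes one feature map and does not model the X/Y input-versus-weight pairing of step A22.

```python
import numpy as np

def spatial_softmax(logits):
    """Softmax over all H*W positions of a 2-D map."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

class AttentionBranch:
    """One attention branch; weights are random placeholders, not trained values."""

    def __init__(self, channels=512, embed_dim=128, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.w_conv = rng.normal(0.0, 0.01, size=channels)              # layer 1: 1x1 conv, C -> 1 map (assumed)
        self.w_fc = rng.normal(0.0, 0.01, size=(embed_dim, channels))   # layer 6: fully connected

    def __call__(self, feat):
        c, h, w = feat.shape
        logits = np.tensordot(self.w_conv, feat, axes=1)   # layer 1: 1x1 conv, stride 1 -> (H, W)
        attn = spatial_softmax(logits)                     # layer 2: softmax activation
        attn_c = np.broadcast_to(attn, (c, h, w))          # layer 3: expand to C = 512 channels
        summed = feat + attn_c                             # layer 4: add the holistic features
        pooled = summed.mean(axis=(1, 2))                  # layer 5: global average pooling -> (C,)
        return self.w_fc @ pooled                          # layer 6: inner products with weight rows

def sam_dets(feat, k=4, embed_dim=128, seed=0):
    """Run K branches on the same (C, H, W) map and concatenate their outputs channel-wise."""
    rng = np.random.default_rng(seed)
    branches = [AttentionBranch(feat.shape[0], embed_dim, rng) for _ in range(k)]
    return np.concatenate([branch(feat) for branch in branches])
```

With K = 4 and a 128-dimensional embedding per branch, the concatenated output has 4 × 128 = 512 entries.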

Step B21: preprocess the images with the fuzzy C-means clustering F-CCT model, and let the holistic feature of pedestrian image A be $X=\{x_1,x_2,\dots,x_N\}$ and that of pedestrian image B be $Y=\{y_1,y_2,\dots,y_N\}$;

Step B22: extract abstract pedestrian features from pedestrian images A and B with the DR-ResNet backbone network;

Step B23: use convolutional layer B to further extract high-level features of the target pedestrians, which serve as input to both a classification model and the SAM-Dets model. The classification model outputs the identity label of pedestrian image A and of pedestrian image B, and SAM-Dets outputs the local fine-grained feature $f_1$ of pedestrian A and $f_2$ of pedestrian B;

Step B24: feed $f_s=(f_1-f_2)^2$ into a convolutional layer C with two kernels of size 1×1×4096, with softmax as the output function, to produce a two-dimensional vector $(q_1,q_2)$ giving the probability that the two input objects are the same real-world person; this probability is used as the appearance fit.

Preferably, the DR-ResNet backbone comprises two identical, weight-sharing deep convolutional Siamese backbone modules R-ResNet; each R-ResNet module consists of forty-nine convolutional layers, three parallel convolutional layers and a final convolutional layer:

The first convolutional layer has a (7, 7, 64) kernel, followed by (3, 3) max-pooling, with a stride of 2;

The second to fourth convolutional layers have kernels of size (1, 1, 64), (3, 3, 64) and (1, 1, 256) respectively, all with ReLU activation. These three convolutional layers and their activations form a convolution block whose input is fed both to the second convolutional layer and to the activation function of the block's third layer. The fifth to seventh and the eighth to tenth convolutional layers have the same structure as the second to fourth;

The eleventh to thirteenth convolutional layers have kernels of size (1, 1, 128), (3, 3, 128) and (1, 1, 512) respectively, all with ReLU activation. These three layers and their activations form a convolution block whose input is fed both to the eleventh convolutional layer and to the activation function of the block's third layer. The fourteenth to sixteenth, seventeenth to nineteenth and twentieth to twenty-second convolutional layers have the same structure as the eleventh to thirteenth;

The twenty-third to twenty-fifth convolutional layers have kernels of size (1, 1, 256), (3, 3, 256) and (1, 1, 1024) respectively, all with ReLU activation. These three layers and their activations form a convolution block whose input is fed both to the twenty-third convolutional layer and to the activation function of the block's third layer. The twenty-sixth to twenty-eighth, twenty-ninth to thirty-first, thirty-second to thirty-fourth, thirty-fifth to thirty-seventh and thirty-eighth to fortieth convolutional layers have the same structure as the twenty-third to twenty-fifth;

The forty-first to forty-third convolutional layers have kernels of size (1, 1, 512), (3, 3, 512) and (1, 1, 2048) respectively, all with ReLU activation. These three layers and their activations form a convolution block whose input is fed both to the forty-first convolutional layer and to the activation function of the block's third layer. The forty-fourth to forty-sixth and forty-seventh to forty-ninth convolutional layers have the same structure as the forty-first to forty-third;

After the forty-ninth convolutional layer come three parallel convolutional layers, each using 2048 kernels; the first to third parallel layers have sizes (3, 3, 1024), (5, 5, 1024) and (7, 7, 1024) respectively. A concatenation layer merges the channels of the three parallel layers and is followed by (4, 4) max-pooling;

The last layer is a final convolutional layer with 1024 kernels of size (2, 2, 2048);

Convolutional layer B uses 2 kernels of size (1, 1, 4096);
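The layer list above matches the layer pattern of the standard ResNet-50 trunk: one 7×7 convolution followed by 3, 4, 6 and 3 three-layer bottleneck blocks. The pure-Python sketch below rebuilds the table from the listed kernel sizes and filter counts and checks that it yields exactly forty-nine convolutional layers; the `STAGES` encoding is illustrative, not part of the patent.

```python
# Each stage: ([(kernel, filters) for the three convs of one bottleneck block], number of blocks)
STAGES = [
    ([(1, 64), (3, 64), (1, 256)], 3),      # layers 2-10
    ([(1, 128), (3, 128), (1, 512)], 4),    # layers 11-22
    ([(1, 256), (3, 256), (1, 1024)], 6),   # layers 23-40
    ([(1, 512), (3, 512), (1, 2048)], 3),   # layers 41-49
]

def layer_table():
    """Return (layer_index, kernel_size, filters, block_id) for all 49 convolutional layers."""
    table = [(1, 7, 64, None)]              # first layer: 7x7 kernel, 64 filters (then 3x3 max-pool)
    index, block_id = 2, 0
    for convs, n_blocks in STAGES:
        for _ in range(n_blocks):
            block_id += 1                   # block input also feeds the block's third activation (shortcut)
            for kernel, filters in convs:
                table.append((index, kernel, filters, block_id))
                index += 1
    return table
```

The `block_id` column groups the three layers that share one shortcut connection, i.e. the "input of the convolution block" that is fed to both the block's first convolution and its third activation.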

The pedestrian association model SAM-Dets comprises K attention branches and a concatenation layer; each attention branch comprises the following six layers:

The first layer is convolutional layer A, with a 1×1 kernel and stride 1, which extracts high-level features from the input holistic pedestrian features;

The third layer is a dimension expansion layer, which expands the channel dimension to 512;

The fifth layer is a global average pooling layer of size 1×1 with stride 1, which reduces the feature dimension;

Preferably, in step S3, building the minimum-cost flow graph model from the pairwise appearance fits combined with spatiotemporal relationships and solving it for the optimal pedestrian association comprises the following steps:

Step S31: given the cost-flow graph obtained after instant alignment at time $t_{p-1}$, at time $t_p$ add two new nodes (an entry node and an exit node) for each target in the in-view pedestrian set and in the out-of-view pedestrian set, and update the directed edges connecting the new nodes to the source and the sink;

Step S32: according to the pedestrian appearance fit between every pair of targets drawn from the in-view pedestrian set and the out-of-view pedestrian set, update the directed edges between the corresponding nodes to obtain the new cost-flow graph at time $t_p$;

Step S33: delete all aligned target nodes as well as the remaining unaligned target nodes of the in-view pedestrian set, treating those remaining unaligned in-view targets as newly entering target pedestrians; the resulting cost-flow graph then waits for the alignment update at the next time step.
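The alignment at one time step can be illustrated with a small successive-shortest-path minimum-cost flow solver in pure Python. This is a simplified sketch, not the patent's exact graph: each out-of-view target and each in-view pedestrian becomes a single node, the source feeds the out-of-view targets, the in-view pedestrians drain to the sink, and an edge between them costs `threshold - fit`, so a match lowers the total cost only when its appearance fit exceeds the hypothetical `threshold`. A fit of `None` marks a pair ruled out by the spatiotemporal relationship (a hypothetical gating rule). Augmentation stops once the cheapest remaining path has non-negative cost, which gives the minimum cost over all flow values.

```python
import math

def align_min_cost_flow(fit, threshold=0.5):
    """fit[i][j]: appearance fit between out-of-view target i and in-view pedestrian j,
    or None if the spatiotemporal relationship rules the pair out.
    Returns ({i: j} matches, sorted list of unmatched in-view pedestrians)."""
    m = len(fit)
    n = len(fit[0]) if m else 0
    source, sink = 0, m + n + 1
    size = m + n + 2
    graph = [[] for _ in range(size)]          # each edge: [to, capacity, cost, index_of_reverse_edge]

    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(source, 1 + i, 0.0)           # source -> out-of-view target
    for j in range(n):
        add_edge(1 + m + j, sink, 0.0)         # in-view pedestrian -> sink
    for i in range(m):
        for j in range(n):
            if fit[i][j] is not None:
                add_edge(1 + i, 1 + m + j, threshold - fit[i][j])

    while True:
        # Bellman-Ford shortest path on the residual graph (costs may be negative).
        dist = [math.inf] * size
        prev = [None] * size                   # (node, edge index) used to reach each node
        dist[source] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        changed = True
            if not changed:
                break
        if dist[sink] >= 0:                    # no path, or one more match would not lower the cost
            break
        v = sink
        while v != source:                     # push one unit of flow along the path
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u

    matches = {}
    for i in range(m):
        for v, cap, _, _ in graph[1 + i]:
            if 1 + m <= v <= m + n and cap == 0:   # saturated target -> pedestrian edge
                matches[i] = v - 1 - m
    new_ids = sorted(set(range(n)) - set(matches.values()))
    return matches, new_ids
```

In the spirit of step S33, matched pedestrians keep the identity of their out-of-view target, while the pedestrians in `new_ids` are treated as newly entering targets.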

Preferably, in step S4, after each pedestrian keeps its original identity or receives a new one, it is handed to a tracker for continued tracking; the tracker uses the KCF algorithm, which assigns one tracker to each pedestrian.

Preferably, the tracked field of view of each camera is divided into a core area and a critical area, and in step S1 the detector detects pedestrians only in the critical area.

Compared with the prior art, the present invention and its preferred embodiments achieve online tracking of pedestrians across cameras with accurate identification and high efficiency, and are robust to the interruption of tracking that occurs when a target leaves a camera's field of view and enters a blind spot.

Tracking itself relies on existing, mature components: Faster R-CNN for pedestrian detection and the KCF algorithm for online pedestrian tracking. The core of the invention and its preferred embodiments is, first, building a minimum-cost flow graph model from spatiotemporal information and pedestrian similarity values to complete pedestrian identity alignment in real time, and second, measuring pedestrian appearance fit by combining the F-CCT and SAM-Dets models, or by fusing the SAM-Dets model with the DR-ResNet network.

Description of drawings

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments:

Fig. 1 is a schematic diagram of the overall flow of Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of the structure of the SAM-Dets model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the network structure of the SAM-Dets model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the fine-grained pedestrian association model of Embodiment 2 of the present invention;

Fig. 5 is a first schematic diagram of the results of the pedestrian identity alignment model in a cross-camera scene according to Embodiment 2 of the present invention;

Fig. 6 is a second schematic diagram of the results of the pedestrian identity alignment model in a cross-camera scene according to Embodiment 2 of the present invention.

Detailed description of embodiments

To make the features and advantages of this patent clearer and easier to understand, two embodiments are described in detail below:

As shown in Fig. 1, in the first embodiment of the present invention, the overall scheme for pedestrian identity alignment across camera scenes comprises the following steps:

Step S1: when a pedestrian enters a camera's field of view, each of the cameras adds the pedestrian images detected by its detector to the candidate pool awaiting association.

In this embodiment the detector is Faster R-CNN, a representative deep-learning object detection method. When generating candidate boxes it uses a region proposal network (RPN) that shares the full-image convolutional features with the detection network, so that the classification and regression tasks use the same convolutional features.

Step S2: compute the appearance fit of two pedestrian images in the candidate pool that belong to different cameras.

In this embodiment, computing the appearance fit specifically comprises the following steps:

Step A21: preprocess the images with the fuzzy C-means clustering F-CCT model, and let the holistic feature of pedestrian image A be $X=\{x_1,x_2,\dots,x_N\}$ and that of pedestrian image B be $Y=\{y_1,y_2,\dots,y_N\}$. A fuzzy clustering algorithm partitions each image into cluster domains; by matching the cluster domains of the source and target images, local color and brightness transfer is performed between matched cluster domains, and a membership factor is introduced to improve the transfer.
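The patent does not give the F-CCT equations. The NumPy sketch below shows one plausible reading, offered only as an illustration: standard fuzzy C-means partitions the pixels of each image into `c` cluster domains; clusters are matched by sorting their centres by mean intensity (an assumption); and each source pixel receives a Reinhard-style mean and standard-deviation transfer per cluster, blended by its fuzzy membership (the membership factor). The cluster count `c`, fuzzifier `m` and iteration count are hypothetical defaults.

```python
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, iters=50, seed=0):
    """x: (P, D) pixel values. Returns centres (c, D) and memberships u (P, c), rows summing to 1."""
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        w = u ** m
        centres = (w.T @ x) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2) + 1e-10
        # Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return centres, u

def weighted_stats(x, u, m=2.0):
    """Membership-weighted mean and standard deviation of each cluster."""
    w = u ** m
    wsum = w.sum(axis=0)[:, None]
    mean = (w.T @ x) / wsum
    var = (w.T @ (x ** 2)) / wsum - mean ** 2
    return mean, np.sqrt(np.maximum(var, 0.0)) + 1e-6

def fuzzy_color_transfer(src, tgt, c=3, m=2.0):
    """src, tgt: (H, W, 3) float images. Moves src colour statistics towards tgt, cluster by cluster."""
    xs, xt = src.reshape(-1, 3), tgt.reshape(-1, 3)
    cs, us = fuzzy_c_means(xs, c, m)
    ct, ut = fuzzy_c_means(xt, c, m)
    # Hypothetical cluster matching: pair clusters by the mean intensity of their centres.
    order_s, order_t = np.argsort(cs.mean(axis=1)), np.argsort(ct.mean(axis=1))
    us, ut = us[:, order_s], ut[:, order_t]
    mu_s, sd_s = weighted_stats(xs, us, m)
    mu_t, sd_t = weighted_stats(xt, ut, m)
    # Per-cluster mean/std transfer, then blend the candidates by fuzzy membership.
    moved = (xs[:, None, :] - mu_s[None]) / sd_s[None] * sd_t[None] + mu_t[None]   # (P, c, 3)
    out = (us[:, :, None] * moved).sum(axis=1)
    return out.reshape(src.shape)
```

Because memberships sum to 1 for every pixel, transferring an image onto itself returns the original image, which is a useful sanity check for any variant of this scheme.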

Step A22: use the images processed in step A21 as input to the pedestrian association model SAM-Dets, which fuses fine-grained representations: with X as the input vector and Y as the weight vector, SAM-Dets encodes the local fine-grained feature $f_1$ of pedestrian A; with Y as the input vector and X as the weight vector, it encodes the local fine-grained feature $f_2$ of pedestrian B.

As shown in Fig. 2, in this embodiment the SAM-Dets model consists of multiple attention branches, each with the same functional modules, and the input of every branch is the global pedestrian feature extracted by the backbone network. Each attention branch passes sequentially through three modules (a local detector, global pooling and a linear embedding), and the outputs of the K branches are finally concatenated along the channel dimension to form the complete output of the attention model.

In the local detector module, high-level features of the input are first extracted and normalized with a softmax function to obtain attention weights that form a probability distribution. To allow the subsequent summation, the dimension of these features is first expanded, and a weighted summation then yields the attention distribution over the corresponding features. In the global pooling and linear embedding modules, this attention distribution is used to select and retain the corresponding pedestrian features, and the output is a high-level pedestrian feature carrying the attention distribution.

As shown in Fig. 3, the network structure of SAM-Dets specifically comprises K attention branches and a concatenation layer; each attention branch comprises the following six layers:

The third layer is a dimension expansion layer;

The concatenation layer concatenates the outputs of the K attention branches along the channel dimension and outputs the local fine-grained pedestrian feature.

Convolutional layer A has a 1×1 kernel with stride 1; the dimension expansion layer expands the channel dimension to 512; the global average pooling layer has size 1×1 with stride 1.

Step A23: computing the similarity of the input pair of pedestrians is converted into comparing the features $f_1$ and $f_2$. A parameter-free Square layer computes their squared difference and serves as the similarity comparison layer: $f_s=(f_1-f_2)^2$. $f_s$ is then fed into a convolutional layer C with two kernels of size 1×1×4096, with softmax as the output function, producing a two-dimensional vector $(q_1,q_2)$ that gives the probability that the two input objects are the same real-world person; this probability is used as the appearance fit.

Further, the similarity probabilities obtained for pairs of pedestrians are used as edge weights of a graph whose two vertex sets are the newly entering pedestrians and the target pedestrians awaiting association, giving a weighted matching graph. Solving the maximum-weight matching problem on this graph yields the data association between the newly entering pedestrians and the target pedestrians awaiting association.
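For the small candidate pools that arise at a single time step, the maximum-weight matching described above can be checked exhaustively. The sketch below is a brute-force illustration (exponential in pool size), not the solver used in the patent; a practical system would use a polynomial-time algorithm such as the Hungarian method. Leaving a new pedestrian unmatched is allowed and contributes zero weight.

```python
from itertools import permutations

def max_weight_matching(weights):
    """weights[i][j] >= 0: similarity probability between new pedestrian i and waiting target j.
    Returns (list of (i, j) pairs, total weight); each i and each j is used at most once."""
    n_new = len(weights)
    n_wait = len(weights[0]) if n_new else 0
    # Pad with None so that every new pedestrian may also remain unmatched.
    slots = list(range(n_wait)) + [None] * n_new
    best_pairs, best_score = [], 0.0
    for choice in permutations(slots, n_new):
        pairs = [(i, j) for i, j in enumerate(choice) if j is not None]
        score = sum(weights[i][j] for i, j in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs, best_score
```

For the weights `[[0.9, 0.2], [0.6, 0.8]]`, pairing new pedestrian 0 with target 0 and new pedestrian 1 with target 1 (total 1.7) beats the crossed assignment (total 0.8).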

Step S3: after the appearance fit has been computed for every pair of pedestrian images in the candidate pool, build a minimum-cost flow graph model from the pairwise appearance fits combined with spatiotemporal relationships, and solve it for the optimal pedestrian association.

Step S3 specifically includes the following steps:

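Step S31: Given the cost flow graph obtained after instant alignment at time t_p - 1, two sets are formed at time t_p: the set of pedestrians in the field of view and the set of pedestrians that have left the field of view. Two nodes (an in-node and an out-node) are added for each new target, and the directed edges connecting the new nodes to the source and the sink are updated.

Step S32: The directed edges between the corresponding nodes are updated according to the appearance fit between every pair of targets drawn from the in-view pedestrian set and the out-of-view pedestrian set. This yields the new cost flow graph at time t_p.

Step S33: All aligned target nodes are deleted, together with the unaligned target nodes remaining in the in-view pedestrian set. These remaining unaligned targets are then taken as newly entered target pedestrians, giving a cost flow graph that waits to be updated and aligned at the next moment.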
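The association in step S3 can be illustrated with a small min-cost flow solver. The sketch below simplifies the model and is not the patented graph. It uses one node per pedestrian instead of an in/out node pair per target. It applies a hypothetical spatiotemporal gate: a pedestrian may only be linked if it enters at most max_gap seconds after another leaves. Each edge costs a hypothetical threshold minus the appearance fit, so one unit of flow lowers the total cost only when the fit exceeds that threshold. Bellman-Ford finds the successive shortest paths because the costs can be negative, and augmentation stops as soon as the cheapest path no longer lowers the total cost.

```python
class MinCostFlow:
    """Successive-shortest-path min-cost flow; Bellman-Ford tolerates negative costs."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge = [to, residual cap, cost, reverse index]

    def add_edge(self, u, v, cap, cost):
        self.adj[u].append([v, cap, cost, len(self.adj[v])])
        self.adj[v].append([u, 0, -cost, len(self.adj[u]) - 1])

    def solve(self, s, t):
        """Augment along cheapest s-t paths while they still lower the total cost."""
        total = 0.0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            dist[s] = 0.0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for k, (v, cap, cost, _) in enumerate(self.adj[u]):
                        if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                            dist[v], prev[v] = dist[u] + cost, (u, k)
                            changed = True
                if not changed:
                    break
            if dist[t] >= 0:  # sink unreachable (inf) or no further cost reduction
                return total
            push, v = float("inf"), t
            while v != s:  # bottleneck residual capacity along the path
                u, k = prev[v]
                push, v = min(push, self.adj[u][k][1]), u
            v = t
            while v != s:  # update forward and reverse residual capacities
                u, k = prev[v]
                edge = self.adj[u][k]
                edge[1] -= push
                self.adj[v][edge[3]][1] += push
                v = u
            total += push * dist[t]

def associate(exit_times, entry_times, fit, max_gap=30.0, threshold=0.5):
    """Map in-view pedestrians to out-of-view identities; unmatched ones need new IDs."""
    outs, ins = list(exit_times), list(entry_times)
    first_in = 1 + len(outs)               # node ids: 0 = source, outs, ins, sink
    s, t = 0, first_in + len(ins)
    g = MinCostFlow(t + 1)
    for a in range(len(outs)):
        g.add_edge(s, 1 + a, 1, 0.0)
    for b in range(len(ins)):
        g.add_edge(first_in + b, t, 1, 0.0)
    for a, o in enumerate(outs):
        for b, i in enumerate(ins):
            gap = entry_times[i] - exit_times[o]
            if 0 < gap <= max_gap:         # hypothetical spatiotemporal gate
                g.add_edge(1 + a, first_in + b, 1, threshold - fit[o, i])
    g.solve(s, t)
    matches = {}
    for a, o in enumerate(outs):
        for v, cap, _, _ in g.adj[1 + a]:
            if first_in <= v < t and cap == 0:  # saturated edge = chosen association
                matches[ins[v - first_in]] = o
    return matches

exit_times = {"A": 10.0, "B": 12.0, "C": 0.0}    # hypothetical exit timestamps (s)
entry_times = {"x": 15.0, "y": 20.0, "z": 50.0}  # hypothetical entry timestamps (s)
fit = {("A", "x"): 0.9, ("A", "y"): 0.2, ("A", "z"): 0.9,
       ("B", "x"): 0.3, ("B", "y"): 0.8, ("B", "z"): 0.1,
       ("C", "x"): 0.1, ("C", "y"): 0.1, ("C", "z"): 0.95}
print(associate(exit_times, entry_times, fit))  # {'x': 'A', 'y': 'B'}
```

Pedestrian z stays unmatched despite its high fit with C, because the 50-second gap fails the gate. In step S4 it would therefore receive a new identity. With max_gap=60.0, z is associated with C.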

Step S4: Based on the association result of step S3, each pedestrian either keeps its original identity or is assigned a new one, and is then handed over to the tracker for continued tracking. The tracker uses the KCF (kernelized correlation filter) algorithm, which assigns one tracker to each pedestrian. KCF builds a circulant matrix from cyclic shifts of the target region. Because circulant matrices are diagonalized by the discrete Fourier transform, ridge regression over all shifts yields a closed-form prediction formula.
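The core of KCF can be shown on a one-dimensional signal. Training solves kernel ridge regression over all cyclic shifts in the Fourier domain as α̂ = ŷ / (k̂ˣˣ + λ). Detection evaluates the response F⁻¹(k̂ˣᶻ ⊙ α̂), whose peak gives the target's displacement. In the sketch below, the random signal stands in for image features. The Gaussian kernel width, λ and the label bandwidth are illustrative values. A real tracker would work on 2-D feature patches with a cosine window, which the sketch omits.

```python
import numpy as np

def gaussian_correlation(x, z, sigma):
    """Gaussian kernel between x and every cyclic shift of z, computed via FFT."""
    c = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(z)))
    d = (x @ x + z @ z - 2.0 * c) / x.size
    return np.exp(-np.maximum(d, 0.0) / sigma ** 2)

def train(x, y, sigma=0.5, lam=1e-4):
    """Kernel ridge regression over all cyclic shifts: alpha_hat = y_hat / (k_hat + lambda)."""
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft(y) / (np.fft.fft(k) + lam)

def detect(alpha_hat, x, z, sigma=0.5):
    """Regression response for every cyclic shift of the new patch z."""
    k = gaussian_correlation(x, z, sigma)
    return np.real(np.fft.ifft(np.fft.fft(k) * alpha_hat))

n = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(n)                   # stand-in for the target's features
dist = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (dist / 2.0) ** 2)         # Gaussian label peaked at shift 0
alpha_hat = train(x, y)
response = detect(alpha_hat, x, np.roll(x, 3))
print(int(np.argmax(response)))  # 3
```

A target moved by 3 samples produces a response peaking at index 3. All shifts are evaluated with a few FFTs instead of building the circulant matrix explicitly.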

Meanwhile, in this embodiment the tracking field of view of each camera is divided into a core area and a critical area, and in step S1 the detector detects pedestrians only in the critical area. Every target must pass through the critical area when entering or leaving the view. Only pedestrians in the critical area can therefore have a plausible spatial transition between cameras, and restricting detection to this area maximizes the generality of the subsequent pedestrian alignment.
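The patent does not give the shape of the critical area. The sketch below assumes it is a border band of a hypothetical width margin (in pixels) around the frame, with the core area as the interior. Under that assumption, a detection is kept only if its bounding box reaches into the band.

```python
def in_critical_area(box, frame_w, frame_h, margin):
    """True if box (x1, y1, x2, y2) reaches into the border band of width margin."""
    x1, y1, x2, y2 = box
    return x1 < margin or y1 < margin or x2 > frame_w - margin or y2 > frame_h - margin

def filter_detections(boxes, frame_w, frame_h, margin=100):
    """Keep only detections in the (assumed) critical border band."""
    return [b for b in boxes if in_critical_area(b, frame_w, frame_h, margin)]

boxes = [(50, 400, 150, 700), (800, 400, 900, 700), (1850, 400, 1910, 700)]
print(filter_detections(boxes, 1920, 1080))
```

The centred box lies entirely in the core area, so it is skipped. Pedestrians there are already being followed by their trackers.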

The second embodiment of the present invention, shown in FIG. 4, provides another preferred implementation of step S2, which calculates the appearance fit of two pedestrian images from different cameras in the candidate pool to be associated. It specifically includes the following steps:

Step B23: Convolutional layer B further extracts high-level features of the target pedestrian. These serve as input data to both a classification model and the pedestrian association model SAM-Dets with fused fine-grained representation. The classification model outputs the identity labels of pedestrian image A and pedestrian image B. SAM-Dets outputs the local fine-grained feature f₁ of pedestrian A and the local fine-grained feature f₂ of pedestrian B.

Here, the DR-ResNet base network consists of two identical, weight-sharing deep convolutional Siamese base modules, R-ResNet. Each R-ResNet module comprises forty-nine convolutional layers, three parallel convolutional layers and an end convolutional layer.

Convolutional layer B uses 2 convolution kernels of size (1, 1, 4096).
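The forty-nine serial convolutional layers of R-ResNet, as itemized in claim 1, follow the bottleneck layout of ResNet-50. A 7×7×64 layer is followed by 3, 4, 6 and 3 three-layer blocks. Each block's input is also passed to its third activation, which reads as a residual shortcut. Counting the layers out makes the layer numbering easy to check. The snippet below only enumerates kernel sizes and does not build a network.

```python
# (bottleneck kernel sizes, number of blocks) for each stage, as listed in claim 1
STAGES = [
    (((1, 1, 64), (3, 3, 64), (1, 1, 256)), 3),
    (((1, 1, 128), (3, 3, 128), (1, 1, 512)), 4),
    (((1, 1, 256), (3, 3, 256), (1, 1, 1024)), 6),
    (((1, 1, 512), (3, 3, 512), (1, 1, 2048)), 3),
]

def serial_conv_layers():
    """Return (layer index, kernel size) for the serial convolutional layers of R-ResNet."""
    layers = [(1, (7, 7, 64))]  # first layer, followed by 3x3 max pooling with stride 2
    for kernels, blocks in STAGES:
        for _ in range(blocks):
            for kernel in kernels:
                layers.append((len(layers) + 1, kernel))
    return layers

def stage_ranges():
    """First and last layer index of each stage of three-layer blocks."""
    start, ranges = 2, []
    for _, blocks in STAGES:
        end = start + 3 * blocks - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

layers = serial_conv_layers()
print(len(layers))     # 49
print(stage_ranges())  # [(2, 10), (11, 22), (23, 40), (41, 49)]
```

The stages therefore span layers 2 to 10, 11 to 22, 23 to 40 and 41 to 49. Counting 1 + 3 × (3 + 4 + 6 + 3) gives the stated total of forty-nine.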

As shown in FIG. 5 and FIG. 6, both the solution of this embodiment and existing offline cross-camera tracking solutions achieve cross-camera tracking. The difference is that the solution of this embodiment can be used directly for online tracking.

This patent is not limited to the above preferred embodiments. Guided by this patent, anyone may derive other forms of pedestrian identity alignment methods for cross-camera scenes. All equivalent changes and modifications made within the scope of the claims of this application shall fall within the scope of this patent.

Claims

1. A pedestrian identity alignment method for cross-camera scenes, characterized by comprising the following steps:

step S1: each camera adds the pedestrian images detected by its detector to a candidate pool to be associated;

step S2: calculating the appearance fit of two pedestrian images belonging to different cameras in the candidate pool to be associated;

step S3: after the pedestrian images in the candidate pool to be associated have been paired and their appearance fits calculated, a minimum cost flow graph model is established from the appearance fit of each pair of pedestrians combined with the spatiotemporal relationship, and the optimal pedestrian association is solved;

step S4: according to the association result of step S3, each pedestrian either keeps its original identity or is given a new identity;

in step S1, the detector is Faster R-CNN;

in step S2, the appearance fit is calculated specifically by steps A21 to A23, or by steps B21 to B24;

wherein steps A21 to A23 are specifically as follows:

step A21: image preprocessing is completed using the fuzzy C-means clustering F-CCT model; the overall feature of pedestrian image A is X = {x₁, x₂, ..., x_N}, and the overall feature of pedestrian image B is Y = {y₁, y₂, ..., y_N};

step A22: the image processed in step A21 is used as input data to the pedestrian association model SAM-Dets with fused fine-grained representation; with X as the input vector and Y as the weight vector, SAM-Dets encodes the local fine-grained feature f₁ of pedestrian A; with Y as the input vector and X as the weight vector, SAM-Dets encodes the local fine-grained feature f₂ of pedestrian B;

step A23: f_s = (f₁ - f₂)² is used as the input of convolutional layer C, which has two kernels of size 1×1×4096; softmax is used as the output function, and a two-dimensional vector (q₁, q₂) is output representing the probability that the two input objects are the same person in the real world, as the appearance fit;

in step A22, the pedestrian association model SAM-Dets with fused fine-grained representation comprises K attention branches and a splicing layer; each attention branch includes the following six layers, wherein:

the first layer is convolutional layer A, used to extract high-level features from the input overall pedestrian features;

the second layer is an activation layer whose activation function is softmax;

the third layer is a dimension expansion layer;

the fourth layer is a summation layer, which adds the overall pedestrian features to the output of the third layer;

the fifth layer is a global average pooling layer, used to reduce the feature dimension;

the sixth layer is a fully connected layer, used to compute the inner products of the input vector with the weight vectors in the weight matrix;

the splicing layer concatenates the outputs of the K attention branches along the channel dimension and outputs the local fine-grained feature of the pedestrian;

the convolution kernel of convolutional layer A is 1×1 with a stride of 1; the dimension expansion layer expands the channel dimension to 512; the global average pooling layer has a size of 1×1 and a stride of 1;

wherein steps B21 to B24 are specifically as follows:

step B21: image preprocessing is completed using the fuzzy C-means clustering F-CCT model; the overall feature of pedestrian image A is X = {x₁, x₂, ..., x_N}, and the overall feature of pedestrian image B is Y = {y₁, y₂, ..., y_N};

step B22: abstract pedestrian features are extracted from pedestrian image A and pedestrian image B by the DR-ResNet base network;

step B23: convolutional layer B further extracts high-level features of the target pedestrian, which serve as input data to both a classification model and the pedestrian association model SAM-Dets with fused fine-grained representation; the classification model outputs the identity labels of pedestrian image A and pedestrian image B, and SAM-Dets outputs the local fine-grained feature f₁ of pedestrian A and the local fine-grained feature f₂ of pedestrian B;

step B24: f_s = (f₁ - f₂)² is used as the input of convolutional layer C, which has two kernels of size 1×1×4096; softmax is used as the output function, and a two-dimensional vector (q₁, q₂) is output representing the probability that the two input objects are the same person in the real world, as the appearance fit;

in step B22, the DR-ResNet base network comprises two identical, weight-sharing deep convolutional Siamese base modules R-ResNet; each R-ResNet module comprises forty-nine convolutional layers, three parallel convolutional layers and an end convolutional layer, wherein:

the first convolutional layer has a kernel size of (7, 7, 64), followed by max pooling of (3, 3), with a stride of 2;

the second to fourth convolutional layers have kernel sizes of (1, 1, 64), (3, 3, 64) and (1, 1, 256), each followed by a ReLU activation function; these three convolutional layers and their activation functions form a convolutional block, whose input serves both as the input of the second convolutional layer and as an input of the block's third activation function; the fifth to seventh and the eighth to tenth convolutional layers have the same structure as the second to fourth convolutional layers;

the eleventh to thirteenth convolutional layers have kernel sizes of (1, 1, 128), (3, 3, 128) and (1, 1, 512), each followed by a ReLU activation function; these three convolutional layers and their activation functions form a convolutional block, whose input serves both as the input of the eleventh convolutional layer and as an input of the block's third activation function; the fourteenth to sixteenth, seventeenth to nineteenth, and twentieth to twenty-second convolutional layers have the same structure as the eleventh to thirteenth convolutional layers;

the twenty-third to twenty-fifth convolutional layers have kernel sizes of (1, 1, 256), (3, 3, 256) and (1, 1, 1024), each followed by a ReLU activation function; these three convolutional layers and their activation functions form a convolutional block, whose input serves both as the input of the twenty-third convolutional layer and as an input of the block's third activation function; the twenty-sixth to twenty-eighth, twenty-ninth to thirty-first, thirty-second to thirty-fourth, thirty-fifth to thirty-seventh, and thirty-eighth to fortieth convolutional layers have the same structure as the twenty-third to twenty-fifth convolutional layers;

the forty-first to forty-third convolutional layers have kernel sizes of (1, 1, 512), (3, 3, 512) and (1, 1, 2048), each followed by a ReLU activation function; these three convolutional layers and their activation functions form a convolutional block, whose input serves both as the input of the forty-first convolutional layer and as an input of the block's third activation function; the forty-fourth to forty-sixth and forty-seventh to forty-ninth convolutional layers have the same structure as the forty-first to forty-third convolutional layers;

after the forty-ninth convolutional layer come three parallel convolutional layers, each using 2048 convolution kernels, with sizes of (3, 1024), (5, 1024) and (7, 1024) respectively; the channels of the three parallel convolutional layers are merged by a concatenation layer, followed by max pooling of (4, 4);

the last layer is the end convolutional layer of size (2, 2048), using 1024 convolution kernels;

convolutional layer B uses 2 convolution kernels of size (1, 1, 4096);

the pedestrian association model SAM-Dets with fused fine-grained representation comprises K attention branches and a splicing layer; each attention branch includes the following six layers, wherein:

the first layer is convolutional layer A, with a 1×1 kernel and a stride of 1, used to extract high-level features from the input overall pedestrian features;

the second layer is an activation layer whose activation function is softmax;

the third layer is a dimension expansion layer, which expands the channel dimension to 512;

the fifth layer is a global average pooling layer with a size of 1×1 and a stride of 1, used to reduce the feature dimension;

in step S3, the specific process of establishing the minimum cost flow graph model from the appearance fit of each pair of pedestrians combined with the spatiotemporal relationship, and of solving the optimal pedestrian association, comprises the following steps:

step S31: given the cost flow graph obtained after instant alignment at time t_p - 1, at time t_p the set of pedestrians in the field of view and the set of pedestrians out of the field of view are formed; two nodes (an in-node and an out-node) are added for each new target, and the directed edges connecting the newly added nodes to the source and the sink are updated;

step S32: according to the pedestrian appearance fit between every pair of targets in the in-view pedestrian set and the out-of-view pedestrian set, the directed edges between the corresponding nodes are updated to obtain the new cost flow graph at time t_p;

step S33: all aligned target nodes are deleted together with the unaligned target nodes remaining in the in-view pedestrian set; these remaining unaligned target nodes are taken as newly entered target pedestrians to obtain a cost flow graph, which waits to be updated and aligned at the next moment;

in step S4, after each pedestrian keeps its original identity or is given a new identity, it is tracked by the tracker; the tracker uses the KCF algorithm, which allocates one tracker to each pedestrian;

the tracking field of view of each camera is divided into a core area and a critical area, and in step S1 the detector detects only pedestrians in the critical area.