CN111753625A

CN111753625A - A pedestrian detection method, device, equipment and medium

Info

Publication number: CN111753625A
Application number: CN202010192213.4A
Authority: CN
Inventors: 马事伟; 吴江旭; 胡淼枫; 王璟璟; 聂铭君; 刘永文; 戚龙雨; 石金玉; 徐达炜; 张然; 赵旭民
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2020-10-09
Anticipated expiration: 2040-03-18
Also published as: CN111753625B

Abstract

Embodiments of the present invention disclose a pedestrian detection method, apparatus, device and medium. The method includes: acquiring an image to be detected; inputting the image to be detected into a trained single-shot object detector and obtaining the output information of the single-shot object detector; and determining the pedestrian detection result for the image to be detected according to that output information; wherein the single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network. Because pedestrian detection is performed with a single-shot object detector trained in this way, the method improves pedestrian detection accuracy while preserving the detection speed of the single-shot object detector.

Description

A pedestrian detection method, device, equipment and medium

Technical field

Embodiments of the present invention relate to the field of object detection, and in particular to a pedestrian detection method, apparatus, device and medium.

Background

Pedestrian detection has many application scenarios in computer vision, such as security surveillance, autonomous driving and robotics. Most current mainstream pedestrian detection methods are based on deep learning, for example the region-proposal-based object detector Faster R-CNN, or single-shot object detectors such as SSD and YOLO. A region-proposal-based object detector consists of two parts: a Region Proposal Network (RPN) and a region-based convolutional network (Fast R-CNN). In use, the RPN first coarsely extracts candidate regions for foreground boxes, and Fast R-CNN then refines the candidate regions and regresses the final object coordinates and object classification results. A single-shot object detector has no RPN and regresses the object coordinates and object classification results directly.

In the course of realizing the present invention, the inventors found at least the following technical problem in the prior art: the above methods achieve good results on standard pedestrian detection data, but have not yet achieved satisfactory results in occlusion scenarios (including intra-class occlusion, where people occlude one another, and inter-class occlusion, where objects occlude people). To improve pedestrian recognition accuracy in occlusion scenarios, some optimization methods have been proposed for region-proposal-based object detectors. However, although region-proposal-based object detectors are accurate, they are slow, whereas single-shot object detectors are fast. How to improve pedestrian detection accuracy while preserving the pedestrian detection speed of a single-shot object detector is therefore a technical problem that urgently needs to be solved.

Summary of the invention

Embodiments of the present invention provide a pedestrian detection method, apparatus, device and medium, so as to improve pedestrian detection accuracy while preserving the pedestrian detection speed of a single-shot object detector.

In a first aspect, an embodiment of the present invention provides a pedestrian detection method, including:

acquiring an image to be detected;

inputting the image to be detected into a trained single-shot object detector, and obtaining output information of the single-shot object detector;

determining a pedestrian detection result for the image to be detected according to the output information of the single-shot object detector;

wherein the single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network.

In a second aspect, an embodiment of the present invention further provides a pedestrian detection apparatus, including:

a to-be-detected image acquisition module, configured to acquire an image to be detected;

an image pedestrian detection module, configured to input the image to be detected into a trained single-shot object detector and obtain output information of the single-shot object detector, wherein the single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network;

a detection result determination module, configured to determine a pedestrian detection result for the image to be detected according to the output information of the single-shot object detector.

In a third aspect, an embodiment of the present invention further provides a computer device, the device including:

one or more processors;

a storage apparatus, configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the pedestrian detection method provided by any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the pedestrian detection method provided by any embodiment of the present invention.

In the embodiments of the present invention, an image to be detected is acquired; the image to be detected is input into a trained single-shot object detector and the output information of the single-shot object detector is obtained; and the pedestrian detection result for the image to be detected is determined according to that output information. The single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network. Performing pedestrian detection with a single-shot object detector trained in this way improves pedestrian detection accuracy while preserving the pedestrian detection speed of the single-shot object detector.

Description of the drawings

FIG. 1 is a flowchart of a pedestrian detection method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a pedestrian detection method provided by Embodiment 2 of the present invention;

FIG. 3a is a flowchart of a pedestrian detection method provided by Embodiment 3 of the present invention;

FIG. 3b is a schematic diagram of the network architecture of an original detection model provided by Embodiment 3 of the present invention;

FIG. 4 is a schematic structural diagram of a pedestrian detection apparatus provided by Embodiment 4 of the present invention;

FIG. 5 is a schematic structural diagram of a computer device provided by Embodiment 5 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the structures related to the present invention rather than all structures.

Embodiment 1

FIG. 1 is a flowchart of a pedestrian detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to situations in which pedestrian detection is performed. The method may be executed by a pedestrian detection apparatus, which may be implemented in software and/or hardware; for example, the pedestrian detection apparatus may be configured in a computer device. As shown in FIG. 1, the method includes:

S110. Acquire an image to be detected.

In this embodiment, the image to be detected may be any image on which pedestrian detection needs to be performed. The way in which the image to be detected is acquired is not limited here. Optionally, a video frame captured by a camera may be used directly as the image to be detected, or an existing video may be processed to obtain the image to be detected for pedestrian detection.

S120. Input the image to be detected into a trained single-shot object detector and obtain the output information of the single-shot object detector, wherein the single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network.

After the image to be detected is acquired, the pre-trained single-shot object detector is used to detect it, and the output information of the single-shot object detector is obtained so that the pedestrian detection result can be determined from it. The single-shot object detector may be a detector such as the Single Shot MultiBox Detector (SSD), YOLO (You Only Look Once) or RetinaNet. Optionally, the output information of the single-shot object detector may be the pedestrian boxes identified in the image to be detected, together with the score of each pedestrian box.

It should be noted that the single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network. To address the technical problem in the prior art that single-shot object detectors produce inaccurate detection results in occlusion scenarios, the head detection network is trained jointly with the single-shot object detector: the parameters of the single-shot object detector are adjusted according to both the detection results of the head detection network and the detection results of the single-shot object detector. Optimizing the detector's parameters in this way makes the recognition results of the trained single-shot object detector more accurate.

S130. Determine the pedestrian detection result for the image to be detected according to the output information of the single-shot object detector.

In this embodiment, once the output information produced by the single-shot object detector for the image to be detected has been obtained, the pedestrian detection result for the image is determined from that output information. Optionally, the form of the pedestrian detection result may be set according to the detection requirement. For example, if the requirement is to detect the number of pedestrians in the image to be detected, the number of pedestrian boxes output by the single-shot object detector is counted and used as the pedestrian detection result; if the requirement is to detect the positions of pedestrians in the image to be detected, the pedestrian positions are determined from the positions of the pedestrian boxes output by the single-shot object detector.
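The two detection requirements described for S130 can be illustrated with a minimal sketch. The box layout `(x_min, y_min, x_max, y_max)`, the function names, and the choice of the box centre as the "position" are assumptions made for illustration; the text does not prescribe any of them.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


def count_pedestrians(boxes: List[Box]) -> int:
    """Requirement 1: the pedestrian count is the number of output boxes."""
    return len(boxes)


def pedestrian_positions(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Requirement 2: derive one position per pedestrian box.

    Using the box centre as the position is an assumption of this sketch.
    """
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]


# Hypothetical detector output for one image.
detector_boxes = [(10.0, 20.0, 50.0, 120.0), (200.0, 30.0, 260.0, 180.0)]
print(count_pedestrians(detector_boxes))
print(pedestrian_positions(detector_boxes))
```

Both functions only post-process the detector output, so the detection requirement can change without retraining the detector.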

In this embodiment of the present invention, an image to be detected is acquired; the image to be detected is input into a trained single-shot object detector and the output information of the single-shot object detector is obtained; and the pedestrian detection result for the image to be detected is determined according to that output information. The single-shot object detector is obtained by training, in advance, an original detection model that contains an initially constructed single-shot object detector and a head detection network. Performing pedestrian detection with a single-shot object detector trained in this way improves pedestrian detection accuracy while preserving the pedestrian detection speed of the single-shot object detector.

Embodiment 2

FIG. 2 is a flowchart of a pedestrian detection method provided by Embodiment 2 of the present invention. On the basis of the above embodiment, this embodiment specifies how the single-shot object detector is trained. As shown in FIG. 2, the method includes:

S210. Acquire sample images, the pedestrian box annotation results corresponding to the sample images, and the pedestrian head annotation results corresponding to the sample images.

In this embodiment, a sample image may be an image containing pedestrians, and preferably an image containing occluded pedestrians. The sample images are annotated manually, marking the pedestrian boxes and the pedestrian heads in each sample image, to obtain the sample images together with their corresponding pedestrian box annotation results and pedestrian head annotation results.

S220. Generate training sample pairs based on the sample images, the pedestrian box annotation results corresponding to the sample images, and the pedestrian head annotation results corresponding to the sample images, and train a pre-built original detection model with the training sample pairs to obtain a trained original detection model.

After the sample images have been annotated, training sample pairs are generated from the sample images and their pedestrian box and pedestrian head annotation results, and the pre-built original detection model is trained with these pairs to obtain the trained original detection model. The pre-built original detection model may include a single-shot object detector and at least one head detection network. The output of each original feature network layer in the single-shot object detector is connected to the input of the head detection network.

In one implementation of the present invention, training the pre-built original detection model with the training sample pairs to obtain the trained original detection model includes: inputting the sample image into the initially constructed single-shot object detector to obtain the original feature maps, the pedestrian box detection results, and the detection scores corresponding to the pedestrian box detection results output by the single-shot object detector; sorting the pedestrian box detection results by detection score and obtaining a preset number of target pedestrian box detection results according to the sorting result; inputting the original feature maps and the target pedestrian box detection results into the head detection network to obtain the pedestrian head detection results output by the head detection network; determining a first loss value from the pedestrian box detection results and the pedestrian box annotation results, determining a second loss value from the pedestrian head detection results and the pedestrian head annotation results, and determining a target loss value from the first loss value and the second loss value; and training the original detection model with the goal of making the target loss value satisfy a convergence condition.

Specifically, training the pre-built original detection model with the training sample pairs may proceed as follows. The sample image is input into the initially constructed single-shot object detector to obtain the pedestrian box detection results output by the detector and the detection score corresponding to each pedestrian box detection result. The pedestrian box detection results are sorted in descending order of detection score, and the top preset number of results are taken as the target pedestrian box detection results. The target pedestrian box detection results and the original feature maps output by each original feature network layer are input into the head detection network to obtain the pedestrian head detection results output by the head detection network. The first loss value, corresponding to the pedestrian box detection results, is then calculated from a set pedestrian box loss function, the pedestrian box detection results and the pedestrian box annotation results; the second loss value, corresponding to the pedestrian head detection results, is calculated from a set pedestrian head loss function, the pedestrian head detection results and the pedestrian head annotation results; and the target loss value is calculated from the first loss value and the second loss value. When the target loss value does not satisfy the convergence condition, the parameters of the head detection network and the parameters of the single-shot object detector are adjusted, and the sample image is predicted again with the adjusted parameters; this is repeated until the target loss value satisfies the convergence condition, yielding the trained original detection model. The pedestrian box loss function and the pedestrian head loss function may be the same or different.
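The step that picks the target pedestrian boxes (sort by detection score in descending order, then keep a preset number) can be sketched as follows. The function name `select_target_boxes`, the box layout and the sample values are hypothetical and serve only to illustrate the selection.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def select_target_boxes(
    boxes: List[Box], scores: List[float], preset_number: int
) -> List[Box]:
    """Keep the `preset_number` boxes with the highest detection scores."""
    # Indices of the boxes, ordered from highest score to lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:preset_number]]


# Hypothetical detector output: three boxes and their scores.
sample_boxes = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 3.0, 3.0)]
sample_scores = [0.2, 0.9, 0.5]
print(select_target_boxes(sample_boxes, sample_scores, preset_number=2))
```

Only the selected boxes are passed on to the head detection network, which bounds the amount of head-branch computation per training image.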

Optionally, calculating the target loss value from the first loss value and the second loss value may be: taking the sum of the first loss value and the second loss value as the target loss value. The target loss value may be considered to satisfy the convergence condition when the number of iterations reaches a set number, or when the difference between the target loss values of two adjacent iterations is smaller than a set threshold.
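As a minimal sketch of this optional scheme, the target loss is the plain sum of the two loss values, and training stops when either the iteration budget is reached or two consecutive target losses differ by less than a threshold. The function names, the loss values and the threshold below are made-up illustrations, not values from the source.

```python
from typing import Optional


def target_loss(first_loss: float, second_loss: float) -> float:
    """Target loss = pedestrian box loss + pedestrian head loss."""
    return first_loss + second_loss


def has_converged(
    iteration: int,
    max_iterations: int,
    current_loss: float,
    previous_loss: Optional[float],
    threshold: float,
) -> bool:
    """True if the iteration budget is used up, or if two adjacent
    target losses differ by less than `threshold`."""
    if iteration >= max_iterations:
        return True
    if previous_loss is None:  # first iteration: nothing to compare with
        return False
    return abs(current_loss - previous_loss) < threshold


# Simulated (first_loss, second_loss) pairs for three training iterations.
simulated_losses = [(0.9, 0.6), (0.5, 0.3), (0.4996, 0.3)]
previous = None
stopped_at = None
for iteration, (first, second) in enumerate(simulated_losses, start=1):
    current = target_loss(first, second)
    if has_converged(iteration, 100, current, previous, threshold=1e-3):
        stopped_at = iteration
        break
    previous = current
print(stopped_at)
```

In a real training loop the parameter update of both the detector and the head detection network would sit between the loss computation and the convergence check.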

Optionally, the head detection network may include a region extraction module and a head labeling module. The region extraction module extracts, from the original feature map, the head feature map corresponding to each target pedestrian box detection result; the head labeling module labels the head feature map to obtain the pedestrian head detection result.
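One way to picture the region extraction module is as a crop of the feature map over the cells covered by a pedestrian box. The sketch below makes several assumptions that the text does not state: a single-channel feature map stored as nested lists, boxes given in image pixels as `(x_min, y_min, x_max, y_max)`, and a simple proportional mapping from image coordinates to feature-map cells.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]               # rows x columns, single channel
Box = Tuple[float, float, float, float]      # (x_min, y_min, x_max, y_max)


def extract_region(
    feature_map: FeatureMap, box: Box, image_size: Tuple[int, int]
) -> FeatureMap:
    """Crop the feature-map cells covered by `box`.

    `image_size` is (width, height) of the input image in pixels.
    """
    feat_h, feat_w = len(feature_map), len(feature_map[0])
    img_w, img_h = image_size
    x0, y0, x1, y1 = box
    # Scale box corners to feature-map cells; round outwards and clamp.
    col0 = max(0, math.floor(x0 * feat_w / img_w))
    row0 = max(0, math.floor(y0 * feat_h / img_h))
    col1 = min(feat_w, math.ceil(x1 * feat_w / img_w))
    row1 = min(feat_h, math.ceil(y1 * feat_h / img_h))
    return [row[col0:col1] for row in feature_map[row0:row1]]


# 4x4 toy feature map for a 64x64 image; cell value = row * 4 + column.
toy_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
region = extract_region(toy_map, (16.0, 0.0, 48.0, 32.0), (64, 64))
print(region)
```

A head labeling module would then operate on the cropped region; its form is left open here because the text does not describe it further.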

On the basis of the above solution, the single-shot object detector includes a plurality of original feature network layers, and the original detection model further includes an upsampling module between a target feature network layer and the head detection network. The method further includes: selecting at least one original feature network layer as a target feature network layer according to the image size of the original feature map output by each original feature network layer; adding an upsampling module after the target feature network layer; and adding the head detection network after the upsampling module.

Some original feature network layers in the single-shot object detector output small original feature maps, and extracting head detection features directly from such maps loses pedestrian information and degrades the pedestrian head detection result. In this embodiment, a target feature network layer is therefore selected from the original feature network layers according to the image size of the original feature maps, an upsampling module is added after the target feature network layer, and the head detection network is added after the upsampling module. Head detection is then performed on the upsampled feature map, which improves the accuracy of the pedestrian head detection result.

Optionally, selecting at least one original feature network layer as the target feature network layer according to the image size of the original feature map output by each original feature network layer includes: taking the original feature network layer whose original feature map has an image size smaller than a set threshold as the target feature network layer. In one embodiment, an image size threshold may be preset, and every original feature network layer whose original feature map is smaller than the image size threshold is taken as a target feature network layer. This embodiment does not limit the upsampling module, provided that it can upsample the original feature map into an upsampled feature map whose image size is not smaller than the set image size threshold. For example, the upsampling module may be a transposed convolution module.
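The threshold rule for choosing target feature network layers can be sketched in a few lines. The layer names, feature-map sizes and threshold below are invented for illustration, and square feature maps are assumed so that a single side length describes the image size.

```python
from typing import Dict, List


def select_target_layers(feature_sizes: Dict[str, int], size_threshold: int) -> List[str]:
    """Return the layers whose feature map is smaller than the threshold.

    `feature_sizes` maps a layer name to the side length of its (assumed
    square) output feature map.
    """
    return [name for name, size in feature_sizes.items() if size < size_threshold]


# Hypothetical sizes for three feature layers of a single-shot detector.
sizes = {"feature_layer_1": 64, "feature_layer_2": 32, "feature_layer_3": 8}
print(select_target_layers(sizes, size_threshold=16))
```

Each selected layer would then get an upsampling module inserted between it and the head detection network, while the other layers feed the head detection network directly.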

In one implementation of the present invention, before the original feature maps and the target pedestrian box detection results are input into the head detection network to obtain the pedestrian head detection results output by the head detection network, the method further includes: obtaining the target original feature map output by the target feature network layer, and inputting the target original feature map into the upsampling module to obtain the upsampled feature map output by the upsampling module. Correspondingly, inputting the original feature maps and the target pedestrian box detection results into the head detection network to obtain the pedestrian head detection results includes: inputting the original feature maps output by the original feature network layers other than the target feature network layer, the upsampled feature map, and the target pedestrian box detection results into the head detection network to obtain the pedestrian head detection results output by the head detection network.

Once the upsampling module has been added between the target feature network layer and the head detection network, pedestrian head prediction proceeds as follows. The upsampling module first upsamples the target original feature map output by the target feature network layer to obtain the upsampled feature map. The upsampled feature map, together with the original feature maps output by the original feature network layers other than the target feature network layer, is then used as the input of the head detection network, which detects pedestrian heads and outputs the pedestrian head detection results. Upsampling the target original feature map before passing it to the head detection network preserves the pedestrian information in the original feature map and prevents the loss of pedestrian information from affecting pedestrian head detection.
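The assembly of the head detection network's inputs can be sketched as below: feature maps from target layers are replaced by their upsampled versions, and all other feature maps pass through unchanged. Note that nearest-neighbour upsampling is used here purely as a simple stand-in; the text names transposed convolution as one example of the upsampling module. All names and values are hypothetical.

```python
from typing import Dict, List, Set

FeatureMap = List[List[float]]  # rows x columns, single channel


def upsample_nearest(feature_map: FeatureMap, factor: int) -> FeatureMap:
    """Stand-in upsampler: repeat every cell `factor` times in both axes."""
    upsampled: FeatureMap = []
    for row in feature_map:
        wide_row = [value for value in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


def build_head_inputs(
    feature_maps: Dict[str, FeatureMap], target_layers: Set[str], factor: int
) -> Dict[str, FeatureMap]:
    """Upsample maps from target layers; keep the other maps as they are."""
    return {
        name: upsample_nearest(fmap, factor) if name in target_layers else fmap
        for name, fmap in feature_maps.items()
    }


toy_maps = {"feature_layer_2": [[1.0, 2.0], [3.0, 4.0]], "feature_layer_3": [[7.0]]}
head_inputs = build_head_inputs(toy_maps, {"feature_layer_3"}, factor=2)
print(head_inputs)
```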

S230. Take the single-shot object detector in the trained original detection model as the trained single-shot object detector.

In this embodiment, once the trained original detection model has been obtained, the single-shot object detector inside it is taken as the trained single-shot object detector and is used for pedestrian detection. Because only the single-shot object detector's detection flow is used at detection time, the detection accuracy of the single-shot object detector is improved while its detection speed is preserved.
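The idea of training with the head branch but deploying only the detector can be sketched with plain callables. `OriginalDetectionModel`, `export_trained_detector` and the toy functions are hypothetical names introduced for this illustration; no training is performed, and the sketch only shows that the inference path never touches the head detection network.

```python
from typing import Any, Callable, Dict, List


class OriginalDetectionModel:
    """Pairs the single-shot detector with the auxiliary head branch."""

    def __init__(
        self,
        detector: Callable[[Any], Dict[str, List]],
        head_network: Callable[..., Any],
    ) -> None:
        self.detector = detector
        self.head_network = head_network


def export_trained_detector(model: OriginalDetectionModel) -> Callable[[Any], Dict[str, List]]:
    """After training, keep only the detector for deployment."""
    return model.detector


head_calls: List[Any] = []  # records every use of the head branch


def toy_detector(image: Any) -> Dict[str, List]:
    # Stand-in for a trained detector: fixed boxes and scores.
    return {"boxes": [(0.0, 0.0, 4.0, 8.0)], "scores": [0.9]}


def toy_head_network(features: Any, boxes: Any) -> List:
    head_calls.append(boxes)
    return []


model = OriginalDetectionModel(toy_detector, toy_head_network)
deployed = export_trained_detector(model)
output = deployed("image placeholder")
print(output, len(head_calls))
```

Since the deployed callable is the detector alone, inference cost is that of the plain single-shot detector, which matches the speed argument made in the text.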

S240. Acquire an image to be detected.

S250. Input the image to be detected into the trained single-shot object detector and obtain the output information of the single-shot object detector.

S260. Determine the pedestrian detection result for the image to be detected according to the output information of the single-shot object detector.

This embodiment of the present invention specifies how the single-shot object detector is trained: sample images, the pedestrian box annotation results corresponding to the sample images, and the pedestrian head annotation results corresponding to the sample images are acquired; training sample pairs are generated from them and used to train the pre-built original detection model, yielding the trained original detection model; and the single-shot object detector in the trained original detection model is taken as the trained single-shot object detector. Adding pedestrian head features as training features improves the training accuracy of the single-shot object detector and therefore the accuracy of its pedestrian detection results.

实施例三Embodiment 3

图3a是本发明实施例三所提供的一种行人检测方法的流程图。本实施例在上述实施例的基础上，提供了一种优选实施例。如图3a所示，所述方法包括：FIG. 3 a is a flowchart of a pedestrian detection method according to Embodiment 3 of the present invention. This embodiment provides a preferred embodiment on the basis of the above-mentioned embodiment. As shown in Figure 3a, the method includes:

S310、基于单次目标检测器构建待训练的原始检测模型。S310. Build an original detection model to be trained based on the single-shot target detector.

In this embodiment, head prediction is added to a single-shot target detector to obtain the constructed original detection model. During training, the whole original detection model predicts both the pedestrian frames and the pedestrian head marks, so that the head detection task helps improve pedestrian detection accuracy. The single-shot target detector may be a detector such as SSD, YOLO, or RetinaNet.

FIG. 3b is a schematic diagram of the network architecture of an original detection model according to Embodiment 3 of the present invention; it schematically shows an original detection model based on an SSD network. As shown in FIG. 3b, the original detection model includes an SSD network 310, an upsampling module 320, and a head detection module 330.

The SSD network 310 contains an image input layer, a base network layer, a feature network layer, and a detection layer; the feature network layer includes feature layer 1, feature layer 2, and feature layer 3. The input of the SSD network 310 is the image to be detected, and its output is the detected pedestrian frames together with three feature maps. The input image passes through the base network layer and the feature network layer to produce feature maps of different scales, and these feature maps pass through the detection layer to produce the pedestrian frame predictions, including the coordinates and the score of each pedestrian frame.

The upsampling module 320 includes a target feature network layer and an upsampling layer, where the target feature network layer is the feature network layer whose feature map has an image size smaller than a set image size threshold. The input of the upsampling module 320 is the small-scale feature map in the SSD, and its output is the upsampled feature map. The small-scale feature map (image size smaller than the set threshold; the feature map output by feature layer 3 in FIG. 3b) is upsampled into a larger feature map by transposed convolution, which avoids the information loss that would occur if head feature maps were extracted directly from a small original feature map. For example, if the original feature map is 8*8, the upsampled feature map obtained after transposed convolution is 64*64.
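The 8*8 to 64*64 example can be checked against the standard output-size relation for a transposed convolution. The following is a minimal sketch only: the description does not state the kernel size, stride, or number of layers, so both configurations below (one layer with kernel 8 and stride 8, or three stacked layers with kernel 4, stride 2, and padding 1) are assumptions chosen for illustration, and dilation is assumed to be 1.

```python
def transposed_conv_output_size(size_in, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (dilation assumed to be 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel + output_padding


# Assumed configuration A: a single layer with kernel 8, stride 8, no padding.
single_layer = transposed_conv_output_size(8, kernel=8, stride=8)

# Assumed configuration B: three stacked layers, each doubling the size
# (kernel 4, stride 2, padding 1).
size = 8
stacked_sizes = []
for _ in range(3):
    size = transposed_conv_output_size(size, kernel=4, stride=2, padding=1)
    stacked_sizes.append(size)

print(single_layer)   # 64
print(stacked_sizes)  # [16, 32, 64]
```

Either configuration reaches 64*64 from 8*8; which one, if either, matches the embodiment is not stated in the description.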

The head detection module 330 contains ROIAlign and a marking layer. Its inputs are feature map 1 output by feature layer 1 of the SSD, feature map 2 output by feature layer 2, the pedestrian detection results, and the upsampled feature map 3 output by the upsampling layer; its output is the pedestrian head detection result. This module first sorts the pedestrian frames by score and takes the top 100 pedestrian frames. It then uses ROIAlign to extract the head feature maps at the positions corresponding to these top 100 pedestrian frames, rescales them to a uniform set size (for example, 28*28), and finally performs pedestrian head detection on the head feature maps.
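The ROIAlign step can be illustrated as follows. This is a simplified sketch, not the embodiment's implementation: it handles a single-channel feature map stored as a nested list, takes one bilinear sample at the centre of each output bin (practical ROIAlign implementations usually average several samples per bin), and treats integer coordinates as sample positions. The function names `bilinear` and `roi_align` are hypothetical.

```python
def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D list `feature` at the continuous point (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, box, out_size):
    """Resample the region box = (x1, y1, x2, y2) to an out_size x out_size grid.

    Simplification: one sample at the centre of each output bin.
    """
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [
        [bilinear(feature, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
         for j in range(out_size)]
        for i in range(out_size)
    ]


# Toy 4x4 feature map whose value at (row, col) is 4 * row + col.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
patch = roi_align(feature_map, box=(0.0, 0.0, 2.0, 2.0), out_size=2)
print(patch)  # [[2.5, 3.5], [6.5, 7.5]]
```

Calling `roi_align` with `out_size=28` would produce the uniform 28*28 head feature map mentioned above, regardless of the size of the pedestrian frame.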

S320. Acquire sample data, and train the original detection model based on the acquired sample data to obtain a trained original detection model.

Specifically, after the sample data (sample images) are acquired, they are annotated to obtain training sample pairs. Each training sample pair needs to contain the sample image, the pedestrian frame annotation, and the pedestrian head annotation; the pedestrian frame annotation serves as the ground truth for pedestrian detection, and the pedestrian head annotation serves as the ground truth for pedestrian head detection. When the original detection model is trained, the sample image is input into the single-shot target detector, the pedestrian frame predictions output by the single-shot target detector are obtained, and the pedestrian frame detection loss value is calculated. The pedestrian frames are then sorted by score to obtain the top 100 pedestrian frame detection results and the feature maps corresponding to them. Next, the upsampling module enlarges the small-scale feature map to obtain the upsampled feature map. ROIAlign extracts the head feature maps corresponding to the top 100 pedestrian frames from the upsampled feature map and from the original feature maps whose scale meets the set size requirement; marking is performed on the extracted head feature maps to obtain the pedestrian head detection result, which is compared with the pedestrian head annotation to obtain the pedestrian head detection loss value. The pedestrian frame detection loss value and the pedestrian head detection loss value are then added to form the overall loss, and the original detection model is trained with convergence of the overall loss as the goal, yielding the trained original detection model. Introducing head prediction into the single-shot target detector improves pedestrian detection accuracy in occluded scenes.
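Two of the training steps described here, selecting the highest-scoring pedestrian frames and adding the two loss values, can be sketched in a few lines. The function names, the box format `(x1, y1, x2, y2)`, and the numeric values are assumptions for illustration; the description does not specify how the individual loss values are computed.

```python
def top_k_boxes(boxes, scores, k=100):
    """Return the k highest-scoring boxes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [boxes[i] for i in order[:k]]


def overall_loss(box_loss, head_loss):
    """Overall training objective: plain sum of the two task losses."""
    return box_loss + head_loss


# Toy data: three predicted pedestrian frames with their scores.
boxes = [(10, 20, 50, 120), (60, 30, 90, 110), (5, 5, 40, 100)]
scores = [0.42, 0.91, 0.77]
selected = top_k_boxes(boxes, scores, k=2)
print(selected)                 # [(60, 30, 90, 110), (5, 5, 40, 100)]
print(overall_loss(1.25, 0.5))  # 1.75
```

With the default `k=100`, fewer than 100 predictions are simply returned in score order.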

S330. Take the single-shot target detector in the trained original detection model as the single-shot target detector to be tested, and test it.

After the trained original detection model is obtained, the single-shot target detector in it is extracted and taken as the single-shot target detector to be tested, and test data are used to test it and obtain a test result.

S340. After the single-shot target detector to be tested passes the test, use it to perform pedestrian detection.

When the single-shot target detector to be tested passes the test, it can be used directly for pedestrian detection. When it fails the test, training data are acquired again and the original detection model is retrained, until the single-shot target detector in the trained original detection model passes the test.
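The test-and-retrain flow of S330 and S340 can be sketched as a loop. Everything in this sketch is hypothetical: the three callables stand in for training, testing, and data acquisition, the trained model is represented as a dictionary, and the `max_rounds` limit is an added safeguard that the description does not mention.

```python
def train_until_test_passes(train_fn, test_fn, fetch_data_fn, max_rounds=5):
    """Retrain with freshly fetched data until the extracted detector passes testing.

    All three callables are hypothetical placeholders supplied by the caller.
    Returns (detector, rounds_used); raises RuntimeError if max_rounds is exhausted.
    """
    for round_index in range(1, max_rounds + 1):
        model = train_fn(fetch_data_fn())
        detector = model["detector"]  # keep only the single-shot detector
        if test_fn(detector):
            return detector, round_index
    raise RuntimeError("detector did not pass testing within max_rounds")


# Toy stand-ins: the "accuracy" improves with each new batch of training data.
batches = iter([0.6, 0.7, 0.85])
detector, rounds = train_until_test_passes(
    train_fn=lambda acc: {"detector": {"accuracy": acc}, "head_branch": None},
    test_fn=lambda det: det["accuracy"] >= 0.8,
    fetch_data_fn=lambda: next(batches),
)
print(rounds)  # 3
```

Note that only the detector part of the trained model is returned; the head branch is used during training and is not needed for inference.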

This embodiment of the present invention adds a pedestrian head mark detection branch to the single-shot target detector, together with an upsampling module that enlarges the small-scale feature map so that it can also take part in pedestrian head mark detection. This improves the pedestrian detection accuracy of the single-shot target detector in occluded scenes.

Embodiment 4

FIG. 4 is a schematic structural diagram of a pedestrian detection device according to Embodiment 4 of the present invention. The pedestrian detection device may be implemented in software and/or hardware; for example, it may be configured in a computer device. As shown in FIG. 4, the device includes a to-be-detected image acquisition module 410, an image pedestrian detection module 420, and a detection result determination module 430, wherein:

the to-be-detected image acquisition module 410 is configured to acquire an image to be detected;

the image pedestrian detection module 420 is configured to input the image to be detected into a trained single-shot target detector and acquire the output information of the single-shot target detector, where the single-shot target detector is obtained by training, in advance, an original detection model that includes an initially constructed single-shot target detector and a head detection network;

the detection result determination module 430 is configured to determine the pedestrian detection result of the image to be detected according to the output information of the single-shot target detector.

In this embodiment of the present invention, the to-be-detected image acquisition module acquires an image to be detected; the image pedestrian detection module inputs the image into the trained single-shot target detector and acquires its output information; and the detection result determination module determines the pedestrian detection result of the image according to that output information. The single-shot target detector is obtained by training, in advance, an original detection model that includes an initially constructed single-shot target detector and a head detection network. Performing pedestrian detection with a single-shot target detector trained in this way improves pedestrian detection accuracy while preserving the detection speed of the single-shot target detector.
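The three modules can be pictured as three methods chained in one object. This is a rough sketch under assumptions: the class name, the detector output format (a list of `(box, score)` pairs), and the use of a score threshold to determine the final result are all hypothetical, since the description only says that the result is determined according to the detector's output information.

```python
class PedestrianDetectionDevice:
    """Hypothetical wiring of modules 410, 420, and 430."""

    def __init__(self, detector, score_threshold=0.5):
        self.detector = detector                # trained single-shot detector (callable)
        self.score_threshold = score_threshold  # assumed decision rule

    def acquire_image(self, source):
        """Module 410: acquire the image to be detected."""
        return source()

    def detect(self, image):
        """Module 420: run the trained detector and return its raw output."""
        return self.detector(image)

    def determine_result(self, output):
        """Module 430: keep the boxes whose score reaches the threshold."""
        return [box for box, score in output if score >= self.score_threshold]

    def run(self, source):
        return self.determine_result(self.detect(self.acquire_image(source)))


def stub_detector(image):
    """Stand-in for a trained detector: returns fixed (box, score) pairs."""
    return [((0, 0, 10, 20), 0.9), ((5, 5, 8, 9), 0.2)]


device = PedestrianDetectionDevice(stub_detector)
result = device.run(lambda: "image-placeholder")
print(result)  # [(0, 0, 10, 20)]
```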

Optionally, on the basis of the above solution, the device further includes a single-shot target detector determination module, configured to:

acquire a sample image, the pedestrian frame annotation result corresponding to the sample image, and the pedestrian head annotation result corresponding to the sample image;

generate a training sample pair based on the sample image, the pedestrian frame annotation result corresponding to the sample image, and the pedestrian head annotation result corresponding to the sample image, and train the pre-built original detection model with the training sample pair to obtain a trained original detection model;

take the single-shot target detector in the trained original detection model as the trained single-shot target detector.

Optionally, on the basis of the above solution, the single-shot target detector determination module includes:

a pedestrian frame detection unit, configured to input the sample image into the initially constructed single-shot target detector to obtain the original feature map, the pedestrian frame detection results, and the detection scores corresponding to the pedestrian frame detection results output by the single-shot target detector;

a target pedestrian frame determination unit, configured to sort the pedestrian frame detection results according to the detection scores and obtain a preset number of target pedestrian frame detection results according to the sorting result;

a pedestrian head detection unit, configured to input the original feature map and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network;

a loss value determination unit, configured to determine a first loss value according to the pedestrian frame detection results and the pedestrian frame annotation result, determine a second loss value according to the pedestrian head detection result and the pedestrian head annotation result, and determine a target loss value according to the first loss value and the second loss value;

an original detection model training unit, configured to train the original detection model with the goal that the target loss value reaches a convergence condition.
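The loss value determination unit and the training unit leave two details open: how the target loss value is formed from the first and second loss values, and what the convergence condition is. The sketch below shows one possible reading, with both choices labelled as assumptions: a weighted sum (which reduces to the plain sum when `head_weight` is 1.0) and a convergence test that requires the last few changes in loss to fall below a tolerance.

```python
def target_loss(first_loss, second_loss, head_weight=1.0):
    """Combine the pedestrian frame loss and the head loss (assumed weighted sum)."""
    return first_loss + head_weight * second_loss


def has_converged(history, window=3, tolerance=1e-3):
    """Assumed convergence condition: the last `window` loss changes are all tiny."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(a - b) < tolerance for a, b in zip(recent, recent[1:]))


# Toy loss curve recorded once per training epoch.
history = [2.0, 1.2, 0.9, 0.8995, 0.8993, 0.8992]
print(target_loss(1.0, 0.5, head_weight=0.5))  # 1.25
print(has_converged(history))                  # True
print(has_converged(history[:3]))              # False
```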

Optionally, on the basis of the above solution, the single-shot target detector includes a plurality of original feature network layers, an upsampling module is further included between the target feature network layer in the original detection model and the head detection network, and the device further includes an original detection model construction module, configured to:

select at least one of the original feature network layers as the target feature network layer according to the image size of the original feature map output by each original feature network layer;

add the upsampling module after the target feature network layer, and add the head detection network after the upsampling module.

Optionally, on the basis of the above solution, the original detection model construction module is specifically configured to:

take the original feature network layer whose original feature map has an image size smaller than a set threshold as the target feature network layer.

Optionally, on the basis of the above solution, the single-shot target detector determination module further includes an upsampling unit, configured to:

acquire the target original feature map output by the target feature network layer, and input the target original feature map into the upsampling module to obtain the upsampled feature map output by the upsampling module;

correspondingly, the pedestrian head detection unit is specifically configured to:

input the original feature maps output by the original feature network layers other than the target feature network layer, the upsampled feature map, and the target pedestrian frame detection results into the head detection network to obtain the pedestrian head detection result output by the head detection network.

Optionally, on the basis of the above solution, the upsampling module is a transposed convolution module.
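The routing of feature maps into the head detection network can be sketched as follows: maps at or above the size threshold pass through unchanged, and maps below it are upsampled first. For simplicity this sketch substitutes nearest-neighbour repetition for the transposed convolution module, which has learned weights and is not reproduced here. The layer names, the threshold of 16, and the upsampling factor of 8 are assumptions for illustration.

```python
def nearest_upsample(feature_map, factor):
    """Stand-in for the transposed-convolution module: repeat each value factor x factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in feature_map
        for _ in range(factor)
    ]


def head_network_feature_inputs(feature_maps, size_threshold, factor):
    """Pass large maps through unchanged and upsample maps below the threshold."""
    inputs = {}
    for name, fmap in feature_maps.items():
        if len(fmap) < size_threshold:   # target feature network layer
            inputs[name] = nearest_upsample(fmap, factor)
        else:                            # other original feature network layers
            inputs[name] = fmap
    return inputs


maps = {
    "feature_1": [[0.0] * 32 for _ in range(32)],
    "feature_2": [[0.0] * 16 for _ in range(16)],
    "feature_3": [[1.0] * 8 for _ in range(8)],
}
inputs = head_network_feature_inputs(maps, size_threshold=16, factor=8)
print({name: (len(m), len(m[0])) for name, m in inputs.items()})
# {'feature_1': (32, 32), 'feature_2': (16, 16), 'feature_3': (64, 64)}
```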

The pedestrian detection device provided by this embodiment of the present invention can execute the pedestrian detection method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment 5

FIG. 5 is a schematic structural diagram of a computer device according to Embodiment 5 of the present invention. FIG. 5 shows a block diagram of an exemplary computer device 512 suitable for implementing embodiments of the present invention. The computer device 512 shown in FIG. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in FIG. 5, computer device 512 takes the form of a general-purpose computing device. The components of computer device 512 may include, but are not limited to, one or more processors 516, a system memory 528, and a bus 518 that connects the different system components (including system memory 528 and processor 516).

Bus 518 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and the processor 516 or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 512 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by computer device 512, including volatile and non-volatile media, and removable and non-removable media.

System memory 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. Computer device 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage device 534 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a CD-ROM, DVD-ROM, or other optical medium) may be provided. In these cases, each drive may be connected to bus 518 through one or more data media interfaces. Memory 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 540 having a set of (at least one) program modules 542 may be stored, for example, in memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.

Computer device 512 may also communicate with one or more external devices 514 (for example, a keyboard, a pointing device, a display 524, and so on), with one or more devices that enable a user to interact with computer device 512, and/or with any device (for example, a network card or a modem) that enables computer device 512 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 522. Computer device 512 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 520. As shown in the figure, network adapter 520 communicates with the other modules of computer device 512 through bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with computer device 512, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processor 516 executes various functional applications and data processing by running programs stored in system memory 528, for example to implement the pedestrian detection method provided by the embodiments of the present invention, the method including:

acquiring an image to be detected;

Of course, those skilled in the art will understand that the processor can also implement the technical solution of the pedestrian detection method provided by any embodiment of the present invention.

Embodiment 6

Embodiment 6 of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the pedestrian detection method provided by the embodiments of the present invention, the method including:

acquiring an image to be detected;

Of course, the computer program stored on the computer-readable storage medium provided by this embodiment of the present invention is not limited to the method operations described above, and can also perform the related operations of the pedestrian detection method provided by any embodiment of the present invention.

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A pedestrian detection method, characterized by comprising:

acquiring an image to be detected;

inputting the image to be detected into a trained single-shot target detector, and acquiring output information of the single-shot target detector;

determining a pedestrian detection result of the image to be detected according to the output information of the single-shot target detector;

wherein the single-shot target detector is obtained by training, in advance, an original detection model that comprises an initially constructed single-shot target detector and a head detection network.

2. The method of claim 1, further comprising, before inputting the image to be detected into the trained single-shot target detector:

acquiring a sample image, a pedestrian frame annotation result corresponding to the sample image, and a pedestrian head annotation result corresponding to the sample image;

generating a training sample pair based on the sample image, the pedestrian frame annotation result corresponding to the sample image, and the pedestrian head annotation result corresponding to the sample image, and training a pre-constructed original detection model by using the training sample pair to obtain a trained original detection model;

and taking the single-shot target detector in the trained original detection model as the trained single-shot target detector.

3. The method of claim 2, wherein the training of the pre-constructed original detection model by using the training sample pair to obtain a trained original detection model comprises:

inputting the sample image into the initially constructed single-shot target detector to obtain an original feature map, a pedestrian frame detection result, and a detection score corresponding to the pedestrian frame detection result, which are output by the single-shot target detector;

sorting the pedestrian frame detection results according to the detection scores, and acquiring a preset number of target pedestrian frame detection results according to the sorting results;

inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain a pedestrian head detection result output by the head detection network;

determining a first loss value according to the pedestrian frame detection result and the pedestrian frame annotation result, determining a second loss value according to the pedestrian head detection result and the pedestrian head annotation result, and determining a target loss value according to the first loss value and the second loss value;

and training the original detection model with the goal that the target loss value reaches a convergence condition.

4. The method of claim 3, wherein the single-shot target detector comprises a plurality of original feature network layers, wherein an upsampling module is further included between the target feature network layer in the original detection model and the head detection network, and wherein the method further comprises:

selecting at least one original feature network layer as a target feature network layer according to the image size of the original feature map output by each original feature network layer;

and adding the upsampling module after the target feature network layer, and adding the head detection network after the upsampling module.

5. The method according to claim 4, wherein the selecting at least one of the original feature network layers as a target feature network layer according to an image size of an original feature map output by each of the original feature network layers comprises:

taking the original feature network layer corresponding to the original feature map whose image size is smaller than a set threshold as the target feature network layer.

6. The method according to claim 5, wherein, before inputting the original feature map and the target pedestrian frame detection result into the head detection network and obtaining the pedestrian head detection result output by the head detection network, the method further comprises:

acquiring a target original feature map output by the target feature network layer, and inputting the target original feature map into the upsampling module to obtain an upsampled feature map output by the upsampling module;

correspondingly, the inputting the original feature map and the target pedestrian frame detection result into the head detection network to obtain the pedestrian head detection result output by the head detection network includes:

inputting the original feature maps output by the original feature network layers other than the target feature network layer, the upsampled feature map, and the target pedestrian frame detection result into the head detection network to obtain the pedestrian head detection result output by the head detection network.

7. The method of claim 4, wherein the upsampling module is a transposed convolution module.

8. A pedestrian detection device, characterized by comprising:

a to-be-detected image acquisition module, configured to acquire an image to be detected;

an image pedestrian detection module, configured to input the image to be detected into a trained single-shot target detector and acquire output information of the single-shot target detector, wherein the single-shot target detector is obtained by training, in advance, an original detection model that comprises an initially constructed single-shot target detector and a head detection network;

and a detection result determination module, configured to determine the pedestrian detection result of the image to be detected according to the output information of the single-shot target detector.

9. A computer device, the device comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the pedestrian detection method of any one of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the pedestrian detection method according to any one of claims 1 to 7.