WO2020024851A1 - Target tracking method, computer device, and storage medium - Google Patents
Target tracking method, computer device, and storage medium
- Publication number
- WO2020024851A1 (PCT/CN2019/097343)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- image
- model
- frame
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the present application relates to the field of computer technology, and in particular, to a target tracking method, a computer device, and a storage medium.
- the present application provides a target tracking method, a computer device, and a storage medium, which can solve the problem of the high target tracking loss rate of traditional methods.
- a target tracking method includes:
- a computer device includes a memory and a processor.
- the memory stores a computer program.
- when the computer program is executed by the processor, it causes the processor to perform the following steps:
- a storage medium stores a computer program.
- when the computer program is executed by a processor, it causes the processor to perform the following steps:
- the above target tracking method, computer device, and storage medium crop the target candidate image according to the target candidate region of the current image frame, determine the target area in the target candidate image, and determine, through the motion prediction model, the motion prediction data of the next image frame relative to the current image frame.
- by moving the target area of the current image frame according to the motion prediction data, the target candidate region of the next image frame, that is, the approximate location of the target, can be determined. This ensures that the target candidate region is determined accurately when switching from the current image frame to the next image frame.
- the target area can then be determined within the target candidate region, which improves the accuracy of target tracking and reduces the target tracking loss rate.
- FIG. 1 is an application scenario diagram of a target tracking method in an embodiment
- FIG. 2 is an application scenario diagram of a target tracking method in another embodiment
- FIG. 3 is a schematic flowchart of a target tracking method according to an embodiment
- FIG. 4 is a schematic flowchart of a step of determining a target area in an embodiment
- FIG. 5 is a schematic flowchart of a step of obtaining motion prediction data in an embodiment
- FIG. 6 is a schematic flowchart of steps of training a motion prediction model in an embodiment
- FIG. 7 is a schematic flowchart of a step of determining motion training data in an embodiment
- FIG. 8 is a schematic diagram of connection of models in a target tracking method according to an embodiment
- FIG. 9 is a schematic structural diagram of a multi-task model in an embodiment
- FIG. 10 is a schematic flowchart of a target tracking method according to an embodiment
- FIG. 11 is a schematic diagram of labeling a preset prediction classification in an embodiment
- FIG. 12 is a schematic diagram of determining a preset prediction classification of an image frame in an embodiment
- FIG. 13 is a schematic diagram of determining a preset prediction classification of an image frame in another embodiment
- FIG. 14 is a block diagram of a target tracking device in an embodiment
- FIG. 15 is a block diagram of a target tracking device in another embodiment
- FIG. 16 is a schematic diagram of an internal structure of a computer device in an embodiment.
- FIG. 1 is an application scenario diagram of a target tracking method in an embodiment.
- the application scenario includes a terminal 110 and at least one camera 120, and the camera is configured to collect an image.
- the terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, and a notebook computer.
- FIG. 2 is an application scenario diagram of a target tracking method in another embodiment.
- the application scenario includes a terminal 200, and the terminal 200 is a mobile terminal.
- the mobile terminal may be at least one of a mobile phone, a tablet computer, and a notebook computer.
- a camera 210 is installed in the terminal 200, and the camera is used for collecting images.
- the terminal 200 displays the image 220 collected by the camera 210 on a display screen.
- a target tracking method is provided.
- the target tracking method can be applied to the terminal 110 in FIG. 1 or the terminal 200 in FIG. 2 described above. This embodiment is mainly described by using the method applied to the terminal in FIG. 1 or FIG. 2 as an example. Referring to FIG. 3, the target tracking method specifically includes the following steps:
- the terminal determines a target candidate region of the current image frame.
- the current image frame is an image frame currently being processed by the terminal.
- An image frame is the smallest unit image of a sequence of video frames that make up a video image.
- the target candidate region is a candidate region within which the target region is determined.
- the target candidate region includes a target region.
- the target can be moving or stationary.
- the target can be a moving human face, a moving car, a moving airplane, and so on.
- the target area may refer to one or more image areas where the target is located, and the target area may be represented by a rectangular frame.
- the camera can collect the current image frames in the current field of view in real time, and send the current image frames collected in real time to the terminal.
- the terminal receives the current image frame returned by the camera, performs recognition on it to identify a target prediction range in the current image frame, and determines a target candidate region of the current image frame according to the position of the identified target.
- the target prediction range refers to a range of images in which a target may exist.
- the terminal obtains the current image frame within the current field of view of the camera through the camera, calls a target recognition program to identify the target in the current image frame, obtains the target position through recognition, and determines the target candidate area according to the target position.
- the terminal obtains a current image frame, and obtains a target candidate area determined according to a target area and motion prediction data in a previous image frame.
- the motion prediction data may include at least one of a motion speed, a motion direction, and a motion distance.
- the terminal crops, from the current image frame, a target candidate image matching the target candidate region.
- the target candidate image is a partial image cropped from the current image frame according to the target candidate region.
- the terminal crops the image inside the target candidate region; the cropped image is the target candidate image matching the target candidate region.
- after the terminal recognizes the target candidate region in the current image frame, the terminal enlarges the target candidate region by a preset multiple and crops the target candidate image from the current image frame according to the enlarged target candidate region.
- the terminal expands the side lengths of the target candidate region by a preset multiple and crops, from the current image frame, the target candidate image matching the target candidate region with the enlarged side lengths.
- S304 specifically includes: enlarging the target candidate region in the current image frame according to a preset multiple; determining the target candidate image that matches the enlarged target candidate region in the current image frame; and cropping the determined target candidate image from the current image frame.
- the terminal enlarges the target candidate region according to the preset multiple and locates the enlarged target candidate region in the current image frame.
- the terminal crops the target candidate image from the current image frame according to the enlarged target candidate region, and the cropped target candidate image matches the size of that region.
- the terminal takes the midpoint of each side of the target candidate region as a center and extends each side toward both ends by the preset multiple.
- the terminal then translates each extended side outward, perpendicular to itself, until the endpoints of adjacent sides coincide; the closed region formed by the sides is the enlarged target candidate region.
- the preset multiple may be 1.3.
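The enlarge-then-crop step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the box layout `(x_min, y_min, x_max, y_max)`, the helper names `enlarge_box` and `crop`, and the clamping of the box to the frame borders are all assumptions made for the example.

```python
import math
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def enlarge_box(box: Box, multiple: float = 1.3) -> Box:
    """Scale both side lengths of `box` by `multiple`, keeping its center fixed."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * multiple / 2.0
    half_h = (y_max - y_min) * multiple / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Return the pixels of `frame` (a list of rows) covered by `box`.

    The box is rounded outward and clamped to the frame borders. This is an
    assumption: the source does not say how out-of-frame regions are handled.
    """
    height, width = len(frame), len(frame[0])
    x0 = max(0, math.floor(box[0]))
    y0 = max(0, math.floor(box[1]))
    x1 = min(width, math.ceil(box[2]))
    y1 = min(height, math.ceil(box[3]))
    return [row[x0:x1] for row in frame[y0:y1]]
```

Scaling about the center gives the same rectangle as extending every side about its midpoint and pushing the sides outward until their endpoints meet.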
- the terminal determines a target area of the current image frame according to the image characteristics of the target candidate image.
- the target area is an image area where the identified target is located.
- after cropping the target candidate image, the terminal extracts image features of the target candidate image, performs feature analysis on the image features, and determines the target area in the target candidate image through the feature analysis.
- the terminal inputs the target candidate image into the image feature extraction model, acquires the image features output by the image feature extraction model, inputs the acquired image features into the target positioning model, and determines the target area of the current image frame through the target positioning model.
- the terminal determines the motion prediction data of the next image frame relative to the current image frame according to the image features of the target candidate image by using a motion prediction model.
- the motion prediction data of the next image frame relative to the current image frame is the predicted movement data of the target in the next image frame relative to the target in the current image frame.
- the motion prediction data includes at least one of a motion direction, a motion speed, and a motion distance. It can be understood that the motion prediction data indicates the predicted movement of the target in the next image frame relative to the target in the current image frame, such as the direction in which it moves, how fast it moves, how far it moves, and so on.
- after acquiring the image features of the target candidate image, the terminal inputs the image features into the motion prediction model, which performs feature analysis on them, and obtains the motion prediction data output by the motion prediction model.
- the feature analysis may specifically be at least one of convolution processing, matrix calculation, and vector calculation of image features.
- the terminal determines a target candidate region of a next image frame according to the target region and the motion prediction data.
- the terminal moves the target area in the current image frame according to the motion prediction data, and obtains the target area after the movement.
- the terminal obtains the position information of the moved target area and determines the target candidate region in the next image frame according to the obtained position information.
- the terminal moves the target area in the current image frame according to the motion prediction data, enlarges the moved target area by a preset multiple, obtains the position information of the enlarged target area, and determines the target candidate region in the next image frame according to the obtained position information.
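The move-then-enlarge step can be sketched as follows. It is a sketch under stated assumptions: the motion prediction data is encoded here as a pixel displacement `(dx, dy)`, which folds direction and distance into one vector, and the function name `next_candidate_region` is hypothetical. The source only says the data includes at least one of direction, speed, and distance.

```python
from typing import Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def next_candidate_region(
    target_box: Box,
    shift: Tuple[float, float],
    multiple: float = 1.3,
) -> Box:
    """Move `target_box` by `shift`, then enlarge it about its new center."""
    x_min, y_min, x_max, y_max = target_box
    dx, dy = shift  # assumed encoding of the motion prediction data
    cx = (x_min + x_max) / 2.0 + dx
    cy = (y_min + y_max) / 2.0 + dy
    half_w = (x_max - x_min) * multiple / 2.0
    half_h = (y_max - y_min) * multiple / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

For example, a target area `(10, 10, 20, 20)` with a predicted shift of `(5, 0)` and a multiple of 1.3 yields a candidate region centered at `(20, 15)` with side length 13.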
- the target candidate image is cropped according to the target candidate region of the current image frame, the target area is determined in the target candidate image, and the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model.
- the target candidate region of the next image frame can then be determined by moving the target area of the current image frame according to the motion prediction data. This ensures that the target candidate region is determined accurately when switching from the current image frame to the next image frame.
- it also ensures that the target area can be determined within the target candidate region, thereby improving the accuracy of target tracking and reducing the target tracking loss rate.
- S306 specifically includes a step of determining a target area, and this step specifically includes the following content:
- the terminal determines a target keypoint position according to an image feature of the target candidate image through a target positioning model.
- a target key point is a key point used to determine the target.
- the target key point may be a point in the human face that marks the position of the facial features.
- the target key point may be a point marking the contour of the car.
- the terminal extracts the image features of the target candidate image, inputs the extracted image features into the target positioning model, analyzes the image features through the target positioning model, and obtains the target keypoint position output by the target positioning model.
- the target keypoint position is the position of the target keypoint in the target candidate image.
- S402 specifically includes the following: inputting the target candidate image into the image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target keypoint position of the current image frame.
- the terminal inputs the target candidate image into the image feature extraction model, analyzes the target candidate image through the image feature extraction model, obtains the image features of the target candidate image output by the image feature extraction model, and inputs the obtained image features into the target positioning model, which analyzes the image features and outputs the target keypoint position of the current image frame.
- the terminal determines the classification feature according to the image feature, and uses the classification feature as the input of the target positioning model to obtain the target key point position of the current image frame.
- the terminal determines a target area of the current image frame according to the position of the target keypoint.
- the terminal determines the target position in the current image frame according to the target keypoint position, and determines the target area according to the target position.
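One way to derive a rectangular target area from keypoint positions is sketched below. The source does not state how the area is computed from the keypoints, so the tight axis-aligned bounding rectangle used here is an assumption, as are the function name `target_area_from_keypoints` and the `crop_origin` parameter that maps coordinates in the target candidate image back to the current image frame.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical box layout: (x_min, y_min, x_max, y_max) in frame coordinates.
Box = Tuple[float, float, float, float]


def target_area_from_keypoints(
    keypoints: Sequence[Point],
    crop_origin: Point = (0.0, 0.0),
) -> Box:
    """Bounding rectangle of `keypoints`, shifted into frame coordinates.

    `keypoints` are positions inside the target candidate image;
    `crop_origin` is the top-left corner of that image in the full frame.
    """
    if not keypoints:
        raise ValueError("at least one keypoint is required")
    ox, oy = crop_origin
    xs = [x + ox for x, _ in keypoints]
    ys = [y + oy for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```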
- determining the target keypoint position from the image features of the target candidate image through the target positioning model improves the accuracy of the target keypoint position, and determining the target area of the current image frame from the target keypoint position further improves the accuracy of the target area in the current image frame.
- S308 further includes a step of obtaining motion prediction data, and this step specifically includes the following content:
- the terminal inputs an image feature into a classification feature extraction model.
- the terminal uses the image feature as an input of the classification feature extraction model and inputs the image feature to the classification feature extraction model.
- the classification feature extraction model is a model for determining classification features based on image features.
- the terminal obtains classification features output by the classification feature extraction model.
- the classification feature extraction model analyzes the image features to obtain the classification features, and outputs the classification features.
- the terminal obtains the classification features output by the classification feature extraction model.
- the terminal determines the confidence level of the target candidate image according to the classification feature through the target determination model.
- the target determination model is a machine learning model for determining the existence probability of a target in a target candidate image.
- the terminal inputs the classification feature extracted by the classification feature extraction model into the target determination model, which analyzes the classification feature and outputs the confidence level of the target candidate image.
- the terminal uses the classification feature as an input of a motion prediction model to obtain motion prediction data of the next image frame relative to the current image frame.
- the terminal compares the determined confidence level with a preset confidence threshold.
- the classification feature extracted by the classification feature extraction model is input to the motion prediction model, which analyzes the classification feature and outputs motion prediction data.
- the terminal acquires the motion prediction data output by the motion prediction model and uses it as the motion prediction data of the next image frame relative to the current image frame.
- using the classification feature as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame includes: determining, through the motion prediction model, the probability value corresponding to each preset prediction classification according to the classification feature; determining the preset prediction classification corresponding to the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction classification.
- the preset prediction classifications are classes defined according to the motion data of the target. Each preset prediction classification corresponds to unique motion prediction data.
- the terminal inputs the classification feature into the motion prediction model, and the motion prediction model determines the probability value corresponding to each preset prediction classification according to the classification feature, compares the probability values, determines the maximum probability value through comparison, selects the preset prediction classification corresponding to the maximum probability value, and obtains the motion prediction data corresponding to the selected preset prediction classification.
- the terminal uses the acquired motion prediction data as the motion prediction data of the next image frame relative to the current image frame.
- the terminal compares the determined confidence level with a preset confidence threshold, and when the determined confidence level is less than the preset confidence threshold, the target tracking is ended.
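The decision logic of this step can be sketched as follows: end tracking when the confidence level is below the threshold, otherwise pick the preset prediction classification with the maximum probability value and look up its motion prediction data. The function name `predict_motion`, the threshold value 0.5, and the `(dx, dy)` displacement encoding of the motion data are assumptions for the example; the probabilities stand in for the output of the motion prediction model.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed encoding of motion prediction data: pixel displacement (dx, dy).
Motion = Tuple[float, float]


def predict_motion(
    confidence: float,
    class_probabilities: Sequence[float],
    preset_motion: Dict[int, Motion],
    confidence_threshold: float = 0.5,  # hypothetical value
) -> Optional[Motion]:
    """Return the motion data of the most probable preset classification.

    Returns None when the confidence is below the threshold, which the
    caller should treat as "end target tracking".
    """
    if confidence < confidence_threshold:
        return None
    best_class = max(
        range(len(class_probabilities)),
        key=lambda j: class_probabilities[j],
    )
    return preset_motion[best_class]
```

Because each preset prediction classification maps to unique motion prediction data, the lookup table `preset_motion` is all that is needed once the class index is known.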
- the image feature is input into the classification feature extraction model, which extracts the classification feature; the classification feature is input into the target determination model to determine the confidence level of the target candidate image, and the confidence level is used to determine whether a target exists in the target candidate image.
- the classification feature is also used as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame, so that the target can be tracked accurately and the tracking efficiency is improved.
- the target tracking method further includes a step of training a motion prediction model, which specifically includes the following content:
- the terminal obtains model training data.
- the model training data is sample data used for training a machine learning model.
- the terminal obtains the storage path of the model training data, and obtains the model training data according to the storage path.
- the terminal obtains a storage path of model training data, generates a data acquisition request according to the storage path, and acquires model training data from a database according to the data acquisition request.
- the terminal reads the current training frame and the next training frame from the model training data.
- the model training data includes continuous image training frames.
- the terminal reads the current training frame and the next training frame from the image training frames according to the arrangement order of the image training frames.
- the terminal extracts image features in the current training frame.
- the model training data includes image features corresponding to each image training frame. After reading the current training frame, the terminal extracts image features corresponding to the current training frame from the model training data.
- the terminal performs model training according to the image features extracted from the current training frame, the position of the labeled target key point, and the confidence level of the label to obtain a target positioning model and a target determination model.
- the target positioning model is a model for locating target key points in an image frame.
- the target determination model is a model for determining whether a target exists in an image frame.
- the terminal extracts the target keypoint position and the confidence level corresponding to the current training frame from the model training data, uses the extracted target keypoint position as the labeled target keypoint position, and uses the extracted confidence level as the labeled confidence level.
- the terminal takes the extracted image features as the input of the target positioning model, and uses the labeled target keypoint position as the output of the target positioning model for training to obtain the target positioning model.
- the terminal uses the extracted image features as the input of the target determination model, and uses the labeled confidence level as the output of the target determination model for training to obtain the target determination model.
- the terminal determines classification features according to the extracted image features, uses the determined classification features as the input of the target determination model, and uses the labeled confidence level as the output of the target determination model for training to obtain the target determination model.
- the above training process may be performed by any computer device, including the terminal; when it is performed elsewhere, the trained model is sent to the terminal for use, which saves the processing resources of the terminal and ensures its normal operation.
- S608 includes steps of training each model, specifically: performing model training according to the current training frame and the image features in the current training frame to obtain an image feature extraction model; performing model training with the image features in the current training frame as input and the target keypoint positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and the classification features labeled in the current training frame as output to obtain a classification feature extraction model; and performing model training according to the classification features and the confidence level labeled in the current training frame to obtain a target determination model.
- the terminal uses the current training frame as the input of the image feature extraction model, and uses the image features in the current training frame as the output of the image feature extraction model for training to obtain the image feature extraction model.
- the terminal uses the image features in the current training frame as the input of the target positioning model, and uses the target keypoint position marked in the current training frame as the output of the target positioning model to perform model training to obtain the target positioning model.
- the terminal uses the image features in the current training frame as the input of the classification feature extraction model, and uses the classification features marked in the current training frame as the output of the classification feature extraction model to perform model training to obtain the classification feature extraction model.
- the terminal uses the classification features labeled in the current training frame as the input of the target determination model, and uses the confidence level labeled in the current training frame as the output of the target determination model to perform model training to obtain the target determination model.
- the terminal determines motion training data of a next training frame relative to a current training frame.
- the model training data includes motion training data between two adjacent frames. After reading the current training frame and the next training frame, the terminal extracts the motion training data of the next training frame relative to the current training frame from the model training data.
- the motion training data includes at least one of motion speed, motion direction, and motion distance.
- the terminal trains a motion prediction model according to the extracted image features and the determined motion training data.
- the terminal uses the extracted image features as the input of the motion prediction model, and uses the motion training data as the output of the motion prediction model for model training, and obtains the motion prediction model through training.
- the terminal determines classification features based on the extracted image features, determines the preset prediction classification based on the determined motion training data, and performs model training with the determined classification features as the input of the motion prediction model and the determined preset prediction classification as the output, obtaining the motion prediction model through training.
- when training the motion prediction model, the terminal uses L as the loss function, where L is as follows:
- T represents the number of preset prediction classifications
- s_j represents the probability value of belonging to the j-th preset prediction classification.
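The formula for L is not reproduced in this text. A common choice that is consistent with the definitions of T and s_j is the cross-entropy loss over the preset prediction classifications, shown here as an assumption. The symbol y_j is introduced only for this illustration: it is 1 when the j-th preset prediction classification is the labeled classification of the training sample and 0 otherwise.

```latex
L = -\sum_{j=1}^{T} y_j \log s_j
```

Under this assumed form, only the term of the labeled classification contributes, so minimizing L pushes its probability value s_j toward 1.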
- the current training frame and the next training frame are read from the model training data, and the image features in the current training frame are extracted.
- model training is performed separately according to the image features, the labeled target keypoint positions, the labeled confidence level, and the motion training data of the next training frame relative to the current training frame, yielding the motion prediction model, the target positioning model, and the target determination model. Using these models together improves the accuracy of the motion prediction data, so that the target is tracked accurately.
- S610 further includes a step of determining motion training data, and this step specifically includes the following content:
- the terminal obtains a target area marked in a next training frame.
- the model training data includes the position of the target region corresponding to each frame in the image frame.
- the terminal queries the model training data for the position of the labeled target region corresponding to the next training frame, and determines the labeled target region in the next training frame according to the position of the target region.
- the terminal determines a target prediction region of a next training frame corresponding to each preset prediction classification according to the current training frame.
- the terminal determines the target area in the current training frame, moves the determined target area according to the preset motion training data corresponding to each preset prediction classification, obtains the moved target area corresponding to each preset prediction classification, and uses each moved target area as the target prediction area of the next training frame.
- the terminal obtains the prediction accuracy corresponding to each preset prediction classification according to the target area and the target prediction area.
- the terminal determines the intersection area and the union area of the target prediction area and the target area in the next training frame, and divides the intersection area by the union area to obtain the prediction accuracy corresponding to the preset prediction classification, thereby obtaining the prediction accuracy corresponding to each preset prediction classification.
- S706 specifically includes the following: determining an intersection region and a union region between the target prediction region and the target region in the next training frame corresponding to each preset prediction classification; and separately calculating the area ratio between the intersection region corresponding to each preset prediction classification and the corresponding union region, to obtain the prediction accuracy corresponding to that preset prediction classification.
- for the target prediction region corresponding to each preset prediction classification and the target region in the next training frame, the terminal determines the intersection region and the union region between the two, obtaining the intersection region and the union region corresponding to the current preset prediction classification.
- for the intersection region and the union region corresponding to each preset prediction classification, the terminal calculates the area of each region and divides the area of the intersection region by the area of the union region to obtain the area ratio of the intersection region to the union region.
- the terminal uses the area ratio corresponding to each preset prediction classification as the prediction accuracy.
- the terminal determines the preset motion training data corresponding to the preset prediction classification with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
- the terminal compares the prediction accuracies, determines the highest prediction accuracy through the comparison, determines the preset prediction classification corresponding to the highest prediction accuracy, and obtains the preset motion training data corresponding to the determined preset prediction classification.
- the motion training data includes a motion speed and a motion direction.
- the prediction accuracy is expressed as the area ratio of the intersection region to the union region between the target region and the target prediction region of the next image frame.
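The labeling step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes regions are axis-aligned boxes `(x_min, y_min, x_max, y_max)` and that each preset prediction classification maps to a displacement `(dx, dy)`; the names `iou`, `shift`, and `label_motion_class` are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: a region is an axis-aligned box (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Area of the intersection region divided by the area of the union region."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shift(box: Box, dx: float, dy: float) -> Box:
    """Move a region by a displacement without changing its size."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def label_motion_class(current_region: Box, next_region: Box,
                       preset_motions: Dict[int, Tuple[float, float]]) -> int:
    """Return the preset prediction classification with the highest accuracy.

    current_region: target region in the current training frame.
    next_region: labeled target region in the next training frame.
    preset_motions: hypothetical mapping from classification to (dx, dy).
    """
    accuracy = {
        cls: iou(shift(current_region, dx, dy), next_region)
        for cls, (dx, dy) in preset_motions.items()
    }
    return max(accuracy, key=accuracy.get)
```

The classification returned by `label_motion_class` would then serve as the motion training label of the next training frame relative to the current training frame.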
- FIG. 8 is a schematic diagram of connection of models in an object tracking method according to an embodiment.
- the image feature extraction model is connected to a target positioning model and a classification feature extraction model, respectively, and the classification feature extraction model is connected to a target determination model and a motion prediction model, respectively.
- the image feature extraction model receives the input target candidate image of the current image frame, extracts the image features of the target candidate image, and inputs the image features into the target positioning model and the classification feature extraction model, respectively.
- the target positioning model outputs target keypoint positions based on image features.
- the classification feature extraction model outputs classification features according to the image features, and the classification features are input into the target determination model and the motion prediction model, respectively.
- the target determination model outputs a confidence level based on the classification features.
- the motion prediction model outputs motion prediction data of the next image frame relative to the current image frame according to the classification feature.
- FIG. 9 is a schematic structural diagram of a multi-task model in an embodiment.
- the multi-task model is composed of an image feature extraction branch, a target positioning branch, a classification feature extraction branch, a target determination branch, and a motion prediction branch.
- the image feature extraction branch is connected to the target positioning branch and the classification feature extraction branch, respectively
- the classification feature extraction branch is connected to the target determination branch and the motion prediction branch, respectively.
- the image feature extraction branch is composed of an image feature extraction model
- the target positioning branch is composed of a target positioning model
- the classification feature extraction branch is composed of a classification feature extraction model
- the target determination branch is composed of a target determination model
- the motion prediction branch is composed of a motion prediction model.
- the target candidate image is input into the image feature extraction branch; the image feature extraction branch receives the input target candidate image of the current image frame, extracts the image features of the target candidate image, and inputs the image features into the target positioning branch and the classification feature extraction branch, respectively.
- the target positioning branch outputs target keypoint positions according to image features.
- the classification feature extraction branch outputs classification features according to the image features, and the classification features are input into the target determination branch and the motion prediction branch, respectively.
- the target determination branch outputs a confidence level based on the classification features.
- the motion prediction branch generates motion prediction data of the next image frame relative to the current image frame according to the classification features.
- the multi-task model outputs the motion prediction data generated by the motion prediction branch.
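The branch wiring of FIG. 9 can be sketched as plain function composition. This is only an illustrative sketch: each branch is represented by an arbitrary callable, the class name `MultiTaskModel` and its field names are hypothetical, and the sketch returns the keypoints and confidence alongside the motion prediction for convenience.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class MultiTaskModel:
    # Each field stands in for one branch; the real branches would be
    # trained networks, here they are arbitrary callables (assumption).
    extract_image_features: Callable[[Any], Any]
    locate_keypoints: Callable[[Any], Any]
    extract_class_features: Callable[[Any], Any]
    judge_target: Callable[[Any], float]
    predict_motion: Callable[[Any], Any]

    def __call__(self, candidate_image: Any) -> Tuple[Any, float, Any]:
        # The image feature extraction branch feeds two branches.
        image_features = self.extract_image_features(candidate_image)
        keypoints = self.locate_keypoints(image_features)
        class_features = self.extract_class_features(image_features)
        # The classification feature extraction branch feeds two branches.
        confidence = self.judge_target(class_features)
        motion = self.predict_motion(class_features)
        return keypoints, confidence, motion
```

Sharing the image features and the classification features means each candidate image is passed through the feature extraction branches only once per frame.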
- FIG. 10 is a schematic flowchart of a target tracking method according to an embodiment.
- the target is a human face
- the target candidate region is a human face candidate region
- the target candidate image is a human face candidate image
- the target key point position is a human face key point position.
- the terminal takes the first image frame as the current image frame, performs face detection on the current image frame, determines the face candidate area through the face detection, intercepts the face candidate image according to the face candidate area, and inputs the face candidate image into the image feature extraction model.
- the image feature extraction model extracts the image features and inputs the image features into the target positioning model and the classification feature extraction model;
- the target positioning model outputs the face keypoint positions according to the image features;
- the classification feature extraction model outputs the classification features according to the image features and inputs the classification features into the target determination model;
- the target determination model outputs the confidence level according to the classification features.
- when the confidence level is less than the preset confidence threshold, the target tracking ends; when the confidence level is greater than or equal to the preset confidence threshold, the classification feature extraction model inputs the classification features into the motion prediction model;
- the motion prediction model outputs the motion prediction data of the next image frame relative to the current image frame according to the classification features;
- the terminal determines the face candidate area of the next image frame according to the face keypoint positions of the current image frame and the motion prediction data of the next image frame relative to the current image frame, takes the next image frame as the current image frame, and returns to the step of intercepting the face candidate image according to the face candidate area, until the target tracking ends.
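The loop of FIG. 10 can be sketched as below. This is a sketch under stated assumptions: `detect`, `crop`, `model`, and `next_region` are hypothetical callables supplied by the caller, `model` returns `(keypoints, confidence, motion)`, and the default threshold of 0.5 is illustrative only.

```python
from typing import Any, Callable, Iterable, List


def track(frames: Iterable[Any],
          detect: Callable[[Any], Any],
          crop: Callable[[Any, Any], Any],
          model: Callable[[Any], Any],
          next_region: Callable[[Any, Any], Any],
          threshold: float = 0.5) -> List[Any]:
    """Run the tracking loop and return the keypoints found per frame."""
    results = []
    region = None
    for frame in frames:
        if region is None:
            # First frame: the candidate region comes from detection.
            region = detect(frame)
        candidate_image = crop(frame, region)
        keypoints, confidence, motion = model(candidate_image)
        if confidence < threshold:
            # Confidence below the preset threshold: tracking ends.
            break
        results.append(keypoints)
        # Candidate region of the next frame from keypoints and motion data.
        region = next_region(keypoints, motion)
    return results
```

Detection runs only on the first frame; every later candidate region is derived from the previous frame's keypoints and predicted motion.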
- FIG. 11 is a schematic diagram of labeling a preset prediction classification in an embodiment.
- the gray area in FIG. 11 identifies the target candidate area.
- when the target candidate area in the next image frame has a motion speed of 0 relative to the target candidate area in the current image frame, the motion is labeled as preset prediction classification 0.
- when the target candidate area in the next image frame has a motion speed of 1 relative to the target candidate area in the current image frame, the motion is labeled as one of preset prediction classifications 1-8 according to the 8 motion directions.
- FIG. 12 is a schematic diagram of determining a preset prediction classification of an image frame in an embodiment.
- the moving direction corresponding to the image frame in FIG. 12 is to the right and the moving speed is 1.
- the preset prediction classification of the image frame in FIG. 12 can be determined to be 3.
- FIG. 13 is a schematic diagram of determining a preset prediction classification of an image frame in another embodiment.
- the movement speed corresponding to the image frame in FIG. 13 is 0, and based on the preset prediction classification marked in FIG. 11, it can be determined that the preset prediction classification of the image frame in FIG. 13 is 0.
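One way to encode the nine preset prediction classifications is a lookup table of unit displacements. Only two entries are taken from the text: classification 0 (speed 0) and classification 3 (rightward, speed 1). The remaining assignments, the clockwise ordering starting from "up", the image coordinate convention (y grows downward), and the `step` scale are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Unit direction (dx, dy) per preset prediction classification.
# Assumption: image coordinates, so negative dy means "up".
PRESET_DIRECTIONS: Dict[int, Tuple[int, int]] = {
    0: (0, 0),    # speed 0 (from the text)
    1: (0, -1),   # up (assumed)
    2: (1, -1),   # up-right (assumed)
    3: (1, 0),    # right (from the FIG. 12 example)
    4: (1, 1),    # down-right (assumed)
    5: (0, 1),    # down (assumed)
    6: (-1, 1),   # down-left (assumed)
    7: (-1, 0),   # left (assumed)
    8: (-1, -1),  # up-left (assumed)
}


def displacement(classification: int, step: float = 1.0) -> Tuple[float, float]:
    """Displacement for a classification; `step` is a hypothetical scale
    that converts a motion speed of 1 into pixels."""
    dx, dy = PRESET_DIRECTIONS[classification]
    return (dx * step, dy * step)
```

For example, `displacement(3, 5.0)` describes a move of five pixels to the right under these assumptions.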
- a target tracking device 1400 is provided.
- the device specifically includes the following: a candidate region determination module 1402, a candidate image interception module 1404, a target region determination module 1406, a prediction data determination module 1408, and a prediction region determination module 1410.
- the candidate region determination module 1402 is configured to determine a target candidate region of the current image frame.
- the candidate image interception module 1404 is configured to intercept a target candidate image matching the target candidate region in the current image frame.
- the target area determination module 1406 is configured to determine a target area of the current image frame according to the image characteristics of the target candidate image.
- the prediction data determining module 1408 is configured to determine the motion prediction data of the next image frame relative to the current image frame by using a motion prediction model and according to the image characteristics of the target candidate image.
- a prediction region determining module 1410 is configured to determine a target candidate region of a next image frame according to a target region and motion prediction data.
- the target candidate image is intercepted according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target candidate region of the next image frame can then be determined according to the target region and the motion prediction data.
- this ensures that the target candidate region can be accurately determined when switching from the current image frame to the next image frame, which improves the accuracy of determining the target candidate region.
- it also ensures that the target region can be determined within the target candidate region, thereby improving the accuracy of target tracking and reducing the target tracking loss rate.
- the candidate image interception module 1404 is further configured to enlarge the target candidate region in the current image frame by a preset multiple; determine a target candidate image that matches the enlarged target candidate region in the current image frame; and intercept the determined target candidate image from the current image frame.
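Enlarging the candidate region before cropping can be sketched as follows. The text does not say how the enlargement is anchored, so this sketch assumes it is applied about the region's center and clamped to the frame bounds; the frame is modeled as a nested list of pixel rows, and `expand_region` and `crop` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_region(box: Box, multiple: float,
                  frame_w: int, frame_h: int) -> Box:
    """Scale a region about its center by `multiple`, clamped to the frame."""
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * multiple / 2.0
    half_h = (box[3] - box[1]) * multiple / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(frame_w), cx + half_w), min(float(frame_h), cy + half_h))


def crop(frame: Sequence[Sequence[int]], box: Box) -> List[Sequence[int]]:
    """Cut the pixels inside `box` out of a frame stored as rows of pixels."""
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    return [row[x0:x1] for row in frame[y0:y1]]
```

Enlarging the region gives the target room to move between frames while still landing inside the cropped candidate image.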
- the target area determination module 1406 is further configured to determine a target keypoint position based on an image feature of the target candidate image by using a target positioning model; and determine a target area of the current image frame according to the target keypoint position.
- the target area determination module 1406 is further configured to input the target candidate image into the image feature extraction model; obtain the image features output by the image feature extraction model; and use the image features as the input of the target positioning model to obtain the target keypoint position of the current image frame.
- the target keypoint position is determined according to the image features of the target candidate image through the target positioning model, which improves the accuracy of determining the target keypoint position; the target area of the current image frame is then determined according to the target keypoint position, which further improves the accuracy of determining the target area in the current image frame.
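The text does not specify how the target area is derived from the keypoint positions. One plausible reading, shown here purely as an assumption, is the smallest axis-aligned box that encloses all keypoints; `region_from_keypoints` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def region_from_keypoints(keypoints: Sequence[Point]) -> Box:
    """Smallest axis-aligned box containing every keypoint (assumed rule)."""
    if not keypoints:
        raise ValueError("at least one keypoint is required")
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```

An implementation might also pad this box by a margin, which the enlargement by a preset multiple would then absorb.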
- the prediction data determination module 1408 is further configured to input the image features into the classification feature extraction model; obtain the classification features output by the classification feature extraction model; and use the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
- the prediction data determination module 1408 is further configured to determine the confidence level of the target candidate image according to the classification features by using the target determination model; when the determined confidence level is greater than or equal to a preset confidence threshold, perform the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence level is less than the preset confidence threshold, end the target tracking.
- the prediction data determination module 1408 is further configured to determine a probability value corresponding to each preset prediction classification according to the classification features by using the motion prediction model; determine the preset prediction classification corresponding to the maximum probability value; and obtain the motion prediction data corresponding to the determined preset prediction classification.
- the image feature is input into a classification feature extraction model
- the classification feature is extracted through the classification feature extraction model
- the classification feature is input into a target determination model to determine the confidence level of the target candidate image
- the confidence level is used to determine whether a target exists in the target candidate image.
- the classification feature is used as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame, so that the target can be tracked accurately and the tracking efficiency is improved.
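The confidence gate and the maximum-probability selection described above can be combined in a few lines. This is a sketch only: `select_motion` is a hypothetical helper, the probabilities are given as a list indexed by preset prediction classification, and the default threshold of 0.5 is illustrative.

```python
from typing import Dict, Optional, Sequence, Tuple


def select_motion(confidence: float,
                  class_probs: Sequence[float],
                  preset_motions: Dict[int, Tuple[float, float]],
                  threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return the motion prediction data, or None when tracking should end."""
    if confidence < threshold:
        # No target in the candidate image: tracking ends.
        return None
    # Preset prediction classification with the maximum probability value.
    best_class = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return preset_motions[best_class]
```

Returning `None` lets the caller distinguish "tracking ended" from a valid prediction of zero motion.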
- the target tracking device 1400 further includes a training data acquisition module 1412, a training frame reading module 1414, an image feature extraction module 1416, a motion data determination module 1418, and a model training module 1420.
- a training data acquisition module 1412 is configured to acquire model training data.
- the training frame reading module 1414 is configured to read the current training frame and the next training frame from the model training data.
- An image feature extraction module 1416 is used to extract image features in the current training frame.
- a motion data determination module 1418 is configured to determine motion training data of a next training frame relative to a current training frame.
- a model training module 1420 is configured to train a motion prediction model according to the extracted image features and determined motion training data.
- the current training frame and the next training frame are read from the model training data, and the image features in the current training frame are extracted.
- Model training is performed separately according to the extracted image features and the motion training data of the next training frame relative to the current training frame, and the motion prediction model, the target positioning model, and the target determination model are obtained through the model training. This improves the accuracy of the motion prediction data, so that the target can be tracked accurately.
- the model training module 1420 is further configured to perform model training according to the image features extracted from the current training frame, the labeled target keypoint positions, and the labeled confidence level to obtain a target positioning model and a target determination model.
- the model training module 1420 is further configured to perform model training according to the current training frame and the image features in the current training frame to obtain an image feature extraction model; perform model training with the image features in the current training frame as input and the target keypoint positions labeled in the current training frame as output to obtain the target positioning model; perform model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and perform model training according to the classification features labeled for the current training frame and the confidence level labeled for the current training frame to obtain the target determination model.
- the motion data determination module 1418 is further configured to obtain a target region labeled in the next training frame; determine, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction classification; obtain the prediction accuracy corresponding to each preset prediction classification according to the target region and the target prediction region; and determine the preset motion training data corresponding to the preset prediction classification with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
- the motion data determination module 1418 is further configured to determine an intersection region and a union region between the target prediction region and the target region in the next training frame corresponding to each preset prediction classification; and separately calculate the area ratio between the intersection region corresponding to each preset prediction classification and the corresponding union region, to obtain the prediction accuracy corresponding to that preset prediction classification.
- the prediction accuracy is expressed as the area ratio of the intersection region to the union region between the target region and the target prediction region of the next image frame.
- FIG. 16 is a schematic diagram of an internal structure of a computer device in an embodiment.
- the computer device may be the terminal 200 shown in FIG. 2.
- the computer device includes a processor, a memory, a camera, and a network interface connected through a system bus.
- the memory includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium of the computer device can store an operating system and a computer program. When the computer program is executed, it can cause the processor to execute a target tracking method.
- the processor of the computer equipment is used to provide computing and control capabilities to support the operation of the entire computer equipment.
- a computer program may be stored in the internal memory, and when the computer program is executed by the processor, the processor may execute a target tracking method.
- the network interface of the computer device is used for network communication.
- the camera is used to capture images.
- FIG. 16 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device or robot to which the solution of the present application is applied.
- a specific computer device may include more or fewer components than shown in the figure, may combine some components, or may have a different component arrangement.
- the target tracking device 1400 provided in this application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in FIG. 16.
- the memory of the computer device may store the program modules constituting the target tracking device, for example, the candidate region determination module 1402, the candidate image interception module 1404, the target region determination module 1406, the prediction data determination module 1408, and the prediction region determination module 1410 shown in the corresponding figure.
- the computer program constituted by each program module causes the processor to execute the steps in the target tracking method of each embodiment of the application described in this specification.
- the computer device shown in FIG. 16 may determine the target candidate area of the current image frame through the candidate area determination module 1402 in the target tracking device 1400 shown in FIG. 14.
- the computer device may intercept the target candidate image matching the target candidate region in the current image frame through the candidate image interception module 1404.
- the computer device may determine the target area of the current image frame according to the image characteristics of the target candidate image through the target area determination module 1406.
- the computer device may determine the motion prediction data of the next image frame relative to the current image frame by using the motion prediction model and the image characteristics of the target candidate image through the prediction data determination module 1408.
- the computer device may determine a target candidate region of the next image frame according to the target region and the motion prediction data through the prediction region determination module 1410.
- a computer device includes a memory and a processor.
- a computer program is stored in the memory.
- when the computer program is executed by the processor, the processor is caused to perform the following steps: determining a target candidate region of the current image frame; intercepting, from the current image frame, a target candidate image matching the target candidate region; determining the target region of the current image frame according to the image features of the target candidate image; determining the motion prediction data of the next image frame relative to the current image frame through the motion prediction model according to the image features of the target candidate image; and determining the target candidate region of the next image frame according to the target region and the motion prediction data.
- the processor further performs the following method steps:
- the target candidate region is enlarged in the current image frame according to a preset multiple; the target candidate image matching the enlarged target candidate region is determined in the current image frame; and the determined target candidate image is intercepted from the current image frame.
- the processor further performs the following method steps:
- the target keypoint position is determined through the target positioning model according to the image features of the target candidate image; the target area of the current image frame is determined according to the target keypoint position.
- the processor further performs the following method steps:
- the target candidate image is input into the image feature extraction model; the image features output by the image feature extraction model are obtained; and the image features are used as the input of the target positioning model to obtain the target keypoint position of the current image frame.
- the processor further performs the following method steps:
- the image feature is input into the classification feature extraction model; the classification feature output by the classification feature extraction model is obtained; the classification feature is used as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
- after obtaining the classification features output by the classification feature extraction model, when the computer program is executed by the processor, the processor is caused to further perform the following steps: determining the confidence level of the target candidate image through the target determination model according to the classification features; when the determined confidence level is greater than or equal to a preset confidence threshold, using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence level is less than the preset confidence threshold, ending the target tracking.
- the processor further performs the following method steps:
- the probability values corresponding to each preset prediction classification are determined according to the classification characteristics; the preset prediction classification corresponding to the maximum probability value is determined; and the motion prediction data corresponding to the determined preset prediction classification is obtained.
- when the computer program is executed by the processor, the processor is caused to further perform the following steps: obtaining model training data; reading the current training frame and the next training frame from the model training data; extracting image features in the current training frame; determining the motion training data of the next training frame relative to the current training frame; and training a motion prediction model according to the extracted image features and the determined motion training data.
- after extracting the image features in the current training frame, when the computer program is executed by the processor, the processor is caused to further perform the following steps: performing model training according to the image features extracted from the current training frame, the labeled target keypoint positions, and the labeled confidence level to obtain the target positioning model and the target determination model.
- the processor further performs the following method steps:
- the target positioning model is obtained through model training; model training is performed with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and model training is performed according to the classification features labeled for the current training frame and the confidence level labeled for the current training frame to obtain the target determination model.
- the processor further performs the following method steps:
- the target area labeled in the next training frame is obtained; the target prediction area of the next training frame corresponding to each preset prediction classification is determined according to the current training frame; the prediction accuracy corresponding to each preset prediction classification is obtained according to the target area and the target prediction area; and the preset motion training data corresponding to the preset prediction classification with the highest prediction accuracy is determined as the motion training data of the next training frame relative to the current training frame.
- the processor further performs the following method steps:
- the target candidate image is intercepted according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target candidate region of the next image frame can then be determined according to the target region and the motion prediction data.
- this ensures that the target candidate region can be accurately determined when switching from the current image frame to the next image frame, which improves the accuracy of determining the target candidate region.
- it also ensures that the target region can be determined within the target candidate region, thereby improving the accuracy of target tracking and reducing the target tracking loss rate.
- a storage medium stores a computer program; when the computer program is executed by a processor, the processor is caused to perform the following steps: determining a target candidate region of a current image frame; intercepting, from the current image frame, a target candidate image matching the target candidate region; determining the target area of the current image frame according to the image features of the target candidate image; determining the motion prediction data of the next image frame relative to the current image frame through the motion prediction model according to the image features of the target candidate image; and determining the target candidate region of the next image frame according to the target area and the motion prediction data.
- intercepting the target candidate image matching the target candidate region in the current image frame includes: enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and intercepting the determined target candidate image from the current image frame.
- determining the target area of the current image frame according to the image features of the target candidate image includes: determining a target keypoint position through the target positioning model according to the image features of the target candidate image; and determining the target area of the current image frame according to the target keypoint position.
- determining the target keypoint position through the target positioning model according to the image features of the target candidate image includes: inputting the target candidate image into the image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target keypoint position of the current image frame.
- determining the motion prediction data of the next image frame relative to the current image frame through the motion prediction model according to the image features of the target candidate image includes: inputting the image features into the classification feature extraction model; obtaining the classification features output by the classification feature extraction model; and using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
- after obtaining the classification features output by the classification feature extraction model, when the computer program is executed by the processor, the processor is caused to further perform the following steps: determining the confidence level of the target candidate image through the target determination model according to the classification features; when the determined confidence level is greater than or equal to a preset confidence threshold, using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence level is less than the preset confidence threshold, ending the target tracking.
- using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame includes: determining, through the motion prediction model, the probability value corresponding to each preset prediction classification according to the classification features; determining the preset prediction classification corresponding to the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction classification.
- when the computer program is executed by the processor, the processor is caused to further perform the following steps: obtaining model training data; reading the current training frame and the next training frame from the model training data; extracting image features in the current training frame; determining the motion training data of the next training frame relative to the current training frame; and training a motion prediction model according to the extracted image features and the determined motion training data.
- after extracting the image features in the current training frame, when the computer program is executed by the processor, the processor is caused to further perform the following steps: performing model training according to the image features extracted from the current training frame, the labeled target keypoint positions, and the labeled confidence level to obtain the target positioning model and the target determination model.
- performing model training according to the image features extracted from the current training frame, the labeled target keypoint positions, and the labeled confidence level to obtain the target positioning model and the target determination model includes: performing model training according to the current training frame and the image features in the current training frame to obtain an image feature extraction model; performing model training with the image features in the current training frame as input and the target keypoint positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain a classification feature extraction model; and performing model training according to the classification features labeled for the current training frame and the confidence level labeled for the current training frame to obtain the target determination model.
- Determining the motion training data of the next training frame relative to the current training frame includes: obtaining the target region labeled in the next training frame; determining, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction class; obtaining, according to the target region and the target prediction regions, the prediction accuracy corresponding to each preset prediction class; and taking the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
- Obtaining the prediction accuracy corresponding to each preset prediction class according to the target region and the target prediction region includes: determining, for each preset prediction class, the intersection region and the union region between the corresponding target prediction region and the target region in the next training frame; and calculating, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that preset prediction class.
- The target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, and the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model; the target candidate region of the next image frame can then be determined. This allows the target candidate region to be determined accurately when switching from the current image frame to the next image frame, and allows the target region to be found within the target candidate region, which improves target tracking accuracy and reduces the target tracking loss rate.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM) or external cache memory.
- RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
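This application relates to a target tracking method, a computer device, and a storage medium. The method includes: determining a target candidate region of a current image frame; cropping, from the current image frame, a target candidate image matching the target candidate region; determining a target region of the current image frame according to image features of the target candidate image; determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and determining a target candidate region of the next image frame according to the target region and the motion prediction data. The solution can ensure that the target region is determined within the target candidate region, which improves target tracking accuracy and reduces the target tracking loss rate.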
Description
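This application claims priority to Chinese Patent Application No. 2018108670368, filed on August 1, 2018 and entitled "Target tracking method, apparatus, computer device, and storage medium", which is incorporated herein by reference in its entirety.
This application relates to the field of computer technology, and in particular to a target tracking method, a computer device, and a storage medium.
With the rapid development of computing technology, image processing technology has also advanced quickly. In image processing, and especially in video image processing, targets in video images need to be tracked.
However, conventional target tracking in video images can generally track only slow-moving targets. If the target moves quickly, tracking easily fails or the target is lost, so the target is hard to follow and the target tracking loss rate is high.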
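Summary
In view of this, this application provides a target tracking method, a computer device, and a storage medium that can address the high target tracking loss rate of conventional methods.
A target tracking method, the method including:
determining a target candidate region of a current image frame;
cropping, from the current image frame, a target candidate image matching the target candidate region;
determining a target region of the current image frame according to image features of the target candidate image;
determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and
determining a target candidate region of the next image frame according to the target region and the motion prediction data.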
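A computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
determining a target candidate region of a current image frame;
cropping, from the current image frame, a target candidate image matching the target candidate region;
determining a target region of the current image frame according to image features of the target candidate image;
determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and
determining a target candidate region of the next image frame according to the target region and the motion prediction data.
A storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
determining a target candidate region of a current image frame;
cropping, from the current image frame, a target candidate image matching the target candidate region;
determining a target region of the current image frame according to image features of the target candidate image;
determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and
determining a target candidate region of the next image frame according to the target region and the motion prediction data.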
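In the foregoing target tracking method, computer device, and storage medium, the target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target region of the current image frame is moved according to the motion prediction data to obtain the target candidate region of the next image frame. Because the motion prediction data indicates roughly where the target will move, the target candidate region can be determined accurately when switching from the current image frame to the next image frame. After the switch, the target region can then be found within the target candidate region, which improves target tracking accuracy and reduces the target tracking loss rate.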
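FIG. 1 is a diagram of an application scenario of the target tracking method in one embodiment;
FIG. 2 is a diagram of an application scenario of the target tracking method in another embodiment;
FIG. 3 is a schematic flowchart of the target tracking method in one embodiment;
FIG. 4 is a schematic flowchart of the step of determining the target region in one embodiment;
FIG. 5 is a schematic flowchart of the step of obtaining motion prediction data in one embodiment;
FIG. 6 is a schematic flowchart of the step of training the motion prediction model in one embodiment;
FIG. 7 is a schematic flowchart of the step of determining motion training data in one embodiment;
FIG. 8 is a schematic diagram of how the models are connected in the target tracking method in one embodiment;
FIG. 9 is a schematic structural diagram of a multi-task model in one embodiment;
FIG. 10 is a schematic flowchart of the target tracking method in one embodiment;
FIG. 11 is a schematic diagram of labeling preset prediction classes in one embodiment;
FIG. 12 is a schematic diagram of determining the preset prediction class of an image frame in one embodiment;
FIG. 13 is a schematic diagram of determining the preset prediction class of an image frame in another embodiment;
FIG. 14 is a block diagram of a target tracking apparatus in one embodiment;
FIG. 15 is a block diagram of a target tracking apparatus in another embodiment;
FIG. 16 is a schematic diagram of the internal structure of a computer device in one embodiment.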
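To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it.
FIG. 1 is a diagram of an application scenario of the target tracking method in one embodiment. Referring to FIG. 1, the scenario includes a terminal 110 and at least one camera 120, the camera being used to capture images. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, and a notebook computer.
FIG. 2 is a diagram of an application scenario of the target tracking method in another embodiment. Referring to FIG. 2, the scenario includes a terminal 200, which is a mobile terminal; the mobile terminal may be at least one of a mobile phone, a tablet computer, and a notebook computer. A camera 210 is installed in the terminal 200 and is used to capture images. The terminal 200 shows the image 220 captured by the camera 210 on its display screen.
As shown in FIG. 3, in one embodiment, a target tracking method is provided. The method may be applied to the terminal 110 in FIG. 1 or the terminal 200 in FIG. 2. This embodiment is described mainly by taking the method as applied to the terminal in FIG. 1 or FIG. 2 as an example. Referring to FIG. 3, the target tracking method includes the following steps: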
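S302: The terminal determines the target candidate region of the current image frame.
The current image frame is the image frame the terminal is currently processing. An image frame is the smallest unit image of the video frame sequence that makes up a video. The target candidate region is the candidate region within which the target region is determined; the target candidate region contains the target region. The target may be moving or stationary. For example, the target may be a moving face, a moving car, or a moving aircraft.
The target region is the one or more image regions where the target is located, and may be represented by a rectangular box.
Specifically, the camera may capture the current image frame within its current field of view in real time and send it to the terminal. The terminal receives the current image frame returned by the camera, performs recognition on it to identify the target prediction range, and determines the target candidate region of the current image frame according to the identified target position. The target prediction range is the image range in which the target may be present.
In one embodiment, the terminal obtains, through the camera, the current image frame within the camera's current field of view, calls a target recognition program to recognize the target in the current image frame, obtains the target position from the recognition, and determines the target candidate region according to the target position.
In one embodiment, the terminal obtains the current image frame and obtains the target candidate region that was determined from the target region and motion prediction data of the previous image frame. The motion prediction data may include at least one of motion speed, motion direction, and motion distance.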
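S304: The terminal crops, from the current image frame, the target candidate image matching the target candidate region.
The target candidate image is the partial image cropped from the current image frame according to the target candidate region.
Specifically, after identifying the target candidate region in the current image frame, the terminal crops the image inside the target candidate region; the cropped image is the target candidate image matching the target candidate region.
In one embodiment, after identifying the target candidate region in the current image frame, the terminal enlarges the target candidate region by a preset multiple and crops the target candidate image from the current image frame according to the enlarged target candidate region.
In one embodiment, the terminal enlarges the side lengths of the target candidate region by a preset multiple and, according to the enlarged side lengths, crops from the current image frame the target candidate image matching the target candidate region.
In one embodiment, S304 includes: enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and cropping the determined target candidate image from the current image frame.
Specifically, the terminal enlarges the target candidate region by the preset multiple and locates the enlarged target candidate region in the current image frame. The terminal then crops the target candidate image from the current image frame according to that region, so that the size of the cropped image matches the size of the determined target candidate region. The preset multiple may be, for example, 1.3.
In one embodiment, the terminal extends each side of the target candidate region about its midpoint, toward both ends, by the preset multiple. The terminal then translates each extended side perpendicular to itself, away from the target candidate region, until the endpoints of adjacent sides coincide; the closed region formed by the sides is the enlarged target candidate region. The preset multiple may be, for example, 1.3.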
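The enlargement and cropping in S304 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes an axis-aligned box given as `(x1, y1, x2, y2)`, a frame stored as a list of pixel rows, and scaling about the box centre. The names `expand_box` and `crop_candidate` are hypothetical; only the multiple 1.3 comes from the text.

```python
def expand_box(box, scale=1.3):
    """Scale a box (x1, y1, x2, y2) about its centre; each side length grows by `scale`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def crop_candidate(frame, box, scale=1.3):
    """Crop the enlarged box from `frame` (a list of rows), clipped to the frame bounds."""
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = expand_box(box, scale)
    left, top = max(0, int(round(x1))), max(0, int(round(y1)))
    right, bottom = min(width, int(round(x2))), min(height, int(round(y2)))
    return [row[left:right] for row in frame[top:bottom]]
```

Clipping to the frame bounds is an added assumption for boxes near the image edge; the text does not say how that case is handled.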
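S306: The terminal determines the target region of the current image frame according to the image features of the target candidate image.
The target region is the image region where the recognized target is located.
Specifically, after cropping the target candidate image, the terminal extracts its image features, performs feature analysis on them, and through that analysis determines the target region in the target candidate image.
In one embodiment, the terminal inputs the target candidate image into the image feature extraction model, obtains the image features output by the image feature extraction model, inputs the obtained image features into the target positioning model, and determines the target region of the current image frame through the target positioning model.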
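S308: The terminal determines, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame.
The motion prediction data of the next image frame relative to the current image frame is data predicting how the target in the next image frame moves relative to the target in the current image frame. It includes at least one of motion direction, motion speed, and motion distance. In other words, the motion prediction data represents the predicted movement of the target between the two frames, for example in which direction it moves, how fast, and how far.
Specifically, after obtaining the image features of the target candidate image, the terminal inputs them into the motion prediction model, which performs feature analysis on them and outputs motion prediction data. The terminal takes this output as the motion prediction data of the next image frame relative to the current image frame. The feature analysis may be at least one of convolution, matrix computation, and vector computation on the image features.
S310: The terminal determines the target candidate region of the next image frame according to the target region and the motion prediction data.
Specifically, after determining the target region of the current image frame and the motion prediction data of the next image frame relative to the current image frame, the terminal moves the target region in the current image frame according to the motion prediction data, obtains the position information of the moved target region in the current image frame, and determines the target candidate region in the next image frame according to that position information.
In one embodiment, the terminal moves the target region in the current image frame according to the motion prediction data, enlarges the moved target region by a multiple, obtains the position information of the enlarged target region, and determines the target candidate region in the next image frame according to that position information.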
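As a rough sketch of S310, the target region can be shifted by the predicted motion and optionally enlarged to give the next frame's candidate region. The representation is assumed, not taken from the text: boxes are `(x1, y1, x2, y2)`, the motion direction is a `(dx, dy)` step in image coordinates, and `distance` is in pixels. The function names are hypothetical.

```python
def shift_box(box, direction, distance):
    """Translate a box (x1, y1, x2, y2) by `distance` along `direction` = (dx, dy)."""
    x1, y1, x2, y2 = box
    dx, dy = direction[0] * distance, direction[1] * distance
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def next_candidate_region(target_box, direction, distance, scale=1.0):
    """Move the current target region, then scale it about its centre by `scale`."""
    x1, y1, x2, y2 = shift_box(target_box, direction, distance)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

With `scale=1.0` the function only moves the region; passing a larger value corresponds to the embodiment that enlarges the moved region before using it as the candidate region.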
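In this embodiment, the target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target region of the current image frame is moved according to the motion prediction data to obtain the target candidate region of the next image frame. The target candidate region can therefore be determined accurately when switching from the current image frame to the next image frame, and the target region can be found within that candidate region after the switch, which improves target tracking accuracy and reduces the target tracking loss rate.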
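As shown in FIG. 4, in one embodiment, S306 includes a step of determining the target region, which includes the following:
S402: The terminal determines the target key point positions by using the target positioning model and according to the image features of the target candidate image.
A target key point is a key point used to determine the target.
For example, when the target is a face, the target key points may be points marking the positions of the facial features. When the target is a car, the target key points may be points marking the outline of the car.
Specifically, after cropping the target candidate image, the terminal extracts its image features, inputs them into the target positioning model, analyzes them through the target positioning model, and obtains the target key point positions output by the model. A target key point position is the position of a target key point in the target candidate image.
In one embodiment, S402 includes: inputting the target candidate image into the image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
Specifically, the terminal inputs the target candidate image into the image feature extraction model, which analyzes the image and outputs its image features. The terminal inputs these image features into the target positioning model, which analyzes them and outputs the target key point positions of the current image frame.
In one embodiment, the terminal determines classification features according to the image features and uses the classification features as the input of the target positioning model to obtain the target key point positions of the current image frame.
S404: The terminal determines the target region of the current image frame according to the target key point positions.
Specifically, after obtaining the target key point positions, the terminal determines the location of the target in the current image frame according to those positions, and determines the target region according to the location of the target.
In this embodiment, the target key point positions are determined by the target positioning model according to the image features of the target candidate image, which improves the accuracy of the key point positions; the target region of the current image frame is then determined from those positions, which further improves the accuracy of the target region.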
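One simple way to turn key point positions into a target region, shown here only as an assumed illustration, is to take the tight bounding box of the key points and map it from candidate-image coordinates back to frame coordinates. The text does not specify how the region is derived from the key points; `box_from_keypoints` and `crop_origin` are hypothetical names.

```python
def box_from_keypoints(keypoints, crop_origin=(0, 0)):
    """Return the tight box (x1, y1, x2, y2) around key points given as (x, y) pairs.

    `crop_origin` is the top-left corner of the candidate image inside the full
    frame, so the returned box is expressed in frame coordinates.
    """
    ox, oy = crop_origin
    xs = [x + ox for x, _ in keypoints]
    ys = [y + oy for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```

A real tracker might pad this box or fit it more robustly; the tight box is used here only to keep the sketch short.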
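As shown in FIG. 5, in one embodiment, S308 further includes a step of obtaining motion prediction data, which includes the following:
S502: The terminal inputs the image features into the classification feature extraction model.
Specifically, after obtaining the image features of the target candidate image, the terminal feeds them to the classification feature extraction model as its input. The classification feature extraction model is a model that determines classification features from image features.
S504: The terminal obtains the classification features output by the classification feature extraction model.
Specifically, after receiving the input image features, the classification feature extraction model analyzes them to obtain classification features and outputs the classification features. The terminal obtains the classification features output by the model.
S506: The terminal determines the confidence of the target candidate image by using the target determination model and according to the classification features.
The confidence is the probability value that the target is present in the target candidate image. The target determination model is a machine learning model that determines the probability that the target is present in the target candidate image.
Specifically, the terminal inputs the classification features extracted by the classification feature extraction model into the target determination model, which analyzes them and outputs the confidence of the target candidate image.
S508: When the determined confidence is greater than or equal to the preset confidence threshold, the terminal uses the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
Specifically, the terminal compares the determined confidence with the preset confidence threshold. When the confidence is greater than or equal to the threshold, the terminal inputs the classification features extracted by the classification feature extraction model into the motion prediction model, which analyzes them and outputs motion prediction data. The terminal takes this output as the motion prediction data of the next image frame relative to the current image frame.
In one embodiment, using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame includes: determining, by the motion prediction model and according to the classification features, the probability value corresponding to each preset prediction class; determining the preset prediction class with the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction class.
The preset prediction classes are classes defined according to target motion data. Each preset prediction class corresponds to unique motion prediction data.
Specifically, the terminal inputs the classification features into the motion prediction model. The model determines the probability value of each preset prediction class according to the classification features, compares the probability values to find the maximum, selects the preset prediction class with the maximum probability value, and obtains the motion prediction data corresponding to the selected class. The terminal takes this as the motion prediction data of the next image frame relative to the current image frame.
S510: When the determined confidence is less than the preset confidence threshold, the terminal ends target tracking.
Specifically, the terminal compares the determined confidence with the preset confidence threshold, and ends target tracking when the confidence is less than the threshold.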
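The decision logic of S506 to S510 can be sketched as below: tracking stops when the confidence is under the threshold; otherwise the class with the highest probability selects the motion prediction data. The threshold value 0.5, the function name `select_motion`, and the shape of `motion_table` are assumptions for illustration; the text only says the threshold is preset and that each class maps to unique motion data.

```python
def select_motion(class_probs, confidence, motion_table, threshold=0.5):
    """Return the motion data of the most probable class, or None to end tracking.

    class_probs  -- one probability per preset prediction class
    confidence   -- probability that the target is present in the candidate image
    motion_table -- maps class index to its motion prediction data (hypothetical layout)
    """
    if confidence < threshold:
        return None  # S510: confidence below the threshold, tracking ends
    best_class = max(range(len(class_probs)), key=lambda j: class_probs[j])
    return motion_table[best_class]  # S508: data of the class with the maximum probability
```

Returning `None` is just one way to signal that tracking has ended; a caller could equally restart detection at that point.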
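In this embodiment, the image features are input into the classification feature extraction model to extract classification features, and the classification features are input into the target determination model to determine the confidence of the target candidate image. The confidence indicates whether the target is present in the target candidate image. When the target is present, that is, when the confidence is greater than or equal to the preset confidence threshold, the classification features are input into the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame, so that the target can be tracked accurately and tracking efficiency is improved.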
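As shown in FIG. 6, in one embodiment, the target tracking method further includes a step of training the motion prediction model, which includes the following:
S602: The terminal obtains model training data.
The model training data is the sample data used to train the machine learning models.
Specifically, the terminal obtains the storage path of the model training data and obtains the model training data according to the storage path.
In one embodiment, the terminal obtains the storage path of the model training data, generates a data acquisition request according to the storage path, and obtains the model training data from a database according to the request.
S604: The terminal reads the current training frame and the next training frame from the model training data.
Specifically, the model training data includes consecutive image training frames. The terminal reads the current training frame and the next training frame from the image training frames in their order of arrangement.
S606: The terminal extracts the image features in the current training frame.
Specifically, the model training data includes the image features corresponding to each image training frame. After reading the current training frame, the terminal extracts the image features corresponding to it from the model training data.
S608: The terminal performs model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence, to obtain the target positioning model and the target determination model.
The target positioning model is a model that locates the target key points in an image frame. The target determination model is a model that determines whether the target is present in an image frame.
Specifically, the terminal extracts the target key point positions and confidence corresponding to the current training frame from the model training data, and uses them as the labeled target key point positions and the labeled confidence. The terminal trains with the extracted image features as the input of the target positioning model and the labeled target key point positions as its output to obtain the target positioning model. The terminal trains with the extracted image features as the input of the target determination model and the labeled confidence as its output to obtain the target determination model.
In one embodiment, the terminal determines classification features according to the extracted image features, and trains with the determined classification features as the input of the target determination model and the labeled confidence as its output to obtain the target determination model.
In some embodiments, the training process may be performed by any computer device, including the terminal, and the trained models are then delivered to the terminal for use, which saves the terminal's processing resources and keeps the terminal running normally.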
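In one embodiment, S608 includes a step of training each model, which includes the following: performing model training according to the current training frame and the image features in the current training frame to obtain the image feature extraction model; performing model training with the image features in the current training frame as input and the target key point positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and performing model training according to the classification features and the confidence labeled for the current training frame to obtain the target determination model.
Specifically, the terminal trains with the current training frame as the input of the image feature extraction model and the image features in the current training frame as its output to obtain the image feature extraction model.
In one embodiment, the terminal trains with the image features in the current training frame as the input of the target positioning model and the target key point positions labeled in the current training frame as its output to obtain the target positioning model.
In one embodiment, the terminal trains with the image features in the current training frame as the input of the classification feature extraction model and the classification features labeled for the current training frame as its output to obtain the classification feature extraction model.
In one embodiment, the terminal trains with the classification features labeled for the current training frame as the input of the target determination model and the confidence labeled for the current training frame as its output to obtain the target determination model.
S610: The terminal determines the motion training data of the next training frame relative to the current training frame.
Specifically, the model training data includes the motion training data between adjacent frames. After reading the current training frame and the next training frame, the terminal extracts from the model training data the motion training data of the next training frame relative to the current training frame. The motion training data includes at least one of motion speed, motion direction, and motion distance.
S612: The terminal trains the motion prediction model according to the extracted image features and the determined motion training data.
Specifically, the terminal trains with the extracted image features as the input of the motion prediction model and the motion training data as its output, and obtains the motion prediction model through training.
In one embodiment, the terminal determines classification features according to the extracted image features, determines a preset prediction class according to the determined motion training data, and trains with the determined classification features as the input of the motion prediction model and the determined preset prediction class as its output, obtaining the motion prediction model through training.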
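In one embodiment, when training the motion prediction model, the terminal uses L as the loss function, where L is given by the following formula:
[formula not reproduced in this text]
where T denotes the number of preset prediction classes and s_j denotes the probability value of belonging to the j-th preset prediction class.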
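The formula for L does not survive in this text, so it cannot be quoted. A loss that is consistent with the stated symbols (T classes, s_j the probability of class j) is the cross-entropy $L = -\sum_{j=1}^{T} y_j \log s_j$, where $y_j$ is 1 for the labeled class and 0 otherwise. This form, and the symbol $y_j$, are assumptions and not confirmed by the source. A minimal sketch under that assumption:

```python
import math


def cross_entropy_loss(probs, labels):
    """Cross-entropy over T classes: -sum_j y_j * log(s_j).

    probs  -- s_j, predicted probability of each preset prediction class
    labels -- y_j, one-hot label of the same length (assumed, not from the source)
    """
    # Terms with y_j == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(y * math.log(s) for y, s in zip(labels, probs) if y > 0)
```

With a one-hot label the sum reduces to the negative log probability of the labeled class, so the loss falls as the model assigns that class more probability.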
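In this embodiment, the current training frame and the next training frame are read from the model training data, the image features in the current training frame are extracted, and separate model training is performed according to the image features, the labeled target key point positions, the labeled confidence, and the motion training data of the next training frame relative to the current training frame. This training yields the motion prediction model, the target positioning model, the target determination model, and so on. Used together, these models improve the accuracy of the motion prediction data, so that the target is tracked accurately.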
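As shown in FIG. 7, in one embodiment, S610 further includes a step of determining the motion training data, which includes the following:
S702: The terminal obtains the target region labeled in the next training frame.
Specifically, the model training data includes the labeled target region position for each image frame. The terminal looks up the labeled target region position corresponding to the next training frame in the model training data and determines the target region labeled in the next training frame according to that position.
S704: The terminal determines, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction class.
Specifically, the terminal determines the target region in the current training frame and moves it according to the preset motion training data of each preset prediction class, obtaining a moved target region for each class. Each moved target region is used as a target prediction region of the next training frame.
S706: The terminal obtains the prediction accuracy corresponding to each preset prediction class according to the target region and the target prediction regions.
Specifically, for the target prediction region of each preset prediction class and the target region, the terminal determines their intersection area and union area in the next training frame and divides the intersection area by the union area to obtain the prediction accuracy of that class. In this way the prediction accuracy of every preset prediction class is obtained.
In one embodiment, S706 includes: determining, for each preset prediction class, the intersection region and the union region between the corresponding target prediction region and the target region in the next training frame; and calculating, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that class.
Specifically, for each preset prediction class, the terminal determines the intersection region and the union region between the class's target prediction region and the target region in the next training frame. It then computes the areas of the two regions and divides the area of the intersection region by the area of the union region. The resulting area ratio is the prediction accuracy of that preset prediction class.
S708: The terminal takes the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
Specifically, after obtaining the prediction accuracy of each preset prediction class, the terminal compares the accuracies to find the highest one, determines the preset prediction class it corresponds to, and obtains the motion training data of that class. The motion training data includes motion speed and motion direction.
In this embodiment, for the target prediction region of the next image frame under each preset prediction class, the area ratio of the intersection region to the union region between the target region of the next image frame and that target prediction region represents the prediction accuracy of the class. The preset prediction class with the highest prediction accuracy is used as the label when predicting the next image frame from the current image frame, which improves the accuracy of the model training data and therefore of the training.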
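Steps S702 to S708 amount to choosing the class whose shifted box has the largest intersection-over-union (IoU) with the labeled box in the next frame. The sketch below assumes boxes as `(x1, y1, x2, y2)` and a hypothetical `class_offsets` mapping from class index to a pixel offset `(dx, dy)`; the patent does not give the offsets.

```python
def iou(a, b):
    """Intersection area divided by union area of two boxes (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_motion_class(current_box, next_box, class_offsets):
    """Return the class whose shifted copy of `current_box` best overlaps `next_box`."""
    def predicted(offset):
        dx, dy = offset
        x1, y1, x2, y2 = current_box
        return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

    return max(class_offsets, key=lambda c: iou(predicted(class_offsets[c]), next_box))
```

If two classes tie, `max` keeps the first one in the mapping's iteration order; the text does not say how ties are resolved.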
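FIG. 8 is a schematic diagram of how the models are connected in the target tracking method in one embodiment. Referring to FIG. 8, the image feature extraction model is connected to both the target positioning model and the classification feature extraction model, and the classification feature extraction model is connected to both the target determination model and the motion prediction model.
The image feature extraction model receives the target candidate image of the current image frame, extracts its image features, and inputs the image features into both the target positioning model and the classification feature extraction model. The target positioning model outputs the target key point positions according to the image features. The classification feature extraction model outputs classification features according to the image features and inputs them into both the target determination model and the motion prediction model. The target determination model outputs the confidence according to the classification features. The motion prediction model outputs, according to the classification features, the motion prediction data of the next image frame relative to the current image frame.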
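The wiring in FIG. 8 can be sketched with the five models passed in as plain callables. This shows only the data flow; the dictionary keys, the function name `run_models`, and the threshold default are hypothetical, and real models would be trained networks rather than simple functions.

```python
def run_models(candidate_image, models, threshold=0.5):
    """Run the five connected models on one candidate image.

    `models` maps hypothetical names to callables:
    "image_features", "positioning", "class_features", "determination", "motion".
    """
    image_features = models["image_features"](candidate_image)
    keypoints = models["positioning"](image_features)            # branch 1: key point positions
    class_features = models["class_features"](image_features)    # branch 2: classification features
    confidence = models["determination"](class_features)
    # The motion prediction model is only consulted when the target is judged present.
    motion = models["motion"](class_features) if confidence >= threshold else None
    return keypoints, confidence, motion
```

Sharing the image features between the two branches means the candidate image is processed by the feature extractor only once per frame.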
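FIG. 9 is a schematic structural diagram of a multi-task model in one embodiment. Referring to FIG. 9, the multi-task model consists of an image feature extraction branch, a target positioning branch, a classification feature extraction branch, a target determination branch, and a motion prediction branch. The image feature extraction branch is connected to both the target positioning branch and the classification feature extraction branch, and the classification feature extraction branch is connected to both the target determination branch and the motion prediction branch.
The image feature extraction branch is formed by the image feature extraction model, the target positioning branch by the target positioning model, the classification feature extraction branch by the classification feature extraction model, the target determination branch by the target determination model, and the motion prediction branch by the motion prediction model.
When the multi-task model receives the target candidate image of the current image frame, it inputs the image into the image feature extraction branch. That branch extracts the image features of the target candidate image and inputs them into both the target positioning branch and the classification feature extraction branch. The target positioning branch outputs the target key point positions according to the image features. The classification feature extraction branch outputs classification features according to the image features and inputs them into both the target determination branch and the motion prediction branch. The target determination branch outputs the confidence according to the classification features. The motion prediction branch generates, according to the classification features, the motion prediction data of the next image frame relative to the current image frame. The multi-task model outputs the motion prediction data generated by the motion prediction branch.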
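FIG. 10 is a schematic flowchart of the target tracking method in one embodiment. Referring to FIG. 10, the target is a face, the target candidate region is a face candidate region, the target candidate image is a face candidate image, and the target key point positions are face key point positions.
When the terminal takes the first image frame as the current image frame, it performs face detection on the current image frame, determines the face candidate region through face detection, crops the face candidate image according to the face candidate region, and inputs the face candidate image into the image feature extraction model. The image feature extraction model extracts the image features and inputs them into both the target positioning model and the classification feature extraction model. The target positioning model outputs the face key point positions according to the image features. The classification feature extraction model outputs classification features according to the image features and inputs them into the target determination model. The target determination model outputs the confidence according to the classification features; when the confidence is less than the preset confidence threshold, target tracking ends. When the confidence is greater than or equal to the preset confidence threshold, the classification feature extraction model inputs the classification features into the motion prediction model, which outputs the motion prediction data of the next image frame relative to the current image frame. The terminal determines the face candidate region of the next image frame according to the face key point positions of the current image frame and the motion prediction data, takes the next image frame as the current image frame, and returns to the step of cropping the face candidate image according to the face candidate region, repeating until target tracking ends.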
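FIG. 11 is a schematic diagram of labeling preset prediction classes in one embodiment. Referring to FIG. 11, the gray area marks the target candidate region. A motion speed of 0 of the next image frame relative to the target candidate region in the current image frame is labeled preset prediction class 0; a motion speed of 1 is labeled preset prediction classes 1 to 8 according to the eight motion directions.
FIG. 12 is a schematic diagram of determining the preset prediction class of an image frame in one embodiment. The image frame in FIG. 12 has a motion direction to the right and a motion speed of 1, so based on the preset prediction classes labeled in FIG. 11, its preset prediction class is 3.
FIG. 13 is a schematic diagram of determining the preset prediction class of an image frame in another embodiment. The image frame in FIG. 13 has a motion speed of 0, so based on the preset prediction classes labeled in FIG. 11, its preset prediction class is 0.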
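The nine preset prediction classes can be held in a small lookup table. The text fixes only two entries: class 0 means no motion and class 3 means motion to the right at speed 1. The remaining assignments below (clockwise from "up") are an assumption made for illustration, since FIG. 11 is not reproduced here, and using a unit step per axis for the diagonal directions is a simplification.

```python
# Index = preset prediction class; value = (dx, dy) step in image coordinates (y grows downward).
# Only class 0 (still) and class 3 (right) come from the text; the rest is an assumed ordering.
MOTION_BY_CLASS = [
    (0, 0),    # 0: speed 0
    (0, -1),   # 1: up (assumed)
    (1, -1),   # 2: up-right (assumed)
    (1, 0),    # 3: right (as described for FIG. 12)
    (1, 1),    # 4: down-right (assumed)
    (0, 1),    # 5: down (assumed)
    (-1, 1),   # 6: down-left (assumed)
    (-1, 0),   # 7: left (assumed)
    (-1, -1),  # 8: up-left (assumed)
]


def classify_motion(dx, dy):
    """Map a displacement to its class by keeping only the sign of each component."""
    def sign(v):
        return (v > 0) - (v < 0)
    return MOTION_BY_CLASS.index((sign(dx), sign(dy)))
```

For example, a displacement of 5 pixels to the right with no vertical motion maps to class 3, matching the FIG. 12 description.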
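As shown in FIG. 14, in one embodiment, a target tracking apparatus 1400 is provided. The apparatus includes a candidate region determination module 1402, a candidate image cropping module 1404, a target region determination module 1406, a prediction data determination module 1408, and a prediction region determination module 1410.
The candidate region determination module 1402 is configured to determine the target candidate region of the current image frame.
The candidate image cropping module 1404 is configured to crop, from the current image frame, the target candidate image matching the target candidate region.
The target region determination module 1406 is configured to determine the target region of the current image frame according to the image features of the target candidate image.
The prediction data determination module 1408 is configured to determine, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame.
The prediction region determination module 1410 is configured to determine the target candidate region of the next image frame according to the target region and the motion prediction data.
In this embodiment, the target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target region of the current image frame is moved according to the motion prediction data to obtain the target candidate region of the next image frame. The target candidate region can therefore be determined accurately when switching from the current image frame to the next image frame, and the target region can be found within that candidate region after the switch, which improves target tracking accuracy and reduces the target tracking loss rate.
In one embodiment, the candidate image cropping module 1404 is further configured to: enlarge the target candidate region in the current image frame by a preset multiple; determine, in the current image frame, the target candidate image matching the enlarged target candidate region; and crop the determined target candidate image from the current image frame.
In one embodiment, the target region determination module 1406 is further configured to: determine the target key point positions by using the target positioning model and according to the image features of the target candidate image; and determine the target region of the current image frame according to the target key point positions.
In one embodiment, the target region determination module 1406 is further configured to: input the target candidate image into the image feature extraction model; obtain the image features output by the image feature extraction model; and use the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
In this embodiment, the target key point positions are determined by the target positioning model according to the image features of the target candidate image, which improves the accuracy of the key point positions; the target region of the current image frame is then determined from those positions, which further improves the accuracy of the target region.
In one embodiment, the prediction data determination module 1408 is further configured to: input the image features into the classification feature extraction model; obtain the classification features output by the classification feature extraction model; and use the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
In one embodiment, the prediction data determination module 1408 is further configured to: determine the confidence of the target candidate image by using the target determination model and according to the classification features; when the determined confidence is greater than or equal to the preset confidence threshold, perform the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence is less than the preset confidence threshold, end target tracking.
In one embodiment, the prediction data determination module 1408 is further configured to: determine, by the motion prediction model and according to the classification features, the probability value corresponding to each preset prediction class; determine the preset prediction class with the maximum probability value; and obtain the motion prediction data corresponding to the determined preset prediction class.
In this embodiment, the image features are input into the classification feature extraction model to extract classification features, and the classification features are input into the target determination model to determine the confidence of the target candidate image. The confidence indicates whether the target is present in the target candidate image. When the target is present, that is, when the confidence is greater than or equal to the preset confidence threshold, the classification features are input into the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame, so that the target can be tracked accurately and tracking efficiency is improved.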
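As shown in FIG. 15, in one embodiment, the target tracking apparatus 1400 further includes a training data acquisition module 1412, a training frame reading module 1414, an image feature extraction module 1416, a motion data determination module 1418, and a model training module 1420.
The training data acquisition module 1412 is configured to obtain model training data.
The training frame reading module 1414 is configured to read the current training frame and the next training frame from the model training data.
The image feature extraction module 1416 is configured to extract the image features in the current training frame.
The motion data determination module 1418 is configured to determine the motion training data of the next training frame relative to the current training frame.
The model training module 1420 is configured to train the motion prediction model according to the extracted image features and the determined motion training data.
In this embodiment, the current training frame and the next training frame are read from the model training data, the image features in the current training frame are extracted, and separate model training is performed according to the image features, the labeled target key point positions, the labeled confidence, and the motion training data of the next training frame relative to the current training frame. This training yields the motion prediction model, the target positioning model, the target determination model, and so on. Used together, these models improve the accuracy of the motion prediction data, so that the target is tracked accurately.
In one embodiment, the model training module 1420 is further configured to perform model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence, to obtain the target positioning model and the target determination model.
In one embodiment, the model training module 1420 is further configured to: perform model training according to the current training frame and the image features in the current training frame to obtain the image feature extraction model; perform model training with the image features in the current training frame as input and the target key point positions labeled in the current training frame as output to obtain the target positioning model; perform model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and perform model training according to the classification features and the confidence labeled for the current training frame to obtain the target determination model.
In one embodiment, the motion data determination module 1418 is further configured to: obtain the target region labeled in the next training frame; determine, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction class; obtain, according to the target region and the target prediction regions, the prediction accuracy corresponding to each preset prediction class; and take the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
In one embodiment, the motion data determination module 1418 is further configured to: determine, for each preset prediction class, the intersection region and the union region between the corresponding target prediction region and the target region in the next training frame; and calculate, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that class.
In this embodiment, for the target prediction region of the next image frame under each preset prediction class, the area ratio of the intersection region to the union region between the target region of the next image frame and that target prediction region represents the prediction accuracy of the class. The preset prediction class with the highest prediction accuracy is used as the label when predicting the next image frame from the current image frame, which improves the accuracy of the model training data and therefore of the training.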
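FIG. 16 is a schematic diagram of the internal structure of a computer device in one embodiment. Referring to FIG. 16, the computer device may be the terminal 200 shown in FIG. 2 and includes a processor, a memory, a camera, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and a computer program; when the computer program is executed, it may cause the processor to perform a target tracking method. The processor provides computing and control capabilities and supports the operation of the whole computer device. The internal memory may also store a computer program that, when executed by the processor, may cause the processor to perform a target tracking method. The network interface is used for network communication. The camera is used to capture images.
Those skilled in the art will understand that the structure shown in FIG. 16 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device or robot to which the solution is applied. A specific computer device may include more or fewer components than shown, combine some components, or arrange the components differently.
In one embodiment, the target tracking apparatus 1400 provided in this application may be implemented as a computer program that can run on the computer device shown in FIG. 16. The memory of the computer device may store the program modules that make up the apparatus, such as the candidate region determination module 1402, the candidate image cropping module 1404, the target region determination module 1406, the prediction data determination module 1408, and the prediction region determination module 1410 shown in FIG. 14. The computer program formed by these program modules causes the processor to perform the steps of the target tracking method of the embodiments described in this specification.
For example, the computer device shown in FIG. 16 may determine the target candidate region of the current image frame through the candidate region determination module 1402 of the target tracking apparatus 1400 shown in FIG. 14. The computer device may crop, from the current image frame, the target candidate image matching the target candidate region through the candidate image cropping module 1404. It may determine the target region of the current image frame according to the image features of the target candidate image through the target region determination module 1406. It may determine, through the prediction data determination module 1408, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame. It may determine the target candidate region of the next image frame according to the target region and the motion prediction data through the prediction region determination module 1410.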
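A computer device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: determining the target candidate region of the current image frame; cropping, from the current image frame, the target candidate image matching the target candidate region; determining the target region of the current image frame according to the image features of the target candidate image; determining, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame; and determining the target candidate region of the next image frame according to the target region and the motion prediction data.
In one embodiment, the processor further performs the following method steps:
enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and cropping the determined target candidate image from the current image frame.
In one embodiment, the processor further performs the following method steps:
determining the target key point positions by using the target positioning model and according to the image features of the target candidate image; and determining the target region of the current image frame according to the target key point positions.
In one embodiment, the processor further performs the following method steps:
inputting the target candidate image into the image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
In one embodiment, the processor further performs the following method steps:
inputting the image features into the classification feature extraction model; obtaining the classification features output by the classification feature extraction model; and using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
In one embodiment, after the classification features output by the classification feature extraction model are obtained, the computer program, when executed by the processor, causes the processor to further perform the following steps: determining the confidence of the target candidate image by using the target determination model and according to the classification features; when the determined confidence is greater than or equal to the preset confidence threshold, performing the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence is less than the preset confidence threshold, ending target tracking.
In one embodiment, the processor further performs the following method steps:
determining, by the motion prediction model and according to the classification features, the probability value corresponding to each preset prediction class; determining the preset prediction class with the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction class.
In one embodiment, the computer program, when executed by the processor, causes the processor to further perform the following steps: obtaining model training data; reading the current training frame and the next training frame from the model training data; extracting the image features in the current training frame; determining the motion training data of the next training frame relative to the current training frame; and training the motion prediction model according to the extracted image features and the determined motion training data.
In one embodiment, after the image features in the current training frame are extracted, the computer program, when executed by the processor, causes the processor to further perform the following step: performing model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence, to obtain the target positioning model and the target determination model.
In one embodiment, the processor further performs the following method steps:
performing model training according to the current training frame and the image features in the current training frame to obtain the image feature extraction model; performing model training with the image features in the current training frame as input and the target key point positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and performing model training according to the classification features and the confidence labeled for the current training frame to obtain the target determination model.
In one embodiment, the processor further performs the following method steps:
obtaining the target region labeled in the next training frame; determining, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction class; obtaining, according to the target region and the target prediction regions, the prediction accuracy corresponding to each preset prediction class; and taking the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
In one embodiment, the processor further performs the following method steps:
determining, for each preset prediction class, the intersection region and the union region between the corresponding target prediction region and the target region in the next training frame; and calculating, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that class.
In this embodiment, the target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target region of the current image frame is moved according to the motion prediction data to obtain the target candidate region of the next image frame. The target candidate region can therefore be determined accurately when switching from the current image frame to the next image frame, and the target region can be found within that candidate region after the switch, which improves target tracking accuracy and reduces the target tracking loss rate.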
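A storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps: determining the target candidate region of the current image frame; cropping, from the current image frame, the target candidate image matching the target candidate region; determining the target region of the current image frame according to the image features of the target candidate image; determining, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame; and determining the target candidate region of the next image frame according to the target region and the motion prediction data.
In one embodiment, cropping, from the current image frame, the target candidate image matching the target candidate region includes: enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and cropping the determined target candidate image from the current image frame.
In one embodiment, determining the target region of the current image frame according to the image features of the target candidate image includes: determining the target key point positions by using the target positioning model and according to the image features of the target candidate image; and determining the target region of the current image frame according to the target key point positions.
In one embodiment, determining the target key point positions by using the target positioning model and according to the image features of the target candidate image includes: inputting the target candidate image into the image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
In one embodiment, determining, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame includes: inputting the image features into the classification feature extraction model; obtaining the classification features output by the classification feature extraction model; and using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
In one embodiment, after the classification features output by the classification feature extraction model are obtained, the computer program, when executed by the processor, causes the processor to further perform the following steps: determining the confidence of the target candidate image by using the target determination model and according to the classification features; when the determined confidence is greater than or equal to the preset confidence threshold, performing the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence is less than the preset confidence threshold, ending target tracking.
In one embodiment, using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame includes: determining, by the motion prediction model and according to the classification features, the probability value corresponding to each preset prediction class; determining the preset prediction class with the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction class.
In one embodiment, the computer program, when executed by the processor, causes the processor to further perform the following steps: obtaining model training data; reading the current training frame and the next training frame from the model training data; extracting the image features in the current training frame; determining the motion training data of the next training frame relative to the current training frame; and training the motion prediction model according to the extracted image features and the determined motion training data.
In one embodiment, after the image features in the current training frame are extracted, the computer program, when executed by the processor, causes the processor to further perform the following step: performing model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence, to obtain the target positioning model and the target determination model.
In one embodiment, performing model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence to obtain the target positioning model and the target determination model includes: performing model training according to the current training frame and the image features in the current training frame to obtain the image feature extraction model; performing model training with the image features in the current training frame as input and the target key point positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and the classification features labeled for the current training frame as output to obtain the classification feature extraction model; and performing model training according to the classification features and the confidence labeled for the current training frame to obtain the target determination model.
In one embodiment, determining the motion training data of the next training frame relative to the current training frame includes: obtaining the target region labeled in the next training frame; determining, according to the current training frame, the target prediction region of the next training frame corresponding to each preset prediction class; obtaining, according to the target region and the target prediction regions, the prediction accuracy corresponding to each preset prediction class; and taking the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
In one embodiment, obtaining the prediction accuracy corresponding to each preset prediction class according to the target region and the target prediction regions includes: determining, for each preset prediction class, the intersection region and the union region between the corresponding target prediction region and the target region in the next training frame; and calculating, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that class.
In this embodiment, the target candidate image is cropped according to the target candidate region of the current image frame, the target region is determined in the target candidate image, the motion prediction data of the next image frame relative to the current image frame is determined by the motion prediction model, and the target region of the current image frame is moved according to the motion prediction data to obtain the target candidate region of the next image frame. The target candidate region can therefore be determined accurately when switching from the current image frame to the next image frame, and the target region can be found within that candidate region after the switch, which improves target tracking accuracy and reduces the target tracking loss rate.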
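Those of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the foregoing embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The foregoing embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make variations and improvements without departing from the concept of this application, and these all fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.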
Claims (20)
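- A target tracking method, the method comprising: determining a target candidate region of a current image frame; cropping, from the current image frame, a target candidate image matching the target candidate region; determining a target region of the current image frame according to image features of the target candidate image; determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and determining a target candidate region of the next image frame according to the target region and the motion prediction data.
- The method according to claim 1, wherein cropping, from the current image frame, the target candidate image matching the target candidate region comprises: enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and cropping the determined target candidate image from the current image frame.
- The method according to claim 1, wherein determining the target region of the current image frame according to the image features of the target candidate image comprises: determining target key point positions by using a target positioning model and according to the image features of the target candidate image; and determining the target region of the current image frame according to the target key point positions.
- The method according to claim 3, wherein determining the target key point positions by using the target positioning model and according to the image features of the target candidate image comprises: inputting the target candidate image into an image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
- The method according to claim 4, wherein determining, by using the motion prediction model and according to the image features of the target candidate image, the motion prediction data of the next image frame relative to the current image frame comprises: inputting the image features into a classification feature extraction model; obtaining the classification features output by the classification feature extraction model; and using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
- The method according to claim 5, wherein after obtaining the classification features output by the classification feature extraction model, the method further comprises: determining a confidence of the target candidate image by using a target determination model and according to the classification features; when the determined confidence is greater than or equal to a preset confidence threshold, performing the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence is less than the preset confidence threshold, ending target tracking.
- The method according to claim 5, wherein using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame comprises: determining, by the motion prediction model and according to the classification features, a probability value corresponding to each preset prediction class; determining the preset prediction class with the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction class.
- The method according to claim 1, further comprising: obtaining model training data; reading a current training frame and a next training frame from the model training data; extracting image features in the current training frame; determining motion training data of the next training frame relative to the current training frame; and training the motion prediction model according to the extracted image features and the determined motion training data.
- The method according to claim 8, further comprising, after extracting the image features in the current training frame: performing model training according to the image features extracted from the current training frame, labeled target key point positions, and a labeled confidence, to obtain a target positioning model and a target determination model.
- The method according to claim 9, wherein performing model training according to the image features extracted from the current training frame, the labeled target key point positions, and the labeled confidence to obtain the target positioning model and the target determination model comprises: performing model training according to the current training frame and the image features in the current training frame to obtain an image feature extraction model; performing model training with the image features in the current training frame as input and the target key point positions labeled in the current training frame as output to obtain the target positioning model; performing model training with the image features in the current training frame as input and classification features labeled for the current training frame as output to obtain a classification feature extraction model; and performing model training according to the classification features and the confidence labeled for the current training frame to obtain the target determination model.
- The method according to claim 8, wherein determining the motion training data of the next training frame relative to the current training frame comprises: obtaining a target region labeled in the next training frame; determining, according to the current training frame, a target prediction region of the next training frame corresponding to each preset prediction class; obtaining, according to the target region and the target prediction regions, a prediction accuracy corresponding to each preset prediction class; and taking the preset motion training data of the preset prediction class with the highest prediction accuracy as the motion training data of the next training frame relative to the current training frame.
- The method according to claim 11, wherein obtaining the prediction accuracy corresponding to each preset prediction class according to the target region and the target prediction regions comprises: determining, for each preset prediction class, an intersection region and a union region between the corresponding target prediction region and the target region in the next training frame; and calculating, for each preset prediction class, the area ratio of the intersection region to the union region to obtain the prediction accuracy of that preset prediction class.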
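- A computer device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following method steps: determining a target candidate region of a current image frame; cropping, from the current image frame, a target candidate image matching the target candidate region; determining a target region of the current image frame according to image features of the target candidate image; determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and determining a target candidate region of the next image frame according to the target region and the motion prediction data.
- The computer device according to claim 13, wherein the processor further performs the following method steps: enlarging the target candidate region in the current image frame by a preset multiple; determining, in the current image frame, the target candidate image matching the enlarged target candidate region; and cropping the determined target candidate image from the current image frame.
- The computer device according to claim 13, wherein the processor further performs the following method steps: determining target key point positions by using a target positioning model and according to the image features of the target candidate image; and determining the target region of the current image frame according to the target key point positions.
- The computer device according to claim 15, wherein the processor further performs the following method steps: inputting the target candidate image into an image feature extraction model; obtaining the image features output by the image feature extraction model; and using the image features as the input of the target positioning model to obtain the target key point positions of the current image frame.
- The computer device according to claim 16, wherein the processor further performs the following method steps: inputting the image features into a classification feature extraction model; obtaining the classification features output by the classification feature extraction model; and using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame.
- The computer device according to claim 17, wherein the processor further performs the following method steps: determining a confidence of the target candidate image by using a target determination model and according to the classification features; when the determined confidence is greater than or equal to a preset confidence threshold, performing the step of using the classification features as the input of the motion prediction model to obtain the motion prediction data of the next image frame relative to the current image frame; and when the determined confidence is less than the preset confidence threshold, ending target tracking.
- The computer device according to claim 17, wherein the processor further performs the following method steps: determining, by the motion prediction model and according to the classification features, a probability value corresponding to each preset prediction class; determining the preset prediction class with the maximum probability value; and obtaining the motion prediction data corresponding to the determined preset prediction class.
- A storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following method steps: determining a target candidate region of a current image frame; cropping, from the current image frame, a target candidate image matching the target candidate region; determining a target region of the current image frame according to image features of the target candidate image; determining, by using a motion prediction model and according to the image features of the target candidate image, motion prediction data of a next image frame relative to the current image frame; and determining a target candidate region of the next image frame according to the target region and the motion prediction data.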
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP19844552.0A EP3754608A4 (en) | 2018-08-01 | 2019-07-23 | TARGET MONITORING PROCESS, COMPUTER DEVICE AND STORAGE MEDIA |
| US17/033,675 US11961242B2 (en) | 2018-08-01 | 2020-09-25 | Target tracking method, computer device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810867036.8 | 2018-08-01 | ||
| CN201810867036.8A CN108961315B (zh) | 2018-08-01 | 2018-08-01 | 目标跟踪方法、装置、计算机设备和存储介质 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/033,675 Continuation US11961242B2 (en) | 2018-08-01 | 2020-09-25 | Target tracking method, computer device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020024851A1 true WO2020024851A1 (zh) | 2020-02-06 |
Family
ID=64466919
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/097343 Ceased WO2020024851A1 (zh) | 2018-08-01 | 2019-07-23 | 目标跟踪方法、计算机设备和存储介质 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11961242B2 (zh) |
| EP (1) | EP3754608A4 (zh) |
| CN (1) | CN108961315B (zh) |
| WO (1) | WO2020024851A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112906558A (zh) * | 2021-02-08 | 2021-06-04 | 浙江商汤科技开发有限公司 | 图像特征的提取方法、装置、计算机设备及存储介质 |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10740620B2 (en) * | 2017-10-12 | 2020-08-11 | Google Llc | Generating a video segment of an action from a video |
| JP7122821B2 (ja) * | 2017-12-15 | 2022-08-22 | 川崎重工業株式会社 | ロボットシステム及びロボット制御方法 |
| CN108961315B (zh) * | 2018-08-01 | 2020-02-18 | 腾讯科技(深圳)有限公司 | 目标跟踪方法、装置、计算机设备和存储介质 |
| CN110163068B (zh) * | 2018-12-13 | 2024-12-13 | 腾讯科技(深圳)有限公司 | 目标对象跟踪方法、装置、存储介质和计算机设备 |
| CN109657615B (zh) * | 2018-12-19 | 2021-11-02 | 腾讯科技(深圳)有限公司 | 一种目标检测的训练方法、装置及终端设备 |
| CN109800667A (zh) * | 2018-12-28 | 2019-05-24 | 广州烽火众智数字技术有限公司 | 一种行人跟踪方法和系统 |
| CN110189364B (zh) * | 2019-06-04 | 2022-04-01 | 北京字节跳动网络技术有限公司 | 用于生成信息的方法和装置,以及目标跟踪方法和装置 |
| CN110866049A (zh) * | 2019-11-27 | 2020-03-06 | 北京明略软件系统有限公司 | 目标对象类别的确认方法及装置、存储介质、电子装置 |
| CN111242981A (zh) * | 2020-01-21 | 2020-06-05 | 北京捷通华声科技股份有限公司 | 一种定置物品的跟踪方法、装置和安保设备 |
| CN111598924B (zh) * | 2020-05-08 | 2022-09-30 | 腾讯科技(深圳)有限公司 | 目标跟踪方法、装置、计算机设备及存储介质 |
| CN113191353A (zh) * | 2021-04-15 | 2021-07-30 | 华北电力大学扬中智能电气研究中心 | 一种车速确定方法、装置、设备和介质 |
| CN114219938A (zh) * | 2021-11-30 | 2022-03-22 | 阿里巴巴(中国)有限公司 | 感兴趣区域获取方法 |
| CN115272988A (zh) * | 2022-07-18 | 2022-11-01 | 天翼云科技有限公司 | 一种车辆跟踪方法、装置、设备及介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9696404B1 (en) * | 2014-05-06 | 2017-07-04 | The United States Of America As Represented By The Secretary Of The Air Force | Real-time camera tracking system using optical flow feature points |
| CN107784279A (zh) * | 2017-10-18 | 2018-03-09 | 北京小米移动软件有限公司 | 目标跟踪方法及装置 |
| CN108280843A (zh) * | 2018-01-24 | 2018-07-13 | 新华智云科技有限公司 | 一种视频目标检测跟踪方法及设备 |
| CN108961315A (zh) * | 2018-08-01 | 2018-12-07 | 腾讯科技(深圳)有限公司 | 目标跟踪方法、装置、计算机设备和存储介质 |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20080073933A (ko) * | 2007-02-07 | 2008-08-12 | 삼성전자주식회사 | 객체 트래킹 방법 및 장치, 그리고 객체 포즈 정보 산출방법 및 장치 |
| JP5394952B2 (ja) * | 2010-03-03 | 2014-01-22 | セコム株式会社 | 移動物体追跡装置 |
| US20170337711A1 (en) * | 2011-03-29 | 2017-11-23 | Lyrical Labs Video Compression Technology, LLC | Video processing and encoding |
| JP5983243B2 (ja) * | 2012-09-26 | 2016-08-31 | 株式会社デンソー | 検出装置 |
| CN103259962B (zh) * | 2013-04-17 | 2016-02-17 | 深圳市捷顺科技实业股份有限公司 | 一种目标追踪方法和相关装置 |
| CN103345644B (zh) * | 2013-06-17 | 2016-08-24 | 华为终端有限公司 | 在线训练的目标检测方法及装置 |
| CN104517275A (zh) * | 2013-09-27 | 2015-04-15 | 株式会社理光 | 对象检测方法和系统 |
| US9449230B2 (en) * | 2014-11-26 | 2016-09-20 | Zepp Labs, Inc. | Fast object tracking framework for sports video recognition |
| CN108028021B (zh) | 2015-09-29 | 2021-10-15 | 索尼公司 | 信息处理设备、信息处理方法和程序 |
| CN106295567B (zh) | 2016-08-10 | 2019-04-12 | 腾讯科技(深圳)有限公司 | 一种关键点的定位方法及终端 |
| CN106778585B (zh) * | 2016-12-08 | 2019-04-16 | 腾讯科技(上海)有限公司 | 一种人脸关键点跟踪方法和装置 |
| CN107066990B (zh) * | 2017-05-04 | 2019-10-11 | 厦门美图之家科技有限公司 | 一种目标跟踪方法及移动设备 |
| US20190034734A1 (en) * | 2017-07-28 | 2019-01-31 | Qualcomm Incorporated | Object classification using machine learning and object tracking |
| US10628961B2 (en) * | 2017-10-13 | 2020-04-21 | Qualcomm Incorporated | Object tracking for neural network systems |
| US10510157B2 (en) * | 2017-10-28 | 2019-12-17 | Altumview Systems Inc. | Method and apparatus for real-time face-tracking and face-pose-selection on embedded vision systems |
| US20190130191A1 (en) * | 2017-10-30 | 2019-05-02 | Qualcomm Incorporated | Bounding box smoothing for object tracking in a video analytics system |
| US10699126B2 (en) * | 2018-01-09 | 2020-06-30 | Qualcomm Incorporated | Adaptive object detection and recognition |
2018
- 2018-08-01: CN application CN201810867036.8A, patent CN108961315B (zh), status Active

2019
- 2019-07-23: EP application EP19844552.0A, patent EP3754608A4 (en), status Pending
- 2019-07-23: WO application PCT/CN2019/097343, patent WO2020024851A1 (zh), status Ceased

2020
- 2020-09-25: US application US17/033,675, patent US11961242B2 (en), status Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3754608A4 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
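| CN112906558A (zh) * | 2021-02-08 | 2021-06-04 | 浙江商汤科技开发有限公司 | Image feature extraction method and apparatus, computer device, and storage medium |
| CN112906558B (zh) * | 2021-02-08 | 2024-06-11 | 浙江商汤科技开发有限公司 | Image feature extraction method and apparatus, computer device, and storage medium |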
Also Published As
| Publication number | Publication date |
|---|---|
| US11961242B2 (en) | 2024-04-16 |
| US20210012510A1 (en) | 2021-01-14 |
| EP3754608A1 (en) | 2020-12-23 |
| EP3754608A4 (en) | 2021-08-18 |
| CN108961315B (zh) | 2020-02-18 |
| CN108961315A (zh) | 2018-12-07 |
Similar Documents
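| Publication | Title |
|---|---|
| WO2020024851A1 (zh) | Target tracking method, computer device, and storage medium |
| KR102728564B1 (ko) | Detection model training method and apparatus, computer device, and storage medium |
| US10140508B2 (en) | Method and apparatus for annotating a video stream comprising a sequence of frames |
| CN112883896B (zh) | A micro-expression detection method based on a BERT network |
| CN109919977B (zh) | A method for tracking and identifying moving persons in video based on temporal features |
| CN113557546B (zh) | Method, apparatus, device, and storage medium for detecting associated objects in an image |
| CN110490902A (zh) | Target tracking method and apparatus applied to smart cities, and computer device |
| WO2021012382A1 (zh) | Method and apparatus for configuring a chatbot, computer device, and storage medium |
| WO2020024395A1 (zh) | Fatigue driving detection method and apparatus, computer device, and storage medium |
| CN111160275B (zh) | Pedestrian re-identification model training method and apparatus, computer device, and storage medium |
| CN110598687A (zh) | Vehicle identification number detection method and apparatus, and computer device |
| CN110334569A (zh) | Method, apparatus, device, and storage medium for identifying passenger flow entering and exiting |
| CN112580499B (zh) | Text recognition method, apparatus, device, and storage medium |
| CN104992453A (zh) | Target tracking method in complex backgrounds based on an extreme learning machine |
| CN114743026A (zh) | Method, apparatus, device, and computer-readable medium for detecting the orientation of a target object |
| CN112766218B (zh) | Cross-domain pedestrian re-identification method and apparatus based on an asymmetric joint teaching network |
| CN113780145A (zh) | Sperm morphology detection method and apparatus, computer device, and storage medium |
| CN114663835A (zh) | A pedestrian tracking method, system, device, and storage medium |
| CN107038400A (zh) | Face recognition apparatus and method, and target person tracking apparatus and method using the same |
| WO2021082045A1 (zh) | Smile expression detection method and apparatus, computer device, and storage medium |
| CN113706481A (zh) | Sperm quality detection method and apparatus, computer device, and storage medium |
| CN110717449A (zh) | Behavior detection method and apparatus for vehicle annual inspection personnel, and computer device |
| WO2021169625A1 (zh) | Method and apparatus for detecting re-photographed network photos, computer device, and storage medium |
| CN112241705A (zh) | Target detection model training method based on classification and regression, and target detection method |
| Akter et al. | Advancements in animal tracking: Assessing deep learning algorithms |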
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19844552; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2019844552; Country of ref document: EP; Effective date: 20200915 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
