WO2020238560A1 - Video target tracking method and apparatus, computer device and storage medium - Google Patents
Video target tracking method and apparatus, computer device and storage medium
- Publication number
- WO2020238560A1 (PCT/CN2020/088286)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- image
- frame
- image frame
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/174—Segmentation; Edge detection involving the use of two or more images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20156—Automatic seed setting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20164—Salient point detection; Corner detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the embodiments of the application relate to the field of image recognition technology, and in particular to a video target tracking method, device, computer equipment, and storage medium.
- Video target tracking technology refers to tracking the target object of interest in the video, and identifying the target object from each image frame of the video.
- a video target tracking method based on semi-supervised learning is provided. First, an image segmentation model is trained on a set of training samples. Then, the first image frame of the video to be detected is used to adjust the parameters of the image segmentation model so that the model is suited to extracting the target object in that video, where the position of the target object in the first image frame can be manually annotated. After that, the adjusted image segmentation model is used to identify the target object in subsequent image frames of the video to be detected.
- however, when the apparent information of the target object changes across frames, the adjusted image segmentation model cannot accurately identify the target object in the subsequent image frames; in most cases, as the apparent information changes, the model's predictions become very inaccurate.
- a video target tracking method, device, computer equipment, and storage medium are provided.
- a video target tracking method executed by a computer device, the method including:
- acquiring a partial detection map corresponding to a target image frame in the video to be detected, the partial detection map being generated based on the apparent information of the target object that needs to be tracked by the image segmentation model in the video to be detected;
- acquiring a relative motion saliency map corresponding to the target image frame, the relative motion saliency map being generated based on the motion information of the target object;
- determining, according to the partial detection map and the relative motion saliency map, constraint information corresponding to the target image frame, the constraint information including absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame;
- adjusting the parameters of the image segmentation model through the constraint information to obtain an adjusted image segmentation model;
- extracting the target object in the target image frame through the adjusted image segmentation model.
- a video target tracking device comprising:
- the detection image acquisition module is configured to obtain a partial detection image corresponding to the target image frame in the video to be detected, the partial detection image being generated based on the apparent information of the target object in the video to be detected that needs to be tracked by the image segmentation model;
- a motion map acquisition module configured to acquire a relative motion saliency map corresponding to the target image frame, where the relative motion saliency map is generated based on the motion information of the target object;
- the constraint information acquisition module is configured to determine the constraint information corresponding to the target image frame according to the partial detection map and the relative motion saliency map, the constraint information including absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame;
- a model adjustment module configured to adjust the parameters of the image segmentation model through the constraint information to obtain an adjusted image segmentation model
- the target segmentation module is used to extract the target object in the target image frame through the adjusted image segmentation model.
- a computer device includes a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to realize the video target tracking method described above.
- a computer program product, which, when executed, is used to perform the above video target tracking method.
- Figure 1a exemplarily shows a schematic diagram of an application environment for video target tracking
- Figure 1b exemplarily shows a schematic diagram of video target tracking
- FIG. 2 is a flowchart of a video target tracking method provided by an embodiment of the present application
- Figure 3 exemplarily shows a schematic diagram of the overall process of the technical solution of the present application
- FIG. 4 exemplarily shows a schematic diagram of the parameter adjustment process of the target detection model
- Figure 5 exemplarily shows an architecture diagram of an image segmentation model
- Figure 6 exemplarily shows a schematic diagram of samples extracted by the traditional method and the method of the present application
- Fig. 7 is a block diagram of a video target tracking device provided by an embodiment of the present application.
- FIG. 8 is a block diagram of a video target tracking device provided by another embodiment of the present application.
- Fig. 9 is a structural block diagram of a computer device provided by an embodiment of the present application.
- the video target tracking method provided in this application can be applied to the application environment as shown in FIG. 1a.
- the computer device 102 and the video capture device 104 communicate through a network, as shown in FIG. 1a.
- the computer device 102 can obtain the video to be detected from the video capture device 104, and obtain the partial detection image corresponding to the target image frame in the video to be detected.
- the partial detection image is generated based on the apparent information of the target object in the video to be detected that needs to be tracked by the image segmentation model, where the image segmentation model is a neural network model used to segment and extract the target object from the image frames of the video to be detected; the computer device also obtains the relative motion saliency map corresponding to the target image frame, which is generated based on the motion information of the target object, and determines the constraint information corresponding to the target image frame according to the partial detection map and the relative motion saliency map.
- the constraint information includes the absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame;
- through the constraint information, the parameters of the image segmentation model are adjusted to obtain the adjusted image segmentation model, and through the adjusted image segmentation model, the target object in the target image frame is extracted.
- the computer device 102 may be implemented by an independent server or a server cluster composed of multiple servers.
- the video capture device 104 may include a surveillance camera or a terminal with a camera.
- Video target tracking technology can be used in many different application scenarios. For example, in security scenarios, suspects in surveillance videos can be tracked and identified. For another example, in an application scenario of video analysis processing, image frames containing a specific character in a movie or TV series can be extracted, so as to integrate a video segment of the specific character.
- Figure 1b exemplarily shows a schematic diagram of video target tracking.
- Figure 1b contains multiple image frames of the video, labeled 11, 12, 13, and 14. To track the person and the vehicle in each image frame of the video, an image segmentation model can be trained; each image frame is input into the image segmentation model, which extracts the person and the vehicle.
- the person and the vehicle can be labeled with masks respectively, so that the person and the vehicle are marked in the image frame.
- the execution subject of each step is a computer device.
- Computer equipment can be any electronic equipment with computing, processing and storage capabilities.
- the computer device can be a PC (Personal Computer) or a server; it can also be a terminal device such as a mobile phone, a tablet computer, a multimedia player, a wearable device, or a smart TV, or another device such as a drone or a vehicle-mounted terminal, which is not limited in this embodiment of the application.
- FIG. 2 shows a flowchart of a video target tracking method provided by an embodiment of the present application.
- the method can include the following steps (201 to 205):
- Step 201 Obtain a partial detection image corresponding to a target image frame in the video to be detected.
- an image frame can be given in which the mask of the target object is marked, and the image segmentation model is then used to segment and extract the target object from the other image frames of the video to be detected.
- the target object may be a person or an object, which is not limited in the embodiment of the present application.
- the mask of the target object is marked in the first image frame of the video to be detected, and then the target object is segmented and extracted from subsequent image frames of the video to be detected by using an image segmentation model.
- marking the mask of the target object in the given image frame (such as the first image frame) can be done manually.
- the target image frame may be any image frame in the video to be detected without the target object marked, that is, the image frame of the target object needs to be extracted from the image segmentation model.
- the local detection map is generated based on the apparent information of the target object to be tracked.
- apparent information refers to information that can be distinguished visually, such as color, shape, texture and other information.
- the target image frame is processed by the target detection model to obtain the local detection map corresponding to the target image frame.
- the target detection model may be a model obtained by training a convolutional neural network.
- the size of the local detection map is the same as the size of the target image frame. For example, if the size of the target image frame is 800*600 pixels, the size of the local detection map is also 800*600 pixels.
- the value of the target pixel in the local detection map reflects the probability that the target pixel at the same position in the target image frame belongs to the target object, and the probability is determined based on the apparent information of the target pixel.
- the target object in the video to be detected is tracked and identified through the image segmentation model.
- the image segmentation model is a neural network model used to segment and extract the target object from the image frame of the video to be detected.
- the image segmentation model may be a deep learning model constructed based on a convolutional neural network.
- it is necessary to perform online adaptive training on the image segmentation model to adjust the parameters of the model (such as the weights of the neural network), and then use the adjusted image segmentation model to segment the target object.
- this step may include the following sub-steps:
- the training samples are used to train the target detection model to adjust and optimize the parameters of the target detection model.
- the training sample includes a labeled image frame and a detection target frame corresponding to the labeled image frame.
- Annotated image frame refers to an image frame that has annotated the mask of the target object.
- the annotated image frame may include the image frame in which the mask of the target object is manually annotated as described above, or may include the image frame in which the mask of the target object is annotated by the image segmentation model.
- a training sample includes a labeled image frame and a detection target frame corresponding to this labeled image frame. Therefore, multiple training samples can be selected from a labeled image frame.
- the detection target frame refers to an image area where the proportion of the target object is greater than a preset threshold.
- a frame is placed on a labeled image frame; within the image area enclosed by this frame, part of the area may belong to the target object and part may not.
- the preset threshold may be set in advance according to actual requirements. For example, the preset threshold is 0.5.
- the frame described above may be rectangular or other shapes, which is not limited in the embodiment of the present application.
- the training samples are selected as follows: randomly scatter a frame in the labeled image frame and calculate the proportion of the target object within the frame; if the proportion of the target object in the frame is greater than the preset threshold, the frame is determined as the detection target frame corresponding to the labeled image frame, and the labeled image frame and the detection target frame are selected as a training sample.
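- to make the sampling step concrete, below is a minimal sketch of this selection procedure, assuming the annotation is available as a binary mask; the function name, retry count, and minimum box size are illustrative choices, not values from the patent.

```python
import numpy as np

def sample_detection_frames(mask, num_tries=200, threshold=0.5, min_size=32):
    """mask: (H, W) binary array, 1 where the target object is annotated."""
    h, w = mask.shape
    rng = np.random.default_rng()
    boxes = []
    for _ in range(num_tries):
        bw = int(rng.integers(min_size, w))     # random box width
        bh = int(rng.integers(min_size, h))     # random box height
        x0 = int(rng.integers(0, w - bw + 1))
        y0 = int(rng.integers(0, h - bh + 1))
        patch = mask[y0:y0 + bh, x0:x0 + bw]
        # keep the box only if the target occupies more than `threshold` of it
        if patch.mean() > threshold:
            boxes.append((x0, y0, x0 + bw, y0 + bh))
    return boxes
```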
- the Faster-RCNN network is selected as the framework of the target detection model.
- the parameters (such as network weights) of the target detection model are fine-tuned through the training samples selected above to obtain the adjusted target detection model.
- during fine-tuning, the batch size can be 1 with 600 rounds of fine-tuning, and the frame size, aspect ratio, and so on can be adjusted during the training process, in order to finally train a higher-precision target detection model.
- the target image frame is processed through the adjusted target detection model to obtain a partial detection map.
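- the patent does not spell out how the detector's box outputs are turned into a per-pixel map; one plausible rasterization, shown below purely as an assumption, paints each detection's confidence into its box region and keeps the strongest score per pixel.

```python
import numpy as np

def boxes_to_detection_map(boxes, scores, shape):
    """boxes: iterable of (x0, y0, x1, y1) ints; scores: confidence per box;
    shape: (H, W) of the target image frame."""
    det_map = np.zeros(shape, dtype=np.float32)
    for (x0, y0, x1, y1), score in zip(boxes, scores):
        region = det_map[y0:y1, x0:x1]
        np.maximum(region, score, out=region)  # strongest detection wins per pixel
    return det_map
```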
- the mask of the target object in the first image frame of the video to be detected is manually annotated.
- the target object is segmented and extracted frame by frame. To obtain the local detection map corresponding to the i-th image frame in the video to be detected (i being an integer greater than 1), at least one training sample can be selected from the first image frame and the (i-1)-th image frame; the parameters of the current target detection model are adjusted through these training samples to obtain an adjusted target detection model, which is then used to process the i-th image frame to obtain the local detection map of the i-th image frame.
- Step 202 Obtain a relative motion saliency map corresponding to the target image frame.
- the relative motion saliency map is generated based on the motion information of the target object.
- the position of the target object in each image frame of the video to be detected may not be static, and it may move.
- the motion information reflects the motion of the target object, that is, the position change in different image frames.
- the relative motion saliency map is determined by detecting the optical flow between adjacent image frames, and the optical flow reflects the motion information of the target object.
- optical flow refers to the movement of pixels in a video image over time.
- the relative motion saliency map has the same size as the target image frame. For example, if the size of the target image frame is 800*600 pixels, the size of the relative motion saliency map is also 800*600 pixels.
- the value of the target pixel in the relative motion saliency map reflects the probability that the target pixel at the same position in the target image frame belongs to the target object, and the probability is determined based on the motion information of the target pixel.
- this step may include the following sub-steps:
- the adjacent image frame refers to the image frame adjacent to the target image frame in the video to be detected.
- the number of adjacent image frames may be one or multiple, which is not limited in the embodiment of the present application.
- the adjacent image frame may include the previous image frame, the subsequent image frame, or the previous image frame and the subsequent image frame at the same time.
- the previous image frame refers to the image frame located before the target image frame in the video to be detected
- the subsequent image frame refers to the image frame located after the target image frame in the video to be detected.
- the previous image frame is the previous image frame of the target image frame
- the subsequent image frame is the next image frame of the target image frame.
- the target image frame is the i-th image frame
- the previous image frame is the i-1th image frame
- the subsequent image frame is the i+1th image frame
- i is an integer greater than 1.
- FlowNet2 is used as the basic model for calculating the optical flow between the target image frame and the adjacent image frame.
- FlowNet2 is a model that uses CNN (Convolutional Neural Networks, convolutional neural network) to extract optical flow, which has the advantages of fast speed and high accuracy.
- a relative motion saliency map corresponding to the target image frame is generated according to the optical flow.
- the relative motion saliency map is generated as follows:
- the background area in the local detection image refers to the remaining area outside the area where the target object detected in the local detection image is located.
- the area where the target object is located and the background area can be determined.
- the average value of the optical flow of each pixel in the background area is taken as the background optical flow.
- the difference between the optical flow of each pixel and the background optical flow is calculated by RMS (Root Mean Square) to obtain the relative motion saliency map corresponding to the target image frame.
- in addition, the second norm of the absolute optical flow can be added, with the two parts weighted 1:1; that is, the following formula is used to calculate the value of the relative motion saliency map at pixel (m, n):
- $\mathrm{RMS}_{m,n} = \frac{1}{2}\left\| O_{m,n} - \bar{O} \right\|_2 + \frac{1}{2}\left\| O_{m,n} \right\|_2$
- where $O_{m,n}$ is the optical flow of the pixel (m, n) and $\bar{O}$ is the background optical flow.
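- a minimal sketch of this computation, assuming a dense flow field (e.g. from FlowNet2) and a background mask derived from the local detection map; the final normalization to [0, 1] is an added convenience so the map can be thresholded like a probability, not something the patent specifies.

```python
import numpy as np

def relative_motion_saliency(flow, background_mask):
    """flow: (H, W, 2) optical flow field; background_mask: (H, W) bool,
    True outside the detected target area of the local detection map."""
    bg_flow = flow[background_mask].mean(axis=0)        # background optical flow O_bar
    relative = np.linalg.norm(flow - bg_flow, axis=-1)  # ||O_{m,n} - O_bar||_2
    absolute = np.linalg.norm(flow, axis=-1)            # ||O_{m,n}||_2
    rms = 0.5 * relative + 0.5 * absolute               # 1:1 weighting of the two parts
    return rms / (rms.max() + 1e-8)                     # normalize for thresholding
```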
- Step 203 Determine the constraint information corresponding to the target image frame according to the local detection map and the relative motion saliency map.
- the constraint information includes absolute positive sample pixels, absolute negative sample pixels and uncertain sample pixels in the target image frame.
- the absolute positive sample pixels refer to the pixels in the target image frame that are determined to belong to the target object based on the above-mentioned appearance information and motion information.
- absolutely negative sample pixels refer to pixels in the target image frame that are determined not to belong to the target object based on the above-mentioned appearance information and motion information.
- Uncertain sample pixels refer to pixels in the target image frame that cannot be determined whether they belong to the target object based on the above-mentioned appearance information and motion information.
- the restriction information may also be referred to as a restriction flow.
- for a target pixel in the target image frame: if the value of the target pixel in the local detection map meets the first preset condition and its value in the relative motion saliency map meets the second preset condition, the target pixel is determined to be an absolute positive sample pixel; if its value in the local detection map does not meet the first preset condition and its value in the relative motion saliency map does not meet the second preset condition, the target pixel is determined to be an absolute negative sample pixel; if its value in the local detection map meets the first preset condition but its value in the relative motion saliency map does not meet the second preset condition, or its value in the local detection map does not meet the first preset condition but its value in the relative motion saliency map meets the second preset condition, the target pixel is determined to be an uncertain sample pixel.
- the first preset condition is that the value is greater than the first threshold;
- the second preset condition is that the value is greater than the second threshold.
- the first threshold is 0.7
- the second threshold is 0.5.
- the first threshold and the second threshold may be preset according to actual conditions, and the foregoing is only an example.
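- the fusion rule above reduces to a per-pixel three-way decision. A sketch under the stated thresholds (0.7 and 0.5); the 1/0/-1 encoding is an illustrative convention, not from the patent.

```python
import numpy as np

POSITIVE, NEGATIVE, UNCERTAIN = 1, 0, -1

def build_constraints(detection_map, motion_map, t1=0.7, t2=0.5):
    """Both maps: (H, W) arrays in [0, 1], aligned with the target image frame."""
    det_hit = detection_map > t1                   # first preset condition
    mot_hit = motion_map > t2                      # second preset condition
    constraints = np.full(detection_map.shape, UNCERTAIN, dtype=np.int64)
    constraints[det_hit & mot_hit] = POSITIVE      # both cues agree: target
    constraints[~det_hit & ~mot_hit] = NEGATIVE    # both cues agree: background
    return constraints                             # disagreement stays uncertain
```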
- Step 204 Adjust the parameters of the image segmentation model through the constraint information to obtain the adjusted image segmentation model.
- the constraint information can be used to adaptively learn the image segmentation model, fine-tune its parameters, and improve its accuracy when segmenting and extracting the target object from the target image frame.
- absolute positive sample pixels and absolute negative sample pixels are used to adjust the parameters of the image segmentation model to obtain an adjusted image segmentation model. That is, when adjusting the parameters of the image segmentation model, only absolute positive sample pixels and absolute negative sample pixels are used, and uncertain sample pixels are not considered.
- the loss function of the image segmentation model can adopt a cross-entropy loss function computed only over the constrained pixels, the expression of which is:
- $L(x) = -\sum_{j \in Y^{+}} \log P\left(y_{j} = 1 \mid x\right) - \sum_{j \in Y^{-}} \log P\left(y_{j} = 0 \mid x\right)$
- where L represents the value of the loss function, x is the target image frame, Y is the pixel-level constraint information of the target image frame x, $Y^{+}$ and $Y^{-}$ are the absolute positive sample pixels and absolute negative sample pixels, and P(·) is the prediction result of the image segmentation model for the target image frame x.
- the difference from the traditional loss function is that this expression does not compute the loss over the uncertain sample pixels, so that ambiguous regions can be ignored and high-confidence regions can be learned better.
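- in a framework like PyTorch, dropping the uncertain pixels from the loss can be done with `ignore_index`, as in the hedged sketch below; the two-class logits layout and the -1 encoding follow the earlier illustrative sketches rather than the patent text.

```python
import torch
import torch.nn.functional as F

def constrained_loss(logits, constraints):
    """logits: (N, 2, H, W) raw model outputs; constraints: (N, H, W) long
    tensor with 1 = absolute positive, 0 = absolute negative, -1 = uncertain."""
    # ignore_index=-1 excludes uncertain sample pixels, so only the absolute
    # positive and absolute negative samples contribute to the gradient
    return F.cross_entropy(logits, constraints, ignore_index=-1)
```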
- Step 205 Extract the target object in the target image frame through the adjusted image segmentation model.
- the target image frame is input to the adjusted image segmentation model, and the target object in the target image frame is extracted by segmentation.
- the image segmentation model can undergo adaptive adjustment training once for every image frame, or once every several image frames (such as every 5 image frames). Given the small changes in the position of the target object across adjacent image frames, performing adaptive adjustment training every several image frames reduces the amount of computation and improves the processing efficiency of the entire video while losing as little model accuracy as possible.
- each adaptive adjustment training can be trained for one round or multiple rounds (such as 3 rounds), which is not limited in the embodiment of the present application.
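- putting these pieces together, the adaptation schedule might look like the following sketch; `model`, `optimizer`, `constrained_loss`, and the hypothetical `compute_constraints` helper (which would combine the detection and motion cues for frame i) are assumptions carried over from the sketches above, not components named in the patent.

```python
import torch

def track_video(frames, model, optimizer, adapt_every=5, rounds=3):
    """frames: list of (1, 3, H, W) tensors; model returns (1, 2, H, W) logits."""
    masks = []
    for i, frame in enumerate(frames):
        if i > 0 and i % adapt_every == 0:                 # adapt every few frames
            constraints = compute_constraints(frames, i)   # hypothetical helper
            for _ in range(rounds):                        # a few adaptation rounds
                optimizer.zero_grad()
                loss = constrained_loss(model(frame), constraints)
                loss.backward()
                optimizer.step()
        with torch.no_grad():
            masks.append(model(frame).argmax(dim=1))       # per-pixel target mask
    return masks
```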
- the parameters of the image segmentation model are adjusted through the constraint information. Since the constraint information integrates the apparent information and the motion information of the target object, it can, on the one hand, overcome the large apparent variation of the target object across different image frames of the video to be detected, and on the other hand reduce error propagation during the adaptive learning process. At the same time, through the complementarity of the two parts, more accurate guidance can be generated for each update of the model parameters, thereby better constraining the parameter adjustment process.
- FIG. 3 exemplarily shows a schematic diagram of the overall flow of the technical solution of the present application.
- the detection target frame corresponding to the target image frame is extracted through the target detection model to obtain the partial detection map; the optical flow corresponding to the target image frame is extracted through the optical flow model, from which the relative motion saliency map corresponding to the target image frame is calculated; the local detection map and the relative motion saliency map are then merged to obtain the constraint information.
- the parameters of the image segmentation model are adjusted to obtain the adjusted image segmentation model.
- the target object in the target image frame is extracted.
- the image segmentation model can include components such as a feature extractor, an atrous spatial convolution module, and a deconvolution and upsampling module.
- FIG. 4 exemplarily shows a schematic diagram of the parameter adjustment process of the target detection model. A frame is randomly scattered in the labeled image frame, the proportion of the target object in the frame is calculated, and the training samples of the target detection model are selected based on this proportion. The parameters of the target detection model are fine-tuned through the training samples to obtain the adjusted target detection model. After that, the target image frame is input into the adjusted target detection model to obtain the partial detection map corresponding to the target image frame.
- the parameters of the image segmentation model are adjusted through the constraint information. Since the constraint information combines the apparent information and the motion information of the target object, it can, on the one hand, overcome the problem of large apparent differences of the target object across different image frames of the video to be detected, and on the other hand reduce error propagation during the adaptive learning process. At the same time, through the complementarity of the two parts, more accurate guidance can be generated for each model parameter update, better constraining the adjustment process of the model parameters, so that the image segmentation model performs better after parameter adjustment and the target object is ultimately extracted from the target image frame with higher accuracy.
- the motion information can be characterized more accurately.
- the pre-training process of the image segmentation model is as follows:
- the initial image segmentation model can be an end-to-end trainable convolutional neural network.
- the input is an image and the output is the mask of the target in the image.
- Deeplab V3+ is selected as the end-to-end trainable convolutional neural network; after receiving the three-channel input picture, the network returns a prediction mask of the same size.
- FIG. 5 exemplarily shows an architecture diagram of an image segmentation model.
- the ResNet convolutional neural network serves as the basic feature extractor. An ASPP (Atrous Spatial Pyramid Pooling) module is added after the fifth layer of the ResNet model, and atrous convolutions at different scales are used to process the output features; the features extracted by the third layer of the ResNet model are fused in, which better restores the segmentation prediction results at various scales. The features learned by the network are then returned to high resolution through deconvolution or upsampling, which can effectively improve the accuracy of the image segmentation model.
- for each frame in the video, the network outputs a response map of the corresponding scale, which is the probability prediction result of the segmentation.
- the ResNet-101 network is selected as the basic network of the Deeplab V3+ feature extractor. The ASPP module is connected after the basic convolutional neural network, the features extracted by the third layer of the ResNet model are introduced at the same time, and a deconvolution process with two deconvolution upsampling modules is added to obtain high-resolution segmentation prediction maps.
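- as a rough stand-in for this architecture, torchvision ships a DeepLabV3 (note: not v3+; it lacks the v3+ decoder and upsamples bilinearly) with a ResNet-101 backbone and ASPP head, so the sketch below approximates rather than reproduces the described network.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(weights=None, num_classes=2)  # target vs. background
model.eval()
with torch.no_grad():
    frame = torch.randn(1, 3, 600, 800)     # one three-channel input frame
    out = model(frame)['out']               # (1, 2, 600, 800) response map
    target_prob = out.softmax(dim=1)[:, 1]  # per-pixel target probability
```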
- the first sample set contains at least one labeled picture
- the second sample set contains at least one labeled video.
- the Pascal VOC database is selected as the first sample set; it contains 2913 images with pixel-level segmentation annotations. By learning semantic segmentation on images, the image segmentation model can be better trained. Initial training can use a batch size of 4 for 8000 rounds.
- the DAVIS16 database is selected as the second sample set to adapt the image segmentation model to the target segmentation task.
- the DAVIS16 database has 50 pixel-level annotated videos with a total of 3455 frames, 30 of which are used for training and 20 are used for testing.
- data augmentation can be performed on the samples; for example, the original image can be expanded to multiple different scales, such as scaling its size by 0.8, 1.2, and 1.6 times, so that the image segmentation model adapts to images of different scales.
- the initial learning rate is selected to be 0.001 with 4 samples per training batch, dropping the learning rate to 1/10 of its current value every 2400 rounds for a total of 6000 rounds of training, finally obtaining the pre-trained image segmentation model.
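- this schedule translates directly into a standard optimizer/scheduler pair; the sketch below reuses the `model` from the previous sketch, and SGD with the given momentum value is an assumption, while the learning rate, batch size, decay step, and scale factors come from the text above.

```python
import torch

SCALES = (0.8, 1.0, 1.2, 1.6)   # multi-scale augmentation factors (1.0 = original)
BATCH_SIZE = 4
TOTAL_ROUNDS = 6000

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# drop the learning rate to 1/10 every 2400 rounds
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2400, gamma=0.1)
```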
- the pre-training process of the above image segmentation model can be executed on the computer device that executes the video target tracking method introduced above, or on another device, which then provides the pre-trained image segmentation model to the computer device; the computer device uses the pre-trained image segmentation model to execute the video target tracking method.
- regardless of whether the pre-training of the image segmentation model is performed on the computer device or on other equipment, when the computer device performs video target tracking on the video to be detected, it needs to adaptively learn and adjust the parameters of the pre-trained image segmentation model using the video to be detected, so that the image segmentation model outputs accurate segmentation results for each frame.
- for each frame, an adaptive training process is performed on the image segmentation model to learn and adjust the model parameters.
- in the traditional method, the adjustment is based on the prediction result of the previous frame: for example, an erosion algorithm is applied to the previous frame's prediction to generate absolute positive sample pixels, pixels beyond a certain Euclidean distance from the absolute positive samples are set as absolute negative sample pixels, and such constraints guide the adjustment of the model parameters.
- the adjusted image segmentation model is used to predict the segmentation result of the target image frame to be detected.
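- for contrast with the constraint flow above, the traditional constraint generation just described can be sketched with OpenCV; the kernel size, iteration count, and distance threshold are illustrative, not values from the patent.

```python
import cv2
import numpy as np

def legacy_constraints(prev_mask, erode_iter=3, max_dist=40.0):
    """prev_mask: (H, W) uint8 binary mask predicted for the previous frame."""
    kernel = np.ones((3, 3), np.uint8)
    positive = cv2.erode(prev_mask, kernel, iterations=erode_iter)
    # Euclidean distance from every pixel to the nearest absolute positive pixel
    dist = cv2.distanceTransform((1 - positive).astype(np.uint8), cv2.DIST_L2, 5)
    constraints = np.full(prev_mask.shape, -1, dtype=np.int64)  # uncertain
    constraints[positive == 1] = 1                              # absolute positives
    constraints[dist > max_dist] = 0                            # absolute negatives
    return constraints
```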
- the traditional method relies more heavily on the accuracy of the previous frame and is relatively coarse, making it difficult to capture detailed information.
- in contrast, the method provided by the embodiment of the present application better takes into account the motion information and the apparent information so as to supervise the adaptive learning process, and in addition it better preserves local details.
- the absolute positive sample pixels and the absolute negative sample pixels marked in the adaptive learning process are more accurate and reliable, and the number of uncertain sample pixels is smaller.
- FIG. 6 exemplarily shows a schematic diagram of the absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels marked during the adaptive learning process using the method provided by the embodiment of the application.
- the pixels in the area 61 are absolute positive sample pixels
- the pixels in the black area 62 are absolute negative sample pixels
- the pixels in the gray area 63 are uncertain sample pixels. It can be seen from Fig. 6 that the proportion of uncertain sample pixels is very small and that the edges are accurate and reliable.
- the constraint information obtained by the method provided in the embodiments of this application not only has a high correct rate for positive and negative samples but also a small proportion of uncertain samples, which demonstrates the effectiveness of the method provided by the embodiments of this application.
- the results obtained by the method provided in the embodiments of the present application are more prominent.
- the method provided in the embodiments of the present application can obtain very accurate results.
- the method provided by the embodiments of the application can significantly improve the accuracy of video target segmentation. It better fuses the motion information and appearance information of the target object, effectively constrains the adaptive learning process of the model in special cases of video target segmentation such as occlusion, large appearance changes, and background clutter, and, through the introduced optimized loss function that constrains the model's learning process, improves the accuracy of target segmentation in the video.
- although the steps in the flowchart of FIG. 2 are displayed in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily executed at the same time but can be executed at different times; their execution order is not necessarily sequential, and they may be performed alternately or in turn with other steps or with at least a part of the sub-steps or stages of other steps.
- FIG. 7 shows a block diagram of a video target tracking device provided by an embodiment of the present application.
- the device has the function of realizing the above method example, and the function can be realized by hardware, or by hardware executing corresponding software.
- the device can be a computer device, or it can be set on a computer device.
- the apparatus 700 may include: a detection image acquisition module 710, a motion image acquisition module 720, a constraint information acquisition module 730, a model adjustment module 740, and a target segmentation module 750.
- the detection image acquisition module 710 is configured to obtain a partial detection image corresponding to the target image frame in the video to be detected, and the partial detection image is generated based on the apparent information of the target object in the video to be detected that needs to be tracked by the image segmentation model
- the image segmentation model is a neural network model used to segment and extract the target object from the image frame of the video to be detected.
- the motion map acquisition module 720 is configured to acquire a relative motion saliency map corresponding to the target image frame, where the relative motion saliency map is generated based on the motion information of the target object.
- the constraint information acquisition module 730 is configured to determine constraint information corresponding to the target image frame according to the local detection map and the relative motion saliency map, and the constraint information includes absolute positive sample pixels in the target image frame, absolutely negative sample pixels and uncertain sample pixels.
- the model adjustment module 740 is configured to adjust the parameters of the image segmentation model through the constraint information to obtain an adjusted image segmentation model.
- the target segmentation module 750 is configured to extract the target object in the target image frame through the adjusted image segmentation model.
- the parameters of the image segmentation model are adjusted through the constraint information. Since the constraint information combines the apparent information and the motion information of the target object, it can, on the one hand, overcome the problem of large apparent differences of the target object across different image frames of the video to be detected, and on the other hand reduce error propagation during the adaptive learning process. At the same time, through the complementarity of the two parts, more accurate guidance can be generated for each model parameter update, better constraining the adjustment process of the model parameters, so that the image segmentation model performs better after parameter adjustment and the target object is ultimately extracted from the target image frame with higher accuracy.
- the detection image acquisition module 710 includes: a sample selection submodule 711, a model adjustment submodule 712 and a detection image acquisition submodule 713.
- the sample selection submodule 711 is configured to select at least one training sample from the labeled image frames of the video to be detected, the training sample including the labeled image frame and the detection target frame corresponding to the labeled image frame, where the detection target frame refers to an image area in which the proportion of the target object is greater than a preset threshold.
- the model adjustment sub-module 712 is configured to adjust the parameters of the target detection model through the training samples to obtain an adjusted target detection model.
- the detection image acquisition sub-module 713 is configured to process the target image frame through the adjusted target detection model to obtain the partial detection image.
- the sample selection submodule 711 is configured to: randomly scatter a frame in the labeled image frame; calculate the proportion of the target object in the randomly scattered frame; and if the proportion of the target object in the frame is greater than the preset threshold, determine the frame as the detection target frame corresponding to the labeled image frame and select the labeled image frame and the detection target frame as the training sample.
- the motion picture acquisition module 720 includes: an optical flow calculation sub-module 721 and a motion picture acquisition sub-module 722.
- the optical flow calculation sub-module 721 is configured to calculate the optical flow between the target image frame and adjacent image frames.
- the motion map acquisition sub-module 722 is configured to generate the relative motion saliency map according to the optical flow.
- the motion map acquisition sub-module 722 is configured to:
- determine the background optical flow according to the optical flow of the background area in the local detection map, where the background area in the local detection map refers to the remaining area outside the area where the target object is detected in the local detection map;
- the relative motion saliency map is generated according to the background optical flow and the optical flow corresponding to the target image frame.
- the restriction information obtaining module 730 is configured to:
- for a target pixel in the target image frame, when the value of the target pixel in the local detection map meets the first preset condition and the value of the target pixel in the relative motion saliency map meets the second preset condition, determine that the target pixel is the absolute positive sample pixel;
- when the value of the target pixel in the local detection map does not meet the first preset condition and the value of the target pixel in the relative motion saliency map does not meet the second preset condition, determine that the target pixel is the absolute negative sample pixel;
- when the value of the target pixel in the local detection map meets the first preset condition and the value of the target pixel in the relative motion saliency map does not meet the second preset condition, or when the value of the target pixel in the local detection map does not meet the first preset condition and the value of the target pixel in the relative motion saliency map meets the second preset condition, determine that the target pixel is the uncertain sample pixel.
- the model adjustment module 740 is configured to use the absolutely positive sample pixels and the absolute negative sample pixels to retrain the image segmentation model to obtain the adjusted image segmentation model .
- the pre-training process of the image segmentation model is as follows:
- an initial image segmentation model is constructed; a first sample set is used to preliminarily train the initial image segmentation model, where the first sample set contains at least one labeled picture; a second sample set is then used to retrain the preliminarily trained image segmentation model to obtain the pre-trained image segmentation model, where the second sample set contains at least one labeled video.
- when the device provided in the above embodiments implements its functions, the division into the above functional modules is used only as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
- the apparatus and method embodiments provided by the above-mentioned embodiments belong to the same concept, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.
- FIG. 9 shows a structural block diagram of a computer device 900 according to an embodiment of the present application.
- the computer device 900 may be a mobile phone, a tablet computer, an e-book reading device, a wearable device, a smart TV, a multimedia playback device, a PC, a server, and the like.
- the computer device 900 includes a processor 901 and a memory 902.
- the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
- the processor 901 can be implemented in at least one hardware form among DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
- the processor 901 may also include a main processor and a coprocessor: the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor used to process data in the standby state.
- the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is used for rendering and drawing content that needs to be displayed on the display screen.
- the processor 901 may further include an AI (Artificial Intelligence) processor, which is used to handle computing operations related to machine learning.
- the memory 902 may include one or more computer-readable storage media, which may be non-transitory.
- the memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
- the non-transitory computer-readable storage medium in the memory 902 is used to store a computer program, and the computer program is executed by the processor 901 to implement the video target tracking method provided in the method embodiments of the present application.
- the computer device 900 may optionally further include: a peripheral device interface 903 and at least one peripheral device.
- the processor 901, the memory 902, and the peripheral device interface 903 may be connected by a bus or a signal line.
- Each peripheral device can be connected to the peripheral device interface 903 through a bus, a signal line, or a circuit board.
- the peripheral device may include: at least one of a radio frequency circuit 904, a display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
- the structure shown in FIG. 9 does not constitute a limitation to the computer device 900, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
- in the illustrated embodiment, a computer device includes a processor and a memory, and at least one instruction, at least one program, a code set, or an instruction set is stored in the memory.
- the at least one instruction, at least one program, code set, or instruction set is configured to be executed by one or more processors to implement the video target tracking method described above.
- a computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which implements the above video target tracking method when executed by the processor of the computer device.
- the aforementioned computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
- a computer program product is also provided; when executed, it is used to implement the aforementioned video target tracking method.
Claims (20)
- A video target tracking method, performed by a computer device, characterized in that the method comprises: obtaining a local detection map corresponding to a target image frame in a video to be detected, the local detection map being generated based on appearance information of a target object in the video to be detected that is to be tracked by an image segmentation model; obtaining a relative motion saliency map corresponding to the target image frame, the relative motion saliency map being generated based on motion information of the target object; determining, according to the local detection map and the relative motion saliency map, constraint information corresponding to the target image frame, the constraint information comprising absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame; adjusting parameters of the image segmentation model using the constraint information to obtain an adjusted image segmentation model; and extracting the target object from the target image frame using the adjusted image segmentation model.
- The method according to claim 1, wherein obtaining the local detection map corresponding to the target image frame in the video to be detected comprises: selecting at least one training sample from annotated image frames of the video to be detected, the training sample comprising an annotated image frame and a detection target box corresponding to the annotated image frame, the detection target box being an image region in which the proportion of the target object within the detection target box is greater than a preset threshold; adjusting parameters of a target detection model using the training sample to obtain an adjusted target detection model; and processing the target image frame with the adjusted target detection model to obtain the local detection map.
- The method according to claim 2, wherein selecting at least one training sample from the annotated image frames of the video to be detected comprises: randomly scattering boxes over the annotated image frame; computing the proportion of the target object within each randomly scattered box; and if the proportion of the target object within a box is greater than the preset threshold, determining that box as the detection target box corresponding to the annotated image frame, and selecting the annotated image frame together with the detection target box as a training sample (a sketch of this sampling rule follows the claims).
- The method according to claim 1, wherein obtaining the relative motion saliency map corresponding to the target image frame comprises: computing optical flow between the target image frame and a neighbouring image frame; and generating the relative motion saliency map from the optical flow.
- The method according to claim 4, wherein generating the relative motion saliency map from the optical flow comprises: determining a background flow from the optical flow of a background region of the local detection map, the background region of the local detection map being the region remaining outside the detected region of the target object in the local detection map; and generating the relative motion saliency map from the background flow and the optical flow corresponding to the target image frame (a sketch of this computation follows the claims).
- The method according to any one of claims 1 to 5, wherein determining the constraint information corresponding to the target image frame according to the local detection map and the relative motion saliency map comprises: for a target pixel in the target image frame, determining the target pixel to be an absolute positive sample pixel if its value in the local detection map satisfies a first preset condition and its value in the relative motion saliency map satisfies a second preset condition; determining the target pixel to be an absolute negative sample pixel if its value in the local detection map does not satisfy the first preset condition and its value in the relative motion saliency map does not satisfy the second preset condition; determining the target pixel to be an uncertain sample pixel if its value in the local detection map satisfies the first preset condition and its value in the relative motion saliency map does not satisfy the second preset condition; or determining the target pixel to be an uncertain sample pixel if its value in the local detection map does not satisfy the first preset condition and its value in the relative motion saliency map satisfies the second preset condition (a sketch of this labelling rule follows the claims).
- The method according to any one of claims 1 to 5, wherein adjusting the parameters of the image segmentation model using the constraint information to obtain the adjusted image segmentation model comprises: adjusting the parameters of the image segmentation model using the absolute positive sample pixels and the absolute negative sample pixels to obtain the adjusted image segmentation model (a sketch of this adjustment step follows the claims).
- The method according to any one of claims 1 to 5, wherein the image segmentation model is pre-trained as follows: constructing an initial image segmentation model; performing preliminary training of the initial image segmentation model on a first sample set to obtain a preliminarily trained image segmentation model, the first sample set containing at least one annotated image; and retraining the preliminarily trained image segmentation model on a second sample set to obtain the pre-trained image segmentation model, the second sample set containing at least one annotated video.
- A video target tracking apparatus, characterized in that the apparatus comprises: a detection map obtaining module, configured to obtain a local detection map corresponding to a target image frame in a video to be detected, the local detection map being generated based on appearance information of a target object in the video to be detected that is to be tracked by an image segmentation model, the image segmentation model being a neural network model for segmenting and extracting the target object from image frames of the video to be detected; a motion map obtaining module, configured to obtain a relative motion saliency map corresponding to the target image frame, the relative motion saliency map being generated based on motion information of the target object; a constraint information obtaining module, configured to determine, according to the local detection map and the relative motion saliency map, constraint information corresponding to the target image frame, the constraint information comprising absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame; a model adjustment module, configured to adjust parameters of the image segmentation model using the constraint information to obtain an adjusted image segmentation model; and a target segmentation module, configured to extract the target object from the target image frame using the adjusted image segmentation model.
- The apparatus according to claim 9, wherein the detection map obtaining module comprises: a sample selection submodule, configured to select at least one training sample from annotated image frames of the video to be detected, the training sample comprising an annotated image frame and a detection target box corresponding to the annotated image frame, the detection target box being an image region in which the proportion of the target object is greater than a preset threshold; a model adjustment submodule, configured to adjust parameters of a target detection model using the training sample to obtain an adjusted target detection model; and a detection map obtaining submodule, configured to process the target image frame with the adjusted target detection model to obtain the local detection map.
- The apparatus according to claim 9, wherein the motion map obtaining module comprises: an optical flow computation submodule, configured to compute optical flow between the target image frame and a neighbouring image frame; and a motion map obtaining submodule, configured to generate the relative motion saliency map from the optical flow.
- The apparatus according to any one of claims 9 to 11, wherein the constraint information obtaining module is configured to: for a target pixel in the target image frame, determine the target pixel to be an absolute positive sample pixel when its value in the local detection map satisfies a first preset condition and its value in the relative motion saliency map satisfies a second preset condition; determine the target pixel to be an absolute negative sample pixel when its value in the local detection map does not satisfy the first preset condition and its value in the relative motion saliency map does not satisfy the second preset condition; and determine the target pixel to be an uncertain sample pixel when its value in the local detection map satisfies the first preset condition and its value in the relative motion saliency map does not satisfy the second preset condition, or when its value in the local detection map does not satisfy the first preset condition and its value in the relative motion saliency map satisfies the second preset condition.
- The apparatus according to any one of claims 9 to 11, wherein the model adjustment module is configured to retrain the image segmentation model using the absolute positive sample pixels and the absolute negative sample pixels to obtain the adjusted image segmentation model.
- A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by the processor to cause the processor to perform the following steps: obtaining a local detection map corresponding to a target image frame in a video to be detected, the local detection map being generated based on appearance information of a target object in the video to be detected that is to be tracked by an image segmentation model; obtaining a relative motion saliency map corresponding to the target image frame, the relative motion saliency map being generated based on motion information of the target object; determining, according to the local detection map and the relative motion saliency map, constraint information corresponding to the target image frame, the constraint information comprising absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame; adjusting parameters of the image segmentation model using the constraint information to obtain an adjusted image segmentation model; and extracting the target object from the target image frame using the adjusted image segmentation model.
- The computer device according to claim 14, wherein, when the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to perform the step of obtaining the local detection map corresponding to the target image frame in the video to be detected, the processor specifically performs the following steps: selecting at least one training sample from annotated image frames of the video to be detected, the training sample comprising an annotated image frame and a detection target box corresponding to the annotated image frame, the detection target box being an image region in which the proportion of the target object within the detection target box is greater than a preset threshold; adjusting parameters of a target detection model using the training sample to obtain an adjusted target detection model; and processing the target image frame with the adjusted target detection model to obtain the local detection map.
- The computer device according to claim 15, wherein, when the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to perform the step of selecting at least one training sample from the annotated image frames of the video to be detected, the processor specifically performs the following steps: randomly scattering boxes over the annotated image frame; computing the proportion of the target object within each randomly scattered box; and if the proportion of the target object within a box is greater than the preset threshold, determining that box as the detection target box corresponding to the annotated image frame, and selecting the annotated image frame together with the detection target box as a training sample.
- The computer device according to claim 14, wherein, when the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to perform the step of obtaining the relative motion saliency map corresponding to the target image frame, the processor specifically performs the following steps: computing optical flow between the target image frame and a neighbouring image frame; and generating the relative motion saliency map from the optical flow.
- A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by a processor to cause the processor to perform the following steps: obtaining a local detection map corresponding to a target image frame in a video to be detected, the local detection map being generated based on appearance information of a target object in the video to be detected that is to be tracked by an image segmentation model; obtaining a relative motion saliency map corresponding to the target image frame, the relative motion saliency map being generated based on motion information of the target object; determining, according to the local detection map and the relative motion saliency map, constraint information corresponding to the target image frame, the constraint information comprising absolute positive sample pixels, absolute negative sample pixels, and uncertain sample pixels in the target image frame; adjusting parameters of the image segmentation model using the constraint information to obtain an adjusted image segmentation model; and extracting the target object from the target image frame using the adjusted image segmentation model.
- The storage medium according to claim 18, wherein, when the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to perform the step of obtaining the local detection map corresponding to the target image frame in the video to be detected, the processor specifically performs the following steps: selecting at least one training sample from annotated image frames of the video to be detected, the training sample comprising an annotated image frame and a detection target box corresponding to the annotated image frame, the detection target box being an image region in which the proportion of the target object within the detection target box is greater than a preset threshold; adjusting parameters of a target detection model using the training sample to obtain an adjusted target detection model; and processing the target image frame with the adjusted target detection model to obtain the local detection map.
- The storage medium according to claim 19, wherein, when the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to perform the step of selecting at least one training sample from the annotated image frames of the video to be detected, the processor specifically performs the following steps: randomly scattering boxes over the annotated image frame; computing the proportion of the target object within each randomly scattered box; and if the proportion of the target object within a box is greater than the preset threshold, determining that box as the detection target box corresponding to the annotated image frame, and selecting the annotated image frame together with the detection target box as a training sample.
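The box-scattering rule of claims 3, 16, and 20 can be pictured as a rejection sampler over candidate boxes. The sketch below is a minimal illustration, not the patented implementation: the `sample_detection_boxes` helper, the candidate count, and the box-size range are all assumptions, and the binary `mask` stands in for the target annotation of an annotated frame.

```python
import numpy as np

def sample_detection_boxes(mask, num_boxes=200, occupancy_thresh=0.5, rng=None):
    """Randomly scatter candidate boxes over an annotated frame and keep
    those in which the target occupies more than occupancy_thresh of the
    box area (the preset-threshold test of claim 3)."""
    rng = rng or np.random.default_rng()
    h, w = mask.shape
    kept = []
    for _ in range(num_boxes):
        bh = int(rng.integers(h // 8, h // 2))   # assumed box-size range
        bw = int(rng.integers(w // 8, w // 2))
        y0 = int(rng.integers(0, h - bh))
        x0 = int(rng.integers(0, w - bw))
        box = mask[y0:y0 + bh, x0:x0 + bw]
        if box.mean() > occupancy_thresh:        # proportion of target pixels in the box
            kept.append((x0, y0, bw, bh))
    return kept
```

Each kept (frame, box) pair then serves as one training sample for adjusting the detection model, per claim 2.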
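For the relative motion saliency map of claims 4 and 5, a minimal sketch follows. The claims name neither a flow estimator nor a background-flow statistic; Farneback flow, the median over the background region of the local detection map, and the final normalisation are all assumptions.

```python
import cv2
import numpy as np

def relative_motion_saliency(prev_gray, curr_gray, background_mask):
    """Dense flow between neighbouring frames, minus an estimated
    background (camera) flow, returned as a normalised per-pixel saliency."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Background flow: median flow vector over the region of the local
    # detection map lying outside the detected target (claim 5).
    bg_flow = np.median(flow[background_mask], axis=0)
    # Relative motion: per-pixel residual after removing the background flow.
    residual = np.linalg.norm(flow - bg_flow, axis=2)
    return residual / (residual.max() + 1e-8)    # saliency in [0, 1]
```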
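The three-way pixel labelling of claims 6 and 12 reduces to a pair of threshold tests over the two maps. In the sketch below, the "first" and "second" preset conditions are assumed to be simple 0.5 thresholds; the claims leave the actual conditions open.

```python
import numpy as np

def constraint_map(detection_map, motion_saliency, det_thresh=0.5, mot_thresh=0.5):
    """Fuse the two cues into constraint information:
    1 = absolute positive, 0 = absolute negative, -1 = uncertain."""
    det_pos = detection_map >= det_thresh      # assumed first preset condition
    mot_pos = motion_saliency >= mot_thresh    # assumed second preset condition
    constraints = np.full(detection_map.shape, -1, dtype=np.int8)  # uncertain where cues disagree
    constraints[det_pos & mot_pos] = 1         # both cues agree on foreground
    constraints[~det_pos & ~mot_pos] = 0       # both cues agree on background
    return constraints
```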
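Claims 7 and 13 adjust the segmentation model on the absolute samples only, leaving the uncertain pixels out of the objective. A sketch of one such adjustment step, assuming a PyTorch model that outputs per-pixel foreground logits and a binary cross-entropy loss (the claims specify neither the loss nor the framework):

```python
import torch
import torch.nn.functional as F

def finetune_step(model, optimizer, frame, constraints):
    """One parameter-adjustment step: only absolute positive (1) and
    absolute negative (0) pixels contribute; uncertain pixels (-1) are
    masked out of the loss."""
    logits = model(frame.unsqueeze(0))[0, 0]   # assumed output shape (1, 1, H, W)
    certain = constraints >= 0                 # drop uncertain sample pixels
    loss = F.binary_cross_entropy_with_logits(
        logits[certain], constraints[certain].float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running a few such steps on the target image frame yields the "adjusted image segmentation model" from which the target object is then extracted, per claim 1.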
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20812620.1A EP3979200B1 (en) | 2019-05-27 | 2020-04-30 | Video target tracking method and apparatus, computer device and storage medium |
| FIEP20812620.1T FI3979200T3 (fi) | 2019-05-27 | 2020-04-30 | Video target tracking method and apparatus, computer device and storage medium |
| JP2021537733A JP7236545B2 (ja) | 2019-05-27 | 2020-04-30 | Video target tracking method and apparatus, computer device, and program |
| US17/461,978 US12067733B2 (en) | 2019-05-27 | 2021-08-30 | Video target tracking method and apparatus, computer device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910447379.3 | 2019-05-27 | | |
| CN201910447379.3A CN110176027B (zh) | 2019-05-27 | 2019-05-27 | Video target tracking method, apparatus, device, and storage medium |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/461,978 Continuation US12067733B2 (en) | 2019-05-27 | 2021-08-30 | Video target tracking method and apparatus, computer device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020238560A1 (zh) | 2020-12-03 |
Family
ID=67696270
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/088286 Ceased WO2020238560A1 (zh) | Video target tracking method and apparatus, computer device, and storage medium | 2019-05-27 | 2020-04-30 |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US12067733B2 (zh) |
| EP (1) | EP3979200B1 (zh) |
| JP (1) | JP7236545B2 (zh) |
| CN (1) | CN110176027B (zh) |
| FI (1) | FI3979200T3 (zh) |
| WO (1) | WO2020238560A1 (zh) |
Families Citing this family (46)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109086709B (zh) * | 2018-07-27 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Feature extraction model training method and apparatus, and storage medium |
| CN110176027B (zh) * | 2019-05-27 | 2023-03-14 | 腾讯科技(深圳)有限公司 | Video target tracking method, apparatus, device, and storage medium |
| CN110503074B (zh) | 2019-08-29 | 2022-04-15 | 腾讯科技(深圳)有限公司 | Information labeling method, apparatus, and device for video frames, and storage medium |
| KR102739304B1 (ko) * | 2019-10-29 | 2024-12-05 | 한양대학교 에리카산학협력단 | Context-aware pedestrian detection apparatus and method |
| CN110807784B (zh) * | 2019-10-30 | 2022-07-26 | 北京百度网讯科技有限公司 | Method and apparatus for segmenting objects |
| CN112784638B (zh) * | 2019-11-07 | 2023-12-08 | 北京京东乾石科技有限公司 | Training sample acquisition method and apparatus, and pedestrian detection method and apparatus |
| CN112862855B (zh) * | 2019-11-12 | 2024-05-24 | 北京京邦达贸易有限公司 | Image annotation method and apparatus, computing device, and storage medium |
| CN110866515B (zh) * | 2019-11-22 | 2023-05-09 | 盛景智能科技(嘉兴)有限公司 | Method, apparatus, and electronic device for recognizing object behaviors inside a plant |
| CN111242973A (zh) * | 2020-01-06 | 2020-06-05 | 上海商汤临港智能科技有限公司 | Target tracking method and apparatus, electronic device, and storage medium |
| CN111260679B (zh) * | 2020-01-07 | 2022-02-01 | 广州虎牙科技有限公司 | Image processing method, image segmentation model training method, and related apparatus |
| CN111274892B (zh) * | 2020-01-14 | 2020-12-18 | 北京科技大学 | Robust remote sensing image change detection method and system |
| CN111208148A (zh) * | 2020-02-21 | 2020-05-29 | 凌云光技术集团有限责任公司 | Light-leakage defect detection system for hole-punch displays |
| CN111340101B (zh) * | 2020-02-24 | 2023-06-30 | 广州虎牙科技有限公司 | Stability evaluation method and apparatus, electronic device, and computer-readable storage medium |
| CN111444826B (zh) * | 2020-03-25 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Video detection method and apparatus, storage medium, and computer device |
| CN111476252B (zh) * | 2020-04-03 | 2022-07-29 | 南京邮电大学 | Lightweight anchor-free object detection method for computer vision applications |
| CN111461130B (zh) * | 2020-04-10 | 2021-02-09 | 视研智能科技(广州)有限公司 | High-precision image semantic segmentation algorithm model and segmentation method |
| JP7557958B2 (ja) | 2020-04-23 | 2024-09-30 | 株式会社日立システムズ | Pixel-level object detection system and program therefor |
| CN111654746B (zh) * | 2020-05-15 | 2022-01-21 | 北京百度网讯科技有限公司 | Video frame interpolation method and apparatus, electronic device, and storage medium |
| CN112132871B (zh) * | 2020-08-05 | 2022-12-06 | 天津(滨海)人工智能军民融合创新中心 | Visual feature point tracking method and apparatus based on feature optical flow information, storage medium, and terminal |
| CN112525145B (zh) * | 2020-11-30 | 2022-05-17 | 北京航空航天大学 | Dynamic visual measurement method and system for relative attitude during aircraft landing |
| CN112541475B (zh) * | 2020-12-24 | 2024-01-19 | 北京百度网讯科技有限公司 | Perception data detection method and apparatus |
| KR20220099210A (ko) * | 2021-01-05 | 2022-07-13 | 삼성디스플레이 주식회사 | Display device, virtual reality display system including the same, and method of estimating user motion from input images using the same |
| CN113011371A (zh) * | 2021-03-31 | 2021-06-22 | 北京市商汤科技开发有限公司 | Object detection method, apparatus, device, and storage medium |
| CN113763420B (zh) * | 2021-05-07 | 2025-06-03 | 腾讯科技(深圳)有限公司 | Target localization method and system, storage medium, and terminal device |
| CN113361519B (zh) * | 2021-05-21 | 2023-07-28 | 北京百度网讯科技有限公司 | Target processing method, training method for target processing model, and apparatus therefor |
| CN113518256B (zh) * | 2021-07-23 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Video processing method and apparatus, electronic device, and computer-readable storage medium |
| CN113807185B (zh) * | 2021-08-18 | 2024-02-27 | 苏州涟漪信息科技有限公司 | Data processing method and apparatus |
| CN113920077B (zh) * | 2021-09-30 | 2026-03-17 | 北京鹰瞳科技发展股份有限公司 | Method for training a fundus image segmentation model, and arteriovenous segmentation method |
| US12293560B2 (en) * | 2021-10-26 | 2025-05-06 | Autobrains Technologies Ltd | Context based separation of on-/off-vehicle points of interest in videos |
| CN113989387B (zh) * | 2021-10-28 | 2025-08-29 | 维沃移动通信有限公司 | Camera shooting parameter adjustment method and apparatus, and electronic device |
| CN114155278A (zh) * | 2021-11-26 | 2022-03-08 | 浙江商汤科技开发有限公司 | Target tracking and related model training methods, and related apparatus, device, and medium |
| CN114140488B (zh) * | 2021-11-30 | 2025-01-07 | 北京达佳互联信息技术有限公司 | Video target segmentation method and apparatus, and training method for video target segmentation model |
| CN114359973A (zh) * | 2022-03-04 | 2022-04-15 | 广州市玄武无线科技股份有限公司 | Video-based commodity status recognition method, device, and computer-readable medium |
| CN114723675B (zh) * | 2022-03-17 | 2025-11-25 | 武汉飞流智能技术有限公司 | Photovoltaic module inspection method, apparatus, device, and storage medium |
| CN114639171B (zh) * | 2022-05-18 | 2022-07-29 | 松立控股集团股份有限公司 | Panoramic security monitoring method for parking lots |
| CN115052154B (zh) * | 2022-05-30 | 2023-04-14 | 北京百度网讯科技有限公司 | Model training and video encoding method, apparatus, device, and storage medium |
| CN115082682A (zh) * | 2022-07-13 | 2022-09-20 | 青岛信芯微电子科技股份有限公司 | Image segmentation method and apparatus |
| CN116051834A (zh) * | 2022-12-30 | 2023-05-02 | 苏州科达科技股份有限公司 | Video processing method, system, device, and storage medium |
| CN115860275B (zh) * | 2023-02-23 | 2023-05-05 | 深圳市南湖勘测技术有限公司 | Surveying and mapping acquisition method and system for coordinating interests in land preparation |
| CN116308996A (zh) * | 2023-03-22 | 2023-06-23 | 展讯通信(上海)有限公司 | Graphics display method, apparatus, device, storage medium, and program product |
| CN116188460B (zh) * | 2023-04-24 | 2023-08-25 | 青岛美迪康数字工程有限公司 | Motion-vector-based image recognition method, apparatus, and computer device |
| CN116503374B (zh) * | 2023-05-12 | 2026-01-02 | 爱威科技股份有限公司 | Trichomonas detection apparatus, method, device, and computer-readable storage medium |
| CN117274735B (zh) * | 2023-09-19 | 2026-04-07 | 英特灵达信息技术(深圳)有限公司 | Flame detection method, and feature extraction model training method and apparatus |
| CN118379486A (zh) * | 2024-02-28 | 2024-07-23 | 珠海视熙科技有限公司 | Processing method and apparatus for electronic screen regions, and intelligent whiteboard-content extraction system |
| CN118657927B (zh) * | 2024-07-08 | 2024-11-29 | 北京鼎星科技有限公司 | Improved YOLOv8n small-target detection method based on feature fusion |
| CN119418129B (zh) * | 2024-11-06 | 2025-10-28 | 合肥工业大学 | Method and system for detecting generated videos of natural scenes based on spectral volume |
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101968884A (zh) * | 2009-07-28 | 2011-02-09 | 索尼株式会社 | Method and apparatus for detecting targets in video images |
| US9107604B2 (en) * | 2011-09-26 | 2015-08-18 | Given Imaging Ltd. | Systems and methods for generating electromagnetic interference free localization data for an in-vivo device |
| US11100335B2 (en) * | 2016-03-23 | 2021-08-24 | Placemeter, Inc. | Method for queue time estimation |
| CN106530330B (zh) * | 2016-12-08 | 2017-07-25 | 中国人民解放军国防科学技术大学 | Video target tracking method based on low-rank sparsity |
| US11423548B2 (en) * | 2017-01-06 | 2022-08-23 | Board Of Regents, The University Of Texas System | Segmenting generic foreground objects in images and videos |
| US20180204076A1 (en) * | 2017-01-13 | 2018-07-19 | The Regents Of The University Of California | Moving object detection and classification image analysis methods and systems |
| CN106709472A (zh) * | 2017-01-17 | 2017-05-24 | 湖南优象科技有限公司 | Video object detection and tracking method based on optical flow features |
| CN106934346B (zh) * | 2017-01-24 | 2019-03-15 | 北京大学 | Method for optimizing object detection performance |
| CN107066990B (zh) * | 2017-05-04 | 2019-10-11 | 厦门美图之家科技有限公司 | Target tracking method and mobile device |
| CN108305275B (zh) * | 2017-08-25 | 2021-02-12 | 深圳市腾讯计算机系统有限公司 | Active tracking method, apparatus, and system |
| CN107679455A (zh) * | 2017-08-29 | 2018-02-09 | 平安科技(深圳)有限公司 | Target tracking apparatus and method, and computer-readable storage medium |
| CN107644429B (zh) * | 2017-09-30 | 2020-05-19 | 华中科技大学 | Video segmentation method based on strong-target-constrained video saliency |
| CN107886515B (zh) * | 2017-11-10 | 2020-04-21 | 清华大学 | Image segmentation method and apparatus using optical flow fields |
| CN108765465B (zh) * | 2018-05-31 | 2020-07-10 | 西安电子科技大学 | Unsupervised SAR image change detection method |
| CN109145781B (zh) * | 2018-08-03 | 2021-05-04 | 北京字节跳动网络技术有限公司 | Method and apparatus for processing images |
| CN109376603A (zh) * | 2018-09-25 | 2019-02-22 | 北京周同科技有限公司 | Video recognition method and apparatus, computer device, and storage medium |
| CN109461168B (zh) * | 2018-10-15 | 2021-03-16 | 腾讯科技(深圳)有限公司 | Target object recognition method and apparatus, storage medium, and electronic apparatus |
| CN109635657B (zh) * | 2018-11-12 | 2023-01-06 | 平安科技(深圳)有限公司 | Target tracking method, apparatus, device, and storage medium |
| CN109492608B (zh) * | 2018-11-27 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Image segmentation method and apparatus, computer device, and storage medium |
| CN109711445B (zh) * | 2018-12-18 | 2020-10-16 | 绍兴文理学院 | Neutrosophic similarity weighting method over superpixels for online training samples of a target tracking classifier |
- 2019
  - 2019-05-27 CN CN201910447379.3A patent/CN110176027B/zh active Active
- 2020
  - 2020-04-30 EP EP20812620.1A patent/EP3979200B1/en active Active
  - 2020-04-30 JP JP2021537733A patent/JP7236545B2/ja active Active
  - 2020-04-30 WO PCT/CN2020/088286 patent/WO2020238560A1/zh not_active Ceased
  - 2020-04-30 FI FIEP20812620.1T patent/FI3979200T3/fi active
- 2021
  - 2021-08-30 US US17/461,978 patent/US12067733B2/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8467570B2 (en) * | 2006-06-14 | 2013-06-18 | Honeywell International Inc. | Tracking system with fused motion and object detection |
| CN106127807A (zh) * | 2016-06-21 | 2016-11-16 | 中国石油大学(华东) | Real-time multi-class multi-target video tracking method |
| CN108122247A (zh) * | 2017-12-25 | 2018-06-05 | 北京航空航天大学 | Video object detection method based on image saliency and a feature prior model |
| CN109035293A (zh) * | 2018-05-22 | 2018-12-18 | 安徽大学 | Method for salient human instance segmentation in video images |
| CN110176027A (zh) * | 2019-05-27 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Video target tracking method, apparatus, device, and storage medium |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3979200A4 * |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112733802A (zh) * | 2021-01-25 | 2021-04-30 | 腾讯科技(深圳)有限公司 | Image occlusion detection method and apparatus, electronic device, and storage medium |
| CN112733802B (zh) * | 2021-01-25 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Image occlusion detection method and apparatus, electronic device, and storage medium |
| CN113361373A (zh) * | 2021-06-02 | 2021-09-07 | 武汉理工大学 | Real-time semantic segmentation method for aerial images in agricultural scenes |
| EP4138045A1 (en) * | 2021-08-20 | 2023-02-22 | INTEL Corporation | Resource-efficient video coding and motion estimation |
| WO2023096685A1 (en) * | 2021-11-24 | 2023-06-01 | Microsoft Technology Licensing, Llc. | Feature prediction for efficient video processing |
| US12106487B2 (en) | 2021-11-24 | 2024-10-01 | Microsoft Technology Licensing, Llc | Feature prediction for efficient video processing |
| CN114979652A (zh) * | 2022-05-20 | 2022-08-30 | 北京字节跳动网络技术有限公司 | Video processing method and apparatus, electronic device, and storage medium |
| CN115546890A (zh) * | 2022-09-20 | 2022-12-30 | 国武时代国际文化传媒(北京)有限公司 | Deviation-correction guidance method and system based on user motion feature extraction |
| CN116912546A (zh) * | 2023-03-30 | 2023-10-20 | 北京罗克维尔斯科技有限公司 | Image quality determination method and apparatus, electronic device, and storage medium |
| CN116612498A (zh) * | 2023-05-26 | 2023-08-18 | 百鸟数据科技(北京)有限责任公司 | Bird recognition model training method, bird recognition method, apparatus, and device |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2022534337A (ja) | 2022-07-29 |
| FI3979200T3 (fi) | 2026-03-27 |
| CN110176027B (zh) | 2023-03-14 |
| US20210398294A1 (en) | 2021-12-23 |
| EP3979200B1 (en) | 2026-02-18 |
| EP3979200A4 (en) | 2022-07-27 |
| CN110176027A (zh) | 2019-08-27 |
| EP3979200A1 (en) | 2022-04-06 |
| US12067733B2 (en) | 2024-08-20 |
| JP7236545B2 (ja) | 2023-03-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2020238560A1 (zh) | 2020-12-03 | Video target tracking method and apparatus, computer device, and storage medium |
| CN112446380B (zh) | | Image processing method and apparatus |
| Dvornik et al. | | On the importance of visual context for data augmentation in scene understanding |
| CN113344932B (zh) | | A semi-supervised single-target video segmentation method |
| CN111080628A (zh) | | Image tampering detection method and apparatus, computer device, and storage medium |
| WO2020228446A1 (zh) | | Model training method and apparatus, terminal, and storage medium |
| JP2008243187A (ja) | | Computer-implemented method for tracking an object in a sequence of video frames |
| CN113744280B (zh) | | Image processing method, apparatus, device, and medium |
| CN110825900A (zh) | | Training method for a feature reconstruction layer, image feature reconstruction method, and related apparatus |
| Tan et al. | | High dynamic range imaging for dynamic scenes with large-scale motions and severe saturation |
| CN111445496B (zh) | | An underwater image recognition and tracking system and method |
| CN117237547B (zh) | | Image reconstruction method, and processing method and apparatus for a reconstruction model |
| CN114359361A (zh) | | Depth estimation method and apparatus, electronic device, and computer-readable storage medium |
| US20240070812A1 (en) | | Efficient cost volume processing within iterative process |
| CN114241481B (zh) | | Text detection method and apparatus based on text skeletons, and computer device |
| CN119169045B (zh) | | Brain-inspired spike-based optical flow estimation method, apparatus, medium, and computer device |
| CN116543246A (zh) | | Training method for an image denoising model, image denoising method, apparatus, and device |
| CN113971671B (zh) | | Instance segmentation method and apparatus, electronic device, and storage medium |
| US12452542B2 (en) | | Hallucinating details for over-exposed pixels in videos using learned reference frame selection |
| CN117197183B (zh) | | A moving object detection method based on multi-scale dilated convolutional encoding-decoding |
| CN118823500A (zh) | | Portrait segmentation model training and application method, apparatus, device, medium, and product |
| CN111275039B (zh) | | Water gauge character localization method and apparatus, computing device, and storage medium |
| US12363249B2 (en) | | Method and system for generation of a plurality of portrait effects in an electronic device |
| CN115908115B (zh) | | Face image processing method, live-streaming image processing method, apparatus, and electronic device |
| CN114596325B (zh) | | Video target segmentation method, apparatus, and device, and computer-readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20812620; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021537733; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020812620; Country of ref document: EP; Effective date: 20220103 |
| | WWG | Wipo information: grant in national office | Ref document number: 2020812620; Country of ref document: EP |