WO2023151237A1 - Face pose estimation method, device, electronic equipment, and storage medium - Google Patents
Face pose estimation method, device, electronic equipment, and storage medium
- Publication number
- WO2023151237A1 (application PCT/CN2022/107825)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- face
- information
- pose
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- The present disclosure relates to the field of computer technology, and in particular to a face pose estimation method, device, electronic equipment, and storage medium.
- Face pose estimation is an important research direction in computer vision, and variation in face pose is a key factor affecting face recognition performance. Only by effectively estimating the pose of a face image can the influence of pose on recognition be reduced. Face pose estimation is also widely used in applications such as liveness detection, human-computer interaction, virtual reality, and intelligent monitoring.
- In traditional face pose estimation methods, capturing only pose angle information causes some facial features regarded as irrelevant to be ignored, so the face and its surrounding information cannot be fully used for model optimization. In addition, pose recognition built on a general-purpose recognition model cannot fully capture the key information of the face pose.
- Moreover, traditional methods take the detected face frame directly as input without considering its inaccuracy, so it is difficult to obtain the optimal prediction when the pose angle is directly regressed or classified, which reduces the accuracy of the face pose estimation result.
- In view of this, the embodiments of the present disclosure provide a face pose estimation method, device, electronic equipment, and storage medium to solve the problems in the prior art that the key information of the face pose cannot be fully obtained and that the accuracy of the face pose estimation result is poor.
- A first aspect of the embodiments of the present disclosure provides a face pose estimation method, including: acquiring a target image containing face information and inputting the target image into a pre-built pose estimation model; in the pose estimation model, performing feature extraction on the target image using a shallow densely connected layer to obtain multiple first feature maps containing shallow feature information; using the multiple first feature maps as the input of a deep feature multiplexing layer, and using the deep feature multiplexing layer to perform an information fusion operation on each of the first feature maps to obtain a second feature map, so that deep feature information is integrated into the shallow feature information; using an attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information; and using a classifier to predict on the third feature map to obtain a face pose prediction result corresponding to the third feature map, and determining the face pose in the target image according to the prediction result.
- A second aspect of the embodiments of the present disclosure provides a face pose estimation device, including: an acquisition module configured to acquire a target image containing face information and input the target image into a pre-built pose estimation model; an extraction module configured to perform feature extraction on the target image using a shallow densely connected layer in the pose estimation model to obtain multiple first feature maps containing shallow feature information; a fusion module configured to use the multiple first feature maps as the input of a deep feature multiplexing layer and to use the deep feature multiplexing layer to perform an information fusion operation on each of the first feature maps to obtain a second feature map, so that deep feature information is integrated into the shallow feature information;
- and a prediction module configured to use an attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information, to use a classifier to predict on the third feature map to obtain a face pose prediction result corresponding to the third feature map, and to determine the face pose in the target image according to the prediction result.
- A third aspect of the embodiments of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the steps of the above method are implemented.
- A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps of the above method are implemented.
- In the embodiments of the present disclosure, a target image containing face information is acquired and input into the pre-built pose estimation model. In the pose estimation model, the shallow densely connected layer performs feature extraction on the target image to obtain multiple first feature maps containing shallow feature information. The multiple first feature maps are used as the input of the deep feature multiplexing layer, which performs an information fusion operation on each of them to obtain the second feature map, so that deep feature information is integrated into the shallow feature information. The attention layer extracts the face pose information in the second feature map to obtain the third feature map containing face pose information, and the classifier predicts on the third feature map to obtain the face pose prediction result corresponding to the third feature map, from which the face pose in the target image is determined.
- The disclosure can thereby fully obtain the key information of the face pose and make the face pose estimation result more accurate.
- FIG. 1 is a schematic diagram of the network structure of a pose estimation model provided by an embodiment of the present disclosure.
- FIG. 2 is a schematic flowchart of a face pose estimation method provided by an embodiment of the present disclosure.
- FIG. 3 is a schematic structural diagram of a face pose estimation device provided by an embodiment of the present disclosure.
- FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- The present disclosure proposes a new face pose estimation method based on a neural network pose estimation model.
- The neural network model of the present disclosure mainly includes a backbone network, a Neck module, and a Head module.
- The training set used contains the face frame and multi-point pose annotations, that is, annotation information composed of the annotation points of the face frame and multiple pose angles.
- On this basis, augmented training of the pose estimation model is realized.
- The backbone network of the pose estimation model, composed of the shallow densely connected layer and the deep feature multiplexing layer, fuses the shallow and deep features of the face image, and the pose angle of the face in the image is then accurately estimated by the attention module and the grouped angle regression module.
- FIG. 1 is a schematic diagram of the network structure of the pose estimation model provided by an embodiment of the present disclosure. As shown in FIG. 1, the network structure of the pose estimation model may specifically include the following:
- The backbone network includes a shallow densely connected layer (corresponding to C1 to C5) and a deep feature multiplexing layer (corresponding to P3 to P5).
- The shallow densely connected layer is mainly used to extract forward features from the face image.
- Dense connection means that the output of each module is used not only as the input of the next module but also as the input of other subsequent modules. The deep feature multiplexing layer integrates the extracted shallow feature information into the deep feature information when performing feature expression, yielding a feature map that contains more semantic information (i.e., pose information).
- The Neck module consists of an SE attention module and a Transformer-based feature transformation module.
- The SE attention module amplifies important features by weighting the features of different channels, while suppressing relatively unimportant features.
- The Transformer has strong feature extraction ability owing to its structural design and extracts effective features through a multi-head attention mechanism, thereby highlighting important features in the feature map and suppressing interfering features.
- The Head module corresponds to the grouped angle regression module (that is, the classifier).
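- FIG. 2 is a schematic flowchart of a face pose estimation method provided by an embodiment of the present disclosure.
- The face pose estimation method in FIG. 2 can be executed by a server.
- As shown in FIG. 2, the face pose estimation method may specifically include: acquiring a target image containing face information and inputting it into the pre-built pose estimation model; extracting multiple first feature maps with the shallow densely connected layer; fusing them with the deep feature multiplexing layer to obtain the second feature map; extracting the face pose information with the attention layer to obtain the third feature map; and predicting on the third feature map with the classifier, the face pose in the target image being determined according to the prediction result.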
- The target image in the embodiments of the present disclosure refers to the face image obtained after face detection is performed on the collected original image; that is, the input of the pose estimation model is the detected face image, not the original image.
- Before the pose estimation model is used to predict pose angles, it must be built and trained; the trained pose estimation model is then used as the model in actual use.
- The pose estimation model is used for prediction and outputs three pose angles, namely the pitch angle, yaw angle, and roll angle. Because faces and face detection models differ, there is no fully unified standard for the face boundary, so a pose estimation model trained on detected face images is more robust.
- According to the technical solution of the present disclosure, a target image containing face information is acquired and input into the pre-built pose estimation model. In the pose estimation model, the shallow densely connected layer performs feature extraction on the target image to obtain multiple first feature maps containing shallow feature information. The multiple first feature maps are used as the input of the deep feature multiplexing layer, which performs an information fusion operation on each of them to obtain the second feature map, so that deep feature information is integrated into the shallow feature information. The attention layer extracts the face pose information in the second feature map to obtain the third feature map containing the face pose information. The classifier predicts on the third feature map to obtain the face pose prediction result corresponding to the third feature map, and the face pose in the target image is determined according to the prediction result.
- The disclosure can thereby fully obtain the key information of the face pose and make the face pose estimation result more accurate.
- The pose estimation model is constructed as follows: acquiring an original image containing face information, detecting the original image with a face detection model to obtain the face image and face frame corresponding to the original image, obtaining the face pose information in the original image, and generating a first data set from the face image, the position coordinates of the face frame, and the face pose information; based on the original image and the position coordinates of the face frame, cropping the original image with a preset cropping method to obtain a cropped face image, and generating a second data set from the cropped face image, the position coordinates of the face frame, and the face pose information; and combining the first data set and the second data set to obtain a training set, and training the pose estimation model on the training set to obtain the trained pose estimation model.
- The training of the pose estimation model mainly includes two parts: the annotation of the data set and the data augmentation during training.
- These two parts are described in detail below in conjunction with specific embodiments:
- The annotation information in the training set includes a 7-point pose annotation, that is, 4 annotation points of the detection frame and 3 pose angles.
- The detected face image and the annotation information composed of the 7-point pose annotation are used as the first data set.
- Through the random-cropping augmentation method, the face region is randomly cropped from the original image at a certain cropping ratio (such as 0.5 times); for example, a face image slightly larger than the face detection frame (1.0 to 1.2 times) is randomly cropped, thereby generating a face image similar to the face images in the first data set. The newly generated face image and its corresponding annotation information composed of the 7-point pose annotation are used as the second data set.
- The training data set is thus expanded by data augmentation. Training the pose estimation model on the augmented training set not only reduces the influence of inaccurate annotation frames but also enlarges the amount of data and improves the robustness of the model.
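The random-cropping step can be illustrated with a minimal sketch. It assumes the face frame is stored as `(x, y, width, height)` with a top-left corner, that the crop is obtained by scaling the frame about its centre by a factor drawn uniformly from 1.0 to 1.2, and that the result is clamped to the image bounds. The function name `random_expand_box` and all of these choices are illustrative assumptions; the source does not specify how the crop is computed.

```python
import random
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left corner assumed


def random_expand_box(box: Box, image_size: Tuple[int, int],
                      min_scale: float = 1.0, max_scale: float = 1.2,
                      rng: Optional[random.Random] = None) -> Box:
    """Scale a face frame about its centre by a random factor and clamp it to the image."""
    rng = rng or random.Random()
    x, y, w, h = box
    img_w, img_h = image_size
    scale = rng.uniform(min_scale, max_scale)
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    # Clamp every edge so the crop never leaves the original image.
    left = max(0.0, cx - new_w / 2.0)
    top = max(0.0, cy - new_h / 2.0)
    right = min(float(img_w), cx + new_w / 2.0)
    bottom = min(float(img_h), cy + new_h / 2.0)
    return (left, top, right - left, bottom - top)


# Hypothetical face frame inside a 640 x 480 image; a seeded RNG keeps the run repeatable.
crop = random_expand_box((40.0, 32.0, 96.0, 120.0), image_size=(640, 480),
                         rng=random.Random(0))
```

Each call yields a slightly different crop of the same face, so pairing the crops with the unchanged pose angles produces additional training samples.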
- The training set contains face images and annotation information, and the annotation information is used as the label during model training. The annotation information includes multiple annotation points corresponding to the face frame and multiple pose angles, where the annotation points of the face frame include the coordinates of any one corner point of the face frame together with the width and height of the face frame, and the pose angles include the pitch angle, yaw angle, and roll angle.
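As an illustration of the 7-point annotation, the label can be held in a small record with four face-frame values and three pose angles. The class name `PoseAnnotation`, the field names, the choice of the top-left corner, and the use of degrees are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PoseAnnotation:
    """7-point label: 4 face-frame values plus 3 pose angles (degrees assumed)."""
    x: float       # x coordinate of one corner of the face frame (top-left assumed)
    y: float       # y coordinate of the same corner
    width: float   # face-frame width in pixels
    height: float  # face-frame height in pixels
    pitch: float
    yaw: float
    roll: float

    def as_vector(self) -> List[float]:
        """Flatten the label into the 7-value vector used as the training label."""
        return [self.x, self.y, self.width, self.height,
                self.pitch, self.yaw, self.roll]


# Hypothetical example values.
label = PoseAnnotation(x=40.0, y=32.0, width=96.0, height=120.0,
                       pitch=-5.0, yaw=12.5, roll=1.0)
```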
- Using the shallow densely connected layer to perform feature extraction on the target image and obtain multiple first feature maps containing shallow feature information includes: the shallow densely connected layer contains multiple sequentially connected convolution modules; each convolution module in turn performs a convolution operation on the feature map input to it; the output of each convolution module is used as the input of the next convolution module, and the input of each convolution module also includes the output of the preceding convolution modules; and the outputs of the last several convolution modules in the shallow densely connected layer are used as the first feature maps.
- The embodiments of the present disclosure propose a backbone network composed of a shallow densely connected layer and a deep feature multiplexing layer.
- The shallow feature information and deep feature information in the face image are extracted through the backbone network, and the shallow feature information is integrated into the deep feature information to obtain a feature map containing more semantic information.
- In the pose estimation model, the shallow densely connected layer is divided into five convolution modules, C1, C2, C3, C4, and C5. Dense connection means that the output of each convolution module is used not only as the input of the next convolution module but also as the input of other subsequent convolution modules. This not only enables the deep feature multiplexing layer to receive more shallow information, but also improves the ability of the features to express pose information and the efficiency with which features are used, because the features extracted by each part can be used by more modules.
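The dense wiring of C1 to C5 can be sketched without any deep learning library. In the sketch below a "feature" is just a list of numbers, concatenation stands in for channel-wise concatenation, and `toy_module` is a placeholder for a convolution module. Feeding the original input to every module as well is an assumption borrowed from DenseNet-style designs, not something the source states.

```python
from typing import Callable, List, Sequence

Feature = List[float]
Module = Callable[[Feature], Feature]


def dense_forward(x: Feature, modules: Sequence[Module]) -> List[Feature]:
    """Run modules so that each one sees the input and every earlier output."""
    collected: List[Feature] = [x]
    outputs: List[Feature] = []
    for module in modules:
        # Concatenate everything produced so far (channel-wise concat in a real CNN).
        dense_input = [v for feat in collected for v in feat]
        out = module(dense_input)
        collected.append(out)
        outputs.append(out)
    return outputs


def toy_module(feat: Feature) -> Feature:
    """Stand-in for a convolution module: reports how many values it received."""
    return [float(len(feat))]


c1, c2, c3, c4, c5 = dense_forward([1.0, 3.0], [toy_module] * 5)
first_feature_maps = [c3, c4, c5]  # outputs of the last modules, fed to P3, P4, P5
```

Because `toy_module` returns the size of its input, the growing outputs make the reuse visible: each later module receives one more feature than the one before it.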
- Using the deep feature multiplexing layer to perform an information fusion operation on the multiple first feature maps and obtain the second feature map, so that deep feature information is integrated into the shallow feature information, includes: the deep feature multiplexing layer contains convolution modules corresponding in number to the first feature maps; the convolution modules of the deep feature multiplexing layer perform a convolution transformation on the first feature maps to obtain the second feature map, so that deep feature information is integrated into the second feature map containing shallow feature information; and global average pooling is performed on the second feature map to obtain the globally average-pooled second feature map.
- When the deep feature multiplexing layer is used for feature expression, shallow feature information is integrated into the deep feature information. As shown in FIG. 1, the outputs of convolution modules C3, C4, and C5 are used to build three levels of feature expression, namely P3, P4, and P5, so that deep feature information is incorporated into the feature maps output by C3, C4, and C5.
- The implementation of the deep feature multiplexing layer is described in detail below with reference to FIG. 1:
- C3, C4, and C5 are convolution modules in the shallow densely connected layer, and their outputs are used as the inputs of P3, P4, and P5. The convolution modules in the deep feature multiplexing layer perform convolution operations on the feature maps output by C3, C4, and C5 and output feature maps containing semantic information (i.e., pose information).
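The global average pooling applied to the second feature map reduces each channel to a single value. A minimal sketch follows, assuming a feature map stored as nested lists in `[channel][row][col]` order; the layout and the name `global_average_pool` are illustrative choices.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce each channel of a C x H x W feature map to its mean value."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Hypothetical 2-channel, 2 x 2 feature map standing in for one of P3, P4, P5.
p5 = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 0.0], [0.0, 8.0]],  # channel 1
]
pooled_p5 = global_average_pool(p5)  # one value per channel
```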
- The attention layer includes an SE attention module and a feature transformation module. Using the attention layer to extract the face pose information in the second feature map and obtain a third feature map containing the face pose information includes: using the SE attention module to calculate the weights of the feature channels in the second feature map, and weighting the feature channels according to the channel weights to obtain a weighted second feature map; and using the feature transformation module to perform feature extraction on the weighted second feature map to obtain a third feature map containing effective feature information, where the effective feature information includes the face pose information.
- This disclosure provides a Neck module based on an SE attention module and a Transformer.
- The SE attention module weights the features of different channels to amplify important features while suppressing relatively unimportant features.
- The Transformer has strong feature extraction ability, which can highlight important features and suppress interfering features.
- The SE attention module contains two fully connected layers and a sigmoid layer (i.e., a normalization layer). Assuming the second feature map input to the SE attention module has c channels, it is processed by the two fully connected layers and the sigmoid layer to obtain a channel weight for each feature channel; each channel weight is then multiplied with its corresponding feature channel to obtain the weighted feature map.
- In the feature transformation module, the input weighted feature map is processed by Multi-Head Attention and a multi-layer perceptron (MLP) of fully connected layers to obtain c groups of transformed feature maps.
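The channel weighting performed by the SE attention module can be sketched in plain Python. The source specifies two fully connected layers followed by a sigmoid; summarising each channel by its mean and placing a ReLU between the two layers are assumptions taken from the standard squeeze-and-excitation design. The layer weights below are hand-picked, hypothetical values rather than trained parameters.

```python
import math
from typing import List

Matrix = List[List[float]]
FeatureMap = List[List[List[float]]]  # [channel][row][col]


def linear(weights: Matrix, vector: List[float]) -> List[float]:
    """Bias-free fully connected layer; weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def se_reweight(feature_map: FeatureMap, fc1: Matrix, fc2: Matrix) -> FeatureMap:
    """Compute one weight per channel and scale each channel by its weight."""
    # Squeeze: summarise each channel by its mean (assumed, as in standard SE blocks).
    means = [sum(v for row in ch for v in row) / sum(len(row) for row in ch)
             for ch in feature_map]
    # Excite: two fully connected layers, then a sigmoid. The ReLU in between is assumed.
    hidden = [max(0.0, h) for h in linear(fc1, means)]
    weights = [sigmoid(z) for z in linear(fc2, hidden)]
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]


# Toy input with c = 2 channels and hypothetical, hand-picked layer weights.
x = [[[3.0, 3.0]], [[1.0, 1.0]]]
fc1 = [[1.0, -1.0]]    # 2 channels -> 1 hidden unit
fc2 = [[2.0], [-2.0]]  # 1 hidden unit -> 2 channel weights
weighted = se_reweight(x, fc1, fc2)  # channel 0 is largely kept, channel 1 is suppressed
```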
- Using the classifier to predict on the third feature map, obtain the face pose prediction result corresponding to the third feature map, and determine the face pose in the target image according to the prediction result includes:
- Each face pose corresponds to multiple third feature maps, and each third feature map corresponds to multiple classifiers. Each classifier predicts several angle values from the third feature map, and the pose angle predicted by that classifier is calculated from those angle values. The pose angles predicted by all classifiers are summed to obtain the pose angle corresponding to each face pose, and the pose angles corresponding to the three face poses are used as the estimation result of the face pose in the target image.
- The disclosure adds a grouped angle regression module, also called a classifier module, to the pose estimation model; this is the Head module in FIG. 1.
- For each pose angle, each group of features can have 3 classifiers, and each classifier predicts 10 angle values.
- The results of the nine classifiers are added together to obtain the final pose angle. Performing this calculation for each of the three pose angles gives the three final predicted pose angles.
- That is, each third feature map input to the Head module corresponds to three classifiers, each classifier predicts 10 different angle values, and the prediction results of all classifiers (9 in total) are summed; the final prediction result of the Head module is used as the angle value of the current pose angle.
- For each pose angle the model outputs one angle value, and the angle values of all pose angles are used as the estimation result of the face pose in the target image.
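The grouped angle regression can be sketched as follows for a single pose angle. The source states only that each classifier predicts 10 angle values, that a per-classifier pose angle is calculated from them, and that the nine per-classifier angles are summed. How the 10 values are turned into one angle is not specified, so this sketch assumes a softmax-weighted expectation over per-classifier bin values; the functions `classifier_angle` and `grouped_angle` and the bin values are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_angle(logits: Sequence[float], bin_values: Sequence[float]) -> float:
    """Assumed rule: probability-weighted sum of one classifier's 10 bin values."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, bin_values))


def grouped_angle(all_logits: Sequence[Sequence[float]],
                  all_bins: Sequence[Sequence[float]]) -> float:
    """Sum the per-classifier angles (3 feature maps x 3 classifiers = 9 terms)."""
    return sum(classifier_angle(l, b) for l, b in zip(all_logits, all_bins))


# Toy inputs: 9 classifiers, 10 outputs each, with hypothetical bin values 0..9.
all_bins = [[float(i) for i in range(10)]] * 9
uniform_logits = [[0.0] * 10] * 9
angle = grouped_angle(uniform_logits, all_bins)  # each classifier contributes its mean bin value
```

Running the same computation once each for pitch, yaw, and roll uses 3 x 9 = 27 classifiers in total, which matches the count given in the source.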
- The present disclosure adopts a data set format with 7-point pose annotation (4 detection frame annotation points and 3 pose angles) and, based on this annotation scheme, designs a data augmentation method for face images, thereby expanding the amount of data in the training set.
- The present disclosure designs a backbone network with shallow dense connection and deep feature reuse, which improves the utilization efficiency and expressive ability of the features.
- The present disclosure also provides a Neck module based on SE attention and a Transformer, thereby further extracting the key information of the pose.
- A grouped angle regression module is added to the model, and the 3 pose angles are predicted more accurately through 27 classifiers.
- FIG. 3 is a schematic structural diagram of a face pose estimation device provided by an embodiment of the present disclosure. As shown in FIG. 3, the face pose estimation device includes:
- the acquisition module 301, configured to acquire a target image containing face information and input the target image into a pre-built pose estimation model;
- the extraction module 302, configured to perform feature extraction on the target image using a shallow densely connected layer in the pose estimation model to obtain multiple first feature maps containing shallow feature information;
- the fusion module 303, configured to use the multiple first feature maps as the input of the deep feature multiplexing layer, and to use the deep feature multiplexing layer to perform an information fusion operation on each of the first feature maps to obtain the second feature map, so that deep feature information is integrated into the shallow feature information;
- the prediction module 304, configured to use the attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information, to use a classifier to predict on the third feature map to obtain the face pose prediction result corresponding to the third feature map, and to determine the face pose in the target image according to the prediction result.
- the acquisition module 301 in FIG. 3 also adopts the following method to construct the pose estimation model, including: acquiring the original image containing face information, using the face detection model to detect the original image, and obtaining the person corresponding to the original image Face image and face frame, and obtain the face pose information in the original image, use the face image, the position coordinates of the face frame, and the face pose information to generate the first data set; based on the original image and face frame The position coordinates of the original image are cropped using the preset cropping method to obtain the cropped face image, and the second data set is generated by using the cropped face image, the position coordinates of the face frame, and the face pose information ; Combining the first data set and the second data set to obtain a training set, using the training set to train the pose estimation model, and obtaining the trained pose estimation model.
- The training set contains face images and annotation information, and the annotation information is used as the label during model training.
- The annotation information includes a plurality of annotation points corresponding to the face frame and a plurality of pose angles, wherein the annotation points of the face frame include the coordinates of any one corner point of the face frame together with the width and height of the face frame, and the pose angles include the pitch angle, yaw angle, and roll angle.
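One possible in-memory layout for such a label is sketched below. The choice of the top-left corner as the annotated corner point, the use of degrees, and the class name `FaceAnnotation` are assumptions for illustration; the text only requires some corner point plus width and height.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FaceAnnotation:
    """Label for one training image: face frame plus three pose angles."""

    x: float       # corner x coordinate (top-left corner assumed)
    y: float       # corner y coordinate (top-left corner assumed)
    width: float   # face frame width
    height: float  # face frame height
    pitch: float   # pose angles, assumed to be in degrees
    yaw: float
    roll: float

    def opposite_corner(self) -> Tuple[float, float]:
        """The corner diagonally opposite the annotated one."""
        return (self.x + self.width, self.y + self.height)

    def to_label(self) -> List[float]:
        """Flatten to the vector used as the training label."""
        return [self.x, self.y, self.width, self.height,
                self.pitch, self.yaw, self.roll]


annotation = FaceAnnotation(x=40, y=30, width=100, height=120,
                            pitch=5.0, yaw=-12.5, roll=1.0)
label = annotation.to_label()
```

Storing one corner with width and height is enough to recover the whole frame, as `opposite_corner` shows.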
- The shallow densely connected layer contains a plurality of sequentially connected convolution modules.
- The extraction module 302 of FIG. 3 uses each convolution module in turn to perform a convolution operation on the feature map input to that module. The output of each convolution module is used as the input of the next convolution module, the input of each convolution module also includes the outputs of the preceding convolution modules, and the outputs of the last several convolution modules in the shallow densely connected layer are used as the first feature maps.
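The dense connection pattern can be sketched with toy modules. The sketch assumes that earlier outputs are merged by concatenation and that the original input is also passed forward (as in DenseNet-style blocks); neither detail is spelled out in the text. The name `dense_forward` and the list-based features are hypothetical.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # toy stand-in for a feature map (assumption)


def dense_forward(
    x: Feature,
    modules: Sequence[Callable[[Feature], Feature]],
    keep_last: int,
) -> List[Feature]:
    """Run densely connected modules and return the last `keep_last` outputs."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    collected = [x]               # the input plus every earlier module output
    outputs: List[Feature] = []
    for module in modules:
        # Each module sees the concatenation of everything produced so far.
        dense_input = [v for feat in collected for v in feat]
        out = module(dense_input)
        outputs.append(out)
        collected.append(out)
    return outputs[-keep_last:]   # these play the role of the first feature maps


# Toy "convolution module": collapses its input to a single summed value.
toy_module = lambda feats: [sum(feats)]
first_feature_maps = dense_forward([1.0, 2.0], [toy_module] * 3, keep_last=2)
```

With three toy modules the inputs grow from two values to three to four, which shows how each later module receives all earlier outputs.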
- The deep feature multiplexing layer contains convolution modules corresponding in number to the first feature maps.
- The fusion module 303 of FIG. 3 uses the convolution modules of the deep feature multiplexing layer to perform a convolution transformation on the first feature maps to obtain the second feature map, so as to incorporate deep feature information into the second feature map containing shallow feature information, and performs global average pooling on the second feature map to obtain the corresponding globally average-pooled second feature map.
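A minimal sketch of a convolution transformation followed by global average pooling is given below. The text does not state the kernel size, so a 1x1 (pointwise) convolution is assumed here purely for brevity; the function names and the nested-list tensor layout are also assumptions.

```python
from typing import List

Channel = List[List[float]]  # H x W
FeatureMap = List[Channel]   # C x H x W


def pointwise_conv(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: every output channel is a weighted sum of input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights  # one weight row per output channel
    ]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel to the mean of its spatial positions."""
    return [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]


first_map = [[[1.0, 2.0], [3.0, 4.0]],
             [[0.0, 0.0], [0.0, 4.0]]]
second_map = pointwise_conv(first_map, [[1.0, 1.0], [0.5, 0.0]])
pooled = global_average_pool(second_map)
```

Global average pooling leaves one value per channel, which is the form the channel-weighting step of an attention module typically consumes.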
- The attention layer includes an SE attention module and a feature transformation module.
- The prediction module 304 of FIG. 3 uses the SE attention module to calculate weights for the feature channels in the second feature map and weights the feature channels according to the channel weights to obtain a weighted second feature map; it then uses the feature transformation module to perform feature extraction on the weighted second feature map to obtain a third feature map containing effective feature information, where the effective feature information includes the face pose information.
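The channel weighting performed by an SE (squeeze-and-excitation) module can be sketched as follows. This follows the commonly used SE formulation (squeeze by global average pooling, two fully connected layers with ReLU and sigmoid, then per-channel scaling); the weight matrices `w_reduce` and `w_expand` are hypothetical, biases are omitted, and the text does not confirm these internals.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]  # H x W
FeatureMap = List[Channel]   # C x H x W


def se_attention(
    fmap: FeatureMap,
    w_reduce: List[List[float]],  # shape: hidden x C
    w_expand: List[List[float]],  # shape: C x hidden
) -> Tuple[FeatureMap, List[float]]:
    """Return the channel-weighted feature map and the channel weights."""
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: reduce with ReLU, then expand with sigmoid to get gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    weighted = [[[g * v for v in r] for r in ch] for g, ch in zip(gates, fmap)]
    return weighted, gates


second_map = [[[2.0, 2.0], [2.0, 2.0]],
              [[4.0, 4.0], [4.0, 4.0]]]
weighted_map, channel_weights = se_attention(
    second_map, w_reduce=[[0.5, 0.25]], w_expand=[[1.0], [-1.0]]
)
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate near 0 are suppressed before the feature transformation module runs.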
- Each face pose corresponds to multiple third feature maps, and each third feature map corresponds to multiple classifiers.
- Each classifier is used to predict a number of angle values from the third feature map. The pose angle predicted by each classifier is calculated from those angle values, the pose angles predicted by all classifiers are summed to obtain the pose angle corresponding to each face pose, and the pose angles corresponding to the three face poses are used as the estimation result of the face pose in the target image.
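How several classifiers might be combined into one pose angle is sketched below. The text does not say how a classifier's angle values are turned into a single angle, so the sketch assumes a softmax-weighted expectation over per-classifier bin centers; only the final summation over classifiers comes from the text. The function names and the bin centers are hypothetical.

```python
import math
from typing import List, Sequence


def expected_angle(logits: Sequence[float], bin_centers: Sequence[float]) -> float:
    """Softmax-weighted average of the bin centers (an assumed reduction)."""
    peak = max(logits)                            # subtract the max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return sum(e / total * c for e, c in zip(exps, bin_centers))


def pose_angle(
    logits_per_classifier: Sequence[Sequence[float]],
    centers_per_classifier: Sequence[Sequence[float]],
) -> float:
    """Sum the angles predicted by all classifiers for one pose (e.g. yaw)."""
    return sum(
        expected_angle(logits, centers)
        for logits, centers in zip(logits_per_classifier, centers_per_classifier)
    )


# Two hypothetical classifiers for yaw: a coarse one and a finer one.
yaw = pose_angle(
    logits_per_classifier=[[0.0, 0.0, 0.0], [0.0, 0.0]],
    centers_per_classifier=[[-30.0, 0.0, 30.0], [10.0, 20.0]],
)
```

Running the same reduction once each for pitch, yaw, and roll would give the three angles that form the estimation result.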
- FIG. 4 is a schematic structural diagram of an electronic device 4 provided by an embodiment of the present disclosure.
- The electronic device 4 of this embodiment includes a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and capable of running on the processor 401.
- When the processor 401 executes the computer program 403, the steps in the foregoing method embodiments are implemented.
- Alternatively, when the processor 401 executes the computer program 403, the functions of the modules/units in the foregoing device embodiments are implemented.
- The computer program 403 can be divided into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to carry out the present disclosure.
- The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 403 in the electronic device 4.
- The electronic device 4 may be an electronic device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
- The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402.
- FIG. 4 is only an example of the electronic device 4 and does not constitute a limitation on the electronic device 4. The device may include more or fewer components than those shown, a combination of some components, or different components.
- For example, the electronic device may also include an input and output device, a network access device, a bus, and the like.
- The processor 401 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or internal memory of the electronic device 4.
- The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the electronic device 4. Further, the memory 402 may include both an internal storage unit of the electronic device 4 and an external storage device.
- The memory 402 is used to store the computer program and other programs and data required by the electronic device.
- The memory 402 can also be used to temporarily store data that has been output or will be output.
- The disclosed apparatus/computer device and methods can be implemented in other ways.
- The apparatus/computer device embodiments described above are only illustrative. For example, the division into modules or units is only a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- A unit described as a separate component may or may not be physically separate, and a component displayed as a unit may or may not be a physical unit; that is, it may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- The functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- If an integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- All or part of the processes in the methods of the above embodiments of the present disclosure can also be completed by a computer program instructing related hardware.
- The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be realized.
- A computer program may include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form or the like.
- The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in computer-readable media may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunication signals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Geometry (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Claims (10)
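- A face pose estimation method, characterized by comprising: acquiring a target image containing face information, and inputting the target image into a pre-built pose estimation model; in the pose estimation model, performing feature extraction on the target image using a shallow densely connected layer to obtain a plurality of first feature maps containing shallow feature information; using the plurality of first feature maps as the input of a deep feature multiplexing layer, and using the deep feature multiplexing layer to perform an information fusion operation on each of the plurality of first feature maps to obtain a second feature map, so as to integrate deep feature information into the shallow feature information; and using an attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information, using a classifier to make a prediction on the third feature map to obtain a prediction result of the face pose corresponding to the third feature map, and determining the face pose in the target image according to the prediction result.
- The method according to claim 1, characterized in that the pose estimation model is constructed in the following manner, comprising: acquiring an original image containing face information, detecting the original image with a face detection model to obtain a face image and a face frame corresponding to the original image, acquiring the face pose information in the original image, and generating a first data set using the face image, the position coordinates of the face frame, and the face pose information; based on the original image and the position coordinates of the face frame, cropping the original image using a preset cropping method to obtain a cropped face image, and generating a second data set using the cropped face image, the position coordinates of the face frame, and the face pose information; and combining the first data set and the second data set to obtain a training set, and training the pose estimation model with the training set to obtain a trained pose estimation model.
- The method according to claim 2, characterized in that the training set contains face images and annotation information, the annotation information is used as the label during model training, and the annotation information contains a plurality of annotation points corresponding to the face frame and a plurality of pose angles; wherein the annotation points of the face frame include the coordinates of any one corner point of the face frame and the width and height of the face frame, and the pose angles include a pitch angle, a yaw angle, and a roll angle.
- The method according to claim 1, characterized in that performing feature extraction on the target image using the shallow densely connected layer to obtain a plurality of first feature maps containing shallow feature information comprises: the shallow densely connected layer contains a plurality of sequentially connected convolution modules; each convolution module is used in turn to perform a convolution operation on the feature map input to that convolution module, the output of each convolution module is used as the input of the next convolution module, the input of each convolution module also contains the outputs of the preceding convolution modules, and the outputs of the last several convolution modules in the shallow densely connected layer are used as the first feature maps.
- The method according to claim 1, characterized in that using the deep feature multiplexing layer to perform an information fusion operation on each of the plurality of first feature maps to obtain a second feature map, so as to integrate deep feature information into the shallow feature information, comprises: the deep feature multiplexing layer contains convolution modules corresponding in number to the first feature maps; the convolution modules of the deep feature multiplexing layer are used to perform a convolution transformation on the first feature maps to obtain the second feature map, so as to integrate the deep feature information into the second feature map containing the shallow feature information; and global average pooling is performed on the second feature map to obtain the corresponding globally average-pooled second feature map.
- The method according to claim 1, characterized in that the attention layer includes an SE attention module and a feature transformation module, and using the attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information comprises: using the SE attention module to calculate weights for the feature channels in the second feature map, and weighting the feature channels according to the channel weights to obtain a weighted second feature map; and using the feature transformation module to perform feature extraction on the weighted second feature map to obtain a third feature map containing effective feature information, the effective feature information including the face pose information.
- The method according to claim 1, characterized in that using the classifier to make a prediction on the third feature map to obtain a prediction result of the face pose corresponding to the third feature map, and determining the face pose in the target image according to the prediction result, comprises: each face pose corresponds to a plurality of the third feature maps, and each third feature map corresponds to a plurality of classifiers; each classifier is used to predict a number of angle values from the third feature map; the pose angle predicted by each classifier is calculated from the number of angle values; the pose angles predicted by all the classifiers are summed to obtain the pose angle corresponding to each face pose; and the pose angles corresponding to the three face poses are used as the estimation result of the face pose in the target image.
- A face pose estimation apparatus, characterized by comprising: an acquisition module configured to acquire a target image containing face information and input the target image into a pre-built pose estimation model; an extraction module configured to perform, in the pose estimation model, feature extraction on the target image using a shallow densely connected layer to obtain a plurality of first feature maps containing shallow feature information; a fusion module configured to use the plurality of first feature maps as the input of a deep feature multiplexing layer, and to use the deep feature multiplexing layer to perform an information fusion operation on each of the plurality of first feature maps to obtain a second feature map, so as to integrate deep feature information into the shallow feature information; and a prediction module configured to use an attention layer to extract the face pose information in the second feature map to obtain a third feature map containing the face pose information, to use a classifier to make a prediction on the third feature map to obtain a prediction result of the face pose corresponding to the third feature map, and to determine the face pose in the target image according to the prediction result.
- An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to claim 1 when executing the program.
- A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to claim 1.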
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/834,647 US20250037305A1 (en) | 2022-02-11 | 2022-07-26 | Face pose estimation method and apparatus, electronic device, and storage medium |
| JP2024545137A JP7770581B2 (ja) | 2022-02-11 | 2022-07-26 | 顔姿勢推定方法、装置、電子機器及び記憶媒体 |
| EP22925596.3A EP4471737B1 (en) | 2022-02-11 | 2022-07-26 | Face pose estimation method and apparatus, electronic device, and storage medium |
| KR1020247024587A KR20240144139A (ko) | 2022-02-11 | 2022-07-26 | 얼굴 포즈 추정 방법, 장치, 전자 디바이스 및 저장 매체 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210129785.7 | 2022-02-11 | | |
| CN202210129785.7A CN114519881B (zh) | 2022-02-11 | 2022-02-11 | 人脸位姿估计方法、装置、电子设备及存储介质 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023151237A1 true WO2023151237A1 (zh) | 2023-08-17 |
Family
ID=81596114
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/107825 Ceased WO2023151237A1 (zh) | 人脸位姿估计方法、装置、电子设备及存储介质 | 2022-02-11 | 2022-07-26 |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20250037305A1 (zh) |
| EP (1) | EP4471737B1 (zh) |
| JP (1) | JP7770581B2 (zh) |
| KR (1) | KR20240144139A (zh) |
| CN (1) | CN114519881B (zh) |
| WO (1) | WO2023151237A1 (zh) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117036363A (zh) * | 2023-10-10 | 2023-11-10 | 国网四川省电力公司信息通信公司 | 一种基于多特征融合的遮挡绝缘子检测方法 |
| CN117133059A (zh) * | 2023-08-18 | 2023-11-28 | 北京科技大学 | 一种基于局部注意力机制的人脸活体检测方法及装置 |
| CN117173476A (zh) * | 2023-09-05 | 2023-12-05 | 北京交通大学 | 一种单源域泛化行人再识别方法 |
| CN118506400A (zh) * | 2024-05-16 | 2024-08-16 | 中国矿业大学 | 人体姿态估计方法及系统、人体姿态估计模型 |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114519881B (zh) * | 2022-02-11 | 2024-11-19 | 深圳须弥云图空间科技有限公司 | 人脸位姿估计方法、装置、电子设备及存储介质 |
| CN115661866A (zh) * | 2022-11-07 | 2023-01-31 | 河海大学 | 注意力机制与深度学习相结合的红外行人检测方法、系统 |
| CN115984934A (zh) * | 2023-01-04 | 2023-04-18 | 北京龙智数科科技服务有限公司 | 人脸位姿估计模型的训练方法、人脸位姿估计方法及装置 |
| CN116129501A (zh) * | 2023-02-01 | 2023-05-16 | 北京龙智数科科技服务有限公司 | 人脸位姿估计方法及装置 |
| CN116977417B (zh) * | 2023-06-30 | 2024-10-29 | 北京百度网讯科技有限公司 | 位姿估计方法、装置、电子设备及存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108256454A (zh) * | 2018-01-08 | 2018-07-06 | 浙江大华技术股份有限公司 | 一种基于cnn模型的训练方法、人脸姿态估测方法及装置 |
| JP2020113000A (ja) * | 2019-01-10 | 2020-07-27 | 日本電信電話株式会社 | 物体検出認識装置、方法、及びプログラム |
| CN112766186A (zh) * | 2021-01-22 | 2021-05-07 | 北京工业大学 | 一种基于多任务学习的实时人脸检测及头部姿态估计方法 |
| CN114519881A (zh) * | 2022-02-11 | 2022-05-20 | 深圳集智数字科技有限公司 | 人脸位姿估计方法、装置、电子设备及存储介质 |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6959109B2 (en) * | 2002-06-20 | 2005-10-25 | Identix Incorporated | System and method for pose-angle estimation |
| JP4951498B2 (ja) * | 2007-12-27 | 2012-06-13 | 日本電信電話株式会社 | 顔画像認識装置、顔画像認識方法、顔画像認識プログラムおよびそのプログラムを記録した記録媒体 |
| CN110232316A (zh) * | 2019-05-05 | 2019-09-13 | 杭州电子科技大学 | 一种基于改进的dsod模型的车辆检测与识别方法 |
| CN110321872B (zh) * | 2019-07-11 | 2021-03-16 | 京东方科技集团股份有限公司 | 人脸表情识别方法及装置、计算机设备、可读存储介质 |
| CN110837773A (zh) * | 2019-09-27 | 2020-02-25 | 深圳市华付信息技术有限公司 | 基于深度学习的大角度人脸姿态估计方法 |
| CN111339813B (zh) * | 2019-09-30 | 2022-09-27 | 深圳市商汤科技有限公司 | 人脸属性识别方法、装置、电子设备和存储介质 |
| CN111932555B (zh) * | 2020-07-31 | 2025-04-01 | 上海商汤善萃医疗科技有限公司 | 一种图像处理方法及装置、计算机可读存储介质 |
| CN113066013B (zh) * | 2021-05-18 | 2023-02-10 | 广东奥普特科技股份有限公司 | 视觉图像增强的生成方法、系统、装置及存储介质 |
| CN113393457B (zh) * | 2021-07-14 | 2023-02-28 | 长沙理工大学 | 一种结合残差密集块与位置注意力的无锚框目标检测方法 |
| CN113688723B (zh) * | 2021-08-21 | 2024-03-19 | 河南大学 | 一种基于改进YOLOv5的红外图像行人目标检测方法 |
| CN113901884B (zh) * | 2021-09-15 | 2024-09-24 | 杭州欣禾圣世科技有限公司 | 基于特征匹配的人脸姿态估计方法、系统、装置及存储介质 |
2022
- 2022-02-11 CN CN202210129785.7A patent/CN114519881B/zh active Active
- 2022-07-26 WO PCT/CN2022/107825 patent/WO2023151237A1/zh not_active Ceased
- 2022-07-26 KR KR1020247024587A patent/KR20240144139A/ko active Pending
- 2022-07-26 EP EP22925596.3A patent/EP4471737B1/en active Active
- 2022-07-26 US US18/834,647 patent/US20250037305A1/en active Pending
- 2022-07-26 JP JP2024545137A patent/JP7770581B2/ja active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4471737A4 * |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117133059A (zh) * | 2023-08-18 | 2023-11-28 | 北京科技大学 | 一种基于局部注意力机制的人脸活体检测方法及装置 |
| CN117133059B (zh) * | 2023-08-18 | 2024-03-01 | 北京科技大学 | 一种基于局部注意力机制的人脸活体检测方法及装置 |
| CN117173476A (zh) * | 2023-09-05 | 2023-12-05 | 北京交通大学 | 一种单源域泛化行人再识别方法 |
| CN117173476B (zh) * | 2023-09-05 | 2024-05-24 | 北京交通大学 | 一种单源域泛化行人再识别方法 |
| CN117036363A (zh) * | 2023-10-10 | 2023-11-10 | 国网四川省电力公司信息通信公司 | 一种基于多特征融合的遮挡绝缘子检测方法 |
| CN117036363B (zh) * | 2023-10-10 | 2024-01-30 | 国网四川省电力公司信息通信公司 | 一种基于多特征融合的遮挡绝缘子检测方法 |
| CN118506400A (zh) * | 2024-05-16 | 2024-08-16 | 中国矿业大学 | 人体姿态估计方法及系统、人体姿态估计模型 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114519881B (zh) | 2024-11-19 |
| EP4471737A4 (en) | 2025-04-30 |
| EP4471737B1 (en) | 2026-04-08 |
| EP4471737A1 (en) | 2024-12-04 |
| CN114519881A (zh) | 2022-05-20 |
| KR20240144139A (ko) | 2024-10-02 |
| US20250037305A1 (en) | 2025-01-30 |
| JP2025504056A (ja) | 2025-02-06 |
| JP7770581B2 (ja) | 2025-11-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2023151237A1 (zh) | | 人脸位姿估计方法、装置、电子设备及存储介质 |
| CN112784869B (zh) | | 一种基于注意力感知与对抗学习的细粒度图像识别方法 |
| CN110969245B (zh) | | 医学图像的目标检测模型训练方法和装置 |
| CN110246181B (zh) | | 基于锚点的姿态估计模型训练方法、姿态估计方法和系统 |
| WO2022001623A1 (zh) | | 基于人工智能的图像处理方法、装置、设备及存储介质 |
| CN110363817B (zh) | | 目标位姿估计方法、电子设备和介质 |
| CN108171133B (zh) | | 一种基于特征协方差矩阵的动态手势识别方法 |
| WO2018108129A1 (zh) | | 用于识别物体类别的方法及装置、电子设备 |
| CN114549557A (zh) | | 一种人像分割网络训练方法、装置、设备及介质 |
| CN115881265B (zh) | | 电子病历智能病案质控方法、系统、设备及存储介质 |
| CN113850136A (zh) | | 基于yolov5与BCNN的车辆朝向识别方法及系统 |
| CN119941731B (zh) | | 基于大模型的肺结节分析方法、系统、设备及介质 |
| CN116597471A (zh) | | 人体跟踪方法、电子设备及存储介质 |
| CN116704511A (zh) | | 设备清单文字识别方法和装置 |
| CN120726243B (zh) | | 场景补全方法、装置、电子设备和存储介质 |
| Xia et al. | | A multilevel fusion network for 3D object detection |
| CN114419103B (zh) | | 一种骨架检测跟踪方法、装置及电子设备 |
| CN111723688A (zh) | | 人体动作识别结果的评价方法、装置和电子设备 |
| WO2023109086A1 (zh) | | 文字识别方法、装置、设备及存储介质 |
| WO2026001201A1 (zh) | | 关键点预测模型的训练方法、装置、设备、介质及产品 |
| CN120198449A (zh) | | 基于空间3d高斯模型的语义特征嵌入开放分割方法 |
| CN116881886A (zh) | | 身份识别方法、装置、计算机设备和存储介质 |
| CN111967579B (zh) | | 使用卷积神经网络对图像进行卷积计算的方法和装置 |
| Qian et al. | | Multi-Scale tiny region gesture recognition towards 3D object manipulation in industrial design |
| CN112990144B (zh) | | 一种用于行人重识别的数据增强方法及系统 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22925596; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024545137; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 18834647; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022925596; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022925596; Country of ref document: EP; Effective date: 20240829 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 2022925596; Country of ref document: EP |