WO2020063744A1 - Face detection method and apparatus, service processing method, terminal device, and storage medium - Google Patents

Face detection method and apparatus, service processing method, terminal device, and storage medium

Info

Publication number
WO2020063744A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
face
data set
face image
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2019/108145
Other languages
English (en)
French (fr)
Inventor
郑克松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to EP19864907.1A priority Critical patent/EP3754541A4/en
Publication of WO2020063744A1 publication Critical patent/WO2020063744A1/zh
Priority to US17/032,370 priority patent/US11256905B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/754 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries involving a deformation of the sample pattern or of the reference pattern; Elastic matching
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • the present application relates to the field of image processing technology, and in particular, to a face detection method and device, a business processing method, a terminal device, and a storage medium.
  • Image processing is the technology of processing images with a computer to achieve a desired result.
  • Within image processing, face detection has become a hot research topic.
  • Face detection can include face alignment detection.
  • Face alignment detection, also called face keypoint detection, refers to detecting and locating key feature points on the face, such as the eyes, the nose, and the mouth corners. How to perform better face detection on face images has become a research hotspot.
  • the embodiments of the present application provide a face detection method and device, a business processing method, a terminal device, and a storage medium, which can better perform face detection on a face image and improve the accuracy of the detection result.
  • an embodiment of the present application provides a face detection method, which is executed by a terminal device and includes:
  • a feature region of the target face image is determined according to the target keypoint set.
  • an embodiment of the present application provides a service processing method, which is executed by a terminal device, and includes:
  • a camera device of the terminal device is called to obtain a target face image of the requester
  • an embodiment of the present application provides a face detection device, including:
  • An obtaining unit configured to obtain a target face image to be detected
  • a training unit which is used for hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model
  • a detecting unit for invoking the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image
  • a determining unit configured to determine a feature region of the target face image according to the target keypoint set.
  • an embodiment of the present application provides a terminal device, including a processor, an input device, an output device, and a memory.
  • The processor, the input device, the output device, and the memory are connected to each other, and the memory is used to store a computer program. The computer program includes a first program instruction, and the processor is configured to call the first program instruction to execute the above-mentioned face detection method; or the computer program includes a second program instruction, and the processor is configured to call the second program instruction to execute the foregoing service processing method.
  • An embodiment of the present application provides a computer storage medium that stores a first computer program instruction, which is used to implement the above-mentioned face detection method when executed; or the computer storage medium stores a second computer program instruction, which is used to implement the foregoing service processing method when executed.
  • FIG. 1a is a schematic diagram of a target face image provided by an embodiment of the present application.
  • FIG. 1b is a schematic diagram of another target face image provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a face detection method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a face detection method according to another embodiment of the present application.
  • FIG. 4a is a schematic diagram of a displacement process provided by an embodiment of the present application.
  • FIG. 4b is a schematic diagram of a rotation process according to an embodiment of the present application.
  • FIG. 4c is a schematic diagram of a mirroring process provided by an embodiment of the present application.
  • FIG. 4d is a schematic diagram of a compression process provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of face area division provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a service processing method according to an embodiment of the present application.
  • FIG. 7 is an application scenario diagram of a service processing method according to an embodiment of the present application.
  • FIG. 8 is an application scenario diagram of another service processing method according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a face detection device according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a service processing apparatus according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an implementation environment provided by an embodiment of the present application.
  • Face key points can also be referred to as facial feature points, which usually include points that form facial features (eyebrows, eyes, nose, mouth, and ears) and contours of the face.
  • a method for detecting a face image and labeling one or more key points in the face image may be referred to as a face key point detection method or a face alignment detection method.
  • The feature regions in the face image can be determined, and the feature regions here may include, but are not limited to, the eyebrow region, eye region, nose region, mouth region, and ear region.
  • a target face alignment model (also referred to as a target face keypoint detection model) may be provided to implement face alignment detection. After obtaining the target face image to be detected, the target face alignment model can be called to perform face alignment detection on the target face image to determine multiple key points in the target face image and label information of each key point.
  • The key points here may include, but are not limited to: key points on the mouth, key points on the eyebrows, key points on the eyes, key points on the nose, and key points on the ears. The label information of a key point may include, but is not limited to: position information (e.g., the position at which the key point is marked), shape information (e.g., the key point is marked as a dot), and feature information. The feature information indicates the category of the key point: if the feature information is that of the eyes, the key point is an eye key point; if it is that of the nose, the key point is a nose key point; and so on.
  • a plurality of key points determined in the target face image may be shown as gray dots in FIG. 1a.
  • A feature region of the target face image can be determined based on the label information of each key point. For example, according to the positions of the gray dots marked in FIG. 1a, the eyebrow region 11, the eye region 12, the nose region 13, the mouth region 14, and the ear region 15 may be respectively determined, as shown in FIG. 1b.
  • an embodiment of the present application proposes a face detection method.
  • the face detection method may be implemented by a terminal device, such as a mobile terminal such as a smart phone or a tablet computer.
  • The method may include the following steps S201-S203:
  • the target face image may be a face image obtained by the terminal invoking a camera device (such as a camera) for real-time shooting of the environment image, or a stored face image obtained by the terminal from a local gallery or a cloud album.
  • the cloud album here refers to a web album based on a cloud computing platform.
  • If the terminal detects a trigger event of face alignment detection, it can obtain a target face image to be detected.
  • The trigger event of face alignment detection here can be a service request.
  • That is, when a service request that requires face alignment detection is detected, the camera device of the terminal device is called to obtain a face image of the requester as the target face image.
  • Applications based on face alignment detection may include, but are not limited to, facial expression recognition applications, face-changing special effects applications, smart mapping applications, and so on.
  • For such applications, the terminal needs to obtain a target face image and perform face alignment detection on it to determine a feature region, so as to perform expression recognition, face-changing special effects, smart mapping, and other operations based on the feature region.
  • The trigger event of face alignment detection may also be an event in which the terminal is detected performing identity verification according to the target face image.
  • When the terminal performs identity verification according to the target face image, it needs to perform face alignment detection on the target face image to determine a feature region, and then perform information matching and other operations based on the determined feature region and preset face information.
  • If the terminal detects that the user sends an instruction to perform face alignment detection, it can obtain the target face image to be detected.
  • The instruction may be a voice instruction, a press/click instruction, an instruction to enable the face alignment detection function, etc.
  • S202 Use the face alignment algorithm and the sample data set to perform hierarchical fitting training to obtain the target face alignment model; call the target face alignment model to perform face alignment detection on the target face image to obtain the target keypoint set of the target face image.
  • The terminal may input the target face image into the target face alignment model, so that the target face alignment model performs face alignment detection on the target face image, thereby obtaining the target keypoint set of the target face image.
  • The target keypoint set here may include multiple target keypoints and label information of each target keypoint.
  • The target keypoint may be any of the following: mouth keypoint, eyebrow keypoint, eye keypoint, nose keypoint, ear keypoint, etc.
  • the label information of the target key point may include position information, shape information, feature information, and the like of the target key point.
  • the target face alignment model is obtained by using the face alignment algorithm and the sample data set for hierarchical fitting training.
  • The face alignment algorithm here may include, but is not limited to: machine learning regression algorithms, such as the SDM (Supervised Descent Method) algorithm and the LBF (Local Binary Features) algorithm; or CNN (Convolutional Neural Network) algorithms, such as the TCDCN (Facial Landmark Detection by Deep Multi-task Learning) algorithm and the 3DDFA (3D Dense Face Alignment) algorithm.
  • Before acquiring the target face image to be detected, the method further includes: acquiring a sample data set, where the sample data set includes a plurality of sample face images and a reference key point set of each sample face image, and the reference key point set of each sample face image includes multiple reference key points and label information of each reference key point; and determining multiple feature regions of each sample face image according to the multiple reference key points and the label information of each reference key point.
  • The feature region includes any of the following: eyebrow region, eye region, nose region, mouth region, ear region; the face alignment algorithm includes a machine learning regression algorithm or a convolutional neural network algorithm.
  • The so-called hierarchical fitting training refers to: determining the training priority of each feature region according to the loss weight of the feature region of each sample face image; and, using the face alignment algorithm, performing fitting training on the feature regions of each sample face image according to the training priority.
  • the detection difficulty of each feature region is different.
  • different loss weights are set for each feature region; the feature regions with larger loss weights have higher priority during training.
  • the face alignment algorithm is used to fit the feature regions.
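  • As an illustration of the priority rule described above, the following minimal Python sketch orders feature regions by loss weight and forms a weighted total loss. The weight values, the function names `training_priority` and `weighted_loss`, and the weighted-sum form of the loss are assumptions made for illustration; the source does not specify them.

```python
def training_priority(loss_weights):
    """Return feature-region names ordered from highest to lowest loss weight."""
    return sorted(loss_weights, key=lambda region: loss_weights[region], reverse=True)


def weighted_loss(region_losses, loss_weights):
    """Sum per-region losses, each scaled by its loss weight (assumed form)."""
    return sum(loss_weights[region] * loss for region, loss in region_losses.items())


# Hypothetical weights: regions assumed harder to detect get larger values.
loss_weights = {"eyebrow": 0.6, "eye": 1.0, "nose": 0.4, "mouth": 0.8, "ear": 0.2}

# Regions with larger loss weights come first in the training order.
order = training_priority(loss_weights)
```

  • In this sketch, a larger weight both moves a region earlier in the training order and makes its errors count more in the total loss, which is one plausible way to realize "higher priority during training".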
  • A difficult sample face image is obtained by filtering the sample data set; iterative training is performed according to the face alignment algorithm and the sample data set, and the result of the iterative training is optimized according to the difficult sample face image to obtain the target face alignment model.
  • the difficult sample face image refers to a sample face image that is difficult to detect and is filtered from the sample data set.
  • The value of the loss function can be used to describe the loss value of the face alignment model under different model parameters.
  • The model parameters can be continuously changed to reduce the value of the loss function, thereby achieving the purpose of model training and optimization.
  • When the value of the loss function meets the preset conditions, the training is complete.
  • The face alignment model obtained at this time is the target face alignment model.
  • The preset conditions here may include, but are not limited to: the value of the loss function falls within a preset value range, or the value of the loss function reaches its minimum.
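  • The stopping rule described above can be sketched with a toy example. The one-parameter quadratic loss, plain gradient descent, the learning rate, and the threshold below are all stand-ins chosen for illustration; they are not the training procedure of the source, which does not name an optimizer.

```python
def train_until(loss_fn, grad_fn, param, lr=0.1, loss_threshold=1e-4, max_steps=1000):
    """Adjust `param` by gradient descent until the loss enters the preset range.

    Returns (param, loss, steps_taken). The "preset range" is modeled here as
    loss <= loss_threshold, which is an assumption.
    """
    for step in range(max_steps):
        loss = loss_fn(param)
        if loss <= loss_threshold:  # preset condition met: training is complete
            return param, loss, step
        param -= lr * grad_fn(param)  # change the parameter to reduce the loss
    return param, loss_fn(param), max_steps


# Toy loss with its minimum at param = 3.0 (hypothetical stand-in for a model loss).
param, loss, steps = train_until(
    loss_fn=lambda p: (p - 3.0) ** 2,
    grad_fn=lambda p: 2.0 * (p - 3.0),
    param=0.0,
)
```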
  • The trained target face alignment model can perform key point detection more accurately on the feature regions with larger loss weights (the feature regions whose key points are difficult to detect). It can be seen that the accuracy of the target face alignment model obtained through hierarchical fitting training is high.
  • S203 Determine a feature region of the target face image according to a target keypoint set.
  • the feature area of the target face image can be determined according to the labeling information of each target keypoint in the target keypoint set.
  • the label information may include: feature information, location information, and the like.
  • a feature region may be determined according to feature information of each target keypoint.
  • the category of each target keypoint may be determined according to the feature information of each target keypoint, and a region formed by target keypoints of the same category is used as a feature region, and the category is used as a category of the feature region. For example, select target keypoints whose feature information is all characteristic information of the nose, and the categories of these target keypoints are nose keypoints; use the area formed by these target keypoints as the nose area.
  • The feature region may also be determined according to the position information of each target keypoint. Specifically, the labeled position of each target keypoint can first be determined according to the position information, and target keypoints at adjacent positions are connected. If the resulting shape is similar to the shape of any one of the facial features (eyebrows, eyes, nose, mouth, ears), the region formed by the target keypoints at these adjacent positions is determined as a feature region, and the category of the feature region is determined according to the shape. For example, if the shape obtained by connecting target keypoints at adjacent positions is similar to the shape of a nose, the region formed by these target keypoints may be determined as the nose region.
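  • The category-based grouping described above can be sketched as follows. Representing each keypoint as a `(category, x, y)` tuple and each feature region as an axis-aligned bounding box are assumptions made for illustration; the source only says that the region is "formed by" keypoints of the same category.

```python
from collections import defaultdict


def feature_regions(keypoints):
    """Group keypoints by category and return one bounding box per category.

    Each keypoint is a (category, x, y) tuple; each box is
    (x_min, y_min, x_max, y_max).
    """
    grouped = defaultdict(list)
    for category, x, y in keypoints:
        grouped[category].append((x, y))

    regions = {}
    for category, points in grouped.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        regions[category] = (min(xs), min(ys), max(xs), max(ys))
    return regions


# Hypothetical keypoints, already labeled with their feature information.
keypoints = [
    ("nose", 48, 50), ("nose", 52, 62), ("nose", 44, 60),
    ("mouth", 40, 75), ("mouth", 60, 78),
]
regions = feature_regions(keypoints)
```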
  • A target face alignment model can be used for face alignment detection. Since the target face alignment model is obtained through hierarchical fitting training, it can accurately perform key point detection on each feature region, improving the accuracy of the detection results. In addition, the target face alignment model has a small memory footprint and a fast running speed, which can improve the efficiency of face alignment detection.
  • FIG. 3 shows a face detection method according to another embodiment of the present application.
  • the face detection method may be implemented by a terminal, such as a mobile terminal such as a smart phone or a tablet computer.
  • this embodiment includes specific steps of hierarchical fitting training.
  • the method may include the following steps S301-S307:
  • the sample data set herein may include multiple sample face images and a reference key point set of each sample face image.
  • The reference key point set of each sample face image includes multiple reference key points and label information of each reference key point.
  • The multiple reference key points and the label information of each reference key point can be used to represent multiple feature regions of each sample face image.
  • the characteristic region here may include any one of the following: eyebrow region, eye region, nose region, mouth region, ear region.
  • a plurality of key points in the reference key point set and label information of each key point can be obtained by pre-labeling each sample face image by a professional labeler.
  • the specific process of iterative training may include the following steps S3021-S3022:
  • S3021 Preprocess the sample data set to obtain multiple training data sets, and each training data set includes multiple pre-processed sample face images.
  • the terminal may use different amplification parameters to preprocess the sample data set, and the preprocessing may include amplification processing and normalization processing to obtain multiple training data sets.
  • the plurality of training data sets may include a first training data set, and the first training data set may be any one of the plurality of training data sets.
  • the specific implementation of preprocessing the sample data set to obtain multiple training data sets may be:
  • a first amplification parameter is obtained, and a sample data set is amplified according to the first amplification parameter to obtain a first amplification data set.
  • the obtained first amplified data set may include a plurality of sample face images after the amplification process.
  • The amplification processing here includes at least one of the following: displacement processing, rotation processing, mirror processing, and compression processing; the corresponding amplification parameters include at least one of the following: displacement parameters, rotation angle parameters, and compression ratio parameters.
  • the displacement process refers to a process of changing a position of a face portion in a sample face image.
  • The sample face image can be subjected to displacement processing using the formula shown in Equation 1.1:

$$\mathrm{Rect}(x, y, w, h) \rightarrow \mathrm{Rect}(x + dx,\ y + dy,\ w,\ h) \tag{1.1}$$

  • Rect is used to store the parameters that appear in pairs.
  • Rect(x, y, w, h) represents the initial coordinates of the sample face image, where x is the abscissa, y is the ordinate, w is the width of the sample face image, and h is the height of the sample face image.
  • Rect(x + dx, y + dy, w, h) represents the coordinates after the sample face image is subjected to displacement processing, where dx is the change along the abscissa and dy is the change along the ordinate.
  • dx and dy can be used as displacement parameters.
  • The initial coordinates of the sample face image may refer to the coordinates of the upper left corner, the upper right corner, or the center point of the sample face image, among others; this is not limited here. Taking the initial coordinates as the coordinates of the center point of the sample face image as an example, a schematic diagram of the displacement process can be seen in FIG. 4a.
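  • The displacement step can be sketched in a few lines of Python. The `Rect` type mirrors the Rect(x, y, w, h) notation of the text; the helper `displace_keypoints`, which shifts reference key points by the same offset so that labels stay aligned with the moved face, is an assumption not stated in the source.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # abscissa of the reference point
    y: float  # ordinate of the reference point
    w: float  # width of the sample face image
    h: float  # height of the sample face image


def displace(rect, dx, dy):
    """Shift the rectangle by (dx, dy); width and height are unchanged."""
    return Rect(rect.x + dx, rect.y + dy, rect.w, rect.h)


def displace_keypoints(points, dx, dy):
    """Shift (x, y) key points by the same offset (assumed, to keep labels aligned)."""
    return [(px + dx, py + dy) for px, py in points]


# Hypothetical displacement parameters dx = 5, dy = -3.
moved = displace(Rect(10, 20, 100, 120), dx=5, dy=-3)
moved_points = displace_keypoints([(30, 40), (60, 45)], dx=5, dy=-3)
```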
  • Rotation processing refers to rotating the sample face image about its center point, taken as the origin, by a rotation angle θ, either clockwise (θ is positive) or counterclockwise (θ is negative).
  • θ can be used as the rotation angle parameter.
  • Let the coordinates of the center point of the sample face image be (x, y); any pixel $(x_0, y_0)$ in the sample face image can be rotated by using the rotation transformation matrix shown in Equation 1.2.
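  • Equation 1.2 is not reproduced in this text, so the following is only a sketch of a standard 2-D rotation about a center point. It assumes image coordinates with the y axis pointing down, under which a positive angle corresponds to a visually clockwise rotation; the function name `rotate_point` and the degree-based parameter are illustrative choices.

```python
import math


def rotate_point(x0, y0, cx, cy, theta_degrees):
    """Rotate pixel (x0, y0) about center (cx, cy) by theta_degrees.

    Assumes image coordinates (y grows downward), so a positive angle
    appears clockwise on screen and a negative angle counterclockwise.
    """
    theta = math.radians(theta_degrees)
    dx, dy = x0 - cx, y0 - cy
    x1 = cx + dx * math.cos(theta) - dy * math.sin(theta)
    y1 = cy + dx * math.sin(theta) + dy * math.cos(theta)
    return x1, y1


# A point to the right of the center, rotated a quarter turn each way.
x1, y1 = rotate_point(60, 50, 50, 50, 90)
x2, y2 = rotate_point(60, 50, 50, 50, -90)
```

  • With these conventions, the point to the right of the center moves below it for θ = 90 and above it for θ = -90, matching the clockwise-positive convention of the text.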
  • Mirroring processing can include horizontal mirroring processing and vertical mirroring processing. Horizontal mirroring processing refers to swapping the left and right parts of the sample face image about the vertical central axis of the sample face image; vertical mirroring processing refers to swapping the upper and lower parts of the sample face image about the horizontal central axis of the sample face image.
  • Take horizontal mirroring of the sample face image as an example. Specifically, any pixel $(x_0, y_0)$ in the sample face image can be horizontally mirrored using the formula shown in Equation 1.3. The coordinates of the pixel after horizontal mirroring are $(x_1, y_1)$, and w in Equation 1.3 is the width of the sample face image. A schematic diagram of the horizontal mirroring process can be seen in FIG. 4c.
  • When the sample face image is mirrored, it may instead be vertically mirrored, or it may be subjected to both horizontal mirroring and vertical mirroring.
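  • Equation 1.3 is not reproduced in this text. The sketch below assumes the continuous-coordinate form of mirroring, in which the mirrored abscissa is the width minus the original abscissa; for integer pixel indices starting at 0, implementations commonly use `w - 1 - x0` instead. The function names are illustrative.

```python
def mirror_horizontal(x0, y0, w):
    """Reflect a point about the vertical central axis of an image of width w."""
    return w - x0, y0


def mirror_vertical(x0, y0, h):
    """Reflect a point about the horizontal central axis of an image of height h."""
    return x0, h - y0


# Hypothetical point in a 100 x 120 image.
hx, hy = mirror_horizontal(30, 40, 100)
vx, vy = mirror_vertical(30, 40, 120)
# Applying both reflections gives the combined horizontal + vertical mirror.
bx, by = mirror_vertical(*mirror_horizontal(30, 40, 100), 120)
```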
  • Compression processing refers to saving the sample face image according to a specified image quality parameter when the sample face image is saved in an image format.
  • The specified image quality parameter can be chosen from a preset quality parameter range, which can be [0, 100%]. The higher the image quality parameter, the higher the sharpness of the saved sample face image.
  • The image quality parameter here can be used as the compression ratio parameter. Taking an image quality parameter of 85% as an example, a schematic diagram of compressing the sample face image can be seen in FIG. 4d.
  • the sample data set and the first amplified data set may be combined to obtain a combined data set.
  • The plurality of amplified sample face images in the first amplified data set may be obtained by sequentially performing the above-mentioned displacement processing, rotation processing, mirror processing, and compression processing on the sample face images in the sample data set.
  • Alternatively, they may be obtained by performing only part of the foregoing amplification processing on the sample face images in the sample data set, for example, sample face images obtained after displacement processing only, after rotation processing only, or after displacement processing and compression processing only.
  • the normalization processing includes: image normalization processing and / or annotation information normalization processing.
  • The image normalization processing refers to converting a sample face image to floating point and normalizing it by removing the mean (de-centering).
  • the data type of the sample face image needs to be transformed first, and the data type is changed to a floating point type, so as to facilitate normalization processing on the sample face image.
  • an image is usually composed of multiple image channels, for example, a JPG image is usually composed of three RGB (Red Green Blue) image channels.
  • When normalizing the sample face image, for any one image channel CO of the sample face image, the mean m and the variance d of all pixel values of the image channel can be obtained, and then the formula shown in Equation 1.4 is used to normalize the value $CO_i$ of pixel i of the image channel to obtain the new image channel value $C_i$.
  • each pixel value in the normalized sample face image can be within a preset interval to improve the stability of the sample face image and the subsequent model training.
  • the preset interval can be determined according to actual business requirements, such as [0,1].
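  • A minimal Python sketch of the per-channel normalization described above. Equation 1.4 is not reproduced in this text, so the form C_i = (CO_i - m) / d is an assumption, and the name `normalize_channel` and the `use_std` switch are hypothetical. The text calls d the variance; dividing by the standard deviation instead is a common alternative and is offered as an option.

```python
from statistics import fmean, pvariance

def normalize_channel(channel, use_std=False):
    """De-center one image channel and scale it by its spread.

    `channel` is a flat sequence of pixel values. The values are first
    converted to float, as the text requires before normalization.
    """
    values = [float(v) for v in channel]
    m = fmean(values)                 # mean of all pixel values
    d = pvariance(values, mu=m)       # population variance, per the text
    if use_std:
        d = d ** 0.5                  # assumption: scale by std deviation instead
    if d == 0:
        # A constant channel has no spread; return the de-centered zeros.
        return [0.0 for _ in values]
    return [(v - m) / d for v in values]
```

  • For a three-channel image, the function would be called once per channel, each channel using its own m and d.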
  • the labeling information normalization processing refers to normalizing the position information in the labeling information of each reference key point in the sample face image. Specifically, the position information (coordinates) of each reference key point may be normalized by using the formula shown in Formula 1.5.
  • Here, (x, y) represents the coordinates of any reference key point in the sample face image, w is the width of the sample face image, and h is the height of the sample face image.
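  • A minimal Python sketch of the annotation normalization. Formula 1.5 is not reproduced in this text; dividing x by the width w and y by the height h is an assumption based on the symbol definitions, and the function names are hypothetical.

```python
def normalize_keypoints(points, width, height):
    """Map pixel coordinates (x, y) to relative coordinates (x / w, y / h)."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    return [(x / width, y / height) for x, y in points]

def denormalize_keypoints(points, width, height):
    """Inverse mapping, useful for drawing predictions on the original image."""
    return [(x * width, y * height) for x, y in points]
```

  • Under this assumption, key points that lie inside the image map into [0, 1] regardless of image resolution, which keeps the regression targets on a common scale.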
  • Iterative training is performed using a face alignment algorithm and multiple training data sets to obtain a first face alignment model.
  • the multiple training data sets obtained through step s11 may be collectively referred to as the first training data set, or may be further divided into a second training data set and a third training data set.
  • During iterative training, the second training data set is selected before the third training data set.
  • the amplification parameters corresponding to the second training data set are greater than the amplification parameters corresponding to the third training data set.
  • iterative training using a face alignment algorithm and multiple training data sets to obtain a first face alignment model may be:
  • the face alignment algorithm and the first training data set can be used for training to obtain the initial face alignment model.
  • the face alignment algorithm may include, but is not limited to, a machine learning regression algorithm or a CNN algorithm.
  • A face alignment algorithm can be used to construct an original model, and the first training data set is used to optimize the original model to obtain an initial face alignment model. The initial face alignment model can then be further trained and optimized using the second training data set, the third training data set, or even more training data sets, where different training data sets use different amplification parameters.
  • The training optimization of the original model can be realized with a supervised machine learning optimization algorithm. That is, for each sample face image in the first training data set, the known reference key points are compared with the detection key points that the original model produces, and the position difference between them is measured. If the difference is large, the model parameters of the original model are adjusted until the difference between the detection key points and the reference key points is minimized or falls below a preset threshold, at which point the initial face alignment model is obtained.
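  • The adjust-until-below-threshold loop can be illustrated with a toy Python example. This is not the model of the application: the "model" here has only two parameters, a global offset (bx, by) added to a fixed base shape, and plain gradient descent stands in for whatever supervised optimizer is actually used. All names and default values are hypothetical.

```python
def mean_squared_error(pred, ref):
    """Mean squared position difference between detected and reference points."""
    return sum((px - rx) ** 2 + (py - ry) ** 2
               for (px, py), (rx, ry) in zip(pred, ref)) / len(ref)

def fit_offset(base_shape, reference, lr=0.1, threshold=1e-6, max_steps=1000):
    """Adjust the offset until the difference falls below `threshold`."""
    bx = by = 0.0
    loss = float("inf")
    n = len(reference)
    for _ in range(max_steps):
        pred = [(x + bx, y + by) for x, y in base_shape]
        loss = mean_squared_error(pred, reference)
        if loss < threshold:
            break  # difference is below the preset threshold
        # Gradient of the mean squared error with respect to the offset.
        gx = sum(2 * (px - rx) for (px, _), (rx, _) in zip(pred, reference)) / n
        gy = sum(2 * (py - ry) for (_, py), (_, ry) in zip(pred, reference)) / n
        bx -= lr * gx
        by -= lr * gy
    return (bx, by), loss
```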
  • the loss function of the initial face alignment model can be set according to the hierarchical fitting rule.
  • the hierarchical fitting rule may be a rule set based on at least one feature region and loss weights of each feature region.
  • the loss weight is positively related to the fitting training sequence, and the feature region with a larger loss weight is preferentially fitted for training.
  • a plurality of feature regions for representing each sample face image may be determined according to multiple reference key points in the sample face image and label information of each reference key point.
  • A schematic diagram of the region division is shown in Figure 5. It should be understood that the number of reference key points is only an example and is not limited to 51; it can also be 48, 86, and so on.
  • a hierarchical fitting rule is determined according to the set loss weight.
  • The hierarchical fitting rule can specify that the loss weight is positively related to the fitting training order, so that a feature region with a larger loss weight is fitted first during training.
  • The loss function shown in Equation 1.6 can be set according to the hierarchical fitting rule.
  • Here, x_j and y_j represent the labeled coordinates of each reference key point,
  • x'_j and y'_j represent the coordinates of each detection key point, and
  • ω_j represents the loss weight of each reference key point.
  • The value of ω_j can be determined from the loss weight of the feature region to which the reference key point belongs. For example, if the loss weight of the mouth region in the hierarchical fitting rule is 0.6, then the loss weight of every key point in the mouth region is 0.6.
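  • A minimal Python sketch of a region-weighted key point loss. Equation 1.6 is not reproduced in this text; the weighted sum of squared coordinate differences used here, sum over j of ω_j * ((x_j - x'_j)^2 + (y_j - y'_j)^2), is an assumption consistent with the symbol definitions. Only the mouth weight of 0.6 comes from the text; the other weights and all names are hypothetical.

```python
# Loss weight per feature region. Only "mouth": 0.6 is taken from the text;
# the remaining values are illustrative placeholders.
REGION_WEIGHTS = {"mouth": 0.6, "eye": 0.3, "nose": 0.1}

def weighted_keypoint_loss(reference, detected, regions, region_weights):
    """Sum of squared position errors, each scaled by its region's loss weight.

    `reference` and `detected` are equal-length lists of (x, y) pairs, and
    `regions[j]` names the feature region that key point j belongs to.
    """
    total = 0.0
    for (x, y), (xd, yd), region in zip(reference, detected, regions):
        w = region_weights[region]   # every key point inherits its region's weight
        total += w * ((x - xd) ** 2 + (y - yd) ** 2)
    return total
```

  • With these weights, an error of one pixel on a mouth key point costs six times as much as the same error on a nose key point, so reducing the loss pushes the optimizer to fit the mouth region first.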
  • the second training data set and the third training data set are sequentially selected to train the initial face alignment model to obtain the first face alignment model.
  • the second training data set is first used to train the initial face alignment model to obtain an intermediate face alignment model.
  • A value of the loss function can be obtained using Formula 1.6 above.
  • The model parameters of the initial face alignment model are then adjusted so that, the next time face key point detection is performed on the target sample face image, the value of the loss function becomes smaller. Face key point detection, calculation of the loss function value, and adjustment of the model parameters are repeated in this way for all sample face images in the second training data set to obtain an intermediate face alignment model.
  • The third training data set is used to train the intermediate face alignment model to obtain the first face alignment model.
  • For the process of training the intermediate face alignment model to obtain the first face alignment model, reference may be made to the above description of the training process from the initial face alignment model to the intermediate face alignment model.
  • Training first with the second training data set, which has the larger amplification parameter, lets the face alignment model adapt to more complex face images first; training next with the third training data set, which has the smaller amplification parameter, lets the model adapt to simpler face images.
  • This complex-to-simple training process can improve model training efficiency.
  • Because each key point has a loss weight in the loss function, key points with larger loss weights can be given priority during training. For example, if the mouth region has the largest loss weight among the feature regions, the key points of the mouth region can be prioritized and focused on in each round of training.
  • Alternatively, the third training data set is first used to train the initial face alignment model to obtain the intermediate face alignment model, and the second training data set is then used to train the intermediate face alignment model to obtain the first face alignment model.
  • The multiple training data sets may include a second training data set, a third training data set, a fourth training data set, and a fifth training data set. Iterative training is performed using the multiple training data sets to obtain a first face alignment model.
  • The amplification parameters corresponding to the multiple training data sets, in descending order, are: amplification parameter of the second training data set > amplification parameter of the third training data set > amplification parameter of the fourth training data set > amplification parameter of the fifth training data set.
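  • The staged schedule described above, in which the data set with the largest amplification parameter is used first, can be sketched in Python as follows. The dictionary layout, the `amplification` key, and the `train_one_stage` callback are hypothetical; the callback stands in for one full round of training on one data set.

```python
def order_by_amplification(datasets):
    """Return the training data sets sorted from largest to smallest
    amplification parameter, i.e. from hardest to easiest."""
    return sorted(datasets, key=lambda d: d["amplification"], reverse=True)

def train_in_stages(model, datasets, train_one_stage):
    """Train on each data set in turn, passing the model from stage to stage."""
    for dataset in order_by_amplification(datasets):
        model = train_one_stage(model, dataset)
    return model
```

  • A usage example with a stand-in callback that only records the order in which the data sets are visited:

```python
sets = [
    {"name": "third", "amplification": 0.2},
    {"name": "second", "amplification": 0.5},
    {"name": "fourth", "amplification": 0.1},
]
visited = train_in_stages([], sets, lambda model, ds: model + [ds["name"]])
```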
  • Tests show that after sequentially training and optimizing the model based on the training data set obtained using different amplification parameters, the final target face alignment model can be used to more accurately detect key points of the face, and it is more robust.
  • In step S303, difficult sample face images are obtained by filtering the sample data set.
  • The screening process includes the following steps S3031 and S3032.
  • the detection key point set includes multiple detection key points and label information of each detection key point.
  • For each sample face image, the difference between the reference key point set and the detection key point set can be computed; sample face images whose difference is greater than a preset threshold are selected from the sample data set and determined to be difficult sample face images.
  • The preset threshold can be determined according to the business needs of the target face alignment model: if the accuracy requirement of the target face alignment model is high, the preset threshold can take a smaller value; if the accuracy requirement is low, the preset threshold can take a larger value.
  • The Euclidean distance formula shown in Equation 1.7 may be used to compute the difference between the reference key point set and the detection key point set of each sample face image.
  • Here, p_i is any reference key point in the sample face image,
  • q_i is any detection key point in the sample face image, and
  • d(p, q) is the difference between the reference key point set and the detection key point set.
  • In Euclidean space, d(p, q) = d(q, p).
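  • A minimal Python sketch of the difficult sample screening with a Euclidean difference. Equation 1.7 is not reproduced in this text; the standard form d(p, q) = sqrt(sum over i of (q_i - p_i)^2), taken over all coordinates of all key points, is an assumption. The sample dictionary keys and function names are hypothetical.

```python
import math

def keypoint_set_distance(reference, detected):
    """Euclidean distance between two equal-length lists of (x, y) key points."""
    return math.sqrt(sum((qx - px) ** 2 + (qy - py) ** 2
                         for (px, py), (qx, qy) in zip(reference, detected)))

def select_hard_samples(samples, threshold):
    """Return the names of samples whose difference exceeds `threshold`.

    Each sample is a dict with hypothetical keys "name", "reference"
    (labeled key points), and "detected" (model output).
    """
    return [s["name"] for s in samples
            if keypoint_set_distance(s["reference"], s["detected"]) > threshold]
```

  • Lowering `threshold` marks more images as difficult, which matches the note above that a stricter accuracy requirement calls for a smaller preset threshold.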
  • Alternatively, cosine similarity can be used to compute the difference between the reference key point set and the detection key point set of each sample face image.
  • The coordinates of each reference key point in the reference key point set can be represented as vectors to obtain a reference vector set, and the coordinates of each detection key point in the detection key point set can be represented as vectors to obtain a detection vector set. A cosine similarity formula is then used to calculate the difference between the reference vector set and the detection vector set, thereby determining the difference between the reference key point set and the detection key point set of each sample face image.
  • The difference between the reference key point set and the detection key point set of each sample face image can also be computed using Manhattan distance, Hamming distance, Chebyshev distance, and the like.
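  • The alternative measures can be sketched in Python as follows. Flattening each key point set into a single coordinate vector and reporting 1 minus the cosine similarity as the "difference" are assumptions made for illustration; the text does not specify how the similarity is turned into a difference. Hamming distance is omitted because it applies to discrete symbols rather than real-valued coordinates.

```python
import math

def flatten(points):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [c for point in points for c in point]

def cosine_difference(reference, detected):
    """1 - cosine similarity of the two flattened coordinate vectors."""
    a, b = flatten(reference), flatten(detected)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def manhattan_difference(reference, detected):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(flatten(reference), flatten(detected)))

def chebyshev_difference(reference, detected):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(flatten(reference), flatten(detected)))
```

  • Cosine difference ignores overall scale, so a uniformly enlarged prediction scores as identical; the distance-based measures do not have that property.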
  • The difficult sample face images can first be amplified, for example by displacement processing, rotation processing, mirror processing, and compression processing. The difficult sample face images and the amplified difficult sample face images can then be normalized, for example by image normalization processing and annotation information normalization processing, to obtain a difficult training data set. Then, following the principle of reducing the value of the loss function, the difficult training data set can be used to optimize the first face alignment model to obtain the target face alignment model. That is, after the difficult training data set is obtained, the first face alignment model can be further optimized based on it.
  • The optimization of the first face alignment model mainly optimizes the model parameters in the first face alignment model according to the value of a loss function; for the process of optimization based on the value of the loss function, reference may be made to the aforementioned Formula 1.6 and its related description.
  • The model parameters of the first face alignment model can be continuously adjusted to reduce the value of its loss function until that value satisfies a preset condition, thereby optimizing the first face alignment model.
  • The target face alignment model trained through the above steps S302-S304 runs faster and occupies less memory, which can reduce the difficulty of deployment on a mobile terminal and improve the detection accuracy of key points.
  • S306 Invoke a target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image.
  • the target keypoint set includes a plurality of target keypoints and label information of each target keypoint.
  • For steps S305-S307, reference may be made to steps S201-S203 in the foregoing embodiments of the present application, which are not described again in this embodiment.
  • A target face alignment model can be used for face alignment detection. Since the target face alignment model is obtained through hierarchical fitting training, it can accurately perform key point detection on each feature region, improving the accuracy of the detection results. In addition, the target face alignment model occupies little memory and runs fast, which can improve the efficiency of face alignment detection.
  • an embodiment of the present application further provides a service processing method.
  • the service processing method can be implemented by a terminal device, such as a mobile terminal such as a smart phone or a tablet computer. It can include the following steps S601-S603:
  • the service request can be automatically generated by the terminal.
  • For example, the terminal can automatically generate a service request when it detects that the user has turned on the face alignment detection function of the terminal, or when it detects that the user is using an application based on face alignment detection.
  • For example, the service request corresponding to a smart map application is a smart map request, the service request corresponding to a face recognition application is an authentication request, the service request corresponding to a face-changing special effects application is a face-changing special effects processing request, and so on.
  • the terminal's camera device (such as a camera) can be called to take a picture of the requester to obtain the target face image of the requester.
  • a stored face image obtained from a local gallery or a cloud album may also be used as a target face image.
  • the face image displayed on the terminal screen is used as the target face image.
  • the terminal may parse the service request to determine the requested service corresponding to the service request.
  • The requested service here may include, but is not limited to, any one or more of a face recognition service, an expression recognition service, an age analysis service, a face-changing special effects service, and a smart map service.
  • S602 Perform face alignment detection on the target face image by using a face detection method to obtain a feature region of the target face image.
  • the face detection method may correspond to the face detection method described in the embodiment shown in FIG. 2 or FIG. 3 above.
  • The target face alignment model mentioned in the above method embodiments may be used to perform face alignment detection to obtain feature regions of the target face image, such as the mouth region, eyebrow region, eye region, nose region, ear region, and so on.
  • S603. Process the requested service according to the feature area of the target face image to respond to the service request.
  • the requested service may be processed according to the feature area to respond to the service request.
  • If the requested service is a face-changing special effects service, information such as the position and size of one or more key points in each feature region may be transformed to change the face shape in the target face image.
  • For example, if the face-changing special effects service enlarges the eyes and shrinks the nose, the position, size, and other information of multiple key points in the eye region can be transformed to enlarge the eye region, and the position, size, and other information of multiple key points in the nose region can be transformed to shrink the nose region, thereby completing the face-changing special effects service, as shown in FIG. 7.
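  • One simple way to enlarge or shrink a feature region is to scale its key points about their centroid. The Python sketch below shows only that coordinate transform; the function name is hypothetical, and warping the actual pixels to follow the moved key points is a separate step that is not shown.

```python
def scale_region(points, factor):
    """Scale a region's key points about their centroid.

    factor > 1 enlarges the region (for example, bigger eyes);
    factor < 1 shrinks it (for example, a smaller nose).
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in points]
```

  • Scaling about the centroid keeps the region centered where it was, so the enlarged eye or reduced nose stays in place on the face.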
  • Each map in the target map template is added to the corresponding feature region to obtain a target face image after smart map processing.
  • For example, if the target map template is a map template of a dog image, maps such as "dog ear", "dog nose", and "dog mouth" in the template can be added to the corresponding feature regions.
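  • The mapping from template maps to feature regions can be sketched in Python by computing a bounding box for each region and assigning the matching map to it. The dictionary layouts and function names are hypothetical, and drawing the map image into the box is left to an image library.

```python
def bounding_box(points):
    """Axis-aligned box around a region, as (x_min, y_min, width, height)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

def place_maps(feature_regions, template):
    """Pair each map in `template` with the box of its feature region.

    `feature_regions` maps a region name to its key points, and
    `template` maps a region name to the map to draw there.
    Regions that were not detected are skipped.
    """
    placements = {}
    for region_name, map_name in template.items():
        if region_name in feature_regions:
            placements[map_name] = bounding_box(feature_regions[region_name])
    return placements
```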
  • a face detection method may be used to perform face alignment detection to obtain a characteristic region of the target face image, and the requested service is processed according to the characteristic region in response to the service request. Because the target face alignment model used in this face detection method is obtained through hierarchical fitting training, keypoint detection can be performed on each feature region more accurately, thereby improving the accuracy of business processing results.
  • An embodiment of the present application further provides a face detection device, whose schematic structural diagram is shown in FIG. 9 and which can perform the methods shown in FIG. 2 and FIG. 3.
  • the face detection device in the embodiment of the present application may include:
  • the obtaining unit 101 is configured to obtain a target face image to be detected.
  • the training unit 102 is configured to perform hierarchical fitting training by using a face alignment algorithm and a sample data set to obtain a target face alignment model;
  • the detecting unit 103 is configured to call the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image.
  • a determining unit 104 is configured to determine a feature region of the target face image according to the target keypoint set.
  • The obtaining unit 101 may be further configured to obtain the sample data set, where the sample data set includes multiple sample face images and a reference key point set of each sample face image, and the reference key point set of each sample face image includes a plurality of reference key points and label information of each reference key point; and to determine, according to the multiple reference key points and the label information of each reference key point, a plurality of feature regions representing each sample face image;
  • The training unit 102 is specifically configured to: determine the training priority of each feature region according to the loss weight of the feature region of each sample face image; and use the face alignment algorithm to perform fitting training on the feature regions of each sample face image according to the training priority.
  • the feature region includes any of the following: eyebrow region, eye region, nose region, mouth region, ear region; and the face alignment algorithm includes a machine learning regression algorithm or a convolutional neural network algorithm.
  • The obtaining unit 101 is specifically configured to: when detecting that a user is using an application based on face alignment detection, detect whether a service request that requires face alignment detection has been issued; and when the service request is detected, call the imaging device of the terminal device to acquire the face image of the requester as the target face image.
  • The sample data set includes multiple sample face images, and the training unit 102 is specifically configured to: perform iterative training according to the face alignment algorithm and the sample data set; obtain difficult sample face images by filtering the sample data set; and optimize the result of the iterative training according to the difficult sample face images to obtain the target face alignment model.
  • The sample data set further includes a reference key point set of each sample face image, and the training unit 102 is specifically configured to: preprocess the sample data set to obtain multiple training data sets, each of which includes multiple pre-processed sample face images; perform iterative training using the face alignment algorithm and the multiple training data sets to obtain a first face alignment model; call the first face alignment model to perform face alignment detection on the sample data set to obtain the detection key point set of each sample face image in the sample data set; filter the difficult sample face images from the sample data set according to the difference between the reference key point set and the detection key point set; and optimize the first face alignment model by using the difficult sample face images to obtain the target face alignment model.
  • The plurality of training data sets includes a first training data set, which is any one of the plurality of training data sets; the training unit 102 may be specifically configured to: obtain a first amplification parameter, and perform amplification processing on the sample data set according to the first amplification parameter to obtain a first amplified data set, which includes a plurality of amplified sample face images; merge the sample data set with the first amplified data set; and perform normalization processing on the merged data set to obtain the first training data set.
  • The plurality of training data sets includes a second training data set and a third training data set, and the second training data set is selected before the third training data set during iterative training; the training unit 102 may be specifically configured to: use the face alignment algorithm and the first training data set for training to obtain an initial face alignment model; set a loss function of the initial face alignment model according to a hierarchical fitting rule; and, following the principle of reducing the value of the loss function, sequentially select the second training data set and the third training data set to train the initial face alignment model to obtain the first face alignment model.
  • The amplification parameter corresponding to the second training data set is greater than the amplification parameter corresponding to the third training data set; the amplification parameter includes at least one of the following: a displacement parameter, a rotation angle parameter, and a compression ratio parameter.
  • The reference key point set of each sample face image includes multiple reference key points and label information of each reference key point, and the training unit 102 is specifically configured to: determine, according to the multiple reference key points and the label information of each reference key point, multiple feature regions used to represent each sample face image; set a different loss weight for each feature region based on the detection difficulty of that feature region; and set the hierarchical fitting rule based on at least one feature region and the loss weight of each feature region, wherein a feature region with a larger loss weight is fitted with higher priority during training.
  • The training unit 102 is specifically configured to: use the second training data set to train the initial face alignment model to obtain an intermediate face alignment model; and use the third training data set to train the intermediate face alignment model to obtain the first face alignment model.
  • The training unit 102 is specifically configured to: perform amplification processing on the difficult sample face images; perform normalization processing on the difficult sample face images and the amplified difficult sample face images to obtain a difficult training data set; and use the difficult training data set to optimize the first face alignment model to obtain the target face alignment model.
  • The training unit 102 is specifically configured to: for each sample face image, compute the difference between the reference keypoint set and the detection keypoint set; and select, from the sample data set, the sample face images whose difference is greater than a preset threshold and determine them to be the difficult sample face images.
  • The target keypoint set includes multiple target keypoints and label information of each target keypoint, and the training unit 102 is specifically configured to determine the feature region of the target face image according to the label information of each target keypoint.
  • The labeling information includes feature information; the training unit 102 is specifically configured to: determine the category of each target keypoint according to the feature information of each target keypoint, use the area formed by the target keypoints of the same category as a feature region, and use the category as the category of the feature region.
  • The labeling information includes position information; the training unit 102 is specifically configured to: determine the labeled position of each target keypoint according to the position information, and connect target keypoints at adjacent positions; if the shape obtained by the connection is similar to the shape of any of the facial features of a human face, determine the area formed by these adjacent target keypoints as a feature region, and determine the category of the feature region according to the shape.
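  • The category-based grouping, in which key points of the same category form one feature region, can be sketched in Python as follows. The key point dictionary layout with `category`, `x`, and `y` keys is a hypothetical representation of the label information; the shape-matching variant based on position information is not shown.

```python
from collections import defaultdict

def group_by_category(keypoints):
    """Group key points into feature regions by their category label.

    Each key point is a dict with hypothetical keys "category", "x", and "y".
    The result maps each category (for example, "mouth") to the list of
    (x, y) points that make up that feature region.
    """
    regions = defaultdict(list)
    for kp in keypoints:
        regions[kp["category"]].append((kp["x"], kp["y"]))
    return dict(regions)
```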
  • A target face alignment model can be used for face alignment detection. Since the target face alignment model is obtained through hierarchical fitting training, it can accurately perform key point detection on each feature region, improving the accuracy of the detection results. In addition, the target face alignment model occupies little memory and runs fast, which can improve the efficiency of face alignment detection.
  • an embodiment of the present application further provides a schematic structural diagram of a service processing apparatus shown in FIG. 10, and the service processing apparatus may execute the method shown in FIG. 6.
  • the service processing apparatus in the embodiment of the present application may include:
  • An obtaining unit 201 is configured to call a camera device to obtain a target face image of a requester when a service request requiring face alignment detection is detected;
  • a detection unit 202 is configured to perform face alignment detection on the target face image by using the face detection method described in FIG. 2 or FIG. 3 to obtain a characteristic region of the target face image;
  • the processing unit 203 is configured to process the requested service according to a characteristic region of the target face image to respond to the service request.
  • a face detection method may be used to perform face alignment detection to obtain a characteristic region of the target face image, and the requested service is processed according to the characteristic region in response to the service request. Because the target face alignment model used in this face detection method is obtained through hierarchical fitting training, keypoint detection can be performed on each feature region more accurately, thereby improving the accuracy of business processing results.
  • an embodiment of the present application further provides a terminal.
  • the internal structure of the terminal includes at least a processor 301, an input device 302, an output device 303, and a memory 304.
  • the processor 301, the input device 302, the output device 303, and the memory 304 in the terminal may be connected through a bus or other methods.
  • the connection through the bus 305 is taken as an example.
  • The memory 304 may be used to store a computer program that includes a first program instruction and/or a second program instruction, and the processor 301 is configured to execute the first program instruction stored in the memory 304 to implement the face detection method shown in FIG. 2 or FIG. 3.
  • the processor 301 may be further configured to execute a second program instruction stored in the memory 304 to implement the service processing method shown in FIG. 6.
  • the processor 301 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, that is, a microprocessor or any conventional processor.
  • the memory 304 may include a read-only memory and a random access memory, and provide instructions and data to the processor 301. Therefore, the processor 301 and the memory 304 are not limited herein.
  • An embodiment of the present application further provides a computer storage medium (Memory), where the computer storage medium is a memory device in a terminal and is used to store programs and data.
  • the computer storage medium herein may include a built-in storage medium in the terminal, and of course, an extended storage medium supported by the terminal.
  • the computer storage medium provides a storage space that stores an operating system of the terminal.
  • computer program instructions suitable for being loaded and executed by the processor 301 are also stored in the storage space, and these instructions may be one or more computer programs (including program code).
  • The computer storage medium here may be a high-speed RAM memory or a non-volatile memory, for example, at least one disk memory; optionally, it may also be at least one computer storage medium located far away from the foregoing processor.
  • The first computer program instructions stored in the computer storage medium may be loaded and executed by the processor 301 to implement the corresponding steps of the method in the above-mentioned face detection embodiment; in specific implementation, the first computer program instructions are loaded by the processor 301 to execute the following steps:
  • the face alignment algorithm and sample data set are used for hierarchical fitting training to obtain the target face alignment model
  • a feature region of the target face image is determined according to the target keypoint set.
  • The processor 301 can load and execute the second computer program instructions stored in the computer storage medium to implement the corresponding steps of the method in the foregoing service processing embodiment; in specific implementation, the second computer program instructions in the computer storage medium are loaded by the processor 301 to execute the following steps:
  • When a service request requiring face alignment detection is detected, a camera device is called to obtain a target face image of the requester;
  • FIG. 12 is a schematic structural diagram of an implementation environment provided by an embodiment of the present application.
  • the face detection system 100 includes a user 101 and a terminal device 102.
  • the terminal device 102 includes a camera device 1021, an application program 1022, a face detection device 1023, and an operation button 1024.
  • the application program 1022 has a requirement for face alignment detection, for example, an expression recognition application program, a program for changing face special effects, a smart map application program, and an identity verification application program.
  • When the terminal device 102 detects that the user 101 is using the application program 1022 based on face alignment detection, as shown by an arrow 1031, it detects whether the user 101 has issued a service request that requires face alignment detection.
  • the terminal device 102 calls the camera device 1021 to obtain a face image of the requester (such as the user 101 or other users other than the user 101) as a target face image, as shown by an arrow 1032.
  • the face detection device 1023 uses the face alignment algorithm and the sample data set for hierarchical fitting training to obtain the target face alignment model; the target face alignment model is called to perform face alignment detection on the target face image to obtain the target face A target keypoint set of an image, and a feature region of the target face image is determined according to the target keypoint set, for example, each feature region shown in FIG. 1b.
  • a face detection method may be used to perform face alignment detection to obtain a characteristic region of the target face image, and the requested service is processed according to the characteristic region in response to the service request. Because the target face alignment model used in this face detection method is obtained through hierarchical fitting training, keypoint detection can be performed on each feature region more accurately, thereby improving the accuracy of business processing results.
  • the program can be stored in a computer-readable storage medium.
  • the program When executed, the processes of the embodiments of the methods described above may be included.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random, Access Memory, RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

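A face detection method and apparatus, a service processing method, a terminal device, and a storage medium. The method includes: obtaining a target face image to be detected (S201); performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model; invoking the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image (S202); and determining a feature region of the target face image according to the target keypoint set (S203).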

Description

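Face detection method and apparatus, service processing method, terminal device, and storage medium
This application claims priority to Chinese Patent Application No. 201811165758.5, filed with the China Patent Office on September 30, 2018 and entitled "Face detection method, service processing method, apparatus, terminal, and medium".
Technical Field
This application relates to the field of image processing technologies, and in particular to a face detection method and apparatus, a service processing method, a terminal device, and a storage medium.
Background
Image processing is a technology that uses a computer to process images to achieve a desired result. Within image processing, face detection has become a popular research topic. Face detection may include face alignment detection, also called face keypoint detection, which means detecting a face image and locating the key feature points of the face, such as the eyes, the nose, and the corners of the mouth. How to perform face detection on face images more effectively has become a research focus.
Summary
Embodiments of this application provide a face detection method and apparatus, a service processing method, a terminal device, and a storage medium, which can perform face detection on face images more effectively and improve the accuracy of detection results.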
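In one aspect, an embodiment of this application provides a face detection method, performed by a terminal device, including:
obtaining a target face image to be detected;
performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
invoking the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
determining a feature region of the target face image according to the target keypoint set.
In another aspect, an embodiment of this application provides a service processing method, performed by a terminal device, including:
when a service request requiring face alignment detection is detected, invoking a camera device of the terminal device to obtain a target face image of the requester;
performing face alignment detection on the target face image using the face detection method to obtain a feature region of the target face image; and
processing the requested service according to the feature region of the target face image in response to the service request.
In still another aspect, an embodiment of this application provides a face detection apparatus, including:
an obtaining unit, configured to obtain a target face image to be detected;
a training unit, configured to perform hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
a detection unit, configured to invoke the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
a determining unit, configured to determine a feature region of the target face image according to the target keypoint set.
In still another aspect, an embodiment of this application provides a terminal device including a processor, an input device, an output device, and a memory that are connected to one another. The memory is configured to store a computer program. The computer program includes first program instructions, and the processor is configured to invoke the first program instructions to perform the foregoing face detection method; or the computer program includes second program instructions, and the processor is configured to invoke the second program instructions to perform the foregoing service processing method.
In still another aspect, an embodiment of this application provides a computer storage medium. The computer storage medium stores first computer program instructions that, when executed, implement the foregoing face detection method; or the computer storage medium stores second computer program instructions that, when executed, implement the foregoing service processing method.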
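Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a target face image according to an embodiment of this application;
FIG. 1b is a schematic diagram of another target face image according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a face detection method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a face detection method according to another embodiment of this application;
FIG. 4a is a schematic diagram of shift processing according to an embodiment of this application;
FIG. 4b is a schematic diagram of rotation processing according to an embodiment of this application;
FIG. 4c is a schematic diagram of mirror processing according to an embodiment of this application;
FIG. 4d is a schematic diagram of compression processing according to an embodiment of this application;
FIG. 5 is a schematic diagram of face region division according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a service processing method according to an embodiment of this application;
FIG. 7 is a diagram of an application scenario of a service processing method according to an embodiment of this application;
FIG. 8 is a diagram of an application scenario of another service processing method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a face detection apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a service processing apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a terminal according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an implementation environment according to an embodiment of this application.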
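Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.
Face keypoints (keypoints for short), also called facial feature points, usually include the points that make up the facial features (eyebrows, eyes, nose, mouth, and ears) and the face contour. A method that detects a face image and annotates one or more keypoints in it may be called a face keypoint detection method or a face alignment detection method. Face alignment detection on a face image makes it possible to determine the feature regions in the image, which may include but are not limited to an eyebrow region, an eye region, a nose region, a mouth region, and an ear region.
In the embodiments of this application, a target face alignment model (also called a target face keypoint detection model) may be provided to implement face alignment detection. After a target face image to be detected is obtained, the target face alignment model may be invoked to perform face alignment detection on it, to determine a plurality of keypoints in the target face image and the annotation information of each keypoint. The keypoints may include but are not limited to mouth keypoints, eyebrow keypoints, eye keypoints, nose keypoints, and ear keypoints. The annotation information of a keypoint may include but is not limited to position information (where the keypoint is located), shape information (for example, that it is drawn as a dot), and feature information. The feature information indicates the category of the keypoint: if it is eye feature information, the keypoint is an eye keypoint; if it is nose feature information, the keypoint is a nose keypoint; and so on.
The keypoints determined in the target face image may be as shown by the gray dots in FIG. 1a. After the keypoints are determined, the feature regions of the target face image may be determined based on the annotation information of each keypoint. For example, from the positions of the gray dots in FIG. 1a, an eyebrow region 11, an eye region 12, a nose region 13, a mouth region 14, and an ear region 15 may be determined, as shown in FIG. 1b.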
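Based on the foregoing description, an embodiment of this application provides a face detection method that may be implemented by a terminal device, for example a mobile terminal such as a smartphone or tablet computer. Referring to FIG. 2, the method may include the following steps S201 to S203:
S201: Obtain a target face image to be detected.
The target face image may be a face image captured in real time by the terminal invoking a camera device (for example, a camera) on the surrounding environment, or a stored face image that the terminal obtains from a local gallery or a cloud album, where a cloud album is a network album based on a cloud computing platform.
In one embodiment, the terminal may obtain the target face image to be detected when it detects a trigger event for face alignment detection. Such a trigger event may serve as a service request.
Specifically, when it is detected that a user is using an application based on face alignment detection, the terminal checks for a service request requiring face alignment detection; when such a request is detected, the camera device of the terminal device is invoked to obtain a face image of the requester as the target face image.
Applications based on face alignment detection may include but are not limited to expression recognition applications, face-changing effect applications, and smart sticker applications. When a user uses such an application, the terminal needs to obtain the target face image and perform face alignment detection on it to determine the feature regions, and then performs expression recognition, face-changing effects, smart stickers, or similar operations based on those regions.
Optionally, the trigger event may also be an event in which the terminal performs identity verification based on the target face image. In that case the terminal first performs face alignment detection on the target face image to determine the feature regions, and then matches the determined feature regions against preset face information.
In another embodiment, the terminal may obtain the target face image to be detected when it detects that the user has sent an instruction to perform face alignment detection. The instruction may be a voice instruction, a press or tap instruction, an instruction to enable the face alignment detection function, and so on.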
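S202: Perform hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model; invoke the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image.
After obtaining the target face image to be detected, the terminal may input it into the target face alignment model so that the model performs face alignment detection on it and outputs the target keypoint set of the target face image.
The target keypoint set may include a plurality of target keypoints and the annotation information of each. A target keypoint may be any of a mouth keypoint, an eyebrow keypoint, an eye keypoint, a nose keypoint, an ear keypoint, and so on. The annotation information of a target keypoint may include its position information, shape information, feature information, and so on.
The target face alignment model is obtained through hierarchical fitting training using a face alignment algorithm and a sample data set. The face alignment algorithm may include but is not limited to a machine learning regression algorithm, such as SDM (Supervised Descent Method) or LBF (Local Binary Features), or a CNN (Convolutional Neural Network) algorithm, such as TCDCN (Tasks-Constrained Deep Convolutional Network, from "Facial Landmark Detection by Deep Multi-task Learning") or 3DDFA (3D Dense Face Alignment). An original model may be designed based on these algorithms and then trained with the sample data set to finally obtain the target face alignment model.
In one embodiment, before the target face image to be detected is obtained, the method further includes: obtaining the sample data set, which includes a plurality of sample face images and a reference keypoint set for each sample face image, where each reference keypoint set includes a plurality of reference keypoints and the annotation information of each; and determining, according to the reference keypoints and their annotation information, a plurality of feature regions that represent each sample face image.
The feature region includes any one of an eyebrow region, an eye region, a nose region, a mouth region, and an ear region. The face alignment algorithm includes a machine learning regression algorithm or a convolutional neural network algorithm.
Hierarchical fitting training means: determining the training priority of each feature region according to the loss weights of the feature regions of the sample face images, and then, using the face alignment algorithm, performing fitting training on the feature regions of the sample face images according to that priority.
Specifically, the feature regions differ in detection difficulty. Different loss weights are set for the feature regions according to their detection difficulty; the larger the loss weight of a region, the higher its priority during training. Fitting training is then performed on the feature regions with the face alignment algorithm according to the training priority.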
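The mapping from loss weights to training priority can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `training_priority` and all weight values except the mouth weight of 0.6 (which the description uses as an example) are assumptions.

```python
def training_priority(loss_weights):
    """Return region names ordered from highest to lowest loss weight.

    A larger loss weight marks a region that is harder to detect,
    so it is placed earlier in the fitting order.
    """
    return sorted(loss_weights, key=lambda region: loss_weights[region], reverse=True)


# Hypothetical weights; only the mouth value 0.6 appears in the description.
weights = {"eyebrow": 0.3, "eye": 0.4, "nose": 0.2, "mouth": 0.6, "ear": 0.1}
order = training_priority(weights)
print(order)
```

With these example weights the mouth region comes first, which matches the statement that hard regions are fitted with priority.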
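In one embodiment, hard sample face images are selected from the sample data set; iterative training is performed according to the face alignment algorithm and the sample data set, and the result of the iterative training is then optimized with the hard sample face images to obtain the target face alignment model. A hard sample face image is a sample face image selected from the sample data set that is comparatively difficult to detect.
The harder a keypoint is to detect, the larger the loss weight of the feature region it belongs to. A larger loss weight has a greater influence on the value of the loss function, and the value of the loss function describes the loss of the face alignment model under different model parameters.
During training, the model parameters may be changed continually to reduce the value of the loss function, which is how the model is trained and optimized. When the value of the loss function satisfies a preset condition, training is complete and the face alignment model obtained at that point is the target face alignment model. The preset condition may include but is not limited to the value of the loss function falling within a preset range or reaching its minimum.
Therefore, to limit the contribution of heavily weighted regions to the value of the loss function, training tends to focus on fitting the feature regions with larger loss weights. As a result, the trained target face alignment model can detect keypoints more accurately in the regions with larger loss weights (the regions containing hard-to-detect keypoints). A target face alignment model obtained through hierarchical fitting training is therefore more accurate.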
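S203: Determine a feature region of the target face image according to the target keypoint set.
After the target keypoint set is obtained, the feature regions of the target face image may be determined according to the annotation information of each target keypoint in the set. As described above, the annotation information may include feature information, position information, and so on.
In one embodiment, the feature regions may be determined according to the feature information of each target keypoint. Specifically, the category of each target keypoint is determined from its feature information, the region formed by target keypoints of the same category is taken as one feature region, and that category is used as the category of the region. For example, the target keypoints whose feature information is nose feature information are all nose keypoints, and the region they form is taken as the nose region.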
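The category-based grouping described above can be sketched as follows. The description only says that keypoints of one category "form" a region; representing each region by its axis-aligned bounding box is an assumption made here for illustration, as are the function name `feature_regions` and the sample coordinates.

```python
from collections import defaultdict


def feature_regions(keypoints):
    """Group keypoints by category and return a bounding box per category.

    keypoints: iterable of (category, x, y) tuples.
    Returns {category: (x_min, y_min, x_max, y_max)}.
    The bounding box is an illustrative stand-in for "the region formed by
    keypoints of the same category".
    """
    groups = defaultdict(list)
    for category, x, y in keypoints:
        groups[category].append((x, y))

    regions = {}
    for category, pts in groups.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        regions[category] = (min(xs), min(ys), max(xs), max(ys))
    return regions


# Hypothetical detected keypoints in pixel coordinates.
points = [
    ("nose", 50, 60), ("nose", 54, 70), ("nose", 46, 70),
    ("mouth", 40, 85), ("mouth", 60, 85), ("mouth", 50, 90),
]
regions = feature_regions(points)
print(regions)
```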
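In another embodiment, the feature regions may be determined according to the position information of each target keypoint. Specifically, the annotated position of each target keypoint is first determined from the position information, and target keypoints at adjacent positions are connected. If the resulting shape is similar to the shape of any one of the facial features (eyebrows, eyes, nose, mouth, ears), the region formed by those adjacent target keypoints is determined to be a feature region, and the category of the region is determined from the shape. For example, if connecting adjacent target keypoints yields a shape similar to a nose, the region formed by those keypoints may be determined to be the nose region.
The embodiments of this application may use the target face alignment model for face alignment detection. Because the model is obtained through hierarchical fitting training, it can detect keypoints in each feature region more accurately, which improves the accuracy of the detection result. In addition, the model has a small memory footprint and runs fast, which improves the efficiency of face alignment detection.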
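FIG. 3 shows a face detection method according to another embodiment of this application. The method may be implemented by a terminal, for example a mobile terminal such as a smartphone or tablet computer. Building on the embodiment shown in FIG. 2, this embodiment includes the specific steps of hierarchical fitting training. Referring to FIG. 3, the method may include the following steps S301 to S307:
S301: Obtain a sample data set.
The sample data set may include a plurality of sample face images and a reference keypoint set for each sample face image. Each reference keypoint set includes a plurality of reference keypoints and the annotation information of each, which may be used to represent a plurality of feature regions of the sample face image. A feature region may be any one of an eyebrow region, an eye region, a nose region, a mouth region, and an ear region.
In one embodiment, the keypoints in the reference keypoint set and their annotation information may be obtained by having professional annotators label each sample face image in advance.
S302: Perform iterative training according to the face alignment algorithm and the sample data set.
The iterative training may include the following steps S3021 and S3022:
S3021: Preprocess the sample data set to obtain a plurality of training data sets, each containing a plurality of preprocessed sample face images.
The terminal may preprocess the sample data set with different augmentation parameters. The preprocessing may include augmentation and normalization, and yields a plurality of training data sets. The plurality of training data sets may include a first training data set, which may be any one of them. Accordingly, the preprocessing may be implemented as follows:
First, a first augmentation parameter is obtained, and the sample data set is augmented according to it to obtain a first augmented data set, which may contain a plurality of augmented sample face images.
The augmentation includes at least one of shift processing, rotation processing, mirror processing, and compression processing. The corresponding augmentation parameters include at least one of a shift parameter, a rotation angle parameter, and a compression ratio parameter.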
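Shift processing changes the position of the face part in the sample face image. Specifically, the sample face image may be shifted according to Formula 1.1:
Rect(x,y,w,h)→Rect(x+dx,y+dy,w,h)        Formula 1.1
Here Rect stores the grouped parameters. Rect(x,y,w,h) denotes the initial coordinates of the sample face image, where x is the horizontal coordinate, y is the vertical coordinate, w is the width of the sample face image, and h is its height. Rect(x+dx,y+dy,w,h) denotes the coordinates after the shift, where dx is the change along the horizontal axis and dy is the change along the vertical axis. Both dx and dy may serve as shift parameters.
Note that the initial coordinates of the sample face image may refer to the coordinates of its upper-left corner, its upper-right corner, its center point, and so on; this is not limited here. Taking the center point as the initial coordinates, a schematic diagram of shift processing is shown in FIG. 4a.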
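Formula 1.1 can be written down directly. The following is a minimal sketch in which the `Rect` type and the `shift` function are illustrative names, and the shift values dx = dy = 20 are the example values the description gives for the second training data set.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Anchor coordinates (x, y) plus width w and height h of a face image."""
    x: float
    y: float
    w: float
    h: float


def shift(rect, dx, dy):
    """Apply Formula 1.1: move the anchor by (dx, dy), keep the size unchanged."""
    return Rect(rect.x + dx, rect.y + dy, rect.w, rect.h)


shifted = shift(Rect(100, 80, 64, 64), dx=20, dy=20)
print(shifted)
```

In a real augmentation pipeline the reference keypoints would presumably be shifted by the same (dx, dy) so that the annotations stay aligned with the face.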
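Rotation processing rotates the sample face image about its center point by an angle θ, clockwise when θ is positive and counterclockwise when θ is negative; θ may serve as the rotation angle parameter. Specifically, let the center point of the sample face image be (x_0, y_0). Any pixel (x, y) in the sample face image may be rotated with the rotation matrix of Formula 1.2 to obtain the rotated pixel coordinates (x', y'), where x' = (x - x_0)cosθ + (y - y_0)(-sinθ) + x_0 and y' = (x - x_0)sinθ + (y - y_0)cosθ + y_0. A schematic diagram of rotation processing is shown in FIG. 4b.
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} + \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \qquad \text{(Formula 1.2)}$$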
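The rotation can be checked numerically with a short sketch. It applies the component formulas given in the text to a single point; the function name `rotate_point` and the parameter names `cx`, `cy` for the rotation center are illustrative. In image coordinates, where the y axis points down, a positive angle produces a clockwise rotation on screen.

```python
import math


def rotate_point(x, y, cx, cy, theta_deg):
    """Rotate the point (x, y) about the center (cx, cy) by theta_deg degrees.

    x' = (x - cx) * cos(t) - (y - cy) * sin(t) + cx
    y' = (x - cx) * sin(t) + (y - cy) * cos(t) + cy
    """
    t = math.radians(theta_deg)
    dx, dy = x - cx, y - cy
    xr = dx * math.cos(t) - dy * math.sin(t) + cx
    yr = dx * math.sin(t) + dy * math.cos(t) + cy
    return xr, yr


# A point to the right of the center moves to below the center (y grows
# downward in image coordinates), i.e. a clockwise quarter turn.
xr, yr = rotate_point(10, 0, 0, 0, 90)
print(round(xr, 6), round(yr, 6))
```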
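Mirror processing may include horizontal mirroring and vertical mirroring. Horizontal mirroring swaps the left and right halves of the sample face image about its vertical central axis; vertical mirroring swaps the upper and lower halves about its horizontal central axis. The embodiments of this application take horizontal mirroring as an example. Specifically, any pixel (x_0, y_0) in the sample face image may be mirrored horizontally according to Formula 1.3, giving the mirrored coordinates (x_1, y_1), where w in Formula 1.3 is the width of the sample face image. A schematic diagram of horizontal mirroring is shown in FIG. 4c.
$$x_1 = w - x_0, \qquad y_1 = y_0 \qquad \text{(Formula 1.3)}$$
Note that in other embodiments the sample face image may be mirrored vertically, or mirrored both horizontally and vertically.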
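A minimal sketch of mirroring applied to keypoint coordinates follows. The horizontal case uses x_1 = w - x_0; the vertical case is written by analogy and is an assumption, since the description gives no formula for it. The function names are illustrative.

```python
def mirror_horizontal(points, w):
    """Mirror (x, y) points about the vertical central axis of an image of width w."""
    return [(w - x, y) for x, y in points]


def mirror_vertical(points, h):
    """Mirror (x, y) points about the horizontal central axis (assumed by analogy)."""
    return [(x, h - y) for x, y in points]


pts = [(10, 5), (0, 0)]
flipped = mirror_horizontal(pts, 64)
print(flipped)
```

Applying the same mirror twice returns the original coordinates, which is a quick sanity check on the formula.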
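Compression processing means saving the sample face image in an image format with a specified image quality parameter. The quality parameter may be chosen from a preset range, for example [0, 100%]. The higher the quality parameter, the sharper the saved sample face image; this quality parameter serves as the compression ratio parameter. Taking a quality parameter of 85% as an example, a schematic diagram of compression processing is shown in FIG. 4d.
Second, after the first augmented data set is obtained, the sample data set may be merged with it to obtain a merged data set. The augmented sample face images in the first augmented data set may be obtained by applying the foregoing shift, rotation, mirror, and compression processing in sequence to the sample face images in the sample data set.
In other embodiments, the augmented sample face images in the first augmented data set may also be obtained by applying only part of the foregoing augmentation, for example shift processing only, rotation processing only, or shift and compression processing only.
Finally, the merged data set may be normalized to obtain the first training data set. The normalization includes image normalization and/or annotation normalization.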
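Image normalization converts the sample face image to floating point and removes its mean. Specifically, the data type of the sample face image is first converted to a floating-point type so that the image can be normalized. An image usually consists of several channels; for example, a JPG image usually consists of the three RGB (Red, Green, Blue) channels. For any channel CO of the sample face image, the mean m and the variance d of all pixel values in that channel are computed, and the value CO_i of pixel i in that channel is then normalized according to Formula 1.4 to obtain the new channel value C_i:
$$C_i = \frac{CO_i - m}{d} \qquad \text{(Formula 1.4)}$$
Normalizing the sample face image brings its pixel values into a preset interval, which improves the stability of the sample face images and the accuracy of subsequent model training. The preset interval may be determined by the actual service requirements, for example [0, 1].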
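Formula 1.4 can be sketched for one channel in plain Python. The code follows the formula literally and divides by the variance d, as the description states; many pipelines divide by the standard deviation instead. The function name `normalize_channel` and the handling of a constant channel are assumptions.

```python
def normalize_channel(values):
    """Normalize one image channel as C_i = (CO_i - m) / d.

    m is the mean and d the variance of the channel, following Formula 1.4
    literally. A constant channel (d == 0) is mapped to zeros, which is an
    assumption added here to avoid division by zero.
    """
    vals = [float(v) for v in values]  # convert to floating point first
    m = sum(vals) / len(vals)
    d = sum((v - m) ** 2 for v in vals) / len(vals)
    if d == 0:
        return [0.0 for _ in vals]
    return [(v - m) / d for v in vals]


# For this channel the mean is 2 and the variance is 4.
normalized = normalize_channel([0, 4, 0, 4])
print(normalized)
```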
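Annotation normalization normalizes the position information in the annotation information of each reference keypoint in the sample face image. Specifically, the position information (coordinates) of each reference keypoint may be normalized according to Formula 1.5.
$$(x, y) \rightarrow (x/w,\; y/h) \qquad \text{(Formula 1.5)}$$
Here (x, y) denotes the coordinates of any reference keypoint in the sample face image, w is the width of the sample face image, and h is its height. Normalizing the annotation information of the sample face images improves the accuracy of subsequent model training.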
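Formula 1.5 amounts to dividing each coordinate by the image size, which makes annotations comparable across images of different resolutions. A minimal sketch, with `normalize_keypoints` as an illustrative name:

```python
def normalize_keypoints(points, w, h):
    """Map pixel coordinates (x, y) to (x / w, y / h) as in Formula 1.5."""
    if w <= 0 or h <= 0:
        raise ValueError("image width and height must be positive")
    return [(x / w, y / h) for x, y in points]


norm = normalize_keypoints([(32, 16), (64, 64)], w=64, h=64)
print(norm)
```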
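S3022: Perform iterative training using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model.
The plurality of training data sets obtained in step S3021 may be collectively referred to as the first training data set, or may be further divided into a second training data set and a third training data set, where the second training data set is selected before the third during iterative training. The augmentation parameters of the second training data set are larger than those of the third. For example, the second training data set may use shift parameters dx=20, dy=20 and a rotation angle parameter θ=40°, while the third may use shift parameters dx=5, dy=5 and a rotation angle parameter θ=10°.
Accordingly, the iterative training that yields the first face alignment model may be implemented as follows:
First, training is performed using the face alignment algorithm and the first training data set to obtain an initial face alignment model. The face alignment algorithm may include but is not limited to a machine learning regression algorithm or a CNN algorithm.
Specifically, an original model may be built with the face alignment algorithm and then trained and optimized with the first training data set to obtain the initial face alignment model, which is later trained and optimized further with the second and third training data sets, or with even more training data sets. Different training data sets use different augmentation parameters.
The original model may be trained and optimized with a supervised machine learning optimization algorithm. That is, the known reference keypoints of each sample face image in the first training data set are compared with the keypoints detected by the original model. The larger the position difference, the more the model parameters of the original model need to be adjusted, until the difference between detected and reference keypoints is minimal or falls below a preset threshold. The initial face alignment model is obtained at that point.
Second, the loss function of the initial face alignment model may be set according to a hierarchical fitting rule. The rule may be based on at least one feature region and the loss weight of each feature region. The loss weight is positively correlated with the fitting order: the larger the loss weight of a feature region, the earlier it is fitted.
Practice shows that when face alignment detection is performed on a face image, the mouth keypoints usually have a larger average error; that is, the mouth region is harder to detect and is detected less accurately. During model training, hard feature regions such as the mouth region may therefore be fitted first and with emphasis, so that the target face alignment model can detect keypoints in these regions more accurately.
On this basis, a plurality of feature regions representing each sample face image may first be determined according to the reference keypoints in the sample face image and their annotation information. Taking 51 reference keypoints as an example, the region division may be as shown in FIG. 5. The number of reference keypoints is only an example and is not limited to 51; it may also be 48, 86, and so on.
Then, different loss weights may be set for the feature regions according to their detection difficulty, with harder regions receiving larger loss weights. A hierarchical fitting rule is then determined from these loss weights; under the rule, the loss weight is positively correlated with the fitting order, so a feature region with a larger loss weight is fitted earlier.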
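Next, the loss function shown in Formula 1.6 may be set according to the hierarchical fitting rule.
[Formula 1.6 (image PCTCN2019108145-appb-000003, not reproduced): a loss over the keypoints j that combines the deviation between the reference coordinates (x_j, y_j) and the detected coordinates (x'_j, y'_j), weighted by ω_j]
Here x_j and y_j denote the annotated coordinates of each reference keypoint, x'_j and y'_j denote the coordinates of each detected keypoint, and ω_j denotes the loss weight of each reference keypoint, which may be determined from the loss weight of the feature region the keypoint belongs to. For example, if the loss weight of the mouth region in the hierarchical fitting rule is 0.6, the loss weight of every mouth keypoint is 0.6.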
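A region-weighted keypoint loss of this kind can be sketched as follows. The exact form of Formula 1.6 is not recoverable from the text, so the weighted sum of squared coordinate errors used here is an assumption chosen for illustration; only the symbols (reference coordinates, detected coordinates, per-keypoint weight) and the mouth weight 0.6 come from the description.

```python
def weighted_loss(reference, detected, weights):
    """Weighted keypoint loss (assumed form, not necessarily Formula 1.6).

    reference: list of (x_j, y_j) annotated coordinates
    detected:  list of (x'_j, y'_j) predicted coordinates
    weights:   list of per-keypoint loss weights w_j, inherited from the
               feature region each keypoint belongs to
    Returns sum_j w_j * ((x_j - x'_j)^2 + (y_j - y'_j)^2).
    """
    if not (len(reference) == len(detected) == len(weights)):
        raise ValueError("reference, detected and weights must have equal length")
    return sum(
        w * ((x - xd) ** 2 + (y - yd) ** 2)
        for (x, y), (xd, yd), w in zip(reference, detected, weights)
    )


# One mouth keypoint (weight 0.6) is off by 0.25 vertically; the second
# keypoint (hypothetical weight 0.2) is detected exactly.
ref = [(0.5, 0.5), (0.25, 0.75)]
det = [(0.5, 0.75), (0.25, 0.75)]
loss = weighted_loss(ref, det, [0.6, 0.2])
print(loss)
```

Because the same error costs more in a heavily weighted region, reducing this loss pushes the optimizer toward fitting the hard regions first.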
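Finally, following the principle of reducing the value of the loss function, the second training data set and then the third training data set may be used in turn to train the initial face alignment model, yielding the first face alignment model.
In a specific implementation, following the principle of reducing the value of the loss function, the second training data set is first used to train the initial face alignment model, yielding an intermediate face alignment model.
Specifically, when training on a target sample face image in the second training data set, after the initial face alignment model detects the face keypoints of that image, a value of the loss function is computed with Formula 1.6. The model parameters of the initial face alignment model are then adjusted so that the next time keypoints are detected on this image the loss value is smaller. Keypoint detection, loss computation, and parameter adjustment are repeated for all sample face images in the second training data set to obtain the intermediate face alignment model.
Then the third training data set is used to train the intermediate face alignment model, yielding the first face alignment model. This training follows the same procedure as the training from the initial model to the intermediate model described above.
The larger the augmentation parameters, the more complex the sample face images in the corresponding training data set. Training first on the second training data set, which has larger augmentation parameters, lets the model adapt to complex face images first; training next on the third training data set, which has smaller augmentation parameters, lets it adapt to simpler face images. This complex-then-simple procedure can improve training efficiency.
In each round of training, keypoints with larger loss weights may be fitted first and with emphasis, according to the loss weight of each keypoint in the loss function. For example, if the mouth region has the largest loss weight among the feature regions, the keypoints of the mouth region may be fitted first and with emphasis in each round.
In other embodiments, following the principle of reducing the value of the loss function, the third training data set may be used first to train the initial face alignment model to obtain the intermediate face alignment model, and the second training data set may then be used to train the intermediate model to obtain the first face alignment model.
Note that, depending on the required precision of the face alignment model, more training data sets may be used for iterative training in other embodiments. For example, the plurality of training data sets may include a second, a third, a fourth, and a fifth training data set, and iterative training on them yields the first face alignment model. Their augmentation parameters decrease in this order: second > third > fourth > fifth.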
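The ordering of training data sets by augmentation strength can be sketched as a simple schedule. The parameter values for the second and third data sets are the examples given in the description; the dictionary layout, the function name `training_schedule`, and the choice to compare (dx, dy, theta) as a tuple are assumptions, since the description does not define how multi-parameter settings are compared.

```python
def training_schedule(datasets):
    """Order training data sets from strongest to weakest augmentation.

    Each data set is a dict with shift parameters "dx", "dy" and a rotation
    angle "theta" in degrees. Comparing the tuple (dx, dy, theta) is an
    illustrative way to rank augmentation strength.
    """
    return sorted(
        datasets,
        key=lambda d: (d["dx"], d["dy"], d["theta"]),
        reverse=True,
    )


datasets = [
    {"name": "third", "dx": 5, "dy": 5, "theta": 10},
    {"name": "second", "dx": 20, "dy": 20, "theta": 40},
]
schedule = [d["name"] for d in training_schedule(datasets)]
print(schedule)
```

Each stage would then continue training from the model produced by the previous stage, so the model sees the most heavily augmented images first.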
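Tests show that training and optimizing the model successively on training data sets obtained with different augmentation parameters lets the resulting target face alignment model detect face keypoints more accurately and more robustly.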
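S303: Select hard sample face images from the sample data set.
The selection includes the following steps S3031 and S3032.
S3031: Invoke the first face alignment model to perform face alignment detection on the sample data set to obtain a detected keypoint set for each sample face image in the sample data set. The detected keypoint set includes a plurality of detected keypoints and the annotation information of each.
S3032: Select the hard sample face images from the sample data set according to the difference between the reference keypoint set and the detected keypoint set.
In a specific implementation, the difference between the reference keypoint set and the detected keypoint set is computed for each sample face image, and the sample face images whose difference exceeds a preset threshold are selected from the sample data set as hard sample face images.
The preset threshold may be determined by the service requirements on the target face alignment model: if high precision is required, the threshold may take a smaller value; if lower precision is acceptable, it may take a larger value.
In one embodiment, the Euclidean distance of Formula 1.7 may be used to compute the difference between the reference keypoint set and the detected keypoint set of each sample face image.
$$d(p, q) = d(q, p) = \sqrt{\sum_{i} (q_i - p_i)^2} \qquad \text{(Formula 1.7)}$$
Here p_i denotes any reference keypoint in the sample face image, q_i denotes any detected keypoint in the sample face image, and d(p,q) denotes the difference between the reference keypoint set and the detected keypoint set, with d(p,q)=d(q,p).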
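Hard-sample selection with the Euclidean distance can be sketched as follows. The sketch treats each keypoint set as a flattened coordinate vector, which is one reading of Formula 1.7 and is labeled here as an assumption; the function names, the sample identifiers, and the threshold value are illustrative.

```python
import math


def euclidean_difference(reference, detected):
    """Euclidean distance between two keypoint sets of equal length.

    Each set is a list of (x, y) points; the sets are compared point by point
    and every coordinate contributes one squared term to the sum.
    """
    return math.sqrt(
        sum(
            (xd - x) ** 2 + (yd - y) ** 2
            for (x, y), (xd, yd) in zip(reference, detected)
        )
    )


def select_hard_samples(samples, threshold):
    """Return ids of samples whose difference exceeds the preset threshold.

    samples: {sample_id: (reference_keypoints, detected_keypoints)}
    """
    return [
        sample_id
        for sample_id, (ref, det) in samples.items()
        if euclidean_difference(ref, det) > threshold
    ]


samples = {
    "easy": ([(0, 0), (1, 1)], [(0, 0), (1, 1)]),
    "hard": ([(0, 0), (1, 1)], [(3, 4), (1, 1)]),
}
hard = select_hard_samples(samples, threshold=1.0)
print(hard)
```

Lowering the threshold selects more samples as hard, which corresponds to a stricter precision requirement on the model.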
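In another embodiment, cosine similarity may be used to compute the difference between the reference keypoint set and the detected keypoint set of each sample face image. Specifically, the coordinates of each reference keypoint in the reference keypoint set are expressed as vectors to obtain a reference vector set, and the coordinates of each detected keypoint in the detected keypoint set are expressed as vectors to obtain a detected vector set. The cosine similarity formula is then used to compute the difference between the reference vector set and the detected vector set, which gives the difference between the two keypoint sets.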
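A cosine-based difference can be sketched as follows. Flattening each keypoint set into one vector and reporting 1 minus the cosine similarity as the difference are both assumptions made for illustration; the description only states that the cosine similarity formula is applied to the two vector sets.

```python
import math


def cosine_difference(reference, detected):
    """Return 1 - cosine similarity of two flattened keypoint sets.

    0.0 means the flattened coordinate vectors point in the same direction;
    larger values mean a larger difference.
    """
    a = [c for point in reference for c in point]
    b = [c for point in detected for c in point]
    dot = sum(u * v for u, v in zip(a, b))
    norm = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / norm


same = cosine_difference([(3, 4)], [(3, 4)])
orthogonal = cosine_difference([(1, 0)], [(0, 1)])
print(same, orthogonal)
```

Note that cosine similarity ignores vector length, so a uniformly scaled set of detections would still count as identical under this measure.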
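In another embodiment, the Manhattan distance, the Hamming distance, the Chebyshev distance, or a similar measure may be used to compute the difference between the reference keypoint set and the detected keypoint set of each sample face image.
S304: Optimize the first face alignment model with the hard sample face images to obtain the target face alignment model.
Specifically, the hard sample face images may first be augmented, for example by shift, rotation, mirror, or compression processing. The hard sample face images and their augmented versions may then be normalized, for example by image normalization and annotation normalization, to obtain a hard training data set. Following the principle of reducing the value of the loss function, the hard training data set is then used to optimize the first face alignment model to obtain the target face alignment model. In other words, once the hard training data set is available, the first face alignment model can be optimized further with it.
The optimization mainly adjusts the model parameters of the first face alignment model according to the value of a loss function; for the loss-based optimization procedure, see Formula 1.6 and the related description above. While the first face alignment model is being optimized with the hard training data set, its model parameters may be changed continually to reduce the value of its loss function until that value satisfies the preset condition, which achieves the optimization.
The target face alignment model trained through steps S302 to S304 runs fast and has a small memory footprint, which makes it easier to deploy on mobile terminals and can also improve keypoint detection precision.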
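S305: Obtain a target face image to be detected.
S306: Invoke the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image. The target keypoint set includes a plurality of target keypoints and the annotation information of each.
S307: Determine a feature region of the target face image according to the target keypoint set.
For steps S305 to S307, refer to steps S201 to S203 in the foregoing embodiment; details are not repeated here.
The embodiments of this application may use the target face alignment model for face alignment detection. Because the model is obtained through hierarchical fitting training, it can detect keypoints in each feature region more accurately, which improves the accuracy of the detection result. In addition, the model has a small memory footprint and runs fast, which improves the efficiency of face alignment detection.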
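Based on the foregoing face detection method embodiments, an embodiment of this application further provides a service processing method that may be implemented by a terminal device, for example a mobile terminal such as a smartphone or tablet computer. Referring to FIG. 6, the method may include the following steps S601 to S603:
S601: When a service request requiring face alignment detection is detected, invoke the camera device of the terminal device to obtain a target face image of the requester.
The service request may be generated automatically by the terminal, for example when the terminal detects that the user has enabled its face alignment detection function or is using an application based on face alignment detection.
Different applications may correspond to different service requests. For example, a smart sticker application corresponds to a smart sticker request, a face recognition application corresponds to an identity verification request, and a face-changing effect application corresponds to a face-changing effect request. After the service request is detected, the camera device (for example, a camera) of the terminal may be invoked to photograph the requester and obtain the requester's target face image.
In other embodiments, after the service request is detected, a stored face image obtained from a local gallery or a cloud album may be used as the target face image, or the face image currently displayed on the terminal screen may be used as the target face image.
After receiving the service request, the terminal may parse it to determine the requested service. The requested service may include but is not limited to any one or more of a face recognition service, an expression recognition service, an age analysis service, a face-changing effect service, and a smart sticker service.
S602: Perform face alignment detection on the target face image using the face detection method to obtain a feature region of the target face image.
The face detection method may be the one described in the embodiment shown in FIG. 2 or FIG. 3. When it is applied to the target face image, the target face alignment model mentioned in the foregoing method embodiments may be used for face alignment detection to obtain the feature regions of the target face image, such as the mouth region, eyebrow region, eye region, nose region, and ear region.
S603: Process the requested service according to the feature region of the target face image in response to the service request.
After the feature regions of the target face image are determined, the requested service may be processed according to them in response to the service request.
Specifically, if the requested service is a face-changing effect service, then after the feature regions are determined, the position, size, and other information of one or more keypoints in each feature region may be transformed to change the face shape in the target face image. For example, if the service enlarges the eyes and shrinks the nose, the position, size, and other information of the keypoints in the eye region may be transformed to enlarge the eye region, and those of the keypoints in the nose region may be transformed to shrink the nose region, which completes the face-changing effect service, as shown in FIG. 7.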
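The keypoint transformation behind an "enlarge the eyes" effect can be sketched as scaling a region's keypoints about their centroid. This only moves the control points; warping the image pixels to follow them is outside the sketch. The function name `scale_region`, the scale factor, and the coordinates are illustrative assumptions.

```python
def scale_region(points, factor):
    """Scale a region's keypoints about their centroid.

    factor > 1 enlarges the region (for example, bigger eyes) and
    factor < 1 shrinks it (for example, a smaller nose).
    """
    if not points:
        raise ValueError("a region needs at least one keypoint")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in points]


# Hypothetical eye keypoints with centroid (12, 10).
eye = [(10, 10), (14, 10), (12, 8), (12, 12)]
bigger = scale_region(eye, 1.5)
print(bigger)
```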
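If the requested service is a smart sticker service, then after the feature regions and a target sticker template are determined, each sticker in the target sticker template may be added to the corresponding feature region to obtain the target face image with smart stickers applied. For example, if the target sticker template is a dog template, then after the feature regions are determined, the "dog ears", "dog nose", "dog mouth", and other stickers in the template may be added to the corresponding feature regions to complete the smart sticker service, as shown in FIG. 8.
In the embodiments of this application, after the target face image is obtained, the face detection method may be used for face alignment detection to obtain the feature regions of the target face image, and the requested service is processed according to those regions in response to the service request. Because the target face alignment model used by the face detection method is obtained through hierarchical fitting training, keypoints in each feature region can be detected more accurately, which improves the accuracy of the service processing result.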
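Based on the description of the foregoing face detection method embodiments, an embodiment of this application further provides a face detection apparatus whose structure is shown in FIG. 9. The apparatus can perform the methods shown in FIG. 2 and FIG. 3. Referring to FIG. 9, the face detection apparatus may include:
an obtaining unit 101, configured to obtain a target face image to be detected;
a training unit 102, configured to perform hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
a detection unit 103, configured to invoke the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
a determining unit 104, configured to determine a feature region of the target face image according to the target keypoint set.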
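In one embodiment, the obtaining unit 101 may be further configured to: obtain the sample data set, which includes a plurality of sample face images and a reference keypoint set for each sample face image, where each reference keypoint set includes a plurality of reference keypoints and the annotation information of each; and determine, according to the reference keypoints and their annotation information, a plurality of feature regions representing each sample face image.
The training unit 102 is specifically configured to: determine the training priority of each feature region according to the loss weights of the feature regions of the sample face images; and perform fitting training on the feature regions of the sample face images according to the training priority, using the face alignment algorithm.
In another embodiment, the feature region includes any one of an eyebrow region, an eye region, a nose region, a mouth region, and an ear region; the face alignment algorithm includes a machine learning regression algorithm or a convolutional neural network algorithm.
In another embodiment, the obtaining unit 101 is specifically configured to: when it is detected that a user is using an application based on face alignment detection, check for a service request requiring face alignment detection; and when the service request is detected, invoke the camera device of the terminal device to obtain a face image of the requester as the target face image.
In another embodiment, the sample data set includes a plurality of sample face images, and the training unit 102 is specifically configured to: perform iterative training according to the face alignment algorithm and the sample data set; select hard sample face images from the sample data set; and optimize the result of the iterative training according to the hard sample face images to obtain the target face alignment model.
In another embodiment, the sample data set further includes a reference keypoint set for each sample face image, and the training unit 102 is specifically configured to: preprocess the sample data set to obtain a plurality of training data sets, each containing a plurality of preprocessed sample face images; perform iterative training using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model; invoke the first face alignment model to perform face alignment detection on the sample data set to obtain a detected keypoint set for each sample face image; select the hard sample face images from the sample data set according to the difference between the reference keypoint set and the detected keypoint set; and optimize the first face alignment model with the hard sample face images to obtain the target face alignment model.
In another embodiment, the plurality of training data sets include a first training data set, which is any one of the plurality of training data sets. The training unit 102 may be specifically configured to: obtain a first augmentation parameter and augment the sample data set according to it to obtain a first augmented data set containing a plurality of augmented sample face images; merge the sample data set with the first augmented data set; and normalize the merged data set to obtain the first training data set.
In another embodiment, the plurality of training data sets include a second training data set and a third training data set, the second being selected before the third during iterative training. The training unit 102 may be specifically configured to: perform training using the face alignment algorithm and the first training data set to obtain an initial face alignment model; set the loss function of the initial face alignment model according to a hierarchical fitting rule; and, following the principle of reducing the value of the loss function, use the second training data set and then the third training data set in turn to train the initial face alignment model to obtain the first face alignment model.
In another embodiment, the augmentation parameters of the second training data set are larger than those of the third training data set; the augmentation parameters include at least one of a shift parameter, a rotation angle parameter, and a compression ratio parameter.
In another embodiment, each reference keypoint set includes a plurality of reference keypoints and the annotation information of each, and the training unit 102 is specifically configured to: determine, according to the reference keypoints and their annotation information, a plurality of feature regions representing each sample face image; set different loss weights for the feature regions according to their detection difficulty; and set the hierarchical fitting rule based on at least one feature region and the loss weight of each feature region, where a feature region with a larger loss weight is fitted earlier.
In another embodiment, the training unit 102 is specifically configured to: train the initial face alignment model with the second training data set to obtain an intermediate face alignment model; and train the intermediate face alignment model with the third training data set to obtain the first face alignment model.
In another embodiment, the training unit 102 is specifically configured to: augment the hard sample face images; normalize the hard sample face images and their augmented versions to obtain a hard training data set; and optimize the first face alignment model with the hard training data set to obtain the target face alignment model.
In another embodiment, the training unit 102 is specifically configured to: compute, for each sample face image, the difference between the reference keypoint set and the detected keypoint set; and select from the sample data set the sample face images whose difference exceeds a preset threshold as the hard sample face images.
In another embodiment, the target keypoint set includes a plurality of target keypoints and the annotation information of each, and the determining unit 104 is specifically configured to determine the feature region of the target face image according to the annotation information of each target keypoint.
In another embodiment, the annotation information includes feature information, and the determining unit 104 is specifically configured to: determine the category of each target keypoint according to its feature information, take the region formed by target keypoints of the same category as one feature region, and use that category as the category of the region.
In another embodiment, the annotation information includes position information, and the determining unit 104 is specifically configured to: determine the annotated position of each target keypoint according to the position information and connect target keypoints at adjacent positions; and if the resulting shape is similar to the shape of any one of the facial features, determine the region formed by those adjacent target keypoints to be a feature region and determine the category of the region from the shape.
The embodiments of this application may use the target face alignment model for face alignment detection. Because the model is obtained through hierarchical fitting training, it can detect keypoints in each feature region more accurately, which improves the accuracy of the detection result. In addition, the model has a small memory footprint and runs fast, which improves the efficiency of face alignment detection.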
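Based on the description of the foregoing service processing method embodiment, an embodiment of this application further provides a service processing apparatus whose structure is shown in FIG. 10. The apparatus can perform the method shown in FIG. 6. Referring to FIG. 10, the service processing apparatus may include:
an obtaining unit 201, configured to invoke a camera device to obtain a target face image of the requester when a service request requiring face alignment detection is detected;
a detection unit 202, configured to perform face alignment detection on the target face image using the face detection method of FIG. 2 or FIG. 3 to obtain a feature region of the target face image; and
a processing unit 203, configured to process the requested service according to the feature region of the target face image in response to the service request.
In the embodiments of this application, after the target face image is obtained, the face detection method may be used for face alignment detection to obtain the feature regions of the target face image, and the requested service is processed according to those regions in response to the service request. Because the target face alignment model used by the face detection method is obtained through hierarchical fitting training, keypoints in each feature region can be detected more accurately, which improves the accuracy of the service processing result.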
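Based on the description of the foregoing method and apparatus embodiments, an embodiment of this application further provides a terminal. Referring to FIG. 11, the terminal includes at least a processor 301, an input device 302, an output device 303, and a memory 304, which may be connected by a bus or in another manner; FIG. 11 takes connection through a bus 305 as an example. The memory 304 may be configured to store a computer program that includes first program instructions and/or second program instructions. The processor 301 is configured to execute the first program instructions stored in the memory 304 to implement the face detection method shown in FIG. 2 or FIG. 3. In one embodiment, the processor 301 may be further configured to execute the second program instructions stored in the memory 304 to implement the service processing method shown in FIG. 6.
In one embodiment, the processor 301 may be a central processing unit (CPU) or another general-purpose processor, such as a microprocessor or any conventional processor. The memory 304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 301. The processor 301 and the memory 304 are therefore not limited here.
An embodiment of this application further provides a computer storage medium (memory), which is a storage device in the terminal used to store programs and data. The computer storage medium here may include a built-in storage medium of the terminal and may also include an extended storage medium supported by the terminal. The computer storage medium provides storage space that stores the operating system of the terminal. The storage space also stores computer program instructions suitable for being loaded and executed by the processor 301; these instructions may be one or more computer programs (including program code). The computer storage medium may be a high-speed RAM or a non-volatile memory, for example at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located away from the foregoing processor.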
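In one embodiment, the processor 301 may load and execute the first computer program instructions stored in the computer storage medium to implement the corresponding steps of the method in the foregoing face detection embodiments. In a specific implementation, the first computer program instructions in the computer storage medium are loaded by the processor 301 to perform the following steps:
obtaining a target face image to be detected;
performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
invoking the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
determining a feature region of the target face image according to the target keypoint set.
In another embodiment, the processor 301 may load and execute the second computer program instructions stored in the computer storage medium to implement the corresponding steps of the method in the foregoing service processing embodiment. In a specific implementation, the second computer program instructions in the computer storage medium are loaded by the processor 301 to perform the following steps:
when a service request requiring face alignment detection is detected, invoking a camera device to obtain a target face image of the requester;
performing face alignment detection on the target face image using the face detection method of FIG. 2 or FIG. 3 to obtain a feature region of the target face image; and
processing the requested service according to the feature region of the target face image in response to the service request.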
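FIG. 12 is a schematic structural diagram of an implementation environment according to an embodiment of this application. As shown in FIG. 12, the face detection system 100 includes a user 101 and a terminal device 102. The terminal device 102 includes a camera device 1021, an application 1022, a face detection apparatus 1023, and an operation button 1024. The application 1022 requires face alignment detection; examples are an expression recognition application, a face-changing effect application, a smart sticker application, and an identity verification application.
According to the embodiments of this application, when the terminal device 102 detects that the user 101 is using the application 1022 based on face alignment detection, as shown by arrow 1031, it checks whether the user 101 has issued a service request requiring face alignment detection. When the service request is detected, the terminal device 102 invokes the camera device 1021 to obtain a face image of the requester (the user 101 or a user other than the user 101) as the target face image, as shown by arrow 1032.
The face detection apparatus 1023 then performs hierarchical fitting training using the face alignment algorithm and the sample data set to obtain the target face alignment model, invokes the target face alignment model to perform face alignment detection on the target face image to obtain the target keypoint set of the target face image, and determines the feature regions of the target face image according to the target keypoint set, for example the feature regions shown in FIG. 1b.
In the embodiments of this application, after the target face image is obtained, the face detection method may be used for face alignment detection to obtain the feature regions of the target face image, and the requested service is processed according to those regions in response to the service request. Because the target face alignment model used by the face detection method is obtained through hierarchical fitting training, keypoints in each feature region can be detected more accurately, which improves the accuracy of the service processing result.
For the specific working processes of the terminal and the units described above, refer to the related descriptions in the foregoing embodiments; details are not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes of the foregoing method embodiments may be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing discloses only some embodiments of this application and does not limit the scope of its claims. A person of ordinary skill in the art will understand that implementations of all or part of the processes of the foregoing embodiments, and equivalent changes made according to the claims of this application, still fall within the scope covered by the invention.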

Claims (20)

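  1. A face detection method, performed by a terminal device, comprising:
    obtaining a target face image to be detected;
    performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
    invoking the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
    determining a feature region of the target face image according to the target keypoint set.
  2. The method according to claim 1, wherein before the obtaining a target face image to be detected, the method further comprises:
    obtaining the sample data set, the sample data set comprising a plurality of sample face images and a reference keypoint set of each sample face image, the reference keypoint set of each sample face image comprising a plurality of reference keypoints and annotation information of each reference keypoint; and
    determining, according to the plurality of reference keypoints and the annotation information of each reference keypoint, a plurality of feature regions used to represent each sample face image;
    wherein the performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model comprises:
    determining a training priority of each feature region according to loss weights of the feature regions of the sample face images; and
    performing fitting training on the feature regions of the sample face images according to the training priority, using the face alignment algorithm.
  3. The method according to claim 2, wherein the feature region comprises any one of an eyebrow region, an eye region, a nose region, a mouth region, and an ear region; and the face alignment algorithm comprises a machine learning regression algorithm or a convolutional neural network algorithm.
  4. The method according to claim 1, wherein the obtaining a target face image to be detected comprises:
    when it is detected that a user is using an application based on face alignment detection, checking for a service request requiring face alignment detection; and
    when the service request is detected, invoking a camera device of the terminal device to obtain a face image of the requester as the target face image.
  5. The method according to claim 1, wherein the sample data set comprises a plurality of sample face images, and the performing hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model comprises:
    performing iterative training according to the face alignment algorithm and the sample data set;
    selecting hard sample face images from the sample data set; and
    optimizing a result of the iterative training according to the hard sample face images to obtain the target face alignment model.
  6. The method according to claim 5, wherein the performing iterative training according to the face alignment algorithm and the sample data set comprises:
    preprocessing the sample data set to obtain a plurality of training data sets, each training data set comprising a plurality of preprocessed sample face images; and
    performing iterative training using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model;
    wherein the sample data set further comprises a reference keypoint set of each sample face image, and the selecting hard sample face images from the sample data set comprises:
    invoking the first face alignment model to perform face alignment detection on the sample data set to obtain a detected keypoint set of each sample face image in the sample data set; and
    selecting the hard sample face images from the sample data set according to a difference between the reference keypoint set and the detected keypoint set;
    and wherein the optimizing a result of the iterative training according to the hard sample face images to obtain the target face alignment model comprises:
    optimizing the first face alignment model with the hard sample face images to obtain the target face alignment model.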
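  7. The method according to claim 6, wherein the plurality of training data sets comprise a first training data set, the first training data set being any one of the plurality of training data sets;
    and the preprocessing the sample data set to obtain a plurality of training data sets comprises:
    obtaining a first augmentation parameter, and augmenting the sample data set according to the first augmentation parameter to obtain a first augmented data set, the first augmented data set comprising a plurality of augmented sample face images;
    merging the sample data set with the first augmented data set; and
    normalizing the merged data set to obtain the first training data set.
  8. The method according to claim 7, wherein the plurality of training data sets comprise a second training data set and a third training data set, the second training data set being selected before the third training data set during iterative training;
    and the performing iterative training using the face alignment algorithm and the plurality of training data sets to obtain a first face alignment model comprises:
    performing training using the face alignment algorithm and the first training data set to obtain an initial face alignment model;
    setting a loss function of the initial face alignment model according to a hierarchical fitting rule; and
    following the principle of reducing the value of the loss function, using the second training data set and the third training data set in turn to train the initial face alignment model to obtain the first face alignment model.
  9. The method according to claim 8, wherein an augmentation parameter corresponding to the second training data set is larger than an augmentation parameter corresponding to the third training data set;
    and the augmentation parameter comprises at least one of a shift parameter, a rotation angle parameter, and a compression ratio parameter.
  10. The method according to claim 8, wherein the reference keypoint set of each sample face image comprises a plurality of reference keypoints and annotation information of each reference keypoint;
    and the method further comprises:
    determining, according to the plurality of reference keypoints and the annotation information of each reference keypoint, a plurality of feature regions used to represent each sample face image;
    setting different loss weights for the feature regions according to a detection difficulty of each feature region; and
    setting the hierarchical fitting rule based on at least one feature region and the loss weight of each feature region, wherein a feature region with a larger loss weight is fitted earlier.
  11. The method according to claim 8, wherein the following the principle of reducing the value of the loss function, using the second training data set and the third training data set in turn to train the initial face alignment model to obtain the first face alignment model comprises:
    training the initial face alignment model with the second training data set to obtain an intermediate face alignment model; and
    training the intermediate face alignment model with the third training data set to obtain the first face alignment model.
  12. The method according to claim 6, wherein the optimizing the first face alignment model with the hard sample face images to obtain the target face alignment model comprises:
    augmenting the hard sample face images;
    normalizing the hard sample face images and the augmented hard sample face images to obtain a hard training data set; and
    optimizing the first face alignment model with the hard training data set to obtain the target face alignment model.
  13. The method according to any one of claims 6 to 12, wherein the selecting the hard sample face images from the sample data set according to a difference between the reference keypoint set and the detected keypoint set comprises:
    computing, for each sample face image, the difference between the reference keypoint set and the detected keypoint set; and
    selecting, from the sample data set, the sample face images whose difference is greater than a preset threshold as the hard sample face images.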
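  14. The method according to claim 1, wherein the target keypoint set comprises a plurality of target keypoints and annotation information of each target keypoint,
    and the determining a feature region of the target face image according to the target keypoint set comprises:
    determining the feature region of the target face image according to the annotation information of each target keypoint.
  15. The method according to claim 14, wherein the annotation information comprises feature information;
    and the determining the feature region of the target face image according to the annotation information of each target keypoint comprises:
    determining a category of each target keypoint according to the feature information of each target keypoint, taking a region formed by target keypoints of a same category as one feature region, and using the category as the category of the feature region.
  16. The method according to claim 14, wherein the annotation information comprises position information;
    and the determining the feature region of the target face image according to the annotation information of each target keypoint comprises:
    determining an annotated position of each target keypoint according to the position information, and connecting target keypoints at adjacent positions; and
    if a shape obtained by the connecting is similar to the shape of any one of the facial features, determining a region formed by the target keypoints at the adjacent positions to be a feature region, and determining a category of the feature region according to the shape.
  17. A service processing method, performed by a terminal device, comprising:
    when a service request requiring face alignment detection is detected, invoking a camera device of the terminal device to obtain a target face image of the requester;
    performing face alignment detection on the target face image using the face detection method according to any one of claims 1 to 16 to obtain a feature region of the target face image; and
    processing the requested service according to the feature region of the target face image in response to the service request.
  18. A face detection apparatus, comprising:
    an obtaining unit, configured to obtain a target face image to be detected;
    a training unit, configured to perform hierarchical fitting training using a face alignment algorithm and a sample data set to obtain a target face alignment model;
    a detection unit, configured to invoke the target face alignment model to perform face alignment detection on the target face image to obtain a target keypoint set of the target face image; and
    a determining unit, configured to determine a feature region of the target face image according to the target keypoint set.
  19. A terminal device, comprising a processor, an input device, an output device, and a memory that are connected to one another, wherein the memory is configured to store a computer program; the computer program comprises first program instructions, and the processor is configured to invoke the first program instructions to perform the face detection method according to any one of claims 1 to 16; or the computer program comprises second program instructions, and the processor is configured to invoke the second program instructions to perform the service processing method according to claim 17.
  20. A computer storage medium, storing first computer program instructions suitable for being loaded by a processor to perform the face detection method according to any one of claims 1 to 16; or storing second computer program instructions suitable for being loaded by a processor to perform the service processing method according to claim 17.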
PCT/CN2019/108145 2018-09-30 2019-09-26 人脸检测方法及装置、业务处理方法、终端设备及存储介质 Ceased WO2020063744A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19864907.1A EP3754541A4 (en) 2018-09-30 2019-09-26 FACE DETECTION METHOD AND DEVICE, SERVICE TREATMENT PROCESS, TERMINAL DEVICE AND INFORMATION CARRIER
US17/032,370 US11256905B2 (en) 2018-09-30 2020-09-25 Face detection method and apparatus, service processing method, terminal device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811165758.5 2018-09-30
CN201811165758.5A CN109359575B (zh) 2018-09-30 2018-09-30 人脸检测方法、业务处理方法、装置、终端及介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/032,370 Continuation US11256905B2 (en) 2018-09-30 2020-09-25 Face detection method and apparatus, service processing method, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020063744A1 true WO2020063744A1 (zh) 2020-04-02

Family

ID=65348338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/108145 Ceased WO2020063744A1 (zh) 2018-09-30 2019-09-26 人脸检测方法及装置、业务处理方法、终端设备及存储介质

Country Status (4)

Country Link
US (1) US11256905B2 (zh)
EP (1) EP3754541A4 (zh)
CN (1) CN109359575B (zh)
WO (1) WO2020063744A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080037836A1 (en) * 2006-08-09 2008-02-14 Arcsoft, Inc. Method for driving virtual facial expressions by automatically detecting facial expressions of a face image
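CN106203395A (zh) * 2016-07-26 2016-12-07 厦门大学 Face attribute recognition method based on multi-task deep learning
CN106407958A (zh) * 2016-10-28 2017-02-15 南京理工大学 Facial feature detection method based on a two-level cascade
CN107146196A (zh) * 2017-03-20 2017-09-08 深圳市金立通信设备有限公司 Image beautification method and terminal
CN108446606A (zh) * 2018-03-01 2018-08-24 苏州纳智天地智能科技有限公司 Face keypoint detection method based on accelerated binary feature extraction
CN109359575A (zh) * 2018-09-30 2019-02-19 腾讯科技(深圳)有限公司 Face detection method, service processing method, apparatus, terminal, and medium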

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165354B1 (en) * 2008-03-18 2012-04-24 Google Inc. Face recognition with discriminative face alignment
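CN104598936B (zh) * 2015-02-28 2018-07-27 北京畅景立达软件技术有限公司 Method for locating facial keypoints in a face image
CN106295476B (zh) * 2015-05-29 2019-05-17 腾讯科技(深圳)有限公司 Face keypoint positioning method and apparatus
CN107924452B (zh) * 2015-06-26 2022-07-19 英特尔公司 Combinatorial shape regression for face alignment in images
CN105574538B (zh) * 2015-12-10 2020-03-17 小米科技有限责任公司 Classification model training method and apparatus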
WO2017149315A1 (en) * 2016-03-02 2017-09-08 Holition Limited Locating and augmenting object features in images
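CN105912990B (zh) * 2016-04-05 2019-10-08 深圳先进技术研究院 Face detection method and apparatus
CN107463865B (zh) * 2016-06-02 2020-11-13 北京陌上花科技有限公司 Face detection model training method, and face detection method and apparatus
CN108073873A (zh) * 2016-11-15 2018-05-25 上海宝信软件股份有限公司 Face detection and recognition system based on high-definition smart cameras
CN106875422B (zh) * 2017-02-06 2022-02-25 腾讯科技(上海)有限公司 Face tracking method and apparatus
CN108229278B (zh) * 2017-04-14 2020-11-17 深圳市商汤科技有限公司 Face image processing method and apparatus, and electronic device
CN107358209B (zh) * 2017-07-17 2020-02-28 成都通甲优博科技有限责任公司 Training method and apparatus for a face detection model, and face detection method and apparatus
CN107886064B (zh) * 2017-11-06 2021-10-22 安徽大学 Scene adaptation method for face recognition based on a convolutional neural network
CN108121952B (zh) * 2017-12-12 2022-03-08 北京小米移动软件有限公司 Face keypoint positioning method, apparatus, device, and storage medium
CN108550176A (zh) * 2018-04-19 2018-09-18 咪咕动漫有限公司 Image processing method, device, and storage medium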

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3754541A4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
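CN113486688A (zh) * 2020-05-27 2021-10-08 海信集团有限公司 Face recognition method and smart device
CN112633203A (zh) * 2020-12-29 2021-04-09 上海商汤智能科技有限公司 Keypoint detection method and apparatus, electronic device, and storage medium
CN114360011A (zh) * 2021-12-27 2022-04-15 中国电信股份有限公司 Image recognition method, apparatus, device, and medium
CN116631019A (zh) * 2022-03-24 2023-08-22 清华大学 Mask fit detection method and apparatus based on facial images
CN116631019B (zh) * 2022-03-24 2024-02-27 清华大学 Mask fit detection method and apparatus based on facial images
CN115457097A (zh) * 2022-08-22 2022-12-09 杭州欣禾圣世科技有限公司 Face reconstruction method, system, apparatus, and storage medium based on generated images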

Also Published As

Publication number Publication date
US20210019503A1 (en) 2021-01-21
US11256905B2 (en) 2022-02-22
CN109359575B (zh) 2022-05-10
CN109359575A (zh) 2019-02-19
EP3754541A4 (en) 2021-08-18
EP3754541A1 (en) 2020-12-23

Similar Documents

Publication Publication Date Title
WO2020063744A1 (zh) Face detection method and apparatus, service processing method, terminal device, and storage medium
US11295474B2 (en) Gaze point determination method and apparatus, electronic device, and computer storage medium
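WO2022134337A1 (zh) Face occlusion detection method, system, device, and storage medium
WO2019232866A1 (zh) Human eye model training method, human eye recognition method, apparatus, device, and medium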
US20200410074A1 (en) Identity authentication method and apparatus, electronic device, and storage medium
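WO2022001509A1 (zh) Image optimization method and apparatus, computer storage medium, and electronic device
WO2019232862A1 (zh) Mouth model training method, mouth recognition method, apparatus, device, and medium
CN114155546B (zh) Image correction method and apparatus, electronic device, and storage medium
WO2018086543A1 (zh) Liveness determination method, identity authentication method, terminal, server, and storage medium
CN108229330A (zh) Face fusion recognition method and apparatus, electronic device, and storage medium
WO2017190646A1 (zh) Face image processing method and apparatus, and storage medium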
US11120535B2 (en) Image processing method, apparatus, terminal, and storage medium
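WO2015067084A1 (zh) Human eye positioning method and apparatus
WO2022257456A1 (zh) Hair information recognition method, apparatus, device, and storage medium
CN111814564A (zh) Liveness detection method, apparatus, device, and storage medium based on multispectral images
CN110287836B (zh) Image classification method and apparatus, computer device, and storage medium
CN109033935B (zh) Forehead wrinkle detection method and apparatus
CN110188630A (zh) Face recognition method and camera
WO2021197466A1 (zh) Eyeball detection method, apparatus, device, and storage medium
CN110298326A (zh) Image processing method and apparatus, storage medium, and terminal
WO2022135574A1 (zh) Skin color detection method and apparatus, mobile terminal, and storage medium
WO2022063321A1 (zh) Image processing method, apparatus, device, and storage medium
CN113837017B (zh) Makeup progress detection method, apparatus, device, and storage medium
WO2021218121A1 (zh) Image processing method and apparatus, electronic device, and storage medium
WO2022087846A1 (zh) Image processing method, apparatus, device, and storage medium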

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19864907

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019864907

Country of ref document: EP

Effective date: 20200915

NENP Non-entry into the national phase

Ref country code: DE