WO2024120096A1 - Key point detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product - Google Patents

Key point detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product

Info

Publication number
WO2024120096A1
Authority
WO
WIPO (PCT)
Prior art keywords
features
detected
vertex
vertices
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/129915
Other languages
English (en)
French (fr)
Inventor
邱炜彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to JP2025519628A (patent JP2025534442A, ja)
Priority to EP23899677.1A (patent EP4567724A4, en)
Publication of WO2024120096A1 (patent WO2024120096A1, zh)
Priority to US18/793,553 (patent US20240394918A1, en)
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/56 Particle system, point based geometry or rendering

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular to a key point detection method, training method, device, electronic device, computer-readable storage medium and computer program product.
  • the key point detection of 3D human face characters is generally divided into two categories.
  • the first category is based on traditional geometric analysis methods
  • the second category is based on deep learning methods.
  • Key point positioning based on geometric analysis depends heavily on manually set rules and is difficult to apply to head models of different shapes, so its robustness is poor. Methods in the second category generally render the 3D head model into a 2D image first, and then use a 2D convolutional neural network to extract features and detect the corresponding key points; this inevitably loses 3D geometric information. As a result, the accuracy of key point detection for 3D face characters in the related art is low.
  • the embodiments of the present application provide a key point detection method, a three-dimensional network model training method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the accuracy of key point detection through a three-dimensional network model.
  • the present invention provides a key point detection method, which includes:
  • Feature splicing (concatenation) is performed on the vertex features, the global features, and the local features, and the key points of the object to be detected are detected to obtain the positions of the key points on the object to be detected.
  • the present application provides a key point detection device, the device comprising:
  • An acquisition module configured to obtain a three-dimensional grid for representing the object to be detected, and determine vertices of the three-dimensional grid and connection relationships between vertices;
  • a first feature extraction module is configured to extract features from vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
  • a second feature extraction module is configured to perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected;
  • the output module is configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features, and obtain the positions of the key points of the object to be detected on the object to be detected.
  • the embodiment of the present application provides a training method for a three-dimensional network model, wherein the three-dimensional network model includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the method includes:
  • through the second feature extraction layer, global feature extraction is performed on the object training sample based on the vertex features of the training three-dimensional mesh to obtain global features of the object training sample, and through the third feature extraction layer, local feature extraction is performed on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices to obtain local features of the object training sample;
  • the key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample;
  • the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key point of the object to be detected on the object to be detected.
  • the embodiment of the present application provides a training device for a three-dimensional network model, wherein the three-dimensional network model comprises at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the device comprises:
  • An acquisition module configured to acquire an object training sample carrying a label, wherein the label is configured to indicate a real position of a key point of the object training sample;
  • An acquisition module configured to acquire a training three-dimensional mesh for representing the object training sample, and determine vertices of the training three-dimensional mesh and connection relationships between vertices;
  • a first feature extraction module is configured to extract features of vertices of the object training sample through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh;
  • a second feature extraction module is configured to perform global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh through the second feature extraction layer to obtain the global features of the object training sample, and perform local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices through the third feature extraction layer to obtain the local features of the object training sample;
  • an output module configured to detect key points of the object training sample through the output layer based on vertex features of the training three-dimensional mesh, global features of the object training sample, and local features of the object training sample, and obtain positions of the key points of the object training sample on the object training sample;
  • An updating module is configured to obtain the difference between the position of the key point of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key point of the object to be detected on the object to be detected.
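The update described above (measure the difference between the predicted key point position and the label, then adjust the model based on that difference) can be sketched in a few lines of Python. This is only an illustrative toy: the single linear output layer, the squared-error loss, the learning rate, and the names `predict` and `train_step` are all assumptions, not the training procedure of the application.

```python
def predict(weights, feature):
    """Map a feature vector to a 3D position with one linear layer."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


def train_step(weights, feature, label, lr=0.1):
    """One gradient-descent step on the squared difference to the label."""
    pred = predict(weights, feature)
    diff = [p - t for p, t in zip(pred, label)]
    loss = sum(d * d for d in diff)
    # d(loss)/d(weights[i][j]) = 2 * diff[i] * feature[j]
    new_weights = [
        [w - lr * 2.0 * d * f for w, f in zip(row, feature)]
        for row, d in zip(weights, diff)
    ]
    return new_weights, loss


# Hypothetical toy data: one 2-dimensional feature and one labeled 3D position.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
feature = [1.0, 0.5]
label = [0.2, -0.1, 0.4]

losses = []
for _ in range(50):
    weights, loss = train_step(weights, feature, label)
    losses.append(loss)
```

In this toy setting the loss shrinks at every step, so the predicted position converges to the labeled position; a real model would repeat the same idea over many samples and many layers.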
  • An embodiment of the present application provides an electronic device, including:
  • a memory configured to store executable instructions
  • the processor is configured to implement the key point detection method provided in the embodiment of the present application when executing the computer executable instructions stored in the memory.
  • An embodiment of the present application provides an electronic device, including:
  • a memory configured to store executable instructions
  • the processor is configured to implement the three-dimensional network model training method provided in the embodiment of the present application when executing the computer executable instructions stored in the memory.
  • An embodiment of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored.
  • the processor will execute the key point detection method provided by the embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, which stores computer-executable instructions.
  • the processor will execute the training method of the three-dimensional network model provided by the embodiment of the present application.
  • the embodiment of the present application provides a computer program product, which includes a computer program or a computer executable instruction, and the computer program or the computer executable instruction is stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer program or the computer executable instruction from the computer-readable storage medium, and the processor executes the computer program or the computer executable instruction, so that the electronic device performs the key point detection method provided in the embodiment of the present application.
  • the embodiment of the present application provides a computer program product, which includes a computer program or a computer executable instruction, and the computer program or the computer executable instruction is stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer program or the computer executable instruction from the computer-readable storage medium, and the processor executes the computer program or the computer executable instruction, so that the electronic device executes the training method of the three-dimensional network model provided in the embodiment of the present application.
  • the three-dimensional mesh corresponding to the object to be detected is obtained, and then the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional mesh and the connection relationship between the vertices through the construction of a two-way feature extraction layer, thereby obtaining the position of the key points on the object to be detected based on the vertex features obtained from the three-dimensional mesh, the extracted global features and the local features.
  • more abundant feature information of the object to be detected is extracted through multiple layers of feature extraction layers, and then the key points of the object to be detected are detected based on the abundant feature information, so that the accuracy of three-dimensional key point detection is significantly improved.
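As a rough illustration of the flow summarized above, the following Python sketch derives a global feature and per-vertex local features from the vertex features, then splices all three together for each vertex. The max-pooling used for the global branch, the neighbor averaging used for the local branch, and all function names are assumptions made for illustration; this passage does not specify how either branch is computed.

```python
def global_feature(vertex_feats):
    """Max-pool each channel over all vertices: one vector for the whole object."""
    dim = len(vertex_feats[0])
    return [max(f[d] for f in vertex_feats) for d in range(dim)]


def local_features(vertex_feats, adjacency):
    """Average each vertex's features with those of its connected neighbors."""
    out = []
    for v, feat in enumerate(vertex_feats):
        group = [feat] + [vertex_feats[n] for n in adjacency[v]]
        out.append([sum(g[d] for g in group) / len(group) for d in range(len(feat))])
    return out


def splice_features(vertex_feats, adjacency):
    """Concatenate vertex, global, and local features for every vertex."""
    g = global_feature(vertex_feats)
    loc = local_features(vertex_feats, adjacency)
    return [vf + g + lf for vf, lf in zip(vertex_feats, loc)]


# Hypothetical toy input: 3 vertices with 2-channel features, connected 0-1-2.
vertex_feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
spliced = splice_features(vertex_feats, adjacency)
```

Each spliced row has three times the original channel count, and an output layer would then score every vertex from its spliced row.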
  • FIG. 1 is a schematic diagram of the architecture of a key point detection system 100 provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a key point detection method according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a three-dimensional grid of a human head provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a process for determining local features of each vertex provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of using an attention mechanism to determine the degree of correlation between a reference vertex and other vertices according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the positions of key points on an object to be detected provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the structure of a three-dimensional network model provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of a third feature extraction layer provided in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the structure of a three-dimensional network model provided in an embodiment of the present application.
  • FIG. 11 is a flow chart of a training process of a three-dimensional network model provided in an embodiment of the present application.
  • FIG. 12 is a simplified schematic diagram of a three-dimensional mesh surface provided in an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a three-dimensional mesh densification process provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a flow chart of a key point detection method provided in an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the structure of a graph convolutional neural network provided in an embodiment of the present application.
  • FIG. 16 is a comparison diagram of the geodesic distance and the Euclidean distance provided in an embodiment of the present application.
  • The terms "first/second/third" involved are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first/second/third" can be interchanged with a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • Three-dimensional mesh refers to a manifold surface with a topological structure, such as a spherical surface divided into a combination of multiple vertices and multiple edges. In this application, it can be a three-dimensional face mesh.
  • the three-dimensional mesh is a graph structure.
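To make the "mesh as graph" view concrete, the sketch below turns a list of triangular faces into an undirected adjacency map, which is one common way to store the connection relationships between vertices. The function name `mesh_to_graph` and the tetrahedron data are hypothetical, and the triangular-face representation is an assumption; the text above only states that the mesh consists of vertices and edges.

```python
from collections import defaultdict


def mesh_to_graph(faces):
    """Build an undirected adjacency map from triangular faces.

    `faces` is a list of (i, j, k) vertex-index triples; each triangle
    contributes its three edges to the graph.
    """
    adjacency = defaultdict(set)
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return {v: sorted(nbrs) for v, nbrs in adjacency.items()}


# Hypothetical toy mesh: a tetrahedron with 4 vertices and 4 triangular faces.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
graph = mesh_to_graph(faces)
```

Here every vertex of the tetrahedron ends up connected to the other three, so the graph has six undirected edges.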
  • Client, also known as the user end, refers to the program that corresponds to the server and provides local services to users. Apart from some applications that can only run locally, a client is generally installed on an ordinary client machine and needs to cooperate with the server to run; that is, corresponding servers and service programs are required in the network to provide the corresponding services. A specific communication connection therefore needs to be established between the client and the server to ensure the normal operation of the application.
  • 3D facial key point detection refers to detecting the 3D coordinates of a series of facial key points with preset semantics given any 3D face mesh model. There is no limit on the number of vertices and facets of the 3D face model.
  • the key points with preset semantics refer to the position information including the corners of the eyes, corners of the mouth, tip of the nose, and facial contours. The semantics and number of key points are determined by the specific task.
  • Graph Neural Network (GNN) is a type of artificial neural network used to process data that can be represented as graphs. Whereas traditional two-dimensional convolutional neural networks act on two-dimensional images, graph neural networks extend their objects of action to graph data, which can represent three-dimensional meshes.
  • the key design element of graph neural networks is the use of paired message passing so that graph nodes can be iteratively updated by exchanging information with their neighbors.
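The iterative neighbor exchange described above can be illustrated with a single, weight-free message passing step. The mean aggregation, the equal mixing of a node's own feature with its neighbors' mean, and the name `message_passing_step` are simplifying assumptions; practical graph neural networks apply learned transformations at each step.

```python
def message_passing_step(features, adjacency):
    """One round of mean-aggregation message passing.

    Each node's new feature is the average of its own feature vector and
    the mean of its neighbors' vectors (no learned weights in this sketch).
    """
    updated = {}
    for node, feat in features.items():
        nbrs = adjacency.get(node, [])
        if not nbrs:
            updated[node] = list(feat)
            continue
        dim = len(feat)
        mean_nbr = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        updated[node] = [(feat[d] + mean_nbr[d]) / 2.0 for d in range(dim)]
    return updated


# Hypothetical toy graph: a 3-node path 0 - 1 - 2, with a signal only at node 0.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [0.0], 2: [0.0]}
step1 = message_passing_step(features, adjacency)
step2 = message_passing_step(step1, adjacency)
```

After one step node 2 still holds zero, because it is two edges away from node 0; after the second step it has received part of the signal, which shows why stacking steps widens each node's receptive field.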
  • Three-dimensional heatmap regression refers to the graph neural network using the heatmap as the output layer and forming a regression loss with the standard heatmap.
  • the neural network is trained through forward propagation and gradient backpropagation to fit the output of the neural network with the label, and finally the coordinates of the key points are calculated from the heatmap.
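A minimal sketch of heatmap regression on mesh vertices follows: a target heatmap is built around the labeled key point, a regression loss compares a predicted heatmap against it, and the key point is read back from the heatmap. The Gaussian falloff, the use of Euclidean distance, the mean-squared-error loss, the arg-max decoding, and all function names are assumptions chosen for illustration.

```python
import math


def gaussian_heatmap(vertices, keypoint, sigma=0.5):
    """Per-vertex target value that decays with distance to the key point."""
    return [
        math.exp(-sum((v - k) ** 2 for v, k in zip(vertex, keypoint)) / (2 * sigma ** 2))
        for vertex in vertices
    ]


def mse_loss(pred, target):
    """Mean squared error between predicted and target heatmaps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def decode_keypoint(heatmap, vertices):
    """Return the coordinates of the vertex with the highest heatmap value."""
    best = max(range(len(heatmap)), key=heatmap.__getitem__)
    return vertices[best]


# Hypothetical toy data: four vertices, with the key point located at vertex 2.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
keypoint = (0.0, 1.0, 0.0)
target = gaussian_heatmap(vertices, keypoint)
pred = [0.9 * t for t in target]  # stand-in for a network output
loss = mse_loss(pred, target)
decoded = decode_keypoint(pred, vertices)
```

Arg-max decoding snaps the result to a mesh vertex; a weighted average over high-scoring vertices is a common alternative when sub-vertex precision is needed.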
  • a 3D scanner is a scientific instrument used to detect and analyze the shape (geometry) and appearance data (such as color, surface albedo, etc.) of objects or environments in the real world.
  • the collected data is usually used for 3D reconstruction calculations to create digital models of actual objects in the virtual world.
  • These models have a wide range of uses, such as industrial design, defect detection, reverse engineering, robot guidance, topographic measurement, medical information, bioinformatics, criminal identification, etc.
  • Multi-Layer Perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors.
  • MLP can be viewed as a directed graph consisting of multiple node layers, each of which is fully connected to the next layer. Except for the input node, each node is a neuron (or processing unit) with a nonlinear activation function.
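The layered, fully connected structure described above can be written out as a short forward pass. This is a sketch with hand-picked weights; the ReLU activation, the linear output layer, and the names `dense`, `relu`, and `mlp_forward` are assumptions for illustration only.

```python
def dense(x, weights, bias):
    """Fully connected layer: every output unit sees every input value."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise nonlinear activation."""
    return [max(0.0, v) for v in x]


def mlp_forward(x, layers):
    """Apply each (weights, bias) layer in turn, with ReLU between layers."""
    for idx, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if idx < len(layers) - 1:  # the last layer is left linear in this sketch
            x = relu(x)
    return x


# Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
output = mlp_forward([2.0, 1.0], layers)
```

With these weights the hidden layer produces `[1.0, 1.5]`, and the output layer sums them with its bias.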
  • Convolutional Neural Network (CNN), a feedforward neural network, is generally composed of one or more convolutional layers (network layers that use the convolution operation) and a fully connected layer at the end.
  • the neurons inside the network can respond to partial areas of the input image and generally have outstanding performance in the field of visual image processing.
  • Machine Learning is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Learning is the core of artificial intelligence and the fundamental way to make computers intelligent. Its applications are spread across all areas of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and self-learning.
  • Point cloud data refers to a massive collection of points on the surface features of a target, which is generally obtained through laser measurement or photogrammetry.
  • Point cloud data obtained by laser measurement includes three-dimensional coordinates and laser reflection intensity. This type of point cloud data usually determines the state of an object by echo characteristics and reflection intensity; point cloud data obtained by photogrammetry usually includes three-dimensional coordinates and color information.
  • Graph Attention Network (GAT).
  • Artificial intelligence technology has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, drones, robots, smart medical care, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.
  • the solution provided in the embodiments of the present application involves technologies such as three-dimensional network models of artificial intelligence, and can also be applied to fields such as cloud technology and Internet of Vehicles, which will be specifically explained through the following embodiments.
  • FIG. 1 is a schematic diagram of the architecture of a key point detection system 100 provided in an embodiment of the present application.
  • An example application scenario of key point detection is as follows: when performing key point detection on a face, the face is first scanned in three dimensions by a three-dimensional scanner, and the key point positions on the face are then detected based on the three-dimensional scanning data.
  • a terminal (terminal 400 is shown as an example) is connected to a server 200 via a network 300.
  • the network 300 can be a wide area network or a local area network, or a combination of the two.
  • the terminal 400 is configured for a user to use a client 401, which is displayed on a display interface (display interface 401-1 is shown as an example).
  • the terminal 400 and the server 200 are connected to each other via a wired or wireless network.
  • the server 200 is configured to receive three-dimensional scanning data; based on the three-dimensional scanning data, obtain a three-dimensional grid for representing the object to be detected, and determine the vertices of the three-dimensional grid and the connection relationship between the vertices; perform feature extraction on the vertices of the three-dimensional grid to obtain vertex features of the three-dimensional grid; based on the vertex features, perform global feature extraction on the object to be detected to obtain global features of the object to be detected, and based on the vertex features and the connection relationship between the vertices, perform local feature extraction on the object to be detected to obtain local features of the object to be detected; based on the vertex features, the global features and the local features, detect key points of the object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected; and send the positions of the key points on the object to be detected to the terminal 400;
  • the terminal 400 is also configured to display the positions of key points on the object to be detected based on a display interface.
  • the server 200 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
  • the terminal 400 may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a set-top box, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable gaming device, a smart speaker, and a smart watch), etc., but is not limited thereto.
  • the terminal device and the server may be directly or indirectly connected via wired or wireless communication, which is not limited in the embodiments of the present application.
  • FIG. 2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • the electronic device may be the server 200 or the terminal 400 shown in FIG. 1 .
  • the electronic device shown in FIG. 2 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430.
  • the various components in the terminal 400 are coupled together via a bus system 440.
  • the bus system 440 is configured to achieve connection and communication between these components.
  • the bus system 440 also includes a power bus, a control bus, and a status signal bus.
  • the various buses are all labeled as the bus system 440 in FIG. 2.
  • Processor 410 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where the general-purpose processor can be a microprocessor or any conventional processor, etc.
  • the user interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
  • the user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
  • the memory 450 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid-state memory, hard drives, optical drives, etc.
  • the memory 450 may optionally include one or more storage devices that are physically remote from the processor 410.
  • memory 450 can store data to support various operations, examples of which include programs, modules, and data structures, or a subset or superset thereof, as exemplarily described below.
  • a network communication module 452 configured to reach other electronic devices via one or more (wired or wireless) network interfaces 420, where exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
  • a presentation module 453 configured to enable presentation of information via one or more output devices 431 (e.g., display screen, speaker, etc.) associated with the user interface 430 (e.g., a user interface configured to operate peripheral devices and display content and information);
  • the input processing module 454 is configured to detect one or more user inputs or interactions from the input device 432 and to translate the detected inputs or interactions.
  • FIG. 2 shows a key point detection device 455 stored in a memory 450, which can be software in the form of a program and a plug-in, including the following software modules: an acquisition module 4551, a first feature extraction module 4552, a second feature extraction module 4553, and an output module 4554.
  • These modules are logical, and therefore can be arbitrarily combined or further split according to the functions implemented. The functions of each module will be described below.
  • the device provided in the embodiments of the present application can be implemented in hardware.
  • the key point detection device provided in the embodiments of the present application can be a processor in the form of a hardware decoding processor, which is programmed to execute the key point detection method provided in the embodiments of the present application.
  • the processor in the form of a hardware decoding processor can adopt one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), DSP, programmable logic device (Programmable Logic Device, PLD), complex programmable logic device (Complex Programmable Logic Device, CPLD), field programmable gate array (Field-Programmable Gate Array, FPGA) or other electronic components.
  • the terminal or server can implement the key point detection method provided in the embodiment of the present application by running a computer program.
  • the computer program can be a native program or software module in the operating system; it can be a native application (Application, APP), that is, a program that needs to be installed in the operating system to run, such as an instant messaging APP or a web browser APP; it can also be a mini program, that is, a program that only needs to be downloaded into a browser environment to run; it can also be a mini program that can be embedded in any APP.
  • the above-mentioned computer program can be an application, module or plug-in in any form.
  • FIG. 3 is a flow chart of the key point detection method provided in the embodiment of the present application. Below, the steps shown will be described in conjunction with FIG. 3.
  • Step 101 The server obtains a three-dimensional grid for representing the object to be detected, and determines the vertices of the three-dimensional grid and the connection relationship between the vertices.
  • the three-dimensional grid used to characterize the object to be detected can be obtained by directly receiving the three-dimensional grid of the object to be detected sent by other devices, or it can be achieved through the point cloud data (i.e., three-dimensional scanning data) corresponding to the object to be detected.
  • the point cloud data is configured as a massive point set indicating the surface features of the object to be detected, which can generally be obtained by laser measurement or photogrammetry. Specifically, first obtain the point cloud data corresponding to the object to be detected, and then obtain the three-dimensional grid used to characterize the object to be detected based on the point cloud data, that is, construct the three-dimensional grid corresponding to the object to be detected.
  • the point cloud data can be pre-stored locally in the terminal, or obtained from the outside world (such as the Internet), or collected in real time, for example, collected in real time by a three-dimensional scanning device such as a three-dimensional scanner.
  • the process of constructing a three-dimensional grid corresponding to the object to be detected specifically includes scanning the object to be detected by the three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected; and constructing a three-dimensional grid corresponding to the object to be detected based on the point cloud data.
  • FIG4 is a schematic diagram of a three-dimensional grid of a person's head provided in an embodiment of the present application.
  • a three-dimensional scan is performed on the person's head by a three-dimensional scanner to obtain point cloud data corresponding to the head, thereby constructing a three-dimensional grid corresponding to the head based on the point cloud data.
  • the process of constructing a three-dimensional grid corresponding to the object to be detected based on point cloud data can be as follows. First, the point cloud data is preprocessed to obtain target point cloud data; the preprocessing includes operations such as filtering, denoising, and point cloud registration, where filtering removes noise points, denoising further reduces noise and invalid points, and point cloud registration aligns the point cloud data to the same coordinate system. Then, the target point cloud data is meshed to obtain the three-dimensional grid; mesh reconstruction is the process of converting discrete target point cloud data into a three-dimensional grid. Commonly used mesh reconstruction algorithms include grid-based methods, voxel-based methods, and implicit function-based methods: the grid-based method converts the target point cloud data into a triangular grid, the voxel-based method converts it into a voxel grid, and the implicit function-based method uses a function to represent the three-dimensional grid.
  • the connection relationship between the vertices of the three-dimensional grid can be represented by a vertex connection relationship matrix, which indicates whether there is an association between the vertices. The matrix has size N*N, where N is the number of vertices, and each of its values is 0 or 1.
  • for example, there is a connection relationship between the vertices of the three-dimensional mesh that indicate the eye position on the face, while there is no connection relationship between the vertices that indicate the eye position and the vertices that indicate the chin position.
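  • as an illustration of the N*N vertex connection relationship matrix, the following minimal sketch builds it from triangular faces. It assumes (this is not stated in the application) that the mesh is given as triples of vertex indices and that two vertices are connected when they share a face edge; the function name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(num_vertices, faces):
    """Return an N x N matrix of 0/1 values; entry [i][j] is 1 when an edge joins i and j."""
    adj = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b, c in faces:
        # Each triangular face contributes its three edges (assumed convention).
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i][j] = 1
            adj[j][i] = 1
    return adj


# Two triangles sharing the edge (1, 2); vertex 4 belongs to no face.
faces = [(0, 1, 2), (1, 3, 2)]
adj = adjacency_matrix(5, faces)
```

  • in this sketch vertices 0 and 3 are not connected because they never share an edge, which mirrors the eye and chin example above.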
  • Step 102 extract features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh.
  • feature extraction is performed on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh, wherein the vertex features include the positions of the corresponding vertices and the information of the corresponding positions on the face indicated by the corresponding vertices.
  • the vertex features here can have size N*(6+X), where N represents the number of vertices of the three-dimensional mesh; 6 represents the dimensions occupied by the vertex coordinates and the normal vectors, i.e., the three coordinate dimensions (x, y, z) of each vertex plus the three directional dimensions of its normal vector; and X covers other characteristics of the vertices of the three-dimensional mesh, i.e., the information of the corresponding positions on the face indicated by the corresponding vertices, such as curvature and texture information. These other characteristics can be adjusted according to different data and tasks. When the present application is applied to a model, adding these other characteristics can accelerate learning during the training phase of the model.
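  • the N*(6+X) layout can be sketched as follows. This is only an illustration under stated assumptions: vertex normals are estimated by summing the normals of adjacent faces (the application does not say how normals are obtained), the extra X columns are supplied by the caller, and the names `vertex_features` and `curvature` are hypothetical.

```python
import math


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def vertex_features(vertices, faces, extra=None):
    """Build one row per vertex: [x, y, z, nx, ny, nz] followed by X extra values."""
    normals = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in faces:
        # Face normal (unnormalized) accumulated onto the three corner vertices.
        n = cross(sub(vertices[b], vertices[a]), sub(vertices[c], vertices[a]))
        for idx in (a, b, c):
            for k in range(3):
                normals[idx][k] += n[k]
    rows = []
    for i, (p, n) in enumerate(zip(vertices, normals)):
        length = math.sqrt(sum(x * x for x in n)) or 1.0  # avoid division by zero
        row = list(p) + [x / length for x in n]
        if extra is not None:
            row += list(extra[i])
        rows.append(row)
    return rows


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
curvature = [[0.1], [0.2], [0.3]]  # made-up values standing in for the X extra dimensions
features = vertex_features(vertices, faces, curvature)
```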
  • Step 103 extract global features of the object to be detected based on vertex features to obtain global features of the object to be detected, and extract local features of the object to be detected based on vertex features and the connection relationship between vertices to obtain local features of the object to be detected.
  • the process of performing global feature extraction on the object to be detected based on vertex features to obtain the global features of the object to be detected may be, first, performing feature extraction on the object to be detected based on vertex features, and performing maximum pooling processing on the extracted features to obtain maximum pooling features, so that all vertices share the maximum pooling features, and using the maximum pooling features as the global features of the object to be detected.
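  • a minimal sketch of the global branch described above: a per-vertex feature transform followed by maximum pooling over all vertices, with the pooled vector then shared by every vertex. The linear projection, the ReLU and the weight values are assumptions made for illustration; the application only specifies feature extraction followed by maximum pooling.

```python
def linear(row, weights):
    """Multiply one feature row by a weight matrix given as a list of output rows."""
    return [sum(w * x for w, x in zip(w_row, row)) for w_row in weights]


def global_feature(vertex_feats, weights):
    """Per-vertex transform (assumed: linear + ReLU), then channel-wise max over vertices."""
    transformed = [[max(0.0, v) for v in linear(row, weights)] for row in vertex_feats]
    return [max(column) for column in zip(*transformed)]


vertex_feats = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2 -> 3 projection
g = global_feature(vertex_feats, weights)
shared = [g for _ in vertex_feats]  # every vertex shares the same pooled feature
```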
  • the process of extracting local features of the object to be detected based on vertex features and the connection relationship between vertices may be: determining the local features of each vertex based on the vertex features and the connection relationship between vertices; and determining the local features of the object to be detected based on the local features of each vertex.
  • the global features here are used to indicate the overall features of the object to be detected, such as the color features, texture features and shape features of the object to be detected, while the local features are used to indicate the detailed features of the object to be detected, that is, the features extracted from the local area of the object to be detected, such as the features extracted from the edges, corners, points, lines, curves and special attribute areas of the object to be detected.
  • for example, for a human face, the global features can be the size, shape and position of the facial features, and the local features can be the distribution of facial muscles and the shape changes of the facial features under different expressions.
  • the global features are low-level visual features at the pixel level; they have good invariance, are simple to calculate and are intuitive to represent, but they are not suitable for cases of object aliasing and occlusion. The local features, by contrast, are abundant in the image and weakly correlated with one another, so the disappearance of some features does not affect the detection and matching of other features.
  • in this way, by extracting both the global features and the local features of the object to be detected, richer and more accurate features of the object to be detected are obtained, thereby improving the accuracy of the key point detection results.
  • FIG. 5 is a schematic diagram of the flow chart of determining the local features of each vertex provided by an embodiment of the present application. Based on FIG. 5, the process of determining the local features of each vertex based on the vertex features and the connection relationship between the vertices is implemented by steps 1031 to 1033. In combination with FIG. 5, the following processing is performed for each vertex:
  • Step 1031 determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of other vertices based on the vertex features of each vertex in the three-dimensional mesh; wherein the other vertex is any vertex except the reference vertex.
  • for example, the number of vertices in the three-dimensional mesh is N and the feature of each vertex is denoted h, so that the vertex features are $h_1, h_2, \ldots, h_N$. When vertex i is taken as the reference vertex, $h_i$ is a vector of size F, that is, the feature of reference vertex i; vertex j is another vertex, and $h_j$ is a vector of size F, that is, the feature of the other vertex j.
  • Step 1032 based on the vertex feature of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices, determine the correlation value between the reference vertex and other vertices; wherein the correlation value is used to indicate the degree of correlation between the reference vertex and other vertices.
  • the degree of correlation between the reference vertex and another vertex can be written as $e_{ij} = \mathrm{attention}(W h_i, W h_j)$, where W is a weight matrix of size F × F, $h_i$ is the vertex feature of reference vertex i, $h_j$ is the vertex feature of other vertex j, attention indicates processing with the attention mechanism, and $e_{ij}$ indicates the degree of correlation between the reference vertex and the other vertex.
  • the process of determining the correlation value between a reference vertex and other vertices based on vertex features of the reference vertex, vertex features of other vertices, and connection relationships between vertices may be as follows: determining the connected reference vertex and other vertices based on the connection relationships between the vertices; performing similarity matching on the reference vertex and corresponding other vertices based on the vertex features of the connected reference vertex and vertex features of other vertices to obtain the similarity between the reference vertex and the corresponding other vertices (wherein a corresponding similarity is obtained for each of the other vertices); and determining the similarity as the degree of correlation between the reference vertex and the corresponding other vertices.
  • the correlation degree is normalized to obtain the correlation value between the reference vertex and other vertices, that is, $$\alpha_{ij} = \mathrm{Softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{q \in N_i} \exp(e_{iq})}$$ where $\mathrm{Softmax}_j$ indicates normalization processing, $\alpha_{ij}$ indicates the correlation value between vertices i and j, exp indicates the exponential function with the natural constant e as the base, $N_i$ indicates the neighborhood composed of all other vertices that have a connection relationship with the reference vertex i, and q represents any vertex in that neighborhood.
  • Figure 6 is a schematic diagram of using the attention mechanism to determine the correlation degree between a reference vertex and other vertices provided in an embodiment of the present application.
  • in Figure 6, $\alpha_{ij}$ indicated by 601 is the correlation value between vertices i and j, $W h_i$ in the dotted box 602 is the vertex feature corresponding to the reference vertex i, $W h_j$ in the dotted box 603 is the vertex feature corresponding to other vertex j, and a is a weight vector. The correlation degree is subjected to $\mathrm{Softmax}_j$ processing, i.e., normalization processing, to obtain the correlation value between the reference vertex and other vertices.
  • the attention mechanism is used here to determine the correlation between the reference vertex and other vertices. Specifically, the features $W h_i$ and $W h_j$ of vertices i and j are concatenated, the inner product of the concatenated feature with a weight vector a of dimension 2F is calculated, and the result is passed through the activation function and normalized to obtain the correlation value between the reference vertex and other vertices, that is, $$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{T}[W h_i \,\|\, W h_j])\big)}{\sum_{q \in N_i} \exp\big(\mathrm{LeakyReLU}(a^{T}[W h_i \,\|\, W h_q])\big)}$$ where $N_i$ indicates the neighborhood composed of all other vertices that are connected to the reference vertex i, q represents any vertex in that neighborhood, $W h_i \,\|\, W h_j$ indicates the concatenated feature obtained by concatenating the features $W h_i$ and $W h_j$ of vertices i and j, exp indicates the exponential function with the natural constant e as the base, LeakyReLU is the nonlinear activation function, and a is a weight vector of size 2F.
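  • the attention computation described above can be sketched for a single reference vertex as follows. This is an illustration rather than the application's implementation: the LeakyReLU negative slope of 0.2, the subtraction of the maximum score for numerical stability, and all names and numeric values are assumptions.

```python
import math


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):  # slope value is an assumption
    return x if x > 0 else slope * x


def attention_coefficients(i, feats, adj, W, a):
    """Return {j: alpha_ij} for every vertex j connected to reference vertex i."""
    wh_i = matvec(W, feats[i])
    scores = {}
    for j, connected in enumerate(adj[i]):
        if connected:
            wh_j = matvec(W, feats[j])
            # Inner product of a with the concatenation [W h_i || W h_j].
            scores[j] = leaky_relu(sum(x * y for x, y in zip(a, wh_i + wh_j)))
    if not scores:
        return {}  # isolated vertex: no neighbours to attend to
    peak = max(scores.values())  # subtracted only for numerical stability
    exps = {j: math.exp(s - peak) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


feats = [[1.0, 0.0], [2.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
adj = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # F x F weight matrix (identity for readability)
a = [1.0, 1.0, 1.0, 1.0]       # weight vector of size 2F
alpha = attention_coefficients(0, feats, adj, W, a)
```

  • only connected vertices receive a coefficient, and the coefficients of one reference vertex sum to 1 because of the normalization.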
  • the degree of correlation between the reference vertex and other vertices can also be directly calculated based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices; there are many methods for calculating the degree of correlation, such as the Pearson correlation coefficient and Spearman's rank correlation coefficient.
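  • as a sketch of the direct alternative, the Pearson correlation coefficient between the feature vector of the reference vertex and that of each connected vertex can serve as the degree of correlation. Treating each feature vector as a sample sequence and returning 0 for a constant vector are assumptions of this example; the function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length feature vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    std_u = math.sqrt(sum((x - mean_u) ** 2 for x in u))
    std_v = math.sqrt(sum((y - mean_v) ** 2 for y in v))
    if std_u == 0 or std_v == 0:
        return 0.0  # assumed convention for constant vectors
    return cov / (std_u * std_v)


def correlation_degrees(i, feats, adj):
    """Degree of correlation between reference vertex i and each connected vertex."""
    return {j: pearson(feats[i], feats[j]) for j, c in enumerate(adj[i]) if c}


feats = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
degrees = correlation_degrees(0, feats, adj)
```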
  • Step 1033 Determine the local features of the reference vertex based on the correlation value and vertex features of other vertices.
  • the local feature of the reference vertex can be written as $h_i' = \sigma\big(\sum_{j \in N_i} \alpha_{ij} W h_j\big)$, where $\sigma$ is the activation function, $\alpha_{ij}$ is the correlation value between the reference vertex i and other vertices j, $W h_j$ indicates the vertex features corresponding to other vertices j, and $h_i'$ is the local feature corresponding to the reference vertex.
  • the process of determining the local features of the reference vertex based on the correlation value and the vertex features of the other vertices may be: for each other vertex, multiplying the correlation value by the vertex feature of the corresponding other vertex to obtain the product result of that vertex; accumulating the product results of the other vertices to obtain a summation result; and determining the local feature corresponding to the reference vertex based on the summation result, that is, $$h_i' = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j\Big)$$ where $\sigma$ is the activation function, $\alpha_{ij}$ is the correlation value between the reference vertex i and other vertices j, $W h_j$ indicates the vertex features corresponding to other vertices j, and $N_i$ indicates the neighborhood composed of all other vertices that have a connection relationship with the reference vertex i.
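  • the multiply, accumulate and activate steps above can be sketched as follows. The activation function is not named in the application, so ELU is used here purely as an assumption; the correlation values in `alpha` are made-up numbers that would normally come from the attention step.

```python
import math


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def elu(x):  # assumed activation; the text only says "activation function"
    return x if x > 0 else math.exp(x) - 1.0


def local_feature(alpha, feats, W):
    """Weighted sum of transformed neighbour features followed by the activation."""
    total = [0.0] * len(W)
    for j, a_ij in alpha.items():
        wh_j = matvec(W, feats[j])
        for k in range(len(total)):
            total[k] += a_ij * wh_j[k]  # product of correlation value and feature
    return [elu(x) for x in total]


feats = [[9.0, 9.0], [2.0, 0.0], [0.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
alpha = {1: 0.75, 2: 0.25}  # correlation values of vertex 0 with its neighbours 1 and 2
h0_local = local_feature(alpha, feats, W)
```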
  • the process of determining the local features of the object to be detected based on the local features of each vertex is, specifically, based on the local features of each vertex, performing feature fusion on the local features of each vertex to obtain fused features; and using the fused features as the local features of the object to be detected.
  • Step 104 based on vertex features, global features and local features, key points of the object to be detected are detected to obtain positions of the key points of the object to be detected on the object to be detected.
  • the process of detecting the key points of the object to be detected based on vertex features, global features, and local features to obtain the positions of the key points on the object to be detected may be: performing feature splicing on the vertex features, global features, and local features to obtain the splicing features of the object to be detected; and detecting the key points of the object to be detected based on the splicing features to obtain the positions of the key points on the object to be detected.
  • because the splicing features contain the feature information of the vertex features, global features, and local features of the object to be detected, detecting the key points based on the splicing features combines all of this feature information. That is, the key points of the object to be detected are detected through richer feature information, thereby improving the accuracy of the key point detection results.
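  • feature splicing can be illustrated as per-vertex concatenation, as in the sketch below. The concatenation order (vertex, global, local) and the function name `splice_features` are assumptions; the global feature is repeated for every vertex because all vertices share it.

```python
def splice_features(vertex_feats, global_feat, local_feats):
    """Concatenate, for each vertex, its own feature, the shared global feature and its local feature."""
    return [list(v) + list(global_feat) + list(l)
            for v, l in zip(vertex_feats, local_feats)]


vertex_feats = [[0.0, 1.0], [2.0, 3.0]]
global_feat = [9.0]
local_feats = [[0.5], [0.7]]
spliced = splice_features(vertex_feats, global_feat, local_feats)
```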
  • the three-dimensional heat map in the present application is a statistical chart that displays multiple data by coloring color blocks, that is, each data is displayed according to a specified color mapping rule, such as larger values are represented by dark colors and smaller values by light colors; or larger values are represented by warm tones and smaller values by cold tones, etc. In this way, by outputting a three-dimensional heat map, the possibility of the key point belonging to each vertex is displayed at the same time, so as to better ensure the local accuracy of the detection results.
  • Figure 7 is a schematic diagram of the positions of key points on the object to be detected provided in an embodiment of the present application.
  • the black points in Figure 7 are key points.
  • the positions of the key points shown in Figure 7 can be the positions of the facial features in the human face
  • the black points in the dotted box 701 are key points indicating the position of the forehead in the human face
  • the black points in the dotted boxes 702 and 703 are key points indicating the position of the eyes in the human face
  • the black points indicated by 704 and 705 are key points indicating the position of the ears in the human face
  • the black points in the dotted box 706 are key points indicating the position of the nose in the human face
  • the black points in the dotted box 707 are key points indicating the position of the mouth in the human face
  • the black points indicated by 708 and 709 are key points indicating the position of the cheeks in the human face
  • the black points in the dotted box 710 are key points.
  • the positions of the facial features of the object to be detected are detected to obtain the probability of each key point at each vertex in the three-dimensional grid, that is, the probability that each vertex in the three-dimensional grid is the key point corresponding to the position of each facial feature. Based on these probabilities, a three-dimensional heat map of the corresponding three-dimensional grid is generated. Then, based on the three-dimensional heat map, the positions of the key points of the object to be detected on the object to be detected are determined: for each key point corresponding to the position of a facial feature, the vertex with the largest probability is selected from the multiple probabilities and determined as the corresponding key point, so that the positions of the facial features are determined based on the obtained key points.
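  • selecting, for each key point, the vertex with the largest probability can be sketched as follows. The layout `heatmap[k][n]` (probability that vertex n is key point k) and the numeric values are assumptions made for illustration.

```python
def keypoints_from_heatmap(heatmap, vertices):
    """For each key point, pick the vertex with the largest probability."""
    keypoints = []
    for probs in heatmap:
        best = max(range(len(probs)), key=lambda n: probs[n])
        keypoints.append((best, vertices[best]))
    return keypoints


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
heatmap = [[0.1, 0.7, 0.2],   # key point 0: most likely at vertex 1
           [0.6, 0.3, 0.1]]   # key point 1: most likely at vertex 0
keypoints = keypoints_from_heatmap(heatmap, vertices)
```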
  • the key point detection method here can also be applied to a three-dimensional network model, which includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and an output layer, referring to Figure 8, which is a structural schematic diagram of a three-dimensional network model provided in an embodiment of the present application.
  • based on Figure 8, the process of extracting features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh may be: through the first feature extraction layer, extracting features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh.
  • the process of extracting global features from the object to be detected based on the vertex features, and extracting local features from the object to be detected based on the vertex features and the connection relationship between the vertices, may be: through the second feature extraction layer, extracting global features from the object to be detected based on the vertex features to obtain global features of the object to be detected; and through the third feature extraction layer, extracting local features from the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected.
  • the process of detecting the key points of the object to be detected based on vertex features, global features and local features to obtain the positions of the key points on the object to be detected may be: through the output layer, combining vertex features, global features and local features to detect the key points of the object to be detected and obtain the positions of the key points on the object to be detected.
  • the position of the key point on the object to be detected is detected through the three-dimensional network model, thereby improving the accuracy of the detected position.
  • the third feature extraction layer here may include at least two third feature extraction sublayers and a feature stitching sublayer.
  • Figure 9 is a structural schematic diagram of the third feature extraction layer provided in an embodiment of the present application.
  • the process of determining the local features of each vertex based on vertex features and the connection relationship between vertices through the third feature extraction layer may be that, through each third feature extraction sublayer, the following processing is performed on each vertex: the vertex is determined as a reference vertex, and based on the vertex features of each vertex in the three-dimensional mesh, the vertex features of the reference vertex and the vertex features of other vertices are determined; based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between vertices, the correlation value between the reference vertex and other vertices is determined; based on the correlation value and the vertex features of other vertices, the local sub-features of the reference vertex are determined. Then, through the feature stitching sublayer, the local sub-features obtained by the third feature extraction sublayers are spliced to obtain the local feature of the reference vertex, that is, $$h_i' = \mathop{\mathrm{concat}}_{k=1}^{K}\, \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$ where K is the number of third feature extraction sublayers and k indexes them, $N_i$ indicates the neighborhood composed of all other vertices that are connected to the reference vertex i, $\sigma$ is the activation function, $\alpha_{ij}^{k}$ is the correlation value between the reference vertex i and other vertices j in the k-th sublayer, $W^{k} h_j$ indicates the vertex features corresponding to other vertices j in the k-th sublayer, and concat indicates splicing processing.
  • the process of determining the correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices is the same as the aforementioned process.
  • the process of determining the local sub-features of the reference vertex based on the correlation value and the vertex features of other vertices is also the same as the aforementioned process of determining the local features of the reference vertex based on the correlation value and the vertex features of other vertices, which will not be elaborated on here.
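  • the combination of several third feature extraction sublayers with a splicing step can be sketched as follows. Each sublayer is given its own correlation values and weight matrix, and the sublayer outputs are concatenated. ELU as the activation, the per-sublayer weights, and all names and numbers are assumptions of this example.

```python
import math


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def elu(x):  # assumed activation function
    return x if x > 0 else math.exp(x) - 1.0


def sublayer_output(alpha, feats, W):
    """Local sub-feature of one sublayer: activation of the weighted neighbour sum."""
    total = [0.0] * len(W)
    for j, a_ij in alpha.items():
        for k, value in enumerate(matvec(W, feats[j])):
            total[k] += a_ij * value
    return [elu(x) for x in total]


def multi_sublayer_local_feature(alphas, weights, feats):
    """Splice (concatenate) the local sub-features produced by every sublayer."""
    out = []
    for alpha, W in zip(alphas, weights):
        out += sublayer_output(alpha, feats, W)
    return out


feats = [[9.0, 9.0], [2.0, 0.0], [0.0, 4.0]]
alphas = [{1: 0.5, 2: 0.5}, {1: 1.0, 2: 0.0}]               # one dict per sublayer
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]          # one matrix per sublayer
h0 = multi_sublayer_local_feature(alphas, weights, feats)
```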
  • the three-dimensional network model also includes a first feature splicing layer, a second feature splicing layer, and a fourth feature extraction layer.
  • Figure 10 is a structural schematic diagram of the three-dimensional network model provided in an embodiment of the present application. Based on Figure 10, the key points of the object to be detected are detected through the output layer in combination with vertex features, global features and local features to obtain the positions of the key points on the object to be detected.
  • the process can be as follows: through the first feature splicing layer, vertex features, global features and local features are feature spliced to obtain the splicing features of the object to be detected; through the fourth feature extraction layer, local features are extracted from the object to be detected based on the splicing features to obtain the target local features of the object to be detected; through the second feature splicing layer, the splicing features, global features and target local features are feature spliced to obtain the target splicing features of the object to be detected; through the output layer, based on the target splicing features, the key points of the object to be detected are detected to obtain the positions of the key points of the object to be detected on the object to be detected.
  • the three-dimensional network model may also include a fifth feature extraction layer and a third feature splicing layer, so that through the fifth feature extraction layer, based on the target splicing feature, the local feature of the object to be detected is extracted to obtain the second target local feature, and then through the third feature splicing layer, the target splicing feature, the second target local feature and the global feature are spliced to obtain the second target splicing feature, and finally through the output layer, based on the second target splicing feature, the key points of the object to be detected are detected to obtain the position of the key points of the object to be detected on the object to be detected.
  • the number of feature extraction layers and feature splicing layers in the three-dimensional network model can be multiple, and the process of obtaining the final splicing features through multiple feature extraction layers and feature splicing layers is the same as described above, and this embodiment of the present application will not be repeated.
  • the fourth feature extraction layer and the fifth feature extraction layer have the same layer structure as the third feature extraction layer, and the feature processing process is also the same; while the second feature splicing layer and the third feature splicing layer have the same layer structure as the feature splicing layer, and the feature processing process is also the same.
  • in this way, through the fourth feature extraction layer, the splicing features are further processed to obtain more accurate target local features; through the second feature stitching layer, the stitching features, the global features and the obtained target local features are feature stitched, and the key points of the object to be detected are detected based on the target stitching features obtained by feature stitching. Correspondingly, through the fifth feature extraction layer, the target stitching features are further processed to obtain more accurate second target local features; through the third feature stitching layer, the target stitching features, the global features and the obtained second target local features are feature stitched, and the key points of the object to be detected are detected based on the second target stitching features obtained by feature stitching.
  • before the key points of the object to be detected are detected based on the three-dimensional network model, the three-dimensional network model needs to be trained, so that the key points of the object to be detected are detected based on the trained three-dimensional network model.
  • Figure 11 is a flow chart of the training process of the three-dimensional network model provided in an embodiment of the present application. Based on Figure 11, the training process of the three-dimensional network model can be implemented through the following steps.
  • Step 201 The server obtains an object training sample carrying a label, where the label is used to indicate the actual position of a key point of the object training sample.
  • Step 202 obtain a training three-dimensional mesh for representing an object training sample, and determine the vertices of the training three-dimensional mesh and the connection relationship between the vertices.
  • the training 3D mesh can also be data enhanced, so as to train the 3D network model through the enhanced training 3D mesh.
  • the methods of data enhancement for the training 3D mesh are divided into patch simplification and patch densification.
  • for patch simplification, an edge optimization method can be used, that is, the shortest edge between vertices is found each time, and the corresponding two vertices are merged into one vertex. Specifically, the edge between any two connected vertices is obtained, and the edges are compared. Based on the comparison result, the shortest edge is selected as the target edge, and then the two vertices corresponding to the target edge are obtained and merged into one vertex, thereby obtaining an enhanced training three-dimensional mesh.
  • Figure 12 is a simplified schematic diagram of the three-dimensional mesh patch provided by an embodiment of the present application. Based on Figure 12, there are 10 vertices from v1 to v10.
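  • one simplification step (find the shortest edge and merge its two vertices) can be sketched as follows. Where the merged vertex is placed is not stated in the application; this example places it at the midpoint of the edge, which is an assumption, and it also drops faces that degenerate after the merge. The function name is hypothetical.

```python
import math


def collapse_shortest_edge(vertices, faces):
    """Merge the two endpoints of the shortest edge into one vertex (assumed: at the midpoint)."""
    edges = {tuple(sorted(e)) for a, b, c in faces for e in ((a, b), (b, c), (c, a))}
    keep, drop = min(edges, key=lambda e: math.dist(vertices[e[0]], vertices[e[1]]))
    merged = [(p + q) / 2 for p, q in zip(vertices[keep], vertices[drop])]
    new_vertices = [merged if i == keep else list(v)
                    for i, v in enumerate(vertices) if i != drop]

    def remap(i):
        i = keep if i == drop else i       # redirect the removed vertex
        return i - 1 if i > drop else i    # close the gap left in the index range

    new_faces = []
    for face in faces:
        f = tuple(remap(i) for i in face)
        if len(set(f)) == 3:               # skip faces that collapsed into a line
            new_faces.append(f)
    return new_vertices, new_faces


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.1, 0.0)]
faces = [(0, 1, 2), (1, 3, 2)]
new_vertices, new_faces = collapse_shortest_edge(vertices, faces)
```

  • here the edge between vertices 1 and 3 is the shortest, so the two vertices are merged and the mesh shrinks from 4 vertices to 3.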
  • for patch densification, the coordinates of the barycenter of the patch with the largest area are calculated first, and then the original patch is divided into three based on the barycenter. Specifically, at least one patch is obtained, and the patches are compared by area. Based on the comparison results, the patch with the largest area is selected from the multiple patches as the target patch; the barycenter of the target patch and the three vertices corresponding to the target patch are determined, and then the target patch is divided into three based on the barycenter and the three vertices.
  • Figure 13 is a schematic diagram of the densification of three-dimensional mesh facets provided in an embodiment of the present application.
  • based on Figure 13, there are 9 vertices from A to I, forming 8 triangular facets: the facets between vertices A, B, and C; A, B, and I; H, B, and I; H, B, and G; F, B, and G; F, B, and E; D, B, and E; and D, B, and C. Among them, the facet between vertices A, B, and C is the target facet with the largest area. The center of gravity of the target facet, namely P, and the corresponding vertices A, B, and C are determined, so that the original target facet is divided into three based on P, A, B, and C, and the enhanced training three-dimensional mesh is obtained.
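  • one densification step can be sketched as follows: the facet with the largest area is located, its center of gravity is added as a new vertex, and the facet is replaced by three smaller facets. The data layout and the function names are assumptions made for illustration.

```python
import math


def triangle_area(p, q, r):
    u = [b - a for a, b in zip(p, q)]
    v = [b - a for a, b in zip(p, r)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def densify_largest_face(vertices, faces):
    """Split the largest-area facet into three facets around its center of gravity."""
    target = max(range(len(faces)),
                 key=lambda f: triangle_area(*(vertices[i] for i in faces[f])))
    a, b, c = faces[target]
    centroid = [(vertices[a][k] + vertices[b][k] + vertices[c][k]) / 3.0 for k in range(3)]
    p = len(vertices)  # index of the newly added vertex
    new_vertices = [list(v) for v in vertices] + [centroid]
    new_faces = faces[:target] + faces[target + 1:] + [(a, b, p), (b, c, p), (c, a, p)]
    return new_vertices, new_faces


vertices = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (3.0, 1.0, 0.0)]
faces = [(0, 1, 2), (1, 3, 2)]
new_vertices, new_faces = densify_largest_face(vertices, faces)
```

  • the step adds one vertex and two facets while leaving the total surface area unchanged.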
  • the data enhancement process for the training 3D mesh can be ended by presetting the target number of vertices. Specifically, in the data enhancement process for the training 3D mesh, the number of vertices of the enhanced training 3D mesh is obtained, and the number of vertices is compared with the pre-set target number of vertices. Based on the comparison result, the data enhancement for the training 3D mesh is ended.
  • when the training 3D mesh is face-simplified and the comparison result indicates that the number of vertices is less than the target number of vertices, the data enhancement for the training 3D mesh is ended; when the training 3D mesh is face-densified and the comparison result indicates that the number of vertices is greater than the target number of vertices, the data enhancement for the training 3D mesh is ended.
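  • The densification described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names `triangle_area` and `densify` are hypothetical, and the loop is assumed to stop as soon as the vertex count reaches the target rather than strictly exceeding it.

```python
import math

def triangle_area(verts, face):
    """Area of a triangular patch given by three vertex indices."""
    a, b, c = (verts[i] for i in face)
    ab = [b[k] - a[k] for k in range(3)]
    ac = [c[k] - a[k] for k in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def densify(verts, faces, target_num_vertices):
    """Split the largest-area patch at its barycenter until the target vertex count is reached."""
    verts = [tuple(float(x) for x in v) for v in verts]
    faces = [tuple(f) for f in faces]
    while len(verts) < target_num_vertices:
        # Target patch: the patch with the largest area.
        k = max(range(len(faces)), key=lambda i: triangle_area(verts, faces[i]))
        a, b, c = faces.pop(k)
        # Barycenter P of the target patch becomes a new vertex.
        p = len(verts)
        verts.append(tuple((verts[a][d] + verts[b][d] + verts[c][d]) / 3.0 for d in range(3)))
        # The target patch (a, b, c) is divided into three patches sharing P.
        faces.extend([(a, b, p), (b, c, p), (c, a, p)])
    return verts, faces
```

  • Each split adds one vertex and two patches, so the total surface area is unchanged while the vertex density increases.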
  • Step 203 extract features from the vertices of the object training sample through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh.
  • a global feature extraction is performed on the object training sample based on the vertex features of the training three-dimensional mesh through the second feature extraction layer to obtain the global features of the object training sample
  • a local feature extraction is performed on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices through the third feature extraction layer to obtain the local features of the object training sample.
  • Step 205 through the output layer, based on the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample, the key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample.
  • the three-dimensional network model may also include a first feature stitching layer. In this case, detecting the key points of the object training sample through the output layer, based on the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample, can proceed as follows.
  • through the first feature stitching layer, the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample are stitched to obtain the stitching features of the object training sample; through the output layer, based on the stitching features of the object training sample, the key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample.
  • Step 206 obtaining the difference between the position of the key point of the object training sample and the label, and training the three-dimensional network model based on the difference to obtain the target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key point of the object to be detected on the object to be detected.
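  • As a rough illustration of Step 206, the sketch below trains a single linear output layer by gradient descent until the loss stops changing. The text does not specify the loss function or the optimizer; the mean squared difference, the learning rate, the tolerance, and the name `train_output_layer` are all assumptions made for this example, and a real model would update every layer.

```python
import numpy as np

def train_output_layer(features, target, lr=0.1, tol=1e-8, max_iters=10000):
    """Fit a linear output layer by gradient descent on the mean squared difference.

    features: (N, F) per-vertex features; target: (N,) label values per vertex.
    """
    w = np.zeros(features.shape[1])
    prev = np.inf
    loss = np.inf
    for _ in range(max_iters):
        diff = features @ w - target          # difference between prediction and label
        loss = float(np.mean(diff ** 2))
        if abs(prev - loss) < tol:            # treat a stalled loss as convergence
            break
        w -= lr * 2.0 * features.T @ diff / len(target)
        prev = loss
    return w, loss
```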
  • FIG. 14 is a flow chart of the key point detection method provided in the embodiment of the present application. Based on FIG. 14 , the key point detection method provided in the embodiment of the present application is implemented collaboratively by the client and the server.
  • Step 301 In response to an upload operation of an object training sample carrying a label, the client obtains the object training sample carrying the label.
  • the client can be a key point detection client set on the terminal.
  • Based on the human-computer interaction interface of the client, the user triggers the upload function item in the human-computer interaction interface so that the client presents an object selection interface on the human-computer interaction interface.
  • Based on the object selection interface, the user uploads the object training samples with labels from the local terminal, so that the client obtains the uploaded object training samples.
  • the object training sample can also be obtained by taking a picture by a camera that is in communication with the terminal. After taking the picture, the camera labels the object training sample, and then transmits the labeled object training sample to the terminal, which is automatically uploaded to the client by the terminal.
  • Step 302 The client sends the object training sample to the server.
  • Step 303 The server inputs the received object training sample into the three-dimensional network model.
  • Step 304 Based on the three-dimensional network model, key points of the object training samples are detected to obtain positions of the key points of the object training samples.
  • Step 305 Obtain the difference between the position of the key point of the object training sample and the label, and train the three-dimensional network model based on the difference.
  • the server iterates the above training process until the loss function converges to complete the training of the three-dimensional network model.
  • Step 307 Send a prompt message to the client.
  • the point cloud data corresponding to the object to be detected can be pre-stored locally in the terminal, obtained from the outside world (such as the Internet), or collected in real time, for example, by a three-dimensional scanning device such as a three-dimensional scanner.
  • Step 309 The client sends point cloud data corresponding to the object to be detected to the server in response to the key point detection instruction for the object to be detected.
  • the key point detection instructions for the object to be detected can be automatically generated by the client under certain trigger conditions.
  • for example, the client automatically generates the key point detection instruction for the object to be detected after obtaining the point cloud data corresponding to the object to be detected. The instruction can also be sent to the client by other devices in communication with the terminal, or generated when the user triggers the corresponding determination function item in the human-computer interaction interface of the client.
  • Step 310 The server inputs the received point cloud data corresponding to the object to be detected into the three-dimensional network model, so that the three-dimensional network model performs key point detection on the object to be detected, and obtains a three-dimensional heat map indicating the positions of the key points of the object to be detected on the object to be detected.
  • Step 311 sending a three-dimensional heat map indicating the positions of key points of the object to be detected on the object to be detected to the client.
  • Step 312 The client displays a three-dimensional heat map indicating the positions of key points of the object to be detected on the object to be detected.
  • the client can display the three-dimensional heat map in the human-computer interaction interface of the client, save the three-dimensional heat map locally in the terminal, and send the three-dimensional heat map to other devices connected to the terminal.
  • a three-dimensional grid corresponding to the object to be detected is obtained, and then by constructing a dual-path feature extraction layer, the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional grid and the connection relationship between the vertices, thereby obtaining the position of the key points on the object to be detected based on the vertex features obtained from the three-dimensional grid, the extracted global features, and the local features.
  • more abundant feature information of the object to be detected is extracted through multiple layers of feature extraction layers, and then the key points of the object to be detected are detected based on the abundant feature information, so that the accuracy of three-dimensional key point detection is significantly improved.
  • the first category is based on traditional geometric analysis methods.
  • the semantic key points of the three-dimensional head model are directly located by using methods such as sharp edge detection, curvature calculation, dihedral angle calculation, normal vector calculation and some specific geometric rules.
  • for example, the vertex with the largest z coordinate in the three-dimensional coordinate system is taken as the nose tip key point.
  • a sharp edge is then detected below the nose tip, so that the approximate area of the left and right mouth corner key points can be roughly located.
  • the second category is based on deep learning methods.
  • This category of methods basically renders the three-dimensional head model into a two-dimensional image first, and then uses a two-dimensional convolutional neural network to extract features and detect the corresponding key points. It is worth noting that this type of method can be further divided into different combinations according to whether multi-view detection is used and whether the three-dimensional key points are regressed directly. For example, a common combination is to render only the front view of the three-dimensional head model, record the rendering projection relationship, detect the two-dimensional key point coordinates on the two-dimensional front view, and finally back-project them into three-dimensional space based on the known projection relationship to obtain the final three-dimensional key point coordinates. Another combination is to render multiple views (such as front and side views) and input them into different branches of the neural network model, so that the neural network model combines the features of the views to directly regress the coordinates of the three-dimensional key points.
  • the traditional key point positioning method based on geometric analysis is very dependent on manually set rules. For example, when detecting sharp edges, a threshold needs to be specified. This is an empirical value and is difficult to apply to head models with different shapes. Therefore, the robustness of this method is poor.
  • the method based on two-dimensional convolutional neural network has achieved great success in the traditional two-dimensional image key point detection task, but the direct application of two-dimensional convolutional neural network to the detection of three-dimensional key points has many constraints and shortcomings. Specifically, first, the number of available three-dimensional face models is far less than that of face images, that is, the data set is relatively scarce, so it is difficult for the neural network to play its role.
  • the method of rendering from a three-dimensional face head model to a two-dimensional image will inevitably lose three-dimensional geometric information. For example, for the front view, there will inevitably be a lack of information on the back of the head. If it is necessary to detect the key points of the back of the head, then in the absence of information, detection is naturally impossible.
  • if a multi-view method is used to avoid the problem of missing information as much as possible, features are extracted through a multi-branch network and finally fused by the neural network to regress the three-dimensional coordinates. In this way, the three-dimensional coordinates must be reconstructed from different views, and the intrinsic connections between the views need to be learned by the neural network, which may make convergence difficult and thus increases the difficulty of training.
  • the embodiments of the present application provide a key point detection method, device, electronic device, computer-readable storage medium and computer program product, which can effectively solve the various shortcomings of the above-mentioned technical methods.
  • the three-dimensional face model dataset is enhanced by simplifying and densifying the patches, which solves the problem of the relative lack of three-dimensional head model datasets, so that supervised deep learning has training data guarantee.
  • the neural convolution module is directly applied in the three-dimensional space, which avoids the problem of the natural loss of three-dimensional geometric information in the detection method under the two-dimensional space of the rendering view, and also solves the problem that the intrinsic connections brought by different views are difficult to learn.
  • the traditional two-dimensional heat map is expanded into a three-dimensional heat map. Compared with the method of directly regressing three-dimensional coordinates, the three-dimensional heat map can better ensure the local accuracy of the detection results.
  • the present application proposes a three-dimensional face key point detection method based on a graph neural network structure and a three-dimensional heat map.
  • This method can be integrated into the character animation tool set, and cooperate with the non-rigid wrapping algorithm to complete the deformation matching process between different head models.
  • the specific product form here can be a control.
  • a key point detection request carrying the relevant data of the three-dimensional head model to be detected is sent to the remote server where the technical solution of the present application is deployed, so as to obtain the return result.
  • the remote server deployment method makes it easier to iterate on and optimize the algorithm, and does not require local plug-in code updates, thereby saving local computer resources.
  • the graph convolutional neural network structure in the technical solution of the present application is explained. Specifically, since the three-dimensional model (three-dimensional mesh) naturally has a graph structure relationship, and this relationship is not as compact and regularly arranged as the pixels of a two-dimensional image, it is inappropriate to directly use a traditional convolutional neural network. Therefore, a classic graph attention network (GAT) is introduced here.
  • the attention mechanism can be used to calculate the importance (correlation value) of node j to node i, as shown in formula (2) and formula (3).
  • the process of using the attention mechanism to calculate the importance of node j to node i can be to concatenate the features Wh_i and Wh_j of nodes i and j, and then calculate the inner product of the concatenated feature and a weight vector a with a dimension of 2F, as shown in formula (4). Then, based on the importance of node j to node i, the feature vector (local feature) of node i is determined, as shown in formula (6).
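  • The numbered formulas are not reproduced in this text, so the sketch below follows the standard single-head GAT formulation rather than the application's exact equations: the score is LeakyReLU(a^T [Wh_i || Wh_j]), the scores are normalized by softmax over the neighbours of node i, and the local feature of node i is the weighted sum of Wh_j. The name `gat_layer`, the LeakyReLU slope, the inclusion of a self-loop, and the omission of the final activation are assumptions.

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """Single-head graph attention layer (standard GAT form, assumed here).

    H: (N, F_in) vertex features, A: (N, N) 0/1 adjacency,
    W: (F_in, F) shared weight matrix, a: (2F,) attention weight vector.
    """
    Wh = H @ W                                           # (N, F)
    F = Wh.shape[1]
    # a^T [Wh_i || Wh_j] splits into a[:F].Wh_i + a[F:].Wh_j
    e = (Wh @ a[:F])[:, None] + (Wh @ a[F:])[None, :]    # (N, N) raw importance
    e = np.where(e > 0, e, slope * e)                    # LeakyReLU
    mask = (A + np.eye(len(A))) > 0                      # assumption: each node attends to itself
    e = np.where(mask, e, -np.inf)                       # unconnected pairs get zero weight
    e = e - e.max(axis=1, keepdims=True)                 # numerical stability
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)             # normalized correlation values
    return alpha @ Wh, alpha                             # local features, attention weights
```

  • With a zero attention vector every connected pair receives the same score, so the layer reduces to averaging over each vertex's neighbourhood; a trained `a` shifts weight toward the more relevant neighbours.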
  • FIG. 15 is a schematic diagram of the graph convolutional neural network structure provided by an embodiment of the present application.
  • a three-dimensional head model key point automatic detection neural network as shown in FIG. 15 is constructed based on GAT.
  • for the input data (i.e., vertex data): N represents the number of vertices of the three-dimensional model (three-dimensional mesh); 6 is the dimension occupied by the vertex coordinates and the normal vector; X covers other characteristics of the vertices of the three-dimensional head model (three-dimensional mesh), including curvature, texture information, etc. These other characteristics can be adjusted according to different data and tasks.
  • A is the vertex connection relationship matrix (the connection relationship between vertices). Its size is N × N, and each entry A_ij is 0 or 1: if the two vertices i and j are connected, A_ij is 1; otherwise it is 0.
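  • A minimal sketch of building such a connection matrix from triangular patches is given below. The function name `adjacency_from_faces` and the representation of patches as triples of vertex indices are assumptions for illustration.

```python
def adjacency_from_faces(num_vertices, faces):
    """Build the N x N 0/1 connection matrix from triangular patches."""
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j, k in faces:
        # Each patch contributes its three edges, in both directions.
        for u, v in ((i, j), (j, k), (k, i)):
            A[u][v] = 1
            A[v][u] = 1
    return A
```

  • For large meshes a dense N × N matrix grows quadratically with the vertex count, so a sparse edge list is a common alternative.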
  • Multilayer Perceptron represents a multi-layer fully connected perception layer.
  • Labels in FIG. 15: vertex data (the vertices of the three-dimensional mesh); X_1 (vertex features); X_2; the global feature extraction path and the local feature extraction path.
  • One path continues through the MLP module ([512, 1024]); maximum pooling is then applied to the output feature X_2 to obtain the global feature information X_3, which is shared by all N vertices to form the global feature N × X_3.
  • the same network structure does not require a fixed number of vertices N. This means that three-dimensional face models with different numbers of vertices can be used as input to the neural network model, whether in the training phase or in the actual use phase, thereby improving the applicability of the present application.
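  • The global path and its independence from the vertex count can be sketched as follows. The layer widths [512, 1024] follow the text; the input width of 64, the ReLU activation, the random weights, and the names `mlp` and `global_features` are assumptions for illustration.

```python
import numpy as np

def mlp(x, weights):
    """Apply the same fully connected layers (with ReLU) to every vertex row of x."""
    for W, b in weights:
        x = np.maximum(x @ W + b, 0.0)
    return x

def global_features(x1, weights):
    """MLP, then max pooling over vertices, then sharing the result with all N vertices."""
    x2 = mlp(x1, weights)                    # (N, 1024)
    x3 = x2.max(axis=0)                      # max pooling over the N vertices -> (1024,)
    return np.tile(x3, (x1.shape[0], 1))     # shared by all N vertices -> (N, 1024)

rng = np.random.default_rng(0)
dims = [64, 512, 1024]                       # 64 is an assumed input width
weights = [(rng.normal(0.0, 0.1, (i, o)), np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]
```

  • Because the weights act on each vertex row independently and the pooling reduces over the vertex axis, the same `weights` can be applied to meshes with any number of vertices.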
  • the three-dimensional heat map in the technical solution of the present application is explained. Since the heat map of the three-dimensional mesh no longer has the compact structure of two-dimensional image coordinates, the three-dimensional heat map here uses the geodesic distance, in contrast to the Euclidean distance used in the two-dimensional heat map. At the three-dimensional mesh level, the geodesic distance between two points is determined from the shortest path on the mesh graph structure, which reflects the characteristics of the three-dimensional surface better than the Euclidean distance between the two points.
  • Figure 16 is a comparison diagram of the geodesic distance and the Euclidean distance provided in an embodiment of the present application. Based on Figure 16, the straight line between the two vertices indicated by 1602 is the Euclidean distance, and the curve indicated by 1601 is the corresponding geodesic distance.
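  • A minimal sketch of a geodesic heat map is shown below. It approximates the geodesic distance by the shortest path along mesh edges (Dijkstra's algorithm) and turns distances into heat values with a Gaussian kernel; the Gaussian kernel, the parameter `sigma`, and the function names are assumptions, since the text does not give the heat map formula.

```python
import heapq
import math

def geodesic_distances(verts, edges, source):
    """Shortest-path length from `source` to every vertex along mesh edges."""
    nbrs = {i: [] for i in range(len(verts))}
    for u, v in edges:
        d = math.dist(verts[u], verts[v])    # edge length in 3D
        nbrs[u].append((v, d))
        nbrs[v].append((u, d))
    dist = [math.inf] * len(verts)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                         # stale queue entry
        for v, w in nbrs[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def heatmap(verts, edges, keypoint, sigma=1.0):
    """Per-vertex heat value that decays with geodesic distance from the key point."""
    return [math.exp(-d * d / (2.0 * sigma * sigma))
            for d in geodesic_distances(verts, edges, keypoint)]
```

  • On a bent surface the path along the edges is longer than the straight line between the same two vertices, which is the difference between 1601 and 1602 in Figure 16.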
  • the traditional ways of converting a two-dimensional heat map into two-dimensional coordinates include two approaches: the first takes the coordinates at which the probability is maximal (the argmax method); the second takes the expectation of multiple coordinates weighted by their softmax probabilities (the soft-argmax method).
  • here, the argmax method is applied directly to the three-dimensional heat map to obtain the vertex coordinates where the probability is maximal, which determines the final three-dimensional key point coordinates.
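  • The two conversions can be compared with the short sketch below. The function names and the use of raw per-vertex scores as softmax inputs are assumptions for illustration.

```python
import math

def argmax_keypoint(verts, scores):
    """Return the coordinates of the vertex with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return verts[best]

def soft_argmax_keypoint(verts, scores):
    """Return the softmax-weighted expectation of the vertex coordinates."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]    # subtract the max for numerical stability
    total = sum(w)
    return tuple(sum(wi * v[k] for wi, v in zip(w, verts)) / total for k in range(3))
```

  • The argmax result is always a mesh vertex, whereas the soft-argmax result is a weighted mean of vertex coordinates and may not lie on the mesh surface.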
  • three-dimensional face mesh data is very difficult to obtain in large quantities, and lack of data is a major problem for supervised learning with neural networks. Only when the data set is large enough and covers different facial forms can the graph neural network learn sufficient detection capability from it. The difficulty of obtaining a three-dimensional face key point data set is reflected in the following aspects. First, the three-dimensional mesh face data itself is produced by artists, and this production process is relatively laborious, whereas generating a two-dimensional image only requires pressing the camera shutter.
  • the data enhancement methods are divided into patch simplification and densification.
  • for patch simplification, an edge-based optimization can be used: each time, the shortest edge between nodes is found and its two endpoints are merged into one vertex, as shown in Figure 12.
  • for patch densification, the barycentric coordinates of a patch with a larger area are calculated first, and the original patch is then divided into three based on the barycentric coordinates, as shown in Figure 13.
  • both densification and patch simplification can use the final target vertex number to control the termination of their operations.
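  • Patch simplification with a target vertex number can be sketched as follows. The text only says that the endpoints of the shortest edge are merged into one vertex; placing the merged vertex at the edge midpoint, dropping degenerate patches, and the name `simplify` are assumptions for this example.

```python
import math

def simplify(verts, faces, target_num_vertices):
    """Collapse the shortest edge repeatedly until the target vertex count is reached."""
    verts = {i: tuple(float(x) for x in v) for i, v in enumerate(verts)}
    faces = [tuple(f) for f in faces]
    while len(verts) > target_num_vertices and faces:
        # Collect the undirected edges of all remaining patches.
        edges = {tuple(sorted((f[i], f[(i + 1) % 3]))) for f in faces for i in range(3)}
        u, v = min(edges, key=lambda e: math.dist(verts[e[0]], verts[e[1]]))
        # Merge v into u; the merged vertex sits at the midpoint (assumption).
        verts[u] = tuple((p + q) / 2.0 for p, q in zip(verts[u], verts[v]))
        del verts[v]
        remapped = [tuple(u if i == v else i for i in f) for f in faces]
        # Patches that lost a distinct corner have collapsed to an edge and are removed.
        faces = [f for f in remapped if len(set(f)) == 3]
    return verts, faces
```

  • Vertex indices are kept stable by storing vertices in a dictionary, so the surviving patches still refer to the original indices after each collapse.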
  • this application can provide accurate and reliable key point basis for subsequent 3D head model registration work by automatically detecting specific key points of the 3D game head model.
  • this application can avoid excessive manual participation, so that key point-dependent work such as 3D head model registration can be completed automatically. This will greatly save the manpower input of artists, thereby speeding up the entire production process related to model character animation.
  • this application is based on deep supervised learning of graph neural networks, which can accurately predict the positions of three-dimensional key points and has strong robustness.
  • the forward calculation speed of the deep learning model is extremely fast, and the algorithm as a whole can be completed in just 1 second.
  • Manual labeling, in contrast, often takes several minutes, so this application has great practical value in terms of efficiency.
  • this application does not limit the number of vertices of the input 3D face model.
  • the generated deep learning model can be widely used in the automatic detection of key points of 3D head models with different vertex densities, and has strong applicability.
  • a three-dimensional grid corresponding to the object to be detected is obtained, and then by constructing a dual-path feature extraction layer, the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional grid and the connection relationship between the vertices, thereby obtaining the position of the key points on the object to be detected based on the vertex features obtained from the three-dimensional grid, the extracted global features, and the local features.
  • more abundant feature information of the object to be detected is extracted through multiple layers of feature extraction layers, and then the key points of the object to be detected are detected based on the abundant feature information, so that the accuracy of three-dimensional key point detection is significantly improved.
  • the software modules of the key point detection device 455 stored in the memory 450 may include:
  • An acquisition module 4551 is configured to obtain a three-dimensional grid for representing the object to be detected, and determine the vertices of the three-dimensional grid and the connection relationship between the vertices;
  • a first feature extraction module 4552 is configured to extract features from vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
  • the second feature extraction module 4553 is configured to perform global feature extraction on the object to be detected based on the vertex features to obtain the global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected;
  • the output module 4554 is configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features, and obtain the positions of the key points of the object to be detected on the object to be detected.
  • the acquisition module 4551 is further configured to scan the object to be detected through a three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected; and construct a three-dimensional grid corresponding to the object to be detected based on the point cloud data.
  • the second feature extraction module 4553 is further configured to determine the local features of each of the vertices based on the vertex features and the connection relationship between the vertices; and determine the local features of the object to be detected based on the local features of each of the vertices.
  • the second feature extraction module 4553 is further configured to perform the following processing for each of the vertices: determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of other vertices based on the vertex features of each vertex in the three-dimensional mesh; wherein the other vertices are any vertices other than the reference vertex; determine the correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices; wherein the correlation value is used to indicate the size of the degree of correlation between the reference vertex and the other vertices; determine the local features of the reference vertex based on the correlation value and the vertex features of the other vertices.
  • the second feature extraction module 4553 is further configured to use an attention mechanism to determine the degree of correlation between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices; and normalize the degree of correlation to obtain a correlation value between the reference vertex and the other vertices.
  • the second feature extraction module 4553 is further configured to perform product processing on the correlation value and the vertex features of the other vertices to obtain a product result; and determine the local feature corresponding to the reference vertex based on the product result.
  • the second feature extraction module 4553 is further configured to perform a product process on the correlation value and the vertex feature of the corresponding other vertex for each of the other vertices to obtain a product result of the other vertices; cumulatively sum the product results of each of the other vertices to obtain a summation result; and determine, based on the summation result, a local feature corresponding to the reference vertex.
  • the second feature extraction module 4553 is further configured to perform feature fusion on the local features of each of the vertices based on the local features of each of the vertices to obtain a fused feature; and use the fused feature as the local feature of the object to be detected.
  • the output module 4554 is further configured to perform feature splicing on the vertex features, the global features and the local features to obtain the splicing features of the object to be detected; based on the splicing features, the key points of the object to be detected are detected to obtain the positions of the key points of the object to be detected on the object to be detected.
  • the output module 4554 is further configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features, and obtain the probability of the key points at each vertex in the three-dimensional grid; based on the probability, generate a three-dimensional heat map corresponding to the three-dimensional grid; based on the three-dimensional heat map, determine the position of the key points of the object to be detected on the object to be detected.
  • the device is applied to a three-dimensional network model, which includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and an output layer.
  • the first feature extraction module 4552 is also configured to perform feature extraction on the vertices of the three-dimensional mesh through the first feature extraction layer to obtain vertex features of the three-dimensional mesh;
  • the second feature extraction module 4553 is also configured to perform global feature extraction on the object to be detected based on the vertex features through the second feature extraction layer to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices through the third feature extraction layer to obtain local features of the object to be detected;
  • the output module 4554 is also configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features through the output layer to obtain the positions of the key points of the object to be detected on the object to be detected.
  • the three-dimensional network model also includes a first feature splicing layer, a second feature splicing layer, and a fourth feature extraction layer.
  • the output module 4554 is also configured to perform feature splicing on the vertex features, the global features, and the local features through the first feature splicing layer to obtain the splicing features of the object to be detected; perform local feature extraction on the object to be detected based on the splicing features through the fourth feature extraction layer to obtain the target local features of the object to be detected; perform feature splicing on the splicing features, the global features, and the target local features through the second feature splicing layer to obtain the target splicing features of the object to be detected; and perform detection on the key points of the object to be detected based on the target splicing features through the output layer to obtain the positions of the key points of the object to be detected on the object to be detected.
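  • The two-stage splicing flow can be sketched as below. This is only an illustration of the data flow: the local path uses neighbourhood averaging as a stand-in for the graph attention layer, the global path is a single linear layer with max pooling, the output is a softmax over vertices, and the layer widths, random weights, and function names are all assumptions.

```python
import numpy as np

def global_path(x, W):
    """Per-vertex linear layer with ReLU, max-pooled over vertices and shared by all vertices."""
    pooled = np.maximum(x @ W, 0.0).max(axis=0)
    return np.tile(pooled, (x.shape[0], 1))

def local_path(x, A, W):
    """Stand-in for the attention layer: average transformed features over neighbours."""
    A_hat = A + np.eye(len(A))                               # assumption: include the vertex itself
    return np.maximum((A_hat / A_hat.sum(axis=1, keepdims=True)) @ (x @ W), 0.0)

def forward(x1, A, p):
    g = global_path(x1, p["Wg"])                             # global features
    l1 = local_path(x1, A, p["Wl1"])                         # local features
    s1 = np.concatenate([x1, g, l1], axis=1)                 # first feature splicing layer
    l2 = local_path(s1, A, p["Wl2"])                         # fourth feature extraction layer
    s2 = np.concatenate([s1, g, l2], axis=1)                 # second feature splicing layer
    logits = s2 @ p["Wout"]                                  # output layer: (N, K)
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    return prob / prob.sum(axis=0, keepdims=True)            # per-key-point heat over vertices

def init_params(f_in, hidden, num_keypoints, seed=0):
    rng = np.random.default_rng(seed)
    s1 = f_in + 2 * hidden                                   # width after the first splice
    s2 = s1 + 2 * hidden                                     # width after the second splice
    shapes = {"Wg": (f_in, hidden), "Wl1": (f_in, hidden),
              "Wl2": (s1, hidden), "Wout": (s2, num_keypoints)}
    return {k: rng.normal(0.0, 0.1, shape) for k, shape in shapes.items()}
```

  • Each column of the returned array is a heat distribution over the N vertices for one key point, from which a position can be read off, for example by taking the vertex with the highest value.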
  • the following further describes an exemplary structure of a training device for a three-dimensional network model provided in an embodiment of the present application implemented as a software module, wherein the three-dimensional network model at least includes a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the training device includes:
  • An acquisition module configured to acquire an object training sample carrying a label, wherein the label is configured to indicate a real position of a key point of the object training sample;
  • An acquisition module configured to acquire a training three-dimensional mesh for representing the object training sample, and determine vertices of the training three-dimensional mesh and connection relationships between vertices;
  • a first feature extraction module is configured to extract features of vertices of the object training sample through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh;
  • a second feature extraction module is configured to perform global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh through the second feature extraction layer to obtain the global features of the object training sample, and perform local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices through the third feature extraction layer to obtain the local features of the object training sample;
  • an output module configured to detect key points of the object training sample through the output layer based on vertex features of the training three-dimensional mesh, global features of the object training sample, and local features of the object training sample, and obtain positions of the key points of the object training sample on the object training sample;
  • An updating module is configured to obtain the difference between the position of the key point of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key points of the object to be detected on the object to be detected.
  • the present application also provides an electronic device, the electronic device comprising:
  • a memory configured to store computer executable instructions
  • the processor is configured to execute the computer executable instructions stored in the memory to implement the key point detection method or the three-dimensional network model training method described above in the embodiment of the present application, for example, the key point detection method shown in FIG. 3, or the three-dimensional network model training method shown in FIG. 11.
  • the embodiment of the present application provides a computer program product or a computer program, which includes computer executable instructions, and the computer executable instructions are stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer executable instructions from the computer-readable storage medium, and the processor executes the computer executable instructions, so that the electronic device executes the key point detection method or the three-dimensional network model training method described in the embodiment of the present application, for example, the key point detection method shown in FIG. 3, or the three-dimensional network model training method shown in FIG. 11.
  • An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to execute the key point detection method provided in the embodiment of the present application, or the training method of a three-dimensional network model, for example, the key point detection method shown in FIG. 3, or the training method of a three-dimensional network model shown in FIG. 11.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface storage, optical disk, or CD-ROM; or it may be various devices including one or any combination of the above memories.
  • computer executable instructions may be in the form of a program, software, software module, script or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.
  • computer-executable instructions may, but do not necessarily, correspond to a file in a file system, may be stored as part of a file that stores other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code portions).
  • the executable instructions may be deployed to be executed on one electronic device, or on multiple electronic devices located at one site, or on multiple electronic devices distributed at multiple sites and interconnected by a communication network.
  • Richer feature information of the object to be detected is extracted through multiple feature extraction layers, and then the key points of the object to be detected are detected based on the rich feature information, so that the accuracy of 3D key point detection is significantly improved.
  • Because GAT does not rely on the complete graph structure, but only on the edges (the connection relationship between vertices), the flexibility of the key point detection process is improved.
  • the attention mechanism can also assign different weights to different neighbor nodes, which improves the accuracy of the key point detection process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

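The present application provides a key point detection method, a training method, an apparatus, a device, a medium, and a product, applied to the field of artificial intelligence technology, including: obtaining a three-dimensional mesh used to represent an object to be detected, and determining the vertices of the three-dimensional mesh and the connection relationship between the vertices; performing feature extraction on the vertices of the three-dimensional mesh to obtain vertex features; performing global feature extraction on the object to be detected based on the vertex features to obtain global features, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features; and combining the vertex features, the global features, and the local features to detect the key points of the object to be detected, obtaining the positions of the key points of the object to be detected on the object to be detected.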

Description

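Key Point Detection Method, Training Method, Apparatus, Electronic Device, Computer-Readable Storage Medium, and Computer Program Product
Cross-Reference to Related Applications
The embodiments of the present application are filed on the basis of Chinese patent application No. 202211576832.9, filed on December 9, 2022, and claim priority to that Chinese patent application, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a key point detection method, a training method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
In the related art, key point detection for three-dimensional face characters generally falls into two categories: methods based on traditional geometric analysis, and methods based on deep learning. Methods of the first category locate key points by geometric analysis and rely heavily on manually set rules, so they are difficult to apply to head models of widely varying shapes and their robustness is poor. Methods of the second category typically first render the three-dimensional head model into a two-dimensional image and then use a two-dimensional convolutional neural network to extract features and detect the corresponding key points, which inevitably loses three-dimensional geometric information. As a result, the accuracy of key point detection for three-dimensional face characters in the related art is low.
Summary
The embodiments of the present application provide a key point detection method, a training method for a three-dimensional network model, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the accuracy of key point detection performed through a three-dimensional network model.
The technical solutions of the embodiments of the present application are implemented as follows: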
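An embodiment of the present application provides a key point detection method, the method including:
obtaining a three-dimensional mesh used to represent an object to be detected, and determining the vertices of the three-dimensional mesh and the connection relationship between the vertices;
performing feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
performing global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected;
performing feature concatenation based on the vertex features, the global features, and the local features, and detecting the key points of the object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.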
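An embodiment of the present application provides a key point detection apparatus, the apparatus including:
an obtaining module, configured to obtain a three-dimensional mesh used to represent an object to be detected, and determine the vertices of the three-dimensional mesh and the connection relationship between the vertices;
a first feature extraction module, configured to perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
a second feature extraction module, configured to perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected;
an output module, configured to detect the key points of the object to be detected based on the vertex features, the global features, and the local features, to obtain the positions of the key points of the object to be detected on the object to be detected.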
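An embodiment of the present application provides a training method for a three-dimensional network model, the three-dimensional network model including at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the method including:
acquiring an object training sample carrying a label, the label being used to indicate the true positions of the key points of the object training sample;
obtaining a training three-dimensional mesh used to represent the object training sample, and determining the vertices of the training three-dimensional mesh and the connection relationship between the vertices;
performing, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh;
performing, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain global features of the object training sample, and performing, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices to obtain local features of the object training sample;
detecting, through the output layer, the key points of the object training sample based on the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample, to obtain the positions of the key points of the object training sample on the object training sample;
obtaining the difference between the positions of the key points of the object training sample and the label, and training the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on an object to be detected, to obtain the positions of the key points of the object to be detected on the object to be detected.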
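An embodiment of the present application provides a training apparatus for a three-dimensional network model, the three-dimensional network model including at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the apparatus including:
an acquisition module, configured to acquire an object training sample carrying a label, the label being configured to indicate the true positions of the key points of the object training sample;
an obtaining module, configured to obtain a training three-dimensional mesh used to represent the object training sample, and determine the vertices of the training three-dimensional mesh and the connection relationship between the vertices;
a first feature extraction module, configured to perform, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh;
a second feature extraction module, configured to perform, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain global features of the object training sample, and perform, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices to obtain local features of the object training sample;
an output module, configured to detect, through the output layer, the key points of the object training sample based on the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample, to obtain the positions of the key points of the object training sample on the object training sample;
an updating module, configured to obtain the difference between the positions of the key points of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on an object to be detected, to obtain the positions of the key points of the object to be detected on the object to be detected.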
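An embodiment of the present application provides an electronic device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the key point detection method provided by the embodiments of the present application when executing the computer-executable instructions stored in the memory.
An embodiment of the present application provides an electronic device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the training method for a three-dimensional network model provided by the embodiments of the present application when executing the computer-executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to execute the key point detection method provided by the embodiments of the present application.
An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to execute the training method for a three-dimensional network model provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product, which includes a computer program or computer-executable instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer program or computer-executable instructions from the computer-readable storage medium and executes them, so that the electronic device executes the key point detection method provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product, which includes a computer program or computer-executable instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer program or computer-executable instructions from the computer-readable storage medium and executes them, so that the electronic device executes the training method for a three-dimensional network model provided by the embodiments of the present application.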
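The embodiments of the present application have the following beneficial effects:
A three-dimensional mesh corresponding to the object to be detected is obtained; then, by building two-branch feature extraction layers, the global features and the local features of the object to be detected are extracted separately, based on the vertex features obtained from the three-dimensional mesh and the connection relationship between the vertices, so that the positions of the key points on the object to be detected are obtained based on the vertex features, the extracted global features, and the local features. In this way, richer feature information of the object to be detected is extracted through multiple feature extraction layers, and the key points of the object to be detected are then detected based on this rich feature information, so that the accuracy of three-dimensional key point detection is significantly improved.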
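Brief Description of the Drawings
FIG. 1 is a schematic architecture diagram of the key point detection system 100 provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of the electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of the key point detection method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the three-dimensional mesh of a human head provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of determining the local features of each vertex provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of using an attention mechanism to determine the degree of correlation between a reference vertex and other vertices provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the positions of key points on an object to be detected provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of the three-dimensional network model provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of the third feature extraction layer provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of the three-dimensional network model provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of the training process of the three-dimensional network model provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of face simplification of a three-dimensional mesh provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of face densification of a three-dimensional mesh provided by an embodiment of the present application;
FIG. 14 is a schematic flowchart of the key point detection method provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of the graph convolutional neural network provided by an embodiment of the present application;
FIG. 16 is a comparison of geodesic distance and Euclidean distance provided by an embodiment of the present application.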
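Detailed Description
To make the purposes, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the drawings. The described embodiments should not be regarded as limiting the embodiments of the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
In the following description, "some embodiments" describes a subset of all possible embodiments. It can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other where no conflict arises.
In the following description, the terms "first\second\third" merely distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments are explained; the following explanations apply to them.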
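1) Three-dimensional mesh (Mesh): a manifold surface with a topological structure, for example a spherical surface divided into a combination of multiple vertices and multiple edges. In the present application it may be a three-dimensional face mesh. Here, a three-dimensional mesh is a graph (Graph) structure.
2) Client (Client), also called the user side: a program that corresponds to a server and provides local services to the user. Apart from some applications that can only run locally, it is generally installed on an ordinary client machine and needs to run in cooperation with the server; that is, a corresponding server and service program in the network are needed to provide the corresponding services, so a specific communication connection must be established between the client and the server to ensure the normal operation of the application.
3) Three-dimensional face key point detection: given any three-dimensional face mesh model, detecting the three-dimensional coordinates of a series of face key points with preset semantics. There is no restriction on the number of vertices or the number of faces of the three-dimensional face model. Key points with preset semantics refer to position information including the eye corners, mouth corners, nose tip, face contour, and so on; the semantics and number of key points are determined by the specific task.
4) Graph Neural Networks (GNN): a class of artificial neural networks for processing data that can be represented as graphs. Whereas traditional two-dimensional convolutional neural networks operate on two-dimensional images, graph neural networks extend the object of operation to graph data that can represent the shape of a three-dimensional mesh. The key design element of graph neural networks is pairwise message passing, so that graph nodes are updated iteratively by exchanging information with their neighbors.
5) Loss: used to measure the gap between the actual result of a model and the target result, in order to train and optimize the model.
6) Three-dimensional heatmap regression (Heatmap): the graph neural network uses a heatmap as the output layer and forms a regression loss against a reference heatmap; the neural network is trained through forward propagation and gradient backpropagation so that its output fits the label, and the key point coordinates are finally computed from the heatmap.
7) Three-dimensional scanner (3D scanner): a scientific instrument used to detect and analyze the shape (geometric structure) and appearance data (such as color and surface albedo) of objects or environments in the real world. The collected data are usually used for three-dimensional reconstruction, creating a digital model of the real object in a virtual world. These models have a wide range of uses, such as industrial design, defect detection, reverse engineering, robot guidance, topographic survey, medical information, biological information, and forensic identification.
8) Multi-Layer Perceptron (MLP): an artificial neural network with a feedforward structure that maps a set of input vectors to a set of output vectors. An MLP can be viewed as a directed graph composed of multiple layers of nodes, each layer fully connected to the next. Except for the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function.
9) Convolutional Neural Network (CNN): a feedforward neural network, generally composed of one or more convolutional layers (network layers that use the convolution operation) and fully connected layers at the end. Its neurons can respond to local regions of the input image, and it generally performs well in visual image processing.
10) Machine Learning (ML): a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications span all areas of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
11) Point cloud data: a massive set of points describing the surface characteristics of a target, generally obtained by laser measurement or photogrammetry. Point cloud data obtained by laser measurement include three-dimensional coordinates and laser reflection intensity, and the state of an object is usually judged from its echo characteristics and reflection intensity; point cloud data obtained by photogrammetry usually include three-dimensional coordinates and color information.
12) Graph Attention Network (GAT): a neural network architecture that operates on graph-structured data.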
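With the research and progress of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
The solutions provided by the embodiments of the present application involve technologies such as three-dimensional network models in artificial intelligence, and can also be applied to fields such as cloud technology and the Internet of Vehicles, as described in the following embodiments.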
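Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of the key point detection system 100 provided by an embodiment of the present application. To support a key point detection application scenario (for example, when detecting key points on a face, the face is first scanned in three dimensions by a three-dimensional scanner, and the key point positions on the face are then detected based on the three-dimensional scan data), a terminal (terminal 400 is shown as an example) is connected to a server 200 through a network 300. The network 300 may be a wide area network, a local area network, or a combination of the two. The terminal 400 is configured for a user to use a client 401, which is displayed on a display interface (display interface 401-1 is shown as an example). The terminal 400 and the server 200 are connected to each other through a wired or wireless network.
The terminal 400 is configured to obtain three-dimensional scan data corresponding to the object to be detected and send the three-dimensional scan data to the server 200.
The server 200 is configured to: receive the three-dimensional scan data; based on the three-dimensional scan data, obtain a three-dimensional mesh used to represent the object to be detected, and determine the vertices of the three-dimensional mesh and the connection relationship between the vertices; perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh; perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected; detect the key points of the object to be detected based on the vertex features, the global features, and the local features to obtain the positions of the key points of the object to be detected on the object to be detected; and send the positions of the key points on the object to be detected to the terminal 400.
The terminal 400 is further configured to display, on the display interface, the positions of the key points on the object to be detected.
In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. The terminal 400 may be a smartphone, a tablet computer, a laptop computer, a desktop computer, a set-top box, a smart voice interaction device, a smart home appliance, an in-vehicle terminal, an aircraft, or a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device, a smart speaker, or a smart watch), but is not limited to these. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present application.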
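Referring to FIG. 2, FIG. 2 is a schematic structural diagram of the electronic device provided by an embodiment of the present application. In practical applications, the electronic device may be the server 200 or the terminal 400 shown in FIG. 1. The electronic device shown in FIG. 2 includes at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The components in the terminal 400 are coupled together through a bus system 440. It can be understood that the bus system 440 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as the bus system 440 in FIG. 2.
The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431 that enable the presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, a mouse, a microphone, a touch screen display, a camera, and other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, and optical disc drives. The memory 450 optionally includes one or more storage devices physically remote from the processor 410.
The memory 450 includes volatile memory or non-volatile memory, and may also include both. The non-volatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
In some embodiments, the memory 450 can store data to support various operations. Examples of such data include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 451 includes system programs configured to handle various basic system services and perform hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, and is configured to implement various basic services and handle hardware-based tasks;
a network communication module 452 is configured to reach other electronic devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB);
a presentation module 453 is configured to enable the presentation of information (for example, a user interface configured to operate peripheral devices and display content and information) via one or more output devices 431 (for example, a display screen or speakers) associated with the user interface 430;
an input processing module 454 is configured to detect one or more user inputs or interactions from the input devices 432 and to translate the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software. FIG. 2 shows a key point detection apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: an obtaining module 4551, a first feature extraction module 4552, a second feature extraction module 4553, and an output module 4554. These modules are logical, so they can be arbitrarily combined or further split according to the functions implemented. The functions of the modules are described below.
In other embodiments, the apparatus provided by the embodiments of the present application may be implemented in hardware. As an example, the key point detection apparatus provided by the embodiments of the present application may be a processor in the form of a hardware decoding processor that is programmed to execute the key point detection method provided by the embodiments of the present application. For example, the processor in the form of a hardware decoding processor may use one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
In some embodiments, the terminal or server may implement the key point detection method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or software module in an operating system; it may be a native application (APP), that is, a program that must be installed in the operating system to run, such as an instant messaging APP or a web browser APP; it may be a mini program, that is, a program that only needs to be downloaded into a browser environment to run; or it may be a mini program that can be embedded in any APP. In short, the computer program may be an application, module, or plug-in of any form.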
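Based on the above description of the key point detection system and the electronic device provided by the embodiments of the present application, the key point detection method provided by the embodiments of the present application is described below. In actual implementation, the method may be implemented by a terminal or a server alone, or by a terminal and a server in cooperation. The following description takes as an example the case in which the server 200 in FIG. 1 executes the method alone. Referring to FIG. 3, FIG. 3 is a schematic flowchart of the key point detection method provided by an embodiment of the present application; the steps shown are described below with reference to FIG. 3.
Step 101: the server obtains a three-dimensional mesh used to represent the object to be detected, and determines the vertices of the three-dimensional mesh and the connection relationship between the vertices.
In actual implementation, the three-dimensional mesh used to represent the object to be detected may be received directly from another device, or may be obtained from point cloud data (that is, three-dimensional scan data) corresponding to the object to be detected. Here, the point cloud data is a massive set of points indicating the surface characteristics of the object to be detected, and can generally be obtained by laser measurement or photogrammetry. Specifically, the point cloud data corresponding to the object to be detected is first obtained, and the three-dimensional mesh used to represent the object to be detected is then obtained, that is, constructed, based on the point cloud data. The point cloud data can be obtained in several ways: it may be stored locally on the terminal in advance, obtained from an external source (such as the Internet), or collected in real time, for example by a three-dimensional scanning device such as a three-dimensional scanner.
In some embodiments, when the point cloud data is collected in real time by a three-dimensional scanning device such as a three-dimensional scanner, the process of constructing the three-dimensional mesh corresponding to the object to be detected specifically includes: scanning the object to be detected with the three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected; and constructing the three-dimensional mesh corresponding to the object to be detected based on the point cloud data. For example, referring to FIG. 4, FIG. 4 is a schematic diagram of the three-dimensional mesh of a human head provided by an embodiment of the present application. As shown in FIG. 4, when the object to be detected is a face, a three-dimensional scanner scans the person's head in three dimensions to obtain point cloud data of the head, and the three-dimensional mesh of the head is constructed based on that point cloud data.
It should be noted that the process of constructing the three-dimensional mesh corresponding to the object to be detected based on the point cloud data may be as follows. First, the point cloud data is preprocessed to obtain target point cloud data. The preprocessing includes operations such as filtering, denoising, and point cloud registration: filtering removes noise points, denoising can further reduce noise and invalid points, and point cloud registration aligns the point cloud data to a single coordinate system. Then, mesh reconstruction is performed on the target point cloud data to obtain the three-dimensional mesh. Mesh reconstruction is the process of converting the discrete target point cloud data into a three-dimensional mesh. Commonly used mesh reconstruction algorithms include mesh-based methods, voxel-based methods, and implicit-function-based methods: a mesh-based method converts the target point cloud data into a triangular mesh, a voxel-based method converts it into a voxel grid, and an implicit-function-based method uses an implicit function to represent the three-dimensional mesh.
It should be noted that the embodiments of the present application involve data such as real-time scans. When the embodiments of the present application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
In actual implementation, the connection relationship between the vertices of the three-dimensional mesh may be a vertex connection matrix that indicates whether vertices are connected. Its size is N*N and its entries are 0 or 1, where N is the number of vertices: when vertex i is connected to vertex j, the connection entry Aij is 1; otherwise it is 0.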
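The N*N connection matrix described above can be sketched as follows. This is a minimal illustration, assuming the mesh is given as a list of triangular faces (index triples); the function name `build_adjacency` and the tiny two-triangle mesh are hypothetical and not taken from the application.

```python
def build_adjacency(num_vertices, faces):
    """Return an N x N 0/1 matrix A with A[i][j] = 1 when vertices i and j share an edge."""
    adjacency = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b, c in faces:
        # Each triangle contributes its three edges; the matrix is symmetric.
        for i, j in ((a, b), (b, c), (c, a)):
            adjacency[i][j] = 1
            adjacency[j][i] = 1
    return adjacency


# Hypothetical mesh: two triangles sharing the edge between vertices 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
A = build_adjacency(4, faces)
```

Vertices 0 and 3 belong to different triangles and share no edge, so `A[0][3]` stays 0, which mirrors the eye/chin example in the text.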
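For example, the vertices of the three-dimensional mesh that indicate the eye positions on a face are connected to each other, whereas the vertices that indicate the eye positions and the vertices that indicate the chin position are not connected.
Step 102: perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh.
In actual implementation, feature extraction is performed on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh. The vertex features include the position of each vertex and information about the location on the face that the vertex indicates. For example, the vertex features may have size N*(6+X), where N is the number of vertices of the three-dimensional mesh; 6 is the number of dimensions occupied by the vertex coordinates and the normal vector, that is, three dimensions for the coordinates (x, y, z) and three for the normal direction; and X covers other properties of the mesh vertices, that is, information about the corresponding location on the face, such as curvature and texture information. It should be noted that these other properties can be adjusted for different data and tasks. When the present application is applied in a model, adding such properties in the training stage speeds up the model's learning.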
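A sketch of assembling the N*(6+X) input rows is given below. It assumes, as an illustration only, that vertex normals are obtained by summing the (area-weighted) normals of the adjacent triangles and normalizing; the application does not specify how normals or the extra X properties are computed, and the names `vertex_normals`, `vertex_features`, and `extras` are hypothetical.

```python
import math


def vertex_normals(vertices, faces):
    """Sum the (area-weighted) face normals around each vertex, then normalize."""
    normals = [[0.0, 0.0, 0.0] for _ in vertices]
    for a, b, c in faces:
        u = [vertices[b][d] - vertices[a][d] for d in range(3)]
        v = [vertices[c][d] - vertices[a][d] for d in range(3)]
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        for i in (a, b, c):
            for d in range(3):
                normals[i][d] += n[d]
    unit = []
    for n in normals:
        length = math.sqrt(sum(x * x for x in n)) or 1.0  # avoid dividing by zero
        unit.append([x / length for x in n])
    return unit


def vertex_features(vertices, faces, extras):
    """Concatenate position (3), normal (3) and X extra properties per vertex."""
    normals = vertex_normals(vertices, faces)
    return [list(p) + n + list(e) for p, n, e in zip(vertices, normals, extras)]


# Hypothetical single triangle in the xy-plane with one extra property (X = 1).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
faces = [(0, 1, 2)]
extras = [[0.1], [0.2], [0.3]]  # e.g. a curvature value per vertex
rows = vertex_features(vertices, faces, extras)
```

Each row has 6 + X entries; with the counter-clockwise triangle above, every vertex normal points along +z.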
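Step 103: perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected.
It should be noted that, after the connection relationship between the vertices of the three-dimensional mesh and the vertex features of the three-dimensional mesh are determined, global feature extraction and local feature extraction are performed separately on the object to be detected, yielding the global features and the local features of the object to be detected.
In some embodiments, the process of performing global feature extraction on the object to be detected based on the vertex features to obtain the global features may be as follows: first, feature extraction is performed on the object to be detected based on the vertex features; max pooling is then applied to the extracted features to obtain a max-pooled feature, which is shared by all vertices and used as the global feature of the object to be detected.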
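The max-pooling step can be sketched as a column-wise maximum over all vertices, followed by copying the pooled vector to every vertex. This is a minimal illustration under the assumption that the per-vertex features are stored as an N x C list of lists; the function names are hypothetical.

```python
def global_max_pool(vertex_features):
    """Column-wise maximum over all vertices: N x C -> C."""
    return [max(column) for column in zip(*vertex_features)]


def broadcast_global(vertex_features):
    """Share the pooled vector with every vertex: N x C -> N x C."""
    pooled = global_max_pool(vertex_features)
    return [list(pooled) for _ in vertex_features]


# Hypothetical features for N = 3 vertices with C = 3 channels.
features = [[0.1, 2.0, -1.0],
            [0.7, 0.5, 3.0],
            [0.3, 1.5, 0.0]]
pooled = global_max_pool(features)
shared = broadcast_global(features)
```

Because the maximum is taken over the vertex axis, the pooled vector has the same length regardless of N, which is one reason this kind of pooling suits meshes with varying vertex counts.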
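In some embodiments, the process of performing local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features may be as follows: determine the local features of each vertex based on the vertex features and the connection relationship between the vertices; and determine the local features of the object to be detected based on the local features of each vertex.
It should be noted that the global features here indicate the overall characteristics of the object to be detected, such as its color, texture, and shape features, while the local features indicate its detailed characteristics, that is, features extracted from local regions of the object to be detected, such as features extracted from its edges, corners, points, lines, curves, and regions with special attributes. For example, when the object to be detected is a face, the global features may be the size, shape, and position of the facial features, and the local features may be the distribution of facial muscles and the shape changes of the facial features under different expressions. Because global features are pixel-level, low-level visual features, they have good invariance, are simple to compute, and are intuitive to represent, but they are unsuitable when objects overlap or are occluded. Local image features are abundant in an image and weakly correlated with each other; when objects overlap or are occluded, the disappearance of some features does not affect the detection and matching of the others. By extracting both global and local features of the object to be detected, richer and more accurate features are obtained, which improves the accuracy of the key point detection result.
Next, the process of determining the local features of each vertex based on the vertex features and the connection relationship between the vertices, and the process of determining the local features of the object to be detected based on the local features of each vertex, are described in turn.
For the process of determining the local features of each vertex based on the vertex features and the connection relationship between the vertices, refer to FIG. 5, which is a schematic flowchart of determining the local features of each vertex provided by an embodiment of the present application. As shown in FIG. 5, this process is implemented through steps 1031 to 1033, and the following processing is performed for each vertex:
Step 1031: determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of the other vertices based on the vertex features of the vertices in the three-dimensional mesh, where an other vertex is any vertex other than the reference vertex.
For example, the number of vertices in the three-dimensional mesh is N, and each vertex has a feature h of dimension F, that is,
$$h = \{h_1, h_2, \ldots, h_N\}, \quad h_i \in \mathbb{R}^{F}$$   Formula (1);
where vertex i is taken as the reference vertex, $h_i$ is a vector of size F (the feature of reference vertex i), vertex j is an other vertex, $h_j$ is a vector of size F (the feature of other vertex j), and vertex i and vertex j are connected by an edge.
Step 1032: determine a correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices, where the correlation value indicates the degree of correlation between the reference vertex and the other vertices.
In some embodiments, the process of determining the correlation value between the reference vertex and the other vertices may be as follows: based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices, an attention mechanism is used to determine the degree of correlation between the reference vertex and the other vertices. The degree of correlation is a measure of the strength of association between the reference vertex and an other vertex, and can be computed with the following formula:
$$e_{ij} = \mathrm{Attention}(W h_i, W h_j)$$   Formula (2);
where W is a weight matrix of size F×F, $h_i$ is the vertex feature of reference vertex i, $h_j$ is the vertex feature of other vertex j, Attention denotes processing with the attention mechanism, and $e_{ij}$ is the degree of correlation between the reference vertex and the other vertex.
In other embodiments, the process of determining the correlation value between the reference vertex and the other vertices may be as follows: based on the connection relationship between the vertices, determine the reference vertex and the other vertices connected to it; based on the vertex features of the reference vertex and of each connected other vertex, perform similarity matching between the reference vertex and that other vertex to obtain the similarity between them (one similarity is obtained for each of the other vertices); and take this similarity as the degree of correlation between the reference vertex and the corresponding other vertex.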
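Then, the degree of correlation is normalized to obtain the correlation value between the reference vertex and the other vertex, that is,
$$\alpha_{ij} = \mathrm{Softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{q \in N_i} \exp(e_{iq})}$$   Formula (3);
where $\mathrm{Softmax}_j$ denotes the normalization, $\alpha_{ij}$ is the correlation value between vertices i and j, exp is the exponential function with the natural constant e as its base, $N_i$ is the neighborhood formed by all other vertices connected to reference vertex i, and q is any vertex in that neighborhood.
For example, referring to FIG. 6, FIG. 6 is a schematic diagram of using an attention mechanism to determine the degree of correlation between a reference vertex and other vertices provided by an embodiment of the present application. In FIG. 6, $\alpha_{ij}$ indicated by 601 is the correlation value between vertices i and j, $W h_i$ in dashed box 602 is the vertex feature corresponding to reference vertex i, $W h_j$ in dashed box 603 is the vertex feature corresponding to other vertex j, and a is a weight vector. After the degree of correlation between the reference vertex and the other vertex is determined from $W h_i$ and $W h_j$, $\mathrm{Softmax}_j$ (normalization) is applied to it to obtain the correlation value between the reference vertex and the other vertex.
It should be noted that the process of using the attention mechanism to determine the degree of correlation between the reference vertex and the other vertex may specifically be: concatenate the features $W h_i$ and $W h_j$ of vertices i and j, compute the inner product of the concatenated feature with a weight vector a of dimension 2F, and then apply an activation function, which after normalization gives the correlation value between the reference vertex and the other vertex, that is,
$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}[W h_i \,\|\, W h_j]\right)\right)}{\sum_{q \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}[W h_i \,\|\, W h_q]\right)\right)}$$   Formula (4);
where $N_i$ is the neighborhood formed by all other vertices connected to reference vertex i, q is any vertex in that neighborhood, $W h_i \| W h_j$ is the feature obtained by concatenating the features $W h_i$ and $W h_j$ of vertices i and j, exp is the exponential function with the natural constant e as its base, LeakyReLU is a nonlinear activation function, and a is a weight vector of size 2F.
It should be noted that the degree of correlation may also be computed directly from the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices. Several measures can be used, such as the Pearson correlation coefficient and Spearman's rank correlation coefficient.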
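Step 1033: determine the local features of the reference vertex based on the correlation value and the vertex features of the other vertices.
In actual implementation, after the correlation value is obtained, when there is one other vertex, the process of determining the local features of the reference vertex based on the correlation value and the vertex features of the other vertex may be: multiply the correlation value by the vertex features of the other vertex to obtain a product; and determine the local features of the reference vertex based on the product, that is,
$$h_i' = \sigma(\alpha_{ij} W h_j)$$   Formula (5);
where σ is an activation function, $\alpha_{ij}$ is the correlation value between reference vertex i and other vertex j, $W h_j$ is the vertex feature corresponding to other vertex j, and $h_i'$ is the local feature of the reference vertex.
It should be noted that, when there are multiple other vertices, the process of determining the local features of the reference vertex based on the correlation values and the vertex features of the other vertices may be: for each other vertex, multiply the correlation value by the vertex features of that other vertex to obtain its product; sum the products of all other vertices to obtain a sum; and determine the local features of the reference vertex based on the sum, that is,
$$h_i' = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j\Big)$$   Formula (6);
where σ is an activation function, $\alpha_{ij}$ is the correlation value between reference vertex i and other vertex j, $W h_j$ is the vertex feature corresponding to other vertex j, and $N_i$ is the neighborhood formed by all other vertices connected to reference vertex i.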
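Steps 1031 to 1033 can be sketched as a single attention head in plain Python: score each neighbour with LeakyReLU of the inner product between a and the concatenated features, normalize the scores with softmax over the neighbourhood, and aggregate the weighted neighbour features. This is an illustrative sketch, not the application's implementation: the application does not name the activation σ or the LeakyReLU slope, so `math.tanh` and a slope of 0.2 are assumptions, as are all function names and the toy inputs.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):  # slope is an assumed value
    return x if x > 0 else slope * x


def attention_coefficients(i, neighbors, Wh, a):
    """alpha_ij for each j in neighbors: softmax of LeakyReLU(a . [Wh_i || Wh_j])."""
    scores = [leaky_relu(sum(w * x for w, x in zip(a, Wh[i] + Wh[j])))
              for j in neighbors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(H, adjacency, W, a, activation=math.tanh):
    """Local feature of every vertex: activation(sum_j alpha_ij * W h_j)."""
    Wh = [matvec(W, h) for h in H]
    out = []
    for i in range(len(H)):
        neighbors = [j for j, connected in enumerate(adjacency[i]) if connected]
        if not neighbors:  # isolated vertex: emit zeros (an assumption)
            out.append([0.0] * len(W))
            continue
        alphas = attention_coefficients(i, neighbors, Wh, a)
        aggregated = [sum(alpha * Wh[j][d] for alpha, j in zip(alphas, neighbors))
                      for d in range(len(W))]
        out.append([activation(v) for v in aggregated])
    return out


# Hypothetical toy input: 3 vertices, F = 2, vertex 0 connected to vertices 1 and 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.25]
alphas0 = attention_coefficients(0, [1, 2], [matvec(W, h) for h in H], a)
local = gat_layer(H, adjacency, W, a)
```

Only connected vertices enter the softmax, so the layer never needs the full graph at once; each vertex is processed from its own neighbourhood.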
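For the process of determining the local features of the object to be detected based on the local features of each vertex: specifically, the local features of the vertices are fused to obtain a fused feature, and the fused feature is used as the local feature of the object to be detected.
Step 104: detect the key points of the object to be detected based on the vertex features, the global features, and the local features, to obtain the positions of the key points of the object to be detected on the object to be detected.
In some embodiments, the process of detecting the key points of the object to be detected based on the vertex features, the global features, and the local features may be as follows: concatenate the vertex features, the global features, and the local features to obtain a concatenated feature of the object to be detected; and detect the key points of the object to be detected based on the concatenated feature to obtain the positions of the key points on the object to be detected. Because the concatenated feature contains the information of the vertex features, the global features, and the local features, detecting the key points based on it combines all three, that is, the key points are detected from richer feature information, which improves the accuracy of the key point detection result.
It should be noted that, in the three-dimensional coordinate regression methods of the related art, points around a key point may also resemble the key point, so it is difficult to define a key point accurately by a single pixel position. The three-dimensional heatmap in the present application is a statistical chart that displays multiple values by coloring blocks, that is, it displays each value according to a specified color mapping rule; for example, larger values are shown in darker colors and smaller values in lighter colors, or larger values in warm tones and smaller values in cool tones. By outputting a three-dimensional heatmap, the likelihood that a key point belongs to each vertex is displayed at the same time, which better ensures the local accuracy of the detection result.
For example, referring to FIG. 7, FIG. 7 is a schematic diagram of the positions of key points on an object to be detected provided by an embodiment of the present application. The black dots in FIG. 7 are key points. When the object to be detected is a face, the key point positions shown in FIG. 7 may be the positions of the facial features: the black dots in dashed box 701 are key points indicating the forehead, the black dots in dashed boxes 702 and 703 are key points indicating the eyes, the black dots indicated by 704 and 705 are key points indicating the ears, the black dots in dashed box 706 are key points indicating the nose, the black dots in dashed box 707 are key points indicating the mouth, the black dots indicated by 708 and 709 are key points indicating the cheeks, and the black dots in dashed box 710 are key points indicating the chin. Here, the output layer detects the positions of the facial features of the object to be detected based on the vertex features, the global features, and the local features, and obtains the probability of each key point at each vertex of the three-dimensional mesh, that is, the probability that each vertex is the key point corresponding to each facial feature position. A three-dimensional heatmap of the three-dimensional mesh is generated from these probabilities, and the positions of the key points on the object to be detected are determined from the three-dimensional heatmap: for the key point corresponding to each facial feature position, the vertex with the highest probability is selected and determined as that key point, and the positions of the facial features are determined from the resulting key points.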
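Selecting, for each key point, the vertex with the highest probability can be sketched as below. The sketch assumes the heatmap is stored as an N x K table (`heatmap[n][k]` is the probability that vertex n is key point k); the function name and sample values are hypothetical.

```python
def keypoints_from_heatmap(heatmap, vertices):
    """For each key point k, return (vertex index, xyz) of the most probable vertex."""
    num_keypoints = len(heatmap[0])
    result = []
    for k in range(num_keypoints):
        best = max(range(len(heatmap)), key=lambda n: heatmap[n][k])
        result.append((best, vertices[best]))
    return result


# Hypothetical mesh with N = 4 vertices and K = 2 key points.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
heatmap = [[0.10, 0.05],
           [0.70, 0.05],
           [0.15, 0.10],
           [0.05, 0.80]]
keypoints = keypoints_from_heatmap(heatmap, vertices)
```

Because every result is an existing mesh vertex, the detected key points always lie on the mesh surface.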
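In some embodiments, the key point detection method may also be applied to a three-dimensional network model that includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer. Referring to FIG. 8, FIG. 8 is a schematic structural diagram of the three-dimensional network model provided by an embodiment of the present application. As shown in FIG. 8, the process of performing feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features may be: performing, through the first feature extraction layer, feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh. The process of performing global feature extraction and local feature extraction on the object to be detected may be: performing, through the second feature extraction layer, global feature extraction on the object to be detected based on the vertex features to obtain the global features of the object to be detected, and performing, through the third feature extraction layer, local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected. The process of detecting the key points of the object to be detected based on the vertex features, the global features, and the local features may be: detecting, through the output layer, the key points of the object to be detected by combining the vertex features, the global features, and the local features, to obtain the positions of the key points of the object to be detected on the object to be detected.
In this way, detecting the positions of the key points on the object to be detected through the three-dimensional network model improves the accuracy of the detected positions.
In some embodiments, the third feature extraction layer may include at least two third feature extraction sub-layers and a feature concatenation sub-layer. For example, referring to FIG. 9, FIG. 9 is a schematic structural diagram of the third feature extraction layer provided by an embodiment of the present application. As shown in FIG. 9, the process of determining the local features of each vertex through the third feature extraction layer, based on the vertex features and the connection relationship between the vertices, may be as follows. Through each third feature extraction sub-layer, the following processing is performed for each vertex: determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of the other vertices based on the vertex features of the vertices in the three-dimensional mesh; determine the correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices; and determine a local sub-feature of the reference vertex based on the correlation value and the vertex features of the other vertices. Then, through the feature concatenation sub-layer, the local sub-features obtained through the third feature extraction sub-layers are concatenated to obtain the local features of the reference vertex, that is,
$$h_i' = \mathop{\mathrm{concat}}_{k=1}^{K}\, \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$   Formula (7);
where k indexes the third feature extraction sub-layers and K is their number, $N_i$ is the neighborhood formed by all other vertices connected to reference vertex i, σ is an activation function, $\alpha_{ij}^{k}$ is the correlation value between reference vertex i and other vertex j in the k-th sub-layer, $W^{k} h_j$ is the vertex feature corresponding to other vertex j in the k-th sub-layer, and concat denotes concatenation.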
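The multi-sub-layer concatenation can be sketched by running several independent attention heads and joining their outputs per vertex. As in any such sketch, the activation (`math.tanh`), the LeakyReLU slope (0.2), the function names, and the toy weights are assumptions for illustration and are not specified by the application.

```python
import math


def head_output(H, adjacency, W, a):
    """One attention head: tanh(sum_j alpha_ij * W h_j) for every vertex i."""
    Wh = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in H]
    out = []
    for i, row in enumerate(adjacency):
        nbrs = [j for j, connected in enumerate(row) if connected]
        scores = []
        for j in nbrs:
            s = sum(w * x for w, x in zip(a, Wh[i] + Wh[j]))
            scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU, assumed slope
        exps = [math.exp(s - max(scores)) for s in scores]
        total = sum(exps)
        # For an isolated vertex the sums below are empty and evaluate to 0.
        agg = [sum(e / total * Wh[j][d] for e, j in zip(exps, nbrs))
               for d in range(len(W))]
        out.append([math.tanh(v) for v in agg])
    return out


def multi_head(H, adjacency, heads):
    """Concatenate the per-vertex outputs of K heads: N x F -> N x (K * F)."""
    per_head = [head_output(H, adjacency, W, a) for W, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(H))]


# Hypothetical toy input with K = 2 heads and F = 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5, 1.0, 0.25]),
    ([[2.0, 0.0], [0.0, 2.0]], [0.1, 0.2, 0.3, 0.4]),
]
out = multi_head(H, adjacency, heads)
first_head = head_output(H, adjacency, *heads[0])
```

Each head has its own weights, so the concatenated vector lets different heads attend to the neighbourhood in different ways.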
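It should be noted that the process here of determining the correlation value between the reference vertex and the other vertices, based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices, is the same as the process described above; likewise, the process of determining the local sub-features of the reference vertex based on the correlation value and the vertex features of the other vertices is the same as the process described above for determining the local features of the reference vertex. These are not repeated here.
In some embodiments, the three-dimensional network model further includes a first feature concatenation layer, a second feature concatenation layer, and a fourth feature extraction layer. For example, referring to FIG. 10, FIG. 10 is a schematic structural diagram of the three-dimensional network model provided by an embodiment of the present application. As shown in FIG. 10, the process of detecting the key points of the object to be detected through the output layer, by combining the vertex features, the global features, and the local features, may be: concatenating, through the first feature concatenation layer, the vertex features, the global features, and the local features to obtain a concatenated feature of the object to be detected; performing, through the fourth feature extraction layer, local feature extraction on the object to be detected based on the concatenated feature to obtain target local features of the object to be detected; concatenating, through the second feature concatenation layer, the concatenated feature, the global features, and the target local features to obtain a target concatenated feature of the object to be detected; and detecting, through the output layer, the key points of the object to be detected based on the target concatenated feature, to obtain the positions of the key points of the object to be detected on the object to be detected.
It should be noted that the three-dimensional network model may further include a fifth feature extraction layer and a third feature concatenation layer. Through the fifth feature extraction layer, local feature extraction is performed on the object to be detected based on the target concatenated feature to obtain second target local features; through the third feature concatenation layer, the target concatenated feature, the second target local features, and the global features are concatenated to obtain a second target concatenated feature; and finally, through the output layer, the key points of the object to be detected are detected based on the second target concatenated feature to obtain their positions on the object to be detected. For the process of determining the local features of the object to be detected and the corresponding concatenated features, the three-dimensional network model may contain multiple feature extraction layers and feature concatenation layers, and the process of obtaining the final concatenated feature through them is the same as described above, so it is not repeated in the embodiments of the present application.
It should be noted that the fourth feature extraction layer and the fifth feature extraction layer have the same layer structure as the third feature extraction layer and process features in the same way, and the second feature concatenation layer and the third feature concatenation layer have the same layer structure as the first feature concatenation layer and process features in the same way. The fourth feature extraction layer further processes the concatenated feature to obtain more precise target local features, and the second feature concatenation layer concatenates the concatenated feature, the global features, and the target local features, so that the key points of the object to be detected are detected based on the resulting target concatenated feature. Correspondingly, the fifth feature extraction layer further processes the target concatenated feature to obtain more precise second target local features, and the third feature concatenation layer concatenates the target concatenated feature, the global features, and the second target local features, so that the key points of the object to be detected are detected based on the resulting second target concatenated feature.
In this way, by providing feature extraction layers of identical structure and feature concatenation layers of identical structure, local feature extraction and the corresponding feature concatenation are repeated several times, which improves the accuracy of the extracted features and thus the accuracy of the key point detection result.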
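In some embodiments, before the key points of the object to be detected are detected with the three-dimensional network model, the three-dimensional network model needs to be trained, so that detection is performed with the trained model. Specifically, referring to FIG. 11, FIG. 11 is a schematic flowchart of the training process of the three-dimensional network model provided by an embodiment of the present application. As shown in FIG. 11, the training process can be implemented through the following steps.
Step 201: the server acquires an object training sample carrying a label, the label being used to indicate the true positions of the key points of the object training sample.
Step 202: obtain a training three-dimensional mesh used to represent the object training sample, and determine the vertices of the training three-dimensional mesh and the connection relationship between the vertices.
It should be noted that, after the training three-dimensional mesh used to represent the object training sample is obtained, data augmentation may also be applied to it, and the three-dimensional network model is then trained with the augmented training three-dimensional mesh. Specifically, the data augmentation methods for the training three-dimensional mesh are face simplification and densification.
In some embodiments, face simplification of the training three-dimensional mesh may follow an edge-based optimization: each time, the shortest edge between vertices is found and its two vertices are merged into one. Specifically, the edges between any two vertices are obtained and compared; based on the comparison, the shortest edge is selected as the target edge; the two vertices of the target edge are then obtained and merged into a single vertex, yielding the augmented training three-dimensional mesh. For example, referring to FIG. 12, FIG. 12 is a schematic diagram of face simplification of a three-dimensional mesh provided by an embodiment of the present application. In FIG. 12 there are 10 vertices, v1 to v10, which form the 10 edges v1v2, v1v3, v1v4, v1v10, v1v9, v1v2, v1v8, v5v2, v7v2, and v6v2. The edge between v1 and v2 is the shortest, so these two vertices are merged into a single vertex v, yielding the augmented training three-dimensional mesh.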
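One simplification step can be sketched as follows: find the shortest edge, merge its two endpoints, re-index the faces, and drop faces that became degenerate. The application does not say where the merged vertex is placed, so the midpoint used here is an assumption, and the function name and toy mesh are hypothetical.

```python
import math


def collapse_shortest_edge(vertices, faces):
    """Merge the two endpoints of the shortest edge into one vertex (at the midpoint)."""
    edges = {tuple(sorted(pair))
             for a, b, c in faces for pair in ((a, b), (b, c), (c, a))}
    # Break length ties by vertex indices so the result is deterministic.
    keep, drop = min(edges, key=lambda e: (math.dist(vertices[e[0]], vertices[e[1]]), e))
    merged = tuple((p + q) / 2 for p, q in zip(vertices[keep], vertices[drop]))
    new_vertices = [merged if i == keep else v
                    for i, v in enumerate(vertices) if i != drop]

    def remap(i):
        i = keep if i == drop else i      # redirect the removed vertex
        return i - 1 if i > drop else i   # close the gap left in the index range

    new_faces = []
    for face in faces:
        mapped = tuple(remap(i) for i in face)
        if len(set(mapped)) == 3:         # skip triangles that collapsed to a line
            new_faces.append(mapped)
    return new_vertices, new_faces


# Hypothetical mesh: two triangles; the edge between vertices 0 and 1 is the shortest.
vertices = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
faces = [(0, 1, 3), (1, 2, 3)]
new_vertices, new_faces = collapse_shortest_edge(vertices, faces)
```

Calling the function repeatedly until the vertex count reaches a chosen target gives the simplification loop described in the text.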
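In other embodiments, face densification of the training three-dimensional mesh gives priority to faces with larger areas: the centroid of such a face is computed, and the original triangular face is then split into three based on the centroid. Specifically, at least one face is obtained and the faces are compared; based on the comparison, the face with the largest area is selected as the target face; the centroid of the target face and its three vertices are determined; and the original triangular face is split into three based on the centroid and the three vertices. For example, referring to FIG. 13, FIG. 13 is a schematic diagram of face densification of a three-dimensional mesh provided by an embodiment of the present application. In FIG. 13 there are 9 vertices, A to I, which form 8 triangular faces: the faces between vertices A, B, C; A, B, I; H, B, I; H, B, G; F, B, G; F, B, E; D, B, E; and D, B, C. The face between vertices A, B, and C is the target face with the largest area. Its centroid P and its vertices A, B, and C are determined, and the target face is split into three based on P, A, B, and C, yielding the augmented training three-dimensional mesh.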
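One densification step can be sketched as follows: compute triangle areas, pick the largest face, append its centroid as a new vertex, and replace the face with three smaller ones. The function names and the toy mesh are hypothetical; the splitting rule follows the description above.

```python
import math


def triangle_area(p, q, r):
    """Half the length of the cross product of two edge vectors."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def split_largest_face(vertices, faces):
    """Insert the centroid of the largest triangle and split it into three."""
    target = max(range(len(faces)),
                 key=lambda f: triangle_area(*(vertices[i] for i in faces[f])))
    a, b, c = faces[target]
    centroid = tuple(sum(vertices[i][d] for i in (a, b, c)) / 3 for d in range(3))
    p = len(vertices)  # index of the new centroid vertex
    new_faces = (faces[:target]
                 + [(a, b, p), (b, c, p), (c, a, p)]
                 + faces[target + 1:])
    return vertices + [centroid], new_faces


# Hypothetical mesh: face (0, 1, 2) has area 4.5, face (0, 2, 3) has area 1.5.
vertices = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (-1.0, 0.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
new_vertices, new_faces = split_largest_face(vertices, faces)
```

Each step adds exactly one vertex and two faces, so repeating it until a target vertex count is reached gives a simple stopping rule.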
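It should be noted that the data augmentation of the training three-dimensional mesh can be ended by means of a preset target vertex count. Specifically, during augmentation, the vertex count of the augmented training three-dimensional mesh is obtained and compared with the preset target vertex count, and the augmentation is ended based on the comparison. When the training three-dimensional mesh is being simplified, augmentation ends when the comparison shows that the vertex count is less than the target vertex count; when it is being densified, augmentation ends when the comparison shows that the vertex count is greater than the target vertex count.
Step 203: perform, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain the vertex features of the training three-dimensional mesh.
Step 204: perform, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain the global features of the object training sample, and perform, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices to obtain the local features of the object training sample.
Step 205: detect, through the output layer, the key points of the object training sample based on the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample, to obtain the positions of the key points of the object training sample on the object training sample.
In actual implementation, the three-dimensional network model further includes a first feature concatenation layer, so the process of detecting the key points of the object training sample through the output layer may be: concatenating, through the first feature concatenation layer, the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample to obtain a concatenated feature of the object training sample; and detecting, through the output layer, the key points of the object training sample based on the concatenated feature, to obtain the positions of the key points of the object training sample on the object training sample.
Step 206: obtain the difference between the positions of the key points of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model, where the target three-dimensional network model is used to perform key point detection on an object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.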
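The key point detection method provided by the embodiments of the present application is further described below. Referring to FIG. 14, FIG. 14 is a schematic flowchart of the key point detection method provided by an embodiment of the present application. As shown in FIG. 14, the method is implemented by a client and a server in cooperation.
Step 301: in response to an upload operation for an object training sample carrying a label, the client obtains the object training sample carrying the label.
In actual implementation, the client may be a key point detection client installed on the terminal. Through the client's human-computer interaction interface, the user triggers an upload function item so that the client presents an object selection interface; through that interface, the user uploads the labeled object training sample from the terminal's local storage, so that the client obtains the uploaded object training sample.
In some embodiments, the object training sample may also be captured by a camera in communication with the terminal. After capturing the object training sample, the camera labels it and transmits the labeled object training sample to the terminal, which automatically uploads it to the client.
Step 302: the client sends the object training sample to the server.
Step 303: the server inputs the received object training sample into the three-dimensional network model.
Step 304: detect the key points of the object training sample based on the three-dimensional network model to obtain the positions of the key points of the object training sample.
Step 305: obtain the difference between the positions of the key points of the object training sample and the label, and train the three-dimensional network model based on the difference.
In actual implementation, the server iterates the above training process until the loss function converges, completing the training of the three-dimensional network model.
Step 306: the server generates a prompt message indicating that the training of the three-dimensional network model is complete.
Step 307: send the prompt message to the client.
Step 308: in response to an upload operation for the point cloud data corresponding to the object to be detected, the client obtains the point cloud data corresponding to the object to be detected.
In actual implementation, the point cloud data corresponding to the object to be detected may be stored locally on the terminal in advance, obtained from an external source (such as the Internet), or collected in real time, for example by a three-dimensional scanning device such as a three-dimensional scanner.
Step 309: in response to a key point detection instruction for the object to be detected, the client sends the point cloud data corresponding to the object to be detected to the server.
In actual implementation, the key point detection instruction for the object to be detected may be generated automatically by the client under a certain trigger condition, for example automatically after the client obtains the point cloud data corresponding to the object to be detected; it may be sent to the client by another device in communication with the terminal; or it may be generated after the user triggers the corresponding confirmation function item on the client's human-computer interaction interface.
Step 310: the server inputs the received point cloud data corresponding to the object to be detected into the three-dimensional network model, so that the three-dimensional network model performs key point detection on the object to be detected and obtains a three-dimensional heatmap indicating the positions of the key points of the object to be detected on the object to be detected.
Step 311: send the three-dimensional heatmap indicating the positions of the key points of the object to be detected on the object to be detected to the client.
Step 312: the client displays the three-dimensional heatmap indicating the positions of the key points of the object to be detected on the object to be detected.
In actual implementation, the client may display the three-dimensional heatmap in its human-computer interaction interface, save it locally on the terminal, or send it to another device in communication with the terminal.
By applying the above embodiments of the present application, a three-dimensional mesh corresponding to the object to be detected is obtained; then, by building two-branch feature extraction layers, the global features and the local features of the object to be detected are extracted separately, based on the vertex features obtained from the three-dimensional mesh and the connection relationship between the vertices, so that the positions of the key points on the object to be detected are obtained based on the vertex features, the extracted global features, and the local features. In this way, richer feature information of the object to be detected is extracted through multiple feature extraction layers, and the key points of the object to be detected are then detected based on this rich feature information, so that the accuracy of three-dimensional key point detection is significantly improved.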
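An exemplary application of the embodiments of the present application in a practical application scenario is described below.
The inventor found that key point detection for three-dimensional face characters generally falls into two categories. The first category comprises methods based on traditional geometric analysis. These generally use techniques such as sharp edge detection, curvature computation, dihedral angle computation, normal vector computation, and certain specific geometric rules to directly locate the semantic key points of a three-dimensional head model. For example, the vertex with the largest z value in the three-dimensional coordinate system can be assumed to be the nose tip key point; by detecting sharp edges below the nose tip and using symmetry, the approximate regions of the left and right mouth corner key points can be roughly located. The second category comprises methods based on deep learning, which typically first render the three-dimensional head model into a two-dimensional image and then use a two-dimensional convolutional neural network to extract features and detect the corresponding key points. It is worth noting that these methods can be divided into different combinations according to whether detection is multi-view and whether the three-dimensional key points are regressed directly. For example, one common combination renders only the front view of the three-dimensional head model and records the rendering projection, detects the two-dimensional key point coordinates on the two-dimensional front view, and finally back-projects them into three-dimensional space using the known projection to obtain the final three-dimensional key point coordinates. Another combination renders multiple views (for example, front and side) and feeds them into different branches of a neural network model, so that the model combines the features of the views to regress the three-dimensional key point coordinates directly.
However, for the first category, traditional key point localization based on geometric analysis relies heavily on manually set rules. For example, sharp edge detection requires a threshold, which is an empirical value that is difficult to apply to head models of widely varying shapes, so the robustness of these methods is poor. For the second category, methods based on two-dimensional convolutional neural networks have been very successful in traditional two-dimensional image key point detection, but applying two-dimensional convolutional neural networks directly to three-dimensional key point detection has several constraints and shortcomings. First, far fewer three-dimensional face models are available than face images, that is, the data sets are scarce, which makes it hard for neural networks to be effective. Second, rendering a three-dimensional head model into a two-dimensional image inevitably loses three-dimensional geometric information; for example, a front view necessarily lacks information about the back of the head, so if key points on the back of the head need to be detected, detection is impossible given the missing information. Third, if multiple views are used to mitigate the information loss, features are extracted through a multi-branch network and then fused by the neural network to regress the three-dimensional coordinates; the intrinsic relationships between the views must then be learned by the neural network, which may make convergence difficult and thus increases the difficulty of training.
On this basis, the embodiments of the present application provide a key point detection method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product that can effectively address these shortcomings. First, the three-dimensional face model data set is augmented through face simplification and densification, which alleviates the relative scarcity of three-dimensional head model data and provides training data for supervised deep learning. Second, based on a graph neural network structure, neural convolution modules are applied directly in three-dimensional space, which avoids the inherent loss of three-dimensional geometric information in detection methods that operate in the two-dimensional space of rendered views, and also removes the problem that the intrinsic relationships between different views are difficult to learn. Finally, the conventional two-dimensional heatmap is extended to a three-dimensional heatmap; compared with directly regressing three-dimensional coordinates, a three-dimensional heatmap better ensures the local accuracy of the detection result.
Next, the technical solution of the present application is described from the product side. The present application proposes a three-dimensional face key point detection method based on a graph neural network structure and three-dimensional heatmaps. The method can be integrated into a character animation tool set and used with a non-rigid wrapping algorithm to complete the deformation matching workflow between different head models. The specific product form may be a control: in response to a trigger operation on the control, a key point detection request carrying the data of the three-dimensional head model to be detected is sent to a remote server on which the technical solution of the present application is deployed, and the returned result is obtained. Deployment on a remote server facilitates iterative optimization of the algorithm and requires no updates to local plug-in code, which saves local computing resources.
Next, the technical solution of the present application is described from the technical side.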
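First, the graph convolutional neural network structure in the technical solution of the present application is described. A three-dimensional model (three-dimensional mesh) naturally has graph-structured relationships, but these relationships are not arranged as compactly and regularly as the pixels of a two-dimensional image, so directly using a traditional convolutional neural network is not appropriate. A classic Graph Attention Network (GAT) is therefore introduced. For the basic GAT network included in the graph convolutional neural network structure, as shown in Formula (1), assume that the graph structure (three-dimensional mesh) contains N nodes (vertices) and that the feature vector (vertex feature) of each node is h with dimension F. Assuming that node j is a neighbor of node i (that is, i and j are connected by an edge), an attention mechanism can be used to compute the importance (correlation value) of node j to node i, as shown in Formula (2) and Formula (3). Specifically, the process of using the attention mechanism to compute the importance of node j to node i may be: concatenate the features $W h_i$ and $W h_j$ of nodes i and j, and then compute the inner product of the concatenated feature with a weight vector a of dimension 2F, as shown in Formula (4). The feature vector (local feature) of node i is then determined based on the importance of node j to node i, as shown in Formula (6).
In practical applications, multiple GAT layers can also be concatenated: K attention mechanisms yield K feature vectors (local sub-features) for node i, and the K feature vectors are concatenated to obtain the final feature vector (local feature) of node i, as shown in Formula (7). Because GAT does not depend on the complete graph structure but only on the edges (the connection relationship between vertices), the flexibility of the key point detection process is improved. In addition, the attention mechanism can assign different weights to different neighbor nodes, which improves the accuracy of the key point detection process.
After the above description of the Graph Attention Network (GAT), refer to FIG. 15, which is a schematic structural diagram of the graph convolutional neural network provided by an embodiment of the present application. Based on GAT, the neural network for automatic detection of three-dimensional head model key points shown in FIG. 15 is constructed. In FIG. 15, the input data, that is, the vertex data, has size N*(6+X) (the vertices of the three-dimensional mesh), where N is the number of vertices of the three-dimensional model (three-dimensional mesh), 6 is the number of dimensions occupied by the vertex coordinates and the normal vector, and X covers other properties of the vertices of the three-dimensional head model (three-dimensional mesh), including curvature and texture information. These other properties can be adjusted for different data and tasks; generally speaking, the richer the input properties, the more they help the neural network learn. Aij is the vertex connection matrix (the connection relationship between the vertices) of size N*N, whose entries are 0 or 1: if vertices i and j are connected, Aij is 1; otherwise it is 0.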
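In FIG. 15, the Multilayer Perceptron (MLP) denotes multiple fully connected perceptron layers. The vertex data (the vertices of the three-dimensional mesh) first passes through an MLP module with hidden layer dimensions [128, 64] to obtain the preliminary hidden layer feature X1 (the vertex features), and then splits into two branches (global feature extraction and local feature extraction). One branch passes through a further MLP module ([512, 1024]); max pooling is applied to its output feature X2 to obtain the global feature information X3, which is then shared by all N vertices to form the global feature N×X3. The other branch passes through 3 groups of GAT modules, each of which includes 8 basic attention networks (heads); the outputs of the 3 groups of GAT modules are concatenated to determine the local features. Finally, the features of the two branches are concatenated and input into the final MLP module ([1024, 512, K]) to obtain the final three-dimensional heatmap data of size N*K (K is the number of key points). Visualizing this data on the three-dimensional head model yields K three-dimensional heatmaps, one per key point.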
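It should be noted that, owing to the characteristics of the GAT modules and MLP modules, the same network structure does not require a fixed number of vertices N. Therefore, in both the training stage and actual use, three-dimensional face models with different numbers of vertices can serve as input to the neural network model, which improves the applicability of the present application.
Second, the three-dimensional heatmap in the technical solution of the present application is described. Because the heatmap of a three-dimensional mesh no longer has the compact coordinate structure of a two-dimensional image, the three-dimensional heatmap uses geodesic distance, whereas a two-dimensional heatmap uses Euclidean distance. At the level of the three-dimensional mesh, the geodesic distance between two points corresponds to the shortest path on the mesh graph structure and reflects the characteristics of the three-dimensional surface better than the Euclidean distance between the two points. For example, referring to FIG. 16, FIG. 16 is a comparison of geodesic distance and Euclidean distance provided by an embodiment of the present application. In FIG. 16, the straight line between two vertices indicated by 1602 is the Euclidean distance, and the curve indicated by 1601 is the corresponding geodesic distance.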
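The difference between the two distances can be sketched with Dijkstra's algorithm over the mesh edges, which gives an edge-path approximation of the geodesic distance. This is an illustration only: the application does not state which geodesic algorithm it uses, and the function name and the three-vertex ridge are hypothetical.

```python
import heapq
import math


def geodesic_distances(vertices, adjacency, source):
    """Shortest path lengths along mesh edges from `source` to every vertex (Dijkstra)."""
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale queue entry
        for v, connected in enumerate(adjacency[u]):
            if connected:
                candidate = d + math.dist(vertices[u], vertices[v])
                if candidate < dist[v]:
                    dist[v] = candidate
                    heapq.heappush(queue, (candidate, v))
    return dist


# Hypothetical ridge: vertices 0 and 2 are linked only through the raised vertex 1.
vertices = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
geodesic = geodesic_distances(vertices, adjacency, 0)
euclidean = math.dist(vertices[0], vertices[2])
```

The path along the surface goes over the ridge and is longer than the straight line through space, which is the contrast FIG. 16 illustrates.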
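It should be noted that, once the graph neural network has been trained and put into use, the three-dimensional heatmap output by the network must be further converted into the final three-dimensional key point coordinates. Conventional ways of converting a two-dimensional heatmap into two-dimensional coordinates include: taking the coordinates of the vertex with the highest probability (the argmax method); or taking the expectation of multiple vertex coordinates weighted by their softmax probabilities (the soft-argmax method). For the present application, the soft-argmax method would form a weighted average of multiple three-dimensional coordinates, and the result would not necessarily lie on the surface of the three-dimensional mesh. The argmax method is therefore used directly: the coordinates of the vertex with the highest probability are taken as the final three-dimensional key point coordinates.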
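The reason for preferring argmax can be seen in a small sketch that compares the two conversions on a three-vertex ridge. The function names, scores, and coordinates are hypothetical values chosen for illustration.

```python
import math


def argmax_keypoint(scores, vertices):
    """Coordinates of the vertex with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return vertices[best]


def soft_argmax_keypoint(scores, vertices):
    """Softmax-weighted average of all vertex coordinates."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return tuple(sum(w * v[d] for w, v in zip(weights, vertices)) / total
                 for d in range(3))


# Hypothetical ridge whose apex (vertex 1) has the highest score.
vertices = [(-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
scores = [0.0, 2.0, 0.0]
hard = argmax_keypoint(scores, vertices)
soft = soft_argmax_keypoint(scores, vertices)
```

Here `hard` is the apex vertex itself, while `soft` is pulled below the apex toward the two lower vertices and is no longer a point of the mesh.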
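Finally, the data augmentation method of the present application is described.
It should be noted that, unlike two-dimensional face images and two-dimensional key point data, three-dimensional face mesh data is very difficult to obtain in large quantities. The lack of data is a major problem for supervised learning with neural networks. Only when the data set is large enough and covers different face shapes can the graph neural network learn sufficient detection capability from it, but three-dimensional face key point data sets are hard to obtain, for the following reasons. First, three-dimensional mesh face data is itself produced by artists, and this production process is relatively laborious, whereas a two-dimensional image only requires pressing a camera shutter. Consequently, two-dimensional face images are already abundant both on the Internet and in publicly available academic data sets, while the corresponding three-dimensional face data is very scarce. Second, key point detection requires the key points to be annotated manually in advance (or annotated through preliminary automatic detection by existing algorithms followed by a small amount of manual correction). Two-dimensional key point annotation has been done extensively, and the annotation tools are not complicated, since essentially only one pixel on the image needs to be marked; annotating key points on a three-dimensional mesh is much more difficult, for example because it is hard for an annotator to identify the face contour. Therefore, when three-dimensional data is itself scarce, it is not feasible to develop corresponding annotation tools based on existing three-dimensional head models and annotate three-dimensional key points manually. On this basis, the technical solution of the present application augments existing three-dimensional face model data through face simplification and densification, providing the graph neural network with standardized and reasonable training data.
Here, the data augmentation methods are face simplification and densification. Face simplification may follow an edge-based optimization: each time, the shortest edge between nodes is found and its endpoints are merged into a single vertex, as shown in FIG. 12. Face densification gives priority to faces with larger areas: the centroid is computed and the original triangular face is split into three based on it, as shown in FIG. 13. Both densification and face simplification can use a final target vertex count to control when the operation terminates.
In this way, by automatically detecting specific key points of three-dimensional game head models, the present application provides an accurate and reliable key point basis for subsequent three-dimensional head model registration. Compared with the traditional approach of manual annotation followed by head model registration, the present application avoids excessive manual involvement, so that work that depends on key points, such as three-dimensional head model registration, can be completed automatically. This greatly reduces the labor required of artists and speeds up the whole production process related to model character animation.
Furthermore, based on deep supervised learning with a graph neural network, the present application can accurately predict the positions of three-dimensional key points and is highly robust. In addition, the forward inference of the deep learning model is very fast: the algorithm as a whole needs only about 1 second to complete automatic annotation, whereas the manual approach often takes several minutes, so the present application has considerable practical value in terms of efficiency. Moreover, the present application does not limit the number of vertices of the input three-dimensional face model; after supervised training, the resulting deep learning model can be widely used for automatic key point detection on three-dimensional head models of varying vertex density, giving it strong applicability.
By applying the above embodiments of the present application, a three-dimensional mesh corresponding to the object to be detected is obtained; then, by building two-branch feature extraction layers, the global features and the local features of the object to be detected are extracted separately, based on the vertex features obtained from the three-dimensional mesh and the connection relationship between the vertices, so that the positions of the key points on the object to be detected are obtained based on the vertex features, the extracted global features, and the local features. In this way, richer feature information of the object to be detected is extracted through multiple feature extraction layers, and the key points of the object to be detected are then detected based on this rich feature information, so that the accuracy of three-dimensional key point detection is significantly improved.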
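The following continues with an exemplary structure in which the key point detection apparatus 455 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in FIG. 2, the software modules of the key point detection apparatus 455 stored in the memory 450 may include:
an obtaining module 4551, configured to obtain a three-dimensional mesh used to represent an object to be detected, and determine the vertices of the three-dimensional mesh and the connection relationship between the vertices;
a first feature extraction module 4552, configured to perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
a second feature extraction module 4553, configured to perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected;
an output module 4554, configured to detect the key points of the object to be detected based on the vertex features, the global features, and the local features, to obtain the positions of the key points of the object to be detected on the object to be detected.
In some embodiments, the obtaining module 4551 is further configured to scan the object to be detected with a three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected, and construct the three-dimensional mesh corresponding to the object to be detected based on the point cloud data.
In some embodiments, the second feature extraction module 4553 is further configured to determine the local features of each vertex based on the vertex features and the connection relationship between the vertices, and determine the local features of the object to be detected based on the local features of each vertex.
In some embodiments, the second feature extraction module 4553 is further configured to perform the following processing for each vertex: determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of the other vertices based on the vertex features of the vertices in the three-dimensional mesh, where an other vertex is any vertex other than the reference vertex; determine a correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices, where the correlation value indicates the magnitude of the degree of correlation between the reference vertex and the other vertices; and determine the local features of the reference vertex based on the correlation value and the vertex features of the other vertices.
In some embodiments, the second feature extraction module 4553 is further configured to use an attention mechanism to determine the degree of correlation between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices, and normalize the degree of correlation to obtain the correlation value between the reference vertex and the other vertices.
In some embodiments, when there is one other vertex, the second feature extraction module 4553 is further configured to multiply the correlation value by the vertex features of the other vertex to obtain a product, and determine the local features of the reference vertex based on the product.
In some embodiments, when there are multiple other vertices, the second feature extraction module 4553 is further configured to, for each other vertex, multiply the correlation value by the vertex features of that other vertex to obtain its product; sum the products of all other vertices to obtain a sum; and determine the local features of the reference vertex based on the sum.
In some embodiments, the second feature extraction module 4553 is further configured to fuse the local features of the vertices to obtain a fused feature, and use the fused feature as the local feature of the object to be detected.
In some embodiments, the output module 4554 is further configured to concatenate the vertex features, the global features, and the local features to obtain a concatenated feature of the object to be detected, and detect the key points of the object to be detected based on the concatenated feature to obtain the positions of the key points of the object to be detected on the object to be detected.
In some embodiments, the output module 4554 is further configured to detect the key points of the object to be detected based on the vertex features, the global features, and the local features to obtain the probability of the key points at each vertex of the three-dimensional mesh; generate a three-dimensional heatmap of the three-dimensional mesh based on the probabilities; and determine the positions of the key points of the object to be detected on the object to be detected based on the three-dimensional heatmap.
In some embodiments, the apparatus is applied to a three-dimensional network model that includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer. The first feature extraction module 4552 is further configured to perform, through the first feature extraction layer, feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh. The second feature extraction module 4553 is further configured to perform, through the second feature extraction layer, global feature extraction on the object to be detected based on the vertex features to obtain the global features of the object to be detected, and perform, through the third feature extraction layer, local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected. The output module 4554 is further configured to detect, through the output layer, the key points of the object to be detected based on the vertex features, the global features, and the local features, to obtain the positions of the key points of the object to be detected on the object to be detected.
In some embodiments, the three-dimensional network model further includes a first feature concatenation layer, a second feature concatenation layer, and a fourth feature extraction layer. The output module 4554 is further configured to: concatenate, through the first feature concatenation layer, the vertex features, the global features, and the local features to obtain a concatenated feature of the object to be detected; perform, through the fourth feature extraction layer, local feature extraction on the object to be detected based on the concatenated feature to obtain target local features of the object to be detected; concatenate, through the second feature concatenation layer, the concatenated feature, the global features, and the target local features to obtain a target concatenated feature of the object to be detected; and detect, through the output layer, the key points of the object to be detected based on the target concatenated feature, to obtain the positions of the key points of the object to be detected on the object to be detected.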
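The following continues with an exemplary structure in which the training apparatus for the three-dimensional network model provided by the embodiments of the present application is implemented as software modules, where the three-dimensional network model includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the training apparatus includes:
an acquisition module, configured to acquire an object training sample carrying a label, the label being configured to indicate the true positions of the key points of the object training sample;
an obtaining module, configured to obtain a training three-dimensional mesh used to represent the object training sample, and determine the vertices of the training three-dimensional mesh and the connection relationship between the vertices;
a first feature extraction module, configured to perform, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh;
a second feature extraction module, configured to perform, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain global features of the object training sample, and perform, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices to obtain local features of the object training sample;
an output module, configured to detect, through the output layer, the key points of the object training sample based on the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample, to obtain the positions of the key points of the object training sample on the object training sample;
an updating module, configured to obtain the difference between the positions of the key points of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model, where the target three-dimensional network model is used to perform key point detection on an object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.
An embodiment of the present application further provides an electronic device, the electronic device including:
a memory, configured to store computer-executable instructions;
a processor, configured to implement, when executing the computer-executable instructions stored in the memory, the key point detection method or the training method for a three-dimensional network model described above in the embodiments of the present application, for example, the key point detection method shown in FIG. 3 or the training method for a three-dimensional network model shown in FIG. 11.
An embodiment of the present application provides a computer program product or computer program, which includes computer-executable instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer-executable instructions from the computer-readable storage medium and executes them, so that the electronic device executes the key point detection method or the training method for a three-dimensional network model described above in the embodiments of the present application, for example, the key point detection method shown in FIG. 3 or the training method for a three-dimensional network model shown in FIG. 11.
An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to execute the key point detection method or the training method for a three-dimensional network model provided by the embodiments of the present application, for example, the key point detection method shown in FIG. 3 or the training method for a three-dimensional network model shown in FIG. 11.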
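In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; it may also be any device that includes one of the foregoing memories or any combination of them.
In some embodiments, the computer-executable instructions may take the form of a program, software, a software module, a script, or code, may be written in any form of programming language (including compiled or interpreted languages, and declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the computer-executable instructions may, but need not, correspond to a file in a file system. They may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files storing one or more modules, subprograms, or code portions).
As an example, the executable instructions may be deployed to be executed on one electronic device, on multiple electronic devices located at one site, or on multiple electronic devices distributed across multiple sites and interconnected by a communication network.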
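In summary, the embodiments of this application have the following beneficial effects:
(1) Multiple feature extraction layers extract richer feature information about the object to be detected, and the keypoints of the object to be detected are then detected based on this richer information, which significantly improves the accuracy of three-dimensional keypoint detection.
(2) A graph attention network (GAT) does not depend on the complete graph structure, only on the edges, which makes the keypoint detection process more flexible. In addition, the attention mechanism can assign different weights to different neighbor nodes, which improves the accuracy of the keypoint detection process.
(3) Automatically detecting specific keypoints of a three-dimensional game head model provides accurate and reliable keypoints for subsequent three-dimensional head model registration. Compared with the conventional approach of manual annotation followed by head model registration, this application avoids excessive manual involvement, so that keypoint-dependent work such as three-dimensional head model registration can be completed automatically. This greatly reduces the labor required of artists and thereby speeds up the overall production of model and character animation.
(4) Deep supervised learning based on a graph neural network can accurately predict the positions of three-dimensional keypoints and is highly robust. In addition, forward inference of the deep learning model is extremely fast: the algorithm as a whole needs only 1 second to complete automatic annotation, whereas manual annotation often takes several minutes, so this application has considerable practical value in terms of efficiency. Furthermore, this application does not limit the number of vertices of the input three-dimensional face model; after supervised training, the resulting deep learning model can be widely applied to automatic keypoint detection for three-dimensional head models of varying vertex density, and is therefore broadly applicable.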
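To make effect (2) concrete, the following sketch computes normalized attention weights for one reference vertex using only its edges. It is a simplification under stated assumptions: a plain dot product stands in for the learned scoring function of a graph attention network, and the name `attention_weights` and the dictionary-based mesh layout are hypothetical. What it does show is that only vertices connected to the reference vertex receive a weight, and that the weights are normalized with a softmax so that they sum to 1.

```python
import math


def attention_weights(features, neighbors, ref):
    """Softmax-normalized attention of vertex `ref` over its connected vertices only."""
    connected = neighbors.get(ref, [])
    if not connected:
        return {}
    # Assumed scoring function: dot product of the two vertex features.
    scores = [
        sum(a * b for a, b in zip(features[ref], features[j])) for j in connected
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(connected, exps)}


# Vertex 0 is connected to vertices 1 and 2; vertex 3 is not a neighbor.
feats = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0], 3: [9.0, 9.0]}
edges = {0: [1, 2]}
print(attention_weights(feats, edges, 0))  # {1: 0.5, 2: 0.5}
```

Because the computation reads only the edge list of the reference vertex, it works on meshes with different vertex counts without any change.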
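The foregoing describes merely embodiments of this application and is not intended to limit the scope of protection of this application. Any modification, equivalent replacement, or improvement made within the spirit and scope of this application falls within the scope of protection of this application.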

Claims (18)

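  1. A keypoint detection method, performed by an electronic device, the method comprising:
    obtaining a three-dimensional mesh representing an object to be detected, and determining vertices of the three-dimensional mesh and connection relationships between the vertices;
    performing feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
    performing global feature extraction on the object to be detected based on the vertex features to obtain a global feature of the object to be detected, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain a local feature of the object to be detected;
    obtaining, based on the vertex features, the global feature, and the local feature, positions of keypoints of the object to be detected on the object to be detected.
  2. The method according to claim 1, wherein obtaining the three-dimensional mesh representing the object to be detected comprises:
    scanning the object to be detected with a three-dimensional scanning apparatus to obtain point cloud data of a geometric surface of the object to be detected;
    constructing, based on the point cloud data, the three-dimensional mesh corresponding to the object to be detected.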
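  3. The method according to claim 1 or 2, wherein performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected comprises:
    determining a local feature of each vertex based on the vertex features and the connection relationships between the vertices;
    determining the local feature of the object to be detected based on the local features of the vertices.
  4. The method according to claim 3, wherein determining the local feature of each vertex based on the vertex features and the connection relationships between the vertices comprises:
    performing the following processing for each vertex:
    determining the vertex as a reference vertex, and determining, based on the vertex features of the vertices in the three-dimensional mesh, the vertex feature of the reference vertex and the vertex features of other vertices;
    wherein an other vertex is any vertex other than the reference vertex;
    determining a correlation value between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices, wherein the correlation value indicates the degree of correlation between the reference vertex and the other vertex;
    determining the local feature of the reference vertex based on the correlation value and the vertex feature of the other vertex.
  5. The method according to claim 4, wherein determining the correlation value between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices comprises:
    determining, using an attention mechanism, the degree of correlation between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices;
    normalizing the degree of correlation to obtain the correlation value between the reference vertex and the other vertex.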
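  6. The method according to claim 4, wherein, when there is one other vertex, determining the local feature corresponding to the reference vertex based on the correlation value and the vertex feature of the other vertex comprises:
    multiplying the correlation value by the vertex feature of the other vertex to obtain a product result;
    determining the local feature corresponding to the reference vertex based on the product result.
  7. The method according to claim 4, wherein, when there are multiple other vertices, determining the local feature corresponding to the reference vertex based on the correlation value and the vertex features of the other vertices comprises:
    for each other vertex, multiplying the correlation value by the vertex feature of that other vertex to obtain a product result for the other vertex;
    accumulating and summing the product results of the other vertices to obtain a summation result;
    determining the local feature corresponding to the reference vertex based on the summation result.
  8. The method according to claim 3, wherein determining the local feature of the object to be detected based on the local features of the vertices comprises:
    performing feature fusion on the local features of the vertices to obtain a fused feature;
    using the fused feature as the local feature of the object to be detected.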
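  9. The method according to any one of claims 1 to 8, wherein detecting the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the positions of the keypoints of the object to be detected on the object to be detected comprises:
    concatenating the vertex features, the global feature, and the local feature to obtain a concatenated feature of the object to be detected;
    detecting the keypoints of the object to be detected based on the concatenated feature to obtain the positions of the keypoints of the object to be detected on the object to be detected.
  10. The method according to any one of claims 1 to 9, wherein detecting the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the positions of the keypoints of the object to be detected on the object to be detected comprises:
    detecting the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the probability that a keypoint is located at each vertex of the three-dimensional mesh;
    generating, based on the probabilities, a three-dimensional heatmap corresponding to the three-dimensional mesh;
    determining, based on the three-dimensional heatmap, the positions of the keypoints of the object to be detected on the object to be detected.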
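  11. The method according to any one of claims 1 to 10, wherein the method is applied to a three-dimensional network model, the three-dimensional network model comprises at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and performing feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh comprises:
    performing, through the first feature extraction layer, feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh;
    performing global feature extraction on the object to be detected based on the vertex features to obtain the global feature of the object to be detected, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected, comprises:
    performing, through the second feature extraction layer, global feature extraction on the object to be detected based on the vertex features to obtain the global feature of the object to be detected, and performing, through the third feature extraction layer, local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected;
    detecting the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the positions of the keypoints of the object to be detected on the object to be detected comprises:
    detecting, through the output layer, the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the positions of the keypoints of the object to be detected on the object to be detected.
  12. The method according to claim 11, wherein the three-dimensional network model further comprises a first feature concatenation layer, a second feature concatenation layer, and a fourth feature extraction layer, and detecting, through the output layer, the keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the positions of the keypoints of the object to be detected on the object to be detected comprises:
    concatenating, through the first feature concatenation layer, the vertex features, the global feature, and the local feature to obtain a concatenated feature of the object to be detected;
    performing, through the fourth feature extraction layer, local feature extraction on the object to be detected based on the concatenated feature to obtain a target local feature of the object to be detected;
    concatenating, through the second feature concatenation layer, the concatenated feature, the global feature, and the target local feature to obtain a target concatenated feature of the object to be detected;
    detecting, through the output layer, the keypoints of the object to be detected based on the target concatenated feature to obtain the positions of the keypoints of the object to be detected on the object to be detected.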
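  13. A training method for a three-dimensional network model, performed by an electronic device, the three-dimensional network model comprising at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the method comprising:
    acquiring an object training sample carrying a label, the label indicating true positions of keypoints of the object training sample;
    obtaining a training three-dimensional mesh representing the object training sample, and determining vertices of the training three-dimensional mesh and connection relationships between the vertices;
    performing, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh;
    performing, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain a global feature of the object training sample, and performing, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices to obtain a local feature of the object training sample;
    detecting, through the output layer, the keypoints of the object training sample based on the vertex features of the training three-dimensional mesh, the global feature of the object training sample, and the local feature of the object training sample to obtain positions of the keypoints of the object training sample on the object training sample;
    obtaining a difference between the positions of the keypoints of the object training sample and the label, and training the three-dimensional network model based on the difference to obtain a target three-dimensional network model, wherein the target three-dimensional network model is used to perform keypoint detection on an object to be detected to obtain positions of keypoints of the object to be detected on the object training sample.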
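  14. A keypoint detection apparatus, comprising:
    an obtaining module, configured to obtain a three-dimensional mesh representing an object to be detected, and to determine vertices of the three-dimensional mesh and connection relationships between the vertices;
    a first feature extraction module, configured to perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
    a second feature extraction module, configured to perform global feature extraction on the object to be detected based on the vertex features to obtain a global feature of the object to be detected, and to perform local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain a local feature of the object to be detected;
    an output module, configured to detect keypoints of the object to be detected based on the vertex features, the global feature, and the local feature to obtain positions of the keypoints of the object to be detected on the object to be detected.
  15. A training apparatus for a three-dimensional network model, the three-dimensional network model comprising at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the apparatus comprising:
    an acquisition module, configured to acquire an object training sample carrying a label, the label indicating true positions of keypoints of the object training sample;
    an obtaining module, configured to obtain a training three-dimensional mesh representing the object training sample, and to determine vertices of the training three-dimensional mesh and connection relationships between the vertices;
    a first feature extraction module, configured to perform, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh;
    a second feature extraction module, configured to perform, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain a global feature of the object training sample, and to perform, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices to obtain a local feature of the object training sample;
    an output module, configured to detect, through the output layer, the keypoints of the object training sample based on the vertex features of the training three-dimensional mesh, the global feature of the object training sample, and the local feature of the object training sample to obtain positions of the keypoints of the object training sample on the object training sample;
    an update module, configured to obtain a difference between the positions of the keypoints of the object training sample and the label, and to train the three-dimensional network model based on the difference to obtain a target three-dimensional network model, wherein the target three-dimensional network model is used to perform keypoint detection on an object to be detected to obtain positions of keypoints of the object to be detected on the object to be detected.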
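  16. An electronic device, comprising:
    a memory, configured to store computer-executable instructions;
    a processor, configured to implement, when executing the computer-executable instructions stored in the memory, the keypoint detection method according to any one of claims 1 to 12 or the training method for a three-dimensional network model according to claim 13.
  17. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, cause the processor to implement the keypoint detection method according to any one of claims 1 to 12 or the training method for a three-dimensional network model according to claim 13.
  18. A computer program product, comprising a computer program or computer-executable instructions that, when executed by a processor, implement the keypoint detection method according to any one of claims 1 to 12 or the training method for a three-dimensional network model according to claim 13.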
PCT/CN2023/129915 2022-12-09 2023-11-06 Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product Ceased WO2024120096A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2025519628A JP2025534442A (ja) 2022-12-09 2023-11-06 Keypoint detection method, training method, apparatus, electronic device, and computer program
EP23899677.1A EP4567724A4 (en) 2022-12-09 2023-11-06 KEY POINT DETECTION METHOD, DRIVE METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIA AND COMPUTER PROGRAM PRODUCT
US18/793,553 US20240394918A1 (en) 2022-12-09 2024-08-02 Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211576832.9 2022-12-09
CN202211576832.9A CN115578393B (zh) 2022-12-09 2022-12-09 Keypoint detection method, training method, apparatus, device, medium, and product

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/793,553 Continuation US20240394918A1 (en) 2022-12-09 2024-08-02 Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product

Publications (1)

Publication Number Publication Date
WO2024120096A1 (zh) 2024-06-13

Family

ID=84590570

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/129915 Ceased WO2024120096A1 (zh) 2022-12-09 2023-11-06 Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product

Country Status (5)

Country Link
US (1) US20240394918A1 (zh)
EP (1) EP4567724A4 (zh)
JP (1) JP2025534442A (zh)
CN (1) CN115578393B (zh)
WO (1) WO2024120096A1 (zh)

Also Published As

Publication number Publication date
EP4567724A4 (en) 2025-12-10
EP4567724A1 (en) 2025-06-11
CN115578393A (zh) 2023-01-06
US20240394918A1 (en) 2024-11-28
JP2025534442A (ja) 2025-10-15
CN115578393B (zh) 2023-03-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 23899677; Country of ref document: EP; Kind code of ref document: A1
WWE Wipo information: entry into national phase. Ref document number: 2023899677; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2023899677; Country of ref document: EP; Effective date: 20250303
ENP Entry into the national phase. Ref document number: 2025519628; Country of ref document: JP; Kind code of ref document: A
WWE Wipo information: entry into national phase. Ref document number: 2025519628; Country of ref document: JP
WWP Wipo information: published in national office. Ref document number: 2023899677; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE