WO2024120096A1 - Key point detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product
- Publication number
- WO2024120096A1 (PCT/CN2023/129915)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- features
- detected
- vertex
- vertices
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three-dimensional [3D] modelling for computer graphics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three-dimensional [3D] modelling for computer graphics
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10012—Stereo images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
Definitions
- The present application relates to the field of artificial intelligence technology, and in particular to a key point detection method, a training method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
- Key point detection for 3D human face characters generally falls into two categories.
- The first category is based on traditional geometric analysis methods.
- The second category is based on deep learning methods.
- Key point positioning based on geometric analysis depends heavily on manually set rules and is difficult to apply to head models of different shapes, so its robustness is poor. Methods in the second category generally first render the 3D head model into a 2D image and then use a 2D convolutional neural network to extract features and detect the corresponding key points, which inevitably loses 3D geometric information. As a result, the accuracy of key point detection for 3D human face characters in the related art is low.
- The embodiments of the present application provide a key point detection method, a three-dimensional network model training method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the accuracy of key point detection through a three-dimensional network model.
- An embodiment of the present application provides a key point detection method, which includes:
- Feature splicing is performed based on the vertex features, the global features, and the local features, key points of the object to be detected are detected, and the positions of the key points on the object to be detected are obtained.
- The present application provides a key point detection device, the device comprising:
- An acquisition module configured to obtain a three-dimensional mesh for representing the object to be detected, and determine vertices of the three-dimensional mesh and connection relationships between the vertices;
- A first feature extraction module configured to extract features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
- A second feature extraction module configured to perform global feature extraction on the object to be detected based on the vertex features to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain local features of the object to be detected;
- An output module configured to detect the key points of the object to be detected based on the vertex features, the global features, and the local features, and obtain the positions of the key points on the object to be detected.
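The data flow through the four modules listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the network described in this application: the `tanh` projection, max pooling for the global feature, neighbor averaging for the local feature, the linear scoring layer, and all names (`detect_keypoints`, `w_vertex`, `w_out`) are hypothetical stand-ins, and the weights are random rather than trained.

```python
import numpy as np

def detect_keypoints(vertices, neighbors, w_vertex, w_out):
    # Per-vertex features from coordinates (stand-in for the first extraction module).
    vertex_feat = np.tanh(vertices @ w_vertex)                              # (N, C)
    # Global feature: max-pool over all vertices, then repeat it for every vertex.
    global_feat = np.tile(vertex_feat.max(axis=0), (len(vertices), 1))      # (N, C)
    # Local feature: mean over each vertex's neighbors, using mesh connectivity.
    local_feat = np.stack([vertex_feat[neighbors[v]].mean(axis=0)
                           for v in range(len(vertices))])                  # (N, C)
    # Feature splicing (concatenation), then one score per vertex per key point.
    fused = np.concatenate([vertex_feat, global_feat, local_feat], axis=1)  # (N, 3C)
    scores = fused @ w_out                                                  # (N, K)
    # Each key point is placed at its highest-scoring vertex.
    return vertices[scores.argmax(axis=0)]                                  # (K, 3)

rng = np.random.default_rng(0)
vertices = rng.normal(size=(4, 3))                       # toy mesh: a tetrahedron
neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
keypoints = detect_keypoints(vertices, neighbors,
                             w_vertex=rng.normal(size=(3, 8)),
                             w_out=rng.normal(size=(24, 2)))
```

With untrained weights the selected vertices are arbitrary; the point of the sketch is only the shape of the computation: per-vertex, global, and local features are concatenated per vertex before the key point positions are read out.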
- An embodiment of the present application provides a training method for a three-dimensional network model, wherein the three-dimensional network model includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the method includes:
- Through the second feature extraction layer, global feature extraction is performed on the object training sample based on the vertex features of the training three-dimensional mesh to obtain global features of the object training sample, and, through the third feature extraction layer, local feature extraction is performed on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices to obtain local features of the object training sample;
- The key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample;
- The target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.
- An embodiment of the present application provides a training device for a three-dimensional network model, wherein the three-dimensional network model comprises at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the device comprises:
- An acquisition module configured to acquire an object training sample carrying a label, wherein the label is configured to indicate the real positions of the key points of the object training sample;
- An acquisition module configured to acquire a training three-dimensional mesh for representing the object training sample, and determine vertices of the training three-dimensional mesh and connection relationships between the vertices;
- A first feature extraction module configured to extract features of the vertices of the object training sample through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh;
- A second feature extraction module configured to perform global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh through the second feature extraction layer to obtain the global features of the object training sample, and perform local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices through the third feature extraction layer to obtain the local features of the object training sample;
- An output module configured to detect key points of the object training sample through the output layer based on the vertex features of the training three-dimensional mesh, the global features of the object training sample, and the local features of the object training sample, and obtain the positions of the key points of the object training sample on the object training sample;
- An updating module configured to obtain the difference between the positions of the key points of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.
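The update step, training on the difference between the predicted output and the label, can be illustrated with plain gradient descent. The sketch below is a toy under explicit assumptions: it fits only a single linear output layer to a per-vertex target using a mean squared error loss, and `train_output_layer`, the learning rate, the step count, and the synthetic data are all hypothetical. The text above does not specify the loss function or the optimizer.

```python
import numpy as np

def train_output_layer(features, target, steps=200, lr=0.05):
    """Fit weights w so that features @ w approximates the labeled target."""
    w = np.zeros(features.shape[1])
    losses = []
    for _ in range(steps):
        diff = features @ w - target                    # prediction minus label
        losses.append(float(np.mean(diff ** 2)))        # mean squared error
        grad = 2.0 * features.T @ diff / len(target)    # gradient of the MSE w.r.t. w
        w -= lr * grad                                  # gradient descent update
    return w, losses

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))                     # synthetic per-vertex features
target = features @ np.array([0.5, -1.0, 0.25, 2.0])    # synthetic label values
w, losses = train_output_layer(features, target)
```

In a full model the same difference would be backpropagated through every feature extraction layer rather than through one linear layer.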
- An embodiment of the present application provides an electronic device, including:
- A memory configured to store computer-executable instructions;
- A processor configured to implement the key point detection method provided in the embodiments of the present application when executing the computer-executable instructions stored in the memory.
- An embodiment of the present application provides an electronic device, including:
- A memory configured to store computer-executable instructions;
- A processor configured to implement the three-dimensional network model training method provided in the embodiments of the present application when executing the computer-executable instructions stored in the memory.
- An embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored.
- When the computer-executable instructions are executed by a processor, the processor performs the key point detection method provided by the embodiments of the present application.
- An embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored.
- When the computer-executable instructions are executed by a processor, the processor performs the training method of the three-dimensional network model provided by the embodiments of the present application.
- An embodiment of the present application provides a computer program product, which includes a computer program or computer-executable instructions stored in a computer-readable storage medium.
- The processor of the electronic device reads the computer program or the computer-executable instructions from the computer-readable storage medium and executes them, so that the electronic device performs the key point detection method provided in the embodiments of the present application.
- An embodiment of the present application provides a computer program product, which includes a computer program or computer-executable instructions stored in a computer-readable storage medium.
- The processor of the electronic device reads the computer program or the computer-executable instructions from the computer-readable storage medium and executes them, so that the electronic device performs the training method of the three-dimensional network model provided in the embodiments of the present application.
- The three-dimensional mesh corresponding to the object to be detected is obtained; then, through a two-way feature extraction layer, the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional mesh and the connection relationships between the vertices; the positions of the key points on the object to be detected are then obtained based on the vertex features, the extracted global features, and the local features.
- In this way, richer feature information about the object to be detected is extracted through multiple feature extraction layers, and the key points of the object to be detected are detected based on this richer information, so that the accuracy of three-dimensional key point detection is significantly improved.
- FIG. 1 is a schematic diagram of the architecture of a key point detection system 100 provided in an embodiment of the present application.
- FIG. 2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- FIG. 3 is a schematic flow chart of a key point detection method provided in an embodiment of the present application.
- FIG. 4 is a schematic diagram of a three-dimensional mesh of a human head provided in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a process for determining local features of each vertex provided in an embodiment of the present application.
- FIG. 6 is a schematic diagram of using an attention mechanism to determine the degree of correlation between a reference vertex and other vertices provided in an embodiment of the present application.
- FIG. 7 is a schematic diagram of the positions of key points on an object to be detected provided in an embodiment of the present application.
- FIG. 8 is a schematic diagram of the structure of a three-dimensional network model provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of the structure of a third feature extraction layer provided in an embodiment of the present application.
- FIG. 10 is a schematic diagram of the structure of a three-dimensional network model provided in an embodiment of the present application.
- FIG. 11 is a flow chart of a training process of a three-dimensional network model provided in an embodiment of the present application.
- FIG. 12 is a simplified schematic diagram of a three-dimensional mesh surface provided in an embodiment of the present application.
- FIG. 13 is a schematic diagram of a three-dimensional mesh densification process provided in an embodiment of the present application.
- FIG. 14 is a schematic flow chart of a key point detection method provided in an embodiment of the present application.
- FIG. 15 is a schematic diagram of the structure of a graph convolutional neural network provided in an embodiment of the present application.
- FIG. 16 is a comparison diagram of the geodesic distance and the Euclidean distance provided in an embodiment of the present application.
- The terms "first/second/third" are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that "first/second/third" can be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
- A three-dimensional mesh is a manifold surface with a topological structure, for example a spherical surface divided into a combination of multiple vertices and multiple edges. In this application, it can be a three-dimensional face mesh.
- The three-dimensional mesh is a graph structure.
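Because a mesh is a graph, its connectivity can be derived directly from its faces. The following sketch assumes a triangle mesh stored as a list of vertex-index triples, a common representation that this application does not mandate; the helper name `mesh_to_graph` is hypothetical.

```python
from collections import defaultdict

def mesh_to_graph(faces):
    """Return (edges, neighbors) for a triangle mesh given as vertex-index triples."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sorted(edges), {v: sorted(n) for v, n in neighbors.items()}

# A tetrahedron: 4 vertices, 4 triangular faces.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
edges, neighbors = mesh_to_graph(faces)
```

The `neighbors` mapping is the "connection relationship between vertices" in adjacency-list form: each vertex index maps to the indices of the vertices it shares an edge with.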
- A client, also known as the user end, is a program that corresponds to a server and provides local services to users. Except for some applications that run only locally, a client is generally installed on an ordinary client machine and needs to cooperate with the server to run; that is, it requires corresponding servers and service programs in the network to provide the corresponding services. A specific communication connection therefore needs to be established between the client and the server to ensure that the application runs normally.
- 3D facial key point detection refers to detecting the 3D coordinates of a series of facial key points with preset semantics, given any 3D face mesh model. There is no limit on the number of vertices and facets of the 3D face model.
- Key points with preset semantics include positions such as the corners of the eyes, the corners of the mouth, the tip of the nose, and the facial contour. The semantics and number of key points are determined by the specific task.
- A graph neural network (GNN) is a type of artificial neural network used to process data that can be represented as graphs. Compared with traditional two-dimensional convolutional neural networks, which act on two-dimensional images, graph neural networks extend their scope to graph data, which can represent three-dimensional meshes.
- The key design element of graph neural networks is the use of pairwise message passing, so that graph nodes can be iteratively updated by exchanging information with their neighbors.
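One round of pairwise message passing can be sketched as follows. This is a deliberately simple variant, assuming mean aggregation over each node and its neighbors with no learned weights; the function name `message_passing_step` and the toy path graph are illustrative only.

```python
import numpy as np

def message_passing_step(features, neighbors):
    """Replace each node's feature by the mean of itself and its neighbors."""
    updated = np.empty_like(features)
    for v in range(features.shape[0]):
        idx = [v] + list(neighbors[v])          # the node plus its adjacent nodes
        updated[v] = features[idx].mean(axis=0)
    return updated

# Path graph 0 - 1 - 2 with one scalar feature per node.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = np.array([[0.0], [3.0], [6.0]])
once = message_passing_step(features, neighbors)
```

After one step the end nodes move toward the middle value (0 becomes 1.5, 6 becomes 4.5), which shows how repeated rounds spread information along the edges of the graph.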
- Three-dimensional heatmap regression means that the graph neural network uses a heatmap as its output layer, and a regression loss is formed between this output and the standard heatmap.
- The neural network is trained through forward propagation and gradient backpropagation so that its output fits the label, and the coordinates of the key points are finally calculated from the heatmap.
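The three pieces of heatmap regression described above (a standard heatmap, a regression loss, and decoding coordinates from the heatmap) can be sketched as follows. The Gaussian shape, the use of Euclidean distance, the `sigma` value, the mean squared error loss, and the arg-max decoding are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def gaussian_heatmap(vertices, keypoint, sigma=0.5):
    """Target heatmap: one value per vertex, peaking at the labeled key point."""
    d2 = np.sum((vertices - keypoint) ** 2, axis=1)   # squared Euclidean distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

def heatmap_loss(predicted, target):
    """Mean squared error between predicted and target per-vertex heatmaps."""
    return float(np.mean((predicted - target) ** 2))

def decode_keypoint(vertices, heatmap):
    """Return the coordinates of the vertex with the highest heatmap value."""
    return vertices[int(np.argmax(heatmap))]

vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
target = gaussian_heatmap(vertices, keypoint=vertices[2])
```

Decoding the target heatmap itself returns the labeled vertex, and the loss of a perfect prediction is zero; a trained network is pushed toward that state.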
- A 3D scanner is a scientific instrument used to detect and analyze the shape (geometry) and appearance data (such as color and surface albedo) of objects or environments in the real world.
- The collected data is usually used for 3D reconstruction calculations to create digital models of actual objects in the virtual world.
- These models have a wide range of uses, such as industrial design, defect detection, reverse engineering, robot guidance, topographic measurement, medical information, bioinformatics, and criminal identification.
- A multi-layer perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors.
- An MLP can be viewed as a directed graph consisting of multiple node layers, each of which is fully connected to the next layer. Except for the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function.
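A forward pass through such a network is short enough to write out. The sketch below assumes ReLU as the nonlinearity and, as a common simplification, leaves the last layer linear; the layer widths and the name `mlp_forward` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Apply fully connected layers; ReLU after every layer except the last."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b                      # fully connected layer
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)         # nonlinear activation (ReLU)
    return x

rng = np.random.default_rng(0)
sizes = [3, 16, 8]  # hypothetical widths: 3-D coordinates in, 8-D features out
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
vertex_features = mlp_forward(rng.normal(size=(5, 3)), weights, biases)
```

Applied row by row to vertex coordinates, an MLP of this kind turns each vertex into a feature vector independently of the other vertices.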
- A convolutional neural network (CNN) is a feedforward neural network generally composed of one or more convolutional layers (network layers that use the mathematical operation of convolution) and a fully connected layer at the end.
- The neurons inside the network respond to local regions of the input image, and such networks generally perform very well in visual image processing.
- Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications span all areas of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and self-learning.
- Point cloud data refers to a massive collection of points on the surface features of a target, which is generally obtained through laser measurement or photogrammetry.
- Point cloud data obtained by laser measurement includes three-dimensional coordinates and laser reflection intensity. This type of point cloud data usually determines the state of an object by echo characteristics and reflection intensity; point cloud data obtained by photogrammetry usually includes three-dimensional coordinates and color information.
- Graph Attention Network (GAT).
- Artificial intelligence technology has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, drones, robots, smart medical care, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.
- The solution provided in the embodiments of the present application involves artificial intelligence technologies such as three-dimensional network models, and can also be applied to fields such as cloud technology and the Internet of Vehicles, as explained through the following embodiments.
- FIG. 1 is a schematic diagram of the architecture of a key point detection system 100 provided in an embodiment of the present application.
- In one application scenario of key point detection, when key point detection is performed on a face, the face is first scanned in three dimensions by a three-dimensional scanner, and the key point positions on the face are then detected based on the three-dimensional scanning data.
- A terminal (terminal 400 is shown as an example) is connected to a server 200 via a network 300.
- The network 300 can be a wide area network, a local area network, or a combination of the two.
- The terminal 400 is configured for a user to use a client 401, which is displayed on a display interface (display interface 401-1 is shown as an example).
- The terminal 400 and the server 200 are connected to each other via a wired or wireless network.
- The server 200 is configured to receive three-dimensional scanning data; based on the three-dimensional scanning data, obtain a three-dimensional mesh for representing the object to be detected, and determine the vertices of the three-dimensional mesh and the connection relationships between the vertices; perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh; based on the vertex features, perform global feature extraction on the object to be detected to obtain global features of the object to be detected, and, based on the vertex features and the connection relationships between the vertices, perform local feature extraction on the object to be detected to obtain local features of the object to be detected; based on the vertex features, the global features, and the local features, detect key points of the object to be detected to obtain the positions of the key points on the object to be detected; and send the positions of the key points on the object to be detected to the terminal 400.
- The terminal 400 is also configured to display the positions of the key points on the object to be detected on a display interface.
- The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
- The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a set-top box, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, or a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable gaming device, a smart speaker, or a smart watch).
- The terminal device and the server may be directly or indirectly connected via wired or wireless communication, which is not limited in the embodiments of the present application.
- FIG. 2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- The electronic device may be the server 200 or the terminal 400 shown in FIG. 1.
- The electronic device shown in FIG. 2 includes at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430.
- The various components in the terminal 400 are coupled together via a bus system 440.
- The bus system 440 is configured to achieve connection and communication between these components.
- The bus system 440 also includes a power bus, a control bus, and a status signal bus.
- The various buses are all labeled as bus system 440 in FIG. 2.
- The processor 410 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor can be a microprocessor or any conventional processor.
- The user interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
- The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
- The memory 450 may be removable, non-removable, or a combination thereof.
- Exemplary hardware devices include solid-state memory, hard drives, and optical drives.
- The memory 450 may optionally include one or more storage devices that are physically remote from the processor 410.
- The memory 450 can store data to support various operations, examples of which include programs, modules, and data structures, or a subset or superset thereof, as exemplarily described below.
- A network communication module 452 configured to reach other electronic devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wi-Fi, and Universal Serial Bus (USB);
- A presentation module 453 configured to enable presentation of information via one or more output devices 431 (e.g., display screen, speaker) associated with the user interface 430 (e.g., a user interface configured to operate peripheral devices and display content and information);
- An input processing module 454 configured to detect one or more user inputs or interactions from the input devices 432 and to translate the detected inputs or interactions.
- FIG. 2 shows a key point detection device 455 stored in a memory 450, which can be software in the form of a program and a plug-in, including the following software modules: an acquisition module 4551, a first feature extraction module 4552, a second feature extraction module 4553, and an output module 4554.
- These modules are logical, and therefore can be arbitrarily combined or further split according to the functions implemented. The functions of each module will be described below.
- The device provided in the embodiments of the present application can be implemented in hardware.
- The key point detection device provided in the embodiments of the present application can be a processor in the form of a hardware decoding processor, which is programmed to execute the key point detection method provided in the embodiments of the present application.
- The processor in the form of a hardware decoding processor can adopt one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic components.
- The terminal or server can implement the key point detection method provided in the embodiments of the present application by running a computer program.
- The computer program can be a native program or software module in the operating system; it can be a native application (APP), that is, a program that needs to be installed in the operating system to run, such as an instant messaging APP or a web browser APP; it can be a mini program, that is, a program that only needs to be downloaded into a browser environment to run; or it can be a mini program that can be embedded in any APP.
- In general, the above-mentioned computer program can be an application, module, or plug-in in any form.
- FIG. 3 is a flow chart of the key point detection method provided in the embodiment of the present application. Below, the steps shown will be described in conjunction with FIG. 3.
- Step 101 The server obtains a three-dimensional grid for representing the object to be detected, and determines the vertices of the three-dimensional grid and the connection relationship between the vertices.
- the three-dimensional grid used to characterize the object to be detected can be obtained by directly receiving the three-dimensional grid of the object to be detected sent by other devices, or it can be constructed from the point cloud data (i.e., three-dimensional scanning data) corresponding to the object to be detected.
- the point cloud data is configured as a massive point set indicating the surface features of the object to be detected, which can generally be obtained by laser measurement or photogrammetry. Specifically, first obtain the point cloud data corresponding to the object to be detected, and then obtain the three-dimensional grid used to characterize the object to be detected based on the point cloud data, that is, construct the three-dimensional grid corresponding to the object to be detected.
- the point cloud data can be pre-stored locally in the terminal, or obtained from the outside world (such as the Internet), or collected in real time, for example, collected in real time by a three-dimensional scanning device such as a three-dimensional scanner.
- the process of constructing a three-dimensional grid corresponding to the object to be detected specifically includes scanning the object to be detected by the three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected; and constructing a three-dimensional grid corresponding to the object to be detected based on the point cloud data.
- FIG. 4 is a schematic diagram of a three-dimensional grid of a person's head provided in an embodiment of the present application.
- a three-dimensional scan is performed on the person's head by a three-dimensional scanner to obtain point cloud data corresponding to the head, thereby constructing a three-dimensional grid corresponding to the head based on the point cloud data.
- the process of constructing a three-dimensional grid corresponding to the object to be detected based on point cloud data can be as follows. First, the point cloud data is preprocessed to obtain target point cloud data; the preprocessing includes operations such as filtering, denoising, and point cloud registration, wherein filtering can remove noise points, denoising can further reduce noise and invalid points, and point cloud registration can align the point cloud data to the same coordinate system. Then, mesh reconstruction is performed on the target point cloud data to obtain a three-dimensional grid, wherein mesh reconstruction is the process of converting discrete target point cloud data into a three-dimensional grid. Commonly used mesh reconstruction algorithms include grid-based methods, voxel-based methods, and implicit function-based methods: the grid-based method converts the target point cloud data into a triangular grid, the voxel-based method converts the target point cloud data into a voxel grid, and the implicit function-based method uses a function to represent the three-dimensional grid.
- the connection relationship between the vertices of the three-dimensional grid can be represented by a vertex connection relationship matrix, which is used to indicate whether there is an association between the vertices.
- the size of the matrix is N*N, and each of its values is 0 or 1, where N is the number of vertices.
- for example, there is a connection relationship between the vertices of the three-dimensional mesh used to indicate the eye position on the face, while there is no connection relationship between the vertices used to indicate the eye position and the vertices used to indicate the chin position.
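As an illustration only, the N*N connection relationship matrix described above can be sketched as follows. The sketch assumes a triangular mesh in which two vertices are treated as connected when they share an edge of a triangle; the source only states that the matrix indicates whether vertices are associated, so this rule, the function name `adjacency_from_faces` and the toy mesh are all hypothetical.

```python
import numpy as np

def adjacency_from_faces(faces: np.ndarray, num_vertices: int) -> np.ndarray:
    """Build an N x N matrix of 0/1 values from triangular faces."""
    adj = np.zeros((num_vertices, num_vertices), dtype=np.uint8)
    for a, b, c in faces:
        # Assumption: each triangle contributes three undirected connections.
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = 1
            adj[j, i] = 1
    return adj

# Hypothetical toy mesh: two triangles sharing the edge between vertices 1 and 2.
faces = np.array([[0, 1, 2], [1, 3, 2]])
adj = adjacency_from_faces(faces, num_vertices=4)
print(adj)
```

In this toy mesh, vertices 0 and 3 do not share an edge, so the corresponding matrix entries stay 0.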
- Step 102 extract features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh.
- feature extraction is performed on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh, wherein the vertex features include the positions of the corresponding vertices and the information of the corresponding positions on the face indicated by the corresponding vertices.
- the vertex features here can have a size of N*(6+X), wherein N represents the number of vertices of the three-dimensional mesh; 6 represents the dimensions occupied by the vertex coordinates and the normal vectors, i.e., the three coordinate dimensions (x, y, z) of the vertex and the three components of its normal vector; and X represents the dimensions of other characteristics of the vertices of the three-dimensional mesh, i.e., the information of the corresponding positions on the face indicated by the corresponding vertices, such as curvature, texture information, etc. It should be noted that these other characteristics can be adjusted according to different data and tasks. Thus, when the present application is applied to a model, adding these other characteristics can accelerate learning during the training phase of the model.
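A minimal sketch of assembling an N*(6+X) vertex feature array is given below. It assumes that the normal of a vertex is the normalized sum of the (area-weighted) normals of the triangles that contain it, which is one common choice and is not specified in the source. The function name `build_vertex_features` and the placeholder curvature column are hypothetical.

```python
import numpy as np

def build_vertex_features(vertices, faces, extra=None):
    """Return an N x (6 + X) array: xyz, unit normal, optional extra columns."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    tri = vertices[faces]                                   # (M, 3, 3)
    # Cross product of two triangle edges: direction = face normal, length = 2 * area.
    face_normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = np.zeros_like(vertices)
    for corner in range(3):
        # Accumulate each face normal onto the three vertices of that face.
        np.add.at(normals, faces[:, corner], face_normals)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    normals = normals / np.maximum(lengths, 1e-12)
    parts = [vertices, normals]
    if extra is not None:
        parts.append(np.asarray(extra, dtype=float).reshape(len(vertices), -1))
    return np.concatenate(parts, axis=1)

# Hypothetical flat square made of two triangles in the z = 0 plane.
vertices = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
faces = [[0, 1, 2], [1, 3, 2]]
curvature = np.zeros(4)        # placeholder for the X extra dimensions (here X = 1)
features = build_vertex_features(vertices, faces, extra=curvature)
print(features.shape)
```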
- Step 103 perform global feature extraction on the object to be detected based on the vertex features to obtain the global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected.
- the process of performing global feature extraction on the object to be detected based on vertex features to obtain the global features of the object to be detected may be, first, performing feature extraction on the object to be detected based on vertex features, and performing maximum pooling processing on the extracted features to obtain maximum pooling features, so that all vertices share the maximum pooling features, and using the maximum pooling features as the global features of the object to be detected.
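The maximum pooling step described above can be sketched as follows. The sketch assumes that the per-vertex feature extraction is a single linear map followed by ReLU, which the source does not specify; `extract_global_feature`, `weight` and `bias` are hypothetical names.

```python
import numpy as np

def extract_global_feature(vertex_features, weight, bias):
    """Per-vertex feature extraction, then max pooling shared by all vertices."""
    # Assumption: one linear layer with ReLU stands in for the feature extraction.
    hidden = np.maximum(vertex_features @ weight + bias, 0.0)   # (N, D)
    pooled = hidden.max(axis=0)                                 # (D,) maximum pooling feature
    # Every vertex shares the same pooled vector as its global feature.
    return np.tile(pooled, (vertex_features.shape[0], 1))       # (N, D)

rng = np.random.default_rng(0)
vertex_features = rng.normal(size=(5, 6))
weight = rng.normal(size=(6, 8))
bias = np.zeros(8)
global_features = extract_global_feature(vertex_features, weight, bias)
print(global_features.shape)
```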
- the process of performing local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected may be: determining the local features of each vertex based on the vertex features and the connection relationship between the vertices; and determining the local features of the object to be detected based on the local features of each vertex.
- the global features here are used to indicate the overall features of the object to be detected, such as the color features, texture features and shape features of the object to be detected, while the local features are used to indicate the detailed features of the object to be detected, that is, the features extracted from the local area of the object to be detected, such as the features extracted from the edges, corners, points, lines, curves and special attribute areas of the object to be detected.
- the global features can be the size, shape and position of the facial features
- the local features can be the distribution of facial muscles and the shape changes of the facial features under different expressions.
- the global features are low-level visual features at the pixel level; they have good invariance, are simple to calculate and are intuitive to represent, but are not suitable for cases where objects overlap or are occluded, while the local features are abundant in the image and have little correlation with one another.
- therefore, when the object is occluded, the disappearance of some features will not affect the detection and matching of other features.
- by extracting both the global features and the local features of the object to be detected, more abundant and accurate features of the object to be detected are obtained, thereby improving the accuracy of the key point detection results.
- FIG. 5 is a schematic diagram of the flow chart of determining the local features of each vertex provided by an embodiment of the present application. Based on FIG. 5, the process of determining the local features of each vertex based on the vertex features and the connection relationship between the vertices is implemented by steps 1031 to 1033. In combination with FIG. 5, the following processing is performed for each vertex:
- Step 1031 determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of other vertices based on the vertex features of each vertex in the three-dimensional mesh; wherein the other vertex is any vertex except the reference vertex.
- the number of vertices in the three-dimensional mesh is N
- the feature of each vertex is h
- vertex i is taken as the reference vertex
- $h_i$ is a vector of size F, that is, the feature of reference vertex i
- vertex j is another vertex
- $h_j$ is a vector of size F, that is, the feature of other vertex j
- Step 1032 based on the vertex feature of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices, determine the correlation value between the reference vertex and other vertices; wherein the correlation value is used to indicate the degree of correlation between the reference vertex and other vertices.
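- the degree of correlation between the reference vertex and another vertex can be expressed as $e_{ij} = \mathrm{attention}(W h_i, W h_j)$, where
- $W$ is a weight matrix of size $F \times F$
- $h_i$ is the vertex feature of reference vertex i
- $h_j$ is the vertex feature of other vertex j
- attention indicates processing with an attention mechanism
- $e_{ij}$ indicates the degree of correlation between the reference vertex i and the other vertex j.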
- the process of determining the correlation value between a reference vertex and other vertices based on vertex features of the reference vertex, vertex features of other vertices, and connection relationships between vertices may be as follows: determining the connected reference vertex and other vertices based on the connection relationships between the vertices; performing similarity matching on the reference vertex and corresponding other vertices based on the vertex features of the connected reference vertex and vertex features of other vertices to obtain the similarity between the reference vertex and the corresponding other vertices (wherein a corresponding similarity is obtained for each of the other vertices); and determining the similarity as the degree of correlation between the reference vertex and the corresponding other vertices.
- the correlation degree is normalized to obtain the correlation value between the reference vertex and other vertices, that is, $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp(e_{ij})}{\sum_{q \in N_i} \exp(e_{iq})}$, where
- $\mathrm{softmax}_j$ indicates the use of normalization processing
- $\alpha_{ij}$ indicates the correlation value between vertices i and j
- exp indicates an exponential function with the natural constant e as the base
- $N_i$ indicates the neighborhood composed of all other vertices that have a connection relationship with the reference vertex i
- q represents any vertex in the neighborhood.
- Figure 6 is a schematic diagram of using the attention mechanism to determine the correlation degree between a reference vertex and other vertices provided in an embodiment of the present application.
- $\alpha_{ij}$ indicated by 601 indicates the correlation value between vertices i and j
- $W h_i$ in the dotted box 602 indicates the vertex feature corresponding to the reference vertex i
- $W h_j$ in the dotted box 603 indicates the vertex feature corresponding to other vertex j.
- $a$ is a weight vector.
- the correlation degree is subjected to $\mathrm{softmax}_j$ processing, i.e., normalization processing, to obtain the correlation value between the reference vertex and other vertices.
- the attention mechanism is used here to determine the correlation between the reference vertex and other vertices. Specifically, the features $W h_i$ and $W h_j$ of vertices i and j are concatenated, and then the inner product is calculated with a weight vector $a$ of dimension 2F, so as to obtain the correlation value between the reference vertex and other vertices through the activation function, that is, $\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\|\, W h_j]\right)\right)}{\sum_{q \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\|\, W h_q]\right)\right)}$, where
- $N_i$ indicates the neighborhood composed of all other vertices that are connected to the reference vertex i
- q represents any vertex in the neighborhood
- $[W h_i \,\|\, W h_j]$ indicates the concatenated feature obtained by concatenating the features $W h_i$ and $W h_j$ of vertices i and j
- exp indicates the exponential function with the natural constant e as the base
- LeakyReLU is the nonlinear activation function
- $a$ is a weight vector of size 2F.
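The attention computation described above (concatenate $W h_i$ and $W h_j$, take the inner product with $a$, apply LeakyReLU, then normalize over the connected vertices) can be sketched as follows. The negative slope of 0.2, the random parameters and the function names are assumptions for illustration. The sketch also assumes that every vertex has at least one connected vertex and, following the wording "other vertices", it does not include a vertex in its own neighborhood.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Assumption: slope 0.2 is illustrative; the source does not give a value.
    return np.where(x > 0, x, slope * x)

def attention_coefficients(h, adj, W, a):
    """Return alpha with alpha[i, j] > 0 only where adj[i, j] == 1."""
    wh = h @ W.T                                  # (N, F) rows are W h_i
    f = wh.shape[1]
    # a^T [W h_i || W h_j] equals a[:F] . W h_i + a[F:] . W h_j.
    e = leaky_relu((wh @ a[:f])[:, None] + (wh @ a[f:])[None, :])
    e = np.where(adj > 0, e, -np.inf)             # restrict to the neighborhood N_i
    e = e - e.max(axis=1, keepdims=True)          # numerical stability
    exp_e = np.exp(e)
    return exp_e / exp_e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]])
h = rng.normal(size=(4, 3))       # N = 4 vertices, F = 3
W = rng.normal(size=(3, 3))       # F x F weight matrix
a = rng.normal(size=6)            # weight vector of size 2F
alpha = attention_coefficients(h, adj, W, a)
print(alpha.round(3))
```

Splitting $a$ into two halves avoids building all N*N concatenated vectors explicitly while giving the same inner product.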
- the degree of correlation between the reference vertex and other vertices can also be directly calculated based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices; among them, there are many methods for calculating the degree of correlation, such as the Pearson correlation coefficient (Pearson), the Spearman's rank correlation coefficient (Spearman's rank correlation coefficient), etc.
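As a sketch of the direct alternative mentioned above, the Pearson correlation coefficient between the feature vectors of connected vertices can be computed with `numpy.corrcoef`. Masking unconnected pairs to 0 is an assumption about how the connection relationship would be applied, and the sketch assumes no feature vector is constant (otherwise the coefficient is undefined).

```python
import numpy as np

def pearson_correlation_degree(h, adj):
    """Pearson correlation between vertex feature vectors, kept only for connected pairs."""
    corr = np.corrcoef(h)                 # (N, N); each row of h is one variable
    return np.where(adj > 0, corr, 0.0)   # assumption: unconnected pairs get degree 0

rng = np.random.default_rng(1)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]])
h = rng.normal(size=(4, 5))
degree = pearson_correlation_degree(h, adj)
print(degree.round(3))
```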
- Step 1033 Determine the local features of the reference vertex based on the correlation value and vertex features of other vertices.
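- the local feature of the reference vertex can be expressed as $h_i' = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W h_j\right)$, where
- $\sigma$ is the activation function
- $\alpha_{ij}$ is the correlation value between the reference vertex i and other vertices j
- $W h_j$ indicates the vertex features corresponding to other vertices j
- $h_i'$ is the local feature corresponding to the reference vertex.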
- the process of determining the local features of the reference vertex based on the correlation value and the vertex features of the other vertices may be: for each other vertex, multiplying the correlation value by the vertex feature of the corresponding other vertex to obtain the product result of that vertex; summing the product results of the other vertices to obtain the summation result; and determining the local feature corresponding to the reference vertex based on the summation result, that is, $h_i' = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W h_j\right)$, where
- $\sigma$ is the activation function
- $\alpha_{ij}$ is the correlation value between the reference vertex i and other vertices j
- $W h_j$ indicates the vertex features corresponding to other vertices j
- $N_i$ indicates the neighborhood composed of all other vertices that have a connection relationship with the reference vertex i.
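The multiply-and-sum step can be sketched as a single matrix product, since row i of `alpha @ wh` is the sum over j of the correlation value times $W h_j$. The choice of `tanh` as the activation function is an assumption (the source does not name it), and the uniform correlation values below are used only to make the result easy to check by hand.

```python
import numpy as np

def aggregate_local_features(h, alpha, W, activation=np.tanh):
    """Local feature of every vertex: activation(sum_j alpha[i, j] * W h_j)."""
    wh = h @ W.T                    # (N, F) rows are W h_j
    return activation(alpha @ wh)   # weighted sum over connected vertices

rng = np.random.default_rng(2)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
# Hypothetical correlation values: uniform over the connected vertices.
alpha = adj / adj.sum(axis=1, keepdims=True)
h = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 3))
local_features = aggregate_local_features(h, alpha, W)
print(local_features.shape)
```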
- the process of determining the local features of the object to be detected based on the local features of each vertex is, specifically, based on the local features of each vertex, performing feature fusion on the local features of each vertex to obtain fused features; and using the fused features as the local features of the object to be detected.
- Step 104 based on vertex features, global features and local features, key points of the object to be detected are detected to obtain positions of the key points of the object to be detected on the object to be detected.
- the process of detecting the key points of the object to be detected based on the vertex features, global features and local features to obtain the positions of the key points of the object to be detected on the object to be detected may be: performing feature splicing on the vertex features, global features and local features to obtain the splicing features of the object to be detected; and detecting the key points of the object to be detected based on the splicing features to obtain the positions of the key points of the object to be detected on the object to be detected.
- the splicing features contain feature information of the vertex features, global features, and local features of the object to be detected.
- detecting the key points of the object to be detected based on the splicing features therefore combines the feature information of the vertex features, global features, and local features. That is, the key points of the object to be detected are detected through richer feature information, thereby improving the accuracy of the key point detection results.
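Feature splicing can be illustrated as a per-vertex concatenation along the feature axis. The feature widths below (6, 8 and 3) are arbitrary and the function name `splice_features` is hypothetical.

```python
import numpy as np

def splice_features(vertex_features, global_features, local_features):
    """Concatenate the three per-vertex feature arrays along the feature axis."""
    return np.concatenate([vertex_features, global_features, local_features], axis=1)

rng = np.random.default_rng(3)
num_vertices = 4
vertex_features = rng.normal(size=(num_vertices, 6))
global_features = np.tile(rng.normal(size=8), (num_vertices, 1))  # shared by all vertices
local_features = rng.normal(size=(num_vertices, 3))
spliced = splice_features(vertex_features, global_features, local_features)
print(spliced.shape)
```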
- the three-dimensional heat map in the present application is a statistical chart that displays multiple data values by coloring color blocks, that is, each value is displayed according to a specified color mapping rule; for example, larger values are represented by dark colors and smaller values by light colors, or larger values are represented by warm tones and smaller values by cold tones. In this way, by outputting a three-dimensional heat map, the probability that each vertex is a key point is displayed at the same time, so as to better ensure the local accuracy of the detection results.
- Figure 7 is a schematic diagram of the positions of key points on the object to be detected provided in an embodiment of the present application.
- the black points in Figure 7 are key points.
- the positions of the key points shown in Figure 7 can be the positions of the facial features in the human face
- the black points in the dotted box 701 are key points indicating the position of the forehead in the human face
- the black points in the dotted boxes 702 and 703 are key points indicating the position of the eyes in the human face
- the black points indicated by 704 and 705 are key points indicating the position of the ears in the human face
- the black points in the dotted box 706 are key points indicating the position of the nose in the human face
- the black points in the dotted box 707 are key points indicating the position of the mouth in the human face
- the black points indicated by 708 and 709 are key points indicating the position of the cheeks in the human face
- the black points in the dotted box 710 are key points.
- the positions of the facial features of the object to be detected are detected, and the probability of the key points at each vertex in the three-dimensional grid is obtained, that is, the probability that each vertex in the three-dimensional grid is the key point corresponding to the position of each facial feature. Based on these probabilities, a three-dimensional heat map of the corresponding three-dimensional grid is generated. Then, based on the three-dimensional heat map, the positions of the key points of the object to be detected on the object to be detected are determined: for each key point corresponding to the position of a facial feature, the vertex with the largest probability is selected and determined as the corresponding key point, so that the positions of the facial features are determined based on the obtained key points.
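The selection of the vertex with the largest probability for each key point can be sketched as follows. The sketch assumes the model outputs one score per vertex and per key point and that a softmax over the vertices turns the scores into the heat map probabilities; this normalization and all names are assumptions.

```python
import numpy as np

def scores_to_heatmap(scores):
    """Softmax over vertices: column k holds the probability that each vertex is key point k."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=0, keepdims=True)

def keypoints_from_heatmap(heatmap, vertices):
    """For each key point, pick the vertex with the largest probability."""
    index = heatmap.argmax(axis=0)      # (K,) vertex index per key point
    return index, vertices[index]       # indices and their 3D positions

vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0]])
# Hypothetical scores for N = 4 vertices and K = 2 key points.
scores = np.array([[0.1, 2.0],
                   [3.0, 0.0],
                   [0.2, 0.5],
                   [0.0, 1.0]])
heatmap = scores_to_heatmap(scores)
index, positions = keypoints_from_heatmap(heatmap, vertices)
print(index, positions)
```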
- the key point detection method here can also be applied to a three-dimensional network model, which includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and an output layer, referring to Figure 8, which is a structural schematic diagram of a three-dimensional network model provided in an embodiment of the present application.
- the process of extracting features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh may be: through the first feature extraction layer, extracting features from the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh.
- the process of extracting global features from the object to be detected based on the vertex features to obtain global features of the object to be detected, and extracting local features from the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected, may be: through the second feature extraction layer, extracting global features from the object to be detected based on the vertex features to obtain global features of the object to be detected; and through the third feature extraction layer, extracting local features from the object to be detected based on the vertex features and the connection relationship between the vertices to obtain local features of the object to be detected.
- the process of detecting the key points of the object to be detected based on vertex features, global features and local features to obtain the positions of the key points of the object to be detected on the object to be detected may be: through the output layer, combining vertex features, global features and local features to detect the key points of the object to be detected to obtain the positions of the key points of the object to be detected on the object to be detected.
- the position of the key point on the object to be detected is detected through the three-dimensional network model, thereby improving the accuracy of the detected position.
- the third feature extraction layer here may include at least two third feature extraction sublayers and a feature stitching sublayer.
- Figure 9 is a structural schematic diagram of the third feature extraction layer provided in an embodiment of the present application.
- the process of determining the local features of each vertex based on vertex features and the connection relationship between vertices through the third feature extraction layer may be that, through each third feature extraction sublayer, the following processing is performed on each vertex: the vertex is determined as a reference vertex, and based on the vertex features of each vertex in the three-dimensional mesh, the vertex features of the reference vertex and the vertex features of other vertices are determined; based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between vertices, the correlation value between the reference vertex and other vertices is determined; based on the correlation value and the vertex features of other vertices, the local sub-features of the reference vertex are determined. Then, through the feature stitching sublayer, the local sub-features of the reference vertex obtained by the third feature extraction sublayers are spliced to obtain the local features of the reference vertex, that is, $h_i' = \mathrm{concat}_{k=1}^{K}\, \sigma\left(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\right)$, where
- K is the number of third feature extraction sublayers, and k indexes them
- $N_i$ indicates the neighborhood composed of all other vertices that are connected to the reference vertex i
- $\sigma$ is the activation function
- $\alpha_{ij}^{k}$ is the correlation value between the reference vertex i and other vertices j in the k-th sublayer
- $W^{k} h_j$ indicates the vertex features corresponding to other vertices j in the k-th sublayer
- concat indicates the use of splicing processing.
- the process of determining the correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of other vertices, and the connection relationship between the vertices is the same as the aforementioned process.
- the process of determining the local sub-features of the reference vertex based on the correlation value and the vertex features of other vertices is also the same as the aforementioned process of determining the local features of the reference vertex based on the correlation value and the vertex features of other vertices, which will not be elaborated on here.
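A compact sketch of a third feature extraction layer with several sublayers followed by splicing is shown below. Each sublayer repeats the attention and aggregation steps described earlier with its own parameters. The LeakyReLU slope of 0.2, `tanh` as the activation function, the random parameters and the function names are assumptions, and every vertex is assumed to have at least one connected vertex.

```python
import numpy as np

def extraction_sublayer(h, adj, W, a, slope=0.2):
    """One sublayer: attention over connected vertices, then weighted sum and activation."""
    wh = h @ W.T
    f = wh.shape[1]
    e = (wh @ a[:f])[:, None] + (wh @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)             # LeakyReLU (assumed slope)
    e = np.where(adj > 0, e, -np.inf)             # keep connected vertices only
    e = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)      # correlation values
    return np.tanh(alpha @ wh)                    # local sub-features (assumed activation)

def third_feature_extraction_layer(h, adj, sublayer_params):
    """Splice the local sub-features produced by every sublayer."""
    outputs = [extraction_sublayer(h, adj, W, a) for W, a in sublayer_params]
    return np.concatenate(outputs, axis=1)

rng = np.random.default_rng(4)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]])
h = rng.normal(size=(4, 3))
# Two hypothetical sublayers, each with its own F x F matrix and 2F vector.
sublayer_params = [(rng.normal(size=(3, 3)), rng.normal(size=6)) for _ in range(2)]
local_features = third_feature_extraction_layer(h, adj, sublayer_params)
print(local_features.shape)
```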
- the three-dimensional network model also includes a first feature splicing layer, a second feature splicing layer, and a fourth feature extraction layer.
- Figure 10 is a structural schematic diagram of the three-dimensional network model provided in an embodiment of the present application. Based on Figure 10, the key points of the object to be detected are detected through the output layer in combination with vertex features, global features and local features to obtain the positions of the key points on the object to be detected.
- the process can be as follows: through the first feature splicing layer, vertex features, global features and local features are feature spliced to obtain the splicing features of the object to be detected; through the fourth feature extraction layer, local features are extracted from the object to be detected based on the splicing features to obtain the target local features of the object to be detected; through the second feature splicing layer, the splicing features, global features and target local features are feature spliced to obtain the target splicing features of the object to be detected; through the output layer, based on the target splicing features, the key points of the object to be detected are detected to obtain the positions of the key points of the object to be detected on the object to be detected.
- the three-dimensional network model may also include a fifth feature extraction layer and a third feature splicing layer, so that through the fifth feature extraction layer, based on the target splicing feature, the local feature of the object to be detected is extracted to obtain the second target local feature, and then through the third feature splicing layer, the target splicing feature, the second target local feature and the global feature are spliced to obtain the second target splicing feature, and finally through the output layer, based on the second target splicing feature, the key points of the object to be detected are detected to obtain the position of the key points of the object to be detected on the object to be detected.
- the number of feature extraction layers and feature splicing layers in the three-dimensional network model can be multiple, and the process of obtaining the final splicing features through multiple feature extraction layers and feature splicing layers is the same as described above, and this embodiment of the present application will not be repeated.
- the fourth feature extraction layer and the fifth feature extraction layer have the same layer structure as the third feature extraction layer, and the feature processing process is also the same; while the second feature splicing layer and the third feature splicing layer have the same layer structure as the feature splicing layer, and the feature processing process is also the same.
- through the fourth feature extraction layer, the splicing features are further processed to obtain more accurate target local features; through the second feature splicing layer, the splicing features, the global features and the obtained target local features are spliced, and the key points of the object to be detected are detected based on the target splicing features obtained by feature splicing. Correspondingly, through the fifth feature extraction layer, the target splicing features are further processed to obtain more accurate second target local features; through the third feature splicing layer, the target splicing features, the global features and the obtained second target local features are spliced, and the key points of the object to be detected are detected based on the second target splicing features obtained by feature splicing.
- before detecting the key points of the object to be detected based on the three-dimensional network model, the three-dimensional network model needs to be trained, so that the key points of the object to be detected are detected based on the trained three-dimensional network model.
- Figure 11 is a flow chart of the training process of the three-dimensional network model provided in an embodiment of the present application. Based on Figure 11, the training process of the three-dimensional network model can be implemented through the following steps.
- Step 201 The server obtains an object training sample carrying a label, where the label is used to indicate the actual position of a key point of the object training sample.
- Step 202 obtain a training three-dimensional mesh for representing an object training sample, and determine the vertices of the training three-dimensional mesh and the connection relationship between the vertices.
- the training 3D mesh can also be data enhanced, so as to train the 3D network model through the enhanced training 3D mesh.
- the methods of data enhancement for the training 3D mesh are divided into face simplification and face densification.
- for face simplification, an edge optimization method can be used, that is, the shortest edge between vertices is found each time, and the corresponding two vertices are merged into one vertex. Specifically, the edge between any two connected vertices is obtained, and the edges are compared. Based on the comparison result, the shortest edge is selected as the target edge, then the two vertices corresponding to the target edge are obtained and merged into one vertex, thereby obtaining an enhanced training three-dimensional mesh.
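One simplification step of this kind can be sketched as follows. The source says only that the two vertices of the shortest edge are merged into one; placing the merged vertex at the midpoint of the edge and dropping triangles that become degenerate are assumptions of this sketch, as is the function name `collapse_shortest_edge`.

```python
import numpy as np

def collapse_shortest_edge(vertices, faces):
    """Merge the two end points of the shortest edge into a single vertex."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.sort(pairs, axis=1), axis=0)         # unique undirected edges
    lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    keep, drop = edges[lengths.argmin()]                       # keep < drop (rows are sorted)
    merged = vertices.copy()
    merged[keep] = 0.5 * (vertices[keep] + vertices[drop])     # assumption: merge at midpoint
    merged = np.delete(merged, drop, axis=0)
    remap = np.arange(len(vertices))
    remap[drop] = keep                                         # dropped vertex maps to kept one
    remap[drop + 1:] -= 1                                      # later indices shift down by one
    new_faces = remap[faces]
    valid = ((new_faces[:, 0] != new_faces[:, 1])
             & (new_faces[:, 1] != new_faces[:, 2])
             & (new_faces[:, 2] != new_faces[:, 0]))
    return merged, new_faces[valid]                            # drop degenerate triangles

# Hypothetical mesh whose shortest edge joins vertices 0 and 1.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 0.0]])
faces = np.array([[0, 1, 2], [1, 3, 2]])
simple_vertices, simple_faces = collapse_shortest_edge(vertices, faces)
print(simple_vertices, simple_faces)
```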
- Figure 12 is a schematic diagram of the simplification of three-dimensional mesh patches provided by an embodiment of the present application. Based on Figure 12, there are 10 vertices from v1 to v10.
- for face densification, the barycentric coordinates of the patch with the largest area are calculated first, and then the original patch is divided into three based on the barycentric coordinates. Specifically, at least one patch is obtained, and the areas of the patches are compared. Based on the comparison results, the patch with the largest area is selected from the multiple patches as the target patch; the barycenter of the target patch and the three vertices corresponding to the target patch are determined, and then the target patch is divided into three based on the barycentric coordinates and the three vertices.
- Figure 13 is a schematic diagram of the densification of three-dimensional mesh facets provided in an embodiment of the present application.
- based on Figure 13, there are 9 vertices from A to I.
- 8 triangular facets are formed, namely, the facets between vertices A, B, and C, the facets between vertices A, B, and I, the facets between vertices H, B, and I, the facets between vertices H, B, and G, the facets between vertices F, B, and G, the facets between vertices F, B, and E, the facets between vertices D, B, and E, and the facets between vertices D, B, and C.
- the facets between vertices A, B, and C are the target facets with the largest area.
- the center of gravity of the target facet, namely P, and the corresponding vertices A, B, and C are determined, so that the original target facet is divided into three based on P, A, B, and C, and the enhanced training three-dimensional mesh is obtained.
- the data enhancement process for the training 3D mesh can be ended by presetting the target number of vertices. Specifically, in the data enhancement process for the training 3D mesh, the number of vertices of the enhanced training 3D mesh is obtained, and the number of vertices is compared with the pre-set target number of vertices. Based on the comparison result, the data enhancement for the training 3D mesh is ended.
- when face simplification is performed on the training 3D mesh, the data enhancement for the training 3D mesh is ended when the comparison result indicates that the number of vertices is less than the target number of vertices; when face densification is performed on the training 3D mesh, the data enhancement for the training 3D mesh is ended when the comparison result indicates that the number of vertices is greater than the target number of vertices.
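Face densification with the stopping rule above can be sketched as follows: the face with the largest area is split into three at its barycenter, and the loop ends once the number of vertices is greater than the target number. The function names and the toy mesh are hypothetical.

```python
import numpy as np

def face_areas(vertices, faces):
    tri = vertices[faces]
    return 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)

def split_largest_face(vertices, faces):
    """Insert the barycenter P of the largest face and replace that face by three faces."""
    target = int(face_areas(vertices, faces).argmax())
    a, b, c = faces[target]
    p = len(vertices)                                   # index of the new vertex P
    barycenter = vertices[[a, b, c]].mean(axis=0)
    new_vertices = np.vstack([vertices, barycenter])
    new_faces = np.vstack([np.delete(faces, target, axis=0),
                           [[a, b, p], [b, c, p], [c, a, p]]])
    return new_vertices, new_faces

def densify(vertices, faces, target_vertex_count):
    """Repeat the split until the vertex count is greater than the target count."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    while len(vertices) <= target_vertex_count:
        vertices, faces = split_largest_face(vertices, faces)
    return vertices, faces

vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 0.0]])
faces = np.array([[0, 1, 2], [1, 3, 2]])
dense_vertices, dense_faces = densify(vertices, faces, target_vertex_count=5)
print(len(dense_vertices), len(dense_faces))
```

Each split adds one vertex and two faces and leaves the total surface area unchanged, which gives a simple sanity check on the result.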
- Step 203 extract features from the vertices of the training three-dimensional mesh through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh.
- Step 204 through the second feature extraction layer, global feature extraction is performed on the object training sample based on the vertex features of the training three-dimensional mesh to obtain the global features of the object training sample
- and through the third feature extraction layer, local feature extraction is performed on the object training sample based on the vertex features of the training three-dimensional mesh and the connection relationship between the vertices to obtain the local features of the object training sample.
- Step 205 through the output layer, based on the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample, the key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample.
- the three-dimensional network model may also include a first feature stitching layer. In this case, detecting the key points of the object training sample through the output layer, based on the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample, can proceed as follows.
- through the first feature stitching layer, the vertex features of the training three-dimensional mesh, the global features of the object training sample and the local features of the object training sample are stitched to obtain the stitching features of the object training sample; through the output layer, based on the stitching features of the object training sample, the key points of the object training sample are detected to obtain the positions of the key points of the object training sample on the object training sample.
- Step 206 obtaining the difference between the position of the key point of the object training sample and the label, and training the three-dimensional network model based on the difference to obtain the target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key point of the object to be detected on the object to be detected.
- FIG. 14 is a flow chart of the key point detection method provided in the embodiment of the present application. Based on FIG. 14 , the key point detection method provided in the embodiment of the present application is implemented collaboratively by the client and the server.
- Step 301 In response to an upload operation of an object training sample carrying a label, the client obtains the object training sample carrying the label.
- the client can be a key point detection client set on the terminal.
- Based on the human-computer interaction interface of the client, the user triggers the upload function item so that the client presents an object selection interface on the human-computer interaction interface.
- Based on the object selection interface, the user uploads the labeled object training samples from the local terminal, so that the client obtains the uploaded object training samples.
- the object training sample can also be obtained by taking a picture by a camera that is in communication with the terminal. After taking the picture, the camera labels the object training sample, and then transmits the labeled object training sample to the terminal, which is automatically uploaded to the client by the terminal.
- Step 302 The client sends the object training sample to the server.
- Step 303 The server inputs the received object training sample into the three-dimensional network model.
- Step 304 Based on the three-dimensional network model, key points of the object training samples are detected to obtain positions of the key points of the object training samples.
- Step 305 Obtain the difference between the position of the key point of the object training sample and the label, and train the three-dimensional network model based on the difference.
- the server iterates the above training process until the loss function converges to complete the training of the three-dimensional network model.
- Step 307 Send a prompt message to the client.
- the point cloud data corresponding to the object to be detected can be pre-stored locally in the terminal, obtained from the outside world (such as the Internet), or collected in real time, for example, by a three-dimensional scanning device such as a three-dimensional scanner.
- Step 309 The client sends point cloud data corresponding to the object to be detected to the server in response to the key point detection instruction for the object to be detected.
- the key point detection instructions for the object to be detected can be automatically generated by the client under certain trigger conditions.
- the client automatically generates the key point detection instructions for the object to be detected after obtaining the point cloud data corresponding to the object to be detected. It can also be sent to the client by other devices connected to the terminal for communication. It can also be generated by the user based on the human-computer interaction interface of the client, triggering the corresponding determination function item.
- Step 310 The server inputs the received point cloud data corresponding to the object to be detected into the three-dimensional network model, so that the three-dimensional network model performs key point detection on the object to be detected, and obtains a three-dimensional heat map indicating the positions of the key points of the object to be detected on the object to be detected.
- Step 311 sending a three-dimensional heat map indicating the positions of key points of the object to be detected on the object to be detected to the client.
- Step 312 The client displays a three-dimensional heat map indicating the positions of key points of the object to be detected on the object to be detected.
- the client can display the three-dimensional heat map in the human-computer interaction interface of the client, save the three-dimensional heat map locally in the terminal, and send the three-dimensional heat map to other devices connected to the terminal.
- a three-dimensional grid corresponding to the object to be detected is obtained, and then by constructing a dual-path feature extraction layer, the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional grid and the connection relationship between the vertices, thereby obtaining the position of the key points on the object to be detected based on the vertex features obtained from the three-dimensional grid, the extracted global features, and the local features.
- more abundant feature information of the object to be detected is extracted through multiple layers of feature extraction layers, and then the key points of the object to be detected are detected based on the abundant feature information, so that the accuracy of three-dimensional key point detection is significantly improved.
- the first category is based on traditional geometric analysis methods.
- the semantic key points of the three-dimensional head model are directly located by using methods such as sharp edge detection, curvature calculation, dihedral angle calculation, normal vector calculation and some specific geometric rules.
- For example, the vertex with the largest z coordinate in the three-dimensional coordinate system is taken as the nose tip key point.
- Sharp edges are then detected below the nose tip.
- In this way, the approximate area of the left and right mouth corner key points can be roughly located.
- The second category is based on deep learning methods.
- This category of methods basically renders the three-dimensional head model into a two-dimensional image first, and then uses a two-dimensional convolutional neural network to extract features and detect the corresponding key points. It is worth noting that this type of method can be further divided into different combinations according to whether multiple views are used and whether the three-dimensional key points are regressed directly. For example, a common combination is to render only the front view of the three-dimensional head model, record the rendering projection relationship, detect the two-dimensional key point coordinates on the two-dimensional front view, and finally project them back into three-dimensional space based on the known projection relationship to obtain the final three-dimensional key point coordinates. Another combination is to render multiple views (such as front and side views) and input them into different branches of the neural network model, so that the neural network model combines the features of the views to directly regress the coordinates of the three-dimensional key points.
- the traditional key point positioning method based on geometric analysis is very dependent on manually set rules. For example, when detecting sharp edges, a threshold needs to be specified. This is an empirical value and is difficult to apply to head models with different shapes. Therefore, the robustness of this method is poor.
- the method based on two-dimensional convolutional neural network has achieved great success in the traditional two-dimensional image key point detection task, but the direct application of two-dimensional convolutional neural network to the detection of three-dimensional key points has many constraints and shortcomings. Specifically, first, the number of available three-dimensional face models is far less than that of face images, that is, the data set is relatively scarce, so it is difficult for the neural network to play its role.
- the method of rendering from a three-dimensional face head model to a two-dimensional image will inevitably lose three-dimensional geometric information. For example, for the front view, there will inevitably be a lack of information on the back of the head. If it is necessary to detect the key points of the back of the head, then in the absence of information, detection is naturally impossible.
- if a multi-view method is used to avoid the problem of missing information as much as possible, features are extracted through a multi-branch network and finally fused by the neural network to regress the three-dimensional coordinates. In this way, the three-dimensional coordinates are reconstructed from different views, and the intrinsic connections between these views need to be learned by the neural network, which may make convergence difficult and thus increase the difficulty of training.
- the embodiments of the present application provide a key point detection method, device, electronic device, computer-readable storage medium and computer program product, which can effectively solve the various shortcomings of the above-mentioned technical methods.
- the three-dimensional face model dataset is enhanced by simplifying and densifying the patches, which solves the problem of the relative lack of three-dimensional head model datasets, so that supervised deep learning has training data guarantee.
- the neural convolution module is directly applied in the three-dimensional space, which avoids the problem of the natural loss of three-dimensional geometric information in the detection method under the two-dimensional space of the rendering view, and also solves the problem that the intrinsic connections brought by different views are difficult to learn.
- the traditional two-dimensional heat map is expanded into a three-dimensional heat map. Compared with the method of directly regressing three-dimensional coordinates, the three-dimensional heat map can better ensure the local accuracy of the detection results.
- the present application proposes a three-dimensional face key point detection method based on a graph neural network structure and a three-dimensional heat map.
- This method can be integrated into the character animation tool set, and cooperate with the non-rigid wrapping algorithm to complete the deformation matching process between different head models.
- the specific product form here can be a control.
- a key point detection request carrying the relevant data of the three-dimensional head model to be detected is sent to the remote server where the technical solution of the present application is deployed, so as to obtain the return result.
- the remote server deployment method is conducive to iteratively optimizing the algorithm and does not require local plug-in code updates, thereby saving local computer resources.
- the graph convolutional neural network structure in the technical solution of the present application is explained. Specifically, since the three-dimensional model (three-dimensional network model) naturally has a graph structure relationship, and this relationship is not as compact and regularly arranged as the pixels of a two-dimensional image, it is inappropriate to directly use a traditional convolutional neural network. Therefore, a classic graph attention network (Graph Attention Network, GAT) is introduced here.
- the attention mechanism can be used to calculate the importance (correlation value) of node j to node i, as shown in formula (2) and formula (3).
- the process of using the attention mechanism to calculate the importance of node j to node i can be to splice the features Wh_i and Wh_j of nodes i and j, and then calculate the inner product of the spliced features and a weight vector a with a dimension of 2F, as shown in formula (4). Then, based on the importance of node j to node i, the feature vector (local feature) of node i is determined, as shown in formula (6).
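The attention computation described above (splice Wh_i and Wh_j, take the inner product with the weight vector a, normalize, then form a weighted sum over neighbors) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the application's implementation: the LeakyReLU nonlinearity and the softmax normalization follow the standard GAT formulation and are assumptions here, the names `matvec`, `leaky_relu`, `gat_layer`, and `neighbors` are hypothetical, and each vertex's neighbor list is assumed to include the vertex itself.

```python
import math
from typing import List


def matvec(W: List[List[float]], h: List[float]) -> List[float]:
    """Matrix-vector product W h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    # Assumed nonlinearity (standard in GAT); the text does not name one.
    return x if x > 0 else slope * x


def gat_layer(h, neighbors, W, a):
    """One attention pass: returns the local feature of every node.

    h         : list of N input feature vectors h_i
    neighbors : neighbors[i] lists the nodes j connected to i (assumed to include i)
    W         : shared weight matrix applied to every h_i
    a         : weight vector whose length matches the spliced feature [Wh_i || Wh_j]
    """
    wh = [matvec(W, hi) for hi in h]
    out = []
    for i, nbrs in enumerate(neighbors):
        # Importance of node j to node i: inner product of a with the spliced features.
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, wh[i] + wh[j])))
                  for j in nbrs]
        # Normalize the importances over the neighbors (softmax, assumed).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        alpha = [e / z for e in exps]
        # Local feature of node i: correlation-weighted sum of neighbor features.
        out.append([sum(al * wh[j][d] for al, j in zip(alpha, nbrs))
                    for d in range(len(wh[i]))])
    return out
```

With an all-zero vector a, every neighbor receives the same weight, so the local feature reduces to the mean of the neighbors' transformed features. A final activation on the aggregated feature is omitted here for brevity.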
- FIG. 15 is a schematic diagram of the graph convolutional neural network structure provided by an embodiment of the present application.
- a three-dimensional head model key point automatic detection neural network as shown in FIG. 15 is constructed based on GAT.
- the input data i.e., vertex data
- N represents the number of vertices of the three-dimensional model (three-dimensional mesh)
- 6 is the dimension occupied by the vertex coordinates and the normal vector
- X includes other characteristics of the three-dimensional head model vertex (three-dimensional mesh), including curvature, texture information, etc. These other characteristics can be adjusted according to different data and tasks.
- A_ij is the vertex connection relationship matrix (the connection relationship between vertices). Its size is N×N and its values are 0 or 1: if the two vertices i and j are connected, A_ij is 1; otherwise it is 0.
- Multilayer Perceptron represents a multi-layer fully connected perception layer.
- Vertex data: vertices of a three-dimensional mesh
- X_1: vertex features
- X_2: global feature extraction and local feature extraction
- One path continues to pass through the MLP module ([512, 1024]) and then performs maximum pooling on the output feature X_2 to obtain the global feature information X_3, which is then shared by all N vertices to give the global feature N×X_3.
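The global path described above (maximum pooling over all vertices, then sharing the pooled vector with every vertex) can be sketched as follows. This is a minimal sketch: the function names `global_feature` and `broadcast_global` are hypothetical, and the per-vertex features are represented as plain lists rather than tensors.

```python
from typing import List


def global_feature(per_vertex: List[List[float]]) -> List[float]:
    """Maximum pooling over the N vertices: one value per feature channel."""
    return [max(channel) for channel in zip(*per_vertex)]


def broadcast_global(per_vertex: List[List[float]]) -> List[List[float]]:
    """Share the pooled global feature with all N vertices (N copies of X_3)."""
    g = global_feature(per_vertex)
    return [list(g) for _ in per_vertex]
```

Because the pooling runs over however many rows are supplied, the same code works for any number of vertices N.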
- the same network structure does not require a fixed number of vertices N. This means that three-dimensional face models with different numbers of vertices can be used as input to the neural network model, whether in the training phase or in the actual use phase, thereby improving the applicability of the present application.
- the three-dimensional heat map in the technical solution of the present application is explained. Since the heat map of the three-dimensional mesh no longer has the compact structure of two-dimensional image coordinates, the three-dimensional heat map here uses the geodesic distance, whereas a two-dimensional heat map uses the Euclidean distance. At the three-dimensional mesh level, the geodesic distance between two points corresponds to the shortest path between them on the mesh graph structure, which reflects the characteristics of the three-dimensional surface better than the Euclidean distance between the two points.
- Figure 16 is a comparison diagram of the geodesic distance and the Euclidean distance provided in an embodiment of the present application. Based on Figure 16, the straight line between the two vertices indicated by 1602 is the Euclidean distance, and the curve indicated by 1601 is the corresponding geodesic distance.
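The difference between the two distances can be illustrated with a shortest-path computation along mesh edges. This is a sketch under stated assumptions: Dijkstra's algorithm over edge lengths is used as an approximation of the geodesic distance, the Gaussian form of the heat value and the parameter `sigma` are assumptions (the text does not specify how heat values are derived from distance), and the names `geodesic_distances` and `heat_map` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def geodesic_distances(vertices: Sequence[Sequence[float]],
                       edges: Sequence[Tuple[int, int]],
                       source: int) -> List[float]:
    """Shortest path length along mesh edges from `source` to every vertex."""
    adj = {i: [] for i in range(len(vertices))}
    for i, j in edges:
        w = math.dist(vertices[i], vertices[j])  # edge length
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def heat_map(dist: Sequence[float], sigma: float) -> List[float]:
    """Assumed Gaussian falloff of heat with geodesic distance from the key point."""
    return [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in dist]
```

For three vertices forming a ridge, (0, 0, 0), (1, 0, 1), and (2, 0, 0), connected in sequence, the distance along the surface between the two end vertices is 2√2, while their Euclidean distance is 2.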
- traditional ways of converting a two-dimensional heat map into two-dimensional coordinates include: first, taking the coordinates at which the probability is maximal (called the argmax method); second, taking the softmax-probability-weighted expectation of multiple coordinates (also known as the soft-argmax method).
- here, the argmax method is used directly to obtain the vertex coordinates at which the probability is maximal, thereby determining the final three-dimensional key point coordinates.
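The two conversion methods can be compared on a per-vertex heat map as follows. This is a minimal sketch: the function names and the `temperature` parameter are hypothetical, and the scores stand in for the per-vertex key point probabilities.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def argmax_keypoint(vertices: Sequence[Point], scores: Sequence[float]) -> Point:
    """Return the coordinates of the vertex with the highest score."""
    k = max(range(len(scores)), key=lambda i: scores[i])
    return vertices[k]


def soft_argmax_keypoint(vertices: Sequence[Point], scores: Sequence[float],
                         temperature: float = 1.0) -> Point:
    """Return the softmax-weighted expectation of the vertex coordinates."""
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    z = sum(weights)
    return tuple(sum(w * v[d] for w, v in zip(weights, vertices)) / z
                 for d in range(3))
```

The argmax result is always an existing mesh vertex, whereas the soft-argmax result is a weighted average of vertex coordinates and therefore need not lie on the mesh surface.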
- three-dimensional face mesh data is very difficult to obtain in large quantities. Lack of data is a major problem that plagues neural network supervised learning. Only when the data set is large enough and can cover different facial forms can the graph neural network learn sufficient detection capabilities from it, but the three-dimensional face key point data set is difficult to obtain, and the reason why the three-dimensional face key point data set is difficult to obtain is reflected in the following aspects. Specifically, first, the three-dimensional mesh face data itself is produced by artists, and this production process is relatively troublesome. The generation of two-dimensional images only requires pressing the camera shutter.
- the data enhancement methods are divided into patch simplification and densification.
- for patch simplification, edge-based optimization can be used: each time, the shortest edge between nodes is found and its two end nodes are merged into one vertex, as shown in Figure 12.
- for patch densification, the center of gravity of each patch with a larger area is calculated first, and then the original patch is divided into three based on its center of gravity, as shown in Figure 13.
- both densification and patch simplification can use the final target vertex number to control the termination of their operations.
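Patch simplification with a vertex-count stopping rule can be sketched as follows. This is an illustrative sketch under stated assumptions, not the application's implementation: placing the merged vertex at the edge midpoint is an assumption, the function names `collapse_shortest_edge` and `simplify` are hypothetical, and the loop stops once the vertex count is less than the target, following the comparison given earlier in the text.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, ...]
Face = Tuple[int, int, int]


def collapse_shortest_edge(vertices: List[Vertex], faces: List[Face]):
    """Merge the two end vertices of the shortest edge into one vertex."""
    # Collect each undirected edge once, as a sorted index pair.
    edges = {tuple(sorted((f[k], f[(k + 1) % 3]))) for f in faces for k in range(3)}
    i, j = min(sorted(edges),
               key=lambda e: math.dist(vertices[e[0]], vertices[e[1]]))
    # Assumed placement: the merged vertex sits at the edge midpoint.
    mid = tuple((p + q) / 2.0 for p, q in zip(vertices[i], vertices[j]))
    new_vertices = vertices[:j] + vertices[j + 1:]
    new_vertices[i] = mid  # i < j, so index i is unaffected by removing j

    def remap(k: int) -> int:
        # Vertex j becomes i; indices above j shift down by one.
        return i if k == j else (k - 1 if k > j else k)

    new_faces = []
    for f in faces:
        g = tuple(remap(k) for k in f)
        if len(set(g)) == 3:  # drop facets that degenerated to an edge
            new_faces.append(g)
    return new_vertices, new_faces


def simplify(vertices: List[Vertex], faces: List[Face], target_vertex_count: int):
    """Collapse shortest edges until the vertex count is less than the target."""
    vertices = [tuple(v) for v in vertices]
    faces = [tuple(f) for f in faces]
    while faces and len(vertices) >= target_vertex_count:
        vertices, faces = collapse_shortest_edge(vertices, faces)
    return vertices, faces
```

Each collapse removes exactly one vertex, so, as with densification, the target vertex count bounds the number of iterations.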
- this application can provide accurate and reliable key point basis for subsequent 3D head model registration work by automatically detecting specific key points of the 3D game head model.
- this application can avoid excessive manual participation, so that key point-dependent work such as 3D head model registration can be completed automatically. This will greatly save the manpower input of artists, thereby speeding up the entire production process related to model character animation.
- this application is based on deep supervised learning of graph neural networks, which can accurately predict the positions of three-dimensional key points and has strong robustness.
- the forward calculation speed of the deep learning model is extremely fast, and the algorithm as a whole can be completed in just 1 second.
- Manual labeling, in contrast to this automatic method, often takes several minutes, so this application has great practical value in terms of efficiency.
- this application does not limit the number of vertices of the input 3D face model.
- the generated deep learning model can be widely used in the automatic detection of key points of 3D head models with different vertex densities, and has strong applicability.
- a three-dimensional grid corresponding to the object to be detected is obtained, and then by constructing a dual-path feature extraction layer, the global features and local features of the object to be detected are extracted based on the vertex features obtained from the three-dimensional grid and the connection relationship between the vertices, thereby obtaining the position of the key points on the object to be detected based on the vertex features obtained from the three-dimensional grid, the extracted global features, and the local features.
- more abundant feature information of the object to be detected is extracted through multiple layers of feature extraction layers, and then the key points of the object to be detected are detected based on the abundant feature information, so that the accuracy of three-dimensional key point detection is significantly improved.
- the software module stored in the key point detection device 455 of the memory 450 may include:
- An acquisition module 4551 is configured to obtain a three-dimensional grid for representing the object to be detected, and determine the vertices of the three-dimensional grid and the connection relationship between the vertices;
- a first feature extraction module 4552 is configured to extract features from vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh;
- the second feature extraction module 4553 is configured to perform global feature extraction on the object to be detected based on the vertex features to obtain the global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices to obtain the local features of the object to be detected;
- the output module 4554 is configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features, and obtain the positions of the key points of the object to be detected on the object to be detected.
- the acquisition module 4551 is further configured to scan the object to be detected through a three-dimensional scanning device to obtain point cloud data of the geometric surface of the object to be detected; and construct a three-dimensional grid corresponding to the object to be detected based on the point cloud data.
- the second feature extraction module 4553 is further configured to determine the local features of each of the vertices based on the vertex features and the connection relationship between the vertices; and determine the local features of the object to be detected based on the local features of each of the vertices.
- the second feature extraction module 4553 is further configured to perform the following processing for each of the vertices: determine the vertex as a reference vertex, and determine the vertex features of the reference vertex and the vertex features of other vertices based on the vertex features of each vertex in the three-dimensional mesh; wherein the other vertices are any vertices other than the reference vertex; determine the correlation value between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices; wherein the correlation value is used to indicate the size of the degree of correlation between the reference vertex and the other vertices; determine the local features of the reference vertex based on the correlation value and the vertex features of the other vertices.
- the second feature extraction module 4553 is further configured to use an attention mechanism to determine the degree of correlation between the reference vertex and the other vertices based on the vertex features of the reference vertex, the vertex features of the other vertices, and the connection relationship between the vertices; and normalize the degree of correlation to obtain a correlation value between the reference vertex and the other vertices.
- the second feature extraction module 4553 is further configured to perform product processing on the correlation value and the vertex features of the other vertices to obtain a product result; and determine the local feature corresponding to the reference vertex based on the product result.
- the second feature extraction module 4553 is further configured to perform a product process on the correlation value and the vertex feature of the corresponding other vertex for each of the other vertices to obtain a product result of the other vertices; cumulatively sum the product results of each of the other vertices to obtain a summation result; and, based on the summation result, determine the local feature corresponding to the reference vertex.
- the second feature extraction module 4553 is further configured to perform feature fusion on the local features of each of the vertices based on the local features of each of the vertices to obtain a fused feature; and use the fused feature as the local feature of the object to be detected.
- the output module 4554 is further configured to perform feature splicing on the vertex features, the global features and the local features to obtain the splicing features of the object to be detected; based on the splicing features, the key points of the object to be detected are detected to obtain the positions of the key points of the object to be detected on the object to be detected.
- the output module 4554 is further configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features, and obtain the probability of the key points at each vertex in the three-dimensional grid; based on the probability, generate a three-dimensional heat map corresponding to the three-dimensional grid; based on the three-dimensional heat map, determine the position of the key points of the object to be detected on the object to be detected.
- the device is applied to a three-dimensional network model, which includes at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and an output layer.
- the first feature extraction module 4552 is also configured to perform feature extraction on the vertices of the three-dimensional mesh through the first feature extraction layer to obtain vertex features of the three-dimensional mesh;
- the second feature extraction module 4553 is also configured to perform global feature extraction on the object to be detected based on the vertex features through the second feature extraction layer to obtain global features of the object to be detected, and perform local feature extraction on the object to be detected based on the vertex features and the connection relationship between the vertices through the third feature extraction layer to obtain local features of the object to be detected;
- the output module 4554 is also configured to detect the key points of the object to be detected based on the vertex features, the global features and the local features through the output layer to obtain the positions of the key points of the object to be detected on the object to be detected.
- the three-dimensional network model also includes a first feature splicing layer, a second feature splicing layer, and a fourth feature extraction layer.
- the output module 4554 is also configured to perform feature splicing on the vertex features, the global features, and the local features through the first feature splicing layer to obtain the splicing features of the object to be detected; perform local feature extraction on the object to be detected based on the splicing features through the fourth feature extraction layer to obtain the target local features of the object to be detected; perform feature splicing on the splicing features, the global features, and the target local features through the second feature splicing layer to obtain the target splicing features of the object to be detected; and perform detection on the key points of the object to be detected based on the target splicing features through the output layer to obtain the positions of the key points of the object to be detected on the object to be detected.
- the following further describes an exemplary structure of a training device for a three-dimensional network model provided in an embodiment of the present application implemented as a software module, wherein the three-dimensional network model at least includes a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, and the training device includes:
- An acquisition module configured to acquire an object training sample carrying a label, wherein the label is configured to indicate a real position of a key point of the object training sample;
- An acquisition module configured to acquire a training three-dimensional mesh for representing the object training sample, and determine vertices of the training three-dimensional mesh and connection relationships between vertices;
- a first feature extraction module is configured to extract features of vertices of the object training sample through the first feature extraction layer to obtain vertex features of the training three-dimensional mesh;
- a second feature extraction module is configured to perform global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh through the second feature extraction layer to obtain the global features of the object training sample, and perform local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationship between the vertices through the third feature extraction layer to obtain the local features of the object training sample;
- an output module configured to detect key points of the object training sample through the output layer based on vertex features of the training three-dimensional mesh, global features of the object training sample, and local features of the object training sample, and obtain positions of the key points of the object training sample on the object training sample;
- An updating module is configured to obtain the difference between the position of the key point of the object training sample and the label, and train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on the object to be detected to obtain the position of the key points of the object to be detected on the object to be detected.
- the present application also provides an electronic device, the electronic device comprising:
- a memory configured to store computer executable instructions
- the processor is configured to execute the computer executable instructions stored in the memory to implement the key point detection method or the three-dimensional network model training method described above in the embodiment of the present application, for example, the key point detection method shown in FIG. 3, or the three-dimensional network model training method shown in FIG. 11.
- the embodiment of the present application provides a computer program product or a computer program, which includes computer executable instructions, and the computer executable instructions are stored in a computer-readable storage medium.
- the processor of the electronic device reads the computer executable instructions from the computer-readable storage medium, and the processor executes the computer executable instructions, so that the electronic device executes the key point detection method or the three-dimensional network model training method described in the embodiment of the present application, for example, the key point detection method shown in FIG. 3, or the three-dimensional network model training method shown in FIG. 11.
- An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions.
- when the computer-executable instructions are executed by a processor, the processor executes the key point detection method provided in the embodiment of the present application, or the training method of a three-dimensional network model, for example, the key point detection method shown in FIG. 3, or the training method of a three-dimensional network model shown in FIG. 11.
- the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface storage, optical disk, or CD-ROM; or it may be various devices including one or any combination of the above memories.
- computer executable instructions may be in the form of a program, software, software module, script or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.
- computer-executable instructions may, but do not necessarily, correspond to a file in a file system, may be stored as part of a file that stores other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code portions).
- The executable instructions may be deployed for execution on one electronic device, on multiple electronic devices located at one site, or on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
- Multiple feature extraction layers extract richer feature information of the object to be detected, and the key points of the object are then detected based on that information, so that the accuracy of three-dimensional key point detection is significantly improved.
- A graph attention network (GAT) does not rely on the complete graph structure, but only on the characteristics of the edges, which improves the flexibility of the key point detection process.
- The attention mechanism can also assign different weights to different neighbor nodes, which improves the accuracy of the key point detection process.
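Because the local feature extraction depends on the edges between vertices, the connection relationships have to be derived from the mesh before any attention is computed. The following is a minimal sketch of one way to do that, assuming the three-dimensional mesh is given as triangles of vertex indices. The function name `build_adjacency` and the toy `faces` list are hypothetical and do not come from the patent text.

```python
from collections import defaultdict


def build_adjacency(faces):
    """Return {vertex index: sorted list of neighbor indices} from triangle faces.

    Assumption: each face is a triple of vertex indices; two vertices are
    connected when they share an edge of some triangle.
    """
    neighbors = defaultdict(set)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {v: sorted(ns) for v, ns in neighbors.items()}


# Toy mesh: two triangles sharing the edge (0, 2).
faces = [(0, 1, 2), (0, 2, 3)]
adjacency = build_adjacency(faces)
```

Storing neighbors per vertex, rather than a dense adjacency matrix, matches the point that only the edges are needed and keeps memory proportional to the number of edges.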
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
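$h = \{h_1, h_2, \ldots, h_N\},\ h_i \in \mathbb{R}^{F}$  Formula (1);
$e_{ij} = \mathrm{Attention}(W h_i, W h_j)$  Formula (2);
$h_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j\right)$  Formula (5);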
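The formulas above can be illustrated with a small sketch in plain Python. Several parts are assumptions rather than the patent's stated method: the attention function is taken to be a LeakyReLU over a learned vector `a` applied to the concatenation of the two transformed features, the scores are normalized with a softmax over the neighbors of each vertex, and the nonlinearity defaults to `tanh`. All function names and the toy values are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_score(a, wh_i, wh_j):
    """e_ij of Formula (2); the LeakyReLU(a . [Wh_i || Wh_j]) form is an assumption."""
    z = sum(p * q for p, q in zip(a, wh_i + wh_j))
    return z if z > 0 else 0.2 * z


def attention_weights(wh, i, nbrs, a):
    """Softmax-normalize the scores of vertex i over its neighbors (assumed normalization)."""
    scores = [attention_score(a, wh[i], wh[j]) for j in nbrs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h, neighbors, W, a, sigma=math.tanh):
    """Formula (5): h_i' = sigma(sum over neighbors j of alpha_ij * W h_j)."""
    wh = [matvec(W, h_i) for h_i in h]
    out = []
    for i, nbrs in enumerate(neighbors):
        alpha = attention_weights(wh, i, nbrs, a)
        agg = [sum(al * wh[j][k] for al, j in zip(alpha, nbrs))
               for k in range(len(wh[i]))]
        out.append([sigma(x) for x in agg])
    return out


# Toy input: three vertices with 2-dimensional features on one triangle.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1, 2], [0, 2], [0, 1]]   # every vertex needs at least one neighbor
W = [[1.0, 0.0], [0.0, 1.0]]           # identity weights, for illustration only
a = [0.5, -0.5, 0.25, 0.25]            # hypothetical attention parameters
h_new = gat_layer(h, neighbors, W, a)
```

When `a` is all zeros every neighbor receives the same weight, so the layer reduces to averaging the transformed neighbor features; non-zero parameters are what let different neighbors contribute differently.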
Claims (18)
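- A key point detection method, performed by an electronic device, the method comprising: obtaining a three-dimensional mesh representing an object to be detected, and determining vertices of the three-dimensional mesh and connection relationships between the vertices; performing feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh; performing global feature extraction on the object to be detected based on the vertex features to obtain a global feature of the object to be detected, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain a local feature of the object to be detected; and obtaining, based on the vertex features, the global feature, and the local feature, a position of a key point of the object to be detected on the object to be detected.
- The method according to claim 1, wherein obtaining the three-dimensional mesh representing the object to be detected comprises: scanning the object to be detected with a three-dimensional scanning device to obtain point cloud data of a geometric surface of the object to be detected; and constructing, based on the point cloud data, the three-dimensional mesh corresponding to the object to be detected.
- The method according to claims 1-2, wherein performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected comprises: determining a local feature of each vertex based on the vertex features and the connection relationships between the vertices; and determining the local feature of the object to be detected based on the local features of the vertices.
- The method according to claim 3, wherein determining the local feature of each vertex based on the vertex features and the connection relationships between the vertices comprises performing the following processing for each vertex: determining the vertex as a reference vertex, and determining, based on the vertex features of the vertices in the three-dimensional mesh, a vertex feature of the reference vertex and vertex features of other vertices, wherein an other vertex is any vertex other than the reference vertex; determining a correlation value between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices, wherein the correlation value indicates the degree of correlation between the reference vertex and the other vertex; and determining the local feature of the reference vertex based on the correlation value and the vertex feature of the other vertex.
- The method according to claim 4, wherein determining the correlation value between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices comprises: determining, using an attention mechanism, a degree of correlation between the reference vertex and the other vertex based on the vertex feature of the reference vertex, the vertex feature of the other vertex, and the connection relationships between the vertices; and normalizing the degree of correlation to obtain the correlation value between the reference vertex and the other vertex.
- The method according to claim 4, wherein, when there is one other vertex, determining the local feature corresponding to the reference vertex based on the correlation value and the vertex feature of the other vertex comprises: multiplying the correlation value by the vertex feature of the other vertex to obtain a product result; and determining the local feature corresponding to the reference vertex based on the product result.
- The method according to claim 4, wherein, when there are multiple other vertices, determining the local feature corresponding to the reference vertex based on the correlation value and the vertex features of the other vertices comprises: for each other vertex, multiplying the correlation value by the vertex feature of the corresponding other vertex to obtain a product result for that other vertex; summing the product results of the other vertices to obtain a summation result; and determining the local feature corresponding to the reference vertex based on the summation result.
- The method according to claim 3, wherein determining the local feature of the object to be detected based on the local features of the vertices comprises: performing feature fusion on the local features of the vertices to obtain a fused feature; and using the fused feature as the local feature of the object to be detected.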
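- The method according to claims 1-8, wherein detecting the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the position of the key point of the object to be detected on the object to be detected comprises: concatenating the vertex features, the global feature, and the local feature to obtain a concatenated feature of the object to be detected; and detecting the key point of the object to be detected based on the concatenated feature to obtain the position of the key point of the object to be detected on the object to be detected.
- The method according to claims 1-9, wherein detecting the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the position of the key point of the object to be detected on the object to be detected comprises: detecting the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain a probability that the key point is located at each vertex in the three-dimensional mesh; generating, based on the probabilities, a three-dimensional heat map corresponding to the three-dimensional mesh; and determining, based on the three-dimensional heat map, the position of the key point of the object to be detected on the object to be detected.
- The method according to claims 1-10, wherein the method is applied to a three-dimensional network model, the three-dimensional network model comprising at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer; performing feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh comprises: performing, through the first feature extraction layer, feature extraction on the vertices of the three-dimensional mesh to obtain the vertex features of the three-dimensional mesh; performing global feature extraction on the object to be detected based on the vertex features to obtain the global feature of the object to be detected, and performing local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected comprises: performing, through the second feature extraction layer, global feature extraction on the object to be detected based on the vertex features to obtain the global feature of the object to be detected, and performing, through the third feature extraction layer, local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain the local feature of the object to be detected; and detecting the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the position of the key point of the object to be detected on the object to be detected comprises: detecting, through the output layer, the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the position of the key point of the object to be detected on the object to be detected.
- The method according to claim 11, wherein the three-dimensional network model further comprises a first feature concatenation layer, a second feature concatenation layer, and a fourth feature extraction layer, and detecting, through the output layer, the key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain the position of the key point of the object to be detected on the object to be detected comprises: concatenating, through the first feature concatenation layer, the vertex features, the global feature, and the local feature to obtain a concatenated feature of the object to be detected; performing, through the fourth feature extraction layer, local feature extraction on the object to be detected based on the concatenated feature to obtain a target local feature of the object to be detected; concatenating, through the second feature concatenation layer, the concatenated feature, the global feature, and the target local feature to obtain a target concatenated feature of the object to be detected; and detecting, through the output layer, the key point of the object to be detected based on the target concatenated feature to obtain the position of the key point of the object to be detected on the object to be detected.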
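- A training method for a three-dimensional network model, performed by an electronic device, the three-dimensional network model comprising at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the method comprising: acquiring an object training sample carrying a label, the label indicating a true position of a key point of the object training sample; obtaining a training three-dimensional mesh representing the object training sample, and determining vertices of the training three-dimensional mesh and connection relationships between the vertices; performing, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh; performing, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain a global feature of the object training sample, and performing, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices to obtain a local feature of the object training sample; detecting, through the output layer, the key point of the object training sample based on the vertex features of the training three-dimensional mesh, the global feature of the object training sample, and the local feature of the object training sample to obtain a position of the key point of the object training sample on the object training sample; and obtaining a difference between the position of the key point of the object training sample and the label, and training the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on an object to be detected to obtain a position of a key point of the object to be detected on the object training sample.
- A key point detection apparatus, comprising: an obtaining module configured to obtain a three-dimensional mesh representing an object to be detected, and to determine vertices of the three-dimensional mesh and connection relationships between the vertices; a first feature extraction module configured to perform feature extraction on the vertices of the three-dimensional mesh to obtain vertex features of the three-dimensional mesh; a second feature extraction module configured to perform global feature extraction on the object to be detected based on the vertex features to obtain a global feature of the object to be detected, and to perform local feature extraction on the object to be detected based on the vertex features and the connection relationships between the vertices to obtain a local feature of the object to be detected; and an output module configured to detect a key point of the object to be detected based on the vertex features, the global feature, and the local feature to obtain a position of the key point of the object to be detected on the object to be detected.
- A training apparatus for a three-dimensional network model, the three-dimensional network model comprising at least a first feature extraction layer, a second feature extraction layer, a third feature extraction layer, and an output layer, the apparatus comprising: an acquiring module configured to acquire an object training sample carrying a label, the label indicating a true position of a key point of the object training sample; an obtaining module configured to obtain a training three-dimensional mesh representing the object training sample, and to determine vertices of the training three-dimensional mesh and connection relationships between the vertices; a first feature extraction module configured to perform, through the first feature extraction layer, feature extraction on the vertices of the object training sample to obtain vertex features of the training three-dimensional mesh; a second feature extraction module configured to perform, through the second feature extraction layer, global feature extraction on the object training sample based on the vertex features of the training three-dimensional mesh to obtain a global feature of the object training sample, and to perform, through the third feature extraction layer, local feature extraction on the object training sample based on the vertices of the training three-dimensional mesh and the connection relationships between the vertices to obtain a local feature of the object training sample; an output module configured to detect, through the output layer, the key point of the object training sample based on the vertex features of the training three-dimensional mesh, the global feature of the object training sample, and the local feature of the object training sample to obtain a position of the key point of the object training sample on the object training sample; and an update module configured to obtain a difference between the position of the key point of the object training sample and the label, and to train the three-dimensional network model based on the difference to obtain a target three-dimensional network model; wherein the target three-dimensional network model is used to perform key point detection on an object to be detected to obtain a position of a key point of the object to be detected on the object to be detected.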
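- An electronic device, comprising: a memory configured to store computer-executable instructions; and a processor configured to implement, when executing the computer-executable instructions stored in the memory, the key point detection method according to any one of claims 1 to 12, or the training method for a three-dimensional network model according to claim 13.
- A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the key point detection method according to any one of claims 1 to 12, or the training method for a three-dimensional network model according to claim 13.
- A computer program product, comprising a computer program or computer-executable instructions which, when executed by a processor, implement the key point detection method according to any one of claims 1 to 12, or the training method for a three-dimensional network model according to claim 13.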
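The feature concatenation and heat-map steps recited in the method claims can be pictured with a short sketch. It is illustrative only: pooling the vertex features with an element-wise maximum to form the global feature is an assumption, the per-vertex probabilities are supplied by hand where a trained output layer would normally produce them, and every function name and value is hypothetical.

```python
def global_feature(vertex_features):
    """Pool per-vertex features into one global vector (element-wise max, assumed)."""
    return [max(column) for column in zip(*vertex_features)]


def concat_features(vertex_features, global_feat, local_features):
    """Concatenate, per vertex, its own feature, the global feature, and its local feature."""
    return [v + global_feat + l for v, l in zip(vertex_features, local_features)]


def keypoint_from_heatmap(probabilities, vertices):
    """Return the index and coordinates of the vertex with the highest probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, vertices[best]


# Toy mesh with three vertices and hand-written features.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
vertex_features = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
local_features = [[0.5], [0.6], [0.7]]

g = global_feature(vertex_features)
spliced = concat_features(vertex_features, g, local_features)

# Stand-in for the output layer: one probability per vertex (the heat map values).
probabilities = [0.1, 0.7, 0.2]
index, position = keypoint_from_heatmap(probabilities, vertices)
```

Taking the single most probable vertex is the simplest way to read a position off the heat map; a model that predicts several key points would keep one probability vector per key point and repeat the selection for each.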
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2025519628A JP2025534442A (ja) | 2022-12-09 | 2023-11-06 | Key point detection method, training method, apparatus, electronic device, and computer program |
| EP23899677.1A EP4567724A4 (en) | 2022-12-09 | 2023-11-06 | KEY POINT DETECTION METHOD, TRAINING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM AND COMPUTER PROGRAM PRODUCT |
| US18/793,553 US20240394918A1 (en) | 2022-12-09 | 2024-08-02 | Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211576832.9 | 2022-12-09 | | |
| CN202211576832.9A CN115578393B (zh) | 2022-12-09 | 2022-12-09 | Key point detection method, training method, apparatus, device, medium, and product |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/793,553 Continuation US20240394918A1 (en) | 2022-12-09 | 2024-08-02 | Keypoint detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024120096A1 (zh) | 2024-06-13 |
Family
ID=84590570
Family Applications (1)
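| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/129915 (Ceased) WO2024120096A1 (zh) | 2022-12-09 | 2023-11-06 | Key point detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product |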
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20240394918A1 (zh) |
| EP (1) | EP4567724A4 (zh) |
| JP (1) | JP2025534442A (zh) |
| CN (1) | CN115578393B (zh) |
| WO (1) | WO2024120096A1 (zh) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
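| CN115578393B (zh) * | 2022-12-09 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Key point detection method, training method, apparatus, device, medium, and product |
| CN115830642B (zh) * | 2023-02-13 | 2024-01-12 | 粤港澳大湾区数字经济研究院(福田) | 2D whole-body human key point annotation method and 3D human mesh annotation method |
| CN116091570B (zh) * | 2023-04-07 | 2023-07-07 | 腾讯科技(深圳)有限公司 | Three-dimensional model processing method and apparatus, electronic device, and storage medium |
| CN117932607B (zh) * | 2024-03-20 | 2024-09-24 | 山东省计算中心(国家超级计算济南中心) | Ransomware detection method, system, medium, and device |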
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109993819A (zh) * | 2019-04-09 | 2019-07-09 | 网易(杭州)网络有限公司 | Skinning method and apparatus for a virtual character, and electronic device |
| CN111179419A (zh) * | 2019-12-31 | 2020-05-19 | 北京奇艺世纪科技有限公司 | Three-dimensional key point prediction and deep learning model training method, apparatus, and device |
| CN112991502A (zh) * | 2021-04-22 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Model training method, apparatus, device, and storage medium |
| WO2021213742A1 (en) * | 2020-04-22 | 2021-10-28 | Continental Automotive Gmbh | Method and system for keypoint detection based on neural networks |
| CN115238723A (zh) * | 2022-06-29 | 2022-10-25 | 厦门华联电子股份有限公司 | Local vertex detection method and apparatus |
| CN115578393A (zh) * | 2022-12-09 | 2023-01-06 | 腾讯科技(深圳)有限公司 | Key point detection method, training method, apparatus, device, medium, and product |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8605998B2 (en) * | 2011-05-06 | 2013-12-10 | Toyota Motor Engineering & Manufacturing North America, Inc. | Real-time 3D point cloud obstacle discriminator apparatus and associated methodology for training a classifier via bootstrapping |
| EP3631687A1 (en) * | 2017-07-05 | 2020-04-08 | Siemens Aktiengesellschaft | Semi-supervised iterative keypoint and viewpoint invariant feature learning for visual recognition |
| GB2583687B (en) * | 2018-09-12 | 2022-07-20 | Sony Interactive Entertainment Inc | Method and system for generating a 3D reconstruction of a human |
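| JP6659901B2 (ja) * | 2019-08-01 | 2020-03-04 | 株式会社メルカリ | Program, information processing method, and information processing device |
| CN111489358B (zh) * | 2020-03-18 | 2022-06-14 | 华中科技大学 | Deep learning-based three-dimensional point cloud semantic segmentation method |
| CN112215180B (zh) * | 2020-10-20 | 2024-05-07 | 腾讯科技(深圳)有限公司 | Liveness detection method and apparatus |
| CN113706480B (zh) * | 2021-08-13 | 2022-12-09 | 重庆邮电大学 | Point cloud 3D object detection method based on multi-scale key point feature fusion |
| CN114387445A (zh) * | 2022-01-13 | 2022-04-22 | 深圳市商汤科技有限公司 | Object key point recognition method and apparatus, electronic device, and storage medium |
| CN115082885A (zh) * | 2022-06-27 | 2022-09-20 | 深圳见得空间科技有限公司 | Point cloud target detection method, apparatus, device, and storage medium |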
- 2022
  - 2022-12-09: CN, application CN202211576832.9A, patent CN115578393B (zh), status: Active
- 2023
  - 2023-11-06: JP, application JP2025519628A, patent JP2025534442A (ja), status: Pending
  - 2023-11-06: EP, application EP23899677.1A, patent EP4567724A4 (en), status: Pending
  - 2023-11-06: WO, application PCT/CN2023/129915, patent WO2024120096A1 (zh), status: Ceased
- 2024
  - 2024-08-02: US, application US18/793,553, patent US20240394918A1 (en), status: Pending
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109993819A (zh) * | 2019-04-09 | 2019-07-09 | 网易(杭州)网络有限公司 | Skinning method and apparatus for a virtual character, and electronic device |
| CN111179419A (zh) * | 2019-12-31 | 2020-05-19 | 北京奇艺世纪科技有限公司 | Three-dimensional key point prediction and deep learning model training method, apparatus, and device |
| WO2021213742A1 (en) * | 2020-04-22 | 2021-10-28 | Continental Automotive Gmbh | Method and system for keypoint detection based on neural networks |
| CN112991502A (zh) * | 2021-04-22 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Model training method, apparatus, device, and storage medium |
| CN115238723A (zh) * | 2022-06-29 | 2022-10-25 | 厦门华联电子股份有限公司 | Local vertex detection method and apparatus |
| CN115578393A (zh) * | 2022-12-09 | 2023-01-06 | 腾讯科技(深圳)有限公司 | Key point detection method, training method, apparatus, device, medium, and product |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4567724A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4567724A4 (en) | 2025-12-10 |
| EP4567724A1 (en) | 2025-06-11 |
| CN115578393A (zh) | 2023-01-06 |
| US20240394918A1 (en) | 2024-11-28 |
| JP2025534442A (ja) | 2025-10-15 |
| CN115578393B (zh) | 2023-03-10 |
Similar Documents
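| Publication | Publication Date | Title |
|---|---|---|
| WO2024120096A1 (zh) | | Key point detection method, training method, apparatus, electronic device, computer-readable storage medium, and computer program product |
| US12131416B2 (en) | | Pixel-aligned volumetric avatars |
| WO2024032464A1 (zh) | | Three-dimensional face reconstruction method and apparatus, device, medium, and product |
| CN112785712B (zh) | | Three-dimensional model generation method and apparatus, and electronic device |
| CN113822965B (zh) | | Image rendering processing method, apparatus, and device, and computer storage medium |
| JP7701932B2 (ja) | | Efficient localization based on multiple feature types |
| CN113593001A (zh) | | Three-dimensional reconstruction method and apparatus for a target object, computer device, and storage medium |
| CN114820907B (zh) | | Face image cartoonization processing method and apparatus, computer device, and storage medium |
| CN115994944B (zh) | | Training method for a key point prediction model, three-dimensional key point prediction method, and related device |
| US20250225777A1 (en) | | Three-dimensional model processing method and apparatus, electronic device, and computer storage medium |
| CN112463936A (zh) | | Visual question answering method and system based on three-dimensional information |
| CN110490959A (zh) | | Three-dimensional image processing method and apparatus, avatar generation method, and electronic device |
| CN116977548A (zh) | | Three-dimensional reconstruction method, apparatus, device, and computer-readable storage medium |
| WO2024179446A1 (zh) | | Image processing method and related device |
| CN120726243B (zh) | | Scene completion method and apparatus, electronic device, and storage medium |
| Yang et al. | | Architectural sketch to 3D model: An experiment on simple-form houses |
| CN120339515A (zh) | | Image processing method, electronic device, and computer-readable storage medium |
| CN118864676A (zh) | | Three-dimensional model generation method and apparatus, computer program product, and electronic device |
| HK40080387A (zh) | | Key point detection method, training method, apparatus, device, medium, and product |
| HK40080387B (zh) | | Key point detection method, training method, apparatus, device, medium, and product |
| CN116029912A (zh) | | Image processing model training method, image processing method, apparatus, device, and medium |
| CN116524106B (zh) | | Image annotation method, apparatus, device, storage medium, and program product |
| EP4600905A1 (en) | | Method and apparatus for determining three-dimensional layout information, device, and storage medium |
| CN117809357B (zh) | | Eyeball model determination method and apparatus, and electronic device |
| CN117557699B (zh) | | Animation data generation method and apparatus, computer device, and storage medium |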
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23899677; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023899677; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023899677; Country of ref document: EP; Effective date: 20250303 |
| | ENP | Entry into the national phase | Ref document number: 2025519628; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 2025519628; Country of ref document: JP |
| | WWP | WIPO information: published in national office | Ref document number: 2023899677; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |