WO2023185241A1 - Data processing method, apparatus, device, and medium - Google Patents

Data processing method, apparatus, device, and medium

Info

Publication number
WO2023185241A1
Authority
WO
WIPO (PCT)
Prior art keywords
posture
image frame
detection result
pose
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/073976
Other languages
English (en)
French (fr)
Inventor
张亮
马名浪
徐湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to JP2024556677A (publication JP7792532B2)
Priority to EP23777620.8A (publication EP4411641A4)
Priority to US18/238,321 (publication US20230401740A1)
Publication of WO2023185241A1
Anticipated expiration
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023 Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular, to a data processing method, device, equipment and medium.
  • Computer vision (CV) is the science of making machines "see". More specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as identification and measurement, and further performs graphics processing so that the result is an image better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
  • Posture estimation can detect the positions of key points in pictures or videos, and has wide application value in film animation, assisted driving, virtual reality, action recognition and other fields.
  • The final object pose can be constructed by detecting key points in images or videos and combining the detected key points with the object's constraint relationships.
  • Embodiments of the present application provide a data processing method, device, equipment and medium, which can improve the accuracy of object posture estimation.
  • An embodiment of the present application provides a data processing method, executed by a computer device, including:
  • obtaining the object posture detection result corresponding to the object in the image frame, and the part posture detection result corresponding to the first object part of the object in the image frame; wherein at least one object part of the object is missing from the object posture detection result, and the first object part is one or more parts of the object;
  • according to the part posture detection result and the standard posture associated with the object, interpolating the at least one object part missing from the object posture detection result to obtain a global posture corresponding to the object, wherein the global posture is used to control the computer device to implement the business function corresponding to the global posture.
  • An embodiment of the present application also provides a data processing device, including:
  • a posture detection module, used to obtain the object posture detection result corresponding to the object in the image frame, and the part posture detection result corresponding to the first object part of the object in the image frame; wherein at least one object part of the object is missing from the object posture detection result, and the first object part is one or more parts of the object;
  • a posture estimation module, configured to interpolate the at least one object part missing from the object posture detection result according to the part posture detection result and the standard posture associated with the object, to obtain the global posture corresponding to the object, wherein the global posture is used to control the computer device to implement the business function corresponding to the global posture.
  • Embodiments of the present application also provide a computer device, including a memory and a processor. The memory is connected to the processor and is used to store a computer program. The processor is used to call the computer program, so that the computer device executes the above method in the embodiments of the present application.
  • Embodiments of the present application also provide a computer-readable storage medium in which a computer program is stored. The computer program is adapted to be loaded and executed by a processor, so that a computer device having the processor executes the above method in the embodiments of the present application.
  • Embodiments of the present application also provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above method.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of an object pose estimation scene for video data provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a standard posture provided by the embodiment of the present application.
  • Figure 5 is a schematic diagram of a scene for object pose estimation provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of an application scenario of a global posture provided by an embodiment of the present application.
  • Figure 7 is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of an object detection model provided by an embodiment of the present application.
  • Figure 9 is a schematic flowchart of obtaining object posture detection results provided by an embodiment of the present application.
  • Figure 10 is a schematic flowchart of obtaining part posture detection results provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of correction of key points of an object provided by an embodiment of the present application.
  • Figure 12 is a schematic flowchart of an object pose estimation provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Pose estimation is an important task in computer vision and an indispensable step for computers to understand object actions and behaviors.
  • Pose estimation can be converted into predictions of key points of the object. For example, the position coordinates of each object key point in the image can be predicted, and the object skeleton in the image can be predicted based on the positional relationship between the key points of each object.
  • The pose estimation involved in this application may include object pose estimation for the object, and part pose estimation for specific parts of the object. The objects may include but are not limited to human bodies, animals, and plants; the specific parts of the object may be palms, faces, animal limbs, plant roots, and so on. This application does not limit the types of objects.
  • When an image or video is shot in a mobile scene, the image or video may contain only part of the object. The part information extracted during pose estimation is then insufficient, so the final object posture result is not the complete posture of the object, which affects the integrity of the object posture.
  • In the embodiments of this application, the object pose detection result for the object and the part pose detection result for the first object part of the object can be obtained, and the pose of the object in the image frame can then be estimated from these detection results and the standard pose. This compensates for the object key points missing from the image frame and ensures the integrity and rationality of the final global pose of the object, which can improve the accuracy of global pose estimation.
  • Figure 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • the network architecture may include a server 10d and a user terminal cluster.
  • the user terminal cluster may include one or more user terminals. There is no limit on the number of user terminals here.
  • the user terminal cluster may specifically include user terminal 10a, user terminal 10b, user terminal 10c, etc.
  • The server 10d can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • The user terminal 10a, the user terminal 10b, the user terminal 10c, etc. may include: smart phones, tablet computers, notebook computers, handheld computers, mobile internet devices (MID), wearable devices (such as smart watches and smart bracelets), intelligent voice interaction devices, smart home appliances (such as smart TVs), vehicle-mounted equipment, and other electronic devices with object posture estimation functions.
  • the user terminal 10a, the user terminal 10b, the user terminal 10c, etc. can each have a network connection with the server 10d, so that each user terminal can perform data interaction with the server 10d through the network connection.
  • A user terminal (for example, the user terminal 10a in the user terminal cluster) is integrated with an application client that has an object pose estimation function. The application client may include but is not limited to: multimedia clients (for example, short video clients, live video clients, and video clients) and object management applications (for example, patient care clients).
  • The application client in the user terminal 10a can obtain video data, which may be a video of the object photographed in a mobile scene. For example, the camera integrated in the user terminal 10a photographs the object to obtain the video data, or a camera device connected to the user terminal 10a (for example, an SLR camera or a video camera) photographs the object to obtain the video data.
  • In a mobile scene, the images in the video data may contain only part of the object. For example, when the object is a human body, the video data may include only the upper body, or only the head. When estimating the pose of the object in the video data, it is therefore necessary to perform pose repair on the object contained in the video data to obtain the corresponding global pose of the object, which also improves the accuracy of the global pose.
  • the global posture may also be called a complete posture, which refers to a posture that includes all parts of the object, that is, the posture corresponding to the complete object.
  • The object posture estimation process involved in the embodiments of the present application can be executed by a computer device, which can be a user terminal in the user terminal cluster shown in Figure 1, or the server 10d shown in Figure 1. In short, the computer device may be a user terminal, a server, or a combined device composed of a server and a user terminal, which is not limited in this application.
  • Figure 2 is a schematic diagram of an object pose estimation scene for video data provided by an embodiment of the present application.
  • Taking the user terminal 10a shown in Figure 1 as an example, the object pose estimation process for a video is described below. As shown in Figure 2, the user terminal 10a can obtain video data 20a, which can be captured through the camera integrated in the user terminal 10a.
  • The first image frame (i.e., image frame T1) can be obtained from the N image frames of the video data 20a in chronological order and input to the object detection model 20b, which performs object detection on the image frame T1 to obtain the object posture detection result 20c corresponding to the image frame T1. The object posture detection result 20c may include the key points of the object contained in the image frame T1 (for convenience of description, referred to as object key points below) and the positions of these object key points in the image frame T1. The object posture detection result 20c can also include a first confidence corresponding to each detected object key point, which characterizes the prediction accuracy of that key point: the greater the first confidence, the more accurate the detected object key point, and the more likely it is a real key point of the object.
  • When the object is a human body, the object key points can be considered joint points in the human body structure. The number of object key points and the key point categories can be predefined; for example, the human body structure can include multiple object key points on the limbs, head, waist, chest, and other parts. When the image frame T1 contains the complete object, the image frame T1 may contain all the object key points of the object; when the image frame T1 contains only a partial structure of the object, the image frame T1 may contain only some of the object key points.
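  • As an illustration of the object posture detection result described above, the following minimal Python sketch stores each detected object key point together with its first confidence and lists the predefined key points that are absent or unreliable. The key point names, the `ObjectKeypoint` type, and the 0.5 cut-off are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass

# Hypothetical predefined key point categories; a real model defines its own set.
KEYPOINT_NAMES = ["head", "neck", "left_shoulder", "right_shoulder",
                  "left_elbow", "right_elbow", "left_wrist", "right_wrist"]


@dataclass
class ObjectKeypoint:
    name: str          # key point category
    position: tuple    # position of the key point in the image frame
    confidence: float  # first confidence: prediction accuracy of this key point


def missing_keypoints(detected, min_confidence=0.5):
    """Return the predefined categories that were not reliably detected.

    min_confidence is an assumed cut-off, not a value from the source text.
    """
    reliable = {kp.name for kp in detected if kp.confidence >= min_confidence}
    return [name for name in KEYPOINT_NAMES if name not in reliable]


# An upper-body-only frame: the arms fall mostly outside the image.
result = [
    ObjectKeypoint("head", (0.0, 1.7), 0.97),
    ObjectKeypoint("neck", (0.0, 1.5), 0.95),
    ObjectKeypoint("left_shoulder", (-0.2, 1.45), 0.91),
    ObjectKeypoint("right_shoulder", (0.2, 1.45), 0.90),
    ObjectKeypoint("left_elbow", (-0.3, 1.2), 0.12),  # detected, but unreliable
]
missing = missing_keypoints(result)
```

  • Here `missing` lists the key points that a later compensation or interpolation step would need to supply.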
  • the object detection model 20b can be a pre-trained network model with an object detection function for videos/images; when the object is a human body, the object detection model 20b can also be called a human posture estimation model.
  • The human body posture 20j of the object in the image frame T1 can be obtained through the object posture detection result 20c. Since the human body posture 20j is missing some object key points (human joint points), the user terminal 10a can obtain the standard posture 20k corresponding to the object and, based on the standard posture 20k, perform key point compensation on the human body posture 20j to obtain the human body posture 20m corresponding to the object in the image frame T1. The standard posture 20k can also be considered the default posture of the object, or the reference posture, and can be pre-constructed from all the object key points of the object. For example, the posture of the human body when standing normally (a global posture) can be taken as the standard posture 20k.
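  • The key point compensation step can be pictured with a short Python sketch. The text does not spell out how the standard posture is aligned with the detected posture, so the rule used here (copy each missing key point from the standard posture, shifted so that one shared anchor key point coincides) is only an assumption for illustration; the function name and the key point names are likewise hypothetical.

```python
def compensate_with_standard_pose(detected, standard_pose, anchor="neck"):
    """Fill key points missing from `detected` using `standard_pose`.

    Both arguments map key point names to (x, y) positions. Missing key points
    are copied from the standard pose and translated so that the anchor key
    point of the standard pose lands on the detected anchor. This alignment
    rule is an illustrative assumption, not the method of the application.
    """
    if anchor not in detected:
        raise ValueError(f"anchor key point {anchor!r} was not detected")
    dx = detected[anchor][0] - standard_pose[anchor][0]
    dy = detected[anchor][1] - standard_pose[anchor][1]
    completed = dict(detected)  # detected key points are kept unchanged
    for name, (x, y) in standard_pose.items():
        if name not in completed:
            completed[name] = (x + dx, y + dy)
    return completed


# Hypothetical standing pose and a detection that only found the neck.
standard_pose = {"neck": (0.0, 0.0), "hip": (0.0, 50.0), "knee": (0.0, 90.0)}
detected = {"neck": (100.0, 40.0)}
completed = compensate_with_standard_pose(detected, standard_pose)
```

  • A pure translation ignores scale and rotation; a fuller treatment would also fit those from the key points that both postures share.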
  • The image frame T1 can also be input to the part detection model 20d, which detects a specific part of the object (for example, the first object part) in the image frame T1 to obtain the part posture detection result 20e corresponding to the image frame T1. When the first object part is detected in the image frame T1, detection can continue for the key points of the first object part and their positions; according to the key point categories and key point positions, the detected key points of the first object part can be connected and marked in the image frame T1, and the result after connection is the part posture detection result 20e.
  • The number of key points and the key point categories corresponding to the first object part can also be predefined. When the object is a human body, the part detection model 20d can be a palm posture estimation model (the first object part is the palm here); for example, the palm can include palm key points and finger key points. The part detection model 20d can be a pre-trained network model with an object part detection function for videos/images. For convenience of description, the key points of the first object part are called part key points below.
  • the part pose detection result 20e carries a second confidence level.
  • This second confidence level can be used to characterize the possibility that the detected object part is the first object part.
  • For example, the part detection model 20d can determine that the second confidence level of area 20f in the image frame T1 being the first object part is 0.01, that of area 20g is 0.09, that of area 20h is 0.86, and that of area 20i is 0.84.
  • The greater the second confidence level, the greater the possibility that the area is the first object part.
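  • One way to use the second confidence level is to keep only the candidate area that most likely contains the first object part. The sketch below is a minimal Python illustration; the region names, the scores, and the 0.5 cut-off are hypothetical, and the text itself does not state that a single best area is selected.

```python
def select_part_region(region_scores, min_confidence=0.5):
    """Pick the candidate area most likely to be the first object part.

    region_scores maps an area identifier to its second confidence level.
    Returns None when no area reaches min_confidence (an assumed cut-off).
    """
    if not region_scores:
        return None
    best = max(region_scores, key=region_scores.get)
    return best if region_scores[best] >= min_confidence else None


# Illustrative scores for three hypothetical candidate areas.
scores = {"region_a": 0.01, "region_b": 0.09, "region_c": 0.86}
best_region = select_part_region(scores)
```

  • Returning None when every score is low lets the caller treat the frame as containing no first object part.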
  • The user terminal 10a can combine the object posture detection result 20c and the part posture detection result 20e to perform interpolation processing on the missing object parts, and obtain reasonable object key points through the interpolation processing. For example, when the part posture detection result 20e contains the key points of the palm, the object posture detection result 20c and the part posture detection result 20e can be combined to interpolate the wrist, elbow, and other parts of the object missing from the image frame T1, completing the human body posture 20m to obtain the human body posture 20n (also called the global posture).
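  • To make the interpolation of a missing arm joint concrete, the following Python sketch places an elbow on the straight segment between a known shoulder and a wrist taken from the palm key points, splitting the segment in the ratio of the bone lengths of the standard posture. This linear rule, the function name, and the `palm_root` key are assumptions for illustration only; the application does not specify the interpolation formula in this passage.

```python
def interpolate_elbow(shoulder, wrist, upper_arm_len, forearm_len):
    """Estimate a missing elbow between a shoulder and a wrist.

    The elbow is placed on the shoulder-to-wrist segment at the fraction
    upper_arm_len / (upper_arm_len + forearm_len), with both lengths taken
    from the standard posture. A straight-arm rule like this is a
    simplifying assumption.
    """
    if upper_arm_len <= 0 or forearm_len <= 0:
        raise ValueError("bone lengths must be positive")
    t = upper_arm_len / (upper_arm_len + forearm_len)
    return tuple(s + t * (w - s) for s, w in zip(shoulder, wrist))


# Hypothetical inputs: the shoulder comes from the object posture detection
# result, the wrist is approximated by the root key point of the detected palm.
shoulder = (0.0, 0.0, 0.0)
palm_keypoints = {"palm_root": (40.0, 0.0, 0.0)}
wrist = palm_keypoints["palm_root"]
elbow = interpolate_elbow(shoulder, wrist, upper_arm_len=30.0, forearm_len=10.0)
```

  • Using bone-length ratios from the standard posture keeps the interpolated joint anatomically plausible even though its true position is outside the image.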
  • the same method can be used to estimate the object posture for subsequent image frames in the video data 20a to obtain the global posture corresponding to the object in each image frame.
  • From the global postures of the image frames, the behavior of the object in the video data 20a can be obtained.
  • the video data 20a can also be a video captured in real time, and the user terminal 10a can perform object posture estimation on the image frames in the video data captured in real time to obtain the behavior of the object in real time.
  • The global pose of the object in the image frame can be estimated from the object detection results output by the object detection model 20b, the part detection results output by the part detection model 20d, and the standard posture 20k, which can ensure the integrity and rationality of the final global pose of the object, thereby improving the accuracy of global pose estimation.
  • Figure 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application. As shown in Figure 3, the data processing method may include the following steps S101 to S102:
  • Step S101: Obtain the object posture detection result corresponding to the object in the image frame, and the part posture detection result corresponding to the first object part of the object; wherein at least one object part of the object is missing from the object posture detection result, and the first object part is one or more parts of the object.
  • The computer device can obtain the video data (for example, the video data 20a in the embodiment corresponding to Figure 2) or image data of the object photographed in the mobile scene. When performing pose estimation on the video data or image data, the computer device can perform object detection on the image frames in the image data or video data to obtain the object posture detection result for the object (for example, the object posture detection result 20c in the embodiment corresponding to Figure 2 above); at the same time, part detection may also be performed on the image frame to obtain a part posture detection result for the first object part of the object (for example, the part posture detection result 20e in the embodiment corresponding to Figure 2 above).
  • The object may refer to the objects included in the video data, such as human bodies, animals, and plants. The first object part may refer to one or more parts of the object, such as the face or palms of a human body, the limbs, tail, or head of an animal, or the roots of a plant. This application does not limit the type of the object or the type of the first object part.
  • the object in the video data or image data may have missing parts, that is, some parts of the object may be missing.
  • the accuracy of the object's posture estimation can be improved.
  • the embodiments of this application take the object being a human body as an example to describe the object posture estimation process of video data or image data.
  • When object pose estimation is performed on image data, the image data can be used as an image frame; if object pose estimation is performed on video data in the mobile scene, the video data can be split into frames to obtain the N image frames corresponding to the video data, where N is a positive integer. An image frame sequence containing the N image frames can then be formed according to their time order in the video data, and object pose estimation can be performed on the N image frames frame by frame; for example, after completing object pose estimation for the first image frame in the sequence, object pose estimation can continue with the second image frame, until object pose estimation for the entire video data is completed.
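  • The frame-by-frame procedure can be summarized in a few lines of Python. The sketch below assumes the frames are already decoded and tagged with timestamps, and that a per-frame estimator is supplied by the caller; `estimate_video` and its parameters are hypothetical names used only for this illustration.

```python
def estimate_video(frames, estimate_frame):
    """Run pose estimation on every image frame in chronological order.

    frames: iterable of (timestamp, image) pairs, in any order.
    estimate_frame: callable that maps one image to its pose result.
    Returns the per-frame results ordered by timestamp.
    """
    sequence = sorted(frames, key=lambda item: item[0])  # image frame sequence
    return [estimate_frame(image) for _, image in sequence]


# Stand-in data: strings play the role of images, str.upper the estimator.
poses = estimate_video([(2, "c"), (0, "a"), (1, "b")], str.upper)
```

  • Keeping the estimator as a parameter means the same loop works for a single image (a sequence of length one) and for a full video.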
  • The computer device can obtain the object detection model and the part detection model, input the image frame to the object detection model, and use the object detection model to output the object posture detection result corresponding to the image frame; at the same time, the image frame can also be input to the part detection model, which outputs the part pose detection result corresponding to the image frame.
  • The object detection model can be used to detect the key points of the object in the image frame (such as human body key points, also called object key points); in this case the object detection model can also be called a human pose estimation model. The object detection model can include but is not limited to: DensePose (a dense human pose estimation system), OpenPose (a framework for real-time estimation of multi-person body, face, and hand pose), Realtime Multi-Person Pose Estimation (a real-time multi-person pose estimation model), DeepPose (a pose estimation method based on deep neural networks), and MobileNetV2 (a lightweight deep neural network). This application does not limit the type of object detection model.
  • The part detection model can be used to detect key points of the first object part of the object (such as palm key points); in this case the part detection model can also be called a palm pose estimation model. The part detection model can be a detection-based method or a regression-based method: a detection-based method predicts the key points of the first object part by generating a heat map, while a regression-based method directly regresses the position coordinates of the part key points. The network structures of the part detection model and the object detection model can be the same or different; even when the network structures are the same, the network parameters of the two differ (they are trained on different data). This application does not limit the type of part detection model.
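  • For the detection-based variant, a key point is usually read off a heat map by taking the location of its highest score. The following Python sketch shows that decoding step on a plain nested list; the function name and the sample values are illustrative, and practical models typically add sub-pixel refinement that is omitted here.

```python
def decode_heatmap(heatmap):
    """Return (x, y, score) of the highest-scoring cell of a 2-D heat map.

    heatmap is a non-empty list of rows; x indexes columns, y indexes rows.
    The first maximum is returned when several cells share the top score.
    """
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


# One small heat map for a single part key point.
heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.2, 0.1],
]
keypoint = decode_heatmap(heatmap)
```

  • A regression-based model would skip this step and output the coordinates directly; the peak score of the heat map can double as a per-key-point confidence.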
  • The above-mentioned object detection model and part detection model can be detection models pre-trained using sample data. For example, the object detection model can be trained using sample data carrying human body key point label information (such as a three-dimensional human body data set), and the part detection model can be trained using sample data carrying palm key point information (such as a palm data set). Alternatively, the object detection model can be an object detection service called from an artificial intelligence cloud service through an application program interface (API), and the part detection model can be a part detection service called from the artificial intelligence cloud service through the API; there is no specific limitation here.
  • Artificial intelligence cloud services are generally also called AI as a Service (AIaaS).
  • the AIaaS platform will split several common AI services and provide independent or packaged services in the cloud.
  • This service model is similar to opening an AI theme mall: all developers can access one or more artificial intelligence services provided by the platform through API interfaces, and some senior developers can also use the platform.
  • The object detection model used in the embodiments of the present application can be a three-dimensional human body pose estimation model with confidence. For example, the object detection model can predict the object key points of the object in the image frame, and each predicted object key point can correspond to a first confidence level, which characterizes the prediction accuracy of that key point. The predicted object key points and the corresponding first confidence levels can be called the object pose detection result corresponding to the image frame.
  • the part detection model can be a three-dimensional palm pose estimation model with confidence.
  • the part detection model can predict the position area of the first object part in the image frame, and predict the key points of the first object part in the position area;
  • the part detection model can predict one or more location areas where the first object part may be located.
  • Each location area can correspond to a second confidence level, which can be used to characterize the prediction accuracy of each predicted location area.
  • The predicted part key points and the second confidence level corresponding to the location area can be called the part pose detection result corresponding to the image frame.
  • Step S102 According to the part posture detection result and the standard posture associated with the object, interpolation processing is performed on at least one object part missing in the object posture detection result to obtain a global posture corresponding to the object, wherein the global posture is used to control a computer device to implement a business function corresponding to the global posture.
  • the computer device can obtain the standard posture corresponding to the object (for example, the standard posture 20m in the embodiment corresponding to Figure 2).
  • This standard posture can be considered as the complete default posture of the object (marked as T-pose). There can be one or more standard postures, such as the default standing posture of the human body, the default sitting posture of the human body, the default squatting posture of the human body, etc. This application does not limit the type and quantity of standard postures.
  • Figure 4 is a schematic diagram of a standard posture provided by an embodiment of the present application.
  • the model 30a can be expressed as an SMPL (Skinned Multi-Person Linear) model.
  • The model 30a is a parametric human body model that can be applied to different human body structures. The model 30a can include the human body joint distribution: 1 root node (the node with serial number 0) and 23 joint nodes (the nodes with serial numbers 1 to 23), where the root node is used to transform the entire human body as a complete rigid body (an object whose volume and shape do not change under the action of force), and the 23 joint nodes can be used to describe the deformation of local human body parts.
  • The above 1 root node and 23 joint nodes can be used as the object key points of the object. By connecting the 1 root node and 23 joint nodes based on the category (for example, wrist joint points, elbow joint points, palm joint points, ankle joint points, etc.) and position of the object key points, the standard posture 30b can be obtained.
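  • As a rough illustration of how 1 root node and 23 joint nodes can be connected into a skeleton, the following Python sketch stores one parent index per node. The parent table and the joint names follow the ordering commonly used with SMPL; they are an assumption here, since the text above does not list them.

```python
# Parent index of each node; -1 marks the root (node 0).
# Assumed from the commonly used SMPL ordering, not taken from the text above.
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

# Illustrative joint names in the same order (also an assumption).
SMPL_JOINT_NAMES = [
    "pelvis", "left_hip", "right_hip", "spine1", "left_knee", "right_knee",
    "spine2", "left_ankle", "right_ankle", "spine3", "left_foot", "right_foot",
    "neck", "left_collar", "right_collar", "head", "left_shoulder",
    "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hand", "right_hand",
]


def skeleton_bones(parents):
    """Return (parent, child) index pairs, one bone per non-root node."""
    return [(parent, child) for child, parent in enumerate(parents) if parent >= 0]


bones = skeleton_bones(SMPL_PARENTS)
for parent, child in bones[:3]:
    print(SMPL_JOINT_NAMES[parent], "->", SMPL_JOINT_NAMES[child])
```

  • Drawing a line segment for every pair in `bones` between the positions of the two nodes yields a stick figure comparable to the standard posture 30b.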
  • the image frame may not contain a complete object. If part of the object (such as the lower limbs of the human body) is not in the image frame, then some of the object's key points are missing in the object pose detection result corresponding to the image frame.
  • Key point compensation can be performed on the object posture detection result through the standard posture to complete the missing object key points and obtain the first candidate object posture corresponding to the object.
  • If the part pose detection result includes the part key points of the first object part, the part key points in the part pose detection result and the object key points in the object pose detection result can be combined to adjust the first candidate object pose and obtain the global pose of the object in the image frame.
  • After the global pose corresponding to the current image frame is obtained, object pose estimation can be performed on the next image frame in the video data, so as to obtain the global pose of the object in each image frame of the video data.
  • the computer device can determine the object's behavioral actions based on the object's global posture in the video data, through which the object can be managed or cared for, or human-computer interaction can be performed through the object's behavioral actions.
  • the global posture of objects in video data can be applied in human-computer interaction scenarios (for example, virtual reality, human-computer animation, etc.), content review scenarios, autonomous driving scenarios, virtual live broadcast scenarios, game or movie character action design scenarios, etc.
  • human-computer interaction scenarios images (or videos) of users (objects) can be collected.
  • The machine can be controlled based on the global posture, for example by executing a specific command in response to a specific human action (determined from the global posture).
  • human body movements are obtained through the global posture corresponding to the object to replace expensive motion capture equipment, which can reduce the cost and difficulty of game character action design.
  • The virtual live broadcast scene can mean that the live broadcast screen in the live broadcast room does not directly play the video of the anchor user (object), but plays the video of a virtual object with the same behavior as the anchor user. For example, the behavior of the anchor user can be determined based on the anchor user's global posture, and the virtual object can then be driven by that behavior; that is, a virtual object with the same behavior and actions as the anchor user is constructed and used for the live broadcast. This keeps the anchor user out of public view while achieving the same live broadcast effect as a real anchor user.
  • A computer device can construct a virtual object associated with the object based on the object's global pose in the video data, and play the virtual object with this global posture in a multimedia application (for example, a live broadcast room, a video website, a short video application, etc.).
  • That is, a video of the virtual object can be played in the multimedia application, and the posture of the virtual object stays synchronized with the posture of the object in the video data.
  • The global posture corresponding to the object in the video data is reflected on the virtual object played in the multimedia application. Every time the object changes its posture, the virtual object in the multimedia application is driven to transform into the same posture (this can be regarded as reconstructing a virtual object with a new posture, where the new posture is the posture after the object changes), so that the postures of the object and the virtual object always remain consistent.
  • Figure 5 is a schematic diagram of a scene for object pose estimation provided by an embodiment of the present application. The object pose estimation process for video data is described by taking the virtual live broadcast scene as an example. As shown in Figure 5, when the anchor user 40c (who can serve as the object) needs to live broadcast, he can enter the live broadcast room (such as the live broadcast room with room number 116889). Before starting the live broadcast, the anchor user 40c can choose the live broadcast mode or the virtual live broadcast mode. If the anchor user 40c selects the virtual live broadcast mode, the virtual object can be pulled. When the anchor user 40c starts the live broadcast, the behavior of the anchor user 40c can be used to drive the virtual object, so that the virtual object maintains the same posture as the anchor user 40c.
  • The anchor user 40c can collect his own video data through the user terminal 40a (for example, a smartphone). At this time, the anchor user 40c can be used as the object, and the user terminal 40a can be fixed using a bracket 40b. After the user terminal 40a collects the video data of the anchor user 40c, it can obtain the image frame 40g from the video data and input the image frame 40g into the object detection model and the part detection model respectively.
  • The part joint points (object key points) of the anchor user 40c contained in the image frame 40g can be predicted by the object detection model, and these predicted part joint points can be used as the object posture detection result of the image frame 40g.
  • The palm key points of the anchor user 40c contained in the image frame 40g can be predicted through the part detection model (the default first object part here is the palm, and the palm key points can also be called part key points), and these predicted palm key points can be used as the part pose detection result of the image frame 40g.
  • The object posture detection result and the part posture detection result here may be marked in the image frame 40g (as shown in the image 40h), where the area 40i and the area 40j in the image 40h represent the above-mentioned part posture detection result.
  • The human body posture 40k of the anchor user 40c in the image frame 40g can be obtained. Since the image frame 40g contains only the upper body of the anchor user 40c, the human body posture 40k is not the complete human body posture of the anchor user 40c.
  • The standard pose (the complete default pose of the human body) can be obtained, and the joint points of the human body pose 40k can be interpolated through the standard pose to complete the missing joint points in the human body pose 40k and obtain the overall human body pose 40m (the global posture) of the anchor user 40c.
  • The virtual object in the live broadcast room can be driven by the overall human body posture 40m, so that the virtual object in the live broadcast room has the same overall human body posture 40m as the anchor user 40c.
  • For users who enter the live broadcast room to watch the live broadcast, the display page of the live broadcast room where the virtual object is located can be displayed on the user terminal 40d that they use.
  • the display page of the live broadcast room can include an area 40e and an area 40f.
  • the area 40e can be used to play virtual objects.
  • the area 40f can be used to post barrages, etc.
  • Users who enter the live broadcast room to watch the live broadcast can only see the video of the virtual object and hear the voice data of the anchor user 40c, but cannot see the video data of the anchor user 40c. This protects the personal information of the anchor user 40c while the virtual object achieves the same live broadcast effect as the anchor user 40c.
  • the above-mentioned global posture of the object in the video data can be applied in the content review scenario.
  • When the global posture is the same as the posture in the content review system, it can be determined that the review result of the object in the content review system is a review-passed result, and access rights to the content review system can be set for the object; that is, after the global posture passes the review in the content review system, the object has access rights to the content review system.
  • Figure 6 is a schematic diagram of an application scenario of a global posture provided by an embodiment of the present application.
  • user A (subject) can send a verification request to server 50d through user terminal 50a.
  • After receiving the verification request sent by user terminal 50a, server 50d can obtain the identity verification method for user A and return the identity verification method to the user terminal 50a, and the verification box 50b may be displayed in the terminal screen of the user terminal 50a.
  • User A can face the verification box 50b in the user terminal 50a and make a specific action (for example, raising a hand, kicking a leg, placing arms on hips, etc.). The user terminal 50a can collect the image to be verified 50c in the verification box 50b in real time (which can be considered to be the above image frame) and send the image to be verified 50c collected in real time to the server 50d.
  • the server 50d can obtain the image to be verified 50c sent by the user terminal 50a, and obtain the gesture 50e set by user A in the content review system in advance. This gesture 50e can be used as the verification information of user A in the content review system.
  • The server 50d can use the object detection model, the part detection model and the standard posture to perform posture estimation on the image to be verified 50c to obtain the global posture of user A in the image to be verified 50c, and compare the global posture corresponding to the image to be verified 50c with the posture 50e.
  • If the similarity between the global posture of the image to be verified 50c and the posture 50e is greater than or equal to the similarity threshold (for example, the similarity threshold can be set to 90%), it can be determined that the global posture of the image to be verified 50c is the same as the posture 50e, and user A passes the review in the content review system.
  • If the similarity between the global posture of the image to be verified 50c and the posture 50e is less than the similarity threshold, it can be determined that the global posture of the image to be verified 50c is not the same as the posture 50e and that user A has failed the review in the content review system, and action error prompt information is returned to the user terminal 50a; the action error prompt information is used to prompt user A to perform the identity verification again.
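  • The threshold comparison described for the server 50d could be sketched as follows. The similarity metric (`pose_similarity`, a cosine similarity of root-centered coordinates) and the sample poses are hypothetical; the text only states that a similarity is compared with a threshold such as 90%.

```python
import math


def pose_similarity(pose_a, pose_b):
    """Cosine similarity between two poses given as lists of (x, y, z) key points.

    Each pose is centered on its first key point before comparison.
    This metric is an assumption; the text does not say how similarity is computed.
    """
    def centered(pose):
        rx, ry, rz = pose[0]
        return [c for x, y, z in pose for c in (x - rx, y - ry, z - rz)]

    a, b = centered(pose_a), centered(pose_b)
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def review_passes(global_pose, stored_pose, threshold=0.90):
    """Pass when the similarity reaches the threshold; below it the review fails."""
    return pose_similarity(global_pose, stored_pose) >= threshold


stored = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
same_action = [(2.0, 2.0, 0.0), (2.0, 3.0, 0.0), (3.0, 3.0, 0.0)]   # shifted copy
other_action = [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0), (-1.0, -1.0, 0.0)]

print(review_passes(same_action, stored), review_passes(other_action, stored))
```

  • Centering on the first key point makes the comparison insensitive to where the user stands in the verification box, which seems a reasonable property for this scenario but is a design choice of the sketch.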
  • The object pose detection result for the object and the part pose detection result for the first object part of the object can be obtained. Estimating the pose of the object in the image frame based on the object pose detection result, the part pose detection result and the standard pose can compensate for the object key points missing in the image frame, ensuring the integrity and rationality of the final global pose of the object and thus improving the accuracy of global pose estimation.
  • Figure 7 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
  • the data processing method may include the following steps S201 to S208:
  • Step S201 Input the image frame to the object detection model, obtain the object posture features corresponding to the object in the image frame through the object detection model, and identify the first classification result corresponding to the object posture features; the first classification result is used to characterize the object part category corresponding to the key points of the object.
  • The computer device can select an image frame from the video data, input the image frame to the trained object detection model, and obtain the object posture features corresponding to the object in the image frame through the object detection model. The classifier of the object detection model can then output the first classification result corresponding to the object posture features.
  • The first classification result can be used to characterize the object part category corresponding to the key points of the object (for example, human body joints).
  • The object posture features can be object description features for the object extracted through the object detection model, or may be fusion features between the object description features and the part description features corresponding to the object.
  • When the object posture features are the object description features corresponding to the object in the image frame, this indicates that part-aware block learning is not introduced in the feature extraction process of the image frame. When the object posture features are the fusion features between the object description features and the part description features corresponding to the object in the image frame, this indicates that part-aware block learning is introduced in the feature extraction process, so that the object posture features include both the local posture features (part description features) of the various parts of the object contained in the image frame and the object description features of the object contained in the image frame. This gives the object posture features a finer granularity and can improve the accuracy of the object pose detection result.
  • The computer device can input the image frame to the object detection model, obtain the object description features corresponding to the object in the image frame in the object detection model, and output the second classification result corresponding to the object description features according to the classifier in the object detection model; obtain the object convolution feature for the image frame output by the convolution layer in the object detection model, and perform a product operation on the second classification result and the object convolution feature to obtain the second activation map corresponding to the image frame; perform block processing on the image frame according to the second activation map to obtain M object part area images, and obtain the part description features corresponding to the M object part area images according to the object detection model, where M is a positive integer; and combine the object description features and the part description features corresponding to the M object part area images into the object posture features.
  • Object description features can be considered as feature representations extracted from the image frame to characterize the object. The second classification result can also be used to represent the object part category corresponding to the object key points contained in the image frame.
  • The convolution layer can refer to the last convolution layer in the object detection model, and the object convolution feature can represent the convolution features for the image frame output by the last convolution layer of the object detection model.
  • The second activation map can be the class activation map (Class Activation Mapping, CAM) corresponding to the image frame.
  • CAM is a tool for visualizing image features.
  • By weighting the object convolution feature with the second classification result, the second activation map can be obtained. The second activation map can be considered as the result of visualizing the object convolution feature output by the convolution layer, and can be used to characterize the image pixel area that the object detection model focuses on.
  • The computer device can use the class activation map (second activation map) of each object key point in the image frame as prior information on region location and perform block processing on the image frame; that is, the image frame can be cropped according to the second activation map to obtain object part area images that each contain a single part. Feature extraction can then be performed on each object part area image through the object detection model to obtain the part description features corresponding to each object part area image.
  • The aforementioned object description features and the part description features corresponding to each object part area image can be combined into the object posture features for the object; the part description features can be considered as feature representations extracted from an object part area image to characterize the object part.
  • Step S202 Generate a first activation map according to the first classification result and the object convolution feature of the image frame output by the object detection model.
  • the computer device may multiply the first classification result and the object convolution feature of the image frame to generate the first activation map.
  • The first activation map and the second activation map are both class activation maps for the image frame. The first activation map uses the first classification result as the weight of the object convolution feature output by the convolution layer (by default, the first classification result here combines object description features and part description features), whereas the second activation map uses the second classification result as the weight of the object convolution feature output by the convolution layer, and the second classification result is related only to the object description features.
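  • A minimal sketch of how a classification result can weight the convolution features to form an activation map is given below. It uses the usual class activation map formulation (a per-channel weighted sum); the text itself only says that the two are multiplied, so the exact operation, the function name `class_activation_map` and the toy numbers are assumptions.

```python
def class_activation_map(conv_features, class_weights):
    """Weight each channel of the last conv layer's output and sum over channels.

    conv_features: list of C feature maps, each an H x W nested list.
    class_weights: list of C weights taken from the classification result.
    """
    height, width = len(conv_features[0]), len(conv_features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for feature_map, weight in zip(conv_features, class_weights):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * feature_map[r][c]
    return cam


features = [
    [[1.0, 0.0], [0.0, 0.0]],   # channel 0 responds at the top-left
    [[0.0, 0.0], [0.0, 2.0]],   # channel 1 responds at the bottom-right
]
# Same convolution features, two different weightings.
first_map = class_activation_map(features, [0.5, 1.0])
second_map = class_activation_map(features, [1.0, 0.0])
print(first_map, second_map)
```

  • The two calls share the same convolution features and differ only in the weights, which mirrors the relationship between the first and second activation maps.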
  • Step S203 Obtain the pixel average value corresponding to the first activation map, determine the positioning result of the object key points in the image frame based on the pixel average value, and determine the object posture detection result corresponding to the image frame according to the object part category and the positioning result.
  • The computer device can take the pixel average of the first activation map and determine the pixel average as the positioning result of the object key points in the image frame. According to the object part category and the positioning result, the object skeleton of the object in the image frame can be determined, and the object skeleton can be used as the object pose detection result corresponding to the object in the image frame.
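  • The text does not spell out how the pixel average yields a location. One possible reading, shown below purely as an assumption, treats it as the activation-weighted mean of pixel coordinates; `activation_centroid` is a hypothetical helper, not a name from the text.

```python
def activation_centroid(activation_map):
    """Activation-weighted mean of pixel coordinates, returned as (row, col)."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(activation_map):
        for c, value in enumerate(row):
            weight = max(value, 0.0)      # negative activations are ignored
            total += weight
            row_sum += weight * r
            col_sum += weight * c
    if total == 0.0:
        return None                       # nothing activated, no location
    return (row_sum / total, col_sum / total)


located = activation_centroid([[0.0, 0.0, 0.0],
                               [0.0, 1.0, 1.0],
                               [0.0, 0.0, 0.0]])
print(located)
```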
  • Figure 8 is a schematic structural diagram of an object detection model provided by an embodiment of the present application.
  • After the computer device obtains the image frame 60a, it can input the image frame 60a to the object detection model and use the feature extraction component 60b in the object detection model (for example, the feature extraction network can be a convolutional network) to obtain the object description feature 60c corresponding to the object in the image frame 60a. The object description feature 60c is processed using global average pooling (there can be multiple object description features, and global average pooling converts each object description feature into a single value) and an activation function, and the processed result is classified to obtain the second classification result. The second classification result is used to weight the object convolution features output by the last convolution layer in the feature extraction component 60b to obtain the second activation map.
  • The image frame 60a is segmented based on the second activation map to obtain M object part area images 60f.
  • The M object part area images 60f are sequentially input to the feature extraction component 60b in the object detection model, and the part description features 60g respectively corresponding to the M object part area images 60f can be obtained.
  • The M part description features 60g are combined with the object description feature 60c of the image frame 60a to obtain the object posture features. By identifying the object posture features, a first classification result 60d can be obtained, and the first classification result 60d is used to weight the object convolution features output by the last convolutional layer in the feature extraction component 60b to obtain the first activation map 60e.
  • The pixel average value of the first activation map 60e can be used as the positioning result of the object in the image frame 60a, from which the object posture detection result corresponding to the object in the image frame 60a is obtained.
  • FIG. 9 is a schematic flowchart of obtaining an object posture detection result according to an embodiment of the present application.
  • The computer device can input the image frame 70a into the human body three-dimensional posture estimation model, through which the human body three-dimensional key points of the object (the object at this time is the human body) in the image frame 70a can be obtained.
  • Each human body three-dimensional key point can correspond to a position coordinate and a first confidence level. The first confidence level indicates how likely it is that a detected human body three-dimensional key point is a real human body key point. If the first confidence level is greater than the first confidence threshold (which can be set according to actual needs), the human body three-dimensional key point can be considered a real human body key point (for example, the human body three-dimensional key points represented by x4 to x16).
  • By connecting the real human body key points, the human body posture 70c can be obtained (which can also be considered as the object posture detection result).
  • The human body three-dimensional key points whose first confidence level is less than or equal to the first confidence threshold are abnormal key points, and these abnormal key points can be compensated in subsequent processing to obtain more accurate human body key points.
  • The position coordinates of the human body three-dimensional key points can refer to the spatial coordinates in this spatial coordinate system.
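  • The split between real and abnormal key points by the first confidence threshold can be sketched as below. The key point names, coordinates, confidence values and the threshold of 0.5 are made up for illustration.

```python
def split_by_confidence(key_points, first_threshold):
    """Separate trusted key points from abnormal ones.

    key_points maps a name to ((x, y, z), first_confidence).
    """
    real, abnormal = {}, {}
    for name, (position, confidence) in key_points.items():
        if confidence > first_threshold:
            real[name] = position
        else:                       # less than or equal to the threshold
            abnormal[name] = position
    return real, abnormal


# Hypothetical detections: names, coordinates and confidences are illustrative.
detected = {
    "x4": ((0.0, 1.5, 0.1), 0.95),
    "x5": ((0.2, 1.4, 0.1), 0.80),
    "x17": ((0.1, 0.3, 0.0), 0.20),
}
real, abnormal = split_by_confidence(detected, first_threshold=0.5)
print(sorted(real), sorted(abnormal))
```

  • Only the key points in `real` would be connected into the detected posture; those in `abnormal` are left for the later compensation step.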
  • Step S204 The image frame is input to the part detection model, and the first object part of the object in the image frame is detected in the part detection model.
  • The computer device can also input the image frame to the part detection model, which first detects whether the image frame contains the first object part of the object.
  • The part detection model is used to detect the key points of the first object part, so the first object part must first be detected in the image frame. If the first object part of the object is not detected in the image frame, the part posture detection result corresponding to the image frame can be directly determined to be a null value, and there is no need to perform the subsequent steps of detecting key points of the first object part.
  • Step S205 If the first object part is detected in the image frame, obtain a region image containing the first object part from the image frame, obtain the part key point positions corresponding to the first object part according to the region image, and determine the part pose detection result corresponding to the image frame based on the part key point positions.
  • The position area of the first object part in the image frame can be determined, and the image frame can be cropped based on this position area to obtain a region image containing the first object part.
  • Features can be extracted from the region image to obtain the part contour features corresponding to the first object part in the region image.
  • The part key point positions corresponding to the first object part can be predicted; based on the part key point positions, the key points of the first object part can be connected to obtain the part posture detection result corresponding to the image frame.
  • FIG. 10 is a schematic flowchart of obtaining part posture detection results according to an embodiment of the present application.
  • The computer device can input the image frame 80a into the palm three-dimensional posture estimation model, which can detect whether the image frame 80a contains the palm of the object (the first object part).
  • If the palm is not detected in the image frame 80a, it can be determined that the part posture detection result corresponding to the image frame 80a is null. If the palm is detected in the image frame 80a, the regions containing the palm can be determined in the image frame 80a (for example, region 80c and region 80d in image 80b, where region 80c contains the subject's right palm and region 80d contains the subject's left palm), and the three-dimensional key points of the palm in region 80c and region 80d can be detected through the palm three-dimensional pose estimation model.
  • Multiple possible areas can be obtained through the palm three-dimensional posture estimation model, and a second confidence level that the area contains the palm is predicted for each possible area. An area whose second confidence level is greater than the second confidence threshold (which can be the same as or different from the aforementioned first confidence threshold, and is not limited here) is determined as an area containing the palm; for example, the second confidence levels corresponding to the area 80c and the area 80d are both greater than the second confidence threshold.
  • the right palm pose 80e can be obtained by connecting the palm key points detected in the area 80c
  • the left palm pose 80f can be obtained by connecting the palm key points detected in the area 80d.
  • The above left palm posture 80f and right palm posture 80e can be called the part posture detection result corresponding to the image frame 80a.
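  • The flow of the part detection model (filter candidate areas by the second confidence, crop, then predict key points, or return a null result) might be sketched like this. All function names, the box format `(left, top, right, bottom)` and the stand-in key point predictor are assumptions made for the example.

```python
def select_palm_regions(candidates, second_threshold):
    """Keep candidate areas whose second confidence exceeds the threshold."""
    return [box for box, confidence in candidates if confidence > second_threshold]


def crop(image, box):
    """Cut a (left, top, right, bottom) box out of an image stored as rows of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def part_pose_detection(image, candidates, second_threshold, predict_key_points):
    """Return one key point result per accepted area, or None when no palm is found."""
    boxes = select_palm_regions(candidates, second_threshold)
    if not boxes:
        return None                 # null result: key point detection is skipped
    return [predict_key_points(crop(image, box)) for box in boxes]


image = [[r * 10 + c for c in range(6)] for r in range(4)]   # toy 4 x 6 image
candidates = [((0, 0, 2, 2), 0.9), ((3, 1, 6, 4), 0.8), ((2, 2, 4, 4), 0.1)]


def fake_head(area):
    """Stand-in for the key point predictor: reports the cropped area's size."""
    return (len(area), len(area[0]))


result = part_pose_detection(image, candidates, 0.5, fake_head)
print(result)
```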
  • Step S206 Obtain the standard posture associated with the object, determine the number of first key points corresponding to the standard posture, and determine the number of second key points corresponding to the object posture detection result.
  • The computer device can obtain the standard posture corresponding to the object, and count the number of first key points (the number of object key points contained in the standard posture) and the number of second key points (the number of object key points contained in the object posture detection result).
  • The number of first key points is known when constructing the standard posture.
  • The number of second key points is the number of object key points predicted by the object detection model.
  • Step S207 When the number of first key points is greater than the number of second key points, the object posture detection result is interpolated according to the standard posture to obtain the first candidate object posture.
  • the number of first key points is greater than the number of second key points, it means that there are missing object key points in the object posture detection results, and key point compensation (interpolation processing) can be performed on the object posture detection results through standard postures to improve the Missing object key points to obtain the first candidate object pose corresponding to the object.
  • After key point compensation (interpolation processing), the human body posture 20m can be obtained.
  • the human body posture 20m at this time can be called the first candidate object posture.
  • If the object posture detection result predicted by the object detection model is missing key points such as the knees, ankles, feet, and elbows, the object posture detection result can be interpolated using the standard posture, for example by adding the missing object key points, to obtain a more reasonable first candidate object pose. Interpolating the object pose detection result using the standard pose can improve the integrity and rationality of the object pose.
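Steps S206 and S207 can be illustrated with a small sketch. It assumes, purely for illustration, that a pose is a dictionary from key point names to 3D coordinates and that a missing key point is filled by copying its position from the standard posture; the source does not specify how the interpolated positions are computed, and `interpolate_from_standard` is a hypothetical helper.

```python
def interpolate_from_standard(detected, standard):
    """Fill key points missing from `detected` using the standard posture.

    Both arguments map a key point name to an (x, y, z) tuple. Copying the
    standard position directly is a simplifying assumption of this sketch.
    """
    first_count = len(standard)    # number of first key points
    second_count = len(detected)   # number of second key points
    completed = dict(detected)
    if first_count > second_count:
        for name, position in standard.items():
            # Only add key points that the detector did not predict.
            completed.setdefault(name, position)
    return completed


standard_pose = {
    "head": (0.0, 1.7, 0.0),
    "shoulder": (0.0, 1.4, 0.0),
    "knee": (0.0, 0.5, 0.0),
    "ankle": (0.0, 0.1, 0.0),
}
detected_pose = {"head": (0.1, 1.6, 0.0), "shoulder": (0.1, 1.3, 0.0)}
first_candidate = interpolate_from_standard(detected_pose, standard_pose)
```

Detected key points are left untouched; only the knee and ankle are taken from the standard posture in this example.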
  • Step S208 Interpolate the object parts associated with the first object part in the first candidate object pose according to the part pose detection results to obtain the global pose corresponding to the object.
  • The posture change of the object depends to a large extent on a few parts of the object; that is, some specific parts of the object (for example, the arm in the human body structure, which can include key points on the palm, wrist, elbow, etc.) play an important role in the final result. Therefore, in the embodiment of the present application, the object parts associated with the first object part in the first candidate object pose can be interpolated based on the part pose detection result to obtain the global posture corresponding to the object.
  • If the part pose detection result is a null value (that is, the image frame does not contain the first object part), the first candidate object pose may be directly determined as the global pose corresponding to the object.
  • the key point for the elbow part can be predicted through the object detection model.
  • The key points of the elbow and wrist of the object can be determined based on the part pose detection result, and the elbow and wrist key points can be added to the first candidate object pose to obtain the global posture corresponding to the object.
  • the object includes a second object part and a third object part, and the second object part and the third object part are relatively symmetrical.
  • For example, if the second object part is the object's right arm, the third object part is the object's left arm; if the second object part is the object's left leg, the third object part is the object's right leg, and so on.
  • If the part pose detection result includes all the key points of the first object part (if the first object part is the palm, it is assumed here that the part pose detection result includes the key points of the left and right palms), the object pose detection result includes the posture of the second object part, and the object pose detection result does not include the posture of the third object part (that is, the image frame contains the second object part but not the third object part), the first part direction corresponding to the third object part can be determined according to the key point position of the first object part included in the part pose detection result; the second object part and the third object part are symmetric parts of the object.
  • Because the two parts are symmetric, the length of the second object part is the same as the length of the third object part. The first part length of the second object part in the first candidate object posture can therefore be obtained, and the key point position of the third object part is determined according to the first part length and the first part direction; the key point position of the third object part is then added to the first candidate object pose to obtain the global pose corresponding to the object in the image frame.
  • If the object pose detection result includes the posture of neither the second object part nor the third object part, the second part direction corresponding to the second object part and the third part direction corresponding to the third object part can be determined according to the key point position of the first object part included in the part pose detection result. The second part length corresponding to the second object part and the third part length corresponding to the third object part can then be obtained from the (i-1)-th image frame; in other words, the length of the second object part in the previous image frame is used as the length of the second object part in the image frame, and the length of the third object part in the previous image frame is used as the length of the third object part in the image frame.
  • The key point position of the second object part can be determined according to the second part length and the second part direction, and the key point position of the third object part can be determined according to the third part length and the third part direction; the key point positions of the second object part and the third object part are then added to the first candidate object pose to obtain the global pose corresponding to the object in the image frame. If the (i-1)-th image frame also does not contain the second object part and the third object part, the search can continue backward to obtain the lengths of the second object part and the third object part in the (i-2)-th image frame, in order to determine the key point positions of the second object part and the third object part in the image frame.
  • If the second object part and the third object part are not detected in the image frame or in the preceding image frames, an approximate length may be set for the second object part and the third object part according to the first candidate object posture, in order to determine the key point positions of the second object part and the third object part in the image frame.
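The look-back for a part length described above can be sketched as a simple search over earlier frames. The function name `resolve_part_length`, the per-frame dictionary layout, and the numeric values are assumptions made for this illustration; frames are numbered from 1, matching the statement that i is a positive integer.

```python
def resolve_part_length(lengths_by_frame, i, approximate_length):
    """Return the part length to use for the i-th image frame.

    `lengths_by_frame` maps a frame index to the part length measured in that
    frame, or to None when the part was not detected there. Frames i-1, i-2,
    ... are searched in that order; if none of them has a measurement, the
    approximate length (for example, one derived from the first candidate
    object pose) is used instead.
    """
    for j in range(i - 1, 0, -1):
        length = lengths_by_frame.get(j)
        if length is not None:
            return length
    return approximate_length


# Illustrative values: the forearm was last measured in frame 2.
history = {1: 0.24, 2: 0.25, 3: None}
length_for_frame_4 = resolve_part_length(history, i=4, approximate_length=0.3)
length_without_history = resolve_part_length({}, i=4, approximate_length=0.3)
```

The same helper would be called once for the second object part and once for the third object part, since each has its own length history.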
  • For example, if the object is a human body, the first object part is the palm, and the second object part and the third object part are the left and right arms respectively, the direction of the left forearm can be calculated from the key points of the left palm, and the direction of the right forearm can be calculated from the key points of the right palm, where the left forearm is part of the left arm and the right forearm is part of the right arm.
  • The lengths of the left and right forearms (the second part length and the third part length) in an image frame before the image frame (for example, the (i-1)-th image frame) can be used as the lengths of the left and right forearms in the image frame. If the left and right arms are detected in neither the image frame nor the previous image frames, reference lengths for the left and right forearms in the image frame can be given by referring to the shoulder length in the image frame. If either of the left and right arms (for example, the left arm) is detected in the image frame, the length of the left forearm (the first part length) can be directly assigned to the right forearm.
  • For example, suppose the right wrist point A and the right palm point B are detected, and the right elbow point C is missing;
  • the direction of the right forearm can be expressed as the direction from right palm point B to right wrist point A, which can be recorded as the vector BA;
  • the length of the left forearm can be used as the length from right wrist point A to right elbow point C, which can be recorded as L.
  • The elbow points predicted by the object detection model can be adjusted and updated based on the detected palm key points, which can improve the accuracy of the elbow points and in turn improve the rationality of the overall posture.
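One plausible way to turn the direction BA and the length L into an elbow position is to step from the wrist along the normalized direction, C = A + L * BA / |BA|. This formula is an assumption of the sketch below (the text here only names the direction and the length), and `estimate_elbow` is a hypothetical helper.

```python
import math


def estimate_elbow(wrist_a, palm_b, forearm_length):
    """Estimate the elbow point C from wrist point A, palm point B and length L.

    The forearm direction is the vector from B to A; the elbow is placed at
    distance `forearm_length` from the wrist along that direction.
    """
    direction = [a - b for a, b in zip(wrist_a, palm_b)]  # vector BA
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("wrist and palm points coincide; direction is undefined")
    return tuple(a + forearm_length * d / norm for a, d in zip(wrist_a, direction))


# Illustrative coordinates: the palm lies 0.1 below the wrist on the y axis.
elbow_c = estimate_elbow(wrist_a=(0.0, 0.0, 0.0), palm_b=(0.0, -0.1, 0.0), forearm_length=0.25)
```

In this example the elbow lands 0.25 above the wrist on the y axis, continuing the line from palm to wrist.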
  • The posture obtained by interpolating the first candidate object posture based on the part posture detection result may contain some unreasonable object key points. These unreasonable object key points can therefore be corrected in combination with the standard posture to obtain the final global pose of the object.
  • The computer device can determine the first candidate object pose to which the key point position of the third object part has been added as the second candidate object pose, and then obtain the posture offset between the standard posture and the second candidate object posture. If the posture offset is greater than the offset threshold (which can be understood as the maximum angle by which the object can be offset under normal circumstances), key point correction is performed on the second candidate object posture based on the standard posture to obtain the global posture corresponding to the object in the image frame.
  • The posture offset above can be understood as the relative angle between the second candidate object posture and the standard posture.
  • For example, the posture offset can be the angle between the shoulders of the second candidate object posture and the shoulders of the standard posture.
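The offset check can be sketched as an angle comparison between shoulder lines. The dictionary keys `left_shoulder` and `right_shoulder`, the 30-degree threshold, and the choice to correct by copying the standard shoulder points are all assumptions of this illustration; the source does not specify how the correction itself is carried out.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cosine))


def shoulder_vector(pose):
    """Vector from the left shoulder key point to the right shoulder key point."""
    return tuple(r - l for l, r in zip(pose["left_shoulder"], pose["right_shoulder"]))


def correct_shoulders(candidate, standard, threshold_deg):
    """Return (pose, offset); shoulders are replaced only if offset > threshold."""
    offset = angle_deg(shoulder_vector(candidate), shoulder_vector(standard))
    corrected = dict(candidate)
    if offset > threshold_deg:
        corrected["left_shoulder"] = standard["left_shoulder"]
        corrected["right_shoulder"] = standard["right_shoulder"]
    return corrected, offset


standard = {"left_shoulder": (0.0, 0.0, 0.0), "right_shoulder": (1.0, 0.0, 0.0)}
second_candidate = {"left_shoulder": (0.0, 0.0, 0.0), "right_shoulder": (1.0, -1.0, 0.0)}
global_pose, offset = correct_shoulders(second_candidate, standard, threshold_deg=30.0)
```

Here the collapsed shoulder line deviates by about 45 degrees from the standard one, so the shoulder key points are corrected; with a larger threshold the candidate would be returned unchanged.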
  • Figure 11 is a schematic diagram of correction of object key points provided by an embodiment of the present application.
  • A human body model 90b can be constructed based on the second candidate object posture. Limited by the performance of the object detection model, the area 90c (for example, the shoulder area) in the human body model 90b obviously has a collapse problem compared with the normal human body structure (for example, the standard posture); for example, the angle between the shoulders of the second candidate object posture and the shoulders of the standard posture is greater than the offset threshold.
  • The computer device can correct the human body model 90b through the standard posture to obtain the human body model 90d; the area 90e in the human body model 90d can be considered the result of correcting the area 90c, and the human posture corresponding to the human body model 90d can be called the global posture corresponding to the object in the image frame 90a.
  • Video data captured in mobile scenarios usually cannot contain the entire object, so the posture of the object predicted by the object detection model is incomplete. Through key point interpolation, key point correction, and other processing, the rationality of the global posture can be improved; through the part pose detection result, the object key point positions associated with the first object part can be calculated, which can improve the accuracy of the global pose.
  • Figure 12 is a schematic flowchart of object pose estimation provided by an embodiment of the present application. As shown in Figure 12, assuming that the object is a human body, after the computer device obtains the video data or image data captured in the mobile scene, it can obtain a human body three-dimensional pose estimation model (object detection model) with confidence and a palm three-dimensional pose estimation model (part detection model) with confidence.
  • The human body three-dimensional pose estimation model can be used to predict the three-dimensional key points of the human body in any image frame, and these three-dimensional key points of the human body can constitute the object pose detection result;
  • the palm three-dimensional pose estimation model can be used to predict the three-dimensional key points of the palm in any image frame, and these three-dimensional key points of the palm can constitute the part pose detection result.
  • The three-dimensional key points of the human body predicted by the human body three-dimensional pose estimation model can be interpolated to complete the missing human body key points; the three-dimensional key points of the palm can also be combined with the three-dimensional key points of the human body to interpolate the elbow and wrist of the human body (object), to obtain the candidate human body posture (the second candidate object posture above).
  • The collection of the user's video may be involved; in that case, the user's permission or consent needs to be obtained, and the collection, use, and processing of the relevant data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
  • The object pose detection result for the object and the part pose detection result for the first object part of the object can be obtained, and the pose of the object in the image frame can then be estimated based on the object pose detection result, the part pose detection result, and the standard posture. This can compensate for the missing key points of the object in the image frame and correct the object key points that do not conform to the standard posture, which ensures the integrity and rationality of the final global pose of the object and thereby improves the estimation accuracy of the global pose.
  • Figure 13 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device 1 may include: a posture detection module 11 and a posture estimation module 12;
  • the posture detection module 11 is used to obtain the object posture detection result corresponding to the object in the image frame and the part posture detection result corresponding to the first object part of the object in the image frame; wherein at least one object part of the object is missing from the object posture detection result, and the first object part is one or more parts of the object;
  • the posture estimation module 12 is configured to perform interpolation processing on at least one object part missing in the object posture detection result according to the part posture detection result and the standard posture associated with the object, to obtain the global posture corresponding to the object, wherein the global posture is used to control the computer device to implement the business function corresponding to the global posture.
  • The object pose detection result for the object and the part pose detection result for the first object part of the object can be obtained, and the pose of the object in the image frame can then be estimated based on the object pose detection result, the part pose detection result, and the standard posture. The missing key points of the object in the image frame can thus be compensated, ensuring the integrity and rationality of the final global pose of the object, which can improve the accuracy of global pose estimation.
  • Figure 14 is a schematic structural diagram of another data processing device provided by an embodiment of the present application.
  • the data processing device 2 includes: a posture detection module 21, a posture estimation module 22, and a virtual object construction module 23;
  • the posture detection module 21 is used to obtain the object posture detection result corresponding to the object in the image frame and the part posture detection result corresponding to the first object part of the object in the image frame; wherein at least one object part of the object is missing from the object posture detection result, and the first object part is one or more parts of the object;
  • the posture estimation module 22 is configured to interpolate at least one object part missing in the object posture detection result according to the part posture detection result and the standard posture associated with the object, to obtain the global posture corresponding to the object.
  • the virtual object construction module 23 is used to construct a virtual object associated with the object, and control the posture of the virtual object according to the global posture.
  • the specific function implementation methods of the posture detection module 21, the posture estimation module 22, and the virtual object construction module 23 can be found in the description of the above-mentioned relevant steps, and will not be described again here.
  • the posture detection module 21 includes: an object detection unit 211 and a part detection unit 212;
  • the object detection unit 211 is used to input the image frame to the object detection model, and obtain the object posture detection result through the object detection model;
  • the part detection unit 212 is used to input the image frame to the part detection model, and obtain the part posture detection result through the part detection model.
  • For the specific functional implementation of the object detection unit 211 and the part detection unit 212, please refer to step S101 in the embodiment corresponding to FIG. 3, which will not be described again here.
  • the object detection unit 211 may include: a part classification subunit 2111, a part map generation subunit 2112, a positioning result determination subunit 2113, and a detection result determination subunit 2114;
  • the part classification subunit 2111 is used to input the image frame to the object detection model, obtain the object posture features corresponding to the object in the image frame through the object detection model, and identify the first classification result corresponding to the object posture features; the first classification result is used to characterize the object part category corresponding to the key points of the object;
  • the part map generation subunit 2112 is configured to generate a first activation map according to the first classification result and the object convolution feature of the image frame output by the object detection model;
  • the positioning result determination subunit 2113 is used to obtain the pixel average value corresponding to the first activation map, and determine the positioning result of the key point in the object in the image frame based on the pixel average value;
  • the detection result determination subunit 2114 is used to determine the object posture detection result corresponding to the image frame according to the object part category and positioning result.
  • For the specific functional implementation of the part classification subunit 2111, the part map generation subunit 2112, the positioning result determination subunit 2113, and the detection result determination subunit 2114, please refer to steps S201 to S203 in the embodiment corresponding to Figure 7, which will not be described again here.
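The role of the pixel average in the positioning result determination subunit 2113 can be illustrated with one possible reading: treat the average as a threshold and take the activation-weighted centroid of the pixels above it. This reading is an assumption of the sketch, not something the text states, and `locate_key_point` is a hypothetical helper that expects non-negative activations.

```python
def locate_key_point(activation_map):
    """Locate a key point as the weighted centroid of above-average pixels.

    `activation_map` is a list of rows of non-negative numbers. Returns an
    (x, y) position in pixel coordinates, or None if no pixel exceeds the
    pixel average (for example, when the map is constant).
    """
    pixels = [value for row in activation_map for value in row]
    mean = sum(pixels) / len(pixels)  # pixel average of the activation map
    total = x_sum = y_sum = 0.0
    for y, row in enumerate(activation_map):
        for x, value in enumerate(row):
            if value > mean:
                total += value
                x_sum += value * x
                y_sum += value * y
    if total == 0.0:
        return None
    return (x_sum / total, y_sum / total)


peak_map = [
    [0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
]
position = locate_key_point(peak_map)
```

For the single-peak map above, the located position is the center pixel.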
  • the part classification subunit 2111 includes: a global classification subunit 21111, a global map acquisition subunit 21112, a block processing subunit 21113, and a feature combination subunit 21114;
  • the global classification subunit 21111 is used to obtain the object description features corresponding to the objects in the image frame in the object detection model, and output the second classification result corresponding to the object description features according to the classifier in the object detection model;
  • the global map acquisition subunit 21112 is used to obtain the object convolution features output by the convolution layer in the object detection model for the image frame, and perform a product operation on the second classification result and the object convolution features to obtain the second activation map corresponding to the image frame;
  • Block processing subunit 21113 is used to perform block processing on the image frame according to the second activation map to obtain M object part area images, and obtain the part description features corresponding to the M object part area images according to the object detection model; M is a positive integer;
  • the feature combination subunit 21114 is used to combine the object description features and the part description features corresponding to the M object part area images into object posture features.
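The product operation used by the global map acquisition subunit 21112 resembles a class-activation-map style computation, in which each convolution channel is weighted by a classification score and the weighted channels are summed. The sketch below assumes that reading; the function name `activation_map` and the tiny 2x2 feature maps are illustrative only.

```python
def activation_map(class_weights, conv_features):
    """Weighted sum of convolution feature maps.

    `conv_features` is a list of C feature maps, each a list of H rows with W
    values; `class_weights` holds one weight per channel (assumed here to be
    derived from the classification result). Returns an H x W activation map.
    """
    if len(class_weights) != len(conv_features):
        raise ValueError("one weight per feature channel is required")
    height = len(conv_features[0])
    width = len(conv_features[0][0])
    return [
        [
            sum(w * fmap[y][x] for w, fmap in zip(class_weights, conv_features))
            for x in range(width)
        ]
        for y in range(height)
    ]


features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 responds at the top-left pixel
    [[0.0, 1.0], [0.0, 0.0]],  # channel 1 responds at the top-right pixel
]
second_activation_map = activation_map([1.0, 2.0], features)
```

Regions of the resulting map with high values could then guide the block processing that yields the M object part area images.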
  • the part detection unit 212 may include: an object part detection subunit 2121, a part pose estimation subunit 2122, and a null value determination subunit 2123;
  • the object part detection subunit 2121 is used to input the image frame to the part detection model, and detect the first object part of the object in the image frame in the part detection model;
  • the part pose estimation subunit 2122 is used to obtain a region image containing the first object part from the image frame if the first object part is detected in the image frame, obtain the part key point positions corresponding to the first object part according to the region image, and determine the part pose detection result corresponding to the image frame based on the part key point positions;
  • the null value determination subunit 2123 is used to determine that the part posture detection result corresponding to the image frame is a null value if the first object part is not detected in the image frame.
  • the part pose estimation subunit 2122 may include: an image cropping subunit 21221, a part key point determination subunit 21222, and a part key point connection subunit 21223;
  • Image clipping subunit 21221 used to clip the image frame if the first object part is detected in the image frame to obtain a region image containing the first object part
  • the part key point determination subunit 21222 is used to obtain the part contour features corresponding to the regional image, and predict the part key point position corresponding to the first object part according to the part contour features;
  • the part key point connection subunit 21223 is used to connect the key points of the first object part based on the part key point positions to obtain the part posture detection result corresponding to the image frame.
  • For the specific functional implementation of the image clipping subunit 21221, the part key point determination subunit 21222, and the part key point connection subunit 21223, please refer to step S205 in the embodiment corresponding to Figure 7, which will not be described again here.
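The flow through the part detection unit 212 (detect, crop, predict key points, or return a null value) can be sketched as below. The box format `(left, top, right, bottom)`, the `predict_key_points` callback, and the stub predictor are hypothetical stand-ins for the part detection model.

```python
def crop(frame, box):
    """Cut the region image out of a frame given as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


def detect_part_pose(frame, box, predict_key_points):
    """Return part key points in frame coordinates, or None as the null value.

    `box` is the detected area of the first object part, or None when the
    part was not detected. `predict_key_points` maps a region image to key
    points expressed in region coordinates.
    """
    if box is None:
        return None  # null value: the frame does not contain the first object part
    left, top, _, _ = box
    region = crop(frame, box)
    # Shift region coordinates back into the coordinate system of the frame.
    return [(x + left, y + top) for x, y in predict_key_points(region)]


def stub_predictor(region):
    """Stand-in predictor: returns the two corner pixels of the region."""
    return [(0, 0), (len(region[0]) - 1, len(region) - 1)]


frame = [[0] * 4 for _ in range(4)]
part_pose = detect_part_pose(frame, (1, 1, 3, 3), stub_predictor)
no_part = detect_part_pose(frame, None, stub_predictor)
```

Connecting the returned key points in a fixed order would then give the part pose detection result for the frame.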
  • the posture estimation module 22 includes: a key point number determination unit 221, a first interpolation processing unit 222, and a second interpolation processing unit 223;
  • the key point number determination unit 221 is used to obtain the standard posture associated with the object, determine the first key point number corresponding to the standard posture, and the second key point number corresponding to the object posture detection result;
  • the first interpolation processing unit 222 is configured to perform interpolation processing on at least one object part missing in the object posture detection result according to the standard posture to obtain the first candidate object posture if the number of first key points is greater than the number of second key points;
  • the second interpolation processing unit 223 is configured to interpolate, according to the part pose detection result, the object parts associated with the first object part in the first candidate object pose, to obtain the global posture corresponding to the object.
  • For the specific functional implementation of the key point number determination unit 221, the first interpolation processing unit 222, and the second interpolation processing unit 223, please refer to steps S206 to S208 in the embodiment corresponding to FIG. 7, which will not be described again here.
  • the second interpolation processing unit 223 may include: a first direction determination subunit 2231, a first position determination subunit 2232, and a first key point adding subunit 2233;
  • the first direction determination subunit 2231 is configured to determine, if the object posture detection result includes the posture of the second object part and does not include the posture of the third object part, the first part direction corresponding to the third object part according to the key point position of the first object part included in the part posture detection result; the second object part and the third object part are symmetrical parts of the object, and the second object part and the third object part are associated with the first object part;
  • the first position determination subunit 2232 is used to obtain the first part length of the second object part in the first candidate object posture, and determine the key point position of the third object part according to the first part length and the first part direction;
  • the first key point adding sub-unit 2233 is used to add the key point position of the third object part to the first candidate object pose to obtain the global pose corresponding to the object in the image frame.
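  • the first key point adding subunit 2233 is specifically used for: determining the first candidate object pose to which the key point position of the third object part has been added as the second candidate object pose, and obtaining the posture offset between the standard posture and the second candidate object posture; if the posture offset is greater than the offset threshold, performing key point correction on the second candidate object posture based on the standard posture to obtain the global posture corresponding to the object in the image frame.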
  • the image frame is the i-th image frame in the video data, and i is a positive integer;
  • the second interpolation processing unit 223 may also include: a second direction determination subunit 2234, a second position determination subunit 2235, and a second key point adding subunit 2236;
  • the second direction determination subunit 2234 is configured to determine, if the object posture detection result does not include the postures of the second object part and the third object part, the second part direction corresponding to the second object part and the third part direction corresponding to the third object part according to the key point position of the first object part included in the part posture detection result; the second object part and the third object part are symmetrical parts of the object, and the second object part and the third object part are associated with the first object part;
  • the second position determination subunit 2235 is used to obtain the second part length corresponding to the second object part and the third part length corresponding to the third object part in the j-th image frame, and determine the key point position of the second object part according to the second part length and the second part direction; where j is a positive integer and j is less than i;
  • the second key point adding subunit 2236 is used to determine the key point position of the third object part according to the third part length and the third part direction, and add the key point position of the second object part and the key point position of the third object part to the first candidate object pose to obtain the global pose corresponding to the object in the image frame.
  • For the specific functional implementation of the first direction determination subunit 2231, the first position determination subunit 2232, the first key point adding subunit 2233, the second direction determination subunit 2234, the second position determination subunit 2235, and the second key point adding subunit 2236, please refer to step S208 in the embodiment corresponding to Figure 7, which will not be described again here.
  • When the first direction determination subunit 2231, the first position determination subunit 2232, and the first key point adding subunit 2233 perform their corresponding operations, the second direction determination subunit 2234, the second position determination subunit 2235, and the second key point adding subunit 2236 all suspend execution; when the second direction determination subunit 2234, the second position determination subunit 2235, and the second key point adding subunit 2236 perform their corresponding operations, the first direction determination subunit 2231, the first position determination subunit 2232, and the first key point adding subunit 2233 all suspend execution.
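Because the two groups of subunits never run at the same time, the choice between them can be viewed as a simple dispatch on which symmetric parts were detected. The function and the branch labels below are hypothetical names used only for this sketch; treating "either one of the two parts detected" as the first branch follows the left and right arm example given earlier and is an assumption.

```python
def choose_interpolation_branch(has_second_part, has_third_part):
    """Pick which group of subunits should run for the current image frame.

    "symmetric": exactly one of the two symmetric parts was detected, so the
                 missing one borrows its length from the detected one
                 (subunits 2231 to 2233).
    "history":   neither part was detected, so lengths come from earlier
                 frames (subunits 2234 to 2236).
    "none":      both parts were detected and no interpolation is needed.
    """
    if has_second_part and has_third_part:
        return "none"
    if has_second_part or has_third_part:
        return "symmetric"
    return "history"


branch_one_arm = choose_interpolation_branch(True, False)
branch_no_arm = choose_interpolation_branch(False, False)
```

Returning a single label per frame makes the mutual exclusion explicit: at most one group performs its operations while the other stays suspended.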
  • The object pose detection result for the object and the part pose detection result for the first object part of the object can be obtained, and the pose of the object in the image frame can then be estimated based on the object pose detection result, the part pose detection result, and the standard posture. This can compensate for the missing key points of the object in the image frame and correct the object key points that do not conform to the standard posture, which ensures the integrity and rationality of the final global pose of the object and thereby improves the estimation accuracy of the global pose.
  • Figure 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The computer device 1000 can be a user terminal, for example, the user terminal 10a in the embodiment corresponding to Figure 1 above, or a server, for example, the server 10d in the embodiment corresponding to Figure 1 above; this is not limited here.
  • This application takes the case where the computer device is a user terminal as an example.
  • the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005.
  • the computer device 1000 may also include: a user interface 1003 and at least one communication bus 1002, where the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may also include standard wired interfaces and wireless interfaces.
  • the network interface 1004 may include standard wired interfaces and wireless interfaces (such as WI-FI interfaces).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may optionally be at least one storage device located remotely from the aforementioned processor 1001.
  • the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 in the computer device 1000 can also provide network communication functions
  • the optional user interface 1003 can also include a display screen (Display) and a keyboard (Keyboard).
  • the network interface 1004 can provide network communication functions
  • the user interface 1003 is mainly used to provide an input interface for the user
  • the processor 1001 can be used to call the device control application stored in the memory 1005 to achieve:
  • At least one object part missing in the object posture detection result is interpolated to obtain the global posture corresponding to the object, wherein the global posture is used to control the computer device to implement the business function corresponding to the global posture.
  • The computer device 1000 described in the embodiment of the present application can execute the data processing method described in any of the embodiments shown in FIG. 3 and FIG. 7, and can also perform the functions described for the data processing device 1 in the embodiment corresponding to FIG. 13 and for the data processing device 2 in the embodiment corresponding to Figure 14, which will not be described again here.
  • the description of the beneficial effects of using the same method will not be described again.
  • Embodiments of the present application also provide a computer-readable storage medium that stores a computer program, and the computer program includes computer instructions.
  • When a processor executes the computer instructions, the data processing method described in any of the foregoing embodiments of FIG. 3 and FIG. 7 can be performed, and it will therefore not be described again here.
  • the description of the beneficial effects of using the same method will not be described again.
  • computer instructions may be deployed for execution on one computing device, or on multiple computing devices located at one location, or on multiple computing devices distributed across multiple locations and interconnected by a communications network.
  • multiple computing devices distributed in multiple locations and interconnected through communication networks can form a blockchain system.
  • embodiments of the present application also provide a computer program product or computer program.
  • the computer program product or computer program may include computer instructions, and the computer instructions may be stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the data processing method described in either of the embodiments corresponding to FIG. 3 and FIG. 7, which is therefore not described in detail here.
  • the description of the beneficial effects of using the same method will not be described again.
  • Modules in the device of the embodiment of the present application can be merged, divided, and deleted according to actual needs.
  • the computer program can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the above method embodiments.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.

Abstract

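A data processing method, apparatus, device, and medium, applicable to fields such as artificial intelligence and assisted driving. The method includes: acquiring an object pose detection result corresponding to an object in an image frame and a part pose detection result corresponding to a first object part of the object in the image frame, where at least one object part of the object is missing from the object pose detection result and the first object part is one or more parts of the object; and interpolating, according to the part pose detection result and a standard pose associated with the object, the at least one object part missing from the object pose detection result to obtain a global pose corresponding to the object, where the global pose is used to control a computer device to implement a service function corresponding to the global pose.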

Description

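Data processing method, apparatus, device, and medium
This application claims priority to Chinese Patent Application No. 2022103327630, entitled "Data processing method, apparatus, device, and medium" and filed with the China Patent Office on March 31, 2022, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
Computer vision (CV) is the science of making machines "see". More specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize and measure targets, with further graphics processing so that the computer output becomes an image better suited to human observation or to transmission to an instrument for inspection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data.
Pose estimation detects the positions of keypoints in a picture or video and has broad application value in fields such as film animation, assisted driving, virtual reality, and action recognition.
In current pose estimation algorithms, keypoint detection can be performed on an image or video, and the final object pose is constructed based on the detected keypoints and object constraint relationships.
Summary
Embodiments of this application provide a data processing method, apparatus, device, and medium that can improve the accuracy of object pose estimation.
An embodiment of this application provides a data processing method, performed by a computer device, including:
acquiring an object pose detection result corresponding to an object in an image frame, and a part pose detection result corresponding to a first object part of the object in the image frame, where at least one object part of the object is missing from the object pose detection result, and the first object part is one or more parts of the object; and
interpolating, according to the part pose detection result and a standard pose associated with the object, the at least one object part missing from the object pose detection result to obtain a global pose corresponding to the object, where the global pose is used to control a computer device to implement a service function corresponding to the global pose.
An embodiment of this application further provides a data processing apparatus, including:
a pose detection module, configured to acquire an object pose detection result corresponding to an object in an image frame, and a part pose detection result corresponding to a first object part of the object in the image frame, where at least one object part of the object is missing from the object pose detection result, and the first object part is one or more parts of the object; and
a pose estimation module, configured to interpolate, according to the part pose detection result and a standard pose associated with the object, the at least one object part missing from the object pose detection result to obtain a global pose corresponding to the object, where the global pose is used to control a computer device to implement a service function corresponding to the global pose.
An embodiment of this application further provides a computer device, including a memory and a processor. The memory is connected to the processor and is configured to store a computer program, and the processor is configured to invoke the computer program so that the computer device performs the foregoing method in the embodiments of this application.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. The computer program is adapted to be loaded and executed by a processor so that a computer device having the processor performs the foregoing method in the embodiments of this application.
An embodiment of this application further provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the foregoing method.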
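Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of this application;
FIG. 2 is a schematic diagram of an object pose estimation scenario for video data according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a standard pose according to an embodiment of this application;
FIG. 5 is a schematic diagram of an object pose estimation scenario according to an embodiment of this application;
FIG. 6 is a schematic diagram of an application scenario of a global pose according to an embodiment of this application;
FIG. 7 is a schematic flowchart of another data processing method according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an object detection model according to an embodiment of this application;
FIG. 9 is a schematic flowchart of obtaining an object pose detection result according to an embodiment of this application;
FIG. 10 is a schematic flowchart of obtaining a part pose detection result according to an embodiment of this application;
FIG. 11 is a schematic diagram of object keypoint correction according to an embodiment of this application;
FIG. 12 is a schematic flowchart of object pose estimation according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of another data processing apparatus according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description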
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请涉及计算机视觉技术下属的姿态估计(Pose Estimation),姿态估计是计算机视觉中的一个重要任务,也是计算机理解对象动作、行为必不可少的一步;姿态估计可以转换为对对象关键点的预测问题,如可以预测出图像中的各个对象关键点的位置坐标,并根据各个对象关键点之间的位置关系,预测出图像中的对象骨架。其中,本申请所涉及的姿态估计可以包括针对对象的对象姿态估计,以及针对对象的特定部位的部位姿态估计等,对象可以包括但不限于:人体、动物、植物等,对象的特定部位可以为手掌、脸部、动物肢体、植物根部等,本申请对对象的类型不做限定。
当图像或视频为移动端场景下的拍摄画面时,图像或视频画面可能只能包含对象的一部分部位,那么在对其进行姿态估计的过程中,由于缺少对象中的一些部位,造成提取到的部位信息不足,导致最终的对象姿态结果并不是该对象的完整姿态,影响了对象姿态的完整性。
本申请实施例中,通过对图像帧中的对象分别进行对象姿态估计和特定部位姿态估计,可以得到针对对象的对象姿态检测结果,以及针对对象的第一对象部位的部位姿态检测结果,进而可以基于对象姿态检测结果、部位姿态检测结果以及标准姿态,对图像帧中的对象进行姿态估计,可以对图像帧中对象缺少的部位关键点进行补偿,可以确保最终得到的对象的全局姿态的完整性和合理性,进而可以提高全局姿态的估计准确性。
请参见图1,图1是本申请实施例提供的一种网络架构的结构示意图。如图1所示,该网络架构可以包括服务器10d和用户终端集群,该用户终端集群可以包括一个或者多个用户终端,这里不对用户终端的数量进行限制。如图1所示,该用户终端集群可以具体包括用户终端10a、用户终端10b以及用户终端10c等。
其中,服务器10d可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平 台等基础云计算服务的云服务器。
用户终端10a、用户终端10b以及用户终端10c等均可以包括:智能手机、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、可穿戴设备(例如智能手表、智能手环等)、智能语音交互设备、智能家电(例如智能电视等)、车载设备等具有对象姿态估计功能的电子设备。如图1所示,用户终端10a、用户终端10b以及用户终端10c等可以分别与服务器10d进行网络连接,以便于每个用户终端可以通过该网络连接与服务器10d之间进行数据交互。
如图1所示的用户终端集群中的用户终端(例如,用户终端10a)集成有具备对象姿态估计功能的应用客户端,该应用客户端可以包括但不限于:多媒体客户端(例如,短视频客户端、视频直播客户端、视频客户端)、对象管理应用(例如,病患护理客户端)。用户终端10a中的应用客户端可以获取视频数据,该视频数据可以是指移动端场景下所拍摄的对象的视频,如采用用户终端10a中集成的相机拍摄对象以得到视频数据,或者采用与用户终端10a相连的摄像设备(例如,单反、摄像头等)拍摄对象以得到视频数据。需要说明的是,在移动端场景下(例如,自拍场景),由于拍摄距离以及拍摄设备的限制,视频数据中的画面可能只可以包含对象的一部分,如对象为人体时,视频数据中的画面可能只包含人体的上半身,或者只包含人体的头部等;在对视频数据中的对象进行对象姿态估计时,需要对视频数据中包含的对象进行姿态修补,以得到该对象对应的全局姿态,在保证对象的全局姿态的完整性的前提下,还可以提高全局姿态的准确性。在本申请实施例中,全局姿态也可以称为完整姿态,是指包含对象所有部分的姿态,即完整的对象所对应的姿态。
需要说明的是,本申请实施例所涉及的对象姿态估计过程可以由计算机设备执行,该计算机设备可以为图1所示用户终端集群中的用户终端,或者为图1所示的服务器10d;总而言之,计算机设备可以为用户终端,或者为服务器,或者为服务器和用户终端构成的组合设备,本申请对此不做限定。
请参见图2,图2是本申请实施例提供的一种视频数据的对象姿态估计场景示意图。以图1所示的用户终端10a为例,对视频中的对象姿态估计过程进行描述;如图2所示,用户终端10a可以获取视频数据20a,该视频数据20a可以是通过用户终端10a中集成的相机所拍摄的对象的视频,或者为其余设备传输至用户终端10a的关于对象的视频等;通过对视频数据20a进行分帧处理,得到N个图像帧,N为正整数,如N可以取值为1,2,……。可以按照时间顺序从N个图像帧中获取第一个图像帧(即图像帧T1),并将该图像帧T1输入至对象检测模型20b,通过该对象检测模型20b对图像帧T1进行对象检测, 得到该图像帧T1对应的对象姿态检测结果20c;该对象姿态检测结果20c可以包括图像帧T1所包含的对象的关键点(为方便描述,下面将对象的关键点称为对象关键点),以及这些对象关键点在该图像帧T1中的位置;对象姿态检测结果20c还可以包含检测得到的每个对象关键点分别对应的第一置信度,该第一置信度可以用于表征检测到的对象关键点的预测准确性,第一置信度越大,表示检测到的对象关键点越准确,越有可能是对象的真实关键点。
例如,当视频数据20a中的对象为人体时,对象所对应的对象关键点可以认为是人体结构中的关节点,对象的关键点数量和关键点类别是可以预先定义的,如人体结构可以包括四肢、脑部、腰部、胸部等部位的多个对象关键点。当图像帧T1中包含完整的对象时,该图像帧T1中可能包含该对象的所有对象关键点;当图像帧T1中只包含对象的部分结构时,该图像帧T1中可以包含对象的部分对象关键点。在检测到图像帧T1中所包含对象关键点后,可以按照对象的关键点类别和关键点位置对检测到的对象关键点进行连接,并在图像帧T1中标记连接后的结果,即对象姿态检测结果20c。其中,对象检测模型20b可以为预先训练完成的网络模型,具备针对视频/图像的对象检测功能;当对象为人体时,该对象检测模型20b也可以称为人体姿态估计模型。
通过对象姿态检测结果20c可以得到对象在图像帧T1中的人体姿态20j,由于人体姿态20j缺失了一些对象关键点(缺少人体关节点),因此用户终端10a可以获取对象对应的标准姿态20k,基于该标准姿态20k,可以对人体姿态20j进行关键点补偿,得到图像帧T1中的对象所对应的人体姿态20m。其中,标准姿态20k也可以认为是对象的默认姿态,或者称为参考姿态,该标准姿态20k可以基于对象的所有对象关键点进行预先构建,如可以将人体在正常站立时的姿态(例如,全局姿态)确定为标准姿态20k。
该图像帧T1还可以输入至部位检测模型20d,通过该部位检测模型20d对图像帧T1中的对象的特定部位(例如,第一对象部位)进行检测,得到该图像帧T1对应的部位姿态检测结果20e。当检测到图像帧T1中不存在对象的第一对象部位时,可以确定该图像帧T1的部位姿态检测结果为空;当检测到图像帧T1中存在对象的第一对象部位,则可以继续检测该第一对象部位的关键点以及关键点的位置,并按照第一对象部位的关键点类别和关键点位置,可以对检测到的第一对象部位关键点进行连接,并在图像帧T1中标记连接后的结果,即部位姿态检测结果20e。其中,第一对象部位所对应的关键点数量和关键点类别同样是可以预先定义的;当对象为人体时,该部位检测模型20d可以为手掌姿态估计模型(此处第一对象部位为手掌),如手掌可以包括手心关键点和手指关键点;部位检测模型20d可以为预先训练完成的网络模型,具备针对视频/图像的对象部位检测 功能,为方便描述,下面将第一对象部位的关键点称为部位关键点。
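As shown in FIG. 2, the part pose detection result 20e carries a second confidence, which indicates the likelihood that a detected object part is the first object part. For example, the part detection model 20d may determine that, in image frame T1, the second confidence that region 20f is the first object part is 0.01, that of region 20g is 0.09, that of region 20h is 0.86, and that of region 20i is 0.84. The larger the second confidence, the more likely the region is the first object part. For example, based on the second confidence it can be determined that regions 20h and 20i contain the first object part, and the pose of the first object part can be marked in regions 20h and 20i.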
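The region selection described above can be illustrated with a minimal sketch. The confidence values are the ones quoted for regions 20f to 20i; the function name `select_part_regions` and the threshold of 0.5 are assumptions made for illustration only, since the application does not fix a threshold value.

```python
# Second confidences quoted in the text for the candidate regions of image frame T1.
candidates = {"20f": 0.01, "20g": 0.09, "20h": 0.86, "20i": 0.84}


def select_part_regions(confidences, threshold=0.5):
    """Keep the regions whose second confidence exceeds the threshold.

    The threshold value is a hypothetical choice, not taken from the application.
    """
    return [name for name, score in confidences.items() if score > threshold]


selected = select_part_regions(candidates)
print(selected)  # ['20h', '20i']
```

With these values, only regions 20h and 20i are kept, which matches the regions in which the text marks the pose of the first object part.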
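Further, the user terminal 10a may combine the object pose detection result 20c and the part pose detection result 20e to interpolate some of the missing object parts, obtaining plausible object keypoints through the interpolation. For example, when the part pose detection result 20e consists of palm keypoints, the object pose detection result 20c and the part pose detection result 20e can be combined to interpolate parts such as the wrists and elbows of the object that are missing from image frame T1, so as to refine the human pose 20m of the object and obtain human pose 20n (also called the global pose). Similarly, after the global pose corresponding to the object in image frame T1 is obtained, object pose estimation can be performed in the same way on subsequent image frames of the video data 20a to obtain the global pose of the object in each image frame. Based on the global poses corresponding to the N image frames, the behavior of the object in the video data 20a can be obtained. It can be understood that the video data 20a may also be video captured in real time, and the user terminal 10a may perform object pose estimation on image frames of the real-time video data to obtain the behavior of the object in real time.
In short, for an image frame that contains only part of an object, the global pose of the object in the image frame can be estimated from the object detection result output by the object detection model 20b, the part detection result output by the part detection model 20d, and the standard pose 20k. This ensures the completeness and plausibility of the resulting global pose and thereby improves the accuracy of global pose estimation.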
请参见图3,图3是本申请实施例提供的一种数据处理方法的流程示意图。如图3所示,该数据处理方法可以包括以下步骤S101-步骤S102:
步骤S101,获取与图像帧中的对象对应的对象姿态检测结果,以及与所述对象的第一对象部位对应的部位姿态检测结果;其中,所述对象姿态检测结果中缺失所述对象的至少一个对象部位,所述第一对象部位为所述对象的一个或多个部位。
具体的,计算机设备可以获取移动端场景下拍摄的对象的视频数据(例如,图2所对应实施例中的视频数据20a)或图像数据;在对视频数据或图像数据进行姿态估计时,计算机设备可以通过对图像数据或视频数据中的图像帧进行对象检测,得到针对对象的对象姿态检测结果(例如,上述图2所对应实施例中的对象姿态检测结果20c);与此同 时,还可以对图像帧进行部位检测,得到针对对象的第一对象部位的部位姿态检测结果(例如,上述图2所对应实施例中的部位姿态检测结果20e)。其中,对象可以是指视频数据中所包含的物体,如人体、动物、植物等;第一对象部位可以是指对象中的一个或多个部位,如人体结构中的脸部、手掌,动物结构中的肢体、尾部、头部,植物的根部等,本申请对对象的类型以及第一对象部位的类型都不做限定。需要说明的是,受限于移动端场景下的拍摄设备与被拍摄的对象之间的距离,视频数据或图像数据中的对象可能会存在缺少部位的情况,即对象可能会有一部分对象部位不在视频数据的画面中,通过联合对象姿态检测结果和部位姿态检测结果,可以提高对象的姿态估计准确性。
为方便描述,本申请实施例均以对象是人体为例,对视频数据或图像数据的对象姿态估计过程进行描述。若对移动端场景下的图像数据进行对象姿态估计,则可以将该图像数据作为图像帧;若对移动端场景下的视频数据进行对象姿态估计,则可以对视频数据进行分帧处理,得到该视频数据对应的N个图像帧,N为正整数,进而可以按照N个图像帧在视频数据中的时间顺序,组成包含N个图像帧的图像帧序列,可以对图像帧序列中的N个图像帧依次进行对象姿态估计;例如,可以在完成图像帧序列中第一个图像帧的对象姿态估计后,可以继续对图像帧序列中第二个图像帧进行对象姿态估计,直至完成整个视频数据的对象姿态估计。
其中,计算机设备可以获取对象检测模型和部位检测模型,将图像帧输入至对象检测模型,通过该对象检测模型可以输出图像帧对应的对象姿态检测结果;与此同时,还可以将图像帧输入至部位检测模型,通过该部位检测模型可以输出图像帧对应的部位姿态检测结果。其中,对象检测模型可以用于检测图像帧中的对象的关键点(如人体关键点,也可以称为对象关键点),此时的对象检测模型也可以称为人体姿态估计模型;对象检测模型可以包括但不限于:DensePose(人体实时姿势识别系统,用于实现密集人群的实时姿态识别)、OpenPose(一种对多人身体、面部和手部形态进行实时估计的框架)、Realtime Multi-Person Pose Estimation(实时多人姿态估计模型)、DeepPose(一种基于深度神经网络的姿态估计方法)、mobilenetv2(轻量级深度神经网络),本申请对对象检测模型的类型不做限定。部位检测模型可以用于检测对象的第一对象部位的关键点(如手掌关键点),此时的部位检测模型也可以称为手掌姿态估计模型;部位检测模型可以为基于检测的方法,或者为基于回归的方法,基于检测的方法可以通过生成热力图来预测第一对象部位的部位关键点,基于回归的方法可以直接回归部位关键点的位置坐标;部位检测模型的网络结构与对象检测模型的网络结构可以相同,也可以不同,当部位检测模型和对象检测模型的网络结构相同时,两者的网络参数也是不同的(由不同的数据 训练得到的),本申请对部位检测模型的类型不做限定。
在一些实施例中,上述对象检测模型和部位检测模型可以是利用样本数据预先训练好的检测模型,如可以利用携带人体关键点标签信息的样本数据(如三维人体数据集)训练得到对象检测模型,利用携带手掌关键点信息的样本数据(如手掌数据集)训练得到部位检测模型;或者,对象检测模型可以是通过应用程序接口(API)从人工智能云服务中调用的对象检测服务,部位检测模型可以是通过API接口从人工智能云服务中调用的部位检测服务,此处不做具体限定。
其中,人工智能云服务,一般也被称作是AI即服务(AI as a Service,AIaaS)。这是目前主流的一种人工智能平台的服务方式,具体来说AIaaS平台会把几类常见的AI服务进行拆分,并在云端提供独立或者打包的服务。这种服务模式类似于开了一个AI主题商城:所有的开发者都可以通过API接口的方式来接入使用平台提供的一种或者是多种人工智能服务,部分资深的开发者还可以使用平台提供的AI框架和AI基础设施来部署和运维自身专属的云人工智能服务。
在一些实施例中,本申请实施例所使用的对象检测模型可以为带有置信度的人体三维姿态估计模型,如通过对象检测模型可以预测图像帧中的对象的对象关键点,每个预测到的对象关键点都可以对应一个第一置信度,第一置信度可以用于表征每个预测到的对象关键点的预测准确性,预测得到的对象关键点以及对应的第一置信度可以称为图像帧对应的对象姿态检测结果。部位检测模型可以为带有置信度的手掌三维姿态估计模型,如通过部位检测模型可以预测第一对象部位在图像帧中的位置区域,并预测位置区域中的第一对象部位的部位关键点;部位检测模型可以预测得到一个或多个可能是第一对象部位所处的位置区域,一个位置区域可以对应一个第二置信度,第二置信度可以用于表征每个预测到的位置区域的预测准确性,预测得到的部位关键点和位置区域对应的第二置信度可以称为图像帧对应的部位姿态检测结果。
步骤S102,根据部位姿态检测结果以及与对象相关联的标准姿态,对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到对象对应的全局姿态,其中,所述全局姿态用于操控计算机设备以实现与所述全局姿态对应的业务功能。
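Specifically, the computer device may obtain the standard pose corresponding to the object (for example, the standard pose 20k in the embodiment corresponding to FIG. 2). The standard pose can be regarded as a complete default pose of the object (denoted T-pose). There may be one or more standard poses, such as a default standing pose, a default sitting pose, and a default squatting pose of a human body. This application does not limit the type or number of standard poses.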
请参见图4,图4是本申请实施例提供的一种标准姿态的示意图。如图4所示,模型 30a可以表示为SMPL(Skinned Multi-Person Linear)模型,该模型30a是一个参数化人体模型,可以适用于不同的人体结构;该模型30a中可以包括人体关节分布:1个根节点(序号为0的节点)和23个关节节点(序号1至序号23所表示的节点),其中根节点用于将整个人体作为完整刚体(在受力作用下,体积和形状都不发生变化的物体)进行变换,23个关节节点可以用于描述局部的人体部位形变。当对象为人体时,上述1个根节点和23个关节节点可以作为对象的对象关键点,基于对象关键点的类别(例如,手腕关节点、手肘关节点、手掌关节点、脚踝关节点等)和位置,对1个根节点和23个关节节点进行连接,可以得到标准姿态30b。
图像帧中可能没有包含完整的对象,如对象的部分部位(如人体下肢)不在图像帧中,那么该图像帧对应的对象姿态检测结果中缺失了一部分对象关键点,可以通过标准姿态对对象对应的对象姿态检测结果进行关键点补偿,完善缺失的对象关键点,以得到对象对应的第一候选对象姿态。当部位姿态检测结果包括第一对象部位的部位关键点时,可以联合部位姿态检测结果中的部位关键点和对象姿态检测结果中的对象关键点,对第一候选对象姿态进行调整,以得到对象在图像帧中的全局姿态。在得到当前的图像帧对应的全局姿态之后,可以继续对视频数据中的下一个图像帧进行对象姿态估计,以得到对象在视频数据的各个图像帧中的全局姿态。
在一些实施例中,计算机设备可以根据对象在视频数据中的全局姿态,确定对象的行为动作,通过这些行为动作可以对该对象进行管理或护理,或者通过对象的行为动作进行人机交互。总而言之,对象在视频数据中的全局姿态可以应用在人机交互场景(例如,虚拟现实、人机动画等)、内容审核场景、自动驾驶场景、虚拟直播场景、游戏或电影人物动作设计场景等。在人机交互场景中,可以采集用户(对象)的图像(或视频),在得到图像或视频中的全局姿态之后,可以基于全局姿态实现对机器的操控,如基于一个特定的人体动作(由全局姿态来确定)执行一个特定指令。在游戏人物动作设计场景中,通过对象对应的全局姿态来获取人体动作,以取代昂贵的动作捕捉设备,可以降低游戏人物动作设计成本与难度。
其中,虚拟直播场景可以是指直播间的直播画面不直接播放主播用户(对象)的视频,而是在直播间播放与主播用户具有相同行为动作的虚拟对象的视频,如可以基于主播用户的全局姿态确定该主播用户的行为动作,进而可以通过该主播用户的行为动作驱动虚拟对象,即构建一个与主播用户具有相同行为动作的虚拟对象,利用虚拟对象进行直播,既可以避免主播用户出现在公众视野中,又可以达到与真实主播用户相同的直播效果。例如,计算机设备可以根据对象在视频数据中的全局姿态,构建与对象相关联的 虚拟对象,并在多媒体应用(例如,直播间、视频网站、短视频应用等)中播放具有该全局姿态的虚拟对象,即在多媒体应用中可以播放关于虚拟对象的视频,且虚拟对象的姿态与对象在视频数据中的姿态保持同步。其中,视频数据中的对象所对应的全局姿态都会在多媒体应用中所播放的虚拟对象上体现,对象每变化一次姿态,就会驱动多媒体应用中的虚拟对象变换为相同的姿态(可以认为是重新构建一个具有新姿态的虚拟对象,此处的新姿态为对象变化后的姿态),使得对象与虚拟对象的姿态始终保持一致。
请参见图5,图5是本申请实施例提供的一种对象姿态估计的场景示意图。以虚拟直播场景为例,对视频数据的对象姿态估计过程进行描述;如图5所示,当主播用户40c(可以作为对象)需要进行直播时,可以进入直播间(如房间号为116889的直播间),在开始直播之前,主播用户40c可以选择真人直播模式,也可以选择虚拟直播模式。若主播用户40c选择虚拟直播模式,则可以拉取虚拟对象,在主播用户40c开始直播时,可以利用该主播用户40c的行为动作去驱动虚拟对象,使得虚拟对象与主播用户40c保持相同的姿态。
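After the live stream starts, the anchor user 40c may capture video data of themselves through a user terminal 40a (for example, a smartphone). The anchor user 40c serves as the object here, and the user terminal 40a may be fixed on a stand 40b. After capturing the video data of the anchor user 40c, the user terminal 40a may obtain an image frame 40g from the video data and input the image frame 40g into the object detection model and the part detection model separately. The object detection model predicts the part joint points (object keypoints) of the anchor user 40c contained in image frame 40g, and these predicted joint points serve as the object pose detection result of image frame 40g. The part detection model predicts the palm keypoints of the anchor user 40c contained in image frame 40g (the first object part is assumed to be the palm here, and palm keypoints may also be called part keypoints), and these predicted palm keypoints serve as the part pose detection result of image frame 40g. The object pose detection result and the part pose detection result may be marked in image frame 40g (as shown in image 40h), where regions 40i and 40j in image 40h represent the part pose detection result.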
如图5所示,通过图像40h中所展示的对象姿态检测结果和部位姿态检测结果,可以得到主播用户40c在图像帧40g中的人体姿态40k;很显然,由于图像帧40g仅包含主播用户40c的上半身,所以人体姿态40k并不是主播用户40c的完整人体姿态。在这种情形下,可以获取标准姿态(人体完整默认姿态),通过标准姿态对人体姿态40k进行关节点插值,以完善人体姿态40k中缺失的部位关节点,得到针对主播用户40c的整体人体姿态40m(全局姿态)。
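The overall human pose 40m can drive the virtual object in the live room, so that the virtual object has the same overall human pose 40m as the anchor user 40c. For a user who enters the live room to watch, the user terminal 40d they use may display the presentation page of the live room where the virtual object is located. This page may include region 40e and region 40f: region 40e may be used to play the video of the virtual object (which has the same pose as the anchor user 40c), and region 40f may be used to post bullet comments and the like. In the virtual live streaming scenario, a user watching in the live room can only see the video of the virtual object and hear the voice data of the anchor user 40c, and cannot see the video data of the anchor user 40c. This protects the personal information of the anchor user 40c while achieving, through the virtual object, the same live streaming effect as the anchor user 40c would.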
在一些实施例中,可以将上述对象在视频数据中的全局姿态应用在内容审核场景中,当全局姿态与内容审核系统中的姿态相同时,可以确定对象在内容审核系统中的审核结果为审核通过结果,并为该对象设置针对内容审核系统的访问权限;在全局姿态通过内容审核系统中的审核后,对象可以具备访问内容审核系统的权限。
请参见图6,图6是本申请实施例提供的一种全局姿态的应用场景示意图。如图6所示,用户A(对象)可以通过用户终端50a向服务器50d发送验证请求,服务器50d在接收到用户终端50a发送的验证请求之后,可以获取针对用户A的身份审核方式,并将该身份审核方式返回至用户终端50a,在该用户终端50a的终端屏幕中可以显示验证框50b。用户A可以正面对准用户终端50a中的验证框50b,并做出特定动作(例如,抬手、踢腿、叉腰等),用户终端50a可以实时采集验证框50b中的待验证图像50c(可以认为是上述图像帧),并将实时采集到的待验证图像50c发送至服务器50d。
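The server 50d may obtain the to-be-verified image 50c sent by the user terminal 50a, and obtain a pose 50e that user A set in advance in the content review system. The pose 50e serves as user A's verification information in the content review system. The server 50d may perform pose estimation on the to-be-verified image 50c using the object detection model, the part detection model, and the standard pose to obtain the global pose of user A in the image, and then compare the similarity between this global pose and the pose 50e. When the similarity is greater than or equal to a similarity threshold (for example, 90%), the global pose of the to-be-verified image 50c is determined to be the same as the pose 50e, and user A passes the review in the content review system. When the similarity is less than the similarity threshold, the global pose is determined to be different from the pose 50e, user A fails the review, and an action error prompt is returned to the user terminal 50a to prompt user A to perform the action again for identity review.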
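The threshold comparison in this verification flow can be sketched as follows. The application does not specify how the similarity between two poses is computed, so the cosine similarity over flattened keypoint coordinates used here is purely an assumption, as are the function names `pose_similarity` and `verify_pose`. Only the 90% example threshold comes from the text.

```python
import math


def pose_similarity(pose_a, pose_b):
    """Cosine similarity between two poses given as equal-length lists of (x, y, z).

    This metric is an illustrative assumption; the application does not define one.
    """
    a = [coord for point in pose_a for coord in point]
    b = [coord for point in pose_b for coord in point]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_pose(global_pose, enrolled_pose, threshold=0.9):
    """Pass the review when the similarity reaches the threshold (0.9 as in the text)."""
    return pose_similarity(global_pose, enrolled_pose) >= threshold


enrolled = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(verify_pose(enrolled, enrolled))                              # True
print(verify_pose([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], enrolled))    # False
```

In practice the poses would likely be normalized (for example, centered on a root keypoint and scaled) before comparison, but that step is left out of this sketch.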
本申请实施例中,通过对图像帧中的对象分别进行对象姿态估计和特定部位姿态估计,可以得到针对对象的对象姿态检测结果,以及针对对象的第一对象部位的部位姿态检测结果,进而可以基于对象姿态检测结果、部位姿态检测结果以及标准姿态,对图像 帧中的对象进行姿态估计,可以对图像帧中对象缺少的部位关键点进行补偿,可以确保最终得到的对象的全局姿态的完整性和合理性,进而可以提高全局姿态的估计准确性。
请参见图7,图7是本申请实施例提供的另一种数据处理方法的流程示意图。如图7所示,该数据处理方法可以包括以下步骤S201-步骤S208:
步骤S201,将图像帧输入至对象检测模型,通过对象检测模型获取图像帧中的对象对应的对象姿态特征,识别对象姿态特征对应的第一分类结果;第一分类结果用于表征对象的关键点所对应的对象部位类别。
具体的,计算机设备在获取到移动端场景下拍摄的视频数据后,可以从该视频数据中选取一个图像帧,将该图像帧输入至训练完成的对象检测模型,经过对象检测模型可以获取图像帧中的对象所对应的对象姿态特征,通过对象检测模型的分类器,可以输出对象姿态特征对应的第一分类结果,该第一分类结果可以用于表征对象的关键点(例如,人体关节)所对应的对象部位类别。其中,上述对象姿态特征可以为经过对象检测模型所提取到的针对对象的对象描述特征,或者可以为对象所对应的对象描述特征与部位描述特征之间的融合特征。当对象姿态特征为图像帧中的对象所对应的对象描述特征时,表明利用对象检测模型对图像帧进行特征提取的过程中,未引入基于部位感知的分块学习;当对象姿态特征为图像帧中的对象所对应的对象描述特征与部位描述特征之间的融合特征时,表明利用对象检测模型对图像帧进行特征提取的过程中,引入了基于部位感知的分块学习;通过引入基于部位感知的分块学习,使得对象姿态特征既可以包含图像帧中所包含的对象的各个部位的局部姿态特征(部位描述特征),又可以包含对象中所包含的对象的对象描述特征,可以增强对象姿态特征的细粒度,进而可以提高对象姿态检测结果的准确性。
在一些实施例中,若在使用对象检测模型对图像帧进行特征提取的过程中,引入了基于部位感知的分块学习,则计算机设备可以将图像帧输入至对象检测模型,在对象检测模型中获取图像帧中的对象对应的对象描述特征,根据对象检测模型中的分类器,输出对象描述特征对应的第二分类结果;获取对象检测模型中的卷积层所输出的针对图像帧的对象卷积特征,将第二分类结果和对象卷积特征进行乘积运算,得到图像帧对应的第二激活映射图;根据第二激活映射图对图像帧进行分块处理,得到M个对象部位区域图像,根据对象检测模型获取M个对象部位区域图像分别对应的部位描述特征,M为正整数;将对象描述特征和M个对象部位区域图像所对应的部位描述特征组合为对象姿态特征。
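The object description feature can be regarded as a feature representation, extracted from the image frame, that characterizes the object. The second classification result may also indicate the object part categories corresponding to the object keypoints contained in the image frame. The convolutional layer may be the last convolutional layer of the object detection model, and the object convolution feature may be the convolution feature of the image frame output by that last convolutional layer. The second activation map may be the class activation map (Class Activation Mapping, CAM) corresponding to the image frame. CAM is a tool for visualizing image features. Weighting the object convolution feature output by the last convolutional layer of the object detection model with the second classification result (the second classification result can be regarded as the weights of the object convolution feature) yields the second activation map. The second activation map can be regarded as a visualization of the object convolution feature output by the convolutional layer and indicates the image pixel regions that the object detection model attends to.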
计算机设备可以将图像帧中的各个对象关键点的类激活映射图(第二激活映射图)作为区域位置的先验信息,对图像帧进行分块处理,即根据第二激活映射图对图像帧进行剪裁,得到包含单个部位的对象部位区域图像;进而可以通过对象检测模型对每个对象部位区域图像均进行特征提取,得到每个对象部位区域图像分别对应的部位描述特征,前述对象描述特征和每个对象部位区域图像分别对应的部位描述特征可以组合为针对对象的对象姿态特征;部位描述特征可以认为是从对象部位区域图像中提取到的用于表征对象部位的特征表示。
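Step S202: Generate a first activation map according to the first classification result and the object convolution feature of the image frame output by the object detection model.
Specifically, after obtaining the first classification result, the computer device may multiply the first classification result by the object convolution feature of the image frame to generate the first activation map. Both the first activation map and the second activation map are class activation maps for the image frame. The difference is that the first activation map uses the first classification result as the weights of the object convolution feature output by the convolutional layer (the first classification result is assumed here to combine the object description feature and the part description features), whereas the second activation map uses the second classification result as the weights, and the second classification result relates only to the object description feature.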
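The weighting that produces an activation map can be sketched in a few lines. This is a minimal illustration, assuming the object convolution feature is a list of K channel maps of size H x W and that the classification result supplies one weight per channel. The function name `class_activation_map` and the toy values are hypothetical.

```python
def class_activation_map(conv_features, class_weights):
    """Weighted sum of convolution channels: cam[r][c] = sum_k w_k * f_k[r][c].

    conv_features: list of K channel maps, each a list of H rows of W values.
    class_weights: list of K weights (assumed to come from the classification result).
    """
    height, width = len(conv_features[0]), len(conv_features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(class_weights, conv_features):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * channel[r][c]
    return cam


# Two toy 2x2 channels: the first responds top-left, the second bottom-right.
features = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
weights = [0.5, 2.0]
print(class_activation_map(features, weights))  # [[0.5, 0.0], [0.0, 4.0]]
```

The same routine would serve for both activation maps, since they differ only in which classification result provides the weights.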
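Step S203: Obtain the pixel average of the first activation map, determine the localization result of the keypoints of the object in the image frame according to the pixel average, and determine the object pose detection result corresponding to the image frame according to the object part categories and the localization result.
Specifically, the computer device may take the pixel average of the first activation map and determine the pixel average as the localization result of the keypoints of the object in the image frame. According to the object part categories and the localization result, the object skeleton of the object in the image frame can be determined, and this object skeleton serves as the object pose detection result corresponding to the object in the image frame.
Refer to FIG. 8, which is a schematic structural diagram of an object detection model provided in an embodiment of this application. As shown in FIG. 8, after obtaining an image frame 60a, the computer device may input the image frame 60a into the object detection model. A feature extraction component 60b of the object detection model (for example, the feature extraction component may be a convolutional network) performs feature extraction on image frame 60a to obtain the object description feature 60c corresponding to the object in image frame 60a. Global average pooling (there may be multiple object description features, and global average pooling converts each object description feature into a single value) and an activation function are applied to the object description feature 60c, and the processed result is classified to obtain the second classification result. The second classification result is then used to weight the object convolution feature output by the last convolutional layer of the feature extraction component 60b, yielding the second activation map.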
基于第二激活映射图对图像帧60a进行分块处理,得到M个对象部位区域图像60f,将M个对象部位区域图像60f依次输入至对象检测模型中的特征提取组件60b,通过特征提取组件60b可以得到M个对象部位区域图像60f分别对应的部位描述特征60g。将M个部位描述特征60g与图像帧60a的对象描述特征60c进行特征组合,得到对象姿态特征;通过对对象姿态特征进行识别,可以得到第一分类结果60d,将第一分类结果60d和特征提取组件60b中的最后一个卷积层所输出的对象卷积特征进行加权,可以得到第一激活映射图60e。该第一激活映射图60e的像素平均值可以作为对象在图像帧60a中的定位结果,并以此得到图像帧60a中的对象对应的对象姿态检测结果。
需要说明的是,图8所对应实施例中所描述的对象姿态检测结果的获取方式仅为本申请实施例的一个举例说明,本申请还可以采用其余的方式得到对象姿态检测结果,本申请对此不做限定。
请参见图9,图9是本申请实施例提供的一种获取对象姿态检测结果的流程示意图。如图9所示,以对象检测模型是人体三维姿态估计模型为例,计算机设备可以将图像帧70a输入人体三维姿态估计模型,通过该人体三维姿态估计模型可以获取对象(此时的对象为人体)在图像帧70a中的人体三维关键点。如图9所示,若通过人体三维姿态估计模型检测得到图像帧70a所包含的16个人体三维关键点,分别标记为x1至x16,每个人体三维关键点都可以对应一个位置坐标和一个第一置信度,基于第一置信度可以确定检测到的人体三维关键点为真实人体关键点的可能性;如第一置信度大于第一置信阈值(可以根据实际需求进行设置)的人体三维关键点可以认为是真实人体关键点(例如,x4至x16所表示的人体三维关键点)。通过对真实人体关键点进行连接,可以得到人体姿态70c(也可以认为是对象姿态检测结果)。第一置信度小于或等于第一置信阈值的人体三维关键点为异常关键点,在后续处理中可以对这些异常关键点进行补偿,以得到更准确的人体关键点。
可以理解的是,以图像帧构建空间坐标系,人体三维关键点的位置坐标可以是指在 该空间坐标系中的空间坐标。
步骤S204,将图像帧输入至部位检测模型,在部位检测模型中检测图像帧中的对象的第一对象部位。
具体的,计算机设备还可以将图像帧输入至部位检测模型,在该部位检测模型中首先检测图像帧中是否包含对象的第一对象部位。其中,部位检测模型可以用于检测第一对象部位的关键点,因此需要检测图像帧中的第一对象部位,若在图像帧中未检测到对象的第一对象部位,则可以直接确定图像帧对应的部位姿态检测结果为空值,无需执行后续检测第一对象部位的关键点的步骤。
步骤S205,若在图像帧中检测到第一对象部位,则从图像帧中获取包含第一对象部位的区域图像,根据区域图像获取第一对象部位对应的部位关键点位置,基于部位关键点位置确定图像帧对应的部位姿态检测结果。
具体的,若在图像帧中检测到第一对象部位,则可以确定第一对象部位在图像帧中的位置区域,基于第一对象部位在图像帧中的位置区域,对图像帧进行剪裁,得到包含第一对象部位的区域图像。在部位检测模型中可以对区域图像进行特征提取,获取区域图像中的第一对象部位对应的部位轮廓特征,根据部位轮廓特征可以预测第一对象部位对应的部位关键点位置;基于部位关键点位置,可以对第一对象部位的关键点进行连接,得到图像帧对应的部位姿态检测结果。
请参见图10,图10是本申请实施例提供的一种获取部位姿态检测结果的流程示意图。如图10所示,以部位检测模型是手掌三维姿态估计模型为例,计算机设备可以将图像帧80a输入手掌三维姿态估计模型,在该手掌三维姿态估计模型中可以检测图像帧80a中是否包含对象的手掌(第一对象部位),若在图像帧80a中未检测到手掌,则可以确定图像帧80a对应的部位姿态检测结果为空值;若在图像帧80a中检测到手掌,则可以在图像帧80a中确定包含手掌的区域(例如,图像80b中的区域80c和区域80d,区域80c包含对象的右手掌,区域80d包含对象的左手掌),并通过手掌三维姿态估计模型可以检测得到区域80c中的手掌三维关键点,以及区域80d中的手掌三维关键点。
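The palm 3D pose estimation model may produce multiple candidate regions and predict, for each candidate region, a second confidence that it contains a palm. Regions whose second confidence is greater than a second confidence threshold (which may be the same as or different from the first confidence threshold above, and is not limited here) are determined to be regions containing a palm. For example, the second confidences of regions 80c and 80d are both greater than the second confidence threshold. Connecting the palm keypoints detected in region 80c yields the right palm pose 80e, and connecting those detected in region 80d yields the left palm pose 80f. The left palm pose 80f and the right palm pose 80e may be called the part pose detection result corresponding to image frame 80a.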
步骤S206,获取与对象相关联的标准姿态,确定所述标准姿态对应的第一关键点数量,以及对象姿态检测结果对应的第二关键点数量。
具体的,计算机设备可以获取对象对应的标准姿态,并统计标准姿态中所包含的对象关键点的第一关键点数量,以及对象姿态检测结果中所包含的对象关键点的第二关键点数量。其中,第一关键点数量是在构建标准姿态时就已知的,第二关键点数量是对象检测模型所预测得到的对象关键点的数量。
步骤S207,当第一关键点数量大于第二关键点数量时,根据标准姿态对对象姿态检测结果进行插值处理,得到第一候选对象姿态。
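Specifically, when the first keypoint quantity is greater than the second keypoint quantity, the object pose detection result has missing object keypoints. Keypoint compensation (interpolation) can be performed on the object pose detection result using the standard pose to fill in the missing object keypoints and obtain the first candidate object pose corresponding to the object. As shown in FIG. 2, performing keypoint compensation on human pose 20j (the object pose detection result) using the standard pose 20k yields human pose 20m, which may be called the first candidate object pose.
For example, assuming the object is a human body, if keypoints of parts such as the knees, ankles, feet, and elbows are missing from the object pose detection result predicted by the object detection model, the standard pose can be used to interpolate the object pose detection result, for instance by adding the missing object keypoints, to obtain a more plausible first candidate object pose. Interpolating the object pose detection result with the standard pose improves the completeness and plausibility of the object pose.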
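One way to picture this keypoint compensation is the sketch below. The application does not spell out how missing keypoints are placed, so this example makes a simplifying assumption: the standard pose is shifted so that a shared anchor keypoint (hypothetically named `root`) coincides with the detected one, and the missing keypoints are then copied over. The function name `fill_from_standard` and the keypoint names are illustrative only.

```python
def fill_from_standard(detected, standard, anchor="root"):
    """Return a first candidate pose: detected keypoints plus any missing ones.

    detected / standard: dicts mapping keypoint name -> (x, y, z).
    Assumes the anchor keypoint is present in both poses (an assumption of this sketch).
    """
    if len(standard) <= len(detected):
        return dict(detected)  # first quantity not greater than second: nothing to add
    # Translation that moves the standard pose onto the detected anchor keypoint.
    offset = [d - s for d, s in zip(detected[anchor], standard[anchor])]
    candidate = dict(detected)
    for name, position in standard.items():
        if name not in candidate:
            candidate[name] = tuple(p + o for p, o in zip(position, offset))
    return candidate


standard_pose = {"root": (0, 0, 0), "neck": (0, 1, 0), "knee": (0, -1, 0)}
detected_pose = {"root": (2, 3, 0), "neck": (2, 4, 0)}
print(fill_from_standard(detected_pose, standard_pose))
# {'root': (2, 3, 0), 'neck': (2, 4, 0), 'knee': (2, 2, 0)}
```

Detected keypoints are never overwritten, and only keypoints absent from the detection result are taken from the shifted standard pose.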
步骤S208,根据部位姿态检测结果对第一候选对象姿态中与第一对象部位相关联的对象部位进行插值处理,得到对象对应的全局姿态。
具体的,在实际应用场景中,对象的姿态变化在很大程度上取决于对象的少数几个部位,也就是说,对象的一些特定部位(例如,人体结构中的手臂部位,手臂部位可以包括手掌、手腕以及手肘等部位的关键点)对最终的结果具有重要作用;因此,本申请实施例中,可以基于部位姿态检测结果对第一候选对象姿态中,与第一对象部位相关联的对象部位进行插值处理,可以得到对象对应的全局姿态。在一些实施例中,若部位姿态检测结果为空值(即图像帧中不包含第一对象部位),则可以直接将第一候选对象姿态确定为对象对应的全局姿态。
举例来说,假设对象为人体,第一对象部位为手掌;当图像帧中包含手肘部位时,通过对象检测模型可以预测得到针对手肘部位的关键点。当图像帧中不包含手肘部位时,通过对象检测模型无法预测得到手肘部位的关键点,此时可以基于部位姿态检测结果确定对象的手肘关键点和手腕关键点,将手肘关键点和手腕关键点添加到第一候选对象姿 态中,可以得到对象对应的全局姿态。
在一些实施例中,对象包括第二对象部位和第三对象部位,第二对象部位和第三对象部位是相对称的,如第二对象部位为对象的右手臂,则第三对象部位为对象的左手臂;第二对象部位为对象的左腿,则第三对象部位为对象的右腿等。
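When the part pose detection result includes all part keypoints of the first object part (if the first object part is the palm, the part pose detection result is assumed here to include the keypoints of both the left and right palms), if the object pose detection result contains the pose of the second object part but not the pose of the third object part, that is, the image frame contains the second object part but not the third object part, the first part direction corresponding to the third object part can be determined from the keypoint positions of the first object part in the part pose detection result. The second object part and the third object part are symmetric parts of the object. Because the two parts are symmetric, their lengths are the same. Therefore, the first part length of the second object part in the first candidate object pose can be obtained, the keypoint positions of the third object part can be determined from the first part length and the first part direction, and these keypoint positions can be added to the first candidate object pose to obtain the global pose corresponding to the object in the image frame.
If the object pose detection result contains the pose of neither the second object part nor the third object part, that is, the image frame contains neither part, the second part direction corresponding to the second object part and the third part direction corresponding to the third object part can be determined from the keypoint positions of the first object part in the part pose detection result. Then, taking the current image frame as the i-th image frame of the video data, the second part length corresponding to the second object part and the third part length corresponding to the third object part can be obtained from the (i-1)-th image frame. In other words, the length of the second object part in the previous image frame is used as its length in the current image frame, and the length of the third object part in the previous image frame is used as its length in the current image frame. The keypoint positions of the second object part can then be determined from the second part length and the second part direction, and those of the third object part from the third part length and the third part direction. Adding both sets of keypoint positions to the first candidate object pose yields the global pose corresponding to the object in the image frame. If the (i-1)-th image frame also contains neither the second nor the third object part, the search can continue backward to obtain the lengths of the two parts in the (i-2)-th image frame, and so on, to determine their keypoint positions in the current image frame. If neither part is detected in any image frame preceding the current one, an approximate length can be set for each of the two parts according to the first candidate object pose, so as to determine their keypoint positions in the current image frame.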
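The backward search over earlier image frames can be sketched as follows. This is a minimal illustration in which each earlier frame is represented by a dict of the part lengths measured in it. The function name `part_length_from_history`, the part names, and the fallback value are assumptions for illustration.

```python
def part_length_from_history(history, part, fallback):
    """Return the most recent known length of `part`.

    history: list of dicts {part_name: length}, one per earlier frame, oldest first
             (frame i-1 is the last element). The current frame is not included.
    fallback: approximate length to use when no earlier frame contains the part.
    """
    for frame_lengths in reversed(history):  # frame i-1, then i-2, and so on
        if part in frame_lengths:
            return frame_lengths[part]
    return fallback


history = [{"left_forearm": 0.25, "right_forearm": 0.26}, {}, {}]
print(part_length_from_history(history, "left_forearm", fallback=0.3))  # 0.25
print(part_length_from_history([], "left_forearm", fallback=0.3))       # 0.3
```

The fallback corresponds to the approximate length that the text derives from the first candidate object pose when no earlier frame shows the part.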
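For example, assume the object is a human body, the first object part is the palm, and the second and third object parts are the left and right arms. Provided that the left and right palms are detected in the image frame, the direction of the left forearm can be computed from the keypoints of the left palm and the direction of the right forearm from the keypoints of the right palm. The left forearm is part of the left arm, and the right forearm is part of the right arm.
If neither arm is detected in the image frame, the left and right forearm lengths (the second part length and the third part length) in an earlier image frame (for example, the (i-1)-th image frame) can be used as the forearm lengths in the current image frame. If neither arm is detected in the current image frame or in any earlier image frame, reference lengths can be assigned to the left and right forearms based on the shoulder length in the image frame. If either arm (for example, the left arm) is detected in the image frame, the left forearm length (the first part length) can be assigned directly to the right forearm. For example, suppose the right wrist point A and the right palm point B are known and the right elbow point C is missing. The direction of the right forearm can be expressed as the direction from the right palm point B to the right wrist point A, denoted as vector BA. The left forearm length is used as the length from the right wrist point A to the right elbow point C, denoted L. From this information, the position of the right elbow point C can be computed as C = A + BA_normal * L, where C is the position of the elbow point, A is the position of the wrist point, and BA_normal is the unit vector of vector BA.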
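The formula C = A + BA_normal * L translates directly into code. The sketch below is a minimal illustration, and the function name `estimate_elbow` and the sample coordinates are hypothetical.

```python
import math


def estimate_elbow(wrist, palm, forearm_length):
    """Compute the elbow point C = A + BA_normal * L.

    wrist: point A, palm: point B, forearm_length: L (borrowed from the other arm
    or from an earlier frame, as described in the text).
    """
    direction = [a - b for a, b in zip(wrist, palm)]  # vector BA, from palm to wrist
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("wrist and palm coincide, so the forearm direction is undefined")
    return tuple(a + d / norm * forearm_length for a, d in zip(wrist, direction))


# Palm 2 units below the wrist, forearm length 3: the elbow lies 3 units above the wrist.
print(estimate_elbow(wrist=(0.0, 0.0, 0.0), palm=(0.0, -2.0, 0.0), forearm_length=3.0))
# (0.0, 3.0, 0.0)
```

Because BA points from the palm toward the wrist, the elbow is placed on the far side of the wrist, continuing along the same line.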
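It can be understood that if both arms are detected in the image frame, the elbow points predicted by the object detection model can be adjusted and updated based on the detected palm keypoints, which improves the accuracy of the elbow points and hence the plausibility of the global pose.
In some embodiments, the pose obtained by interpolating the first candidate object pose based on the part pose detection result may contain some implausible object keypoints, so the standard pose can be used to correct them and obtain the final global pose of the object. Specifically, assuming the third object part is not detected in the image frame, the computer device may determine the first candidate object pose with the keypoint positions of the third object part added as the second candidate object pose, and then obtain the pose offset between the standard pose and the second candidate object pose. If the pose offset is greater than an offset threshold (which can be understood as the maximum angle by which the object can deviate under normal conditions), keypoint correction is performed on the second candidate object pose based on the standard pose to obtain the global pose corresponding to the object in the image frame. The pose offset can be understood as a relevant angle between the second candidate object pose and the standard pose. For example, when the object is a human body, it may be the angle between the shoulders of the second candidate object pose and the shoulders of the standard pose.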
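The offset check and correction can be sketched for a single body segment. The application states only that the offset is an angle compared against a threshold. The correction rule used here (re-aiming the segment along the standard direction while keeping its length), the 30-degree threshold, and the names `angle_between` and `correct_segment` are all assumptions for illustration. Segments are assumed to have non-zero length.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def correct_segment(candidate, standard, start, end, max_offset_deg=30.0):
    """Re-aim the segment start->end along the standard pose if it deviates too far."""
    cand_vec = [e - s for s, e in zip(candidate[start], candidate[end])]
    std_vec = [e - s for s, e in zip(standard[start], standard[end])]
    if angle_between(cand_vec, std_vec) <= max_offset_deg:
        return dict(candidate)  # offset within the threshold: keep the pose as is
    length = math.sqrt(sum(c * c for c in cand_vec))
    std_len = math.sqrt(sum(c * c for c in std_vec))
    corrected = dict(candidate)
    corrected[end] = tuple(
        s + v / std_len * length for s, v in zip(candidate[start], std_vec)
    )
    return corrected


standard = {"neck": (0.0, 0.0, 0.0), "shoulder": (1.0, 0.0, 0.0)}
collapsed = {"neck": (0.0, 0.0, 0.0), "shoulder": (0.0, -2.0, 0.0)}
print(correct_segment(collapsed, standard, "neck", "shoulder"))
# {'neck': (0.0, 0.0, 0.0), 'shoulder': (2.0, 0.0, 0.0)}
```

Here the collapsed shoulder deviates by 90 degrees from the standard pose, so it is moved back onto the standard direction while the segment length of 2 is preserved.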
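Refer to FIG. 11, which is a schematic diagram of object keypoint correction provided in an embodiment of this application. As shown in FIG. 11, for an image frame 90a, after the second candidate object pose corresponding to image frame 90a is obtained, a human body model 90b can be constructed based on it. Limited by the performance of the object detection model, region 90c (for example, the shoulder region) of human body model 90b clearly exhibits a collapse compared with a normal human body structure (for example, the standard pose). For instance, the angle between the shoulders of the second candidate object pose and the shoulders of the standard pose is greater than the offset threshold. The computer device may correct human body model 90b using the standard pose to obtain human body model 90d. Region 90e of human body model 90d can be regarded as the result of correcting region 90c, and the human pose corresponding to human body model 90d may be called the global pose corresponding to the object in image frame 90a.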
在移动端场景下所拍摄的视频数据中通常无法包含整个对象,通过对象检测模型预测得到的针对对象的姿态是不完整的,通过关键点插值、关键点矫正等处理,可以提高全局姿态的合理性;通过部位姿态检测结果,可以计算得到与第一对象部位相关联的对象关键点位置,可以提高全局姿态的准确性。
请参见图12,图12是本申请实施例提供的一种对象姿态估计的流程示意图。如图12所示,假设对象为人体,计算机设备在获取到移动端场景下拍摄的视频数据或图像数据之后,可以获取带有置信度的人体三维姿态估计模型(对象检测模型)和带有置信度的手掌三维姿态估计模型(部位检测模型)。通过人体三维姿态估计模型可以预测得到任意一个图像帧(图像帧)中的人体三维关键点,这些人体三维关键点可以构成对象姿态检测结果;通过手掌三维姿态估计模型可以预测得到任意一个图像帧(图像帧)中的手掌三维关键点,这些手掌三维关键点可以构成部位姿态检测结果。根据人体默认姿态(标准姿态)可以对人体三维姿态估计模型预测的人体三维关键点进行插值处理,以完善缺失的人体关键点;还可以结合手掌三维关键点和人体三维关键点,对人体(对象)的手肘、手腕进行插值处理,得到候选人体姿态(上述第二候选对象姿态)。若候选人体姿态中检测到不符合正常人体结构的人体关键点(姿态偏移量大于偏移阈值的人体关键点),可以对这些不符合正常人体结构的人体关键点进行矫正,最终得到合理的三维姿态估计结果(即上述全局姿态)。
可以理解的是,在本申请的具体实施方式中,可能涉及到用户的视频采集,当本申请以上实施例运用到具体产品或技术中时,需要获得用户的许可或同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
本申请实施例中,通过对图像帧中的对象分别进行对象姿态估计和特定部位姿态估计,可以得到针对对象的对象姿态检测结果,以及针对对象的第一对象部位的部位姿态检测结果,进而可以基于对象姿态检测结果、部位姿态检测结果以及标准姿态,对图像帧中的对象进行姿态估计,可以对图像帧中对象缺少的部位关键点进行补偿,并矫正不符合标准姿态的对象关键点,可以确保最终得到的对象的全局姿态的完整性和合理性,进而可以提高全局姿态的估计准确性。
请参见图13,图13是本申请实施例提供的一种数据处理装置的结构示意图。如图13所示,该数据处理装置1可以包括:姿态检测模块11,姿态估计模块12;
姿态检测模块11,用于获取图像帧中的对象对应的对象姿态检测结果以及图像帧中 所述对象的第一对象部位对应的部位姿态检测结果;其中,所述对象姿态检测结果中缺失所述对象的至少一个对象部位,所述第一对象部位为所述对象的一个或多个部位;
姿态估计模块12,用于根据部位姿态检测结果以及与对象相关联的标准姿态,对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到对象对应的全局姿态,其中,所述全局姿态用于操控计算机设备以实现与所述全局姿态对应的业务功能。
其中,姿态检测模块11,姿态估计模块12的具体功能实现方式可以参见上述图3所对应实施例中对步骤S101-步骤S102的描述,这里不再进行赘述。
本申请实施例中,通过对图像帧中的对象分别进行全局对象姿态估计和特定部位姿态估计,可以得到针对对象的对象姿态检测结果,以及针对对象的第一对象部位的部位姿态检测结果,进而可以基于对象姿态检测结果、部位姿态检测结果以及标准姿态,对图像帧中的对象进行姿态估计,可以对图像帧中对象缺少的部位关键点进行补偿,可以确保最终得到的对象的全局姿态的完整性和合理性,进而可以提高全局姿态的估计准确性。
请参见图14,图14是本申请实施例提供的另一种数据处理装置的结构示意图。如图14所示,该数据处理装置2包括:姿态检测模块21,姿态估计模块22,虚拟对象构建模块23;
姿态检测模块21,用于获取图像帧中的对象对应的对象姿态检测结果和图像帧中所述对象的第一对象部位对应的部位姿态检测结果;其中,所述对象姿态检测结果中缺失所述对象的至少一个对象部位,所述第一对象部位为所述对象的一个或多个部位;
姿态估计模块22,用于根据部位姿态检测结果以及与对象相关联的标准姿态,对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到对象对应的全局姿态。
虚拟对象构建模块23,用于构建与对象相关联的虚拟对象,根据所述全局姿态控制所述虚拟对象的姿态。
其中,姿态检测模块21,姿态估计模块22,虚拟对象构建模块23的具体功能实现方式可以参见前述相关步骤的描述,这里不再进行赘述。
在一个或多个实施例中,姿态检测模块21包括:对象检测单元211,部位检测单元212;
对象检测单元211,用于将图像帧输入至对象检测模型,通过对象检测模型,获得所述对象姿态检测结果;
部位检测单元212,用于将图像帧输入至部位检测模型,通过部位检测模型,获得所 述部位姿态检测结果。
其中,对象检测单元211,部位检测单元212的具体功能实现方式可以参见图3所对应实施例中的步骤S101,这里不再进行赘述。
在一个或多个实施例中,对象检测单元211可以包括:部位分类子单元2111,部位映射图生成子单元2112,定位结果确定子单元2113,检测结果确定子单元2114;
部位分类子单元2111,用于将图像帧输入至对象检测模型,通过对象检测模型获取图像帧中的对象对应的对象姿态特征,识别对象姿态特征对应的第一分类结果;第一分类结果用于表征对象的关键点所对应的对象部位类别;
部位映射图生成子单元2112,用于根据第一分类结果和所述对象检测模型输出的图像帧的对象卷积特征,生成第一激活映射图;
定位结果确定子单元2113,用于获取第一激活映射图对应的像素平均值,根据像素平均值确定对象中的关键点在图像帧中的定位结果;
检测结果确定子单元2114,用于根据对象部位类别和定位结果,确定图像帧对应的对象姿态检测结果。
其中,部位分类子单元2111,部位映射图生成子单元2112,定位结果确定子单元2113,检测结果确定子单元2114的具体功能实现方式可以参见图7所对应实施例中的步骤S201-步骤S203,这里不再进行赘述。
在一个或多个实施例中,部位分类子单元2111包括:全局分类子单元21111,全局映射图获取子单元21112,分块处理子单元21113,特征组合子单元21114;
全局分类子单元21111,用于在对象检测模型中获取图像帧中的对象对应的对象描述特征,根据对象检测模型中的分类器,输出对象描述特征对应的第二分类结果;
全局映射图获取子单元21112,用于获取对象检测模型中的卷积层所输出的针对图像帧的对象卷积特征,将第二分类结果和对象卷积特征进行乘积运算,得到图像帧对应的第二激活映射图;
分块处理子单元21113,用于根据第二激活映射图对图像帧进行分块处理,得到M个对象部位区域图像,根据对象检测模型获取M个对象部位区域图像分别对应的部位描述特征;M为正整数;
特征组合子单元21114,用于将对象描述特征和M个对象部位区域图像所对应的部位描述特征组合为对象姿态特征。
其中,全局分类子单元21111,全局映射图获取子单元21112,分块处理子单元21113,特征组合子单元21114的具体功能实现方式可以参见图7所对应实施例中的步骤 S201,这里不再进行赘述。
在一个或多个实施例中,部位检测单元212可以包括:对象部位检测子单元2121,部位姿态估计子单元2122,空值确定子单元2123;
对象部位检测子单元2121,用于将图像帧输入至部位检测模型,在部位检测模型中检测图像帧中的对象的第一对象部位;
部位姿态估计子单元2122,用于若在图像帧中检测到第一对象部位,则从图像帧中获取包含第一对象部位的区域图像,根据区域图像获取第一对象部位对应的部位关键点位置,基于部位关键点位置确定图像帧对应的部位姿态检测结果;
空值确定子单元2123,用于若在图像帧中未检测到第一对象部位,则确定图像帧对应的部位姿态检测结果为空值。
其中,对象部位检测子单元2121,部位姿态估计子单元2122,空值确定子单元2123的具体功能实现方式可以参见图7所对应实施例中的步骤S204-步骤S205,这里不再进行赘述。
在一个或多个实施例中,部位姿态估计子单元2122可以包括:图像剪裁子单元21221,部位关键点确定子单元21222,部位关键点连接子单元21223;
图像剪裁子单元21221,用于若在图像帧中检测到第一对象部位,则对图像帧进行剪裁,得到包含第一对象部位的区域图像;
部位关键点确定子单元21222,用于获取区域图像对应的部位轮廓特征,根据部位轮廓特征预测第一对象部位对应的部位关键点位置;
部位关键点连接子单元21223,用于基于部位关键点位置,对第一对象部位的关键点进行连接,得到图像帧对应的部位姿态检测结果。
其中,图像剪裁子单元21221,部位关键点确定子单元21222,部位关键点连接子单元21223的具体功能实现方式可以参见图7所对应实施例中的步骤S205,这里不再进行赘述。
在一个或多个实施例中,姿态估计模块22包括:关键点数量确定单元221,第一插值处理单元222,第二插值处理单元223;
关键点数量确定单元221,用于获取与对象相关联的标准姿态,确定标准姿态对应的第一关键点数量,以及对象姿态检测结果对应的第二关键点数量;
第一插值处理单元222,用于若第一关键点数量大于第二关键点数量,根据标准姿态对对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到第一候选对象姿态;
第二插值处理单元223,用于根据部位姿态检测结果,对第一候选对象姿态中与第一 对象部位相关联的对象部位进行插值处理,得到对象对应的全局姿态。
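For the specific functional implementations of the keypoint quantity determining unit 221, the first interpolation processing unit 222, and the second interpolation processing unit 223, refer to steps S206 to S208 in the embodiment corresponding to FIG. 7. Details are not repeated here.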
在一个或多个实施例中,第二插值处理单元223可以包括:第一方向确定子单元2231,第一位置确定子单元2232,第一关键点增加子单元2233;
第一方向确定子单元2231,用于若对象姿态检测结果包含第二对象部位的姿态,且对象姿态检测结果不包含第三对象部位的姿态,则根据部位姿态检测结果中所包含的第一对象部位的关键点位置,确定第三对象部位对应的第一部位方向;第二对象部位和第三对象部位为对象的对称部位、且所述第二对象部位和第三对象部位与第一对象部位相关联;
第一位置确定子单元2232,用于获取第二对象部位在第一候选对象姿态中的第一部位长度,根据第一部位长度和第一部位方向,确定第三对象部位的关键点位置;
第一关键点增加子单元2233,用于将第三对象部位的关键点位置添加至第一候选对象姿态,得到图像帧中的对象对应的全局姿态。
在一些实施例中,第一关键点增加子单元2233具体用于:
将第三对象部位的关键点位置添加至第一候选对象姿态,得到图像帧中的对象对应的第二候选对象姿态,获取标准姿态与第二候选对象姿态之间的姿态偏移量;
若姿态偏移量大于偏移阈值,则基于标准姿态对第二候选对象姿态进行关键点矫正,得到图像帧中的对象对应的全局姿态。
在一些实施例中,图像帧为视频数据中的第i个图像帧,i为正整数;第二插值处理单元223还可以包括:第二方向确定子单元2234,第二位置确定子单元2235,第二关键点增加子单元2236;
第二方向确定子单元2234,用于若对象姿态检测结果不包括第二对象部位和第三对象部位的姿态,则根据部位姿态检测结果中所包含的第一对象部位的关键点位置,确定第二对象部位对应的第二部位方向,以及第三对象部位对应的第三部位方向;第二对象部位和第三对象部位为对象的对称部位、且所述第二对象部位和第三对象部位与第一对象部位相关联;
第二位置确定子单元2235,用于在第j个图像帧中,获取第二对象部位对应的第二部位长度,以及第三对象部位对应的第三部位长度,根据第二部位长度和第二部位方向,确定第二对象部位的关键点位置;其中,j为正整数且j小于i;
第二关键点增加子单元2236,用于根据第三部位长度和第三部位方向,确定第三对象部位的关键点位置,将第二对象部位的关键点位置和第三对象部位的关键点位置添加至第一候选对象姿态,得到图像帧中的对象对应的全局姿态。
其中,第一方向确定子单元2231,第一位置确定子单元2232,第一关键点增加子单元2233,第二方向确定子单元2234,第二位置确定子单元2235,第二关键点增加子单元2236的具体功能实现方式可以参见图7所对应实施例中的步骤S208,这里不再进行赘述。其中,当第一方向确定子单元2231,第一位置确定子单元2232,第一关键点增加子单元2233在执行相应的操作时,第二方向确定子单元2234,第二位置确定子单元2235,第二关键点增加子单元2236均暂停执行操作;当第二方向确定子单元2234,第二位置确定子单元2235,第二关键点增加子单元2236在执行相应的操作时,第一方向确定子单元2231,第一位置确定子单元2232,第一关键点增加子单元2233均暂停执行操作。
本申请实施例中,通过对图像帧中的对象分别进行对象姿态估计和特定部位姿态估计,可以得到针对对象的对象姿态检测结果,以及针对对象的第一对象部位的部位姿态检测结果,进而可以基于对象姿态检测结果、部位姿态检测结果以及标准姿态,对图像帧中的对象进行姿态估计,可以对图像帧中对象缺少的部位关键点进行补偿,并矫正不符合标准姿态的对象关键点,可以确保最终得到的对象的全局姿态的完整性和合理性,进而可以提高全局姿态的估计准确性。
请参见图15,图15是本申请实施例提供的一种计算机设备的结构示意图。如图15所示,该计算机设备1000可以为用户终端,例如,上述图1所对应实施例中的用户终端10a,还可以为服务器,例如,上述图1所对应实施例中的服务器10d,这里将不对其进行限制。为便于理解,本申请以计算机设备为用户终端为例,该计算机设备1000可以包括:处理器1001,网络接口1004和存储器1005,此外,该计算机设备1000还可以包括:用户接口1003,和至少一个通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。其中,用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器。存储器1005可选的还可以是至少一个位于远离前述处理器1001的存储装置。如图15所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
其中,该计算机设备1000中的网络接口1004还可以提供网络通讯功能,且可选用户接口1003还可以包括显示屏(Display)、键盘(Keyboard)。在图15所示的计算机设 备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现:
获取图像帧中的对象对应的对象姿态检测结果以及所述图像帧中所述对象的第一对象部位对应的部位姿态检测结果;其中,所述对象姿态检测结果中缺失所述对象的至少一个对象部位,所述第一对象部位为所述对象的一个或多个部位;
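interpolating, according to the part pose detection result and the standard pose associated with the object, the at least one object part missing from the object pose detection result to obtain the global pose corresponding to the object, where the global pose is used to control the computer device to implement the service function corresponding to the global pose.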
应当理解,本申请实施例中所描述的计算机设备1000可执行前文图3、图7中任一个实施例中对数据处理方法的描述,也可执行前文图13所对应实施例中对数据处理装置1的描述,还可执行前文图14所对应实施例中对数据处理装置2的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外,这里需要指出的是:本申请实施例还提供了一种计算机可读存储介质,且计算机可读存储介质中存储有计算机程序,且计算机程序包括计算机指令,当处理器执行计算机指令时,能够执行前文图3、图7中任一个实施例中对数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机可读存储介质实施例中未披露的技术细节,请参照本申请方法实施例的描述。作为示例,计算机指令可被部署在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执行,分布在多个地点且通过通信网络互连的多个计算设备可以组成区块链系统。
此外,需要说明的是:本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或者计算机程序可以包括计算机指令,该计算机指令可以存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器可以执行该计算机指令,使得该计算机设备执行前文图3、图7中任一个实施例中对数据处理方法的描述,因此,这里将不再进行赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。对于本申请所涉及的计算机程序产品或者计算机程序实施例中未披露的技术细节,请参照本申请方法实施例的描述。
需要说明的是,对于前述的各个方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制, 因为依据本申请,某一些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
本申请实施例方法中的步骤可以根据实际需要进行顺序调整、合并和删减。
本申请实施例装置中的模块可以根据实际需要进行合并、划分和删减。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,计算机程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,存储介质可为磁碟、光盘、只读存储器(Read-Only Memory,ROM)或随机存储器(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (15)

  1. 一种数据处理方法,由计算机设备执行,包括:
    获取图像帧中的对象对应的对象姿态检测结果,以及所述图像帧中所述对象的第一对象部位对应的部位姿态检测结果;其中,所述对象姿态检测结果中缺失所述对象的至少一个对象部位,所述第一对象部位为所述对象的一个或多个部位;
    根据所述部位姿态检测结果以及与所述对象相关联的标准姿态,对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到所述对象对应的全局姿态,其中,所述全局姿态用于操控计算机设备以实现与所述全局姿态对应的业务功能。
  2. 根据权利要求1所述的方法,其中,所述获取图像帧中的对象对应的对象姿态检测结果,以及图像帧中所述对象的第一对象部位对应的部位姿态检测结果,包括:
    将所述图像帧输入至对象检测模型,通过所述对象检测模型,获得所述对象姿态检测结果;
    将所述图像帧输入至部位检测模型,通过所述部位检测模型,获得所述部位姿态检测结果。
  3. 根据权利要求2所述的方法,其中,所述将所述图像帧输入至对象检测模型,通过所述对象检测模型,获得所述对象姿态检测结果,包括:
    将所述图像帧输入至对象检测模型,通过所述对象检测模型获取所述对象对应的对象姿态特征,识别所述对象姿态特征对应的第一分类结果;所述第一分类结果用于表征所述对象的关键点所对应的对象部位类别;
    根据所述第一分类结果和所述对象检测模型输出的图像帧的对象卷积特征,生成第一激活映射图;
    获取所述第一激活映射图对应的像素平均值,根据所述像素平均值确定所述对象中的关键点在所述图像帧中的定位结果;
    根据所述对象部位类别和所述定位结果,确定所述对象姿态检测结果。
  4. 根据权利要求3所述的方法,其中,所述通过所述对象检测模型获取所述对象对应的对象姿态特征,包括:
    在所述对象检测模型中获取所述图像帧中的所述对象对应的对象描述特征,根据所述对象检测模型中的分类器,输出所述对象描述特征对应的第二分类结果;
    获取所述对象检测模型中的卷积层所输出的针对所述图像帧的对象卷积特征,将所述第二分类结果和所述对象卷积特征进行乘积运算,得到所述图像帧对应的第二激活映射图;
    根据所述第二激活映射图对所述图像帧进行分块处理,得到M个对象部位区域图像,根据所述对象检测模型获取所述M个对象部位区域图像分别对应的部位描述特征;M为正整数;
    将所述对象描述特征和所述M个对象部位区域图像所对应的部位描述特征组合为所述对象姿态特征。
  5. 根据权利要求2所述的方法,其中,所述将所述图像帧输入至部位检测模型,通过所述部位检测模型,获得所述部位姿态检测结果,包括:
    inputting the image frame into the part detection model, and detecting the first object part of the object in the part detection model;
    若在所述图像帧中检测到所述第一对象部位,则从所述图像帧中获取包含所述第一对象部位的区域图像,根据所述区域图像获取所述第一对象部位对应的部位关键点位置,基于所述部位关键点位置确定所述图像帧对应的部位姿态检测结果;
    若在所述图像帧中未检测到所述第一对象部位,则确定所述图像帧对应的部位姿态检测结果为空值。
  6. 根据权利要求5所述的方法,其中,所述若在所述图像帧中检测到所述第一对象部位,则从所述图像帧中获取包含所述第一对象部位的区域图像,根据所述区域图像获取所述第一对象部位对应的部位关键点位置,基于所述部位关键点位置确定所述图像帧对应的部位姿态检测结果,包括:
    若在所述图像帧中检测到所述第一对象部位,则对所述图像帧进行剪裁,得到包含所述第一对象部位的区域图像;
    获取所述区域图像对应的部位轮廓特征,根据所述部位轮廓特征预测所述第一对象部位对应的部位关键点位置;
    基于所述部位关键点位置,对所述第一对象部位的关键点进行连接,得到所述图像帧对应的部位姿态检测结果。
  7. 根据权利要求1所述的方法,其中,所述根据所述部位姿态检测结果,以及与所 述对象相关联的标准姿态,对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到所述对象对应的全局姿态,包括:
    获取与所述对象相关联的标准姿态,确定所述标准姿态对应的第一关键点数量,以及所述对象姿态检测结果对应的第二关键点数量;
    若所述第一关键点数量大于所述第二关键点数量,根据所述标准姿态对所述对象姿态检测结果中缺失的至少一个对象部位进行插值处理,得到第一候选对象姿态;
    根据所述部位姿态检测结果,对所述第一候选对象姿态中与第一对象部位相关联的对象部位进行插值处理,得到所述对象对应的全局姿态。
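  8. The method according to claim 7, wherein the performing, according to the part pose detection result, interpolation processing on an object part in the first candidate object pose that is associated with the first object part, to obtain the global pose corresponding to the object comprises:
    if the object pose detection result contains a pose of a second object part and does not contain a pose of a third object part, determining, according to key point positions of the first object part contained in the part pose detection result, a first part direction corresponding to the third object part, the second object part and the third object part being symmetric parts of the object, and both the second object part and the third object part being associated with the first object part;
    obtaining a first part length of the second object part in the first candidate object pose, and determining key point positions of the third object part according to the first part length and the first part direction; and
    adding the key point positions of the third object part to the first candidate object pose, to obtain the global pose corresponding to the object in the image frame.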
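
As an illustrative, non-limiting sketch of claim 8, the snippet below recovers a missing key point of the third object part from a direction and a length. It assumes, purely for illustration, that the first object part is a hand, the second object part is the detected forearm on the other side, and the third object part is the missing forearm; the direction is taken from the palm centre towards the wrist. These part choices and the name `estimate_missing_elbow` are hypothetical and are not recited in the claim.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def estimate_missing_elbow(
    wrist: Point,
    palm_center: Point,
    mirrored_elbow: Point,
    mirrored_wrist: Point,
) -> Point:
    """Place the missing elbow using the symmetric forearm's length.

    Direction (assumed): palm centre -> wrist, extended past the wrist.
    Length: distance between the elbow and wrist of the symmetric, detected forearm.
    """
    length = math.dist(mirrored_elbow, mirrored_wrist)
    dx, dy = wrist[0] - palm_center[0], wrist[1] - palm_center[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("wrist and palm centre coincide; direction is undefined")
    return (wrist[0] + length * dx / norm, wrist[1] + length * dy / norm)
```

The returned position would then be added to the first candidate object pose as the key point of the third object part.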
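  9. The method according to claim 8, wherein the adding the key point positions of the third object part to the first candidate object pose, to obtain the global pose corresponding to the object in the image frame comprises:
    adding the key point positions of the third object part to the first candidate object pose to obtain a second candidate object pose corresponding to the object in the image frame, and obtaining a pose offset between the standard pose and the second candidate object pose; and
    if the pose offset is greater than an offset threshold, performing key point correction on the second candidate object pose based on the standard pose, to obtain the global pose corresponding to the object in the image frame.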
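
As an illustrative, non-limiting sketch of claim 9, the snippet below measures a pose offset and corrects key points only when that offset exceeds a threshold. The claim defines neither the offset metric nor the correction rule; the mean Euclidean distance over shared key points and the replacement of outlying key points by their standard positions are assumptions, as is the premise that the standard pose has already been expressed in image coordinates. The names `pose_offset` and `correct_pose` are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]


def pose_offset(candidate: Pose, standard: Pose) -> float:
    """Mean Euclidean distance over key points present in both poses (assumed metric)."""
    shared = [name for name in standard if name in candidate]
    if not shared:
        return 0.0
    return sum(math.dist(candidate[n], standard[n]) for n in shared) / len(shared)


def correct_pose(candidate: Pose, standard: Pose, threshold: float) -> Pose:
    """Return the global pose, correcting key points only if the offset is too large."""
    if pose_offset(candidate, standard) <= threshold:
        return dict(candidate)
    corrected = dict(candidate)
    for name, point in standard.items():
        # Assumed rule: snap only the key points that individually deviate too far.
        if name in corrected and math.dist(corrected[name], point) > threshold:
            corrected[name] = point
    return corrected
```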
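  10. The method according to claim 7, wherein the image frame is an i-th image frame in video data, i being a positive integer; and
    the performing, according to the part pose detection result, interpolation processing on an object part in the first candidate object pose that is associated with the first object part, to obtain the global pose corresponding to the object comprises:
    if the object pose detection result contains neither a pose of a second object part nor a pose of a third object part, determining, according to key point positions of the first object part contained in the part pose detection result, a second part direction corresponding to the second object part and a third part direction corresponding to the third object part, the second object part and the third object part being symmetric parts of the object, and both the second object part and the third object part being associated with the first object part;
    obtaining, in a j-th image frame, a second part length corresponding to the second object part and a third part length corresponding to the third object part, and determining key point positions of the second object part according to the second part length and the second part direction, j being a positive integer less than i; and
    determining key point positions of the third object part according to the third part length and the third part direction, and adding the key point positions of the second object part and the key point positions of the third object part to the first candidate object pose, to obtain the global pose corresponding to the object in the image frame.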
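
As an illustrative, non-limiting sketch of claim 10, the snippet below looks up a part length in an earlier frame j (j less than i) when the current frame i contains neither symmetric part, and then places a key point along a given direction. Choosing the most recent earlier frame that contains the part is an assumption; the claim only requires j to be less than i. The names `latest_part_length` and `place_keypoint` are hypothetical, and the frame list is indexed from zero, unlike the positive-integer indexing in the claim.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]


def latest_part_length(history: List[Pose], start: str, end: str, i: int) -> Optional[float]:
    """Length of the part (start -> end) in the most recent frame j < i that contains it."""
    for j in range(i - 1, -1, -1):
        pose = history[j]
        if start in pose and end in pose:
            return math.dist(pose[start], pose[end])
    return None  # the part was never observed before frame i


def place_keypoint(origin: Point, direction: Point, length: float) -> Point:
    """Step `length` from `origin` along the normalized `direction`."""
    norm = math.hypot(direction[0], direction[1])
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return (
        origin[0] + length * direction[0] / norm,
        origin[1] + length * direction[1] / norm,
    )
```

The same two calls would be made once for the second object part and once for the third object part, each with its own direction derived from the first object part.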
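  11. The method according to claim 1, further comprising:
    constructing a virtual object associated with the object, and controlling a pose of the virtual object according to the global pose.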
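  12. A data processing apparatus, comprising:
    a pose detection module, configured to obtain an object pose detection result corresponding to an object in an image frame and a part pose detection result corresponding to a first object part of the object in the image frame, the first object part being one or more parts of the object; and
    a pose estimation module, configured to perform, according to the part pose detection result and a standard pose associated with the object, interpolation processing on an object part missing from the object pose detection result, to obtain a global pose corresponding to the object, wherein the global pose is used for controlling a computer device to implement a service function corresponding to the global pose.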
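  13. A computer device, comprising a memory and a processor;
    the memory being connected to the processor, the memory being configured to store a computer program, and the processor being configured to invoke the computer program, so that the computer device performs the method according to any one of claims 1 to 11.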
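  14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method according to any one of claims 1 to 11.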
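  15. A computer program product, comprising a computer program/instructions, the computer program/instructions, when executed by a processor, implementing the method according to any one of claims 1 to 11.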
PCT/CN2023/073976 2022-03-31 2023-01-31 Data processing method and apparatus, and device and medium Ceased WO2023185241A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2024556677A JP7792532B2 (ja) 2022-03-31 2023-01-31 Data processing method, data processing apparatus, computer device, and computer program
EP23777620.8A EP4411641A4 (en) 2022-03-31 2023-01-31 DATA PROCESSING METHOD AND DEVICE, DEVICE AND MEDIUM
US18/238,321 US20230401740A1 (en) 2022-03-31 2023-08-25 Data processing method and apparatus, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210332763.0 2022-03-31
CN202210332763.0A CN116934848B (zh) 2022-03-31 2022-03-31 Data processing method and apparatus, and device and medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/238,321 Continuation US20230401740A1 (en) 2022-03-31 2023-08-25 Data processing method and apparatus, and device and medium

Publications (1)

Publication Number Publication Date
WO2023185241A1 (zh) 2023-10-05

Family

ID=88199069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/073976 Ceased WO2023185241A1 (zh) 2022-03-31 2023-01-31 Data processing method and apparatus, and device and medium

Country Status (5)

Country Link
US (1) US20230401740A1 (zh)
EP (1) EP4411641A4 (zh)
JP (1) JP7792532B2 (zh)
CN (1) CN116934848B (zh)
WO (1) WO2023185241A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12299852B2 (en) * 2021-09-09 2025-05-13 Stats Llc Body pose tracking of players from sports broadcast video feed
US12374037B2 (en) * 2023-03-07 2025-07-29 Red Pill Lab Limited Method and system for generating a three-dimensional global pose from an image
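CN117975569B (zh) * 2024-02-29 2024-12-13 欧亚高科数字技术有限公司 Virtual-simulation anesthesiologist motion capture method
TWI879635B (zh) * 2024-07-15 2025-04-01 所羅門股份有限公司 Six-dimensional object pose tracking method and apparatus, and computer-readable recording medium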

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070268295A1 (en) * 2006-05-19 2007-11-22 Kabushiki Kaisha Toshiba Posture estimation apparatus and method of posture estimation
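CN109359568A (zh) * 2018-09-30 2019-02-19 南京理工大学 Human body key point detection method based on graph convolutional network
CN110139115A (zh) * 2019-04-30 2019-08-16 广州虎牙信息科技有限公司 Key-point-based avatar pose control method and apparatus, and electronic device
CN111414797A (zh) * 2019-01-07 2020-07-14 一元精灵有限公司 System and method for pose sequences based on video from a mobile terminal
WO2021097750A1 (zh) * 2019-11-21 2021-05-27 深圳市欢太科技有限公司 Human body pose recognition method and apparatus, storage medium, and electronic device
CN113449696A (zh) * 2021-08-27 2021-09-28 北京市商汤科技开发有限公司 Pose estimation method and apparatus, computer device, and storage medium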

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
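JP2019045967A (ja) 2017-08-30 2019-03-22 富士通株式会社 Pose estimation device, method, and program
CN109951628A (zh) * 2017-12-21 2019-06-28 广东欧珀移动通信有限公司 Model construction method, photographing method, apparatus, storage medium, and terminal
CN108256433B (zh) * 2017-12-22 2020-12-25 银河水滴科技(北京)有限公司 Motion pose evaluation method and system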
US10885659B2 (en) * 2018-01-15 2021-01-05 Samsung Electronics Co., Ltd. Object pose estimating method and apparatus
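CN110472462B (zh) * 2018-05-11 2024-08-20 北京三星通信技术研究有限公司 Pose estimation method, pose-estimation-based processing method, and electronic device
CN109558832B (zh) * 2018-11-27 2021-03-26 广州市百果园信息技术有限公司 Human body pose detection method, apparatus, device, and storage medium
CN109977764B (zh) * 2019-02-12 2024-12-31 平安科技(深圳)有限公司 Liveness recognition method based on plane detection, apparatus, terminal, and storage medium
CN110348359B (zh) * 2019-07-04 2022-01-04 北京航空航天大学 Hand pose tracking method, apparatus, and system
CN111626218B (zh) * 2020-05-28 2023-12-26 腾讯科技(深圳)有限公司 Artificial-intelligence-based image generation method, apparatus, device, and storage medium
CN112528831B (zh) * 2020-12-07 2023-11-24 深圳市优必选科技股份有限公司 Multi-target pose estimation method, multi-target pose estimation apparatus, and terminal device
CN112906646A (zh) * 2021-03-23 2021-06-04 中国联合网络通信集团有限公司 Human body pose detection method and apparatus
CN113435293B (zh) * 2021-06-23 2022-04-05 同济大学 Human body pose estimation method based on joint relations
CN113780176B (zh) * 2021-09-10 2023-08-25 平安科技(深圳)有限公司 Partially occluded object recognition method, apparatus, device, and storage medium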

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis Xi'an University of Electronic Science and Technology", 15 March 2022, XI'AN UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY, CN, article SHI, XUEYONG: "Research on Human Fall Behavior Analysis Technologies Based on Machine Learning", pages: 1 - 72, XP009549545, DOI: 10.27251/d.cnki.gnjdc.2021.000920 *
IYER SOWMYA JAYARAM; SARANYA P.; SIVARAM M.: "Human Pose-Estimation and low-cost Interpolation for Text to Indian Sign Language", 2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE), IEEE, 28 January 2021 (2021-01-28), pages 130 - 135, XP033888390, DOI: 10.1109/Confluence51648.2021.9377047 *
MENGHE LI, XU HONG-JI, SHI LEI-XIN, ZHAO WEN-JIE, LI JUAN: "Multi-person Activity Recognition Based on Bone Keypoints Detection", COMPUTER SCIENCE, vol. 48, no. 4, 30 April 2021 (2021-04-30), pages 138 - 143, XP093097845, DOI: 10.11896/jsjkx.200300042 *
See also references of EP4411641A4
VYAS KATHAN; MA RUI; REZAEI BEHNAZ; LIU SHUANGJUN; NEUBAUER MICHAEL; PLOETZ THOMAS; OBERLEITNER RONALD; OSTADABBAS SARAH: "Recognition Of Atypical Behavior In Autism Diagnosis From Video Using Pose Estimation Over Time", 2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), IEEE, 13 October 2019 (2019-10-13), pages 1 - 6, XP033645886, DOI: 10.1109/MLSP.2019.8918863 *

Also Published As

Publication number Publication date
CN116934848A (zh) 2023-10-24
CN116934848B (zh) 2024-11-19
JP7792532B2 (ja) 2025-12-25
EP4411641A1 (en) 2024-08-07
JP2025510833A (ja) 2025-04-15
EP4411641A4 (en) 2025-03-12
US20230401740A1 (en) 2023-12-14

Similar Documents

Publication Publication Date Title
US12469239B2 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
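CN110139115B (zh) Key-point-based avatar pose control method and apparatus, and electronic device
CN113420719B (zh) Method and apparatus for generating motion capture data, electronic device, and storage medium
EP3876140B1 (en) Method and apparatus for recognizing postures of multiple persons, electronic device, and storage medium
JP7792532B2 (ja) Data processing method, data processing apparatus, computer device, and computer program
WO2023109753A1 (zh) Animation generation method and apparatus for virtual character, storage medium, and terminal
CN112927259A (zh) Multi-camera-based bare-hand tracking display method, apparatus, and system
KR20220156873A (ko) Markerless motion capture of hands using multiple pose estimation engines
CN102171726A (zh) Information processing apparatus, information processing method, program, and information storage medium
US20260073554A1 (en) Posture data completion method and apparatus for three-dimensional object, device, storage medium, and product
CN110147737B (zh) Method, apparatus, device, and storage medium for generating video
CN109035415B (zh) Virtual model processing method, apparatus, device, and computer-readable storage medium
CN114401446B (zh) Human body pose transfer method, apparatus, system, electronic device, and storage medium
CN117218246A (zh) Training method and apparatus for image generation model, electronic device, and storage medium
US20240135581A1 (en) Three dimensional hand pose estimator
JP2023527627A (ja) Inference of joint rotation based on inverse kinematics
CN116704615B (zh) Information processing method and apparatus, computer device, and computer-readable storage medium
CN113342157B (zh) Eye tracking processing method and related apparatus
CN112714337A (zh) Video processing method and apparatus, electronic device, and storage medium
CN115115963A (zh) Method and apparatus for generating three-dimensional virtual motion, electronic device, and vehicle
CN116403285B (zh) Action recognition method and apparatus, electronic device, and storage medium
CN112101102A (zh) Artificial-intelligence-based method for obtaining 3D limb motion from RGB video
CN115862054B (zh) Image data processing method, apparatus, device, and medium
HK40098476B (zh) Data processing method and apparatus, and device and medium
HK40098476A (zh) Data processing method and apparatus, and device and medium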

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 23777620; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2023777620; Country of ref document: EP; Effective date: 20240429
WWE Wipo information: entry into national phase. Ref document number: 2024556677; Country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE