WO2021197143A1 - Action migration method, apparatus, device, and storage medium - Google Patents

Action migration method, apparatus, device, and storage medium

Info

Publication number
WO2021197143A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
sequence
key point
skeleton key
dimensional skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/082407
Other languages
English (en)
French (fr)
Inventor
吴文岩
朱文韬
杨卓谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to EP21781900.2A (EP3979204A4)
Priority to JP2021573955A (JP2022536381A)
Priority to KR1020217038862A (KR20220002551A)
Publication of WO2021197143A1
Priority to US17/555,965 (US20220114777A1)
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 Three-dimensional [3D] animation
    • G06T13/40 Three-dimensional [3D] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/08 Projecting images onto non-planar surfaces, e.g. geodetic screens
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G06T7/596 Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • The present disclosure relates to the field of computer vision technology, and in particular to an action migration method, apparatus, device, and storage medium.
  • Action migration transfers the action of an initial object in an initial motion video to a target object to form a target motion video. Because the initial motion video and the target motion video differ greatly in structure and viewing angle, it is difficult to migrate actions at the pixel level. In particular, when the initial object performs extreme actions, or when the structures of the initial object and the target object differ greatly, the accuracy of the action transferred to the target object is low.
  • The present disclosure provides at least an action migration method and apparatus.
  • The present disclosure provides an action migration method, including: acquiring a first initial video including an action sequence of an initial object; identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of images of the first initial video; converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
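The four steps of the method can be read as a simple pipeline. The following is a minimal sketch only: the function name `migrate_action`, the three callables, and the array shapes `(T, J, 2)` and `(T, J, 3)` are illustrative assumptions and do not come from the disclosure, and the toy stand-ins at the bottom are not real models.

```python
from typing import Callable, List, Sequence

import numpy as np


def migrate_action(
    frames: Sequence[np.ndarray],
    extract_2d: Callable[[np.ndarray], np.ndarray],
    retarget_to_3d: Callable[[np.ndarray], np.ndarray],
    render: Callable[[np.ndarray], object],
) -> List[object]:
    """Chain the four steps; every callable here is a hypothetical stand-in."""
    # Steps 1-2: one (J, 2) key point array per frame, stacked into (T, J, 2).
    kp2d = np.stack([extract_2d(frame) for frame in frames])
    # Step 3: map the 2D sequence to the target object's 3D sequence, (T, J, 3).
    kp3d = retarget_to_3d(kp2d)
    # Step 4: produce one output frame per 3D pose.
    return [render(pose) for pose in kp3d]


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
frames = [np.zeros((4, 4)) for _ in range(3)]
outputs = migrate_action(
    frames,
    extract_2d=lambda frame: np.ones((5, 2)),
    retarget_to_3d=lambda seq: np.concatenate(
        [seq, np.zeros(seq.shape[:-1] + (1,))], axis=-1
    ),
    render=lambda pose: pose.shape,
)
```

Passing the stages in as callables keeps the sketch independent of any particular pose estimator or renderer.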
  • Action migration is thus realized without operating directly at the pixel level. This overcomes the large differences in structure and viewing angle between the initial video and the target video, and improves the accuracy of action migration, especially when the initial object performs extreme actions or when the structures of the initial object and the target object differ greatly.
  • In addition, this aspect retargets from the two-dimensional skeleton key point sequence to the three-dimensional skeleton key point sequence, which avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration and thereby helps improve the accuracy of action migration.
  • Converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object includes: determining an action migration component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determining the three-dimensional skeleton key point sequence of the target object based on the action migration component sequence of the initial object.
  • Here, the action migration component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence is used to retarget to the three-dimensional skeleton key point sequence, which avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration and helps improve the accuracy of action migration.
  • Before determining the three-dimensional skeleton key point sequence of the target object, the above action migration method further includes: acquiring a second initial video including the target object; and identifying a two-dimensional skeleton key point sequence of the target object in multiple frames of images of the second initial video. Determining the three-dimensional skeleton key point sequence of the target object based on the action migration component sequence of the initial object then includes: determining an action migration component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determining a target action migration component sequence based on the action migration component sequence of the initial object and the action migration component sequence of the target object; and determining the three-dimensional skeleton key point sequence of the target object based on the target action migration component sequence.
  • Here, the action migration component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence of the initial object is fused with the action migration component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence of the target object, and the fused sequence is used to determine the three-dimensional skeleton key point sequence. This overcomes the low accuracy of action migration when the initial object performs extreme actions or when the structures of the initial object and the target object differ greatly.
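One way to picture the fusion of the two action migration component sequences is sketched below. The text does not state which component is taken from which object, so the choice here (motion from the initial object, structure and shooting angle from the target object, with the target's per-frame values averaged over time) is an assumption for illustration, as are the `Components` container and the `fuse` function.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Components:
    """Hypothetical container: one row per frame for each component."""
    motion: np.ndarray     # (T, D_motion)
    structure: np.ndarray  # (T, D_structure)
    view: np.ndarray       # (T, D_view)


def fuse(initial: Components, target: Components) -> Components:
    """Assumed fusion rule: keep the initial object's motion, and borrow the
    target object's structure and shooting angle."""
    num_frames = initial.motion.shape[0]
    # Assumption: average the target's per-frame codes and repeat them so the
    # fused sequence has as many frames as the initial object's motion.
    structure = np.repeat(target.structure.mean(axis=0, keepdims=True), num_frames, axis=0)
    view = np.repeat(target.view.mean(axis=0, keepdims=True), num_frames, axis=0)
    return Components(initial.motion.copy(), structure, view)
```

Averaging over time is just one simple way to reconcile videos of different lengths; other pooling choices would fit the same description.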
  • The action migration component sequence of the initial object includes a motion component sequence, an object structure component sequence, and a shooting angle component sequence. Determining the action migration component sequence of the initial object based on the two-dimensional skeleton key point sequence includes: determining, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of images of the first initial video, the motion component information, object structure component information, and shooting angle component information corresponding to that frame; determining the motion component sequence based on the motion component information corresponding to each frame of the multiple frames of images of the first initial video; determining the object structure component sequence based on the object structure component information corresponding to each frame; and determining the shooting angle component sequence based on the shooting angle component information corresponding to each frame.
  • The action migration component sequence may include multiple orthogonal component sequences.
  • Using multiple orthogonal component sequences to determine the three-dimensional skeleton key point sequence further mitigates the low accuracy of action migration when the initial object performs extreme actions or when the structures of the initial object and the target object differ greatly.
  • Generating the target video including the action sequence of the target object based on the three-dimensional skeleton key point sequence includes: generating a two-dimensional target skeleton key point sequence of the target object based on the three-dimensional skeleton key point sequence; and generating, based on the two-dimensional target skeleton key point sequence, the target video including the action sequence of the target object.
  • Here, the reconstructed three-dimensional skeleton key point sequence is reprojected to obtain the two-dimensional target skeleton key point sequence, which avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration and helps improve the accuracy of action migration.
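The reprojection step can be illustrated with a small geometric sketch. The projection model below (rotation about the vertical axis followed by an orthographic projection that drops depth) is an assumption chosen for simplicity; the text does not specify the camera model, and `project_to_2d` and its `yaw` parameter are hypothetical names.

```python
import numpy as np


def project_to_2d(kp3d: np.ndarray, yaw: float = 0.0) -> np.ndarray:
    """Reproject (T, J, 3) key points to (T, J, 2).

    Assumption: rotate about the vertical (y) axis by `yaw` radians, then
    apply an orthographic projection by discarding the depth coordinate.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    rotated = kp3d @ rotation.T  # applies the rotation to every key point
    return rotated[..., :2]


# A point one unit deep, seen from the front and after a quarter turn.
point = np.array([[[0.0, 0.0, 1.0]]])
front_view = project_to_2d(point)
side_view = project_to_2d(point, yaw=np.pi / 2)
```

Varying `yaw` while keeping the same three-dimensional sequence is one way to render the same action from different shooting angles.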
  • Converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object includes: using an action migration neural network to convert the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object.
  • Using a trained action migration neural network to determine the three-dimensional skeleton key point sequence of the target object can improve the efficiency and accuracy of key point retargeting.
  • The above action migration method further includes a step of training the action migration neural network: acquiring a sample motion video including an action sequence of a sample object; identifying a first sample two-dimensional skeleton key point sequence of the sample object in multiple frames of sample images of the sample motion video; performing limb scaling on the first sample two-dimensional skeleton key point sequence to obtain a second sample two-dimensional skeleton key point sequence; determining a loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence; and adjusting network parameters of the action migration neural network based on the loss function.
  • Here, the loss function is constructed from the first sample two-dimensional skeleton key point sequence of the sample object and the second sample two-dimensional skeleton key point sequence obtained by limb scaling, and is used to train the action migration neural network. This improves the accuracy of action migration when the structures of the initial object and the target object differ greatly. Moreover, training does not use paired action-character data from the real world, so the loss function is constructed and the network is trained in an unsupervised manner, which helps improve the accuracy of the trained action migration neural network during action migration.
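Limb scaling of a two-dimensional skeleton can be sketched by rescaling each bone along a kinematic tree. The three-joint chain in `PARENTS`, the per-bone factors, and the function name `scale_limbs` are illustrative assumptions; the text does not say how the scaling factors are chosen.

```python
import numpy as np

# Hypothetical 3-joint chain: parent index per joint, -1 marks the root.
# Parents must be listed before their children.
PARENTS = [-1, 0, 1]


def scale_limbs(kp2d: np.ndarray, bone_scale: np.ndarray, parents=PARENTS) -> np.ndarray:
    """Scale the bone ending at each joint, keeping bone directions.

    kp2d: (T, J, 2) key points; bone_scale: (J,) factor per joint
    (the root's entry is ignored).
    """
    out = np.empty_like(kp2d, dtype=float)
    for joint, parent in enumerate(parents):
        if parent < 0:
            out[:, joint] = kp2d[:, joint]  # the root stays in place
        else:
            bone = kp2d[:, joint] - kp2d[:, parent]  # original bone vector
            out[:, joint] = out[:, parent] + bone_scale[joint] * bone
    return out


first_sample = np.array([[[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]])  # one frame
second_sample = scale_limbs(first_sample, np.array([1.0, 2.0, 1.0]))
```

Because only bone lengths change, the pose in each frame is preserved while the body proportions differ, which is what makes the scaled sequence useful as a second training sample.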
  • Determining the loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence includes: determining a first sample action migration component sequence based on the first sample two-dimensional skeleton key point sequence; determining a second sample action migration component sequence based on the second sample two-dimensional skeleton key point sequence; determining an estimated three-dimensional skeleton key point sequence based on the first sample action migration component sequence; and determining the loss function based on the first sample action migration component sequence, the second sample action migration component sequence, and the estimated three-dimensional skeleton key point sequence.
  • Here, the loss function is constructed from the first sample action migration component sequence obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence, the second sample action migration component sequence obtained by orthogonally decomposing the second sample two-dimensional skeleton key point sequence, and the estimated three-dimensional skeleton key point sequence reconstructed from the first sample action migration component sequence. This improves the accuracy of action migration when the structures of the initial object and the target object differ greatly.
  • The loss function includes a motion-invariant loss function. The first sample action migration component sequence includes first sample motion component information, first sample structure component information, and first sample angle component information corresponding to each frame of sample image; the second sample action migration component sequence includes second sample motion component information, second sample structure component information, and second sample angle component information corresponding to each frame of sample image. Determining the loss function includes: determining, based on the second sample motion component information, the first sample structure component information, and the first sample angle component information corresponding to each frame of sample image, a first estimated skeleton key point corresponding to the respective first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence; determining, based on the first sample motion component information, the second sample structure component information, and the second sample angle component information corresponding to each frame of sample image, a second estimated skeleton key point corresponding to the respective second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence; and determining the motion-invariant loss function based on the first estimated skeleton key point, the second estimated skeleton key point, the first sample motion component information, the second sample motion component information, and the estimated three-dimensional skeleton key point sequence.
  • Here, the information obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence is used to restore the skeleton of the sample object, yielding the first estimated skeleton key point, and to restore the skeleton of the limb-scaled sample object, yielding the second estimated skeleton key point. The motion-invariant loss function can then be constructed from the restored first estimated skeleton key point, the restored second estimated skeleton key point, and the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.
  • By constructing the motion-invariant loss function and minimizing it during training, the accuracy of the resulting action migration neural network during action migration can be improved.
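The cross-restoration idea behind the motion-invariant loss can be sketched as follows. This is a simplified illustration under stated assumptions: the squared-error form, the equal weighting of the terms, the dictionary layout, and the `decode` callable are all hypothetical, and the term involving the estimated three-dimensional skeleton key point sequence is omitted because the text does not give its form.

```python
import numpy as np


def motion_invariant_loss(x1, x2, c1, c2, decode):
    """Swap motion codes between the original and limb-scaled samples.

    x1, x2: (T, J, 2) first and second sample key point sequences.
    c1, c2: dicts with 'motion', 'structure', 'view' entries per sequence.
    decode: hypothetical callable (motion, structure, view) -> (T, J, 2).
    """
    # First estimated key points: second motion, first structure and angle.
    x1_hat = decode(c2["motion"], c1["structure"], c1["view"])
    # Second estimated key points: first motion, second structure and angle.
    x2_hat = decode(c1["motion"], c2["structure"], c2["view"])
    restoration = np.mean((x1_hat - x1) ** 2) + np.mean((x2_hat - x2) ** 2)
    # The two motion codes themselves should agree, since scaling limbs
    # does not change the action being performed.
    motion_gap = np.mean((c1["motion"] - c2["motion"]) ** 2)
    return float(restoration + motion_gap)


# Toy check with a perfectly disentangled decoder: pose = motion * body scale.
rng = np.random.default_rng(0)
motion = rng.normal(size=(4, 3, 2))
toy_decode = lambda m, s, v: m * s
codes_1 = {"motion": motion, "structure": 1.0, "view": None}
codes_2 = {"motion": motion, "structure": 2.0, "view": None}
ideal_loss = motion_invariant_loss(motion, 2.0 * motion, codes_1, codes_2, toy_decode)
```

In the toy case the loss is zero because swapping the motion codes changes nothing; any leakage of body proportions into the motion code would make it positive.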
  • The loss function further includes a structure-invariant loss function. Determining the loss function further includes: selecting, from the first sample two-dimensional skeleton key point sequence, the first sample two-dimensional skeleton key points in the sample image corresponding to a first moment and the first sample two-dimensional skeleton key points in the sample image corresponding to a second moment; selecting, from the second sample two-dimensional skeleton key point sequence, the second sample two-dimensional skeleton key points in the sample image corresponding to the second moment and the second sample two-dimensional skeleton key points in the sample image corresponding to the first moment; and determining the structure-invariant loss function based on the first sample two-dimensional skeleton key points in the sample image corresponding to the first moment, the first sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the second sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the second sample two-dimensional skeleton key points in the sample image corresponding to the first moment, and the estimated three-dimensional skeleton key point sequence.
  • Here, the structure-invariant loss function can be constructed from the first sample two-dimensional skeleton key points and the second sample two-dimensional skeleton key points at different moments, combined with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.
  • The structure of the sample object is invariant over time. Therefore, constructing the structure-invariant loss function and minimizing both the motion-invariant loss function and the structure-invariant loss function during training can improve the accuracy of the resulting action migration neural network during action migration.
  • The loss function further includes a viewing-angle-invariant loss function. Determining the loss function further includes: determining the viewing-angle-invariant loss function based on the first sample two-dimensional skeleton key points in the sample image corresponding to the first moment, the first sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the first sample angle component information of the sample images corresponding to the first moment and the second moment, the second sample angle component information of the sample images corresponding to the first moment and the second moment, and the estimated three-dimensional skeleton key point sequence.
  • Here, the viewing-angle-invariant loss function can be constructed from the first sample two-dimensional skeleton key points at different moments and the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.
  • The viewing angle does not change with the motion or structure of the sample object. Therefore, constructing the viewing-angle-invariant loss function and minimizing the viewing-angle-invariant loss function, the motion-invariant loss function, and the structure-invariant loss function during training can improve the accuracy of the resulting action migration neural network during action migration.
  • The loss function further includes a reconstruction loss function. Determining the loss function further includes: determining the reconstruction loss function based on the first sample two-dimensional skeleton key point sequence and the estimated three-dimensional skeleton key point sequence.
  • Here, the reconstruction loss function can be constructed from the first sample two-dimensional skeleton key point sequence and the reconstructed estimated three-dimensional skeleton key point sequence of the sample object, because the sample object should remain unchanged when it is reconstructed. Therefore, constructing the reconstruction loss function and minimizing the reconstruction loss function, the viewing-angle-invariant loss function, the motion-invariant loss function, and the structure-invariant loss function during training can improve the accuracy of the resulting action migration neural network during action migration.
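A reconstruction loss of this kind can be sketched by projecting the estimated three-dimensional key points back to two dimensions and comparing them with the first sample sequence. The orthographic projection, the mean squared error, and the weighted sum in `total_loss` are assumptions for illustration; the text names the four loss terms but does not give their formulas or weights.

```python
import numpy as np


def reconstruction_loss(kp2d: np.ndarray, kp3d_est: np.ndarray) -> float:
    """Compare (T, J, 2) sample key points with projected (T, J, 3) estimates."""
    projected = kp3d_est[..., :2]  # assumption: orthographic, depth dropped
    return float(np.mean((projected - kp2d) ** 2))


def total_loss(terms: dict, weights: dict) -> float:
    """Assumed combination: weighted sum of the named loss terms."""
    return float(sum(weights[name] * value for name, value in terms.items()))


sample_2d = np.zeros((2, 3, 2))
estimate_3d = np.zeros((2, 3, 3))
estimate_3d[..., 2] = 5.0  # depth does not affect the projected error
perfect = reconstruction_loss(sample_2d, estimate_3d)
combined = total_loss(
    {"reconstruction": perfect, "motion": 1.0, "structure": 2.0, "view": 0.5},
    {"reconstruction": 1.0, "motion": 1.0, "structure": 0.5, "view": 2.0},
)
```

Keeping each term in a named dictionary makes it easy to log the terms separately during training and to adjust their relative weights.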
  • The present disclosure provides an action migration apparatus, including: a video acquisition module for acquiring a first initial video including an action sequence of an initial object; a key point extraction module for identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of images of the first initial video; a key point conversion module for converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and an image rendering module for generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
  • When converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object, the key point conversion module is used to: determine an action migration component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determine the three-dimensional skeleton key point sequence of the target object based on the action migration component sequence of the initial object.
  • The video acquisition module is also used to acquire a second initial video including the target object, and the key point extraction module is also used to identify a two-dimensional skeleton key point sequence of the target object in multiple frames of images of the second initial video. When determining the three-dimensional skeleton key point sequence of the target object based on the action migration component sequence of the initial object, the key point conversion module is used to: determine an action migration component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determine a target action migration component sequence based on the action migration component sequence of the initial object and the action migration component sequence of the target object; and determine the three-dimensional skeleton key point sequence of the target object based on the target action migration component sequence.
  • The action migration component sequence of the initial object includes a motion component sequence, an object structure component sequence, and a shooting angle component sequence. When determining the action migration component sequence of the initial object based on the two-dimensional skeleton key point sequence, the key point conversion module is used to: determine, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of images of the first initial video, the motion component information, object structure component information, and shooting angle component information of the initial object for that frame; determine the motion component sequence based on the motion component information corresponding to each frame of the multiple frames of images of the first initial video; determine the object structure component sequence based on the object structure component information corresponding to each frame; and determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame.
  • The present disclosure provides an electronic device including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the above action migration method are performed.
  • The present disclosure also provides a computer-readable storage medium storing a computer program, and the steps of the above action migration method are performed when the computer program is run by a processor.
  • The foregoing apparatus, electronic device, and computer-readable storage medium of the present disclosure contain at least technical features that are substantially the same as or similar to the technical features of any aspect of the foregoing method or any embodiment of any aspect. Therefore, for a description of the effects of the apparatus, the electronic device, and the computer-readable storage medium, refer to the description of the effects of the above method, which is not repeated here.
  • FIG. 1 shows a flowchart of an action migration method provided by an embodiment of the present disclosure;
  • FIG. 2 shows a flowchart of another action migration method provided by an embodiment of the present disclosure;
  • FIG. 3 shows a flowchart of a method for training an action migration neural network provided by an embodiment of the present disclosure;
  • FIG. 4 shows a flowchart of restoring skeleton key points in the training process of another action migration neural network provided by an embodiment of the present disclosure;
  • FIG. 5 shows a schematic structural diagram of an action migration apparatus provided by an embodiment of the present disclosure;
  • FIG. 6 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The present disclosure provides an action migration method and apparatus that realize action migration by extracting a two-dimensional skeleton key point sequence, retargeting the two-dimensional skeleton key point sequence to a three-dimensional skeleton key point sequence, and rendering the action of a target object based on the three-dimensional skeleton key point sequence. This avoids performing action migration directly at the pixel level, alleviates the problem of large differences in structure and viewing angle between the initial video and the target video, and improves the accuracy of action migration, especially when the initial object performs extreme actions or when the structures of the initial object and the target object differ greatly.
  • In addition, the present disclosure retargets from the two-dimensional skeleton key point sequence to the three-dimensional skeleton key point sequence, which avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration and helps improve the accuracy of action migration.
  • The embodiments of the present disclosure provide an action migration method, which is applied to a terminal device or server that performs action migration. Specifically, as shown in FIG. 1, the action migration method provided by the embodiments of the present disclosure includes the following steps:
  • S110: Acquire a first initial video including an action sequence of an initial object.
  • The first initial video includes multiple frames of images. The initial object may present a different posture in each frame, and together these postures form the action sequence of the initial object.
  • S120: Identify the two-dimensional skeleton key point sequence of the initial object in the multiple frames of images of the first initial video.
  • The two-dimensional skeleton key points of the initial object can be extracted from each frame of the first initial video, and the two-dimensional skeleton key points corresponding to the multiple frames form the aforementioned two-dimensional skeleton key point sequence.
  • The two-dimensional skeleton key points may include a key point for each joint of the initial object. Connecting the key points of the joints yields the skeleton of the initial object.
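As a concrete picture of this data, a two-dimensional skeleton key point sequence can be held as an array with one row of joint coordinates per frame, plus a list of joint pairs that define the skeleton. The four-joint layout, the joint names, and the helper functions below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 4-joint layout; joint names and bone pairs are assumptions.
JOINTS = ["hip", "spine", "neck", "head"]
BONES = [(0, 1), (1, 2), (2, 3)]  # (parent joint index, child joint index)


def stack_sequence(per_frame_keypoints):
    """Stack per-frame (J, 2) key points into a (T, J, 2) sequence."""
    return np.stack(per_frame_keypoints, axis=0)


def bone_lengths(sequence: np.ndarray) -> np.ndarray:
    """Return the (T, number of bones) length of every bone in every frame."""
    starts = sequence[:, [a for a, _ in BONES]]
    ends = sequence[:, [b for _, b in BONES]]
    return np.linalg.norm(ends - starts, axis=-1)


frame = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 3.0], [0.0, 6.0]])
sequence = stack_sequence([frame, frame + 1.0])  # second frame is shifted
lengths = bone_lengths(sequence)
```

Translating the whole skeleton, as in the second frame, leaves the bone lengths unchanged, which is why bone-based quantities are a natural description of body structure.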
  • In a specific implementation, a two-dimensional pose estimation neural network may be used to extract the two-dimensional skeleton key points of the initial object in each frame of image.
  • The initial object may be a real person, a virtual person, an animal, or the like, which is not limited in the present disclosure.
  • The action migration component sequence of the initial object may first be determined based on the two-dimensional skeleton key point sequence; the three-dimensional skeleton key point sequence of the target object may then be determined based on the action migration component sequence of the initial object.
  • The action migration component sequence of the initial object includes at least one of a motion component sequence, an object structure component sequence, and a shooting angle component sequence.
  • the motion component sequence represents the motion of the initial object
  • the object structure component sequence represents the body shape of the initial object
  • the shooting angle component sequence represents the angle of the camera.
  • the following sub-steps may be used to form the aforementioned motion component sequence, object structure component sequence, and shooting angle component sequence:
  • Sub-step 1 Based on the two-dimensional skeleton key points corresponding to each frame of the multi-frame image of the first initial video, respectively determine the motion component information, object structure component information and shooting angle component information corresponding to each frame of image ;
  • Sub-step 2 Determine the motion component sequence based on the motion component information corresponding to each frame of the multi-frame image of the first initial video;
  • Sub-step 3 Determine the object structure component sequence based on the object structure component information corresponding to each frame of the multi-frame image of the first initial video;
  • Sub-step 4 Determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame of the multi-frame images of the first initial video.
  • the above steps are to encode the two-dimensional skeleton key points corresponding to each frame of image into three semantically orthogonal vectors through the neural network, and obtain the motion component information, object structure component information and shooting angle component information corresponding to each frame of image respectively.
  • the motion component information corresponding to the multiple frames of images are combined to form a motion component sequence
  • the object structure component information corresponding to the multiple frames of images are combined to form an object structure component sequence
  • the shooting angle component information corresponding to the multiple frames of images are combined to form a shooting angle component sequence.
  • each component information is invariant to the other two component information.
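  • As a minimal sketch of sub-steps 1 to 4, assuming the three encoders are available as plain callables: the stand-in encoders below (`toy_motion`, `toy_structure`, `toy_angle`) are hypothetical placeholders rather than the trained networks, and the function name `decompose_sequence` is likewise illustrative. The sketch only shows how per-frame component information is gathered into the three component sequences.

```python
import math

def toy_motion(frame):
    # Stand-in "motion" code (assumption): joint offsets relative to the first joint.
    root_x, root_y = frame[0]
    return [c for x, y in frame for c in (x - root_x, y - root_y)]

def toy_structure(frame):
    # Stand-in "structure" code (assumption): distances between consecutive joints.
    return [math.dist(a, b) for a, b in zip(frame, frame[1:])]

def toy_angle(frame):
    # Stand-in "shooting angle" code (assumption): a constant placeholder.
    return [0.0]

def decompose_sequence(frames, enc_motion, enc_structure, enc_angle):
    """Encode every frame with three encoders, then group the codes by component."""
    per_frame = [(enc_motion(f), enc_structure(f), enc_angle(f)) for f in frames]
    return {
        "motion": [m for m, _, _ in per_frame],
        "structure": [s for _, s, _ in per_frame],
        "angle": [v for _, _, v in per_frame],
    }

# Two frames, each with three joints given as (x, y) pairs.
frames = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
]
components = decompose_sequence(frames, toy_motion, toy_structure, toy_angle)
```

  • Each entry of `components` holds one code per frame, so the three lists play the role of the motion, object structure, and shooting angle component sequences.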
  • The three-dimensional skeleton key point sequence is retargeted using the action migration component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence. This avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration, helps improve the accuracy of action migration, and further mitigates the low accuracy of action migration when the initial object makes extreme actions or when the structures of the initial object and the target object differ greatly.
  • S140 Based on the three-dimensional skeleton key point sequence, a target video including the action sequence of the target object is generated.
  • The three-dimensional skeleton key points corresponding to each frame of image in the three-dimensional skeleton key point sequence can be projected back to two-dimensional space to obtain the two-dimensional target skeleton key points of the target object; the two-dimensional target skeleton key points corresponding to the multiple frames of images form a two-dimensional target skeleton key point sequence.
  • Based on the two-dimensional target skeleton key point sequence, a target video including the action sequence of the target object is generated.
  • the action sequence of the target object corresponds to the action sequence of the initial object.
  • Each group of two-dimensional target skeleton key points can be used for action rendering to obtain the posture of the target object in the corresponding frame of image; the postures in the individual frames of image are then combined to obtain the action sequence of the target object.
  • a video rendering engine may be used to generate a target video including the action sequence of the target object based on the key points of the two-dimensional target skeleton corresponding to each frame of image.
  • Reprojecting the reconstructed three-dimensional skeleton key point sequence to obtain the two-dimensional target skeleton key point sequence avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration, which helps improve the accuracy of action migration.
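  • A minimal sketch of projecting three-dimensional skeleton key points back to two-dimensional space at a chosen angle. The text does not specify the camera model, so the rotation about the vertical axis followed by an orthographic projection (dropping depth) is an assumption, and the name `rotate_and_project` is hypothetical.

```python
import math

def rotate_and_project(points_3d, yaw_degrees):
    """Rotate (x, y, z) joints about the vertical y axis, then drop the depth axis.

    Assumptions: y is the vertical axis and the projection is orthographic.
    """
    yaw = math.radians(yaw_degrees)
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points_3d:
        x_rotated = cos_yaw * x + sin_yaw * z  # horizontal coordinate after rotation
        # The rotated depth (-sin_yaw * x + cos_yaw * z) is discarded by the projection.
        projected.append((x_rotated, y))
    return projected

skeleton_3d = [(1.0, 2.0, 0.0), (0.0, 1.0, 1.0)]
front_view = rotate_and_project(skeleton_3d, 0.0)
side_view = rotate_and_project(skeleton_3d, 90.0)
```

  • Calling the function once per preset angle yields one set of two-dimensional target skeleton key points per angle for the same frame.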
  • a trained motion migration neural network may be used to orthogonally decompose the two-dimensional skeleton key point sequence, and the motion migration component sequence obtained by the decomposition may be used to determine the three-dimensional skeleton key point sequence of the target object.
  • The aforementioned action migration neural network may include three encoders and one decoder. Each encoder extracts one kind of component information from each two-dimensional skeleton key point in the two-dimensional skeleton key point sequence, yielding the aforementioned motion component information, object structure component information, and shooting angle component information. After this component information is obtained, the decoder decodes it to reconstruct the estimated three-dimensional skeleton key points of the target object. Finally, the estimated three-dimensional skeleton key points are reprojected back to two-dimensional space to obtain one three-dimensional skeleton key point of the above-mentioned three-dimensional skeleton key point sequence.
  • When the three-dimensional skeleton key points are determined, the object structure component information and shooting angle component information obtained by the encoders can be used directly, or the object structure component information and shooting angle component information after average pooling can be used.
  • the two-dimensional skeleton key points of the respective objects of the continuous multiple frames including the current frame of image are orthogonally decomposed to obtain the object structure component information and the shooting angle component information corresponding to each frame of image.
  • the key points of the three-dimensional skeleton corresponding to the current frame image are determined.
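  • A minimal sketch of the average pooling option for the object structure and shooting angle component information. The text only says that the window consists of consecutive frames including the current frame; centering the window on the current frame and the parameter `half_window` are assumptions, and the function names are hypothetical.

```python
def average_pool(codes):
    """Element-wise mean of a non-empty list of equal-length codes."""
    count = len(codes)
    return [sum(code[i] for code in codes) / count for i in range(len(codes[0]))]

def pooled_code(sequence, frame_index, half_window):
    """Average the codes of the frames around frame_index (window clipped at the ends)."""
    start = max(0, frame_index - half_window)
    end = min(len(sequence), frame_index + half_window + 1)
    return average_pool(sequence[start:end])

# One structure code per frame; pooling smooths frame-to-frame jitter.
structure_sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled_middle = pooled_code(structure_sequence, frame_index=1, half_window=1)
pooled_first = pooled_code(structure_sequence, frame_index=0, half_window=1)
```

  • The motion component information is not pooled in this sketch, in line with the later statement that it can be used directly.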
  • The above embodiment avoids performing action migration directly at the pixel level, which alleviates the problem of large structural and viewing angle differences between the first initial video and the target video and improves the accuracy of action migration, especially when the initial object makes extreme actions or when the structures of the initial object and the target object differ greatly.
  • The above-mentioned embodiment orthogonally decomposes the extracted two-dimensional skeleton key points into motion component information, object structure component information, and shooting angle component information, which further mitigates the low accuracy of action migration when the initial object makes extreme actions or when the structures of the initial object and the target object differ greatly.
  • Before determining the three-dimensional skeleton key point sequence of the target object, the embodiment of the present disclosure also acquires a second initial video including the target object and identifies the two-dimensional skeleton key point sequence of the target object in the multi-frame images of the second initial video.
  • When determining the three-dimensional skeleton key point sequence of the target object, first determine the action migration component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; then determine the target action migration component sequence based on the action migration component sequence of the initial object and the action migration component sequence of the target object; finally, determine the three-dimensional skeleton key point sequence of the target object based on the target action migration component sequence.
  • the above method of determining the sequence of motion migration components of the target object is the same as the method of determining the sequence of motion migration components of the initial object.
  • the same method is to first extract the two-dimensional skeleton key points of the target object from each frame of the second initial video.
  • the two-dimensional skeleton key points in each frame of image are orthogonally decomposed, and the motion component information, object structure component information, and shooting angle component information of the target object are determined.
  • Use the motion component information corresponding to the multi-frame images to form the motion component sequence, use the object structure component information corresponding to the multi-frame images to form the object structure component sequence, and use the shooting angle component information corresponding to the multi-frame images to form the shooting angle component sequence.
  • The fused target action migration component sequence is used to reconstruct the three-dimensional skeleton key point sequence of the target object, and the reconstructed three-dimensional skeleton key point sequence is then reprojected to obtain the two-dimensional target skeleton key point sequence of the target object. This avoids relying on error-prone three-dimensional key point estimation and retargeting during action migration, which helps improve the accuracy of action migration.
  • the action migration method of this embodiment includes the following steps:
  • Step one: skeleton extraction operation. Extract the two-dimensional skeleton key points of the initial object from each frame of the first initial video to obtain the two-dimensional skeleton key point sequence of the initial object; extract the two-dimensional skeleton key points of the target object from each frame of the second initial video to obtain the two-dimensional skeleton key point sequence of the target object.
  • Step two: action migration processing. Encode, that is, orthogonally decompose, each two-dimensional skeleton key point in the two-dimensional skeleton key point sequence of the initial object and each two-dimensional skeleton key point in the two-dimensional skeleton key point sequence of the target object. This yields the motion component information, object structure component information, and shooting angle component information corresponding to each two-dimensional skeleton key point (that is, each frame of image) of the initial object, and the motion component information, object structure component information, and shooting angle component information corresponding to each two-dimensional skeleton key point (that is, each frame of image) of the target object.
  • the motion component information corresponding to the multi-frame images of the initial object constitutes the motion component sequence of the initial object
  • the object structure component information corresponding to the multi-frame images of the initial object constitutes the object structure component sequence of the initial object
  • the shooting angle component information corresponding to the multi-frame images of the initial object constitutes the shooting angle component sequence of the initial object.
  • the motion component sequence of the initial object, the object structure component sequence, and the shooting angle component sequence form the motion transfer component sequence of the initial object.
  • the motion component information corresponding to the multi-frame images of the target object constitutes the motion component sequence of the target object
  • the object structure component information corresponding to the multi-frame image of the target object constitutes the object structure component sequence of the target object
  • the shooting angle component information corresponding to the multi-frame images of the target object constitutes the shooting angle component sequence of the target object.
  • the motion component sequence of the target object, the object structure component sequence and the shooting angle component sequence form the motion transfer component sequence of the target object.
  • a target motion migration component sequence is determined based on the motion migration component sequence of the initial object and the motion migration component sequence of the target object; the three-dimensional skeleton key point sequence of the target object is determined based on the target motion migration component sequence.
  • The motion component information, object structure component information, and shooting angle component information corresponding to each frame of image of the initial object may be recombined with the motion component information, object structure component information, and shooting angle component information corresponding to each frame of image of the target object to obtain the recombined target motion component information, target structure component information, and target angle component information.
  • the target motion component information corresponding to the above-mentioned multi-frame images can constitute a target motion component sequence
  • the target structure component information corresponding to the multi-frame image can constitute the target object structure component sequence
  • the target angle component information corresponding to the multi-frame image can constitute the target shooting angle component sequence.
  • the target motion component sequence, the target object structure component sequence, and the target shooting angle component sequence form the target motion migration component sequence.
  • the target motion component information, the target structure component information, and the target angle component information are decoded to obtain the three-dimensional skeleton key points of the target object corresponding to one frame of image at three preset angles.
  • the three-dimensional skeleton key points of the multiple frames of images form the above-mentioned three-dimensional skeleton key point sequence.
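  • A minimal sketch of the recombination in step two. The text does not state which component is taken from which object; the choice below (motion from the initial object, structure and shooting angle from the target object) is an assumption made for illustration, and the dictionary layout and the name `recombine` are hypothetical.

```python
def recombine(initial_components, target_components):
    """Build per-frame target component information from two decomposed sequences.

    Assumption: motion comes from the initial object, while structure and
    shooting angle come from the target object.
    """
    frame_count = min(
        len(initial_components["motion"]),
        len(target_components["structure"]),
        len(target_components["angle"]),
    )
    return [
        {
            "motion": initial_components["motion"][t],
            "structure": target_components["structure"][t],
            "angle": target_components["angle"][t],
        }
        for t in range(frame_count)
    ]

initial_components = {"motion": [[0.1], [0.2]], "structure": [[1.0], [1.0]], "angle": [[0.0], [0.0]]}
target_components = {"motion": [[0.9], [0.8]], "structure": [[2.0], [2.0]], "angle": [[0.5], [0.5]]}
target_sequence = recombine(initial_components, target_components)
```

  • Each dictionary in `target_sequence` would then be passed to the decoder to obtain the three-dimensional skeleton key points for that frame.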
  • Step three: skeleton-to-video rendering operation. Based on the two-dimensional target skeleton key points of the target object at each preset angle in each frame of image, determine the target action of the target object at each preset angle, and generate the target video of the target object at each preset angle based on the target action.
  • The above-mentioned embodiment can significantly improve the accuracy of action migration and can realize action migration at any angle.
  • Even when the target object and the initial object differ greatly in structure and the initial object makes extreme actions, accurate action migration can still be carried out with a good visual effect.
  • The present disclosure also provides a method for training an action migration neural network.
  • The method can be applied to the above-mentioned terminal device or server that performs action migration processing, or to a separate terminal device or server that performs neural network training. Specifically, as shown in FIG. 3, the following steps may be included:
  • S320 Identify the first sample two-dimensional skeleton key point sequence of the sample object in the multi-frame sample image of the sample motion video.
  • the first sample two-dimensional skeleton key points of the sample object are extracted from each frame of the sample motion video, and the first sample two-dimensional skeleton key points of the multi-frame sample images form the first sample two-dimensional skeleton key point sequence .
  • the above-mentioned key points of the first sample two-dimensional skeleton may include key points corresponding to each joint of the sample object.
  • the key points corresponding to each joint are combined and connected to obtain the skeleton of the sample object.
  • the two-dimensional pose estimation neural network can be used to extract the key points of the first sample two-dimensional skeleton of the sample object.
  • the aforementioned sample objects may be real people, virtual people, animals, etc., which are not limited in the present disclosure.
  • S330 Perform limb scaling processing on the two-dimensional skeleton key point sequence of the first sample to obtain the two-dimensional skeleton key point sequence of the second sample.
  • each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence is scaled to obtain a second sample two-dimensional skeleton key point sequence.
  • The first sample two-dimensional skeleton key point x is scaled to obtain the second sample two-dimensional skeleton key point x'.
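  • A minimal sketch of limb scaling for one frame, assuming the skeleton is described by a parent index per joint and that each bone (the vector from a parent joint to its child) is scaled by its own factor. The text does not specify how the scaling factors are chosen, so `parents`, `factors`, and the name `scale_limbs` are hypothetical.

```python
def scale_limbs(keypoints, parents, factors):
    """Scale every bone of a 2D skeleton and rebuild the joint positions.

    keypoints: list of (x, y) joints.
    parents:   parent index per joint, -1 for the root; a parent must appear
               before its children in the list.
    factors:   scaling factor for the bone that ends at each joint
               (the root entry is ignored).
    """
    scaled = [None] * len(keypoints)
    for joint, parent in enumerate(parents):
        if parent < 0:
            scaled[joint] = keypoints[joint]  # the root joint stays in place
            continue
        bone_x = keypoints[joint][0] - keypoints[parent][0]
        bone_y = keypoints[joint][1] - keypoints[parent][1]
        parent_x, parent_y = scaled[parent]
        scaled[joint] = (
            parent_x + factors[joint] * bone_x,
            parent_y + factors[joint] * bone_y,
        )
    return scaled

x = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # a three-joint chain
x_scaled = scale_limbs(x, parents=[-1, 0, 1], factors=[1.0, 2.0, 0.5])
```

  • Because only bone lengths change, the scaled skeleton keeps the same pose directions as the original while presenting a different body structure.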
  • S340 Determine a loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence. Based on the loss function, adjust the network parameters of the action migration neural network.
  • Each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence and each second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence are orthogonally decomposed. The information obtained from the decomposition is used to estimate the three-dimensional skeleton key point sequence and to recover the two-dimensional sample skeleton key points. The loss function is then constructed from the information obtained from the decomposition, the estimated three-dimensional skeleton key point sequence, and the recovered two-dimensional sample skeleton key points.
  • The action migration neural network is trained with the goal of minimizing the constructed loss function.
  • Constructing the loss function from the first sample two-dimensional skeleton key point sequence of the sample object and the second sample two-dimensional skeleton key point sequence of the sample object after limb scaling, and using it to train the action migration neural network, can improve the accuracy of action migration when the structures of the initial object and the target object differ greatly. Moreover, training the action migration neural network does not use paired action-role data from the real world, so the loss function is constructed and the network is trained without supervision, which helps improve the accuracy of the trained action migration neural network when performing action migration.
  • the above-mentioned action transfer neural network may specifically include three encoders and one decoder, and the training of the action transfer neural network is essentially the training of the above-mentioned three encoders and one decoder.
  • the foregoing determination of the loss function based on the two-dimensional skeleton key point sequence of the first sample and the two-dimensional skeleton key point sequence of the second sample can be specifically implemented by using the following steps:
  • Step 1 Determine the movement migration component sequence of the first sample based on the sequence of two-dimensional skeleton key points of the first sample.
  • Orthogonally decompose each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence to obtain the first sample motion component information, first sample structure component information, and first sample angle component information corresponding to each frame of sample image.
  • the first sample motion component information corresponding to the multi-frame sample image forms the first sample motion component sequence;
  • the first sample structure component information corresponding to the multi-frame sample image forms the first sample structure component sequence;
  • the first sample angle component information corresponding to the multi-frame sample image forms the first sample angle component sequence.
  • the first sample motion component sequence, the first sample angle component sequence, and the first sample structure component sequence form the first sample action migration component sequence.
  • An encoder Em in the action migration neural network processes a first sample two-dimensional skeleton key point x to obtain the first sample motion component information; another encoder Es processes the first sample two-dimensional skeleton key point x to obtain the first sample structure component information; and the last encoder Ev processes the first sample two-dimensional skeleton key point x to obtain the first sample angle component information.
  • the motion component information of the first sample corresponding to the sample image of the current frame does not need to be average pooled, and can be directly used as the final first sample motion component information m.
  • Step 2 Determine the second sample action migration component sequence based on the second sample two-dimensional skeleton key point sequence.
  • Orthogonally decompose each second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence to obtain the second sample motion component information, second sample structure component information, and second sample angle component information corresponding to each frame of sample image.
  • the second sample motion component information corresponding to the multi-frame sample image forms the second sample motion component sequence;
  • the second sample structure component information corresponding to the multi-frame sample image forms the second sample structure component sequence;
  • the second sample angle component information corresponding to the multi-frame sample image forms the second sample angle component sequence.
  • the second sample motion component sequence, the second sample angle component sequence, and the second sample structure component sequence form the second sample action migration component sequence.
  • An encoder Em in the action migration neural network processes a second sample two-dimensional skeleton key point x' to obtain the second sample motion component information; another encoder Es processes the second sample two-dimensional skeleton key point x' to obtain the second sample structure component information; and the last encoder Ev processes the second sample two-dimensional skeleton key point x' to obtain the second sample angle component information.
  • the second sample motion component information corresponding to the current frame sample image does not need to be average pooled, and can be directly used as the final second sample motion component information m′.
  • Step 3 Determine an estimated three-dimensional skeleton key point sequence based on the first sample action migration component sequence.
  • the first sample motion component information, the first sample structure component information, and the first sample angle component information corresponding to a frame of sample image are used to determine an estimated three-dimensional skeleton key point.
  • the estimated three-dimensional skeleton key points corresponding to the multi-frame sample images form the above-mentioned estimated three-dimensional skeleton key point sequence.
  • a decoder G can be used to decode the first sample motion component information, the first sample structure component information, and the first sample angle component information of a frame of sample image to obtain the reconstructed estimated three-dimensional skeleton key point.
  • Step 4 Determine the loss function based on the first sample action transfer component sequence, the second sample action transfer component sequence, and the estimated three-dimensional skeleton key point sequence.
  • The first sample motion component information, first sample structure component information, and first sample angle component information in the first sample action migration component sequence, together with the second sample motion component information, second sample structure component information, and second sample angle component information in the second sample action migration component sequence, can be used to recover two-dimensional sample skeleton key points; the estimated three-dimensional skeleton key point sequence and the recovered two-dimensional sample skeleton key points are then used to build the loss function.
  • The loss function is constructed using the first sample action migration component sequence obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence, the second sample action migration component sequence obtained by orthogonally decomposing the second sample two-dimensional skeleton key point sequence, and the estimated three-dimensional skeleton key point sequence reconstructed from the first sample action migration component sequence, which can improve the accuracy of action migration when the structures of the initial object and the target object differ greatly.
  • the motion invariant loss function can be constructed and the motion invariant loss function can be minimized during training.
  • the following steps can be used to construct the aforementioned motion-invariant loss function:
  • Step 1 Based on the second sample motion component information, the first sample structure component information, and the first sample angle component information, determine the first estimated skeleton key point corresponding to the respective first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence.
  • Specifically, the following sub-steps can be used: the decoder G processes the second sample motion component information m', the first sample structure component information, and the first sample angle component information to reconstruct a three-dimensional skeleton key point; a rotation projection function then reprojects this three-dimensional skeleton key point to two-dimensional space to obtain the first estimated skeleton key point.
  • Step 2 Based on the first sample motion component information, the second sample structure component information, and the second sample angle component information, determine the second estimated skeleton key point corresponding to the respective second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence.
  • Specifically, the following sub-steps can be used: the decoder G processes the first sample motion component information m, the second sample structure component information, and the second sample angle component information to reconstruct a three-dimensional skeleton key point; a rotation projection function then reprojects this three-dimensional skeleton key point to two-dimensional space to obtain the second estimated skeleton key point.
  • Steps 1 and 2 generate the first estimated skeleton key point and the second estimated skeleton key point.
  • the specific formula is as follows:
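  • A minimal sketch of the exchange performed in steps 1 and 2, assuming the decoder G and the rotation projection function are available as callables. The toy `decoder` and `project` below are hypothetical stand-ins that only record which codes are combined; they do not perform real decoding or projection.

```python
def cross_reconstruct(m, s, v, m_scaled, s_scaled, v_scaled, decoder, project):
    """Swap motion codes between the original and the limb-scaled sample.

    m, s, v:                      motion, structure, angle codes of x.
    m_scaled, s_scaled, v_scaled: the same codes extracted from x'.
    Returns (first_estimate, second_estimate) in 2D space.
    """
    # Step 1: motion of x' with structure and angle of x.
    first_estimate = project(decoder(m_scaled, s, v))
    # Step 2: motion of x with structure and angle of x'.
    second_estimate = project(decoder(m, s_scaled, v_scaled))
    return first_estimate, second_estimate

# Toy stand-ins (assumptions): the decoder returns its inputs, the projection is the identity.
decoder = lambda motion, structure, angle: (motion, structure, angle)
project = lambda skeleton_3d: skeleton_3d

first, second = cross_reconstruct("m", "s", "v", "m'", "s'", "v'", decoder, project)
```

  • If the motion code is truly independent of body structure, the first estimate should match x and the second should match x', which is what a loss built on these estimates can enforce.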
  • Step 3 Based on the first estimated skeleton key point, the second estimated skeleton key point, the first sample motion component information, the second sample motion component information, and the estimated three-dimensional skeleton key point sequence, determine the motion-invariant loss function.
  • the constructed motion-invariant loss function can specifically include the following three:
  • N represents the number of frames of the sample motion video
  • T represents the number of joints corresponding to a first sample two-dimensional skeleton key point
  • M represents a preset value
  • Cm represents the code length corresponding to the first sample motion component information
  • K represents the number of rotations of the sample object
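  • A hedged sketch of two terms that a motion-invariant loss of this kind could contain: a cross-reconstruction error between the sample key points and the estimated key points, and a consistency error between the motion codes m and m'. The exact terms and normalizations are assumptions (here, division by 2·N·T and by the number of code entries); the term that involves the K rotations and the preset value M is not sketched. All function names are hypothetical.

```python
def cross_reconstruction_loss(x_seq, x_est_seq, x_scaled_seq, x_scaled_est_seq):
    """Mean squared joint error over both samples.

    Each sequence has N frames, each frame a list of T (x, y) joints.
    Assumption: the sum of squared errors is divided by 2 * N * T.
    """
    frame_count = len(x_seq)
    joint_count = len(x_seq[0])
    total = 0.0
    for true_seq, est_seq in ((x_seq, x_est_seq), (x_scaled_seq, x_scaled_est_seq)):
        for true_frame, est_frame in zip(true_seq, est_seq):
            for (tx, ty), (ex, ey) in zip(true_frame, est_frame):
                total += (tx - ex) ** 2 + (ty - ey) ** 2
    return total / (2 * frame_count * joint_count)

def motion_consistency_loss(m_seq, m_scaled_seq):
    """Mean squared difference between the motion codes of x and x'."""
    entry_count = len(m_seq) * len(m_seq[0])
    total = sum(
        (a - b) ** 2
        for code, code_scaled in zip(m_seq, m_scaled_seq)
        for a, b in zip(code, code_scaled)
    )
    return total / entry_count

x_seq = [[(0.0, 0.0), (1.0, 1.0)]]      # N = 1 frame, T = 2 joints
x_est_seq = [[(1.0, 0.0), (1.0, 1.0)]]  # one joint is off by 1 in x
reconstruction = cross_reconstruction_loss(x_seq, x_est_seq, x_seq, x_seq)
consistency = motion_consistency_loss([[1.0, 2.0]], [[1.0, 4.0]])
```

  • Both terms are zero when the estimates reproduce the samples exactly and when limb scaling leaves the motion code unchanged.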
  • The information obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence is used to restore the skeleton of the sample object, giving the first estimated skeleton key point, and to restore the skeleton of the sample object after limb scaling, giving the second estimated skeleton key point. The motion-invariant loss function can then be constructed by combining the restored first estimated skeleton key point, the restored second estimated skeleton key point, and the estimated three-dimensional skeleton key point sequence reconstructed for the sample object.
  • The structure-invariant loss function can be constructed, and during training the motion-invariant loss function and the structure-invariant loss function are minimized together to improve the accuracy of the constructed action migration neural network during action migration. Specifically, the following steps can be used to construct the aforementioned structure-invariant loss function:
  • Step 1 From the first sample two-dimensional skeleton key point sequence, select the first sample two-dimensional skeleton key point of the sample object at a first moment and the first sample two-dimensional skeleton key point of the sample object at a second moment.
  • From the second sample two-dimensional skeleton key point sequence, select the second sample two-dimensional skeleton key point of the sample object at the second moment and the second sample two-dimensional skeleton key point of the sample object at the first moment.
  • The aforementioned first sample two-dimensional skeleton key points are the two-dimensional skeleton key points of the sample object extracted from the sample images corresponding to the first moment t1 and the second moment t2 in the sample motion video, without limb scaling.
  • The above-mentioned second sample two-dimensional skeleton key points are the skeleton key points of the sample object extracted from the sample images corresponding to the first moment t1 and the second moment t2 in the sample motion video, after limb scaling.
  • Step 2 Based on the first sample two-dimensional skeleton key point of the sample object at the first moment, the first sample two-dimensional skeleton key point of the sample object at the second moment, the second sample two-dimensional skeleton key point of the sample object at the second moment, the second sample two-dimensional skeleton key point of the sample object at the first moment, and the estimated three-dimensional skeleton key point sequence, determine the structure-invariant loss function.
  • the constructed structure invariant loss function includes the following two:
  • St1 represents the sample structure component information directly extracted from the first sample two-dimensional skeleton key point at time t1
  • St2 represents the sample structure component information directly extracted from the first sample two-dimensional skeleton key point at time t2
  • St2' represents the sample structure component information directly extracted from the key points of the second sample two-dimensional skeleton at time t2
  • St1' represents the sample structure component information directly extracted from the key points of the second sample two-dimensional skeleton at time t1
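  • Cb represents the code length corresponding to the first sample structure component information
  • m is a preset value
  • s() represents the cosine similarity function
  • In the embodiments of the present disclosure, the structure-invariant loss function can be constructed from the first sample and second sample two-dimensional skeleton key points at different moments, combined with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.
  • Because the shooting angle of the sample object is invariant to changes in the motion and structure of the sample object, a view-invariant loss function can be constructed; during training, the view-invariant, motion-invariant and structure-invariant loss functions are minimized, which improves the accuracy of the action transfer neural network during action transfer. Specifically, the view-invariant loss function can be constructed as follows:
  • Determine the view-invariant loss function based on the first sample two-dimensional skeleton key points of the sample object at the first moment, the first sample two-dimensional skeleton key points of the sample object at the second moment, the first sample angle component information, the second sample angle component information, and the estimated three-dimensional skeleton key point sequence.
  • The constructed view-invariant loss function specifically includes the following two (the formula images are not reproduced in this excerpt):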
  • vt1 represents the sample angle component information directly extracted from the first sample two-dimensional skeleton key point at time t1
  • vt2 represents the sample angle component information directly extracted from the first sample two-dimensional skeleton key point at time t2
  • Cv represents the code length corresponding to the angle component information of the first sample
  • Because the sample object should remain invariant when it is recovered, a reconstruction recovery loss function can be constructed; during training, the reconstruction recovery, view-invariant, motion-invariant and structure-invariant loss functions are minimized, which improves the accuracy of the action transfer neural network during action transfer. Specifically, the reconstruction recovery loss function is determined based on the first sample two-dimensional skeleton key point sequence and the estimated three-dimensional skeleton key point sequence.
  • D represents a temporal convolutional network; the expectation is taken over x drawn from the probability distribution of the samples; the two remaining symbols denote the two reconstruction recovery loss functions (the formula images are not reproduced in this excerpt).
  • The reconstruction recovery loss function, the view-invariant loss function, the motion-invariant loss function and the structure-invariant loss function are constructed through the above embodiments.
  • In a specific implementation, these loss functions can be fused with a formula (formula image not reproduced in this excerpt) to obtain the target loss function.
  • λrec, λcrs, λadv, λtrip and λinv all represent preset weights.
  • Corresponding to the above action transfer method, the present disclosure also provides an action transfer apparatus, which is applied to a terminal device or server that performs action transfer. Each module can implement the same method steps and achieve the same beneficial effects as the above method, so the description of the same parts is not repeated in this disclosure.
  • As shown in FIG. 5, an action transfer apparatus provided by the present disclosure may include:
  • the video acquisition module 510 is configured to acquire the first initial video including the action sequence of the initial object.
  • the key point extraction module 520 is configured to identify the two-dimensional skeleton key point sequence of the initial object in the multi-frame image of the first initial video.
  • the key point conversion module 530 is configured to convert the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of the target object.
  • the image rendering module 540 is configured to generate a target video including an action sequence of the target object based on the three-dimensional skeleton key point sequence.
  • In some embodiments, when converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object, the key point conversion module 530 is configured to: determine the action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determine the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object.
  • In some embodiments, the video acquisition module 510 is further configured to acquire a second initial video including the target object; the key point extraction module 520 is further configured to identify the two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video; and, when determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object, the key point conversion module 530 is configured to: determine the action transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determine a target action transfer component sequence based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; and determine the three-dimensional skeleton key point sequence of the target object based on the target action transfer component sequence.
  • In some embodiments, the action transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence and a shooting angle component sequence.
  • When determining the action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence, the key point conversion module 530 is configured to: determine, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of the first initial video, the motion component information, object structure component information and shooting angle component information corresponding to that frame; determine the motion component sequence based on the motion component information corresponding to each frame; determine the object structure component sequence based on the object structure component information corresponding to each frame; and determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame.
  • the embodiment of the present disclosure discloses an electronic device, as shown in FIG. 6, comprising: a processor 601, a memory 602, and a bus 603.
  • the memory 602 stores machine-readable instructions executable by the processor 601. When the device is running, the processor 601 and the memory 602 communicate through the bus 603.
  • When the machine-readable instructions are executed by the processor 601, the steps of the following action transfer method are performed: acquiring a first initial video including an action sequence of an initial object; identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video; converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
  • The embodiments of the present disclosure also provide a computer program product corresponding to the above method and apparatus, which includes a computer-readable storage medium storing program code.
  • The instructions included in the program code can be used to execute the method in the preceding method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
  • the modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

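The present disclosure provides an action transfer method, apparatus, device and storage medium. First, a first initial video including an action sequence of an initial object is acquired; next, a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video is identified; the two-dimensional skeleton key point sequence is then converted into a three-dimensional skeleton key point sequence of a target object; finally, a target video including an action sequence of the target object is generated based on the three-dimensional skeleton key point sequence.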

Description

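Action transfer method, apparatus, device and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular to an action transfer method, apparatus, device and storage medium.
Background
Action transfer moves the actions of an initial object in an initial motion video onto a target object to form a target motion video. Because the initial motion video and the target motion video differ greatly in structure and viewing angle, it is difficult to transfer actions at the pixel level. In particular, when the initial object performs extreme actions, or when the structures of the initial object and the target object differ considerably, the actions transferred to the target object have low accuracy.
Summary
In view of this, the present disclosure provides at least an action transfer method and apparatus.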
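In a first aspect, the present disclosure provides an action transfer method, including: acquiring a first initial video including an action sequence of an initial object; identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video; converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
In this aspect, action transfer is achieved by extracting the two-dimensional skeleton key point sequence, retargeting the two-dimensional skeleton key point sequence to a three-dimensional skeleton key point sequence, and rendering the actions of the target object based on the three-dimensional skeleton key point sequence. This avoids transferring actions directly at the pixel level and overcomes the large differences in structure and viewing angle between the initial video and the target video; accuracy improves in particular when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably. In addition, because the three-dimensional skeleton key point sequence is retargeted from the two-dimensional skeleton key point sequence, the method avoids the error-prone estimation and retargeting of three-dimensional key points, which helps improve the accuracy of action transfer.
In a possible implementation, converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object includes: determining an action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object.
In this implementation, the three-dimensional skeleton key point sequence is retargeted from the action transfer component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence, which avoids the error-prone estimation and retargeting of three-dimensional key points and helps improve the accuracy of action transfer.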
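In a possible implementation, before the three-dimensional skeleton key point sequence of the target object is determined, the action transfer method further includes: acquiring a second initial video including the target object; and identifying a two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video. Determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object then includes: determining an action transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determining a target action transfer component sequence based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; and determining the three-dimensional skeleton key point sequence of the target object based on the target action transfer component sequence.
In this implementation, the action transfer component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence of the initial object is fused with the action transfer component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence of the target object, and the three-dimensional skeleton key point sequence is determined from the fused result. This overcomes the low accuracy of action transfer when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably.
In a possible implementation, the action transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence and a shooting angle component sequence. Determining the action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence includes: determining, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of the first initial video, the motion component information, object structure component information and shooting angle component information corresponding to that frame; determining the motion component sequence based on the motion component information corresponding to each frame; determining the object structure component sequence based on the object structure component information corresponding to each frame; and determining the shooting angle component sequence based on the shooting angle component information corresponding to each frame.
In this implementation, the action transfer component sequence may include multiple orthogonal component sequences; determining the three-dimensional skeleton key point sequence from these orthogonal component sequences further overcomes the low accuracy of action transfer when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably.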
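In a possible implementation, generating the target video including the action sequence of the target object based on the three-dimensional skeleton key point sequence includes: generating a two-dimensional target skeleton key point sequence of the target object based on the three-dimensional skeleton key point sequence; and generating the target video including the action sequence of the target object based on the two-dimensional target skeleton key point sequence.
In this implementation, the reconstructed three-dimensional skeleton key point sequence is reprojected to obtain the two-dimensional target skeleton key point sequence, which avoids the error-prone estimation and retargeting of three-dimensional key points and helps improve the accuracy of action transfer.
In a possible implementation, converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object includes: converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object by using an action transfer neural network.
In this implementation, a trained action transfer neural network determines the three-dimensional skeleton key point sequence of the target object, which improves the efficiency and accuracy of key point retargeting.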
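In a possible implementation, the action transfer method further includes the step of training the action transfer neural network: acquiring a sample motion video including an action sequence of a sample object; identifying a first sample two-dimensional skeleton key point sequence of the sample object in multiple sample frames of the sample motion video; applying limb scaling to the first sample two-dimensional skeleton key point sequence to obtain a second sample two-dimensional skeleton key point sequence; determining a loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence; and adjusting network parameters of the action transfer neural network based on the loss function.
In this implementation, the loss function is constructed from the first sample two-dimensional skeleton key point sequence of the sample object and the second sample two-dimensional skeleton key point sequence obtained after limb scaling of the sample object, and is used to train the action transfer neural network; this improves the accuracy of action transfer when the structures of the initial object and the target object differ considerably. Moreover, no paired action-character data from the real world is used during training, so the loss function is constructed and the network is trained without supervision, which helps improve the accuracy of the trained action transfer neural network during action transfer.
In a possible implementation, determining the loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence includes: determining a first sample action transfer component sequence based on the first sample two-dimensional skeleton key point sequence; determining a second sample action transfer component sequence based on the second sample two-dimensional skeleton key point sequence; determining an estimated three-dimensional skeleton key point sequence based on the first sample action transfer component sequence; and determining the loss function based on the first sample action transfer component sequence, the second sample action transfer component sequence and the estimated three-dimensional skeleton key point sequence.
In this implementation, the loss function is constructed from the first sample action transfer component sequence obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence, the second sample action transfer component sequence obtained by orthogonally decomposing the second sample two-dimensional skeleton key point sequence, and the estimated three-dimensional skeleton key point sequence reconstructed from the first sample action transfer component sequence; this improves the accuracy of action transfer when the structures of the initial object and the target object differ considerably.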
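In a possible implementation, the loss function includes a motion-invariant loss function; the first sample action transfer component sequence includes first sample motion component information, first sample structure component information and first sample angle component information corresponding to each sample frame; and the second sample action transfer component sequence includes second sample motion component information, second sample structure component information and second sample angle component information corresponding to each sample frame.
Determining the loss function includes: determining, based on the second sample motion component information, the first sample structure component information and the first sample angle component information corresponding to each sample frame, first estimated skeleton key points corresponding to the respective first sample two-dimensional skeleton key points in the first sample two-dimensional skeleton key point sequence; determining, based on the first sample motion component information, the second sample structure component information and the second sample angle component information corresponding to each sample frame, second estimated skeleton key points corresponding to the respective second sample two-dimensional skeleton key points in the second sample two-dimensional skeleton key point sequence; and determining the motion-invariant loss function based on the first estimated skeleton key points, the second estimated skeleton key points, the first sample motion component information, the second sample motion component information and the estimated three-dimensional skeleton key point sequence.
In this implementation, the information obtained by orthogonally decomposing the first sample and second sample two-dimensional skeleton key point sequences is used to recover the skeleton of the sample object, giving the first estimated skeleton key points, and to recover the skeleton of the limb-scaled sample object, giving the second estimated skeleton key points. The motion-invariant loss function is then constructed from the recovered first and second estimated skeleton key points together with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object. Although the sample object is subject to changes and perturbations in structure and shooting angle, the transferred motion information should remain unchanged; constructing the motion-invariant loss function and minimizing it during training therefore improves the accuracy of the action transfer neural network during action transfer.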
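In a possible implementation, the loss function further includes a structure-invariant loss function, and determining the loss function further includes: selecting, from the first sample two-dimensional skeleton key point sequence, the first sample two-dimensional skeleton key points in the sample frame corresponding to a first moment and the first sample two-dimensional skeleton key points in the sample frame corresponding to a second moment; selecting, from the second sample two-dimensional skeleton key point sequence, the second sample two-dimensional skeleton key points in the sample frame corresponding to the second moment and the second sample two-dimensional skeleton key points in the sample frame corresponding to the first moment; and determining the structure-invariant loss function based on the first sample two-dimensional skeleton key points at the first moment, the first sample two-dimensional skeleton key points at the second moment, the second sample two-dimensional skeleton key points at the second moment, the second sample two-dimensional skeleton key points at the first moment, and the estimated three-dimensional skeleton key point sequence.
In this implementation, the structure-invariant loss function is constructed from the first sample and second sample two-dimensional skeleton key points at different moments, combined with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object. Because the structure of the sample object is invariant over time, constructing the structure-invariant loss function and minimizing the motion-invariant and structure-invariant loss functions during training improves the accuracy of the action transfer neural network during action transfer.
In a possible implementation, the loss function further includes a view-invariant loss function, and determining the loss function further includes: determining the view-invariant loss function based on the first sample two-dimensional skeleton key points in the sample frame corresponding to the first moment, the first sample two-dimensional skeleton key points in the sample frame corresponding to the second moment, the first sample angle component information of the sample frames corresponding to the first and second moments, the second sample angle component information of the sample frames corresponding to the first and second moments, and the estimated three-dimensional skeleton key point sequence.
In this implementation, the view-invariant loss function is constructed from the first sample two-dimensional skeleton key points at different moments, the reconstructed estimated three-dimensional skeleton key point sequence of the sample object, and related information. Because the shooting angle of the sample object is invariant to changes in the motion and structure of the sample object, constructing the view-invariant loss function and minimizing the view-invariant, motion-invariant and structure-invariant loss functions during training improves the accuracy of the action transfer neural network during action transfer.
In a possible implementation, the loss function further includes a reconstruction recovery loss function, and determining the loss function further includes: determining the reconstruction recovery loss function based on the first sample two-dimensional skeleton key point sequence and the estimated three-dimensional skeleton key point sequence.
In this implementation, the reconstruction recovery loss function is constructed from the first sample two-dimensional skeleton key point sequence and the reconstructed estimated three-dimensional skeleton key point sequence of the sample object. Because the sample object should remain invariant when it is recovered, constructing the reconstruction recovery loss function and minimizing the reconstruction recovery, view-invariant, motion-invariant and structure-invariant loss functions during training improves the accuracy of the action transfer neural network during action transfer.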
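In a second aspect, the present disclosure provides an action transfer apparatus, including: a video acquisition module configured to acquire a first initial video including an action sequence of an initial object; a key point extraction module configured to identify a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video; a key point conversion module configured to convert the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and an image rendering module configured to generate, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
In a possible implementation, when converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object, the key point conversion module is configured to: determine an action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determine the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object.
In a possible implementation, the video acquisition module is further configured to acquire a second initial video including the target object; the key point extraction module is further configured to identify a two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video; and, when determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object, the key point conversion module is configured to: determine an action transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determine a target action transfer component sequence based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; and determine the three-dimensional skeleton key point sequence of the target object based on the target action transfer component sequence.
In a possible implementation, the action transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence and a shooting angle component sequence; when determining the action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence, the key point conversion module is configured to: determine, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of the first initial video, the motion component information, object structure component information and shooting angle component information of the initial object; determine the motion component sequence based on the motion component information corresponding to each frame; determine the object structure component sequence based on the object structure component information corresponding to each frame; and determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame.
In a third aspect, the present disclosure provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the above action transfer method.
In a fourth aspect, the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the above action transfer method.
The above apparatus, electronic device and computer-readable storage medium of the present disclosure include at least technical features that are substantially the same as or similar to the technical features of any aspect of the above method or any implementation of any aspect. For a description of the effects of the apparatus, electronic device and computer-readable storage medium, refer to the description of the effects of the method, which is not repeated here.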
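Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 shows a flowchart of an action transfer method provided by an embodiment of the present disclosure;
FIG. 2 shows a flowchart of another action transfer method provided by an embodiment of the present disclosure;
FIG. 3 shows a flowchart of a training method for an action transfer neural network provided by an embodiment of the present disclosure;
FIG. 4 shows a flowchart of recovering skeleton key points during another training process of the action transfer neural network provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic structural diagram of an action transfer apparatus provided by an embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.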
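Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. It should be understood that the drawings serve only for illustration and description and are not intended to limit the scope of protection of the present disclosure. It should also be understood that the schematic drawings are not drawn to scale. The flowcharts used in the present disclosure show operations implemented according to some embodiments of the present disclosure. The operations of a flowchart may be performed out of order, and steps without a logical dependency may be performed in reverse order or simultaneously. In addition, guided by the present disclosure, those skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.
Furthermore, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments generally described and shown in the drawings may be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that the term "including" is used in the embodiments of the present disclosure to indicate the presence of the features stated after it, without excluding the addition of other features.
The present disclosure provides an action transfer method and apparatus. Action transfer is achieved by extracting a two-dimensional skeleton key point sequence, retargeting the two-dimensional skeleton key point sequence to a three-dimensional skeleton key point sequence, and rendering the actions of the target object based on the three-dimensional skeleton key point sequence. This avoids transferring actions directly at the pixel level and mitigates the large differences in structure and viewing angle between the initial video and the target video; accuracy improves in particular when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably. In addition, because the three-dimensional skeleton key point sequence is retargeted from the two-dimensional skeleton key point sequence, the method avoids the error-prone estimation and retargeting of three-dimensional key points, which helps improve the accuracy of action transfer.
The action transfer method, apparatus, device and storage medium of the present disclosure are described below through specific embodiments.
An embodiment of the present disclosure provides an action transfer method, which is applied to a terminal device, server or similar equipment that performs action transfer. Specifically, as shown in FIG. 1, the action transfer method provided by the embodiment includes the following steps: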
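S110: Acquire a first initial video including an action sequence of an initial object.
Here, the first initial video includes multiple frames; the initial object may take a different pose in each frame, and these poses together form the action sequence of the initial object.
S120: Identify a two-dimensional skeleton key point sequence of the initial object in the multiple frames of the first initial video.
To determine the action sequence of the initial object, two-dimensional skeleton key points of the initial object can be extracted from each frame of the first initial video; the two-dimensional skeleton key points of the multiple frames form the two-dimensional skeleton key point sequence. For example, the two-dimensional skeleton key points may include the key points corresponding to the joints of the initial object. Connecting the key points of the joints yields the skeleton of the initial object.
In a possible implementation, a two-dimensional pose estimation neural network can be used to extract the two-dimensional skeleton key points of the initial object in each frame.
The initial object may be a real person, a virtual person, an animal, and so on; the present disclosure does not limit this.
S130: Convert the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object.
In a possible implementation, an action transfer component sequence of the initial object is first determined based on the two-dimensional skeleton key point sequence; the three-dimensional skeleton key point sequence of the target object is then determined based on the action transfer component sequence of the initial object.
For example, the action transfer component sequence of the initial object includes at least one of a motion component sequence, an object structure component sequence and a shooting angle component sequence.
The motion component sequence represents the motion of the initial object, the object structure component sequence represents the body shape of the initial object, and the shooting angle component sequence represents the camera angle.
In some embodiments, the motion component sequence, the object structure component sequence and the shooting angle component sequence can be formed with the following sub-steps: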
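Sub-step 1: Based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of the first initial video, determine the motion component information, object structure component information and shooting angle component information corresponding to that frame.
Sub-step 2: Based on the motion component information corresponding to each frame of the multiple frames of the first initial video, determine the motion component sequence.
Sub-step 3: Based on the object structure component information corresponding to each frame of the multiple frames of the first initial video, determine the object structure component sequence.
Sub-step 4: Based on the shooting angle component information corresponding to each frame of the multiple frames of the first initial video, determine the shooting angle component sequence.
In the above steps, a neural network encodes the two-dimensional skeleton key points of each frame into three semantically orthogonal vectors, giving the motion component information, object structure component information and shooting angle component information of that frame. The motion component information of the multiple frames is then combined into the motion component sequence, the object structure component information into the object structure component sequence, and the shooting angle component information into the shooting angle component sequence.
Among the three kinds of component information, each is invariant with respect to the other two.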
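The per-frame decomposition and the assembly of the three component sequences can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed network: the `Encoder` interface and the toy stand-ins `toy_motion`, `toy_structure` and `toy_angle` are hypothetical placeholders for trained encoders.

```python
from typing import Callable, List, Sequence, Tuple

Point2D = Tuple[float, float]
Frame = List[Point2D]                 # 2D skeleton key points of one frame
Code = List[float]                    # one component vector
Encoder = Callable[[Frame], Code]     # hypothetical encoder interface


def decompose_sequence(
    frames: Sequence[Frame],
    encode_motion: Encoder,
    encode_structure: Encoder,
    encode_angle: Encoder,
) -> Tuple[List[Code], List[Code], List[Code]]:
    """Encode every frame three times and collect one sequence per component."""
    motion_seq = [encode_motion(f) for f in frames]
    structure_seq = [encode_structure(f) for f in frames]
    angle_seq = [encode_angle(f) for f in frames]
    return motion_seq, structure_seq, angle_seq


# Toy stand-ins (assumptions for illustration only, not trained encoders).
def toy_motion(frame: Frame) -> Code:
    root_x, root_y = frame[0]
    # Joint offsets relative to the first joint.
    return [c for x, y in frame for c in (x - root_x, y - root_y)]


def toy_structure(frame: Frame) -> Code:
    # Distances between consecutive joints.
    return [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(frame, frame[1:])]


def toy_angle(frame: Frame) -> Code:
    # Horizontal extent of the skeleton.
    xs = [x for x, _ in frame]
    return [max(xs) - min(xs)]


frames = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 2.0)]]
motion_seq, structure_seq, angle_seq = decompose_sequence(
    frames, toy_motion, toy_structure, toy_angle)
```

Each returned list has one entry per frame, which mirrors how sub-steps 2 to 4 build a sequence from per-frame component information.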
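In this step, the three-dimensional skeleton key point sequence is retargeted from the action transfer component sequence obtained by orthogonally decomposing the two-dimensional skeleton key point sequence. This avoids the error-prone estimation and retargeting of three-dimensional key points, helps improve the accuracy of action transfer, and further mitigates the low accuracy of action transfer when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably.
S140: Generate, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
After the three-dimensional skeleton key point sequence is determined, the three-dimensional skeleton key points corresponding to each frame in the sequence can be projected back to two-dimensional space to obtain two-dimensional target skeleton key points of the target object; the two-dimensional target skeleton key points of the multiple frames form a two-dimensional target skeleton key point sequence. The target video including the action sequence of the target object is then generated based on the two-dimensional target skeleton key point sequence. The action sequence of the target object corresponds to the action sequence of the initial object.
In some embodiments, when the target video including the action sequence of the target object is generated from the two-dimensional target skeleton key point sequence, each group of two-dimensional target skeleton key points can be used for action rendering to obtain the pose of the target object in the corresponding frame; combining the poses of the frames in order yields the action sequence of the target object.
For example, a video rendering engine can generate the target video including the action sequence of the target object based on the two-dimensional target skeleton key points corresponding to each frame.
As described above, the reconstructed three-dimensional skeleton key point sequence is reprojected to obtain the two-dimensional target skeleton key point sequence, which avoids the error-prone estimation and retargeting of three-dimensional key points and helps improve the accuracy of action transfer.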
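For example, in step S130, a trained action transfer neural network can be used to orthogonally decompose the two-dimensional skeleton key point sequence and to determine the three-dimensional skeleton key point sequence of the target object from the resulting action transfer component sequence.
The action transfer neural network may include three encoders and one decoder. Each encoder extracts one kind of component information from every two-dimensional skeleton key point in the two-dimensional skeleton key point sequence, giving the motion component information, object structure component information and shooting angle component information. After the component information is obtained, the decoder decodes it to reconstruct estimated three-dimensional skeleton key points of the target object; finally, the estimated three-dimensional skeleton key points are reprojected back to two-dimensional space, giving one three-dimensional skeleton key point in the above three-dimensional skeleton key point sequence.
It should be noted that, when the three-dimensional skeleton key points are determined, the object structure component information and shooting angle component information can either be used as obtained directly from the encoders or be used after average pooling. Specifically, the two-dimensional skeleton key points corresponding to several consecutive frames that include the current frame are orthogonally decomposed to obtain the object structure component information and shooting angle component information of each frame. Average pooling is then applied to the object structure component information of these frames to obtain the final object structure component information of the current frame, and to the shooting angle component information of these frames to obtain the final shooting angle component information of the current frame. Finally, the three-dimensional skeleton key points of the current frame are determined from the directly decomposed motion component information, the average-pooled object structure component information and the average-pooled shooting angle component information.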
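The average pooling of structure or angle component information over consecutive frames can be sketched as below. The window parameter `radius` and the clamping at the sequence boundaries are assumptions made for this illustration; the text only states that several consecutive frames including the current frame are pooled.

```python
from typing import List, Sequence


def average_pool(codes: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized component vectors."""
    if not codes:
        raise ValueError("at least one code vector is required")
    dim = len(codes[0])
    if any(len(c) != dim for c in codes):
        raise ValueError("all code vectors must have the same length")
    return [sum(c[i] for c in codes) / len(codes) for i in range(dim)]


def pooled_code(codes: Sequence[Sequence[float]], t: int, radius: int) -> List[float]:
    """Average the codes of frame t and its neighbours within `radius` frames."""
    lo = max(0, t - radius)                  # clamp the window at the start
    hi = min(len(codes), t + radius + 1)     # clamp the window at the end
    return average_pool(codes[lo:hi])


structure_codes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
final_structure = pooled_code(structure_codes, t=1, radius=1)
```

The motion component is deliberately left out of the pooling, since the text uses the directly decomposed motion component information for each frame.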
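The above embodiments avoid transferring actions directly at the pixel level and mitigate the large differences in structure and viewing angle between the first initial video and the target video; accuracy improves in particular when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably. In addition, the extracted two-dimensional skeleton key points are orthogonally decomposed into motion component information, object structure component information and shooting angle component information, which further mitigates the low accuracy of action transfer in those cases.
To further mitigate the low accuracy of action transfer when the initial object performs extreme actions or when the structures of the initial object and the target object differ considerably, in the embodiments of the present disclosure a second initial video including the target object is also acquired before the three-dimensional skeleton key point sequence of the target object is determined, and a two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video is identified.
Then, when the three-dimensional skeleton key point sequence of the target object is determined, an action transfer component sequence of the target object is first determined based on the two-dimensional skeleton key point sequence of the target object; next, a target action transfer component sequence is determined based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; finally, the three-dimensional skeleton key point sequence of the target object is determined based on the target action transfer component sequence.
The action transfer component sequence of the target object is determined in the same way as that of the initial object: two-dimensional skeleton key points of the target object are first extracted from each frame of the second initial video and orthogonally decomposed to determine the motion component information, object structure component information and shooting angle component information of the target object. The motion component information of the multiple frames then forms a motion component sequence, the object structure component information forms an object structure component sequence, and the shooting angle component information forms a shooting angle component sequence.
In the above embodiment, the fused target action transfer component sequence is used to reconstruct the three-dimensional skeleton key point sequence of the target object, and the reconstructed sequence is then reprojected to obtain the two-dimensional target skeleton key point sequence of the target object. This avoids the error-prone estimation and retargeting of three-dimensional key points and helps improve the accuracy of action transfer.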
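The action transfer method of the present disclosure is described below through another specific embodiment.
As shown in FIG. 2, the action transfer method of this embodiment includes the following steps:
Step 1: Skeleton extraction. Two-dimensional skeleton key points of the initial object are extracted from each frame of the first initial video to obtain the two-dimensional skeleton key point sequence of the initial object; two-dimensional skeleton key points of the target object are extracted from each frame of the second initial video to obtain the two-dimensional skeleton key point sequence of the target object.
Step 2: Action transfer processing. Every two-dimensional skeleton key point in the two-dimensional skeleton key point sequence of the initial object and every two-dimensional skeleton key point in the two-dimensional skeleton key point sequence of the target object is encoded, that is, orthogonally decomposed. This gives the motion component information, object structure component information and shooting angle component information corresponding to each two-dimensional skeleton key point (each frame) of the initial object, and likewise for each two-dimensional skeleton key point (each frame) of the target object.
The motion component information of the multiple frames of the initial object forms the motion component sequence of the initial object, the object structure component information forms its object structure component sequence, and the shooting angle component information forms its shooting angle component sequence. The motion component sequence, object structure component sequence and shooting angle component sequence of the initial object together form the action transfer component sequence of the initial object.
Similarly, the motion component information of the multiple frames of the target object forms the motion component sequence of the target object, the object structure component information forms its object structure component sequence, and the shooting angle component information forms its shooting angle component sequence. The motion component sequence, object structure component sequence and shooting angle component sequence of the target object together form the action transfer component sequence of the target object.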
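Then, a target action transfer component sequence is determined based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object, and the three-dimensional skeleton key point sequence of the target object is determined based on the target action transfer component sequence.
For example, the motion component information, object structure component information and shooting angle component information corresponding to each frame of the initial object can be recombined with the motion component information, object structure component information and shooting angle component information corresponding to each frame of the target object, giving recombined target motion component information, target structure component information and target angle component information.
The target motion component information of the multiple frames can form a target motion component sequence, the target structure component information can form a target object structure component sequence, and the target angle component information can form a target shooting angle component sequence. The target motion component sequence, target object structure component sequence and target shooting angle component sequence together form the target action transfer component sequence.
Next, the target motion component information, target structure component information and target angle component information are decoded to obtain, for one frame, the three-dimensional skeleton key points of the target object at three preset angles. The three-dimensional skeleton key points of the multiple frames form the three-dimensional skeleton key point sequence.
Finally, the three-dimensional skeleton key points at each preset angle are reprojected back to two-dimensional space to obtain the two-dimensional target skeleton key points of the target object at each preset angle.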
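The recombination of component information described above can be sketched as follows. The text does not state which component comes from which object; the sketch assumes the motion component is taken from the initial object and the structure and angle components from the target object, which is the usual reading of motion retargeting. `FrameCode`, `recombine` and `recombine_sequences` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FrameCode:
    """Component information of one frame (hypothetical container)."""
    motion: List[float]
    structure: List[float]
    angle: List[float]


def recombine(initial: FrameCode, target: FrameCode) -> FrameCode:
    """Assumed rule: motion from the initial object, body and camera from the target."""
    return FrameCode(motion=initial.motion,
                     structure=target.structure,
                     angle=target.angle)


def recombine_sequences(initial_seq: Sequence[FrameCode],
                        target_seq: Sequence[FrameCode]) -> List[FrameCode]:
    """Frame-by-frame recombination; stops at the shorter of the two sequences."""
    return [recombine(a, b) for a, b in zip(initial_seq, target_seq)]


initial_frame = FrameCode(motion=[1.0], structure=[2.0], angle=[3.0])
target_frame = FrameCode(motion=[4.0], structure=[5.0], angle=[6.0])
target_code = recombine(initial_frame, target_frame)
```

The recombined codes would then be passed to the decoder to reconstruct the three-dimensional skeleton key points; the decoder itself is not sketched here.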
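Step 3: Skeleton-to-video rendering. Based on the two-dimensional target skeleton key points of the target object at each preset angle in each frame, the target action of the target object at each preset angle is determined, and a target video of the target object at each preset angle is generated based on the target action.
The above embodiment can significantly improve the accuracy of action transfer and supports action transfer at arbitrary angles. It still transfers actions accurately when the target object and the initial object differ considerably in structure or when the initial object performs extreme actions, and achieves good visual results.
At present, because motion is complex and nonlinear and paired action-character data is hard to find in the real world, it is difficult to build an accurate action transfer model, so action transfer suffers from low accuracy. To mitigate this, the present disclosure also provides a training method for an action transfer neural network. The method can be applied to the terminal device or server that performs the action transfer processing, or to a separate terminal device or server used for neural network training. Specifically, as shown in FIG. 3, the method may include the following steps: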
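S310: Acquire a sample motion video including an action sequence of a sample object.
S320: Identify a first sample two-dimensional skeleton key point sequence of the sample object in multiple sample frames of the sample motion video.
Here, first sample two-dimensional skeleton key points of the sample object are extracted from each frame of the sample motion video; the first sample two-dimensional skeleton key points of the multiple sample frames form the first sample two-dimensional skeleton key point sequence.
The first sample two-dimensional skeleton key points may include the key points corresponding to the joints of the sample object. Connecting the key points of the joints yields the skeleton of the sample object.
In a specific implementation, a two-dimensional pose estimation neural network can be used to extract the first sample two-dimensional skeleton key points of the sample object.
The sample object may be a real person, a virtual person, an animal, and so on; the present disclosure does not limit this.
S330: Apply limb scaling to the first sample two-dimensional skeleton key point sequence to obtain a second sample two-dimensional skeleton key point sequence.
Here, each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence is limb-scaled according to a predetermined scaling ratio to obtain the second sample two-dimensional skeleton key point sequence.
As shown in FIG. 4, limb scaling of the first sample two-dimensional skeleton key point x yields the second sample two-dimensional skeleton key point x'.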
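One way to realize limb scaling on a two-dimensional skeleton is to rescale each bone vector and rebuild joint positions outward from the root. The sketch below assumes a skeleton described by a hypothetical `parents` array (parent index per joint, -1 for the root) and one scale factor per bone; the text only states that a predetermined scaling ratio is applied.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def scale_limbs(joints: Sequence[Point2D],
                parents: Sequence[int],
                scales: Sequence[float]) -> List[Point2D]:
    """Rescale each bone (parent -> joint) and rebuild positions from the root.

    Assumes parents[i] < i for every non-root joint, so a parent is always
    placed before its children.
    """
    scaled: List[Point2D] = []
    for i, parent in enumerate(parents):
        if parent < 0:                       # root joint keeps its position
            scaled.append(joints[i])
            continue
        if parent >= i:
            raise ValueError("parents must be listed before their children")
        bone_x = joints[i][0] - joints[parent][0]
        bone_y = joints[i][1] - joints[parent][1]
        parent_x, parent_y = scaled[parent]  # already rescaled parent position
        scaled.append((parent_x + scales[i] * bone_x,
                       parent_y + scales[i] * bone_y))
    return scaled


x = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]     # a three-joint chain
x_scaled = scale_limbs(x, parents=[-1, 0, 1], scales=[1.0, 2.0, 0.5])
```

Because only bone lengths change, the pose (bone directions) of x is preserved in the scaled skeleton, which is what makes the pair useful for learning motion information that does not depend on body proportions.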
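S340: Determine a loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence, and adjust the network parameters of the action transfer neural network based on the loss function.
In a specific implementation, each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence and each second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence can be orthogonally decomposed. The decomposed information is used to estimate a three-dimensional skeleton key point sequence and to recover two-dimensional sample skeleton key points, and the loss function is constructed from the decomposed information, the estimated three-dimensional skeleton key point sequence and the recovered two-dimensional sample skeleton key points.
Here, the action transfer neural network is trained with the goal of minimizing the constructed loss function.
In this implementation, the loss function is constructed from the first sample two-dimensional skeleton key point sequence of the sample object and the second sample two-dimensional skeleton key point sequence obtained after limb scaling of the sample object, and is used to train the action transfer neural network; this improves the accuracy of action transfer when the structures of the initial object and the target object differ considerably. Moreover, no paired action-character data from the real world is used during training, so the loss function is constructed and the network is trained without supervision, which helps improve the accuracy of the trained action transfer neural network during action transfer.
The action transfer neural network may specifically include three encoders and one decoder; training the action transfer neural network essentially means training these three encoders and the decoder.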
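In some embodiments, determining the loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence can be implemented with the following steps:
Step 1: Determine the first sample action transfer component sequence based on the first sample two-dimensional skeleton key point sequence.
Each first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence is orthogonally decomposed to obtain the first sample motion component information, first sample structure component information and first sample angle component information corresponding to each sample frame. The first sample motion component information of the multiple sample frames forms a first sample motion component sequence; the first sample structure component information forms a first sample structure component sequence; the first sample angle component information forms a first sample angle component sequence. The first sample motion component sequence, first sample angle component sequence and first sample structure component sequence together form the first sample action transfer component sequence.
Here, as shown in FIG. 4, one encoder Em of the action transfer neural network processes a first sample two-dimensional skeleton key point x to obtain the first sample motion component information; another encoder Es processes x to obtain the first sample structure component information; and the last encoder Ev processes x to obtain the first sample angle component information.
Mean pooling is applied to the first sample structure component information of the current sample frame and of multiple adjacent sample frames (for example, 64 frames) to obtain the final first sample structure component information [formula image appb-000001].
Mean pooling is applied to the first sample angle component information of the current sample frame and of multiple adjacent sample frames to obtain the final first sample angle component information [formula image appb-000002].
The first sample motion component information of the current sample frame does not need mean pooling and can be used directly as the final first sample motion component information m.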
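Step 2: Determine the second sample action transfer component sequence based on the second sample two-dimensional skeleton key point sequence.
Each second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence is orthogonally decomposed to obtain the second sample motion component information, second sample structure component information and second sample angle component information corresponding to each sample frame. The second sample motion component information of the multiple sample frames forms a second sample motion component sequence; the second sample structure component information forms a second sample structure component sequence; the second sample angle component information forms a second sample angle component sequence. The second sample motion component sequence, second sample angle component sequence and second sample structure component sequence together form the second sample action transfer component sequence.
Here, as shown in FIG. 4, the encoder Em of the action transfer neural network processes a second sample two-dimensional skeleton key point x' to obtain the second sample motion component information; the encoder Es processes x' to obtain the second sample structure component information; and the encoder Ev processes x' to obtain the second sample angle component information.
Mean pooling is applied to the second sample structure component information of the current sample frame and of multiple adjacent sample frames to obtain the final second sample structure component information [formula image appb-000003].
Mean pooling is applied to the second sample angle component information of the current sample frame and of multiple adjacent sample frames to obtain the final second sample angle component information [formula image appb-000004].
The second sample motion component information of the current sample frame does not need mean pooling and can be used directly as the final second sample motion component information m'.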
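Step 3: Determine the estimated three-dimensional skeleton key point sequence based on the first sample action transfer component sequence.
Here, the first sample motion component information, first sample structure component information and first sample angle component information corresponding to one sample frame are used to determine one estimated three-dimensional skeleton key point. The estimated three-dimensional skeleton key points of the multiple sample frames form the estimated three-dimensional skeleton key point sequence.
Specifically, a decoder G can decode the first sample motion component information, first sample structure component information and first sample angle component information of one sample frame to obtain the reconstructed estimated three-dimensional skeleton key point.
Step 4: Determine the loss function based on the first sample action transfer component sequence, the second sample action transfer component sequence and the estimated three-dimensional skeleton key point sequence.
In a specific implementation, the first sample motion component information, first sample structure component information and first sample angle component information in the first sample action transfer component sequence, together with the second sample motion component information, second sample structure component information and second sample angle component information in the second sample action transfer component sequence, can be used to recover two-dimensional sample skeleton key points; the loss function is then constructed from the estimated three-dimensional skeleton key point sequence and the recovered two-dimensional sample skeleton key points.
In this implementation, the loss function is constructed from the first sample action transfer component sequence obtained by orthogonally decomposing the first sample two-dimensional skeleton key point sequence, the second sample action transfer component sequence obtained by orthogonally decomposing the second sample two-dimensional skeleton key point sequence, and the estimated three-dimensional skeleton key point sequence reconstructed from the first sample action transfer component sequence; this improves the accuracy of action transfer when the structures of the initial object and the target object differ considerably.
Although the sample object is subject to changes and perturbations in structure and shooting angle, the transferred motion information should remain unchanged. A motion-invariant loss function can therefore be constructed and minimized during training to improve the accuracy of the action transfer neural network during action transfer. Specifically, the motion-invariant loss function can be constructed with the following steps: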
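Step 1: Based on the second sample motion component information, the first sample structure component information and the first sample angle component information, determine the first estimated skeleton key point corresponding to the respective first sample two-dimensional skeleton key point in the first sample two-dimensional skeleton key point sequence.
As shown in FIG. 4, this can be implemented with the following sub-steps: the decoder G processes the second sample motion component information m', the first sample structure component information [formula image appb-000005] and the first sample angle component information [formula image appb-000006] to reconstruct the three-dimensional skeleton key point [formula image appb-000007]; the rotation projection function [formula image appb-000008] then reprojects the three-dimensional skeleton key point [formula image appb-000009] to two-dimensional space to obtain the first estimated skeleton key point [formula image appb-000010].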
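Step 2: Based on the first sample motion component information, the second sample structure component information and the second sample angle component information, determine the second estimated skeleton key point corresponding to the respective second sample two-dimensional skeleton key point in the second sample two-dimensional skeleton key point sequence.
As shown in FIG. 4, this can be implemented with the following sub-steps: the decoder G processes the first sample motion component information m, the second sample structure component information [formula image appb-000011] and the second sample angle component information [formula image appb-000012] to reconstruct the three-dimensional skeleton key point [formula image appb-000013]; the rotation projection function [formula image appb-000014] then reprojects the three-dimensional skeleton key point [formula image appb-000015] to two-dimensional space to obtain the second estimated skeleton key point [formula image appb-000016].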
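In Step 1 and Step 2, the first estimated skeleton key point [formula image appb-000017] and the second estimated skeleton key point [formula image appb-000018] are generated according to the formula in [formula image appb-000019]. In that formula, [formula image appb-000020] denotes average pooling of the sample structure component information extracted by the encoder, and [formula image appb-000021] denotes average pooling of the sample angle component information extracted by the encoder.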
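The rotation projection function is given only as a formula image that is not reproduced here. As a rough illustration of what such a function can look like, the sketch below assumes a rotation about the vertical (y) axis followed by an orthographic projection that drops the depth coordinate; both choices, and the name `rotate_and_project`, are assumptions rather than the disclosed definition.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def rotate_and_project(points: Sequence[Point3D], yaw: float) -> List[Point2D]:
    """Rotate 3D key points about the y axis by `yaw` radians, then drop depth."""
    cos_a, sin_a = math.cos(yaw), math.sin(yaw)
    projected: List[Point2D] = []
    for x, y, z in points:
        x_rot = cos_a * x + sin_a * z      # x coordinate after the rotation
        projected.append((x_rot, y))       # orthographic projection keeps (x, y)
    return projected


skeleton_3d = [(1.0, 2.0, 3.0)]
front_view = rotate_and_project(skeleton_3d, 0.0)
side_view = rotate_and_project(skeleton_3d, math.pi / 2)
```

With a zero rotation the projection simply keeps the x and y coordinates; other angles expose the depth coordinate, which is how a single reconstructed three-dimensional skeleton can be viewed from several angles.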
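Step 3: Determine the motion-invariant loss function based on the first estimated skeleton key point, the second estimated skeleton key point, the first sample motion component information, the second sample motion component information and the estimated three-dimensional skeleton key point sequence.
The constructed motion-invariant loss functions may specifically include the following three: [formula images appb-000022 to appb-000025].
In the formulas, N denotes the number of frames of the sample motion video, T denotes the number of joints corresponding to one first sample two-dimensional skeleton key point, M denotes a preset value, Cm denotes the code length corresponding to the first sample motion component information, K denotes the number of rotations of the sample object, [formula image appb-000026] denotes an estimated three-dimensional skeleton key point, and [formula images appb-000027 and appb-000028] denote the three motion-invariant loss functions.
In the embodiments of the present disclosure, the information obtained by orthogonally decomposing the first sample and second sample two-dimensional skeleton key point sequences is used to recover the skeleton of the sample object, giving the first estimated skeleton key points, and to recover the skeleton of the limb-scaled sample object, giving the second estimated skeleton key points. The motion-invariant loss function can then be constructed from the recovered first and second estimated skeleton key points together with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.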
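Because the structure of the sample object is invariant over time, a structure-invariant loss function can be constructed, and the motion-invariant and structure-invariant loss functions can be minimized during training to improve the accuracy of the action transfer neural network during action transfer. Specifically, the structure-invariant loss function can be constructed with the following steps:
Step 1: From the first sample two-dimensional skeleton key point sequence, select the first sample two-dimensional skeleton key points of the sample object at the first moment and the first sample two-dimensional skeleton key points of the sample object at the second moment.
From the second sample two-dimensional skeleton key point sequence, select the second sample two-dimensional skeleton key points of the sample object at the second moment and the second sample two-dimensional skeleton key points of the sample object at the first moment.
The first sample two-dimensional skeleton key points are the two-dimensional skeleton key points of the sample object extracted from the sample images corresponding to the first moment t1 and the second moment t2 in the sample motion video; they are skeleton key points of the sample object without limb scaling. The second sample two-dimensional skeleton key points are obtained by applying limb scaling to the skeleton key points of the sample object extracted from the sample images corresponding to the first moment t1 and the second moment t2 in the sample motion video.
Step 2: Determine the structure-invariant loss function based on the first sample two-dimensional skeleton key points of the sample object at the first moment, the first sample two-dimensional skeleton key points of the sample object at the second moment, the second sample two-dimensional skeleton key points of the sample object at the second moment, the second sample two-dimensional skeleton key points of the sample object at the first moment, and the estimated three-dimensional skeleton key point sequence.
In a specific implementation, the constructed structure-invariant loss function includes the following two: [formula images appb-000029 to appb-000031].
In the formulas, St1 denotes the sample structure component information extracted directly from the first sample two-dimensional skeleton key points at moment t1, St2 denotes the sample structure component information extracted directly from the first sample two-dimensional skeleton key points at moment t2, St2' denotes the sample structure component information extracted directly from the second sample two-dimensional skeleton key points at moment t2, St1' denotes the sample structure component information extracted directly from the second sample two-dimensional skeleton key points at moment t1, Cb denotes the code length corresponding to the first sample structure component information, m is a preset value, s() denotes the cosine similarity function, and [formula image appb-000032] denotes the two structure-invariant loss functions.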
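The loss formulas themselves are not reproduced here, but the listed ingredients (structure codes of the same body at two moments, structure codes of the limb-scaled body, a cosine similarity s() and a preset margin m) match a margin-based triplet loss. The sketch below shows that generic form under this assumption; which code plays the anchor, positive and negative role is also assumed, not taken from the source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """s(a, b): cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float) -> float:
    """Hinge on similarities: the positive should beat the negative by `margin`."""
    return max(0.0,
               cosine_similarity(anchor, negative)
               - cosine_similarity(anchor, positive)
               + margin)


# Assumed roles: anchor = S_t1, positive = S_t2 (same body, other moment),
# negative = S_t2' (limb-scaled body).
s_t1, s_t2, s_t2_scaled = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
loss_value = triplet_loss(s_t1, s_t2, s_t2_scaled, margin=0.3)
```

A loss of zero means the structure codes of the same body at two moments are already more similar than those of the original and the scaled body by at least the margin.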
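In the embodiments of the present disclosure, the structure-invariant loss function can be constructed from the first sample and second sample two-dimensional skeleton key points at different moments, combined with the reconstructed estimated three-dimensional skeleton key point sequence of the sample object.
Because the shooting angle of the sample object is invariant to changes in the motion and structure of the sample object, a view-invariant loss function can be constructed, and the view-invariant, motion-invariant and structure-invariant loss functions can be minimized during training to improve the accuracy of the action transfer neural network during action transfer. Specifically, the view-invariant loss function can be constructed as follows:
Determine the view-invariant loss function based on the first sample two-dimensional skeleton key points of the sample object at the first moment, the first sample two-dimensional skeleton key points of the sample object at the second moment, the first sample angle component information, the second sample angle component information, and the estimated three-dimensional skeleton key point sequence.
The constructed view-invariant loss function specifically includes the following two: [formula images appb-000033 to appb-000035].
In the formulas, vt1 denotes the sample angle component information extracted directly from the first sample two-dimensional skeleton key points at moment t1, vt2 denotes the sample angle component information extracted directly from the first sample two-dimensional skeleton key points at moment t2, Cv denotes the code length corresponding to the first sample angle component information, and [formula image appb-000036] denotes the two view-invariant loss functions.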
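Because the sample object should remain invariant when it is recovered, a reconstruction recovery loss function can be constructed, and the reconstruction recovery, view-invariant, motion-invariant and structure-invariant loss functions can be minimized during training to improve the accuracy of the action transfer neural network during action transfer. Specifically, the reconstruction recovery loss function can be constructed as follows:
Determine the reconstruction recovery loss function based on the first sample two-dimensional skeleton key point sequence and the estimated three-dimensional skeleton key point sequence.
The constructed reconstruction recovery loss function specifically includes the following two: [formula images appb-000037 and appb-000038].
In the formulas, D denotes a temporal convolutional network; [formula image appb-000039] denotes that x is drawn from the probability distribution of the samples and that the expectation of the function that follows it, namely [formula image appb-000040], is taken; and [formula images appb-000041 and appb-000042] denote the two reconstruction recovery loss functions.
The above embodiments construct the reconstruction recovery loss function, the view-invariant loss function, the motion-invariant loss function and the structure-invariant loss function. In a specific implementation, these loss functions can be fused with the following formula to obtain the target loss function: [formula image appb-000043].
In the formula, λrec, λcrs, λadv, λtrip and λinv all denote preset weights.
When the action transfer neural network is trained, the goal is to minimize the value of the above target loss function.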
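Since the fusion formula is not reproduced here, the sketch below assumes the simplest form consistent with "preset weights": a weighted sum of the individual loss terms. The term names follow the subscripts of the weights (rec, crs, adv, trip, inv); the numeric values are made up for illustration, and the mapping from each term to a specific loss function described above is not asserted.

```python
from typing import Mapping


def target_loss(losses: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of loss terms; both mappings must use the same term names."""
    if set(losses) != set(weights):
        raise KeyError("losses and weights must use the same term names")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical per-batch loss values and preset weights.
losses = {"rec": 0.5, "crs": 0.25, "adv": 1.0, "trip": 0.0, "inv": 2.0}
weights = {"rec": 2.0, "crs": 4.0, "adv": 1.0, "trip": 1.0, "inv": 0.5}
total = target_loss(losses, weights)
```

Training would then adjust the encoder and decoder parameters to reduce this single scalar.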
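Corresponding to the above action transfer method, the present disclosure also provides an action transfer apparatus, which is applied to a terminal device or server that performs action transfer. Each module can implement the same method steps and achieve the same beneficial effects as the above method, so the same parts are not described again in the present disclosure.
As shown in FIG. 5, an action transfer apparatus provided by the present disclosure may include:
A video acquisition module 510, configured to acquire a first initial video including an action sequence of an initial object.
A key point extraction module 520, configured to identify a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video.
A key point conversion module 530, configured to convert the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object.
An image rendering module 540, configured to generate, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.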
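The four modules form a straight pipeline, which can be sketched as a small class that chains four callables. The class name, the callable signatures and the stub implementations are hypothetical and stand in for the real video reader, pose estimator, action transfer neural network and rendering engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class ActionTransferApparatus:
    acquire_video: Callable[[str], List[str]]                                # module 510
    extract_keypoints: Callable[[List[str]], List[List[Point2D]]]            # module 520
    convert_keypoints: Callable[[List[List[Point2D]]], List[List[Point3D]]]  # module 530
    render: Callable[[List[List[Point3D]]], List[str]]                       # module 540

    def run(self, source: str) -> List[str]:
        """Chain the four modules: video -> 2D key points -> 3D key points -> video."""
        frames = self.acquire_video(source)
        keypoints_2d = self.extract_keypoints(frames)
        keypoints_3d = self.convert_keypoints(keypoints_2d)
        return self.render(keypoints_3d)


# Stub modules (placeholders only) to show the data flow.
apparatus = ActionTransferApparatus(
    acquire_video=lambda path: [f"{path}#0", f"{path}#1"],
    extract_keypoints=lambda frames: [[(0.0, 0.0)] for _ in frames],
    convert_keypoints=lambda seq: [[(x, y, 0.0) for x, y in kp] for kp in seq],
    render=lambda seq3d: [f"rendered frame {i}" for i, _ in enumerate(seq3d)],
)
result = apparatus.run("clip.mp4")
```

Keeping each module behind a plain callable reflects the later remark that the module division is only a logical one and may be implemented differently in practice.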
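In some embodiments, when converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object, the key point conversion module 530 is configured to: determine an action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and determine the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object.
In some embodiments, the video acquisition module 510 is further configured to acquire a second initial video including the target object; the key point extraction module 520 is further configured to identify a two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video; and, when determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object, the key point conversion module 530 is configured to: determine an action transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object; determine a target action transfer component sequence based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; and determine the three-dimensional skeleton key point sequence of the target object based on the target action transfer component sequence.
In some embodiments, the action transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence and a shooting angle component sequence; when determining the action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence, the key point conversion module 530 is configured to: determine, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of the first initial video, the motion component information, object structure component information and shooting angle component information corresponding to that frame; determine the motion component sequence based on the motion component information corresponding to each frame; determine the object structure component sequence based on the object structure component information corresponding to each frame; and determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame.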
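An embodiment of the present disclosure discloses an electronic device which, as shown in FIG. 6, includes a processor 601, a memory 602 and a bus 603. The memory 602 stores machine-readable instructions executable by the processor 601; when the electronic device runs, the processor 601 and the memory 602 communicate through the bus 603.
When executed by the processor 601, the machine-readable instructions perform the steps of the following action transfer method: acquiring a first initial video including an action sequence of an initial object; identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video; converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
In addition, when executed by the processor 601, the machine-readable instructions can also perform the method content of any implementation described in the method section above, which is not repeated here.
An embodiment of the present disclosure also provides a computer program product corresponding to the above method and apparatus, including a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method in the preceding method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
The above descriptions of the embodiments tend to emphasize the differences between them; for the parts that are the same or similar, the embodiments can be referred to one another, and for brevity they are not repeated here.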
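Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above can be found in the corresponding processes of the method embodiments and are not repeated in the present disclosure. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses or modules, and may be electrical, mechanical or of another form.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disc.
The above are only specific implementations of the present disclosure, but the scope of protection of the present disclosure is not limited to them. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.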

Claims (18)

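  1. An action transfer method, characterized by comprising:
    acquiring a first initial video including an action sequence of an initial object;
    identifying a two-dimensional skeleton key point sequence of the initial object in multiple frames of the first initial video;
    converting the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and
    generating, based on the three-dimensional skeleton key point sequence, a target video including an action sequence of the target object.
  2. The action transfer method according to claim 1, characterized in that converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object comprises:
    determining an action transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and
    determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object.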
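  3. The action transfer method according to claim 2, characterized in that
    before the three-dimensional skeleton key point sequence of the target object is determined, the method further comprises:
    acquiring a second initial video including the target object; and
    identifying a two-dimensional skeleton key point sequence of the target object in multiple frames of the second initial video;
    and determining the three-dimensional skeleton key point sequence of the target object based on the action transfer component sequence of the initial object comprises:
    determining an action transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object;
    determining a target action transfer component sequence based on the action transfer component sequence of the initial object and the action transfer component sequence of the target object; and
    determining the three-dimensional skeleton key point sequence of the target object based on the target action transfer component sequence.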
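  4. The motion transfer method according to claim 2, wherein the motion transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence, and a shooting angle component sequence;
    and wherein determining the motion transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence comprises:
    determining, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of images of the first initial video, motion component information, object structure component information, and shooting angle component information corresponding to that frame;
    determining the motion component sequence based on the motion component information corresponding to each frame of the multiple frames of images of the first initial video;
    determining the object structure component sequence based on the object structure component information corresponding to each frame of the multiple frames of images of the first initial video; and
    determining the shooting angle component sequence based on the shooting angle component information corresponding to each frame of the multiple frames of images of the first initial video.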
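  5. The motion transfer method according to claim 1, wherein generating, based on the three-dimensional skeleton key point sequence, the target video including the motion sequence of the target object comprises:
    generating a two-dimensional target skeleton key point sequence of the target object based on the three-dimensional skeleton key point sequence; and
    generating the target video including the motion sequence of the target object based on the two-dimensional target skeleton key point sequence.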
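  6. The motion transfer method according to any one of claims 1 to 5, wherein converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object comprises:
    converting, by using a motion transfer neural network, the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object.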
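  7. The motion transfer method according to claim 6, further comprising training the motion transfer neural network by:
    acquiring a sample motion video including a motion sequence of a sample object;
    identifying a first sample two-dimensional skeleton key point sequence of the sample object in multiple frames of sample images of the sample motion video;
    performing limb scaling processing on the first sample two-dimensional skeleton key point sequence to obtain a second sample two-dimensional skeleton key point sequence;
    determining a loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence; and
    adjusting network parameters of the motion transfer neural network based on the loss function.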
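  8. The motion transfer method according to claim 7, wherein determining the loss function based on the first sample two-dimensional skeleton key point sequence and the second sample two-dimensional skeleton key point sequence comprises:
    determining a first sample motion transfer component sequence based on the first sample two-dimensional skeleton key point sequence;
    determining a second sample motion transfer component sequence based on the second sample two-dimensional skeleton key point sequence;
    determining an estimated three-dimensional skeleton key point sequence based on the first sample motion transfer component sequence; and
    determining the loss function based on the first sample motion transfer component sequence, the second sample motion transfer component sequence, and the estimated three-dimensional skeleton key point sequence.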
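  9. The motion transfer method according to claim 8, wherein the loss function includes a motion invariance loss function; the first sample motion transfer component sequence includes first sample motion component information, first sample structure component information, and first sample angle component information corresponding to each frame of sample image; and the second sample motion transfer component sequence includes second sample motion component information, second sample structure component information, and second sample angle component information corresponding to each frame of sample image;
    and wherein determining the loss function comprises:
    determining, based on the second sample motion component information, the first sample structure component information, and the first sample angle component information corresponding to each frame of sample image, first estimated skeleton key points corresponding to the corresponding first sample two-dimensional skeleton key points in the first sample two-dimensional skeleton key point sequence;
    determining, based on the first sample motion component information, the second sample structure component information, and the second sample angle component information corresponding to each frame of sample image, second estimated skeleton key points corresponding to the corresponding second sample two-dimensional skeleton key points in the second sample two-dimensional skeleton key point sequence; and
    determining the motion invariance loss function based on the first estimated skeleton key points, the second estimated skeleton key points, the first sample motion component information, the second sample motion component information, and the estimated three-dimensional skeleton key point sequence.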
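  10. The motion transfer method according to claim 9, wherein the loss function further includes a structure invariance loss function;
    and wherein determining the loss function further comprises:
    selecting, from the first sample two-dimensional skeleton key point sequence, the first sample two-dimensional skeleton key points in the sample image corresponding to a first moment and the first sample two-dimensional skeleton key points in the sample image corresponding to a second moment;
    selecting, from the second sample two-dimensional skeleton key point sequence, the second sample two-dimensional skeleton key points in the sample image corresponding to the second moment and the second sample two-dimensional skeleton key points in the sample image corresponding to the first moment; and
    determining the structure invariance loss function based on the first sample two-dimensional skeleton key points in the sample image corresponding to the first moment, the first sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the second sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the second sample two-dimensional skeleton key points in the sample image corresponding to the first moment, and the estimated three-dimensional skeleton key point sequence.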
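  11. The motion transfer method according to claim 10, wherein the loss function further includes a view invariance loss function;
    and wherein determining the loss function further comprises:
    determining the view invariance loss function based on the first sample two-dimensional skeleton key points in the sample image corresponding to the first moment, the first sample two-dimensional skeleton key points in the sample image corresponding to the second moment, the first sample angle component information of the sample images corresponding to the first moment and the second moment, the second sample angle component information of the sample images corresponding to the first moment and the second moment, and the estimated three-dimensional skeleton key point sequence.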
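  12. The motion transfer method according to claim 11, wherein the loss function further includes a reconstruction loss function;
    and wherein determining the loss function further comprises:
    determining the reconstruction loss function based on the first sample two-dimensional skeleton key point sequence and the estimated three-dimensional skeleton key point sequence.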
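  13. A motion transfer apparatus, comprising:
    a video acquisition module, configured to acquire a first initial video including a motion sequence of an initial object;
    a key point extraction module, configured to identify a two-dimensional skeleton key point sequence of the initial object in multiple frames of images of the first initial video;
    a key point conversion module, configured to convert the two-dimensional skeleton key point sequence into a three-dimensional skeleton key point sequence of a target object; and
    an image rendering module, configured to generate, based on the three-dimensional skeleton key point sequence, a target video including a motion sequence of the target object.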
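  14. The motion transfer apparatus according to claim 13, wherein, when converting the two-dimensional skeleton key point sequence into the three-dimensional skeleton key point sequence of the target object, the key point conversion module is configured to:
    determine a motion transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence; and
    determine the three-dimensional skeleton key point sequence of the target object based on the motion transfer component sequence of the initial object.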
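  15. The motion transfer apparatus according to claim 14, wherein the video acquisition module is further configured to acquire a second initial video including the target object;
    the key point extraction module is further configured to identify a two-dimensional skeleton key point sequence of the target object in multiple frames of images of the second initial video;
    and, when determining the three-dimensional skeleton key point sequence of the target object based on the motion transfer component sequence of the initial object, the key point conversion module is configured to:
    determine a motion transfer component sequence of the target object based on the two-dimensional skeleton key point sequence of the target object;
    determine a target motion transfer component sequence based on the motion transfer component sequence of the initial object and the motion transfer component sequence of the target object; and
    determine the three-dimensional skeleton key point sequence of the target object based on the target motion transfer component sequence.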
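  16. The motion transfer apparatus according to claim 14, wherein the motion transfer component sequence of the initial object includes a motion component sequence, an object structure component sequence, and a shooting angle component sequence;
    and, when determining the motion transfer component sequence of the initial object based on the two-dimensional skeleton key point sequence, the key point conversion module is configured to:
    determine, based on the two-dimensional skeleton key points corresponding to each frame of the multiple frames of images of the first initial video, motion component information, object structure component information, and shooting angle component information of the initial object, respectively;
    determine the motion component sequence based on the motion component information corresponding to each frame of the multiple frames of images of the first initial video;
    determine the object structure component sequence based on the object structure component information corresponding to each frame of the multiple frames of images of the first initial video; and
    determine the shooting angle component sequence based on the shooting angle component information corresponding to each frame of the multiple frames of images of the first initial video.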
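  17. An electronic device, comprising a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate via the bus, and the processor executes the machine-readable instructions to perform the motion transfer method according to any one of claims 1 to 12.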
  18. A computer-readable storage medium, storing a computer program that, when run by a processor, performs the motion transfer method according to any one of claims 1 to 12.
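As an illustration of the limb scaling step recited in claim 7, the sketch below rescales bone lengths along a skeleton tree to derive a second sample key point set from a first one. The claims do not specify how the scaling is carried out, so the kinematic-tree formulation, the function name `scale_limbs`, and the `parents` and `scales` parameters are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale_limbs(joints: List[Point], parents: List[int], scales: List[float]) -> List[Point]:
    """Rescale each bone of one 2D skeleton frame (hypothetical formulation).

    joints[i]  : (x, y) of joint i
    parents[i] : index of the parent joint, or -1 for the root;
                 parents are assumed to be listed before their children
    scales[i]  : scale factor for the bone from parents[i] to joint i
                 (ignored for the root)
    """
    scaled: List[Point] = [(0.0, 0.0)] * len(joints)
    for i, parent in enumerate(parents):
        if parent < 0:
            scaled[i] = joints[i]                      # root stays in place
            continue
        # Original bone vector from parent to child.
        bx = joints[i][0] - joints[parent][0]
        by = joints[i][1] - joints[parent][1]
        # Re-attach the scaled bone to the already-scaled parent position.
        scaled[i] = (scaled[parent][0] + scales[i] * bx,
                     scaled[parent][1] + scales[i] * by)
    return scaled


# A three-joint chain: the first bone is doubled, the second is halved.
first_sample = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
second_sample = scale_limbs(first_sample, parents=[-1, 0, 1], scales=[1.0, 2.0, 0.5])
```

Applying the same scale factors to every frame of a sequence changes the body proportions while leaving bone directions, and therefore the motion itself, unchanged in this toy formulation.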
PCT/CN2021/082407 2020-03-31 2021-03-23 Motion transfer method, apparatus, device, and storage medium Ceased WO2021197143A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21781900.2A EP3979204A4 (en) 2020-03-31 2021-03-23 MOTION TRANSFER METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIA
JP2021573955A JP2022536381A (ja) 2020-03-31 2021-03-23 Motion transfer method, apparatus, device, and storage medium
KR1020217038862A KR20220002551A (ko) 2020-03-31 2021-03-23 Motion transfer method, apparatus, device, and storage medium
US17/555,965 US20220114777A1 (en) 2020-03-31 2021-12-20 Method, apparatus, device and storage medium for action transfer

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010243906.1A CN111462209B (zh) 2020-03-31 2020-03-31 Motion transfer method, apparatus, device, and storage medium
CN202010243906.1 2020-03-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/555,965 Continuation US20220114777A1 (en) 2020-03-31 2021-12-20 Method, apparatus, device and storage medium for action transfer

Publications (1)

Publication Number Publication Date
WO2021197143A1 (zh) 2021-10-07

Family

ID=71685166

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082407 Ceased WO2021197143A1 (zh) 2020-03-31 2021-03-23 Motion transfer method, apparatus, device, and storage medium

Country Status (7)

Country Link
US (1) US20220114777A1 (zh)
EP (1) EP3979204A4 (zh)
JP (1) JP2022536381A (zh)
KR (1) KR20220002551A (zh)
CN (1) CN111462209B (zh)
TW (1) TW202139135A (zh)
WO (1) WO2021197143A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230196712A1 (en) * 2021-12-21 2023-06-22 Snap Inc. Real-time motion and appearance transfer
US11880947B2 (en) 2021-12-21 2024-01-23 Snap Inc. Real-time upper-body garment exchange
US12002175B2 (en) 2020-11-18 2024-06-04 Snap Inc. Real-time motion transfer for prosthetic limbs
US12223672B2 (en) 2021-12-21 2025-02-11 Snap Inc. Real-time garment exchange
US12229860B2 (en) 2020-11-18 2025-02-18 Snap Inc. Body animation sharing and remixing
US12243173B2 (en) 2020-10-27 2025-03-04 Snap Inc. Side-by-side character animation from realtime 3D body motion capture

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
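CN111462209B (zh) * 2020-03-31 2022-05-24 北京市商汤科技开发有限公司 Motion transfer method, apparatus, device, and storage medium
CN114792441A (zh) * 2021-01-25 2022-07-26 深圳绿米联创科技有限公司 Action recognition method and apparatus, electronic device, and computer-readable storage medium
WO2022269708A1 (ja) * 2021-06-21 2022-12-29 日本電信電話株式会社 Information processing device and information processing method
CN113870313B (zh) * 2021-10-18 2023-11-14 南京硅基智能科技有限公司 A motion transfer method
CN113989928B (zh) * 2021-10-27 2023-09-05 南京硅基智能科技有限公司 A motion capture and retargeting method
CN115100028A (zh) * 2022-06-17 2022-09-23 北京百度网讯科技有限公司 Method and apparatus for transferring image key points, electronic device, and storage medium
CN116434329A (zh) * 2023-03-15 2023-07-14 西安理工大学 Pose-guided motion transfer method
CN116778582B (zh) * 2023-06-25 2026-01-02 上海交通大学 Computer-vision-based method for detecting fall precursors of workers on building construction sites
CN119722880B (zh) * 2024-12-05 2025-11-28 北京百度网讯科技有限公司 Method and apparatus for driving a three-dimensional model, and electronic device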

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510577A (zh) * 2018-01-31 2018-09-07 中国科学院软件研究所 Method and system for realistic motion transfer and generation based on existing motion data
CN109821239A (zh) * 2019-02-20 2019-05-31 网易(杭州)网络有限公司 Method, apparatus, device, and storage medium for implementing a motion-sensing game
CN109978975A (zh) * 2019-03-12 2019-07-05 深圳市商汤科技有限公司 Motion transfer method and apparatus, and computer device
US20190295305A1 (en) * 2018-03-20 2019-09-26 Adobe Inc. Retargeting skeleton motion sequences through cycle consistency adversarial training of a motion synthesis neural network with a forward kinematics layer
CN111462209A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 Motion transfer method, apparatus, device, and storage medium
CN111540055A (zh) * 2020-04-16 2020-08-14 广州虎牙科技有限公司 Three-dimensional model driving method and apparatus, electronic device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
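KR20170086317A (ko) * 2016-01-18 2017-07-26 한국전자통신연구원 Apparatus and method for generating three-dimensional character motion using timing conversion
US20190260940A1 (en) * 2018-02-22 2019-08-22 Perspective Components, Inc. Dynamic camera object tracking
CN108985259B (zh) * 2018-08-03 2022-03-18 百度在线网络技术(北京)有限公司 Human action recognition method and apparatus
CN109785322B (zh) * 2019-01-31 2021-07-02 北京市商汤科技开发有限公司 Monocular human pose estimation network training method, image processing method, and apparatus
CN110197167B (zh) * 2019-06-05 2021-03-26 清华大学深圳研究生院 A video motion transfer method
CN110246209B (zh) * 2019-06-19 2021-07-09 腾讯科技(深圳)有限公司 Image processing method and apparatus
CN110490897A (zh) * 2019-07-30 2019-11-22 维沃移动通信有限公司 Method for generating an imitation video, and electronic device
CN110666793B (zh) * 2019-09-11 2020-11-03 大连理工大学 Method for robotic square-part assembly based on deep reinforcement learning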

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ABERMAN, KFIR ET AL.: "Learning Character-Agnostic Motion for Motion Retargeting in 2D", ACM TRANSACTIONS ON GRAPHICS, vol. 38, no. 4, 12 July 2019 (2019-07-12), XP058439450, DOI: 10.1145/3306346.3322999 *
See also references of EP3979204A4 *
ZHUOQIAN YANG, WENTAO ZHU, WAYNE WU, CHEN QIAN, QIANG ZHOU, BOLEI ZHOU, CHEN CHANGE LOY: "TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting", 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 1 June 2020 (2020-06-01), pages 5306 - 5315, XP055855037, ISSN: 2575-7075, ISBN: 978-1-72817-168-5, DOI: 10.1109/CVPR42600.2020.00535 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12243173B2 (en) 2020-10-27 2025-03-04 Snap Inc. Side-by-side character animation from realtime 3D body motion capture
US12002175B2 (en) 2020-11-18 2024-06-04 Snap Inc. Real-time motion transfer for prosthetic limbs
US12229860B2 (en) 2020-11-18 2025-02-18 Snap Inc. Body animation sharing and remixing
US20230196712A1 (en) * 2021-12-21 2023-06-22 Snap Inc. Real-time motion and appearance transfer
WO2023121896A1 (en) * 2021-12-21 2023-06-29 Snap Inc. Real-time motion and appearance transfer
US11880947B2 (en) 2021-12-21 2024-01-23 Snap Inc. Real-time upper-body garment exchange
US12198398B2 (en) 2021-12-21 2025-01-14 Snap Inc. Real-time motion and appearance transfer
US12223672B2 (en) 2021-12-21 2025-02-11 Snap Inc. Real-time garment exchange

Also Published As

Publication number Publication date
KR20220002551A (ko) 2022-01-06
TW202139135A (zh) 2021-10-16
EP3979204A4 (en) 2022-11-16
US20220114777A1 (en) 2022-04-14
JP2022536381A (ja) 2022-08-15
CN111462209A (zh) 2020-07-28
CN111462209B (zh) 2022-05-24
EP3979204A1 (en) 2022-04-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21781900; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 20217038862; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2021573955; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2021781900; Country of ref document: EP; Effective date: 20211228)
NENP Non-entry into the national phase (Ref country code: DE)