CN108564642A

CN108564642A - Markerless performance capture system based on the UE engine

Info

Publication number: CN108564642A
Application number: CN201810217894.8A
Authority: CN
Inventors: 车武军; 吴泽烨; 谷卓; 徐波
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2018-03-16
Filing date: 2018-03-16
Publication date: 2018-09-21

Abstract

The present invention relates to the field of image processing and proposes a markerless performance capture system based on the UE engine. It addresses a problem in methods that capture a performer's actions and expressions simultaneously to generate character animation: marker points are intrusive to the performer and interfere with the performance. The system includes: a facial performance capture module, configured to collect facial image data of the performer and to calculate weight parameters of the performer's facial expression from the facial image data; a motion performance capture module, configured to collect skeleton image data of the performer and to determine the performer's body posture parameters from the skeleton image data; and an animation generation module, configured to use a UE graphics program to generate the actions and expressions of a 3D character model from the facial expression weight parameters and the body posture parameters. The invention captures the performer's movements and expressions and, from the captured motion and expression data, gives virtual characters realistic, plausible movements and vivid expressions.

Description

Markerless performance capture system based on the UE engine

Technical field

The invention relates to the fields of computer graphics, computer vision, and virtual reality, and in particular to a markerless performance capture system based on the UE engine.

Background

Performance capture technology captures a performer's movements and expressions and is widely used in film, animation, games, and other fields. Giving virtual characters realistic, plausible movements and vivid expressions through performance capture offers users a better viewing experience. Motion capture technology includes optical capture and inertial capture. Optical capture films the performer with optical cameras and computes the performer's joint points by analysis, for example Kinect. Inertial capture obtains the motion state of the joint points from sensors worn on the performer's body and derives the performer's current posture, for example Noitom and OptiTrack.

One existing performance capture solution attaches markers to the performer's body and face, captures whole-body movements and facial expressions with optical cameras, and in post-production replaces the filmed performer with a virtual character model according to the captured marker points. The markers, however, are intrusive to the performer and make a natural performance more difficult. Alternatively, expression capture and motion capture are performed separately and then combined, but this makes the two harder to merge in post-production and limits the user's ability to edit other characters.

Summary of the invention

The invention aims to solve the above problems in the prior art. In methods that capture a performer's actions and expressions simultaneously to generate character animation, marker points are intrusive to the performer and make a natural performance more difficult. Where expression capture and motion capture are instead performed separately and then combined, merging them in post-production becomes harder and the user's ability to edit other characters is limited. To solve these problems, the present invention adopts the following technical solutions:

The application provides a markerless performance capture system based on the UE (Unreal Engine) engine. The system includes: a facial performance capture module, configured to collect facial image data of a performer and to calculate from the facial image data the weight parameters of the performer's facial expression, recorded as the first weight parameters; a motion performance capture module, configured to collect skeleton image data of the performer and to determine the performer's body posture parameters from the skeleton image data; and an animation generation module, configured to use a UE graphics program to generate, from the first weight parameters and the body posture parameters, the actions and expressions of the 3D model of the character corresponding to the performer.

In some examples, the facial performance capture module includes a facial image acquisition unit and an expression calculation unit. The facial image acquisition unit is configured to collect facial image data of the performer's frontal face. The expression calculation unit is configured to track feature points in the facial image data and to calculate the weight parameters of the performer's facial expression.

In some examples, the motion performance capture module includes a skeleton data acquisition unit and a body posture confirmation unit. The skeleton image acquisition unit includes multiple Kinect sensors configured to collect multiple frames of skeleton image data of the performer from different angles. Each frame of skeleton image data includes the coordinates of each joint point that makes up the human skeleton and the tracking attribute of each joint point, and a confidence is assigned to each joint point of each frame according to its tracking attribute. The body posture confirmation unit is configured to determine the performer's body posture parameters from the joint point coordinates in the performer's skeleton image data and from the changes in those coordinates.

In some examples, the body posture confirmation unit is further configured to: transform the skeleton image data collected by each Kinect sensor into a common coordinate system using a preset coordinate transformation matrix, generating reference skeleton data; and synthesize the performer's average skeleton data from the reference skeleton data using a weighted average algorithm.

In some examples, "synthesizing the performer's average skeleton data from the reference skeleton data using a weighted average algorithm" includes: taking the confidence of each joint point in the reference skeleton data as the weight factor of that joint point; calculating the average coordinates of any joint point from its coordinates in each set of reference skeleton data and the corresponding weight factors; and determining the performer's average skeleton data from the average coordinates of all joint points that make up the human skeleton.

In some examples, the animation generation module includes a skeletal motion control unit and an expression control unit. The skeletal motion control unit is configured to use the UE graphics program to generate the motion animation of the character's 3D model from the body posture parameters determined by the motion performance capture module. The expression control unit is configured to use the UE graphics program to generate the expression animation of the character's 3D model from the facial expression weight parameters determined by the facial performance capture module.

In some examples, the skeletal motion control unit is further configured to: convert the average skeleton data into the character model data of the character in the UE4 graphics program using a preset mapping relationship; assign the character model data to the character's 3D model through the UE4 engine by quaternion blending; calculate the change of each bone as the skeleton moves from its initial pose to its current pose; and apply each change to the parent joint point of the corresponding bone, thereby determining the motion animation of the character's 3D model.

In some examples, the expression control unit is further configured to: match the first weight parameters against the basic expressions of a preset character expression library to determine the combination of basic expressions corresponding to the facial expression; and use the correspondence between preset morph target functions and the basic expressions in the character expression library to determine the expression animation of the character's 3D model that corresponds to the facial expression.

In some examples, "matching the first weight parameters against the basic expressions of the preset character expression library to determine the combination of basic expressions corresponding to the facial expression" includes: using a preset expression weight calculation program to calculate the character expression weight parameter of each basic expression in the character expression library, recorded as the second weight parameters; mapping the first weight parameters to the second weight parameters and determining, from the mapping result, the second weight parameters corresponding to the facial expression; and determining, from the correspondence between the second weight parameters and the basic expressions in the character expression library, the combination of basic expressions in the library that corresponds to the facial expression.

In some examples, "mapping the first weight parameters to the second weight parameters and determining, from the mapping result, the second weight parameters corresponding to the facial expression" includes: comparing the number of first weight parameters in the UE graphics program with the number of basic expressions in the character expression library; if the numbers are equal, selecting the second weight parameter whose index matches that of the first weight parameter as the second weight parameter corresponding to the facial expression; if the number of first weight parameters in the UE graphics program is smaller than the number of basic expressions in the character expression library, selecting from the library as many basic expressions as there are first weight parameters to form an expression subset, calculating the character expression weight parameter of each basic expression in the subset as a new second weight parameter, and selecting the new second weight parameter whose index matches that of the first weight parameter as the second weight parameter corresponding to the facial expression; otherwise, selecting the second weight parameter that differs least from the first weight parameter as the second weight parameter corresponding to the facial expression.
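
The three-way comparison above can be sketched in Python as follows. This is only one possible reading of the text, under stated assumptions: the function name `map_expression_indices` is hypothetical, the result is expressed as indices into the character expression library, and the expression subset is taken to be the first basic expressions in the library because the text does not say how the subset is chosen.

```python
def map_expression_indices(first_weights, second_weights):
    """Map each performer-side weight to the index of a basic expression.

    first_weights:  first weight parameters (performer's facial expression)
    second_weights: second weight parameters, one per basic expression in
                    the character expression library
    """
    m, n = len(first_weights), len(second_weights)
    if m == n:
        # Equal counts: pair parameters that share the same index.
        return list(range(m))
    if m < n:
        # Fewer captured weights than basic expressions: keep a subset of
        # m basic expressions (assumption: the first m) and pair by index.
        return list(range(m))
    # More captured weights than basic expressions: for each first weight,
    # pick the basic expression whose weight differs least from it.
    return [
        min(range(n), key=lambda k: abs(second_weights[k] - w))
        for w in first_weights
    ]
```

In the second branch the text also recomputes the weights of the chosen subset; that step depends on the preset expression weight calculation program and is left out of the sketch.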

In the markerless performance capture system based on the UE engine provided by this application, the facial performance capture module captures the performer's facial expressions, the motion performance capture module captures the performer's body movements, and the animation generation module uses the UE graphics program to generate the motion animation and expression animation of the character's 3D model from those expressions and movements. The invention captures the performer's motion and expression data simultaneously and renders them in real time in the UE engine as character animation, and the user can customize the character model. It thereby solves the problem that, in methods capturing a performer's actions and expressions simultaneously to generate character animation, marker points are intrusive to the performer and interfere with the performance.

Description of the drawings

FIG. 1 is a diagram of an exemplary system architecture to which the present application can be applied;

FIG. 2 is an implementation flow chart of the markerless performance capture system based on the UE engine according to the present application;

FIG. 3 is a schematic diagram of the helmet-mounted webcam used in the present application to capture facial expressions;

FIG. 4a and FIG. 4b show performance capture results for a motion performance and an expression performance.

Detailed description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope of protection.

It should be noted that, where no conflict arises, the embodiments in the present application and the features in those embodiments can be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture of an embodiment of the markerless performance capture system based on the UE engine.

As shown in FIG. 1, the system includes: a facial performance capture module, configured to collect facial image data of the performer and to calculate from the facial image data the weight parameters of the performer's facial expression, recorded as the first weight parameters; a motion performance capture module, configured to collect skeleton image data of the performer and to determine the performer's body posture parameters from the skeleton image data; and an animation generation module, configured to use the UE graphics program to generate, from the first weight parameters and the body posture parameters, the actions and expressions of the 3D model of the character corresponding to the performer.

FIG. 2 shows a schematic of how the system is implemented in this embodiment. The facial performance capture module and the motion performance capture module acquire the performer's facial expression information and motion posture information respectively, and send this performance-related information to the animation generation module. From the facial expression information and motion posture information, the animation generation module uses the UE graphics program to generate the motion animation and expression animation of the character's 3D model. The information from the facial performance capture module and the motion performance capture module may be input by the user in real time; the user can prepare the character data in advance, and the character's animation is then generated in real time from that input.

In this embodiment, the facial performance capture module includes a facial image acquisition unit and an expression calculation unit. The facial image acquisition unit is configured to collect facial image data of the performer's frontal face. The expression calculation unit is configured to track and analyze feature points in the facial image data and to calculate the weight parameters of the performer's facial expression.

The facial image acquisition unit may be a video or image acquisition device, for example a helmet-mounted webcam. As shown in FIG. 3, the helmet-mounted webcam is built as follows: an adjustable bracket is fitted to the front of the helmet, a webcam is mounted at the end of the bracket, and a power supply is fitted to the back of the helmet and connected to the webcam by a data cable. During facial capture, the helmet-mounted webcam films the target user's frontal face in real time. The captured images or video stream are transmitted over a wired or wireless network to the expression calculation unit on a PC, which calculates the expression parameters.

The expression parameter calculation unit is configured to track and analyze feature points in the facial image data and to calculate the weight parameters of the performer's facial expression. An expression parameter calculation program is preset in the unit; it tracks feature points in the acquired facial image data of the performer and calculates the weight parameters of the performer's facial expression.

As an example, the performer's weight parameters can be calculated as follows. A Kinect sensor is connected to a PC; FaceShift detects the Kinect sensor automatically and connects to it, and the depth data of the facial expression captured by the Kinect sensor is transmitted to FaceShift in real time. FaceShift compares this depth data with the user's basic expression model and calculates 51 weight parameters for the current expression, denoted {w_i, i = 1, 2, ..., 51}.

Specifically, take a blendshape expression model composed of n basic expressions as an example. Each basic expression is represented by a 3D face mesh with p vertices, and each vertex has three components x, y, z, i.e. its spatial coordinates are (x, y, z). The vertex coordinates of each basic expression are flattened into a long vector. Any ordering may be used, such as (xxxyyyzzz) or (xyzxyzxyz), provided the same ordering is used for every basic expression. This yields n vectors b_k (k = 1, 2, ..., n) of length 3p. Let b_0 denote the neutral expression; b_k - b_0 is then the difference between the k-th basic expression b_k and the neutral expression b_0, and the current expression f can be expressed as:

$$f = b_0 + \sum_{k=1}^{n} w_k \, (b_k - b_0)$$

where each w_k takes a value in the interval [0, 1]. The 51 basic expression models can therefore be written as F_i = b_i - b_0 (i = 1, ..., 51), and the formula above simplifies to:

$$F = \sum_{i=1}^{51} w_i F_i$$

where F = f - b_0.
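
A minimal Python sketch of this blendshape combination follows. It assumes the usual delta-blendshape form f = b_0 + sum over k of w_k (b_k - b_0), which is consistent with the definitions F_i = b_i - b_0 and F = f - b_0 given in the text. The function name `blend_expression` and the tiny three-component vectors in the usage note are illustrative only.

```python
def blend_expression(neutral, basis, weights):
    """Evaluate f = b_0 + sum_k w_k * (b_k - b_0).

    neutral: flattened neutral expression b_0 (length 3p)
    basis:   list of flattened basic expressions b_k, all flattened in the
             same order as the neutral expression
    weights: one weight w_k in [0, 1] per basic expression
    """
    if len(basis) != len(weights):
        raise ValueError("need exactly one weight per basic expression")
    result = list(neutral)
    for b_k, w_k in zip(basis, weights):
        if not 0.0 <= w_k <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        if len(b_k) != len(neutral):
            raise ValueError("all expression vectors must have length 3p")
        for idx, (target, rest) in enumerate(zip(b_k, neutral)):
            # Add this basic expression's weighted offset from neutral.
            result[idx] += w_k * (target - rest)
    return result
```

For example, `blend_expression([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], [0.5, 0.25])` moves the single vertex halfway toward the first basic expression and a quarter of the way toward the second.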

In this embodiment, the motion performance capture module includes a skeleton data acquisition unit and a body posture confirmation unit. The skeleton image acquisition unit includes multiple Kinect sensors configured to collect multiple frames of skeleton image data of the performer from different angles. Each set of skeleton image data includes the coordinates of each joint point that makes up the human skeleton and the tracking attribute of each joint point, and a confidence is assigned to each joint point according to its tracking attribute. The body posture confirmation unit is configured to determine the performer's body posture parameters from the joint point coordinates in the performer's skeleton image data and from the transformation of those coordinates.

Multiple Kinect sensors are installed at different positions in the area used to collect the performer's skeletal motion data, so that the performer's movements are captured from different angles. The skeleton image data collected by the Kinect sensors includes the coordinates of each joint point that makes up the human skeleton and the tracking attribute of each joint point. As an example, each frame collected by each Kinect sensor contains one skeleton and the tracking attribute of each joint. The skeleton can be written as {v_ij}, where j is the joint point index and v_ij is the coordinate of the j-th joint point of the skeleton in the coordinate system of the i-th Kinect sensor. The tracking attribute of a joint point takes one of three states: tracked, inferred, or not tracked. These three states can be assigned successively lower confidences, denoted {w_ij}, where w_ij is the confidence of the j-th joint point of the skeleton in the coordinate system of the i-th Kinect sensor. The skeleton image acquisition unit sends the skeleton image data over the network to the body posture confirmation unit, which calculates the performer's body posture parameters.
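
The assignment of confidence (credibility) values to the three tracking states can be sketched as below. The numeric values 1.0, 0.5, and 0.0 are assumptions chosen only to satisfy the "successively lower" requirement; the text does not specify them, and the names `TrackingState` and `joint_confidences` are hypothetical.

```python
from enum import Enum


class TrackingState(Enum):
    """The three tracking attributes a joint point can have."""
    TRACKED = "tracked"
    INFERRED = "inferred"
    NOT_TRACKED = "not_tracked"


# Assumed confidence values: only their ordering comes from the text.
CONFIDENCE = {
    TrackingState.TRACKED: 1.0,
    TrackingState.INFERRED: 0.5,
    TrackingState.NOT_TRACKED: 0.0,
}


def joint_confidences(states):
    """Return w_ij for one sensor i, given the tracking state of each joint j."""
    return [CONFIDENCE[state] for state in states]
```

Giving untracked joints a confidence of zero means they drop out of any later weighted average automatically.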

In this embodiment, the body posture confirmation unit is further configured to: transform the skeleton image data collected by each Kinect sensor into a common coordinate system using a preset coordinate transformation matrix, generating reference skeleton data; and synthesize the performer's average skeleton data from the reference skeleton data using a weighted average algorithm. The coordinate transformation brings the data collected by all Kinect sensors into the same reference coordinate system. First, the coordinate system of one Kinect sensor is designated as the reference coordinate system. Next, the joint points of the human skeleton captured by each of the other Kinect sensors are used as matching points between that sensor's own coordinate system and the reference coordinate system. Finally, the transformation matrix from each Kinect sensor's coordinate system to the reference coordinate system is determined such that the sum of distances between the transformed matching points is minimized. The skeleton image data collected by each Kinect sensor is then transformed with these matrices to generate the reference skeleton data.
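
Applying such a transformation matrix to one sensor's joints can be sketched as follows, assuming the matrix is a 4x4 homogeneous transform stored row by row. The function name `to_reference_frame` is hypothetical, and estimating the matrix itself (the distance-minimizing fit over matching points) is not shown.

```python
def to_reference_frame(matrix, joints):
    """Transform joint coordinates into the reference coordinate system.

    matrix: 4x4 homogeneous transform (list of four rows) from one sensor's
            coordinate system to the reference coordinate system
    joints: list of (x, y, z) joint coordinates in the sensor's own system
    """
    transformed = []
    for x, y, z in joints:
        point = (x, y, z, 1.0)  # homogeneous coordinates
        transformed.append(tuple(
            sum(matrix[row][col] * point[col] for col in range(4))
            for row in range(3)  # the fourth row only carries the 1
        ))
    return transformed
```

The reference sensor itself would use the identity matrix, so its joints pass through unchanged.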

In this embodiment, synthesizing the performer's average skeleton data from the reference skeleton data using a weighted average algorithm includes: taking the confidence of each joint point in the reference skeleton data as the weight factor of that joint point; calculating the average coordinates of any joint point from its coordinates in each set of reference skeleton data and the corresponding weight factors; and determining the performer's average skeleton data from the average coordinates of all joint points that make up the human skeleton. Calculating the performer's average skeleton data thus means calculating the average coordinates of every joint point in the skeleton. The average for a given joint point is the weighted average of its coordinates in the reference coordinate system, with the confidence of each coordinate as its weight factor. As an example, one frame of human skeleton data transformed into the reference coordinate system can be written as {v_ij, w_ij}, where j is the joint point index, i is the Kinect sensor index, v_ij is the coordinate of the j-th joint point of the skeleton captured by the i-th Kinect sensor, and w_ij is the confidence of that joint point. With the confidences as weights, the joint point coordinates of the same skeleton from the multiple Kinect frames are averaged to obtain one average skeleton.
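
The confidence-weighted average can be sketched in Python as below: for each joint j, the averaged coordinate is the sum over sensors i of w_ij * v_ij divided by the sum of w_ij. The function name `average_skeleton` is hypothetical, and returning `None` for a joint that no sensor tracked is an assumption, since the text does not say how that case is handled.

```python
def average_skeleton(skeletons, confidences):
    """Fuse several sensors' skeletons into one average skeleton.

    skeletons[i][j]:   (x, y, z) of joint j from sensor i, already in the
                       reference coordinate system
    confidences[i][j]: confidence (credibility) w_ij of that joint
    """
    sensor_count = len(skeletons)
    joint_count = len(skeletons[0])
    averaged = []
    for j in range(joint_count):
        total = sum(confidences[i][j] for i in range(sensor_count))
        if total == 0:
            # No sensor saw this joint (assumed handling: mark as missing).
            averaged.append(None)
            continue
        averaged.append(tuple(
            sum(confidences[i][j] * skeletons[i][j][axis]
                for i in range(sensor_count)) / total
            for axis in range(3)
        ))
    return averaged
```

Dividing by the total weight keeps the result inside the span of the observed positions even when the confidences do not sum to one.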

In this embodiment, the animation generation module includes a skeletal motion control unit and an expression control unit. The skeletal motion control unit is configured to use the UE graphics program to generate the motion animation of the character's 3D model from the body posture parameters determined by the motion performance capture module. The expression control unit is configured to use the UE graphics program to generate the expression animation of the 3D model of the character corresponding to the performer from the facial expression weight parameters determined by the facial performance capture module. FIG. 4a shows a motion generated from the motion performance, and FIG. 4b shows an expression animation generated from the facial performance.

The skeletal motion control unit uses the UE graphics program to generate the motion animation of the 3D model of the character corresponding to the performer from the body posture parameters determined by the motion performance capture module. Specifically, it may: convert the average skeleton data into the character model data of the character in the UE4 graphics program using a preset mapping relationship; assign the character model data to the character's 3D model through the UE4 engine by quaternion blending; calculate the change of each bone as the skeleton moves from its initial pose to its current pose; and apply each change to the parent joint point of the corresponding bone, thereby determining the motion animation of the character's 3D model.
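
One way to compute the per-bone change as a quaternion is the shortest-arc rotation that takes a bone's initial direction to its current direction, sketched below in plain Python. This is an assumption about how the change could be obtained, not the engine's own routine: the helper names are hypothetical, quaternions are stored as (w, x, y, z), and a shortest-arc rotation cannot recover twist about the bone's own axis.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def bone_delta(initial_parent, initial_child, current_parent, current_child):
    """Unit quaternion (w, x, y, z) rotating the initial bone direction
    onto the current one; it would be applied at the bone's parent joint."""
    a = _normalize(_sub(initial_child, initial_parent))
    b = _normalize(_sub(current_child, current_parent))
    dot = sum(x * y for x, y in zip(a, b))
    if dot < -1.0 + 1e-9:
        # Opposite directions: half turn about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = _normalize(_cross(a, helper))
        return (0.0,) + axis
    return _normalize((1.0 + dot,) + _cross(a, b))


def rotate(q, v):
    """Rotate vector v by unit quaternion q."""
    w, u = q[0], q[1:]
    uv = _cross(u, v)
    uuv = _cross(u, uv)
    return tuple(v[k] + 2.0 * (w * uv[k] + uuv[k]) for k in range(3))
```

A bone that pointed along +x in the initial pose and points along +y now yields a 90 degree rotation about +z, and `rotate` confirms that the initial direction is carried onto the current one.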

A skeleton mapping may be maintained for each 3D model used in the UE graphics program; it converts the average skeleton data of the human skeletal motion captured by the Kinect sensors into the form required by the 3D model. The skeleton mapping associates the skeleton joint points of the Kinect sensor with the skeleton joint points of the 3D model, one to one, based on the similarity between the skeleton structure of the 3D model and that of the Kinect sensor. Joint points that are redundant or missing in the 3D model are not mapped. The mapping can be matched automatically by joint point name or bound manually. The 3D model skeleton in the UE graphics program consists of a series of joint points and their connections, and each joint point has a unique name. The two skeletons can therefore be mapped automatically by comparing the Kinect sensor's joint point names with the 3D model's joint point names in the UE graphics program; any parts that cannot be mapped automatically can be matched manually. The matching result is then attached to the 3D model in use, so that the model presents the motion animation of the human skeletal movement.

Attaching the matching result to the 3D model may consist of assigning the converted 3D model skeleton data to the 3D model. The assignment uses quaternion blending: the change of each bone (expressed as a quaternion) as the initial skeleton changes to the current skeleton is computed, and each change is then attached to the parent joint point of the corresponding bone. For joint points that are absent from the mapping relationship, the position and orientation in the skeletal animation depend on the change of their parent joint point.
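The automatic matching with a manual fallback can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: names are compared after lowercasing and stripping non-alphanumeric characters, which is only one possible matching rule, and `normalize_name`, `build_skeleton_map`, and the model joint names in the usage note are hypothetical.

```python
def normalize_name(name):
    """Lowercase a joint name and drop separators, e.g. 'Spine_Base' -> 'spinebase'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_skeleton_map(kinect_joints, model_joints, manual=None):
    """Map Kinect joint names to 3D-model joint names, one to one.

    manual: optional dict of hand-made bindings that take priority.
    Returns (mapping, unmatched), where unmatched lists the Kinect joints
    that found no partner and are left out of the mapping.
    """
    manual = dict(manual or {})
    by_key = {normalize_name(m): m for m in model_joints}
    used = set(manual.values())  # model joints already taken by manual bindings
    mapping, unmatched = {}, []
    for joint in kinect_joints:
        if joint in manual:
            mapping[joint] = manual[joint]
            continue
        target = by_key.get(normalize_name(joint))
        if target is not None and target not in used:
            mapping[joint] = target
            used.add(target)
        else:
            unmatched.append(joint)
    return mapping, unmatched
```

Calling `build_skeleton_map(["SpineBase", "HandTipLeft"], ["spine_base", "index_03_l"])` would map `SpineBase` automatically and report `HandTipLeft` as unmatched; passing `manual={"HandTipLeft": "index_03_l"}` binds it by hand. Model joints without a Kinect partner simply never appear in the mapping and follow their parent joint point.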

In this embodiment, the expression control unit is further configured to: match the first weight parameters against the basic expressions of a preset character expression library to determine the combination of basic expressions corresponding to the facial expression; and, using the preset correspondence between the target deformation function and the basic expressions in the character expression library, determine the expression animation of the character's 3D model corresponding to the facial expression.

In this embodiment, "matching the first weight parameters against the basic expressions of the preset character expression library to determine the combination of basic expressions corresponding to the facial expression" includes: using a preset expression weight calculation program to calculate the character expression weight parameters of each basic expression in the character expression library, recorded as the second weight parameters; mapping the first weight parameters to the second weight parameters and, according to the mapping result, determining the second weight parameters corresponding to the facial expression; and, according to the correspondence between the second weight parameters and the basic expressions in the character expression library, determining the combination of basic expressions in the character expression library that corresponds to the facial expression.

In this embodiment, "mapping the first weight parameters to the character expression weight parameters and, according to the mapping result, determining the second weight parameters corresponding to the facial expression" includes comparing the number of first weight parameters in the UE graphics program with the number of basic expressions in the character expression library. If the numbers are the same, that is, the basic expressions corresponding to all facial image data acquired by the UE graphics program are defined in the same way as the basic expressions in the character expression library, the second weight parameter with the same index as each first weight parameter is selected as the second weight parameter corresponding to the facial expression.

If the number of first weight parameters in the UE graphics program is smaller than the number of basic expressions in the character expression library, then the same number of basic expressions as there are first weight parameters is selected from the character's basic expression library as an expression subset, such that the basic expressions corresponding to all facial image data acquired by the UE graphics program are defined in the same way as the basic expressions in the subset. The character expression weight parameters of the basic expressions in the subset are calculated and recorded as new second weight parameters, and the new second weight parameter with the same index as each first weight parameter is selected as the second weight parameter corresponding to the facial expression. Otherwise, the second weight parameter with the smallest difference from the first weight parameter is selected as the second weight parameter corresponding to the facial expression.
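The three cases can be sketched in Python as below. This is a simplified illustration: indices are 0-based, and the tables `subset` and `nearest` are hypothetical caller-supplied inputs, because the text does not specify how the expression subset is chosen or how the closest expression is found. Library expressions outside the subset are given weight 0 in this sketch.

```python
def map_weights(w, n, subset=None, nearest=None):
    """Map M performer weights w to N character weights v.

    w: list of first weight parameters (length M).
    n: number of basic expressions in the character expression library (N).
    subset: for N > M, a list of M distinct library indices; subset[j] is the
        library expression matched with performer expression j.
    nearest: for N < M, a list of N performer indices; nearest[i] is the
        performer expression closest to library expression i.
    """
    m = len(w)
    if n == m:
        # Identical expression sets: weights carry over index by index.
        return list(w)
    if n > m:
        if subset is None or len(subset) != m or len(set(subset)) != m:
            raise ValueError("subset must list M distinct library indices")
        v = [0.0] * n  # expressions outside the subset stay at 0
        for j, i in enumerate(subset):
            v[i] = w[j]
        return v
    if nearest is None or len(nearest) != n:
        raise ValueError("nearest must list N performer indices")
    return [w[j] for j in nearest]
```

With `w = [0.2, 0.8]` and a four-expression library, `map_weights(w, 4, subset=[3, 1])` places 0.2 on library expression 3 and 0.8 on expression 1, leaving the others at 0.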

As an example, the character expression library contains N basic expressions, which are converted into weight parameters for the character and recorded as the second weight parameters {v_i, i = 1, 2, ..., N}. The UE graphics program can receive M basic expressions corresponding to all facial image data, which are converted into M weight parameters for the performer and recorded as the first weight parameters {w_i, i = 1, 2, ..., M}; preferably, M is 51. If the character expression library is defined in exactly the same way as the basic expressions corresponding to all facial image data, then N = M and the character's expression weights are v_i = w_i, i = 1, 2, ..., M. If the character expression library has fewer kinds of basic expressions, the weight parameter w_j of the expression j closest to the i-th basic expression of the character expression library is assigned to v_i, that is, v_i = w_j. If the character expression library has more kinds of basic expressions, a subset of the character's basic expression library is selected in one-to-one correspondence with the basic expressions corresponding to all facial image data; the weight parameters in this subset are set to the corresponding first weight parameters, and the weight parameters of the remaining expressions are set to 0.

The second weight parameters corresponding to the facial expression are thus determined from the correspondence between the first weight parameters of the basic expressions corresponding to all facial image data in the UE graphics program and the second weight parameters of the basic expressions in the character expression library. The UE engine in the UE graphics program computes the character's final expression weight parameters by calling the weight parameter conversion function. It then passes the final weight parameters to the target deformation setting function, which controls the deformation of the character's facial vertices or feature points so that the character makes the corresponding expression and presents the expression animation.
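How a set of final weights might deform the face can be illustrated with the standard linear blend-shape model, in which each vertex is the neutral position plus the weighted offsets of the basic expressions. This is an assumption about the deformation, since the text does not spell out what the target deformation setting function computes; in UE4 this role is commonly played by morph targets, for instance via `SetMorphTarget` on a skeletal mesh component. The function name `blend_vertices` and the expression names are hypothetical.

```python
def blend_vertices(neutral, targets, weights):
    """Linear blend-shape deformation.

    neutral: list of (x, y, z) vertex positions of the neutral face.
    targets: dict expression name -> list of vertex positions for that
        basic expression, in the same vertex order as neutral.
    weights: dict expression name -> final weight parameter.
    Returns the deformed vertex positions.
    """
    out = [list(p) for p in neutral]
    for name, weight in weights.items():
        if weight == 0.0:
            continue  # inactive expressions contribute nothing
        shape = targets[name]
        for idx, (p, q) in enumerate(zip(neutral, shape)):
            for axis in range(3):
                # Add the weighted offset of this expression from the neutral face.
                out[idx][axis] += weight * (q[axis] - p[axis])
    return [tuple(p) for p in out]
```

A weight of 0 leaves the neutral face unchanged and a weight of 1 reproduces the basic expression exactly; intermediate weights and combinations of several expressions are interpolated linearly.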

The technical solutions of the present invention have now been described with reference to the preferred embodiments shown in the accompanying drawings. Those skilled in the art will readily understand, however, that the protection scope of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions all fall within the protection scope of the present invention.

Claims

1. A markerless performance capture system based on a UE engine, characterized in that the system comprises:

a facial performance capture module, configured to collect facial image data of a performer, calculate weight parameters of the performer's facial expression from the facial image data, and record them as first weight parameters;

a motion performance capture module, configured to collect skeleton image data of the performer and determine human body posture parameters of the performer from the skeleton image data;

an animation generation module, configured to use a UE graphics program to generate the motions and expressions of a 3D model of the character corresponding to the performer according to the first weight parameters and the human body posture parameters.

2. The UE engine-based markerless performance capture system according to claim 1, wherein the facial performance capture module includes a facial image acquisition unit and an expression calculation unit;

the facial image acquisition unit is configured to acquire facial image data of the performer's frontal face;

the expression calculation unit is configured to perform feature point tracking on the facial image data and calculate the weight parameters of the performer's facial expression.

3. The UE engine-based markerless performance capture system according to claim 1, wherein the motion performance capture module includes a skeletal image acquisition unit and a human body posture confirmation unit;

the skeletal image acquisition unit includes a plurality of Kinect sensors configured to collect multiple frames of skeleton image data of the performer from different angles; each frame of skeleton image data includes the coordinates of each joint point and a tracking attribute of each joint point, and a credibility is assigned to each joint point of each frame of skeleton image data according to its tracking attribute;

the human body posture confirmation unit is configured to determine the human body posture parameters of the performer according to the coordinates of each joint point in the performer's skeleton image data and the changes in those coordinates.

4. The UE engine-based markerless performance capture system according to claim 3, wherein the human body posture confirmation unit is further configured to:

use a preset coordinate transformation matrix to convert the coordinate system of the skeleton image data collected by each Kinect sensor, generating reference skeleton data;

synthesize the average skeleton data of the performer from the reference skeleton data using a weighted average algorithm.

5. The UE engine-based markerless performance capture system according to claim 4, characterized in that "synthesizing the average skeleton data of the performer from the reference skeleton data using a weighted average algorithm" comprises:

determining the credibility of each joint point of the reference skeleton data as the weight factor of that joint point;

for any joint point, calculating the average of its coordinates from its coordinates in each reference skeleton data and the corresponding weight factors;

determining the average skeleton data of the performer from the average coordinates of all joint points making up the human skeleton.

6. The UE engine-based markerless performance capture system according to claim 1, wherein the animation generation module includes a skeletal motion control unit and an expression control unit;

the skeletal motion control unit is configured to use the UE graphics program to generate a motion animation of the 3D model of the character according to the human body posture parameters determined by the motion performance capture module;

the expression control unit is configured to use the UE graphics program to generate an expression animation of the 3D model of the character according to the facial expression weight parameters determined by the facial performance capture module.

7. The UE engine-based markerless performance capture system according to claim 6, wherein the skeletal motion control unit is further configured to:

convert the average skeleton data into character model data of the character in the UE4 graphics program using a preset mapping relationship;

assign the character model data to the 3D model of the character through the UE4 engine by quaternion blending;

calculate the change of each bone as the initial skeleton changes to the current skeleton;

attach each change to the parent joint point of the corresponding bone to determine the motion animation of the 3D model of the character.

8. The UE engine-based markerless performance capture system according to claim 6, wherein the expression control unit is further configured to:

match the first weight parameters against the basic expressions in a preset character expression library and determine the combination of basic expressions corresponding to the facial expression;

using the preset correspondence between the target deformation function and each basic expression in the character expression library, determine the expression animation of the 3D model of the character corresponding to the facial expression.

9. The UE engine-based markerless performance capture system according to claim 8, characterized in that "matching the first weight parameters against the basic expressions in the preset character expression library and determining the combination of basic expressions corresponding to the facial expression" comprises:

using a preset expression weight calculation program to calculate the character expression weight parameters of each basic expression in the character expression library, recorded as second weight parameters;

mapping the first weight parameters to the second weight parameters, and determining the second weight parameters corresponding to the facial expression according to the mapping result;

according to the correspondence between the second weight parameters and the basic expressions in the character expression library, determining the combination of basic expressions in the character expression library corresponding to the facial expression.

10. The UE engine-based markerless performance capture system according to claim 9, characterized in that "mapping the first weight parameters to the second weight parameters, and determining the second weight parameters corresponding to the facial expression according to the mapping result" comprises:

comparing the number of first weight parameters in the UE graphics program with the number of basic expressions in the character expression library;

if the numbers are the same, selecting the second weight parameter with the same index as the first weight parameter as the second weight parameter corresponding to the facial expression;

if the number of first weight parameters in the UE graphics program is smaller than the number of basic expressions in the character expression library, selecting, according to the number of first weight parameters, the same number of basic expressions from the character's basic expression library as an expression subset, calculating the character expression weight parameters of each basic expression in the expression subset and recording them as new second weight parameters, and selecting the new second weight parameter with the same index as the first weight parameter as the second weight parameter corresponding to the facial expression;

otherwise, selecting the second weight parameter with the smallest difference from the first weight parameter as the second weight parameter corresponding to the facial expression.