WO2023030343A1 - Method, device, and system for providing additional call services - Google Patents
- Publication number
- WO2023030343A1 (application PCT/CN2022/115986)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- terminal device
- user
- communication server
- digital human
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1069—Session establishment or de-establishment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/10—Architectures or entities
- H04L65/1046—Call controllers; Call servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/10—Architectures or entities
- H04L65/1059—End-user terminal functionalities specially adapted for real-time communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/10—Architectures or entities
- H04L65/1063—Application servers providing network services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1096—Supplementary features, e.g. call forwarding or call holding
Definitions
- The present application relates to the field of communications technologies, and in particular to a method, device, and system for providing additional call services.
- A digital human service is a software service that maps a real person's characteristics, such as appearance, movements, and voice, onto a digital human, so that the person's dynamic characteristics can be reproduced on an electronic device through the digital human's image.
- Providing a digital human service in a call scenario allows a user's digital human image to be presented in the call application interface of the terminal device of the other party to the call.
- Existing digital human services are usually processed offline; that is, the image, behavior, and activity scenes of the digital human are arranged in advance.
- For example, video content featuring a digital human is produced first and then played on TV stations or in theaters. Because prior-art digital human services cannot map a real person to a digital human in real time, they cannot be applied to real-time interactive call scenarios.
- Embodiments of the present application provide a method, device, and system for providing additional call services, so as to provide users with additional services including digital human services in real-time interactive call scenarios.
- In a first aspect, a method for providing additional call services is provided. The method can be applied to a terminal device or to a chip in a terminal device.
- The terminal device makes a call with at least one peer terminal device through a service provided by a communication server.
- The method includes: the terminal device acquires call additional service demand information, where the call additional service demand information indicates that the user corresponding to the terminal device requests that the user's digital human image be presented in the call application interface of the at least one peer terminal device; and, according to the call additional service demand information, the terminal device causes the communication server to send the user's digital human content to the at least one peer terminal device, so that the at least one peer terminal device presents the user's digital human image on the call application interface based on the user's digital human content; or the terminal device causes the at least one peer terminal device to generate the user's digital human content according to the call additional service demand information, so that the at least one peer terminal device presents the user's digital human image on the call application interface based on the user's digital human content.
- With this solution, digital human services can be provided on demand in real-time interactive call scenarios. For example, when the user corresponding to the terminal device requests that the user's digital human image be presented in the call application interface of at least one peer terminal device, the peer terminal device can present that image on the call application interface based on the user's digital human content, which improves user experience.
- The terminal device causing the communication server to send the user's digital human content to the at least one peer terminal device according to the call additional service demand information includes: the terminal device generates the user's digital human content according to the call additional service demand information and sends it to the communication server, so that the communication server sends the user's digital human content to the at least one peer terminal device; or the terminal device sends the call additional service demand information to the communication server, so that the communication server generates the user's digital human content and sends it to the at least one peer terminal device.
- In this way, the user's digital human content can be generated by the terminal device or by the communication server.
- The terminal device causing the at least one peer terminal device to generate the user's digital human content according to the call additional service demand information includes: the terminal device sends the call additional service demand information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's digital human content.
- In this way, the user's digital human content can be generated by the peer terminal device.
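- The three generation sites described above (terminal device, communication server, peer terminal device) can be illustrated with a minimal sketch. The names `Generator`, `DemandInfo`, and `plan_messages`, as well as the payload labels, are hypothetical and do not come from the application; the sketch only lists which party sends what to whom for each option.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Generator(Enum):
    """Which party generates the user's digital human content."""
    TERMINAL = "terminal"
    SERVER = "server"
    PEER = "peer"


@dataclass
class DemandInfo:
    """Hypothetical stand-in for the call additional service demand information."""
    user_id: str
    wants_digital_human: bool


def plan_messages(demand: DemandInfo, generator: Generator) -> List[Tuple[str, str, str]]:
    """Return the (sender, receiver, payload) hops for the chosen generation site."""
    if not demand.wants_digital_human:
        return []
    if generator is Generator.TERMINAL:
        # The terminal generates the content; the server forwards it to the peer.
        return [("terminal", "server", "digital_human_content"),
                ("server", "peer", "digital_human_content")]
    if generator is Generator.SERVER:
        # The terminal sends only the demand information; the server generates the content.
        return [("terminal", "server", "demand_info"),
                ("server", "peer", "digital_human_content")]
    # The peer generates the content itself after receiving the demand information.
    return [("terminal", "peer", "demand_info")]
```

- For example, `plan_messages(DemandInfo("alice", True), Generator.SERVER)` yields one hop carrying the demand information to the server and one hop carrying the generated content to the peer.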
- The call additional service demand information further indicates that the user corresponding to the terminal device requests that the user's virtual scene picture be presented in the call application interface of the at least one peer terminal device. The method further includes: according to the call additional service demand information, the terminal device causes the communication server to send the user's virtual scene content to the at least one peer terminal device, so that the at least one peer terminal device presents the user's virtual scene picture on the call application interface based on the user's virtual scene content; or the terminal device causes the at least one peer terminal device to generate the user's virtual scene content according to the call additional service demand information, so that the at least one peer terminal device presents the user's virtual scene picture on the call application interface based on the user's virtual scene content.
- The terminal device causing the communication server to send the user's virtual scene content to the at least one peer terminal device according to the call additional service demand information includes: the terminal device generates the user's virtual scene content according to the call additional service demand information and sends it to the communication server, so that the communication server sends the user's virtual scene content to the at least one peer terminal device; or the terminal device sends the call additional service demand information to the communication server, so that the communication server generates the user's virtual scene content and sends it to the at least one peer terminal device.
- In this way, the user's virtual scene content can be generated by the terminal device or by the communication server.
- The terminal device causing the at least one peer terminal device to generate the user's virtual scene content according to the call additional service demand information includes: the terminal device sends the call additional service demand information to the at least one peer terminal device.
- In this way, the user's virtual scene content can be generated by the peer terminal device.
- In a second aspect, a method for providing additional call services is provided. The method can be applied to a communication server or to a chip in a communication server.
- The communication server connects a terminal device and at least one peer terminal device for a call.
- The method includes: the communication server receives call additional service demand information from the terminal device, where the call additional service demand information indicates that the user corresponding to the terminal device requests that the user's digital human image be presented in the call application interface of the at least one peer terminal device; and, according to the call additional service demand information, the communication server sends the user's digital human content generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the user's digital human image on the call application interface based on the user's digital human content; or the communication server sends the call additional service demand information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's digital human content and presents the user's digital human image on the call application interface based on the user's digital human content.
- The call additional service demand information further indicates that the user corresponding to the terminal device requests that the user's virtual scene picture be presented in the call application interface of the at least one peer terminal device. The method further includes: according to the call additional service demand information, the communication server sends the user's virtual scene content generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the user's virtual scene picture on the call application interface based on the user's virtual scene content; or the communication server sends the call additional service demand information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's virtual scene content and presents the user's virtual scene picture on the call application interface based on the user's virtual scene content.
- In a third aspect, a method for providing additional call services is provided. The method can be applied to a terminal device or to a chip in a terminal device.
- Taking application to a terminal device as an example, the terminal device makes a call with at least one peer terminal device through a service provided by a communication server. The method includes: the terminal device sends first call additional service capability information to the communication server, where the first call additional service capability information indicates the digital human service capability of the terminal device; and the terminal device receives indication information sent by the communication server.
- The indication information indicates the digital human processing operation to be performed by the terminal device, and/or the indication information includes second call additional service capability information and third call additional service capability information, where the second call additional service capability information indicates the digital human service capability of the communication server and the third call additional service capability information indicates the digital human service capability of the at least one peer terminal device. According to the indication information, the terminal device performs at least one digital human processing operation or performs no digital human processing operation.
- The digital human processing operation includes one or more of a capture operation, a reconstruction operation, and a rendering operation.
- The capture operation includes: obtaining the driving parameters of the user corresponding to the terminal device, where the driving parameters include at least one of the user's lip shape, expression, action, and depth information.
- The reconstruction operation includes: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and the digital human model of that user, where the digital human image sequence includes multiple frames of images obtained by driving the digital human model.
- The rendering operation includes: generating the call screen content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.
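- The data flow through the three operations can be sketched as follows. This is a toy illustration only: frames, models, and images are represented as plain strings, and the names `DrivingParams`, `capture`, `reconstruct`, and `render` are hypothetical rather than taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DrivingParams:
    """Subset of the driving parameters (lip shape and expression only)."""
    lip_shape: str
    expression: str


def capture(raw_frames: List[Dict[str, str]]) -> List[DrivingParams]:
    """Capture: derive one set of driving parameters per input frame."""
    return [DrivingParams(f["lip"], f["expr"]) for f in raw_frames]


def reconstruct(params: List[DrivingParams], model: str) -> List[str]:
    """Reconstruction: drive the digital human model to get an image sequence."""
    return [f"{model}[{p.lip_shape},{p.expression}]" for p in params]


def render(human_images: List[str], scene_images: List[str]) -> List[str]:
    """Rendering: combine each digital human image with a scene image."""
    return [f"{h} over {s}" for h, s in zip(human_images, scene_images)]


frames = [{"lip": "open", "expr": "smile"}, {"lip": "closed", "expr": "neutral"}]
screen = render(reconstruct(capture(frames), "avatar"), ["beach", "beach"])
```

- Each stage consumes only the output of the previous one, which is why the stages can be placed on different parties (terminal device, communication server, or peer terminal device).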
- In this way, the technology stack (capture operation, reconstruction operation, rendering operation) involved in providing digital human services to users in real-time interaction scenarios is made explicit, as is the correspondence between the processing capabilities of a terminal device and the digital human processing operations it can provide.
- The indication information indicates the second call additional service capability information and/or the third call additional service capability information. The terminal device performing at least one digital human processing operation according to the indication information includes: the terminal device determines, according to at least one of the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information, at least one operation among the capture operation, the reconstruction operation, and the rendering operation; and the terminal device performs the at least one operation.
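- One way such a determination could work is sketched below. The application does not state a selection policy, so the rule used here (assign each operation to the first capable party, preferring the terminal device, then the communication server, then the peer terminal device) is purely an assumption, and `assign_operations` and `terminal_operations` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

OPERATIONS = ("capture", "reconstruction", "rendering")


def assign_operations(terminal_caps: Set[str],
                      server_caps: Set[str],
                      peer_caps: Set[str]) -> Dict[str, Optional[str]]:
    """Assign each operation to the first capable party (assumed preference order)."""
    parties = (("terminal", terminal_caps),
               ("server", server_caps),
               ("peer", peer_caps))
    plan: Dict[str, Optional[str]] = {}
    for op in OPERATIONS:
        # None means that no party reported the capability for this operation.
        plan[op] = next((name for name, caps in parties if op in caps), None)
    return plan


def terminal_operations(plan: Dict[str, Optional[str]]) -> List[str]:
    """Operations the terminal device itself performs under the plan."""
    return [op for op in OPERATIONS if plan[op] == "terminal"]


plan = assign_operations({"capture"}, {"reconstruction", "rendering"}, set())
```

- With these inputs the terminal device performs only the capture operation, and the communication server performs reconstruction and rendering.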
- the first call additional service capability information includes one or more of the following:
- the second call additional service capability information includes one or more of the following:
- the information used to indicate whether the communication server can return the content of the user's call screen to the terminal device is not limited.
- the third call additional service capability information includes one or more of the following:
- When the terminal device performs the reconstruction operation, the method further includes: the terminal device establishes a first channel with the communication server, where the first channel is used to transmit the digital human model of the user corresponding to the terminal device.
- When the terminal device performs the reconstruction operation and the communication server performs the rendering operation, the method further includes: the terminal device establishes a second channel with the communication server, where the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.
- When the terminal device performs the capture operation and the communication server performs the reconstruction operation, the method further includes: the terminal device establishes a third channel with the communication server, where the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.
- When the communication server performs the rendering operation, the first call additional service capability information includes information indicating that the terminal device stores the virtual scene of the user corresponding to the terminal device, and the call screen content of that user includes the picture content of the user's virtual scene, the method further includes: the terminal device establishes a fourth channel with the communication server, where the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.
- When the first call additional service capability information includes information indicating that the terminal device needs the communication server to return the user's call screen content, the method further includes: the terminal device establishes a fifth channel with the communication server, where the fifth channel is used to transmit the call screen content of the user corresponding to the terminal device.
- The method further includes: the terminal device establishes a sixth channel with the communication server, where the sixth channel is used to transmit the viewing angle information of the user corresponding to the terminal device, and the first call additional service capability information includes information indicating whether the terminal device provides viewing angle information; and/or the terminal device establishes a seventh channel or an eighth channel with the communication server, where the seventh channel is used to transmit the audio of the user corresponding to the terminal device and the eighth channel is used to transmit the video of that user.
- When the terminal device performs the capture operation, obtaining the driving parameters of the user corresponding to the terminal device includes: the terminal device uses a sensor to collect the driving parameters of the user; or the terminal device determines the driving parameters according to video and/or audio of the user.
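- The conditions under which the first to sixth channels are established can be collected into one lookup, sketched below. The function `required_channels`, its parameters, and the `plan` dictionary format are hypothetical; treating the sixth channel as established whenever the terminal device provides viewing angle information is an assumption, and the seventh and eighth (audio and video) channels are omitted.

```python
from typing import Dict, List, Tuple


def required_channels(plan: Dict[str, str],
                      stores_scene: bool = False,
                      wants_screen_back: bool = False,
                      provides_view_angle: bool = False) -> List[Tuple[int, str]]:
    """Return (channel number, payload) pairs for a given division of labor.

    `plan` maps each operation name to the party that performs it.
    """
    channels: List[Tuple[int, str]] = []
    if plan.get("reconstruction") == "terminal":
        channels.append((1, "digital human model"))
        if plan.get("rendering") == "server":
            channels.append((2, "digital human image sequence"))
    if plan.get("capture") == "terminal" and plan.get("reconstruction") == "server":
        channels.append((3, "driving parameters"))
    if plan.get("rendering") == "server" and stores_scene:
        channels.append((4, "virtual scene"))
    if wants_screen_back:
        channels.append((5, "call screen content"))
    if provides_view_angle:  # assumed trigger for the sixth channel
        channels.append((6, "viewing angle information"))
    return channels


server_heavy = {"capture": "terminal", "reconstruction": "server", "rendering": "server"}
terminal_heavy = {"capture": "terminal", "reconstruction": "terminal", "rendering": "server"}
```

- For instance, `required_channels(server_heavy, stores_scene=True)` selects the third and fourth channels, while `required_channels(terminal_heavy, wants_screen_back=True)` selects the first, second, and fifth.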
- In a fourth aspect, a method for providing additional call services is provided. The method can be applied to a communication server or to a chip in a communication server.
- The communication server provides a communication service between a terminal device and at least one peer terminal device.
- The method includes: the communication server receives first call additional service capability information from the terminal device, where the first call additional service capability information indicates the digital human service capability of the terminal device; and the communication server sends indication information to the terminal device. The indication information indicates the digital human processing operation to be performed by the terminal device, and/or the indication information includes second call additional service capability information and third call additional service capability information, where the second call additional service capability information indicates the digital human service capability of the communication server and the third call additional service capability information indicates the digital human service capability of the at least one peer terminal device.
- The digital human processing operation includes one or more of a capture operation, a reconstruction operation, and a rendering operation.
- The capture operation includes: obtaining the driving parameters of the user corresponding to the terminal device, where the driving parameters include at least one of the user's lip shape, expression, action, and depth information.
- The reconstruction operation includes: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and the digital human model of that user, where the digital human image sequence includes multiple frames of images obtained by driving the digital human model.
- The rendering operation includes: generating the call screen content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.
- The method further includes: the communication server performs at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information.
- In this way, the digital human processing operations that the communication server needs to perform when providing digital human services in real-time interactive scenarios are made explicit, so that digital human services can be provided to users in real-time interactive call scenarios.
- Having the communication server perform digital human processing operations reduces the requirements that digital human calls place on terminal devices, which helps promote digital human calls in real-time interactive call scenarios.
- The method further includes: the communication server receives the third call additional service capability information from the at least one peer terminal device.
- In this way, the communication server obtains the third call additional service capability information and can then determine the division of labor among the communication server, the terminal device, and the peer terminal device, taking into account the digital human service capability of the peer terminal device.
- the first call additional service capability information includes one or more of the following:
- the second call additional service capability information includes one or more of the following:
- the information used to indicate whether the communication server can return the content of the user's call screen to the terminal device is not limited.
- the third call additional service capability information includes one or more of the following:
- When the terminal device performs the reconstruction operation, the method further includes: the communication server establishes a first channel with the terminal device, where the first channel is used to transmit the digital human model of the user corresponding to the terminal device.
- When the terminal device performs the reconstruction operation and the communication server performs the rendering operation, the method further includes: the communication server establishes a second channel with the terminal device, where the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.
- When the terminal device performs the capture operation and the communication server performs the reconstruction operation, the method further includes: the communication server establishes a third channel with the terminal device, where the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.
- When the communication server performs the rendering operation, the first call additional service capability information includes information indicating that the terminal device stores the virtual scene of the user corresponding to the terminal device, and the call screen content of that user includes the picture content of the user's virtual scene, the method further includes: the terminal device establishes a fourth channel with the communication server, where the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.
- When the first call additional service capability information includes information indicating that the terminal device needs the communication server to return the user's call screen content, the method further includes: the communication server establishes a fifth channel with the terminal device, where the fifth channel is used to transmit the call screen content of the user corresponding to the terminal device.
- The method further includes: the communication server establishes a sixth channel with the terminal device, where the sixth channel is used to transmit the viewing angle information of the user corresponding to the terminal device, and the first call additional service capability information includes information indicating whether the terminal device provides viewing angle information; and/or the communication server establishes a seventh channel or an eighth channel with the terminal device, where the seventh channel is used to transmit the audio of the user corresponding to the terminal device and the eighth channel is used to transmit the video of that user.
- When the communication server performs the capture operation, obtaining the driving parameters of the user corresponding to the terminal device includes: the communication server receives the driving parameters of the user from the terminal device; or the communication server receives video and/or audio of the user from the terminal device and determines the driving parameters of the user according to the video and/or audio.
- In a fifth aspect, a method for providing additional call services is provided. The method can be applied to a communication server or to a chip in a communication server.
- The communication server connects a terminal device and at least one peer terminal device for a call.
- The method includes: the communication server receives a first request from a terminal device, where the first request is used to request a digital human model;
- the communication server determines whether the user corresponding to the terminal device is a legitimate user of the digital human model, where the user corresponding to the terminal device is the user who operates the terminal device; and
- after determining that the user corresponding to the terminal device is a legitimate user of the digital human model, the communication server sends the digital human model to the terminal device in response to the first request, where the digital human model is used by the user corresponding to the terminal device to make calls with a digital human image.
- Because the communication server sends the digital human model to the terminal device after determining that the user corresponding to the terminal device is a legitimate user of the digital human model, illegal use of the digital human model is avoided, and the security of providing digital human services to users in real-time interactive call scenarios is improved.
- The communication server determining whether the user corresponding to the terminal device is a legitimate user of the digital human model includes:
- the communication server acquires the features of the user corresponding to the terminal device and the verification features associated with the digital human model, where the verification features are the features of the legitimate user of the digital human model, and the communication server determines, according to the user's features and the verification features, whether the user corresponding to the terminal device is a legitimate user of the digital human model; or the communication server receives, from the terminal device, a verification result carrying a first digital signature, verifies the first digital signature, and, after the first digital signature passes verification, determines according to the verification result whether the user corresponding to the terminal device is a legitimate user of the digital human model.
- In this way, the legitimacy of the user's identity can be verified either by the terminal device or by a network device, which makes the solution more flexible.
- The features of the user corresponding to the terminal device include one or more of the user's face, fingerprint, voiceprint, and iris.
- The verification features associated with the digital human model include one or more of the face, fingerprint, voiceprint, and iris of the legitimate user of the digital human model.
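- The feature-comparison branch of this check can be sketched as follows. The application does not say how features are represented or compared; representing them as numeric vectors and comparing them by cosine similarity against a fixed threshold is an assumption made only for illustration, and `is_legitimate_user` and `handle_model_request` are hypothetical names.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_legitimate_user(user_feature: Sequence[float],
                       verification_feature: Sequence[float],
                       threshold: float = 0.9) -> bool:
    """Assumed rule: the user is legitimate if the features are similar enough."""
    return cosine_similarity(user_feature, verification_feature) >= threshold


def handle_model_request(user_feature: Sequence[float],
                         verification_feature: Sequence[float],
                         model: bytes) -> Optional[bytes]:
    """Return the digital human model only when the requesting user passes the check."""
    if is_legitimate_user(user_feature, verification_feature):
        return model
    return None
```

- A request whose user features match the stored verification features receives the model; any other request receives `None`, so the model is never sent to a user who fails the check.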
- The method further includes: the communication server acquires the features of the character image in digital human content, where the digital human content includes multiple frames of images of the digital human model; the communication server determines, according to the features of the character image and the verification features associated with the digital human model, whether the character image in the digital human content matches the digital human model; and, after determining that the character image in the digital human content matches the digital human model, the communication server sends the digital human content to the at least one peer terminal device.
- The features of the character image in the digital human content include the face and/or voiceprint of the character image.
- The verification features associated with the digital human model include the face and/or voiceprint of the legitimate user of the digital human model.
- The method further includes: the communication server receives the digital human model carrying a second digital signature, the verification features, and the identity information of the legitimate user of the digital human model, where the second digital signature is the digital signature of a digital human verification device; the communication server verifies the second digital signature using the public key of the digital human verification device; and, after the second digital signature passes verification, the communication server binds the digital human model and the verification features to the legitimate user's account opening information at the communication server according to the identity information of the legitimate user.
- In this way, the communication server binds the digital human model to the account opening information of its legitimate user, so that the binding can later be used to verify a user's identity. In addition, only digital human models approved by the digital human verification device can be used in real-time interactive call scenarios, which further improves the security of digital human services.
- The digital human model further carries a third digital signature, where the third digital signature is the digital signature of a digital human model making device. The method further includes: before saving the digital human model and the verification features, the communication server verifies the third digital signature using the public key of the digital human model making device; and, after both the second digital signature and the third digital signature pass verification, the communication server binds the digital human model and the verification features to the legitimate user's account opening information at the communication server according to the identity information of the legitimate user.
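- The verify-then-bind logic can be sketched with the Python standard library. Note the substitution: the application describes public-key digital signatures, whereas this sketch uses HMAC-SHA256 with a shared key as a stand-in, because the standard library has no public-key signature primitive. The function names, the key parameters, and the way the signed payload is concatenated are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 (symmetric), not a real public-key signature."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Check a stand-in signature in constant time."""
    return hmac.compare_digest(sign(key, payload), signature)


def bind_model(accounts: Dict[str, dict], identity: str, model: bytes,
               features: bytes, second_sig: bytes, third_sig: bytes,
               verifier_key: bytes, maker_key: bytes) -> bool:
    """Bind model and verification features to the account only if both signatures pass."""
    # Assumed payload layout: model, verification features, identity information.
    payload = model + features + identity.encode()
    if not verify(verifier_key, payload, second_sig):
        return False  # not approved by the digital human verification device
    if not verify(maker_key, payload, third_sig):
        return False  # not produced by the digital human model making device
    accounts.setdefault(identity, {}).update(model=model, verification_features=features)
    return True
```

- A model whose content was altered after signing fails one of the two checks and is therefore never bound to the user's account.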
- The sixth aspect provides a method for providing additional call services, which can be applied to a digital human model making device or a chip in a digital human model making device.
- The method includes: the digital human model making device uses the first public key of the digital human verification device to encrypt the digital human model, the verification features, and the identity information of the user corresponding to the terminal device, and sends the encrypted digital human model, verification features, and identity information to the digital human verification device; the digital human model making device receives the second digital signature, the digital human model, the verification features, and the identity information from the digital human verification device; the digital human model making device sends the second digital signature, the digital human model, the verification features, and the identity information to the communication server, wherein the digital human model is used for the user corresponding to the terminal device to make calls with a digital human image.
- The method further includes: the digital human model making device uses the private key of the digital human model making device to add a third digital signature to the digital human model, the verification features, and the identity information; the digital human model making device sends the second digital signature, the third digital signature, the digital human model, the verification features, and the identity information to the communication server.
- The seventh aspect provides a method for providing additional call services, which can be applied to a digital human verification device or a chip in a digital human verification device.
- The method includes: the digital human verification device receives, from the digital human model making device, the encrypted digital human model, verification features, and identity information of the user corresponding to the terminal device; the digital human verification device decrypts the digital human model, the verification features, and the identity information; reviews the legitimacy of the digital human model of the user corresponding to the terminal device; after the review is passed, adds a second digital signature to the digital human model, the verification features, and the identity information; and sends the second digital signature, the digital human model, the verification features, and the identity information to the digital human model making device, wherein the digital human model is used for the user corresponding to the terminal device to make calls with a digital human image.
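The decrypt, review, and sign sequence performed by the verification device can be sketched as a small pipeline. This is a hedged illustration only: the cryptographic operations and the legitimacy review are passed in as callables because the text does not specify them, and the names `Submission`, `Approved`, and `audit` are hypothetical. Returning `None` on a failed review is also an assumption, since the text only describes the case where the review passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Submission:
    """What the making device submits for review (after decryption)."""
    model: bytes
    features: bytes
    identity: str


@dataclass(frozen=True)
class Approved:
    """A reviewed submission together with the second digital signature."""
    submission: Submission
    second_signature: bytes


def audit(ciphertext: bytes,
          decrypt: Callable[[bytes], Submission],
          is_legitimate: Callable[[Submission], bool],
          sign: Callable[[Submission], bytes]) -> Optional[Approved]:
    """Decrypt the submission, review it, and sign it only if the review passes."""
    submission = decrypt(ciphertext)
    if not is_legitimate(submission):
        return None  # assumption: rejected submissions are simply not signed
    return Approved(submission, sign(submission))
```

The `Approved` result corresponds to what is sent back to the digital human model making device, which then forwards it to the communication server.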
- A communication device is provided, including modules for performing the method described in any one of the first to seventh aspects or in any possible implementation manner of any one of those aspects.
- A communication device is provided, including a processor and a memory, where the processor is coupled to the memory; the memory is used to store program instructions; and the processor is used to read the program instructions stored in the memory so as to implement the method described in any one of the first to seventh aspects or in any possible implementation manner of any one of those aspects.
- A computer-readable storage medium is provided, in which computer programs or instructions are stored; when the computer programs or instructions are executed by a communication device, the method described in any one of the first to seventh aspects, or in any possible implementation manner of any one of those aspects, is implemented.
- A computer program product including instructions is provided; when it is run on a computer, the method described in any one of the first to seventh aspects, or in any possible implementation manner of any one of those aspects, is executed.
- a call system including a first terminal device, a second terminal device, and a communication server that provides services for calls between the first terminal device and the second terminal device;
- the first terminal device is configured to execute the method described in the first aspect or any possible implementation manner of the first aspect;
- the communication server is configured to execute the method described in the second aspect or any possible implementation manner of the second aspect;
- the second terminal device is configured to present the digital human image of the user corresponding to the first terminal device on the call application interface based on the digital human content of the user corresponding to the first terminal device.
- a call system including a first terminal device, a second terminal device, and a communication server that provides services for calls between the first terminal device and the second terminal device;
- the first terminal device is configured to execute the method described in the third aspect or any possible implementation manner of the third aspect;
- the communication server is configured to execute the method described in the fourth aspect or any possible implementation manner of the fourth aspect;
- the second terminal device is configured to present the digital human image of the user corresponding to the first terminal device on the call application interface based on the digital human content of the user corresponding to the first terminal device.
- a communication system including a communication server, a digital human model making device, and a digital human verification device;
- the communication server is configured to execute the method as described in the fifth aspect or any possible implementation manner of the fifth aspect;
- the digital human model making device is configured to execute the method as described in the sixth aspect or any possible implementation manner of the sixth aspect;
- the digital human verification device is configured to execute the method as described in the seventh aspect or any possible implementation manner of the seventh aspect.
- FIGS. 1A to 1G are schematic diagrams of several call scenarios provided by the embodiment of the present application.
- FIG. 2 is a flow chart of a method for providing call additional services provided by an embodiment of the present application.
- Fig. 3A is a schematic diagram of the digital human technology stack provided by the embodiment of the present application.
- FIG. 3B is a flow chart of another method for providing call additional services provided by the embodiment of the present application.
- FIG. 3C is a flow chart of another method for providing call additional services provided by the embodiment of the present application.
- FIG. 4 is a flow chart of another method for providing call additional services provided by the embodiment of the present application.
- FIG. 5 is a flow chart of another method for providing call additional services provided by the embodiment of the present application.
- FIG. 6 is a schematic diagram of several types of data transmission channels provided by the embodiment of the present application.
- FIG. 7 is a schematic diagram of several possible keys provided by the embodiment of the present application.
- FIG. 8A is a flow chart of an authentication and authentication method provided in the embodiment of the present application.
- FIG. 8B is a flow chart of an authentication and authentication method provided in the embodiment of the present application.
- FIG. 8C is a flow chart of an authentication and authentication method provided in the embodiment of the present application.
- Fig. 8D is a flow chart of a digital asset purchase method provided by the embodiment of the present application.
- FIGS. 9A to 9C are schematic diagrams of a specific embodiment provided by the embodiment of the present application.
- FIGS. 10A to 10C are schematic diagrams of a specific embodiment provided by the embodiment of the present application.
- FIGS. 11A to 11C are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 12A to 12C are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 13A to 13D are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 14A to 14C are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 15A to 15C are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 16A to 16C are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 17A to 17B are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIGS. 18A to 18B are schematic diagrams of another specific embodiment provided by the embodiment of the present application.
- FIG. 19 is a flow chart of a specific method for providing call additional services provided by the embodiment of the present application.
- FIG. 20 is a schematic structural diagram of a communication device provided by an embodiment of the present application.
- FIG. 21 is a schematic structural diagram of another communication device provided by an embodiment of the present application.
- The "digital human" involved in this article refers to a character whose feature data is generated by computer equipment and presented through electronic equipment (such as mobile phones, computers, and virtual reality (VR)/augmented reality (AR) glasses).
- "digital human” is also called “virtual human”.
- The "digital human service" mentioned in this article can also be called "digital human value-added service". It refers to a software service that maps the appearance, movement, or voice of a real person to a digital human: the characteristics of the real person (including shape, action, voice, etc.) are reproduced on electronic equipment in real time through the image of a digital human, for example, in the form of two-dimensional or three-dimensional video. In this way, the service enables digital humans to have expressions, behaviors, and language expressions similar to those of real people.
- the image of a real person in the embodiment of the present application may include the appearance, expression, action, voice, etc. of a real person.
- The subject of the call that user a sees in the call application interface of terminal device A can be the real image of user b; if terminal device B uses the digital human service, the subject of the call that user a sees in the call application interface of terminal device A is the digital human image of user b.
- the call application interface includes but is not limited to: an interface in a phone application (Application, APP) built in the terminal device system, an interface in an instant chat software, an interface in a game application, and the like.
- The behavior of user a's digital human can change as user a's behavior changes. For example, if user a raises his left hand, the digital human of user a also raises its left hand; if user a blinks, the digital human of user a also blinks; and so on.
- The behavior of user b's digital human can change as user b's behavior changes. For example, if user b raises his left hand, the digital human of user b also raises its left hand; if user b blinks, the digital human of user b also blinks; and so on.
- The digital human service in the embodiment of the present application includes, but is not limited to, presenting the user's digital human image in the call application interface of at least one peer terminal device. It may also include: presenting the user's virtual scene picture in the call application interface of at least one peer terminal device, presenting props used to decorate the user's digital human in the call application interface of at least one peer terminal device, playing the user's virtual voice on at least one peer terminal device, and so on.
- the "digital human model” mentioned in this article refers to the static data of the digital human, mainly including the surface information of the digital human, which can be collected through technical means such as photography and structured light scanning.
- The digital human service can generate digital human dynamic data based on the driving parameters of a real person (such as expressions and actions) and the digital human model (this process is called "reconstruction"; see the description below for details).
- The digital human dynamic data can be displayed by a display device, that is, the display device can display the digital human.
- The "digital human dynamic data" mentioned in this article refers to the dynamic data used to display the virtual character. It can be stored in the storage space of the device as a file that describes one or more frames of two-dimensional or three-dimensional images of the virtual character, so "digital human dynamic data" can also be called a "digital human image sequence".
- the display device can present the one or more frames of two-dimensional or three-dimensional images through a display screen.
- the file can be in binary form or in text form, which is not limited in this application.
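The relationship between the static digital human model, the driving parameters, and the resulting digital human dynamic data can be sketched as follows. This is a toy illustration under stated assumptions: the type names `DigitalHumanModel`, `DrivingParams`, and `Frame`, the function `reconstruct`, and the use of plain strings in place of real surface data and rendered images are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DigitalHumanModel:
    """Static data: who the model belongs to and a placeholder for surface data."""
    owner: str
    surface: str


@dataclass(frozen=True)
class DrivingParams:
    """Driving parameters captured from the real person for one frame."""
    expression: str
    action: str


@dataclass(frozen=True)
class Frame:
    """One frame of digital human dynamic data (a stand-in for a 2D/3D image)."""
    owner: str
    surface: str
    expression: str
    action: str


def reconstruct(model: DigitalHumanModel,
                params: List[DrivingParams]) -> List[Frame]:
    """Combine the static model with per-frame driving parameters."""
    return [Frame(model.owner, model.surface, p.expression, p.action)
            for p in params]


model = DigitalHumanModel("user a", "scanned-surface")
frames = reconstruct(model, [DrivingParams("neutral", "raise left hand"),
                             DrivingParams("blink", "idle")])
```

Each `Frame` mirrors the real person's expression and action at that moment, which is how the digital human raises its left hand or blinks when the user does.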
- the "user” mentioned in this article can be understood as the user of the terminal device.
- the user can operate the terminal device and use various functions provided by the terminal device, such as a function of making a call, a function of answering a call, and the like.
- The terminal equipment involved in this article may also be referred to as a terminal, user equipment (UE), mobile station, mobile terminal, etc.
- The terminal device can be any device that supports the call function, such as a mobile phone, a computer, a virtual reality (VR) device, an augmented reality (AR) device, a wearable device, a vehicle, a drone, a helicopter, an aircraft, a ship, a robot, a robotic arm, or a smart home device.
- the embodiment of the present application does not limit the specific call network adopted by the terminal device and the specific device form of the terminal device.
- the terminal device can implement the call based on the telephone line, or implement the call based on the IP line, or implement the call based on other technologies, and the embodiment of the present application does not limit the specific call technology adopted by the terminal device.
- the user's scene involved in this paper includes the user's real scene and the user's virtual scene, wherein the real scene refers to the real environment in which the user is located, and the user's virtual scene is not the real environment in which the user is located.
- the embodiments of the present application can be applied to various scenarios of real-time interaction, such as call scenarios, online games, live broadcast, and the like.
- this article mainly takes the call scenario as an example.
- FIG. 1A is a schematic diagram of a call scenario provided by the embodiment of the present application.
- the call system shown in FIG. 1A includes terminal device A and terminal device B.
- Terminal device A and terminal device B can establish a call connection through a communication server and make a call.
- The terminal device that actively initiates a call among terminal device A and terminal device B can be defined as the calling terminal device (referred to as "calling terminal" or "caller" for short), and the terminal device opposite the calling terminal device (that is, the device that receives the call) can be defined as the called terminal device (referred to as "called terminal" or "called" for short).
- Terminal device B may also actively initiate a call, in which case terminal device B is the calling terminal device and terminal device A is the called terminal device.
- FIG. 1A is an example of a scene where two terminal devices are talking.
- The embodiment of the present application is not limited to a scene where two terminal devices are talking. It can also be applied to a scene where more than two terminal devices are talking, such as a video conference scene.
- FIG. 1B is a schematic diagram of another call scenario provided by the embodiment of the present application.
- The call system shown in FIG. 1B includes terminal device C, terminal device D, and terminal device E. Terminal device C, terminal device D, and terminal device E can establish a call connection through the communication server and make a call.
- Among terminal device C, terminal device D, and terminal device E, the terminal device that actively initiates a call is defined as the calling terminal device (for example, terminal device C), and the other terminal devices are defined as called terminal devices (such as terminal device D and terminal device E).
- terminal device B is a peer terminal device of terminal device A
- terminal device A is a peer terminal device of terminal device B
- terminal device B and terminal device C are peer terminal devices of terminal device A
- terminal device A and terminal device C are peer terminal devices of terminal device B
- terminal device B and terminal device A are both peer terminal devices of terminal device C.
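The calling/called roles and the peer relationship described above can be expressed compactly. The following is a minimal sketch; the function names `roles` and `peer_devices` and the use of strings as device identifiers are illustrative assumptions.

```python
from typing import Dict, List


def roles(participants: List[str], caller: str) -> Dict[str, str]:
    """The device that initiates the call is 'calling'; all others are 'called'."""
    if caller not in participants:
        raise ValueError("caller must be one of the participants")
    return {d: ("calling" if d == caller else "called") for d in participants}


def peer_devices(participants: List[str], device: str) -> List[str]:
    """Every other participant in the call is a peer of the given device."""
    if device not in participants:
        raise ValueError("device must be one of the participants")
    return [d for d in participants if d != device]


call = ["A", "B", "C"]
print(roles(call, "A"))         # which device is calling and which are called
print(peer_devices(call, "C"))  # the peers of terminal device C
```

With two participants, each device has exactly one peer; with three or more, as in a video conference, each device has several.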
- the communication server in this embodiment of the present application may be one or more devices in a communication network that provide call services for terminal devices. It can be understood that FIG. 1A and FIG. 1B only show one communication server, and are not limited thereto.
- The communication server may be a device in an Internet Protocol (IP) multimedia subsystem (IMS) network.
- the IMS network is a network system for providing multimedia services in the IP network, and various multimedia services, such as voice calls and video calls, can be provided for terminal devices through the IMS network.
- The IMS network may include one or more network elements, for example, a call session control function (CSCF) network element.
- the CSCF network element is a functional entity inside the IMS network and the core of the entire IMS network, and is mainly responsible for processing signaling control during the multimedia call session.
- CSCF network elements can be further divided into the serving-call session control function (S-CSCF) network element, the interrogating-call session control function (I-CSCF) network element, the proxy-call session control function (P-CSCF) network element, and so on.
- P-CSCF network element is the edge network node of the IMS network.
- the role of the P-CSCF network element in the IMS network is similar to performing proxy services.
- The S-CSCF network element is the service processing node of the IMS network and is responsible for IMS network registration of terminal devices and for related calling and called service processing. The I-CSCF network element can connect to the S-CSCF network element and the P-CSCF network element and is used to provide terminal devices with an entrance to the home network: when a terminal device roams to another network, it sends a message to the P-CSCF network element, the P-CSCF network element forwards the message to the I-CSCF network element, and the I-CSCF network element sends the message to the S-CSCF network element.
- the P-CSCF network element, the S-CSCF network element, and the I-CSCF network element may be independently configured in different entities, or may be integrated in the same entity.
- the P-CSCF network element, the S-CSCF network element, and the I-CSCF network element are collectively referred to as CSCF network elements.
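The forwarding order among the CSCF network elements described above (terminal device to P-CSCF, then I-CSCF, then S-CSCF) can be sketched as a simple chain. This is a deliberately simplified model: the names `Message`, `CSCF_CHAIN`, and `deliver_to_s_cscf` are hypothetical, and a real IMS network applies further routing logic that is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List

# Forwarding order for a message from a roaming terminal device, as described
# in the text: edge proxy first, then the home-network entry, then the
# service processing node.
CSCF_CHAIN = ["P-CSCF", "I-CSCF", "S-CSCF"]


@dataclass
class Message:
    sender: str
    body: str
    hops: List[str] = field(default_factory=list)  # network elements traversed


def deliver_to_s_cscf(message: Message) -> Message:
    """Record each CSCF network element the message passes through, in order."""
    for element in CSCF_CHAIN:
        message.hops.append(element)
    return message


msg = deliver_to_s_cscf(Message("terminal device A", "call request"))
```

When the three functions are integrated in the same entity, as the text allows, the hops would collapse into internal processing steps of that entity.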
- The embodiment of the present application also provides a media server (MS) network element, which is used to provide digital human-related services, such as the digital human service, for terminal devices.
- the specific implementation of the MS network element may be a type of Application Server (AS) network element, or a newly defined network element, which is not limited in this application.
- The MS network element can be deployed in the IMS network. For example, as shown in FIG. 1C, the MS network element is a separate network element that communicates with the CSCF network element; or, as shown in FIG. 1D, the MS network element is integrated with the CSCF network element.
- The MS network element can also be deployed outside the IMS network, as shown in FIG. 1E.
- If the communication server needs to implement multiple functions related to the digital human service, multiple MS network elements can be deployed, each responsible for different functions; alternatively, a single MS network element can be deployed to be responsible for multiple functions.
- the embodiment of the present application does not limit the specific deployment manner of the MS network element.
- FIG. 1C-FIG. 1E are examples where terminal device A and terminal device B belong to the same IMS network.
- terminal device A and terminal device B may also belong to different IMS networks respectively.
- terminal device A belongs to the IMS-1 network
- terminal device B belongs to the IMS-2 network.
- Both the IMS-1 network and the IMS-2 network are deployed with CSCF network elements and MS network elements.
- the IMS-1 network and the IMS-2 network cooperate with each other to provide call services for terminal device A and terminal device B.
- the IMS-1 network and the IMS-2 network respectively correspond to different operators (for example, are deployed or maintained by different operators).
- the communication server may be a device in a non-IMS network.
- the communication network can also be built based on a private cloud or a public cloud or a data center, and the communication server can be an MS network element in the private cloud or a public cloud or a data center.
- The specific implementation of the MS network element may be, for example, an instant messaging server. This application does not specifically limit the communication network.
- this article mainly takes the communication network as an IMS network as an example.
- the MS network element can provide the digital human service for the terminal equipment with the cooperation of the CSCF network element.
- the "call" in the embodiment of the present application may be a video call, or other forms of calls, such as voice calls, instant chat, etc., which are not limited in this application.
- both parties to the call can see each other's video in the calling application, and if the other party uses the digital human service, they can see the digital human image of the other party in the other party's video.
- the two parties in the call can hear each other's voice in the calling application, and if the other party uses the digital human service, they can also see the digital human image of the other party, which can be presented in the form of video, dynamic images, static images, etc.; this application does not limit this.
- a video call between two terminal devices is mainly taken as an example.
- the following describes how to introduce digital human beings into large-scale real-time communication scenarios, how to provide local digital human content to the peer end according to user needs, and how to present digital human content provided by the peer end locally according to user needs.
- FIG. 2 is a flow chart of a method for providing call additional services provided by an embodiment of the present application. Taking application of the method to the scenario shown in FIG. 1C as an example, the method includes:
- the terminal device A acquires the first call additional service requirement information.
- The first call additional service requirement information includes the digital human service requirement information of terminal device A. For example, the first call additional service requirement information may indicate that user a corresponding to terminal device A requires that a digital human image of user a be presented on the call application interface of at least one peer terminal device.
- this embodiment of the present application takes a peer terminal device "terminal device B" as an example, and the implementation method of other peer terminal devices may refer to the implementation method of terminal device B.
- Terminal device A's acquisition of the first call additional service requirement information includes: receiving an operation input by user a, and generating the first call additional service requirement information according to the operation.
- This operation indicates that user a requests to present the digital human image of user a in the call application interface of terminal device B. It can be understood that this application does not specifically limit the operation input by the user. For example, the operation may be clicking the control corresponding to the digital human service on the display interface of terminal device A, or checking the menu corresponding to the digital human service on the display interface of terminal device A.
- Terminal device A's acquisition of the first call additional service requirement information includes: obtaining system setting information, and determining the first call additional service requirement information according to the system setting information.
- The system setting information indicates that user a requires the digital human image of user a to be presented on the call application interface of at least one peer terminal device.
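The two ways of obtaining the first call additional service requirement information (from a user operation or from system settings) can be sketched as follows. The names `Requirement` and `acquire_requirement`, the operation string `"enable_digital_human"`, and the settings key `"digital_human_enabled"` are hypothetical. Checking the user operation before the system settings is also an assumption; the text presents the two sources as alternatives without ordering them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Requirement:
    """First call additional service requirement information (simplified)."""
    user: str
    present_digital_human: bool
    source: str  # "operation" or "settings"


def acquire_requirement(user: str,
                        operation: Optional[str],
                        settings: Dict[str, bool]) -> Optional[Requirement]:
    """Derive the requirement from a user operation, else from system settings."""
    if operation == "enable_digital_human":
        return Requirement(user, True, "operation")
    if settings.get("digital_human_enabled", False):
        return Requirement(user, True, "settings")
    return None  # no digital human service requested


req = acquire_requirement("user a", None, {"digital_human_enabled": True})
```

Here `req` records that user a's requirement came from the system settings rather than from an explicit operation during the call.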
- Terminal device A enables at least one opposite terminal device (such as terminal device B) to present user a's digital human image on the call application interface based on user a's digital human content according to the first call additional service demand information.
- the digital human content may be a visualized digital human image, or a coded result of the visualized digital human image, which is not limited in this application.
- the terminal device B can present the user's digital human image in the form of an image on the display screen of the terminal device B.
- the communication server refers to a communication server that provides digital human services for the terminal device A, for example, it may be the MS network element shown in FIG. 1C .
- the operation of generating digital human content can be performed by terminal device A, or by a communication server, or by terminal device B, which is not limited in this application.
- Terminal device A, according to the first call additional service demand information, causes the communication server to send user a's digital human content to terminal device B, so that terminal device B presents user a's digital human image on the call application interface based on user a's digital human content.
- Terminal device A generates user a's digital human content according to the first call additional service demand information and sends it to the communication server, so that the communication server sends user a's digital human content to terminal device B; after receiving the digital human content, terminal device B presents the digital human image of user a in the call application interface based on the digital human content; or,
- Terminal device A sends the first call additional service demand information to the communication server, so that the communication server generates user a's digital human content and sends it to terminal device B; after receiving the digital human content, terminal device B presents the digital human image of user a in the call application interface based on the digital human content.
- Terminal device A, according to the first call additional service demand information, causes terminal device B to generate user a's digital human content, so that terminal device B presents user a's digital human image on the call application interface based on user a's digital human content.
- Terminal device A sends the first call additional service requirement information to terminal device B through the communication server (the communication server first receives the first call additional service requirement information sent by terminal device A and then forwards it to terminal device B); terminal device B generates user a's digital human content according to the first call additional service demand information and then presents user a's digital human image on the call application interface based on that content.
- the user's digital human image can be presented in the call application interface of the peer terminal device according to the user's requirements, which can improve user experience.
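The three alternatives above differ only in which party generates the digital human content and, therefore, in what is carried on each hop. The sketch below lists the messages for each alternative; the enum `Generator`, the function `delivery_steps`, and the payload labels are illustrative names, not terms defined by the application.

```python
from enum import Enum
from typing import List, Tuple


class Generator(Enum):
    """Which party generates user a's digital human content."""
    TERMINAL_A = "terminal device A"
    SERVER = "communication server"
    TERMINAL_B = "terminal device B"


REQUIREMENT = "first call additional service requirement information"
CONTENT = "digital human content"

Step = Tuple[str, str, str]  # (sender, receiver, payload)


def delivery_steps(generator: Generator) -> List[Step]:
    """Messages exchanged before terminal device B presents the digital human."""
    a, server, b = "terminal device A", "communication server", "terminal device B"
    if generator is Generator.TERMINAL_A:
        return [(a, server, CONTENT), (server, b, CONTENT)]
    if generator is Generator.SERVER:
        return [(a, server, REQUIREMENT), (server, b, CONTENT)]
    # Terminal device B generates the content itself from the forwarded requirement.
    return [(a, server, REQUIREMENT), (server, b, REQUIREMENT)]


for step in delivery_steps(Generator.SERVER):
    print(step)
```

In every case the communication server sits between the two terminal devices; only the payload on each hop changes.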
- the first call additional service requirement information may also be used to indicate that user a corresponding to terminal device A requests to present a virtual scene picture of user a in the call application interface of terminal device B.
- the virtual scene picture of user a is defined in contrast to the real scene picture of user a.
- the real scene picture refers to an image corresponding to the real environment of user a.
- the real scene picture of user a is an outdoor image that can actually be collected by the terminal device.
- the virtual scene picture of user a is not an image corresponding to the real environment of user a, for example, it is a meeting room image downloaded by user a from the Internet.
- the terms "image" and "screen" are used interchangeably.
- terminal device A may also make the communication server send user a's virtual scene content to terminal device B according to the first call additional service demand information, so that terminal device B presents user a's virtual scene picture on the call application interface based on user a's virtual scene content; or, terminal device A can also make terminal device B generate user a's virtual scene content according to the first call additional service demand information, so that terminal device B presents user a's virtual scene picture on the call application interface based on user a's virtual scene content.
- the content of the virtual scene may be a visualized virtual scene image, or a coded result of the visualized virtual scene image, which is not limited in this application.
- the terminal device B can present the virtual scene picture of the user a in the form of an image on the display screen of the terminal device B.
- terminal device A generates user a's virtual scene content according to the first call additional service demand information; terminal device A sends user a's virtual scene content to the communication server, so that the communication server sends user a's virtual scene content to terminal device B; or,
- terminal device A sends the first call additional service demand information to the communication server, so that the communication server generates the virtual scene content of user a and sends the virtual scene content of user a to terminal device B; or,
- terminal device A sends the first call additional service requirement information to terminal device B through the communication server, so that terminal device B generates the virtual scene content of user a according to the first call additional service requirement information.
- the call screen of user a displayed by terminal device B may include user a's digital human image and virtual scene picture at the same time, and the two together form the call screen of user a. Therefore, if user a requests to present user a's digital human image and virtual scene picture in the call application interface of terminal device B, the picture presented on the call application interface of the peer terminal device shows user a's digital human acting in user a's virtual scene.
- user a's digital human content and virtual scene content together constitute user a's call screen content.
- the user's virtual scene picture can be presented in the call application interface of the peer terminal device according to the user's requirement, which can improve user experience.
- the communication server needs to establish a call connection relationship between terminal device A and terminal device B.
- the CSCF network element in the communication server triggers the MS network element to establish a call connection relationship between terminal device A and terminal device B.
- the establishment of the call connection relationship between terminal device A and terminal device B by the MS network element includes: the MS network element establishes a data transmission channel between the MS network element and terminal device A according to the address information, port information, etc. of terminal device A, and establishes a data transmission channel between the MS network element and terminal device B according to the address information, port information, etc. of terminal device B.
- For the data transmission channel please refer to the related introduction later.
- the call is bidirectional, so terminal device B may also perform the method performed by terminal device A above.
- the terminal device B obtains the second call additional service requirement information, where the second call additional service requirement information is used to indicate that user b corresponding to the terminal device B requests to present the digital human image of user b in the call application interface of the terminal device A.
- terminal device B causes the communication server to send user b's digital human content to terminal device A according to the second call additional service demand information, so that terminal device A presents user b's digital human image on the call application interface based on user b's digital human content; or,
- terminal device B enables terminal device A to generate user b's digital human content based on the second call additional service demand information, so that terminal device A presents user b's digital human image on the call application interface based on user b's digital human content.
- the digital human service can be provided for both parties in the call according to the requirements of the two parties in the call, which can further improve the user experience.
- terminal device A may also acquire third call additional service requirement information, where the third call additional service requirement information is used to indicate that user a requests that the call subject corresponding to user b (which may be a real person image of user b or a digital human image of user b) be presented in the scene picture corresponding to user a or user b.
- after receiving user b's digital human content from the communication server, terminal device A synthesizes user b's call screen content based on user b's digital human content and user a's virtual scene content, and then displays user b's call screen content. User a can see in the call application interface of terminal device A that the digital human corresponding to user b is active in user a's virtual scene.
- the communication server generates user b's digital human content, synthesizes user b's call screen content based on user b's digital human content and user a's real scene content, and sends the synthesized user b's call screen content to terminal device A.
- after terminal device A receives the synthesized call screen content of user b from the communication server, it displays that content on the call application interface of terminal device A.
- User a can see in the call application interface of terminal device A that the digital avatar corresponding to user b is active in user b's real scene.
- the call is bidirectional, so the above method is also applicable to the terminal device B. For example, the terminal device B can also obtain the fourth call additional service demand information, where the fourth call additional service demand information is used to indicate that user b requests that the call subject corresponding to user a (which may be the real person image of user a or the digital human image of user a) be presented in the scene picture corresponding to user a or user b.
- for the specific method, refer to the implementation method of terminal device A; details are not repeated here.
- the call screen content of user a may be presented in the call application interface of terminal device B according to the fourth call additional service requirement information.
- user a requests to display the virtual scene picture of user a in the call application interface of terminal device B
- user b requests to present the digital human figure corresponding to user a in the virtual scene picture corresponding to user b
- terminal device B presents the virtual scene picture corresponding to user b.
- the terminal device can present the call screen content of the peer terminal device according to the user's requirements, which can further improve user experience.
- the additional service requirement information for the first call, the additional service requirement information for the second call, and the additional service requirement information for the third call may be carried in a Session Description Protocol (SDP) message or a Session Initiation Protocol (SIP) message.
- the additional service requirement information of the first call may be carried in a SIP message header field.
- after the CSCF network element receives the SIP message carrying the additional service requirement information of the first call, it can directly forward the SIP message to the MS network element (that is, the transparent transmission mode), or it can parse the data content carried by the SIP message, re-encapsulate it (such as adding the address information and port information of terminal device A), and then forward it to the MS network element. Based on this implementation manner, resource overhead can be saved.
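As a minimal sketch of carrying the requirement information in a SIP message header field, the Python below builds a simplified INVITE and reads the field back. The header name `X-Call-Additional-Service` and the values `digital-human` and `virtual-scene` are assumptions made for illustration; the application does not name a header or an encoding, and a real SIP request also needs mandatory headers such as Via, Call-ID, and CSeq.

```python
# Hypothetical header name; the source only says "a SIP message header field".
HEADER = "X-Call-Additional-Service"


def encode_requirement(digital_human: bool, virtual_scene: bool) -> str:
    """Encode user a's requirement as a semicolon-separated token list."""
    parts = []
    if digital_human:
        parts.append("digital-human")
    if virtual_scene:
        parts.append("virtual-scene")
    return ";".join(parts) or "none"


def build_invite(caller: str, callee: str, requirement: str) -> str:
    """Build a simplified (not standards-complete) SIP INVITE."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"{HEADER}: {requirement}",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines)


def read_requirement(message: str) -> set[str]:
    """Return the requested services found in the hypothetical header."""
    for line in message.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == HEADER.lower():
            value = value.strip()
            return set() if value == "none" else set(value.split(";"))
    return set()
```

A receiver such as the MS network element could call `read_requirement` on the forwarded message to learn whether a digital human image, a virtual scene picture, or both were requested.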
- the following describes the specific implementation process for the communication server and each terminal device to provide digital human services to users in a large-scale real-time communication scenario when users have a demand for digital human services.
- the digital human technology stack mainly involves modeling, capture, reconstruction, and rendering.
- the operations for implementing digital human services in the embodiments of the present application include one or more of modeling, capture, reconstruction, and rendering.
- Modeling operation: that is, making a digital human model whose image corresponds to, and looks similar to, that of a real person.
- the digital human model may be a binary file describing a two-dimensional or three-dimensional image of a real person.
- modeling is mainly made offline, and the surface information of the modeling object (that is, a real person) is collected through technical means such as camera shooting and structured light scanning to form a digital human model.
- Capturing operation refers to the technology of recording and processing the movements of people or other objects. It is widely used in many fields such as entertainment, sports, medical applications, computer vision and robotics. In the field of digital human development, it usually records human actions, expressions, etc., and converts them into actions that can drive digital models, thereby generating two-dimensional or three-dimensional computer animations. When it captures subtle movements of the face or fingers, it's often called performance capture. In many fields, motion capture is sometimes called motion tracking.
- the capture operation is mainly used to obtain the user's driving parameters, including but not limited to at least one of the user's lip shape, expression, movement, depth information, etc.
- Method 1: extract driving parameters from the video stream
- the user's expression information, action information, etc. are recorded from the user's video collected by the terminal device.
- the gyroscope in the terminal device collects the movement information of real people, etc.
- the depth camera on the terminal device collects the spatial location information of real people, etc.
- the terminal device may not collect the user's video.
- the specific implementation method can be to generate digital human dynamic data (that is, dynamic data for displaying virtual characters) according to the driving parameters and the digital human model, such as generating a digital human image sequence, which includes a set of time-series image frames (the image frames can be stored in the device storage space in the form of a file, and the file describes multi-frame two-dimensional or three-dimensional images of the virtual character). When the digital human image sequence is loaded into the display device, the display device can present the multi-frame two-dimensional or three-dimensional images through the display screen, so that the digital human can show actions, expressions, and other behaviors similar to those of a real person.
- the image frame obtained after the reconstruction operation may only include the image of the digital human, that is, there may be no background image.
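The reconstruction step described above (driving parameters plus a digital human model yield a time-ordered image sequence) can be sketched as follows. This is only an illustration under strong assumptions: the model is reduced to named 2D landmarks, the driving parameters to per-landmark displacements, and the names `DrivingParams`, `Frame`, and `reconstruct` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class DrivingParams:
    """Captured parameters for one instant (assumed form: landmark offsets)."""
    timestamp_ms: int
    offsets: dict[str, Point]


@dataclass
class Frame:
    """One frame of the digital human sequence, without any background."""
    timestamp_ms: int
    landmarks: dict[str, Point]


def reconstruct(model: dict[str, Point], params_seq: list[DrivingParams]) -> list[Frame]:
    """Map each set of driving parameters onto the neutral model."""
    frames = []
    for params in sorted(params_seq, key=lambda p: p.timestamp_ms):
        posed = {}
        for name, (x, y) in model.items():
            dx, dy = params.offsets.get(name, (0.0, 0.0))
            posed[name] = (x + dx, y + dy)
        frames.append(Frame(params.timestamp_ms, posed))
    return frames
```

The output frames contain only the posed digital human, which matches the statement that the reconstructed image frame may have no background image; the scene is added later by the rendering operation.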
- Rendering operation refers to the process of generating an image from a model by software.
- a model is a strictly defined two-dimensional/three-dimensional object or virtual scene description in a computer language or data structure, which includes information such as geometry, viewpoint, texture, lighting, and shadow.
- the rendering operation may include the process of synthesizing (or merging and superimposing) the image of the digital human with the scene image, that is, the process of fusing each frame of the digital human image with the scene image, so that each fused image frame includes a digital human image and a scene image.
- the digital persons in different image frames after fusion may be different, and the scenes may also be different.
- the display device presents a picture of a digital human moving in the scene, presenting an appearance and behavior similar to a real person.
- the image obtained after the rendering operation is the image in the call screen content presented by the user to the opposite end.
- the scene image may be an image corresponding to a real scene, such as a background image of a real person (user) in a real environment, or a virtual scene image, such as a pre-arranged background image or a background image downloaded from the Internet, which is not limited in this application.
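The fusion of each digital human frame with a scene image can be illustrated with plain alpha compositing. This is a sketch under assumptions, not the method of the application: frames are nested lists of pixels, the digital human frame carries an alpha channel (RGBA), the scene is RGB, and both have the same size. The function names are hypothetical.

```python
RGBA = tuple[int, int, int, int]
RGB = tuple[int, int, int]


def composite(human: list[list[RGBA]], scene: list[list[RGB]]) -> list[list[RGB]]:
    """Superimpose one digital human frame on one scene image."""
    fused = []
    for human_row, scene_row in zip(human, scene):
        row = []
        for (r, g, b, a), (sr, sg, sb) in zip(human_row, scene_row):
            alpha = a / 255  # 0 keeps the scene pixel, 1 keeps the human pixel
            row.append((
                round(r * alpha + sr * (1 - alpha)),
                round(g * alpha + sg * (1 - alpha)),
                round(b * alpha + sb * (1 - alpha)),
            ))
        fused.append(row)
    return fused


def render_sequence(human_frames, scene_frames):
    """Fuse frame by frame; the scene may differ from frame to frame."""
    return [composite(h, s) for h, s in zip(human_frames, scene_frames)]
```

Whether the scene frames come from a camera (real scene) or from a stored background (virtual scene) makes no difference to this step, which is why the same rendering operation covers both cases.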
- the rendering operation may also be performed in combination with an observer's perspective.
- the observer refers to the user who finally watches the content of the call screen.
- user a is an observer.
- the viewing angle information of user a can be used to synthesize the content of the call screen of user b, so that the content of the call screen of user b can better meet the viewing needs of user a.
- user b watches the content of the call screen of user a
- user b is an observer.
- the content of the call screen of user a can be synthesized according to the viewing angle information of user b, so that the content of the call screen of user a can better meet the viewing needs of user b.
- the rendering operation may also include merging digital assets with digital human images and scene images. That is to say, the image obtained after the rendering operation may also include the user's digital assets and the like.
- digital assets include but are not limited to virtual clothes, virtual scenes or other props used to modify the user's digital human.
- each operation in the above-mentioned capture operation, reconstruction operation, and rendering operation may be performed by a communication server (such as an MS network element), or may be performed by a terminal device, which is not limited in this application.
- Example 1: when the capture operation is performed on terminal device A of user a, terminal device A obtains the driving parameters of user a from the video, audio, etc. collected by terminal device A.
- when the capture operation is performed on the MS network element, the MS network element obtains the driving parameters of user a from the video and audio collected by terminal device A.
- Example 2: when the reconstruction operation is performed on terminal device A of user a, terminal device A maps user a's expression and/or action to user a's digital human model according to user a's driving parameters, and obtains user a's digital human image sequence; wherein the driving parameters of user a can be obtained by terminal device A by performing a capture operation, or can be provided by the MS network element to terminal device A; terminal device A can read the digital human model locally (for example, the digital human model is stored locally in terminal device A), or can download the digital human model from the MS network element (for example, the digital human model is stored in the MS network element).
- when the reconstruction operation is performed on the MS network element, the MS network element maps the expression and/or action of user a to the digital human model of user a according to the driving parameters of user a, and obtains the digital human image sequence of user a; wherein the driving parameters of user a can be obtained by the MS network element through the capture operation, or can be provided by terminal device A to the MS network element; the MS network element can obtain the digital human model from terminal device A (for example, the digital human model is stored locally in terminal device A), or can read the digital human model locally from the MS network element (for example, the digital human model is stored in the MS network element).
- Example 3: when the rendering operation is performed on terminal device A of user a, terminal device A superimposes the digital human image of user a on the scene image of user a to obtain the content of the call screen of user a; wherein the digital human image of user a may be obtained by terminal device A by performing a reconstruction operation, or may be provided by the MS network element to terminal device A.
- when the rendering operation is performed on the MS network element, the MS network element superimposes the digital human image of user a on the scene image of user a to obtain the content of the call screen of user a; wherein the digital human image of user a may be obtained by the MS network element by performing a reconstruction operation, or may be provided by terminal device A to the MS network element.
- when the rendering operation is performed on terminal device B, terminal device B superimposes the digital human image of user a on the scene image of user a to obtain the content of the call screen of user a; wherein the digital human image of user a may be provided by the MS network element to terminal device B, or by terminal device A to terminal device B.
- a device such as a terminal device, or a communication server, or a peer terminal device to execute.
- FIG. 3B is a flow chart of another method for providing call additional services provided by the embodiment of the present application. Taking as an example the application of this method to the scene shown in FIG. 1C and the provision of digital human service for terminal device A, the method includes:
- the terminal device A sends the first call additional service capability information, and correspondingly, the communication server receives the first call additional service capability information.
- the communication server is specifically, for example, an MS network element, and the call additional service capability information of terminal device A may first arrive at the CSCF network element, and then be forwarded to the MS network element via the CSCF network element, as shown in FIG. 3C.
- the first call additional service capability information includes terminal device A's digital human service capability information.
- the first call additional service capability information is used to indicate the digital human service capability of the terminal device A.
- the first call additional service capability information may include one or more of the following:
- Terminal device A can provide (such as audio, video, expression, body movements, etc.);
- the communication server sends the indication information to the terminal device A, and correspondingly, the terminal device A receives the indication information sent by the communication server.
- the indication information may be sent by the MS network element to the CSCF network element, and then forwarded to the terminal device A via the CSCF network element, as shown in FIG. 3C .
- the operation to be performed by the terminal device A may be decided by the MS network element.
- the MS network element determines the digital human processing operation performed by the terminal device according to at least one of the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information; then the MS network element sends indication information to terminal device A through the CSCF network element, and the indication information is used to indicate the digital human processing operation performed by terminal device A.
- the digital human processing operation performed by terminal device A is, for example, at least one of capture operation, reconstruction operation, and rendering operation, or "none" or "0" (that is, no digital human processing operation is performed).
- the second call additional service capability information includes digital human service capability information of the communication server.
- the second call additional service capability information is used to indicate the digital human service capability of the communication server.
- the call additional service capability information of the communication server includes one or more of the following:
- the third call additional service capability information includes digital human service capability information of at least one opposite terminal device.
- the third call additional service capability information is used to indicate the digital human service capability of at least one opposite terminal device.
- the third call additional service capability information is used to indicate the digital human service capability of terminal device B.
- the third call additional service capability information includes one or more of the following:
- the terminal device B also sends third call additional service capability information, and the communication server receives the third call additional service capability information.
- the operation performed by terminal device A may be decided by terminal device B.
- terminal device B determines the digital human processing operation performed by terminal device A according to the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information, and then terminal device B sends the indication information to the MS network element via the CSCF network element, and then the MS network element sends the indication information to the terminal device A via the CSCF network element.
- the indication information is used to indicate the digital human processing operation performed by the terminal device A.
- the digital human processing operation performed by the terminal device A is, for example, at least one of capture operation, reconstruction operation, and rendering operation, or "none" or "0" (that is, no digital human processing operation is performed).
- the operation that terminal device A needs to perform may be decided by terminal device A itself.
- the indication information may be used to indicate: the second call additional service capability information, the third call additional service capability information.
- the terminal device A executes at least one digital human processing operation or does not execute the digital human processing operation according to the instruction information.
- the indication information is used to instruct the terminal device A to perform at least one operation among capture operation, reconstruction operation, and rendering operation, then the terminal device A directly executes at least one operation indicated by the indication information.
- the terminal device A does not perform the digital human processing operation.
- the indication information is used to indicate the second call additional service capability information and the third call additional service capability information; terminal device A determines, according to at least one item of the call additional service capability information, the digital human processing operation to be performed by terminal device A.
- terminal device A performs the at least one operation.
- the above S31-S32 may occur before terminal device A and terminal device B start talking; for example, S31-S32 may be performed during the call setup phase.
- the communication server may perform at least one of the capture operation, the reconstruction operation, and the rendering operation. It can be understood that the digital human processing operations performed by terminal device A and by the communication server are different.
- for example, the terminal device performs the capture operation, and the communication server performs the reconstruction operation and rendering operation; or, the terminal device performs the capture operation and reconstruction operation, and the communication server performs the rendering operation; or, the terminal device performs the capture operation, and the communication server performs the reconstruction operation and rendering operation; and so on.
- it can be understood that if user b corresponding to terminal device B requires the digital human of user a to appear in user b's (real or virtual) scene, then the rendering operation can also be performed by terminal device B.
- the capture operation, the reconstruction operation, and the rendering operation may all be performed by the terminal device A or the communication server.
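One way the deciding party (for example the MS network element) might turn capability information into indication information is sketched below. The policy shown, which gives each operation to the terminal when it is capable and otherwise to the server, is purely an assumption for illustration; the application does not prescribe a policy, and the names `decide_operations` and `indication_for_terminal` are hypothetical.

```python
PIPELINE = ("capture", "reconstruction", "rendering")


def decide_operations(terminal_caps: set[str], server_caps: set[str]) -> dict[str, str]:
    """Assign each operation to exactly one executor (assumed policy)."""
    assignment = {}
    for op in PIPELINE:
        if op in terminal_caps:
            assignment[op] = "terminal"
        elif op in server_caps:
            assignment[op] = "server"
        else:
            raise ValueError(f"no party can perform the {op} operation")
    return assignment


def indication_for_terminal(assignment: dict[str, str]) -> str:
    """Indication information: the terminal's operations, or "none"."""
    ops = [op for op in PIPELINE if assignment[op] == "terminal"]
    return ",".join(ops) if ops else "none"
```

Because every operation gets a single executor, the operations performed by terminal device A and by the communication server never overlap, and the case where all three run on one side falls out of the same function.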
- the above embodiments shown in S31-S33 are provided as an example for terminal device A to provide digital human services.
- the call is two-way.
- the digital human service can also be provided only for user b, and the digital human service can also be provided for user a and user b at the same time. Therefore, the same method can also be applied to provide digital human service for terminal device B.
- the above-mentioned first call additional service capability information, second call additional service capability information, third call additional service capability information, indication information, etc. may be carried in an SDP message or a SIP message. For example, it may be carried in a SIP message header field.
- terminal device A may carry, in a message, the capabilities of terminal device A (such as the first call additional service capability information) and user a's requirements on digital human services (such as the first call additional service demand information, the third call additional service requirement information, etc.), and thereby notify the communication server and terminal device B of the capabilities of terminal device A and the requirements of user a.
- terminal device A and the communication server cooperate with each other to complete operations such as the capture operation, reconstruction operation, and rendering operation, and the call screen content (including digital human content and/or virtual scene content) of user a is sent to terminal device B.
- the operation of generating digital human content may include all operations in capture operation, reconstruction operation, and rendering operation, or may only include some of them (for example, only include rendering operation), which is not limited in this embodiment of the present application.
- the "digital human content" described in the embodiments shown in S21-S22 can be static digital human data (such as digital human models), or dynamic digital human data (that is, data obtained after reconstruction operations, such as a sequence of digital human images); it may also be the content of the call screen including the digital human image (that is, the data obtained after the rendering operation), etc., which are not limited in this application.
- the content of the virtual scene may only include the image corresponding to the virtual scene, or it may be the content of the call screen including the picture of the virtual scene, etc., which are not limited in this application.
- FIG. 4 is a flow chart of another communication method provided in the embodiment of the present application.
- the method takes terminal device A as the calling terminal device and terminal device B as the called terminal device as an example.
- the method includes:
- the terminal device A sends a message A to the communication server (taking the MS network element as an example), which carries the following parameters: the demand of user a (Expect), the content of the negotiation between the terminal device A and the terminal device B (Caller-Callee), the negotiation content (Caller-CN) between the terminal device A and the communication server;
- the communication server sends a message B to the terminal device B, which carries the following parameters: the demand of user a, the negotiation content between the terminal device A and the terminal device B, the negotiation content between the communication server and the terminal device B (CN-Callee);
- the terminal device B determines the negotiation result (Callee) of the terminal device B in combination with the requirements of the user b, the call additional service capability information of the terminal device B, and the parameters carried in the message B;
- the terminal device B sends a message C to the communication server, which carries the following parameters: the negotiation result of the terminal device B;
- the communication server determines the negotiation result (CN-Caller) between the communication server and the terminal device A;
- the communication server sends a message D to the terminal device A, which carries the following parameters: the negotiation result of the terminal device B, the negotiation result between the communication server and the terminal device A;
- the terminal device A confirms the negotiation result (Caller) of the terminal device A according to the negotiation result of the terminal device B and the negotiation result between the communication server and the terminal device A;
- the terminal device A sends a message E to the communication server, which carries the following parameters: the negotiation result of the terminal device A;
- the communication server sends a message F to the terminal device B, which carries the following parameters: the negotiation result of the terminal device A.
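The message exchange of FIG. 4 can be traced with a small sketch. The parameter names (Expect, Caller-Callee, Caller-CN, CN-Callee, Callee, CN-Caller, Caller) come from the steps above; everything else is an assumption: parameter values are opaque strings, the three decision steps are passed in as hypothetical callbacks, and message D is assumed to be delivered to terminal device A, since terminal device A is the party that then sends message E with its own result.

```python
from typing import Callable

Message = dict[str, str]
Hop = tuple[str, str, str, Message]  # (message name, sender, receiver, parameters)


def negotiate(
    expect: str,
    caller_callee: str,
    caller_cn: str,
    cn_callee: str,
    decide_callee: Callable[[Message], str],
    decide_cn_caller: Callable[[Message, Message], str],
    decide_caller: Callable[[Message], str],
) -> list[Hop]:
    """Replay messages A to F and return them in sending order."""
    trace: list[Hop] = []

    def send(name: str, src: str, dst: str, body: Message) -> Message:
        trace.append((name, src, dst, body))
        return body

    msg_a = send("A", "terminal A", "server",
                 {"Expect": expect, "Caller-Callee": caller_callee, "Caller-CN": caller_cn})
    msg_b = send("B", "server", "terminal B",
                 {"Expect": expect, "Caller-Callee": caller_callee, "CN-Callee": cn_callee})
    callee = decide_callee(msg_b)                 # terminal B's negotiation result
    msg_c = send("C", "terminal B", "server", {"Callee": callee})
    cn_caller = decide_cn_caller(msg_a, msg_c)    # server's result toward terminal A
    msg_d = send("D", "server", "terminal A", {"Callee": callee, "CN-Caller": cn_caller})
    caller = decide_caller(msg_d)                 # terminal A's negotiation result
    send("E", "terminal A", "server", {"Caller": caller})
    send("F", "server", "terminal B", {"Caller": caller})
    return trace
```

In a deployment each hop would additionally pass through a CSCF network element; the sketch collapses the CSCF and MS network elements into a single "server" for brevity.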
- terminal device A is the calling terminal device
- terminal device B is the called terminal device
- the above takes as an example the case where the calling terminal device (that is, terminal device A) initiates the negotiation first; alternatively, the called terminal device (such as terminal device B) may initiate the negotiation first, or the communication server may initiate the negotiation first, etc. This application does not limit this.
- the functions performed by the communication server in FIG. 4 can be completed by the MS network element.
- the message (such as message A) sent by terminal device A to the communication server first arrives at the CSCF network element (any one of the P_CSCF network element, I_CSCF network element, and S_CSCF network element), and is then forwarded by the CSCF network element to the MS network element; the message sent by terminal device B to the communication server (such as message C) first reaches the CSCF network element, and is then forwarded by the CSCF network element to the MS network element; the message (such as message F) sent by the communication server to the terminal device B goes from the MS network element to the CSCF network element, and is then forwarded by the CSCF network element to terminal device B.
- when the CSCF network element forwards a message, it can forward the message to the MS network element in a transparent transmission manner (that is, directly forward the data content carried in the message without parsing it), or in a non-transparent transmission manner (for example, parse the data content carried in the message, re-encapsulate it, and then forward it to the MS network element), which is not limited in this application.
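The difference between the two forwarding manners can be illustrated with a minimal sketch. It assumes, purely for illustration, that the message body is JSON; real IMS signaling is carried in SIP messages, and the envelope format and function names below are hypothetical and not part of this application.

```python
import json


def forward_transparent(raw: bytes) -> bytes:
    """Transparent transmission: pass the payload on untouched, without parsing it."""
    return raw


def forward_non_transparent(raw: bytes) -> bytes:
    """Non-transparent transmission: parse the payload, then re-encapsulate it."""
    body = json.loads(raw.decode("utf-8"))       # parse the carried data content
    envelope = {"via": "CSCF", "payload": body}  # hypothetical re-encapsulation
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


# A stand-in for message A carrying negotiation parameters (assumed shape).
message_a = json.dumps({"negotiation": {"reconstruct": "server"}}).encode("utf-8")

to_ms_transparent = forward_transparent(message_a)
to_ms_repackaged = forward_non_transparent(message_a)
```

In the transparent case the MS network element receives exactly the bytes that the terminal device sent; in the non-transparent case it receives the same data content inside a new wrapper.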
- the CSCF network element of the network can also perform identity verification on user a, and perform the subsequent procedures after confirming that the person currently using the number of user a is the user bound to that number.
- the CSCF network element of the network can also perform identity verification on user b, and perform the subsequent procedures after confirming that the person currently using the number of user b is the user bound to that number.
- message A in step 1 and message A in step 3 are only used to indicate that the negotiation parameters carried in the messages are the same.
- Different messages may have different names, and may also carry other different data contents, which are not limited in this application.
- FIG. 5 is an example where terminal device A and terminal device B belong to the same IMS network.
- terminal device A and terminal device B may also be served by different IMS networks.
- in that case, more network elements are required to participate in the communication.
- for example, terminal device A corresponds to one set of CSCF network elements and MS network elements, terminal device B corresponds to another set of CSCF network elements and MS network elements, and the parameters transmitted between terminal device A and terminal device B need to pass through both sets of CSCF network elements and MS network elements.
- the communication server can establish a data transmission channel between the communication server (eg, MS network element) and the terminal device, for transmitting data that needs to be exchanged between the terminal device and the communication server (eg, MS network element) during a call.
- the types of data transmission channels involved in the embodiment of the present application include but are not limited to the following:
- digitalman_model_channel: digital human model channel, used to transmit the unreconstructed digital human model;
- digitalman_channel: digital human channel, used to transmit reconstructed digital human dynamic data (that is, the digital human image sequence);
- action_channel: driving parameter channel, used to transmit driving parameters;
- scene_channel: virtual scene channel, used to transmit the virtual scene, that is, the virtual background picture data of the digital human;
- backhaul_channel: backhaul channel, used to transmit the content of the call screen of the local end as seen from the perspective of the peer end;
- viewpoint_channel: viewing angle information channel, used to transmit viewing angle information. Viewing angle information describes an observer's viewing angle. For example, when user a watches the content of the call screen of user b, the viewing angle information of user a is the spatial position information of user a; when user b watches the content of the call screen of user a, the viewing angle information of user b is the spatial position information of user b;
- audio_channel: audio channel, used to transmit audio;
- video_channel: video channel, used to transmit video; the video can be a video containing a real image (such as the original video collected by a terminal device), or a video containing a digital human image (such as a video obtained by a digital human processing operation);
- each of the above channels may correspond to the same physical channel, or may correspond to different physical channels respectively.
- the five-tuple information (source IP, source port, destination IP, destination port, transmission protocol) of the above channels is different; alternatively, different channels can be merged, for example, the driving parameter channel and the viewing angle information channel are merged, that is, they share the same five-tuple information.
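The relationship between logical channels and five-tuples can be sketched as follows. This is only an illustration: the `FiveTuple` class, the `assign_five_tuples` helper, the port numbering scheme, and the IP addresses (taken from documentation address ranges) are all assumptions, not something specified by this application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def assign_five_tuples(groups: List[List[str]], src_ip: str, dst_ip: str,
                       base_port: int, protocol: str = "UDP") -> Dict[str, FiveTuple]:
    """Give each group of merged channels one five-tuple; separate groups get separate ports."""
    table: Dict[str, FiveTuple] = {}
    for offset, group in enumerate(groups):
        shared = FiveTuple(src_ip, base_port + offset, dst_ip, base_port + offset, protocol)
        for channel in group:
            table[channel] = shared
    return table


# action_channel and viewpoint_channel are merged; the other channels stay separate.
groups = [
    ["audio_channel"],
    ["video_channel"],
    ["action_channel", "viewpoint_channel"],
]
table = assign_five_tuples(groups, "192.0.2.10", "198.51.100.20", 40000)
```

Here the two merged channels map to the same five-tuple, while the audio and video channels each keep their own.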
- the type of the data transmission channel to be established between the communication server (MS network element) and the terminal device is associated with the division of labor between the communication server and the terminal device.
- for example, when terminal device A needs to use the digital human service during a call between terminal device A and terminal device B:
- Example 1: Terminal device A performs a reconstruction operation. In this case, a first channel (digitalman_model_channel) is established between terminal device A and the communication server, and the first channel is used to transmit the digital human model of the user corresponding to terminal device A.
- Example 2: Terminal device A executes the reconstruction operation, and the communication server or terminal device B executes the rendering operation. In this case, a second channel (digitalman_channel) is established between terminal device A and the communication server, and the second channel is used to transmit the digital human image sequence of user a corresponding to terminal device A.
- Example 3: Terminal device A executes the capture operation and the communication server executes the reconstruction operation. In this case, a third channel (action_channel) is established between terminal device A and the communication server, and the third channel is used to transmit the driving parameters of the user corresponding to terminal device A.
- Example 4: The communication server executes the rendering operation, the scene picture is the virtual scene picture of user a, and the virtual scene picture is saved in terminal device A. In this case, a fourth channel (scene_channel) is established between the communication server and terminal device A, and the fourth channel is used to transmit the virtual scene of user a corresponding to terminal device A. It can be understood that if the virtual scene is saved on the communication server and the rendering operation is performed on the communication server, the channel may not be established.
- Example 5: Terminal device A needs the communication server to return the content of the call screen of terminal device A. In this case, a fifth channel (backhaul_channel) is established between the communication server and terminal device A, and the fifth channel is used for the communication server to return the content of the call screen of terminal device A to terminal device A.
- Example 6: The communication server executes the rendering operation, and the call additional service capability information of terminal device B includes information indicating the viewing angle information of terminal device B. In this case, a sixth channel (viewpoint_channel) is established between the communication server and terminal device B, and the sixth channel is used to transmit the viewing angle information provided by terminal device B.
- Example 7: If the capture operation is performed on the communication server, and the communication server captures the driving parameters of the user from the audio of user a, a seventh channel (audio_channel) can be established between terminal device A and the communication server. If the capture operation is performed on the communication server, and the communication server captures the driving parameters of the user from the video of user a, an eighth channel (video_channel) can be established between terminal device A and the communication server.
- the scenes may not be distinguished, but some or all of the above-mentioned channels may be established by default.
- video_channel, audio_channel and action_channel can be established in any scenario, which can reduce changes to the existing call flow.
- each of the above channels can be established in any scenario (that is, all the above channels need to be established in any division of labor), which can reduce the complexity of the solution and improve the applicability of the solution. It can be understood that the above is only an example rather than a limitation, and more types of data transmission channels can be extended according to the type of data to be transmitted.
- the data transmission channel between the communication server and the terminal device can be established according to the digital human service capabilities of the communication server and the terminal device, so as to meet the different requirements of different data for transmission during the communication process of the digital human.
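The association between the division of labor and the channels to be established can be sketched as a lookup table. The condition names below are hypothetical labels for Examples 1 to 7 above, and `channels_for` is an illustrative helper rather than an interface defined by this application; the default set follows the statement that video_channel, audio_channel and action_channel can be established in any scenario.

```python
from typing import Iterable, List

# Hypothetical condition labels, one per example in the text.
CHANNEL_RULES = {
    "terminal_provides_model": "digitalman_model_channel",      # Example 1
    "terminal_reconstructs": "digitalman_channel",              # Example 2
    "terminal_captures_server_reconstructs": "action_channel",  # Example 3
    "server_renders_scene_on_terminal": "scene_channel",        # Example 4
    "terminal_needs_backhaul": "backhaul_channel",              # Example 5
    "server_renders_with_viewpoint": "viewpoint_channel",       # Example 6
    "server_captures_from_audio": "audio_channel",              # Example 7
    "server_captures_from_video": "video_channel",              # Example 7
}

# Channels that may be established by default, regardless of scenario.
DEFAULT_CHANNELS = ("video_channel", "audio_channel", "action_channel")


def channels_for(conditions: Iterable[str], with_defaults: bool = False) -> List[str]:
    """Return the sorted channel names required by the given conditions."""
    needed = {CHANNEL_RULES[condition] for condition in conditions}
    if with_defaults:
        needed.update(DEFAULT_CHANNELS)
    return sorted(needed)


example = channels_for(["terminal_captures_server_reconstructs",
                        "server_renders_with_viewpoint"])
```

Passing `with_defaults=True` corresponds to the option of not distinguishing scenarios and establishing some channels by default.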
- the authentication and authorization scenarios involved in this embodiment of the application include but are not limited to the several scenarios shown in Table 3 below.
- FIG. 7 is a schematic diagram of several keys (M1, M2, M3, M4, M5, M6, M7, M8) that may be involved in the above authentication scenarios.
- the purpose of each key is as follows:
- M1: the digital human review device uses the private key M1 to decrypt, for review, the ciphertext of the digital human model and user information provided by the digital human model making device. The user information includes, but is not limited to, the user's identity information, verification features, and the like.
- M2: the digital human model making device uses the public key M2 to encrypt the digital human model and user information and sends them to the digital human review device.
- M3, M6: the digital human model making device uses the private key M3 to digitally sign the produced digital human model and user information, encrypts them with the public key M6, and sends them to the management center.
- M4, M5: the management center uses the private key M5 to decrypt the ciphertext of the digital human model and user information provided by the digital human model making device, and uses the public key M4 to verify the digital signature.
- M8, purpose 1: before the user uses the digital human service, when the user goes to the digital human model making device to request a digital human model, the user's verification features (fingerprint, voiceprint, face, etc.) are collected, encrypted with the public key M8, and sent to the digital human model making device;
- M8, purpose 2: before the user uses the digital human service, when the user goes to the management center to manage (including download) the user's own digital human model, the user's verification features (fingerprint, voiceprint, face, etc.) are collected, encrypted with the public key M8, and sent to the management center;
- M8, purpose 3: the terminal device collects the user's verification features (fingerprint, voiceprint, face, etc.), encrypts them with the public key M8, and sends them to the network.
- M7: before making the digital human model, the digital human model making device needs to send the ciphertext of the identity information, verification features, etc. provided by the user to the user identity authentication device (such as a device of the household registration management center), and the user identity authentication device uses the private key M7 to decrypt the verification features and then performs authentication;
- M7: before the user uses the digital human model, the digital human management center device needs to send the ciphertext of the identity information, verification features, etc. provided by the user to the user identity authentication device, and the user identity authentication device uses the private key M7 to decrypt the verification features and then performs authentication;
- M7: the network sends the ciphertext of the identity information, verification features, etc. provided by the user to the user identity authentication device, and the user identity authentication device uses the private key M7 to decrypt the identity information, verification features, etc. and then performs authentication.
- M9: when the user uses the digital human service, the terminal device authenticates the user's legitimacy, uses the private key M9 to add a digital signature to the authentication result, and then sends the authentication result to the network.
- M10: the network uses the public key M10 to verify the digital signature of the authentication result sent by the terminal device. If the verification is passed, the authentication result has not been tampered with.
- the embodiment of the present application provides an authentication and authorization method, including:
- the digital human review device creates public-private key pairs K1&K2 and K3&K4;
- the digital human review device can specifically be an application server (Application Server, AS) network element, which can be located in the mobile network that provides call services for the user's terminal device (such as the IMS network to which the terminal device belongs), or can be located outside the mobile network, which is not limited in this application.
- K1 is a private key
- K2 is a public key
- K3 is a private key
- K4 is a public key.
- the digital human management center device creates a public-private key pair K5&K6;
- the digital human management center device may specifically be an AS network element, and may be located in a mobile network that provides call services for terminal devices, or may be located outside the mobile network, which is not limited in this application.
- this article takes the digital human management center device located in the mobile network as an example.
- the digital human management center equipment can be integrated with the network element used to provide digital human services (the MS network element shown in Figure 1C) in one entity, or can be integrated in different entities respectively.
- the MS network element can obtain the digital human model and the information related to the digital human model (such as the identity information and verification features corresponding to the digital human model) from the digital human management center device.
- K5 is the private key
- K6 is the public key
- the digital human management center device issues a digital certificate (including the public key K6) to the digital human model making device.
- the digital human model making device may specifically be an AS network element, which may be located in the mobile network that provides call services for the terminal equipment, or located outside the mobile network, which is not limited in this application.
- the digital human auditing device issues a digital certificate (including public key K2) to the digital human model making device;
- the digital human verification device issues a digital certificate (including the public key K4) to the digital human management center device;
- the user applies to the digital human model making device to make a digital human (carrying the user's identity information and verification features);
- the user's identity information is, for example, the user's name, phone number, etc.
- the user's verification feature is, for example, the user's face image, fingerprint, iris and other information.
- the digital human model production equipment verifies the identity information of the user, assuming that the verification is passed;
- the digital human model making device makes a corresponding digital human model for the user, and uses the public key K2 to encrypt the user's digital human model, identity information, and verification features;
- the digital human model making device sends the encrypted digital human model, identity information, and verification features to the digital human review device;
- the digital human review device uses the private key K1 to decrypt, and obtains the decrypted digital human model, identity information, and verification features;
- the digital human review device reviews the user's digital human model, identity information, and verification features, and after the review is passed, uses the private key K3 to add a digital signature to the digital human model, identity information, and verification features;
- the review includes, for example, verifying the authenticity and legitimacy of the user's identity, and reviewing the legitimacy of the user's digital human model, for example, verifying whether the user's digital human model matches the user's verification features, and so on.
- the digital human review device returns a review response message to the digital human model making device, which carries the digital signature of the digital human review device;
- the digital human model production device uses the public key K6 to encrypt the digital human model, identity information, verification features, and digital signature of the digital human verification device returned by the digital human verification device;
- the digital human model production equipment sends the encrypted digital human model, identity information, verification features, and digital signature of the digital human verification equipment to the digital human management center equipment;
- the digital human management center device uses the private key K5 to decrypt, obtains the user's digital human model, identity information, verification features, digital signature of the digital human verification device; then uses the public key K4 to verify the digital signature of the digital human verification device.
- if the digital human management center device successfully verifies the digital signature of the digital human review device, it determines that the user's digital human model, identity information, and verification features have passed the review of the digital human review device, are legal, and can be used by the user to initiate or receive digital-human-based calls.
- the digital human management center device binds the user's digital human model, identity information, and verification features with the user's account opening information (such as phone number, etc.).
- for example, the digital human management center device stores the user's digital human model, identity information, and verification features in the user's account opening information; or the digital human management center device stores the mapping relationship between the user's digital human model, identity information, and verification features and the user's account opening information (such as phone number, etc.); or this mapping relationship is stored in other network elements of the network, which is not limited in this application.
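The key usage in the steps above (encrypt with K2, decrypt with K1, sign with K3, encrypt with K6, decrypt with K5, verify with K4) can be traced with a toy sketch. It uses textbook RSA without padding, built from small well-known Mersenne primes, and reuses primes across key pairs, so it is insecure and serves only to show which key is applied at which step. The helper names and the sample payload are hypothetical; this application does not specify the algorithm.

```python
import hashlib

E = 65537  # commonly used public exponent


def make_keypair(p: int, q: int):
    """Return (private, public) textbook-RSA keys built from primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, d), (n, E)


def apply_key(key, value: int) -> int:
    """Raw RSA operation: value ** exponent mod n; value must fit in one block."""
    n, exponent = key
    if not 0 <= value < n:
        raise ValueError("value does not fit in one RSA block")
    return pow(value, exponent, n)


def digest(data: bytes, n: int) -> int:
    """SHA-256 digest reduced modulo n (toy substitute for a padding scheme)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(private, data: bytes) -> int:
    return apply_key(private, digest(data, private[0]))


def verify(public, data: bytes, signature: int) -> bool:
    return apply_key(public, signature) == digest(data, public[0])


def to_int(data: bytes) -> int:
    return int.from_bytes(data, "big")


def to_bytes(value: int) -> bytes:
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


M89, M107, M127 = 2**89 - 1, 2**107 - 1, 2**127 - 1  # Mersenne primes
K1, K2 = make_keypair(M127, M89)    # review device: decryption pair
K3, K4 = make_keypair(M107, M89)    # review device: signing pair
K5, K6 = make_keypair(M127, M107)   # management center: decryption pair

payload = b"model|alice|fp01"  # stand-in for model, identity, features

# Model making device -> review device: encrypt with public key K2.
to_reviewer = apply_key(K2, to_int(payload))
# Review device: decrypt with private key K1, then sign with private key K3.
reviewed = to_bytes(apply_key(K1, to_reviewer))
signature = sign(K3, reviewed)
# Model making device -> management center: encrypt data and signature with K6.
to_center = (apply_key(K6, to_int(reviewed)), apply_key(K6, signature))
# Management center: decrypt with private key K5, verify with public key K4.
received = to_bytes(apply_key(K5, to_center[0]))
received_signature = apply_key(K5, to_center[1])
accepted = verify(K4, received, received_signature)
```

A real deployment would rely on a vetted cryptographic library with proper padding and hybrid encryption for large payloads such as a digital human model; the sketch only shows the order in which the six keys are used.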
- the user's digital assets can also be superimposed on the content of the call screen, which can further improve user experience.
- the embodiment of the present application also provides an authentication and authorization method, including:
- the digital asset verification device may specifically be an AS network element, which may be located in a mobile network providing call services for users, or located outside the mobile network, which is not limited in this application.
- the digital asset verification device can be integrated with the above-mentioned digital human verification device, or can be separated, which is not limited in this application.
- K7 is a private key
- K8 is a public key
- K9 is a private key
- K10 is a public key.
- the digital asset management center equipment creates a public-private key pair K11&K12;
- the digital asset management center device may specifically be an AS network element, and may be located in a mobile network that provides call services for terminal equipment, or may be located outside the mobile network, which is not limited in this application.
- this article takes the digital asset management center device located in the mobile network as an example.
- the digital asset management center equipment can be integrated with the network element used to provide digital human services (the MS network element shown in Figure 1C), or can be separated into different network elements, which is not limited in this application.
- K11 is a private key
- K12 is a public key
- the digital asset management center equipment can be integrated with the above-mentioned digital human management center equipment, or can be separated, which is not limited in this application.
- the digital asset management center device issues a digital certificate (including the public key K12) to the digital asset production device.
- the digital asset production device may specifically be an AS network element, which may be located in the mobile network that provides call services for the terminal device, or located outside the mobile network, which is not limited in this application.
- the digital asset production equipment can be integrated with the above-mentioned digital human model production equipment, or can be separated, which is not limited in this application.
- Digital asset review equipment issues digital certificates (including public key K8) to digital asset production equipment;
- the digital asset review device issues a digital certificate (including public key K10) to the digital asset management center device;
- Digital asset production equipment is used to produce digital assets, and use the public key K8 to encrypt digital assets;
- the digital asset production equipment sends the encrypted digital assets to the digital asset review equipment;
- the digital asset review device uses the private key K7 to decrypt and obtain the decrypted digital asset;
- the digital asset review device reviews the digital asset, and after the review is passed, uses the private key K9 to add a digital signature to the digital asset;
- the digital asset review device returns a review response message to the digital asset production device, which carries the digital signature of the review device;
- the digital asset production equipment uses the public key K12 to encrypt the digital assets and digital signatures returned by the digital asset audit equipment;
- the digital asset production equipment sends the encrypted digital assets and digital signatures to the digital asset management center equipment;
- the equipment in the digital asset management center uses the private key K11 to decrypt to obtain digital assets and digital signatures; then uses the public key K10 to verify the digital signatures.
- if the digital asset management center device successfully verifies the digital signature of the digital asset review device, it determines that the digital asset has passed the review of the digital asset review device and is legal.
- the embodiment of the present application provides an authentication and authorization method, the method includes:
- the first user is the user currently using the first terminal device, and the first terminal device can collect the verification features of the first user, such as the first user's fingerprint, voiceprint, iris or face etc.
- the scenario that triggers the first terminal device to collect the verification feature of the first user may be that the first terminal device receives a preset operation input by the user (such as receiving an operation of making or answering a call from the user), or it may be that the first The terminal device receives the relevant instruction sent by the network, which is not limited in this application.
- the verification features (fingerprint, voiceprint, iris or face, etc.) of each user's digital human model are stored in the communication server. Therefore, by comparing the verification features of the first user with those of the digital human model and judging whether they match, it is possible to determine whether the identity of the first user is consistent with the identity bound to the digital human model, that is, whether the first user has permission to use the digital human model.
- if the verification features match, the first terminal device is allowed to use the digital human service, for example, presenting the digital human image of the first user as the communication subject of the first user to the opposite terminal device during a call;
- if the verification features do not match, the first terminal device is not allowed to use the digital human service.
- not allowing the first terminal device to use the digital human service may be: allowing the first terminal device to use the ordinary call service (that is, presenting the original call screen content of the first user to the peer); or not allowing the first terminal device to use any call service (including digital human calls and ordinary calls), which is not limited in this application.
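The decision described above can be sketched as a small function. The text does not specify how verification features are compared, so the sketch assumes, purely for illustration, that features are numeric vectors compared by cosine similarity against a threshold; the `Service` enum, the `authorize` function, and the threshold value are hypothetical.

```python
import math
from enum import Enum
from typing import Sequence


class Service(Enum):
    DIGITAL_HUMAN_CALL = "digital_human_call"
    ORDINARY_CALL = "ordinary_call"
    NO_CALL = "no_call"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def authorize(collected: Sequence[float], enrolled: Sequence[float],
              threshold: float = 0.95,
              fallback: Service = Service.ORDINARY_CALL) -> Service:
    """Allow the digital human service only if the collected features match the enrolled ones."""
    if cosine_similarity(collected, enrolled) >= threshold:
        return Service.DIGITAL_HUMAN_CALL
    return fallback  # operator policy: ordinary call only, or no call at all


matching = authorize([1.0, 0.0], [1.0, 0.0])
mismatch = authorize([1.0, 0.0], [0.0, 1.0])
blocked = authorize([1.0, 0.0], [0.0, 1.0], fallback=Service.NO_CALL)
```

The `fallback` parameter reflects the two options in the text for a failed match: downgrade to an ordinary call, or refuse any call service.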
- the method shown in FIG. 8C may be executed by the first terminal device, or may be executed by the communication server, which is not limited in this application.
- when the method shown in FIG. 8C is executed by the communication server, it may specifically be executed by a CSCF network element in the IMS network. Further, the CSCF network element may be triggered to perform the above authentication and authorization process by a network element (for example, the MS network element shown in FIG. 1C) in the IMS network or outside the IMS network that is responsible for the digital human service.
- for example, when the MS network element detects that the digital human service of the user with the first phone number is triggered, it sends a notification message to the CSCF network element, instructing the CSCF network element to verify the identity of the current call user (referred to as the current user, that is, the first user) of the first terminal device corresponding to the first phone number; the CSCF network element then obtains the verification features of the first user from the first terminal device, obtains the verification features of the digital human model bound to the phone number of the first terminal device from the digital human management center device, and then executes the verification process.
- after the CSCF network element successfully verifies the identity of the current user of the first terminal device (that is, the verification features of the first user match the verification features of the digital human model), it notifies the MS network element to continue executing the relevant processes for providing digital human services to the user.
- the digital human service of the user with the first phone number being triggered may be, for example, that the MS network element receives a request from the first terminal device to download the digital human model; after the CSCF network element successfully verifies the identity of the current user of the first terminal device, the MS network element delivers the digital human model to the first terminal device.
- the first terminal device uses its private key to add a digital signature to the verification result, and then uploads the signed verification result to the network; the CSCF network element in the network uses the public key of the first terminal device to verify the digital signature, and determines the identity of the current user of the first terminal device according to the verification result uploaded by the first terminal device.
- the embodiment of this application provides a digital asset purchase method, including:
- the digital asset management center creates a public-private key pair K13&K14, where K13 is the private key and K14 is the public key;
- the digital asset management center issues a digital certificate (including the public key K14) to the MS network element;
- the terminal device initiates a request to the digital asset management center to purchase digital assets (which carries the user identity information corresponding to the terminal device);
- the digital asset management center verifies the user's identity information
- the digital asset management center uses the private key K13 to add digital signatures to digital assets and user identity information;
- the digital asset management center records the digital asset identification (Identity Document, ID) and the digital signature corresponding to the digital asset in the user's asset library, and associates the recorded digital asset ID with the digital signature;
- the MS network element detects that the user's digital human service is triggered
- the MS network element receives a message sent by the CSCF network element, indicating that the user requests to use the user's digital human during the call.
- the MS network element sends a request (carrying the digital asset ID) to obtain the user's digital asset to the digital asset management center;
- the digital asset management center searches for the corresponding digital asset and the digital signature associated with the asset according to the digital asset ID sent by the MS network element;
- the digital asset management center returns digital assets and digital signatures to the MS network element
- the MS network element uses the public key K14 to verify the digital signature;
- the digital assets are superimposed on the call screen corresponding to the user and rendered, and the rendered call screen is sent to the other users talking with the user.
- the network refers to the MS network element in the network.
- the following takes as an example the case in which the calling terminal device is terminal device A (the calling user is user a), the called terminal device is terminal device B (the called user is user b), and both user a and user b use the digital human service.
- Example 1 Capturing, reconstructing, rendering, and authentication are implemented on the communication server.
- take as an example the case in which the digital human model is stored in the communication server, what the calling and called users see is the real scene of the opposite end, and both the calling terminal device and the called terminal device support variable viewing angles (that is, the calling terminal device supports presenting the call screen according to the perspective of the calling user, and the called terminal device supports presenting the call screen according to the perspective of the called user).
- the types of data that need to be transmitted between the calling and called terminal devices and the communication server include those shown in FIG. 9A .
- the data transmission between the calling terminal device and the communication server includes:
- the calling terminal device sends calling audio to the communication server
- the calling terminal device sends the calling video (that is, the original video of the calling user, including the real image of the calling user) to the communication server;
- the calling terminal device sends the calling scene to the communication server
- the calling terminal device sends the calling viewing angle information to the communication server;
- the communication server returns the calling rendered video (including the digital human image of the calling user) to the calling terminal device;
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called rendered video (including the digital human image of the called user) to the calling terminal device.
- the data transmission between the called terminal device and the communication server includes:
- the called terminal device sends the called audio to the communication server
- the called terminal device sends the called video (that is, the original video of the called user, including the real image of the called user) to the communication server;
- the called terminal device sends the called scene to the communication server;
- the called terminal device sends the called perspective information to the communication server;
- the communication server returns the called rendered video to the called terminal device;
- the communication server sends the calling audio to the called terminal device;
- the communication server sends the calling rendered video (including the digital human image of the calling user and the real scene of the calling user) to the called terminal device.
- For ease of description, the original video of the calling user (including the real image of the calling user and the real scene) is referred to as the "calling video", and the video of the calling user obtained through the digital human processing operation (including the digital human image and/or virtual scene picture of the calling user) is referred to as the "calling rendered video".
- Likewise, the original video of the called user (including the real image and real scene of the called user) is referred to as the "called video", and the video of the called user obtained through the digital human processing operation (including the digital human image and/or virtual scene picture of the called user) is referred to as the "called rendered video".
- FIG. 9B to FIG. 9C show the communication flow between the calling user, the called user, and the communication server. The flow is described as follows:
- the calling terminal device sends a message A (the message type is a SIP message) to the communication server, and the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S903 after the verification is passed;
- the communication server sends a message B (the message type is a SIP message) to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the communication server and the called terminal device;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C (the message type is a SIP message) to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D (the message type is a SIP message) to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E (the message type is a SIP message) to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F (the message type is a SIP message) to the called terminal device, which carries the following parameter: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message is used to indicate that the parameter interaction is completed;
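- The negotiation exchange above can be written down as plain data to make the relay pattern visible. The following is a minimal sketch, not a SIP implementation: the party labels, the `SipMessage` class, and the parameter names in `carries` are hypothetical names chosen for illustration, and the third parameter of message B follows the wording of the parallel step in Example 2.

```python
from dataclasses import dataclass

# Hypothetical labels for the three parties in the flow.
CALLER = "calling_terminal"
SERVER = "communication_server"
CALLEE = "called_terminal"


@dataclass(frozen=True)
class SipMessage:
    name: str       # "A".."F" or "OK", as named in the flow above
    sender: str
    receiver: str
    carries: tuple  # illustrative parameter names, not a real SIP encoding


NEGOTIATION_FLOW = [
    SipMessage("A", CALLER, SERVER, ("calling_user_requirements",
                                     "caller_callee_negotiation_content",
                                     "caller_server_negotiation_content")),
    SipMessage("B", SERVER, CALLEE, ("calling_user_requirements",
                                     "caller_callee_negotiation_content",
                                     "server_callee_negotiation_content")),
    SipMessage("C", CALLEE, SERVER, ("called_terminal_negotiation_result",)),
    SipMessage("D", SERVER, CALLER, ("called_terminal_negotiation_result",
                                     "server_caller_negotiation_result")),
    SipMessage("E", CALLER, SERVER, ("calling_terminal_negotiation_result",)),
    SipMessage("F", SERVER, CALLEE, ("calling_terminal_negotiation_result",)),
    SipMessage("OK", CALLEE, SERVER, ()),
    SipMessage("OK", SERVER, CALLER, ()),
]


def relayed_through_server(flow):
    """True if every hop has the communication server as exactly one endpoint."""
    return all((m.sender == SERVER) != (m.receiver == SERVER) for m in flow)


def received_by(party, flow=NEGOTIATION_FLOW):
    """Names of the messages that a given party receives, in order."""
    return [m.name for m in flow if m.receiver == party]
```

- In this sketch the two terminal devices never address each other directly; every message passes through the communication server, which is what allows the server to authenticate each side before forwarding.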
- the calling terminal device establishes a transmission channel with the communication server;
- Step S913: the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, backhaul_channel, viewpoint_channel.
- the calling terminal device sends calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling video to the communication server based on the video_channel;
- the calling terminal device sends the calling perspective information to the communication server based on the viewpoint_channel;
- the communication server captures driving parameters (such as expressions, actions, etc.) and scene information from the video, and then reconstructs the digital human of the calling user based on information such as voices and expressions;
- the communication server sends the calling rendered video to the calling terminal device based on the backhaul_channel;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling rendered video to the called terminal device based on the video_channel;
- the called terminal device can play the call screen content of the calling party. For example, after the called terminal device receives the calling rendered video, it can load it on the display screen for display and play the calling audio at the same time.
- after the calling terminal device receives the calling rendered video, it can also load it on the display screen for display and play the calling audio at the same time, so that the calling user can review the call screen content at the local end.
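- The server-side handling of the calling direction in Example 1 can be sketched as follows. This is a minimal illustration under stated assumptions: `capture`, `reconstruct`, and `render` are placeholder functions, `Outbox` is a hypothetical stand-in for the real transport, and viewpoint handling is left out because the steps above do not state which perspective is applied to which stream.

```python
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Records (channel, receiver, payload) tuples instead of doing network I/O."""
    sent: list = field(default_factory=list)

    def send(self, channel, receiver, payload):
        self.sent.append((channel, receiver, payload))


def capture(video_frame):
    # Placeholder: extract driving parameters (expression, action) and scene information.
    driving = {"expression": video_frame["expression"], "action": video_frame["action"]}
    return driving, video_frame["scene"]


def reconstruct(model, driving):
    # Placeholder: drive the stored digital human model with the captured parameters.
    return {"model": model, **driving}


def render(digital_human, scene):
    # Placeholder: place the digital human into the real scene of the same user.
    return {"digital_human": digital_human, "scene": scene}


def serve_calling_direction(outbox, model, audio, video_frame):
    driving, scene = capture(video_frame)
    digital_human = reconstruct(model, driving)
    rendered = render(digital_human, scene)
    # Channel names follow the steps above.
    outbox.send("backhaul_channel", "calling_terminal", rendered)
    outbox.send("audio_channel", "called_terminal", audio)
    outbox.send("video_channel", "called_terminal", rendered)
    return rendered
```

- The same rendered result goes to both terminals in this sketch: over the backhaul_channel to the calling terminal device for local review, and over the video_channel to the called terminal device for display.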
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called video to the communication server based on the video_channel;
- the called terminal device sends the called perspective information to the communication server based on the viewpoint_channel;
- the communication server captures driving parameters and scene information from the video, and then rebuilds the digital human of the called user based on the driving parameters;
- the communication server sends the called rendered video to the called terminal device based on the backhaul_channel;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel.
- the calling terminal device can play the call screen content of the called party. For example, after receiving the called rendered video, the calling terminal device can load it on the display screen for display and play the called audio at the same time.
- after receiving the called rendered video, the called terminal device can also load it on the display screen for display and play the called audio at the same time, so that the called user can review the call screen content at the local end.
- Through the above process, the terminal device can express the user's call requirements to the communication server and the peer terminal device, achieving the technical effect of providing the call screen content of the local end to the peer end according to the user's requirements and presenting the call screen content provided by the peer end on the local end according to the user's requirements. That is, both the calling and called users communicate in the image of a digital human, and each sees the digital human of the opposite end in the real scene of the opposite end.
- In addition, the call additional service capability information of the communication server and the terminal devices clarifies the division of labor between the terminal device and the communication server (that is, what operations each device needs to perform) in the process of providing digital human services to users, so that capture, reconstruction, rendering, and authentication (such as person-number consistency authentication) are all implemented on the communication server, and the data transmission channels required by the digital human service are established according to this division of labor. Authentication also ensures that the digital human service is used safely and legally in the call scenario.
- Example 2: Capture is implemented on the terminal device, and authentication, reconstruction, and rendering are implemented on the communication server.
- This example assumes that the digital human model is stored on the communication server, that capture is implemented on the terminal device, and that reconstruction and rendering are implemented on the communication server. The calling and called users each see the real scene of the opposite end, and neither the calling terminal device nor the called terminal device supports variable viewing angles.
- The types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 10A.
- the data transmission between the calling terminal device and the communication server includes:
- the calling terminal device sends the calling driving parameters to the communication server;
- the calling terminal device sends the calling audio to the communication server;
- the calling terminal device sends the calling video to the communication server;
- the calling terminal device sends the calling scene to the communication server;
- the communication server returns the calling rendered video to the calling terminal device;
- the communication server sends the called audio to the calling terminal device;
- the communication server sends the called rendered video to the calling terminal device.
- the data transmission between the called terminal device and the communication server includes:
- the called terminal device sends the called driving parameters to the communication server;
- the called terminal device sends the called audio to the communication server;
- the called terminal device sends the called video to the communication server;
- the called terminal device sends the called scene to the communication server;
- the communication server returns the called rendered video to the called terminal device;
- the communication server sends the calling audio to the called terminal device;
- the communication server sends the calling rendered video to the called terminal device.
- the calling terminal device sends a message A to the communication server, and the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S1003 after the verification is passed;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the communication server and the called terminal device;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message is used to indicate that the parameter interaction is completed;
- the calling terminal device establishes a transmission channel with the communication server;
- Step S1013: the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, backhaul_channel, action_channel.
- the calling terminal device sends calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling video to the communication server based on the video_channel;
- the calling terminal device captures the calling driving parameters;
- the calling terminal device sends the calling driving parameters to the communication server based on the action_channel;
- the communication server reconstructs the calling digital human based on the calling driving parameters;
- the communication server renders the reconstructed calling digital human together with the scene of the calling user to obtain a rendered image;
- the communication server encodes the rendered image to obtain the calling rendered video;
- the communication server sends the calling rendered video to the calling terminal device based on the backhaul_channel;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling rendered video to the called terminal device based on the video_channel;
- steps S1019.1, S1019.2, and S1019.3 may be performed in any order.
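- Because the three sub-steps have no required order, a server could issue them concurrently. The following is a minimal sketch using Python's `asyncio.gather`, assuming that S1019.1 to S1019.3 refer to the three send steps listed immediately before this note; the `send` coroutine, the dictionary shapes, and the "encoded" tuple are hypothetical stand-ins for real transport and codec operations.

```python
import asyncio


async def send(log, channel, receiver, payload):
    # Stand-in for a real network send: yield control once, then record the call.
    await asyncio.sleep(0)
    log.append((channel, receiver, payload))


async def handle_calling_side(log, model, driving_params, scene, audio):
    # Reconstruct the calling digital human from the driving parameters
    # captured on the terminal device (placeholder logic).
    digital_human = {"model": model, "driving": driving_params}
    # Render the digital human together with the calling user's scene.
    rendered_image = {"digital_human": digital_human, "scene": scene}
    # Encode the rendered image to obtain the calling rendered video.
    rendered_video = ("encoded", rendered_image)
    # The three sends do not depend on each other, so no order is imposed.
    await asyncio.gather(
        send(log, "backhaul_channel", "calling_terminal", rendered_video),
        send(log, "audio_channel", "called_terminal", audio),
        send(log, "video_channel", "called_terminal", rendered_video),
    )
    return rendered_video


log = []
asyncio.run(handle_calling_side(log, "model-1", {"expression": "smile"}, "office", b"pcm"))
```

- Only the reconstruct, render, and encode steps are sequential in this sketch, since each consumes the output of the previous one.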
- after the called terminal device receives the calling rendered video, it can load it on the display screen for display and play the calling audio at the same time.
- after the calling terminal device receives the calling rendered video, it can also load it on the display screen for display and play the calling audio at the same time, so that the calling user can review the call screen content at the local end.
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called video to the communication server based on the video_channel;
- the called terminal device captures the called driving parameters;
- the called terminal device sends the called driving parameters to the communication server;
- the communication server reconstructs the called digital human based on the called driving parameters;
- the communication server renders the reconstructed called digital human together with the scene of the called user to obtain a rendered image;
- the communication server encodes the rendered image to obtain the called rendered video;
- the communication server sends the called rendered video to the called terminal device based on the backhaul_channel;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel.
- after the calling terminal device receives the called rendered video, it can load it on the display screen for display and play the called audio at the same time.
- after receiving the called rendered video, the called terminal device can also load it on the display screen for display and play the called audio at the same time, so that the called user can review the call screen content at the local end.
- Through the above process, the terminal device can express the user's call requirements to the communication server and the peer terminal device, achieving the technical effect of providing the call screen content of the local end to the peer end according to the user's requirements and presenting the call screen content provided by the peer end on the local end according to the user's requirements. That is, both the calling and called users communicate in the image of a digital human, and each sees the digital human of the opposite end in the real scene of the opposite end.
- In addition, the call additional service capability information of the communication server and the terminal devices clarifies the division of labor between the terminal device and the communication server (that is, what operations each device needs to perform) in the process of providing digital human services to users, so that capture is implemented on the terminal device, while reconstruction, rendering, and authentication (such as person-number consistency authentication) are implemented on the communication server, and the data transmission channels required by the digital human service are established according to this division of labor. Authentication also ensures that the digital human service is used safely and legally in the call scenario.
- Example 3: Reconstruction is implemented on the terminal device, and authentication, capture, and rendering are implemented on the communication server.
- This example assumes that the digital human model is stored on the communication server, that capture and rendering are implemented on the communication server, and that reconstruction is implemented on the terminal device.
- The calling and called users each see the real scene of the opposite end, and neither the calling terminal device nor the called terminal device supports variable viewing angles.
- The types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 11A.
- the data transmission between the calling terminal device and the communication server includes:
- the calling terminal device sends calling audio to the communication server
- the calling terminal device sends the calling video to the communication server
- the calling terminal device sends the calling scene to the communication server
- the communication server returns the calling rendered video to the calling terminal device;
- the communication server sends the calling digital human model to the calling terminal device;
- the communication server sends the calling driving parameters to the calling terminal device;
- the calling terminal device sends the reconstructed calling digital human (that is, the dynamic data of the calling digital human) to the communication server;
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called rendered video to the calling terminal device.
- the data transmission between the called terminal device and the communication server includes:
- the called terminal device sends the called audio to the communication server
- the called terminal device sends the called video to the communication server;
- the called terminal device sends the called scene to the communication server;
- the communication server returns the called rendered video to the called terminal device;
- the communication server sends the called digital human model to the called terminal device;
- the communication server sends the called driving parameters to the called terminal device;
- the called terminal device sends the reconstructed called digital human (that is, the dynamic data of the called digital human) to the communication server;
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling rendering video to the called terminal device.
- FIG. 11B to FIG. 11C show the communication flow between the calling user, the called user, and the communication server. The flow is described as follows:
- the calling terminal device sends a message A to the communication server, and the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S1103 after the verification is passed;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the communication server and the called terminal device;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message is used to indicate that the parameter interaction is completed;
- Step S1112: the calling terminal device establishes a transmission channel with the communication server;
- Step S1113: the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, backhaul_channel, action_channel, digitalman_channel, digitalman_model_channel.
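- The channel lists given for Examples 1 to 3 are consistent with a simple rule that maps the division of labor to the set of channels to create. The rule below is inferred from those three lists and is an assumption, not something the text states; the function name and its parameters are hypothetical.

```python
BASE_CHANNELS = ["audio_channel", "video_channel", "backhaul_channel"]


def required_channels(capture_on, reconstruct_on, variable_viewpoint):
    """Derive the channel list from where capture and reconstruction run.

    capture_on / reconstruct_on: "terminal" or "server".
    variable_viewpoint: whether the terminals support variable viewing angles.
    """
    channels = list(BASE_CHANNELS)
    if variable_viewpoint:
        # Perspective information travels from the terminal to the server.
        channels.append("viewpoint_channel")
    if capture_on != reconstruct_on:
        # Driving parameters must cross the network between capture and reconstruction.
        channels.append("action_channel")
    if reconstruct_on == "terminal":
        # The terminal uploads the reconstructed digital human and downloads the model.
        channels += ["digitalman_channel", "digitalman_model_channel"]
    return channels
```

- With these inputs the function reproduces the three lists above: Example 1 (capture and reconstruction on the server, variable viewing angles), Example 2 (capture on the terminal), and Example 3 (reconstruction on the terminal). Whether the rule extends to other divisions of labor is not established by these examples alone.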
- the calling terminal device downloads the calling digital human model from the communication server based on the digitalman_model_channel;
- the called terminal device downloads the called digital human model from the communication server based on the digitalman_model_channel;
- the calling terminal device sends calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling video to the communication server based on the video_channel;
- the communication server captures the calling driving parameters;
- the communication server sends the calling driving parameters to the calling terminal device based on the action_channel;
- the calling terminal device reconstructs the calling digital human based on the calling driving parameters;
- the calling terminal device sends the reconstructed calling digital human to the communication server based on the digitalman_channel;
- the communication server performs a consistency check on the calling digital human;
- after the consistency check of the calling digital human is passed, the communication server renders the reconstructed calling digital human together with the scene of the calling user to obtain a rendered image, and encodes the rendered image to obtain the calling rendered video;
- the communication server sends the calling rendered video to the calling terminal device based on the backhaul_channel;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling rendered video to the called terminal device based on the video_channel;
- after the called terminal device receives the calling rendered video, it can load it on the display screen for display and play the calling audio at the same time.
- after the calling terminal device receives the calling rendered video, it can also load it on the display screen for display and play the calling audio at the same time, so that the calling user can review the call screen content at the local end.
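- In Example 3 the communication server renders only after the reconstructed digital human returned by the terminal device passes a consistency check. The text does not say what the check compares, so the sketch below assumes, purely for illustration, that the server matches the model identifier and the driving-parameter sequence number it issued against those reported by the terminal device. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconstructedDigitalHuman:
    model_id: str    # which digital human model the terminal device used (assumed field)
    params_seq: int  # which driving-parameter message it was driven by (assumed field)
    data: bytes      # dynamic data of the reconstructed digital human


class ConsistencyError(Exception):
    """Raised when the uploaded digital human does not match what the server issued."""


def check_consistency(dh, issued_model_id, sent_params_seq):
    # Assumed check: both identifiers must match what the server sent earlier.
    return dh.model_id == issued_model_id and dh.params_seq == sent_params_seq


def render_if_consistent(dh, issued_model_id, sent_params_seq, scene):
    if not check_consistency(dh, issued_model_id, sent_params_seq):
        raise ConsistencyError("reconstructed digital human failed the consistency check")
    rendered_image = {"digital_human": dh.data, "scene": scene}  # render (placeholder)
    return ("encoded", rendered_image)                           # encode (placeholder)
```

- Gating the render step this way keeps rendering and distribution under the server's control even though reconstruction runs on the terminal device.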
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called video to the communication server based on the video_channel;
- the communication server captures the called driving parameters;
- the communication server sends the called driving parameters to the called terminal device based on the action_channel;
- the called terminal device reconstructs the called digital human based on the called driving parameters;
- the called terminal device sends the reconstructed called digital human to the communication server based on the digitalman_channel;
- the communication server performs a consistency check on the called digital human;
- after the called digital human passes the consistency check, the communication server renders the reconstructed called digital human together with the scene of the called user to obtain a rendered image, and encodes the rendered image to obtain the called rendered video;
- the communication server sends the called rendered video to the called terminal device based on the backhaul_channel;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel.
- after receiving the called rendered video, the calling terminal device can load it on the display screen for display and play the called audio at the same time.
- after the called terminal device receives the called rendered video, it can also load it on the display screen for display and play the called audio at the same time, so that the called user can review the call screen content at the local end.
- Through the above process, the terminal device can express the user's call requirements to the communication server and the peer terminal device, achieving the technical effect of providing the call screen content of the local end to the peer end according to the user's requirements and presenting the call screen content provided by the peer end on the local end according to the user's requirements. That is, both the calling and called users communicate in the image of a digital human, and each sees the digital human of the opposite end in the real scene of the opposite end.
- In addition, the call additional service capability information of the communication server and the terminal devices clarifies the division of labor between the terminal device and the communication server (that is, what operations each device needs to perform) in the process of providing digital human services to users, so that reconstruction is implemented on the terminal device, while capture, rendering, and authentication (such as person-number consistency authentication) are implemented on the communication server, and the data transmission channels required by the digital human service are established according to this division of labor. Authentication also ensures that the digital human service is used safely and legally in the call scenario.
- Example 4: Capture and reconstruction are implemented on the terminal device, and authentication and rendering are implemented on the communication server.
- This example assumes that the digital human model is stored on the communication server, that rendering is implemented on the communication server, and that capture and reconstruction are implemented on the terminal device. The calling and called users each see the real scene of the opposite end, and variable viewing angles are not supported.
- The types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 12A.
- the data transmission between the calling terminal device and the communication server includes:
- the calling terminal device sends calling audio to the communication server
- the calling terminal device sends the calling video to the communication server
- the calling terminal device sends the calling scene to the communication server
- the calling terminal device sends the calling perspective information to the communication server;
- the communication server returns the calling rendered video to the calling terminal device;
- the communication server sends the calling digital human model to the calling terminal device;
- the calling terminal device sends the reconstructed calling digital human to the communication server;
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called rendered video to the calling terminal device.
- the data transmission between the called terminal device and the communication server includes:
- the called terminal device sends the called audio to the communication server
- the called terminal device sends the called video to the communication server;
- the called terminal device sends the called scene to the communication server;
- the called terminal device sends the called perspective information to the communication server;
- the communication server returns the called rendered video to the called terminal device
- the communication server sends the called digital human model to the called terminal device
- the called terminal device sends the reconstructed called digital human to the communication server;
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling rendering video to the called terminal device.
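The calling-side flows listed above can be written down as a small table and split by direction. This is only an illustrative sketch: the `Flow` type, the endpoint labels, and the helper names are hypothetical and do not come from the application; the payload names follow the list for Example 4.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    sender: str
    receiver: str
    payload: str


# Flows between the calling terminal device and the communication server,
# transcribed from the Example 4 list (labels are illustrative).
CALLING_FLOWS: List[Flow] = [
    Flow("calling_terminal", "server", "calling audio"),
    Flow("calling_terminal", "server", "calling video"),
    Flow("calling_terminal", "server", "calling scene"),
    Flow("calling_terminal", "server", "calling perspective information"),
    Flow("server", "calling_terminal", "calling rendered video"),
    Flow("server", "calling_terminal", "calling digital human model"),
    Flow("calling_terminal", "server", "reconstructed calling digital human"),
    Flow("server", "calling_terminal", "called audio"),
    Flow("server", "calling_terminal", "called rendered video"),
]


def uplink(flows: List[Flow]) -> List[str]:
    """Payloads the terminal uploads to the server."""
    return [f.payload for f in flows if f.receiver == "server"]


def downlink(flows: List[Flow]) -> List[str]:
    """Payloads the server delivers to the terminal."""
    return [f.payload for f in flows if f.sender == "server"]
```

The called side mirrors this table with "calling" and "called" swapped.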
- the calling terminal device sends a message A to the communication server.
- the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S1203 after the verification passes;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation between the communication server and the called terminal device content;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message indicates that the parameter interaction is complete;
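The message exchange above (messages A to F followed by the two "OK" messages) can be traced with a short sketch. It assumes, purely for illustration, that the exchange stops when a person-number consistency authentication fails; the source only states that the next step runs after verification passes. The function name and endpoint labels are hypothetical.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (name, sender, receiver)


def negotiate(caller_verified: bool, callee_verified: bool) -> List[Message]:
    """Return the messages sent during parameter negotiation, in order."""
    trace: List[Message] = [("A", "calling_terminal", "server")]
    # Server checks that the calling user is the holder of the calling number.
    if not caller_verified:
        return trace  # assumption: negotiation is abandoned on failure
    trace.append(("B", "server", "called_terminal"))
    trace.append(("C", "called_terminal", "server"))
    # Server checks that the called user is the holder of the called number.
    if not callee_verified:
        return trace  # assumption: negotiation is abandoned on failure
    trace.append(("D", "server", "calling_terminal"))
    trace.append(("E", "calling_terminal", "server"))
    trace.append(("F", "server", "called_terminal"))
    trace.append(("OK", "called_terminal", "server"))
    trace.append(("OK", "server", "calling_terminal"))
    return trace
```

With both checks passing, the trace contains eight messages; channel establishment would follow only after the final "OK".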
- the calling terminal device establishes a transmission channel with the communication server;
- the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, backhaul_channel, digitalman_channel, digitalman_model_channel.
- the calling terminal device downloads the calling digital human model from the communication server based on digitalman_model_channel;
- the called terminal device downloads the called digital human model from the communication server based on the digitalman_model_channel;
- the calling terminal device captures the calling driving parameters;
- the calling terminal device reconstructs the calling digital human based on the calling driving parameters;
- the calling terminal device sends the reconstructed calling digital human to the communication server based on the digitalman_channel;
- the calling terminal device sends the calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling video to the communication server based on the video_channel;
- the communication server performs a consistency check on the calling digital human;
- the communication server renders the reconstructed calling digital human together with the calling user's scene to obtain a rendered picture, and encodes the rendered picture to obtain the calling rendered video;
- the communication server sends the calling party rendering video to the calling terminal device based on the backhaul_channel;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling party rendering video to the called terminal device based on the video_channel;
- after the called terminal device receives the calling rendered video, it can load the video on the display screen for display, and can also play the calling audio.
- after the calling terminal device receives the calling rendered video, it can likewise load the video on the display screen for display and play the calling audio, so that the calling user can review the call picture content at the local end.
- the called terminal device captures the called driving parameters;
- the called terminal device reconstructs the called digital human based on the called driving parameters;
- the called terminal device sends the reconstructed called digital human to the communication server based on the digitalman_channel;
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called video to the communication server based on the video_channel;
- the communication server performs a consistency check on the called digital human;
- after the called digital human passes the consistency check, the communication server renders the reconstructed called digital human together with the called user's scene to obtain a rendered picture, and encodes the rendered picture to obtain the called rendered video;
- the communication server sends the called rendered video to the called terminal device based on the backhaul_channel;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel.
- after the calling terminal device receives the called rendered video, it can load the video on the display screen for display, and can also play the called audio.
- after the called terminal device receives the called rendered video, it can likewise load the video on the display screen for display and play the called audio, so that the called user can review the call picture content at the local end.
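The per-side media path of Example 4 (reconstruction on the terminal, consistency check and rendering on the server, then delivery over backhaul_channel and video_channel) can be sketched as follows. All types and function names are hypothetical, the consistency check is a placeholder because the source does not describe how it is performed, and dropping the frame on a failed check is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DigitalHuman:
    owner: str               # "calling" or "called"
    pose: Dict[str, float]   # driving parameters applied to the model


def reconstruct(owner: str, driving_params: Dict[str, float]) -> DigitalHuman:
    """Terminal side: apply captured driving parameters to the downloaded model."""
    return DigitalHuman(owner=owner, pose=dict(driving_params))


def consistency_check(human: DigitalHuman, expected_owner: str) -> bool:
    """Server side: placeholder check (method not specified in the source)."""
    return human.owner == expected_owner


def render_and_route(human: DigitalHuman, scene: str,
                     expected_owner: str) -> List[Tuple[str, str, str]]:
    """Server side: check, render, and list (channel, receiver, video) deliveries."""
    if not consistency_check(human, expected_owner):
        return []  # assumption: nothing is delivered when the check fails
    video = f"video[{human.owner} digital human in {scene}]"
    peer = "called" if human.owner == "calling" else "calling"
    return [
        ("backhaul_channel", human.owner, video),  # returned to the sender
        ("video_channel", peer, video),            # forwarded to the peer
    ]


human = reconstruct("calling", {"head_yaw": 0.1})
deliveries = render_and_route(human, "calling scene", "calling")
```

The same two functions cover the called side by passing `"called"` as the owner.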
- the terminal device can express the user's call needs to the communication server and the peer terminal device, realizing the provision of the call screen content of the local end to the peer end according to the user's demand and presenting the content provided by the peer end on the local end according to the user's demand.
- the call additional service capability information of the communication server and the terminal device clarifies the division of labor between the terminal device and the communication server (that is, which operations each device needs to perform) in the process of providing the digital human service to users, so that capture and reconstruction are implemented on the terminal device, while rendering and authentication and authorization (such as person-number consistency authentication) are implemented on the communication server, and the data transmission channels required for the digital human service are established according to this division of labor; in addition, authentication and authorization ensure that the digital human service is used safely and legally in call scenarios.
- Example 5 Capture, reconstruction, and rendering are implemented on the terminal device, and authentication and authorization are implemented on the communication server.
- the digital human model is stored on the communication server, and capture, reconstruction, and rendering are all implemented on the terminal device. What the calling and called users see is the real scene of the opposite end.
- the case where both the calling terminal device and the called terminal device support variable viewing angles is taken as an example.
- the terminal device can render the content displayed on the local end (client rendering mode), and can also render the content displayed on the peer end (server rendering mode).
- the client rendering mode of the called terminal device means that the called user's client renders the content seen by the called user, such as the calling user's digital human in the called user's real/virtual scene;
- the server rendering mode of the called terminal device means that the called user's client renders the content seen by the calling user, such as the called user's digital human in the called user's real/virtual scene;
- the client rendering mode of the calling terminal device means that the calling user's client renders the content seen by the calling user, such as the called user's digital human in the calling user's real/virtual scene;
- the server rendering mode of the calling terminal device means that the calling user's client renders the content seen by the called user, such as the calling user's digital human in the calling user's real/virtual scene.
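The four mode definitions above follow one pattern: in client rendering mode a terminal renders what its own user sees (the peer's digital human in the local scene), and in server rendering mode it renders what the peer sees (the local digital human in the local scene). A minimal sketch of that rule, with hypothetical names `RenderMode`, `View`, and `rendered_view`:

```python
from enum import Enum
from typing import NamedTuple


class RenderMode(Enum):
    CLIENT = "client"
    SERVER = "server"


class View(NamedTuple):
    viewer: str         # whose screen shows the rendered content
    digital_human: str  # whose digital human appears in it
    scene: str          # whose real/virtual scene is used


def peer_of(side: str) -> str:
    return "called" if side == "calling" else "calling"


def rendered_view(renderer: str, mode: RenderMode) -> View:
    """Describe what the terminal on `renderer` side renders in the given mode."""
    peer = peer_of(renderer)
    if mode is RenderMode.CLIENT:
        # Renders for its own user: peer's digital human in the local scene.
        return View(viewer=renderer, digital_human=peer, scene=renderer)
    # Renders for the peer: local digital human in the local scene.
    return View(viewer=peer, digital_human=renderer, scene=renderer)
```

For instance, `rendered_view("called", RenderMode.SERVER)` describes the calling user viewing the called user's digital human in the called user's scene.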
- the mode of the calling and called terminal equipment can be determined according
- the data transmission between the calling terminal device and the communication server may include:
- the calling terminal device sends calling audio to the communication server
- the calling terminal device sends the calling video to the communication server
- the communication server sends the calling scene to the calling terminal device
- the calling terminal device sends the calling driving parameters to the communication server;
- the communication server sends the called digital human model to the calling terminal device
- the communication server sends the called driving parameters to the calling terminal device
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called video to the calling terminal device
- the communication server sends the called scene to the calling terminal device
- the calling terminal device sends the called rendering video to the communication server;
- the communication server sends the calling rendering video to the calling terminal device.
- the data transmission between the called terminal device and the communication server includes:
- the called terminal device sends the called audio to the communication server
- the called terminal device sends the called video to the communication server;
- the communication server sends the called scene to the called terminal device
- the called terminal device sends the called driving parameters to the communication server
- the communication server sends the calling digital human model to the called terminal device
- the communication server sends the calling driving parameters to the called terminal device
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling video to the called terminal device
- the communication server sends the calling scene to the called terminal device
- the called terminal device sends the calling rendering video to the communication server
- the communication server sends the called rendered video to the called terminal device.
- the calling terminal device and the communication server transmit the calling user's digital human model and the calling user's digital human driving parameters to the called terminal device; the called terminal device reconstructs the calling user's digital human according to the calling user's digital human model and driving parameters; the called terminal device then renders and displays the calling user's digital human in the called user's scene.
- the calling terminal device and the communication server transmit the calling user's digital human model and the calling user's digital human driving parameters to the called terminal device; the calling terminal device and the communication server transmit the calling user's real/virtual scene and perspective to the called terminal device; the called terminal device reconstructs the calling user's digital human according to the calling user's digital human model and driving parameters; the called terminal device renders and displays the result according to the calling user's digital human and the calling user's real/virtual scene and perspective.
- the calling and called terminal devices can also each establish two rendering stream channels with the communication server: one for uploading the peer's picture rendered at the local end, and one for receiving the local end's picture rendered at the peer end.
- the calling terminal device needs to receive the calling user scene from the communication server only when the calling user's virtual scene is stored on the communication server; likewise, the called terminal device needs to receive the called user scene from the communication server only when the called user's virtual scene is stored on the communication server.
- taking the case where both the calling and called terminal devices are in server rendering mode as an example, see FIG. 13B, which shows an example of the data types that need to be transmitted between the calling and called users and the communication server.
- the data transmission between the calling terminal device and the communication server may include:
- the calling terminal device sends calling audio to the communication server
- the communication server sends the calling scene to the calling terminal device
- the communication server sends the called digital human model to the calling terminal device
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called rendered video to the calling terminal device
- the communication server sends the called scene to the calling terminal device
- the communication server sends the called perspective information to the calling terminal device
- the calling terminal device sends the calling rendering video to the communication server.
- the data transmission between the called terminal device and the communication server may include:
- the called terminal device sends the called audio to the communication server
- the communication server sends the called scene to the called terminal device
- the communication server sends the calling digital human model to the called terminal device
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling rendering video to the called terminal device
- the communication server sends the calling scene to the called terminal device
- the communication server sends the calling perspective information to the called terminal device
- the called terminal device sends the called rendered video to the communication server.
- the called terminal device and the communication server transmit the called user's real/virtual scene and perspective to the calling terminal device; the calling terminal device reconstructs the calling user's digital human according to the calling user's digital human model and driving parameters; the calling terminal device renders according to the calling user's digital human and the called user's real/virtual scene and perspective; the calling terminal device encodes the rendering result and transmits it to the called terminal device; the called terminal device decodes and displays the rendering result.
- the called terminal device and the communication server transmit the called user's perspective to the calling terminal device; the calling terminal device reconstructs the calling user's digital human according to the calling user's digital human model and driving parameters; the calling terminal device renders according to the calling user's digital human and the real/virtual scene and perspective; the calling terminal device encodes the rendering result and transmits it to the called terminal device; the called terminal device decodes and displays the rendering result.
- the rendering modes of the calling and called terminals may be inconsistent: for example, the calling terminal is in server rendering mode while the called terminal is in client rendering mode, or the called terminal is in server rendering mode while the calling terminal is in client rendering mode.
- the called user sees the calling user's digital human in the calling user's virtual scene
- the calling user sees the called user's digital human in the calling user's real scene
- the rendered videos of the calling user and the called user are both processed on the calling terminal device, which places high performance requirements on the calling terminal device. The specific process and channels are not repeated here.
- the calling terminal device sends a message A to the communication server.
- the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S1303 after the verification passes;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation between the communication server and the called terminal device content;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message indicates that the parameter interaction is complete;
- the calling terminal device establishes a transmission channel with the communication server;
- the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, digitalman_model_channel, viewpoint_channel.
- the calling terminal device downloads the calling digital human model from the communication server based on the digitalman_model_channel;
- the called terminal device downloads the called digital human model from the communication server based on the digitalman_model_channel;
- the calling terminal device captures the calling driving parameters;
- the calling terminal device reconstructs the calling digital human based on the calling driving parameters;
- the calling terminal device renders the reconstructed calling digital human together with the calling user's scene to obtain a rendered picture, and encodes the rendered picture to obtain the calling rendered video;
- the calling terminal device sends the calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling rendering video to the communication server based on the video_channel;
- the calling terminal device sends the calling perspective information to the communication server based on the viewpoint_channel;
- the communication server performs a consistency check on the calling digital human; it is assumed here that the consistency check passes;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling party rendering video to the called terminal device based on the video_channel;
- the communication server sends the caller's perspective information to the called terminal device based on the viewpoint_channel;
- after the called terminal device receives the calling rendered video, it can load the video on the display screen for display, and can also play the calling audio.
- after the calling terminal device receives the calling rendered video, it can likewise load the video on the display screen for display and play the calling audio, so that the calling user can review the call picture content at the local end.
- the called terminal device captures the called driving parameters;
- the called terminal device reconstructs the called digital human based on the called driving parameters;
- the called terminal device renders the reconstructed called digital human together with the called user's scene to obtain a rendered picture, and encodes the rendered picture to obtain the called rendered video;
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called rendered video to the communication server based on the video_channel;
- the called terminal device sends the called perspective information to the communication server based on the viewpoint_channel;
- the communication server performs a consistency check on the called digital human; it is assumed here that the consistency check passes;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel;
- the communication server sends the called perspective information to the calling terminal device based on the viewpoint_channel.
- after the calling terminal device receives the called rendered video, it can load the video on the display screen for display, and can also play the called audio.
- after the called terminal device receives the called rendered video, it can likewise load the video on the display screen for display and play the called audio, so that the called user can review the call picture content at the local end.
- the terminal device can express the user's call needs to the communication server and the peer terminal device, realizing the provision of the call screen content of the local end to the peer end according to the user's demand and presenting the content provided by the peer end on the local end according to the user's demand.
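Comparing the channel lists of Example 4 and Example 5 shows how moving rendering to the terminal changes what has to be transported. The snippet below only restates the two channel lists from the text as sets; the variable names are illustrative.

```python
# Channel types as listed in the text for each example.
EXAMPLE_4_CHANNELS = {
    "audio_channel", "video_channel", "backhaul_channel",
    "digitalman_channel", "digitalman_model_channel",
}
EXAMPLE_5_CHANNELS = {
    "audio_channel", "video_channel",
    "digitalman_model_channel", "viewpoint_channel",
}

# Channels needed only when the server renders (Example 4).
only_in_4 = sorted(EXAMPLE_4_CHANNELS - EXAMPLE_5_CHANNELS)
# Channels needed only when the terminal renders (Example 5).
only_in_5 = sorted(EXAMPLE_5_CHANNELS - EXAMPLE_4_CHANNELS)
# Channels common to both examples.
shared = sorted(EXAMPLE_4_CHANNELS & EXAMPLE_5_CHANNELS)
```

In Example 5 the terminal uploads an already rendered video, so the digitalman_channel upload and the backhaul_channel return path of Example 4 do not appear, while viewpoint_channel is added to carry perspective information.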
- the call additional service capability information of the communication server and the terminal device clarifies the division of labor between the terminal device and the communication server (that is, which operations each device needs to perform) in the process of providing the digital human service to users, so that capture, reconstruction, and rendering are all implemented on the terminal device, while authentication and authorization (such as person-number consistency authentication) are implemented on the communication server, and the data transmission channels required for the digital human service are established according to this division of labor; in addition, authentication and authorization ensure that the digital human service is used safely and legally in call scenarios.
- Example 6 Reconstruction and rendering are implemented on the terminal device, and authentication and capture are implemented on the communication server.
- the digital human model is stored in the communication server, the capture is realized in the communication server, and the reconstruction and rendering are realized in the terminal device.
- What the calling and called users see is the real scene of the opposite end; the case where neither the calling terminal device nor the called terminal device supports variable viewing angles is taken as an example.
- the calling and called terminal devices no longer need to report the driving parameters of the calling/called user; instead, the communication server performs capture based on the audio and/or video of the calling and called users, generates the driving parameters, and sends them to the calling and called terminal devices. Otherwise there is no difference.
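The division of labor stated in the headings of Examples 4, 5, and 6 can be summarized in a lookup table. This is an illustrative restatement only; the dictionary layout and the helper `operations_on` are hypothetical.

```python
from typing import Dict, List

# Where each operation runs, per the example headings in the text.
DIVISION: Dict[str, Dict[str, str]] = {
    "Example 4": {"capture": "terminal", "reconstruction": "terminal",
                  "rendering": "server", "authentication": "server"},
    "Example 5": {"capture": "terminal", "reconstruction": "terminal",
                  "rendering": "terminal", "authentication": "server"},
    "Example 6": {"capture": "server", "reconstruction": "terminal",
                  "rendering": "terminal", "authentication": "server"},
}


def operations_on(example: str, side: str) -> List[str]:
    """List the operations that run on `side` ("terminal" or "server")."""
    return sorted(op for op, where in DIVISION[example].items() if where == side)
```

For Example 6, `operations_on("Example 6", "server")` lists authentication and capture, which matches the statement that the terminals no longer report driving parameters.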
- the types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 14A .
- the data transmission between the calling terminal device and the communication server may include:
- the calling terminal device sends calling audio to the communication server
- the calling terminal device sends the calling video to the communication server
- the communication server sends the calling scene to the calling terminal device
- the communication server sends the called digital human model to the calling terminal device
- the communication server sends the called driving parameters to the calling terminal device
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called video to the calling terminal device
- the communication server sends the called scene to the calling terminal device
- the calling terminal device sends the called rendering video to the communication server;
- the communication server sends the calling rendering video to the calling terminal device.
- the data transmission between the called terminal device and the communication server may include:
- the called terminal device sends the called audio to the communication server
- the called terminal device sends the called video to the communication server;
- the communication server sends the called scene to the called terminal device
- the communication server sends the calling digital human model to the called terminal device
- the communication server sends the calling driving parameters to the called terminal device
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling video to the called terminal device
- the communication server sends the calling scene to the called terminal device
- the called terminal device sends the calling rendering video to the communication server
- the communication server sends the called rendered video to the called terminal device.
- the calling terminal device sends a message A to the communication server.
- the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs person-number consistency authentication on the calling party, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and executes the next step S1403 after the verification passes;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation between the communication server and the called terminal device content;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs person-number consistency authentication on the called party, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message indicates that the parameter interaction is completed;
- the calling terminal device establishes a transmission channel with the communication server;
- the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: digitalman_model_channel, audio_channel, video_channel, action_channel.
- the calling terminal device downloads the calling digital human model from the communication server based on digitalman_model_channel;
- the called terminal device downloads the called digital human model from the communication server based on the digitalman_model_channel;
- the calling terminal device sends the calling video to the communication server based on the video_channel;
- the communication server captures the calling driver parameters based on the calling video
- the communication server sends the calling driver parameter to the calling terminal device based on the action_channel;
- the calling terminal device reconstructs the calling digital human based on the calling driver parameters; the calling terminal device renders the reconstructed calling digital human and the calling user scene to obtain a rendered picture; encodes the rendered picture to obtain the calling rendering video;
- the calling terminal device sends calling audio to the communication server based on the audio_channel;
- the calling terminal device sends the calling rendering video to the communication server based on the video_channel;
- the communication server performs a consistency check on the calling digital person, assuming that the consistency check passes;
- the communication server sends the calling audio to the called terminal device based on the audio_channel;
- the communication server sends the calling party rendering video to the called terminal device based on the video_channel;
- after the called terminal device receives the calling rendered video, it can display the video on its screen and play the calling audio.
- after the calling terminal device receives the calling rendered video, it can likewise display the video on its screen and play the calling audio, so that the calling user can review the call screen content at the local end.
- the called terminal device sends the called video to the communication server based on the video_channel;
- the communication server captures the called drive parameter based on the called video
- the communication server sends the called drive parameter to the called terminal device based on the action_channel;
- the called terminal device reconstructs the called digital human based on the called driving parameters; the called terminal device renders the reconstructed called digital human and the called user's scene to obtain a rendered picture, and encodes the rendered picture to obtain the called rendered video;
- the called terminal device sends the called audio to the communication server based on the audio_channel;
- the called terminal device sends the called rendered video to the communication server based on the video_channel;
- the communication server performs a consistency check on the called digital human, assuming that the consistency check passes;
- the communication server sends the called audio to the calling terminal device based on the audio_channel;
- the communication server sends the called rendered video to the calling terminal device based on the video_channel.
- after the calling terminal device receives the called rendered video, it can display the video on its screen and play the called audio.
- after the called terminal device receives the called rendered video, it can likewise display the video on its screen and play the called audio, so that the called user can review the call screen content at the local end.
- in this example, the terminal device can express the user's call requirements to the communication server and the peer terminal device, so that the local call screen content is provided to the peer according to the user's requirements and the content provided by the peer is presented locally according to the user's requirements. The technical effect on the call screen content is that both the calling and called users communicate in the image of a digital human, and each sees the peer's digital human in the peer's real scene. In addition, the call additional service capability information of the communication server and the terminal devices clarifies the division of labor between the terminal device and the communication server (that is, which operations each device needs to perform) in providing the digital human service, so that reconstruction and rendering are both implemented on the terminal device, while capture and authentication and authorization (such as the person-number consistency authentication) are implemented on the communication server, and the data transmission channels required for the digital human service are established according to this division of labor. Through authentication and authorization, the digital human service is guaranteed to be used safely and legally in the call scenario.
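The caller-side media path of this example can be sketched as follows. This is a minimal Python illustration only: the `Trace` helper, the function name `caller_uplink`, and the string placeholders standing in for models, video and driving parameters are assumptions introduced here. Only the channel names and the order of the steps come from the description. Dropping the media when the consistency check fails is also an assumption, since the description covers only the passing case.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records (sender, receiver, channel, payload) tuples in order."""
    hops: list = field(default_factory=list)

    def send(self, sender, receiver, channel, payload):
        self.hops.append((sender, receiver, channel, payload))
        return payload


def caller_uplink(trace, consistency_ok=True):
    """Walk the caller-side steps once; return True if media reaches the callee."""
    # The terminal downloads its digital human model from the server.
    model = trace.send("server", "caller", "digitalman_model_channel", "caller_model")
    # The terminal uploads camera video; capture runs on the server (stub).
    video = trace.send("caller", "server", "video_channel", "caller_video")
    params = f"driving_params({video})"
    trace.send("server", "caller", "action_channel", params)
    # Reconstruction and rendering run on the terminal (stubs).
    digital_human = f"reconstruct({model}, {params})"
    rendered = f"render({digital_human}, caller_scene)"
    trace.send("caller", "server", "audio_channel", "caller_audio")
    trace.send("caller", "server", "video_channel", rendered)
    # Assumption: the server forwards nothing if the consistency check fails.
    if not consistency_ok:
        return False
    trace.send("server", "callee", "audio_channel", "caller_audio")
    trace.send("server", "callee", "video_channel", rendered)
    return True
```

The called side mirrors the same steps with the roles swapped.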
- Example 7: Capture and rendering are implemented on the terminal device; authentication and reconstruction are implemented on the communication server.
- the digital human model is stored on the communication server, the capture and rendering are implemented on the terminal device, and the reconstruction is implemented on the communication server.
- take as an example the case where the calling and called users see the real scene of the opposite end, and neither the calling terminal device nor the called terminal device supports a variable viewing angle.
- the communication server no longer needs to send the digital human model to the calling and called terminal devices; instead, the communication server reconstructs the digital humans of the calling and called users according to their driving parameters and sends the reconstructed digital humans to the calling and called terminal devices. The rest is the same.
- the types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 15A .
- the data transmission between the calling terminal device and the communication server may include:
- the calling terminal device sends the calling video to the communication server
- the calling terminal device sends the calling scene to the communication server
- the communication server sends the calling driver parameters to the calling terminal device
- the communication server sends the called digital person to the calling terminal device
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called video to the calling terminal device
- the communication server sends the called scene to the calling terminal device
- the calling terminal device sends the calling rendering video to the communication server;
- the communication server sends the calling rendering video to the calling terminal device.
- the data transmission between the called terminal device and the communication server may include:
- the called terminal device sends the called video to the communication server;
- the called terminal device sends the called scene to the communication server;
- the communication server sends the called drive parameter to the called terminal device
- the communication server sends the calling digital person to the called terminal device
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling video to the called terminal device
- the communication server sends the calling scene to the called terminal device
- the called terminal device sends the called rendering video to the communication server
- the communication server sends the called rendered video to the called terminal device.
- the calling terminal device sends a message A to the communication server.
- the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs consistency authentication on the calling number, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and proceeds to the next step S1503 after the verification passes;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the communication server and the called terminal device;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs consistency authentication on the called number, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and then determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message indicates that the parameter interaction is completed;
- the calling terminal device establishes a transmission channel with the communication server;
- the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, action_channel, digitalman_channel.
- the calling terminal device captures the calling driver parameters
- the calling terminal device sends the calling driver parameter to the communication server based on the action_channel, and sends the calling audio to the communication server based on the audio_channel;
- the communication server reconstructs the calling digital person based on the calling driver parameters and the calling audio;
- the communication server sends the reconstructed calling digital man to the calling terminal device based on the digitalman_channel;
- the calling terminal device renders the reconstructed calling digital human together with the calling user's scene to obtain a rendered image, and encodes the rendered image to obtain the calling rendered video;
- the calling terminal device sends the calling rendering video to the communication server based on the video_channel;
- the communication server performs a consistency check on the calling digital person, assuming that the check is passed;
- the communication server sends the calling audio to the called terminal device based on the audio_channel; sends the calling rendering video to the called terminal device based on the video_channel;
- after the called terminal device receives the calling rendered video, it can display the video on its screen and play the calling audio.
- after the calling terminal device receives the calling rendered video, it can likewise display the video on its screen and play the calling audio, so that the calling user can review the call screen content at the local end.
- the called terminal device captures the called drive parameter
- the called terminal device sends the called drive parameter to the communication server based on the action_channel, and sends the called audio to the communication server based on the audio_channel;
- the communication server reconstructs the called digital person based on the called driving parameter and the called audio;
- the communication server sends the reconstructed called digital man to the called terminal device based on the digitalman_channel;
- the called terminal device renders the reconstructed called digital human together with the called user's scene to obtain a rendered image, and encodes the rendered image to obtain the called rendered video;
- the called terminal device sends the called rendering video to the communication server based on the video_channel;
- the communication server performs a consistency check on the called digital person, assuming that the check is passed;
- after the called digital human passes the consistency check, the communication server sends the called audio to the calling terminal device based on the audio_channel, and sends the called rendered video to the calling terminal device based on the video_channel.
- after the calling terminal device receives the called rendered video, it can display the video on its screen and play the called audio.
- after the called terminal device receives the called rendered video, it can likewise display the video on its screen and play the called audio, so that the called user can review the call screen content at the local end.
- in this example, the terminal device can express the user's call requirements to the communication server and the peer terminal device, so that the local call screen content is provided to the peer according to the user's requirements and the content provided by the peer is presented locally according to the user's requirements.
- the call additional service capability information of the communication server and the terminal devices clarifies the division of labor between the terminal device and the communication server in providing the digital human service (that is, which operations each device needs to perform), so that capture and rendering are both implemented on the terminal device, while reconstruction and authentication and authorization (such as person-number identity verification and digital human identity verification) are implemented on the communication server, and the data transmission channels required for the digital human service are established according to this division of labor. Through authentication and authorization, the digital human service is guaranteed to be used safely and legally in call scenarios.
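The parameter negotiation that precedes channel establishment (messages A to F, followed by the two "OK" messages) can be sketched as an ordered exchange. The Python below is illustrative only: the function name `negotiate` and the tuple layout are assumptions, and stopping the exchange when a number check fails is also an assumption, because the description covers only the case where verification passes.

```python
def negotiate(caller_number_ok, callee_number_ok):
    """Return (log, completed), where log is a list of (sender, receiver, message)."""
    log = [("caller", "server", "A")]        # caller's requirements and negotiation content
    if not caller_number_ok:                 # consistency authentication of the calling number
        return log, False
    log.append(("server", "callee", "B"))
    log.append(("callee", "server", "C"))    # called terminal's negotiation result
    if not callee_number_ok:                 # consistency authentication of the called number
        return log, False
    log.append(("server", "caller", "D"))
    log.append(("caller", "server", "E"))    # calling terminal's confirmed negotiation result
    log.append(("server", "callee", "F"))
    log.append(("callee", "server", "OK"))   # parameter interaction completed
    log.append(("server", "caller", "OK"))
    return log, True
```

Only after the exchange completes do the two terminal devices establish their transmission channels with the communication server.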
- Example 8: Rendering is implemented on the terminal device; authentication and authorization, capture, and reconstruction are implemented on the communication server.
- the digital human model is stored on the communication server, the rendering is implemented on the terminal device, and the capture and reconstruction are implemented on the communication server.
- take as an example the case where the calling and called users see the real scene of the opposite end, and neither the calling terminal device nor the called terminal device supports a variable viewing angle.
- the calling and called terminal devices no longer need to report the driving parameters of the calling and called users' digital humans; instead, the communication server captures the digital human driving parameters from the audio and video of the calling and called users, uses the digital human models of the calling and called users for reconstruction to generate their digital humans, and sends the digital humans to the calling and called terminal devices. The rest is the same.
- the types of data that need to be transmitted between the calling and called users and the communication server include those shown in FIG. 16A .
- the data transmission between the calling terminal device and the communication server may include:
- the calling terminal device sends the calling video to the communication server
- the calling terminal device sends the calling scene to the communication server
- the communication server sends the called digital person to the calling terminal device
- the communication server sends the called audio to the calling terminal device
- the communication server sends the called video to the calling terminal device
- the communication server sends the called scene to the calling terminal device
- the calling terminal device sends the calling rendering video to the communication server;
- the communication server sends the calling rendering video to the calling terminal device.
- the data transmission between the called terminal device and the communication server may include:
- the called terminal device sends the called video to the communication server;
- the called terminal device sends the called scene to the communication server;
- the communication server sends the calling digital person to the called terminal device
- the communication server sends the calling audio to the called terminal device
- the communication server sends the calling video to the called terminal device
- the communication server sends the calling scene to the called terminal device
- the called terminal device sends the called rendering video to the communication server
- the communication server sends the called rendered video to the called terminal device.
- the calling terminal device sends a message A to the communication server.
- the message A carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
- the communication server performs consistency authentication on the calling number, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device, and proceeds to the next step S1603 after the verification passes;
- the communication server sends a message B to the called terminal device, which carries the following parameters: the requirements of the calling user, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the communication server and the called terminal device;
- the called terminal device determines the negotiation result of the called terminal device in combination with the called user's demand, the call additional service capability information of the called terminal device, and the parameters carried in the message B;
- the called terminal device sends a message C to the communication server, which carries the following parameters: the called terminal device negotiation result;
- the communication server performs consistency authentication on the called number, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device, and then determines the negotiation result;
- the communication server sends a message D to the calling terminal device, which carries the following parameters: the negotiation result of the called terminal device, the negotiation result between the communication server and the calling terminal device;
- the calling terminal device confirms the negotiation result of the calling terminal device according to the negotiation result of the called terminal device and the negotiation result between the communication server and the calling terminal device;
- the calling terminal device sends a message E to the communication server, which carries the following parameters: the negotiation result of the calling terminal device;
- the communication server sends a message F to the called terminal device, which carries the following parameters: the negotiation result of the calling terminal device.
- the called terminal device sends an "OK" message to the communication server, and the communication server sends an "OK" message to the calling terminal device, where the "OK" message indicates that the parameter interaction is completed;
- the calling terminal device establishes a transmission channel with the communication server;
- the called terminal device establishes a transmission channel with the communication server;
- the types of data transmission channels that need to be created include: audio_channel, video_channel, digitalman_channel.
- the calling terminal device sends the calling video to the communication server based on the video_channel, and sends the calling audio to the communication server based on the audio_channel;
- the communication server captures the calling driving parameters based on the calling video; reconstructs the calling digital person based on the calling driving parameters and the calling audio;
- the communication server sends the reconstructed calling digital man to the calling terminal device based on the digitalman_channel;
- the calling terminal device renders the reconstructed calling digital human together with the calling user's scene to obtain a rendered image, and encodes the rendered image to obtain the calling rendered video;
- the calling terminal device sends the calling rendering video to the communication server based on the video_channel;
- the communication server performs a consistency check on the calling digital person, assuming that the check is passed;
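Comparing the channel lists given for Examples 6, 7 and 8 suggests that the set of channels follows from where capture and reconstruction run. The Python sketch below encodes that reading. The function name `required_channels` and the rule itself are inferences drawn from the three lists, not statements of the application, and the fourth combination (both operations on the terminal) is an extrapolation that this section does not describe.

```python
def required_channels(capture_on, reconstruct_on):
    """Derive channel types from where capture and reconstruction run.

    Both arguments are "terminal" or "server". Rendering is assumed to stay
    on the terminal, as in Examples 6 to 8.
    """
    channels = {"audio_channel", "video_channel"}
    if reconstruct_on == "terminal":
        # The terminal needs the digital human model stored on the server.
        channels.add("digitalman_model_channel")
    else:
        # The server returns the reconstructed digital human.
        channels.add("digitalman_channel")
    if capture_on != reconstruct_on:
        # Driving parameters must cross the network between the two sides.
        channels.add("action_channel")
    return channels
```

For instance, `required_channels("server", "terminal")` yields the four channel types listed for Example 6.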
Abstract
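Embodiments of this application provide a method, apparatus, and system for providing a call additional service, so as to provide users with additional services, such as a digital human service, in real-time interactive call scenarios. A terminal device obtains call additional service requirement information, where the call additional service requirement information indicates that the user corresponding to the terminal device requires the user's digital human image to be presented in the call application interface of at least one peer terminal device. According to the call additional service requirement information, the terminal device causes a communication server to send the user's digital human content to the at least one peer terminal device, so that the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content; or, according to the call additional service requirement information, the terminal device causes the at least one peer terminal device to generate the user's digital human content, so that the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content.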
Description
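Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111034920.1, filed with the China Patent Office on September 4, 2021 and entitled "Call Method and Apparatus", which is incorporated herein by reference in its entirety; this application also claims priority to Chinese Patent Application No. 202111632624.1, filed with the China Patent Office on December 29, 2021 and entitled "Method, Apparatus, and System for Providing Call Additional Service", which is incorporated herein by reference in its entirety.
This application relates to the field of communication technologies, and in particular to a method, apparatus, and system for providing a call additional service.
As calls become widespread, more and more users choose to interact with others in real time through calls. In recent years, people have raised new requirements for additional call services, such as a digital human service. A digital human service is a software service that maps character features of a real person, such as appearance, movement, or voice, onto a digital human, so that the real person's dynamic character features (including appearance, movement, voice, and so on) can be reproduced on an electronic device through the image of the digital human. Providing a digital human service in a call scenario makes it possible to present the user's digital human image in the call application interface of the terminal device of the other party to the call.
However, existing digital human services are usually processed offline; that is, the digital human's image, behavior, and activity scene are arranged in advance. For example, when a digital human is used as a virtual actor in film and television, the video content containing the digital human is produced at the production stage and is later played on television stations or in cinemas. Because digital human services in the prior art cannot map a digital human to a real person in real time, they have not been adopted in real-time interactive call scenarios.
How to provide users with additional services, including a digital human service, in real-time interactive call scenarios is the technical problem to be solved by this application.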
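Summary of the Invention
Embodiments of this application provide a method, apparatus, and system for providing a call additional service, so as to provide users with additional services, such as a digital human service, in real-time interactive call scenarios.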
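According to a first aspect, a method for providing a call additional service is provided. The method may be applied to a terminal device or to a chip in a terminal device. Taking application to a terminal device as an example, the terminal device conducts a call with at least one peer terminal device through a service provided by a communication server, and the method includes: the terminal device obtains call additional service requirement information, where the call additional service requirement information indicates that the user corresponding to the terminal device requires the user's digital human image to be presented in the call application interface of the at least one peer terminal device; according to the call additional service requirement information, the terminal device causes the communication server to send the user's digital human content to the at least one peer terminal device, so that the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content; or, according to the call additional service requirement information, the terminal device causes the at least one peer terminal device to generate the user's digital human content, so that the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content.
With the above solution, a digital human service can be provided according to the user's requirements in real-time interactive call scenarios. For example, when the user corresponding to the terminal device requires the user's digital human image to be presented in the call application interface of at least one peer terminal device, the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content, which can improve user experience.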
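In a possible implementation, the terminal device causing the communication server to send the user's digital human content to the at least one peer terminal device according to the call additional service requirement information includes: the terminal device generates the user's digital human content according to the call additional service requirement information, and sends the user's digital human content to the communication server, so that the communication server sends the user's digital human content to the at least one peer terminal device; or, the terminal device sends the call additional service requirement information to the communication server, so that the communication server generates the user's digital human content and sends it to the at least one peer terminal device.
In this way, the user's digital human content can be generated by the terminal device or by the communication server.
In a possible implementation, the terminal device causing the at least one peer terminal device to generate the user's digital human content according to the call additional service requirement information includes: the terminal device sends the call additional service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's digital human content.
In this way, the user's digital human content can be generated by the peer terminal device.
In a possible implementation, the call additional service requirement information further indicates that the user corresponding to the terminal device requires the user's virtual scene picture to be presented in the call application interface of the at least one peer terminal device, and the method further includes: according to the call additional service requirement information, the terminal device causes the communication server to send the user's virtual scene content to the at least one peer terminal device, so that the at least one peer terminal device presents the user's virtual scene picture in the call application interface based on the user's virtual scene content; or, according to the call additional service requirement information, the terminal device causes the at least one peer terminal device to generate the user's virtual scene content, so that the at least one peer terminal device presents the user's virtual scene picture in the call application interface based on the user's virtual scene content.
In this way, presenting the user's virtual scene picture in the call application interface of the at least one peer terminal device according to the user's requirements can further improve user experience.
In a possible implementation, the terminal device causing the communication server to send the user's virtual scene content to the at least one peer terminal device according to the call additional service requirement information includes: the terminal device generates the user's virtual scene content according to the call additional service requirement information, and sends the user's virtual scene content to the communication server, so that the communication server sends the user's virtual scene content to the at least one peer terminal device; or, the terminal device sends the call additional service requirement information to the communication server, so that the communication server generates the user's virtual scene content and sends it to the at least one peer terminal device.
In this way, the user's virtual scene content can be generated by the terminal device or by the communication server.
In a possible implementation, the terminal device causing the at least one peer terminal device to generate the user's virtual scene content according to the call additional service requirement information includes: the terminal device sends the call additional service requirement information to the at least one peer terminal device.
In this way, the user's virtual scene content can be generated by the peer terminal device.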
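According to a second aspect, a method for providing a call additional service is provided. The method may be applied to a communication server or to a chip in a communication server. Taking application to a communication server as an example, the communication server serves a call between a terminal device and at least one peer terminal device, and the method includes: the communication server receives call additional service requirement information from the terminal device, where the call additional service requirement information indicates that the user corresponding to the terminal device requires the user's digital human image to be presented in the call application interface of the at least one peer terminal device; according to the call additional service requirement information, the communication server sends the user's digital human content generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the user's digital human image in the call application interface based on the user's digital human content; or, the communication server sends the call additional service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's digital human content and presents the user's digital human image in the call application interface based on the user's digital human content.
In a possible implementation, the call additional service requirement information further indicates that the user corresponding to the terminal device requires the user's virtual scene picture to be presented in the call application interface of the at least one peer terminal device, and the method further includes: according to the call additional service requirement information, the communication server sends the user's virtual scene content generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the user's virtual scene picture in the call application interface based on the user's virtual scene content; or, the communication server sends the call additional service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the user's virtual scene content and presents the user's virtual scene picture in the call application interface based on the user's virtual scene content.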
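According to a third aspect, a method for providing a call additional service is provided. The method may be applied to a terminal device or to a chip in a terminal device. Taking application to a terminal device as an example, the terminal device conducts a call with at least one peer terminal device through a service provided by a communication server, and the method includes: the terminal device sends first call additional service capability information to the communication server, where the first call additional service capability information indicates the digital human service capability of the terminal device; the terminal device receives indication information sent by the communication server, where the indication information indicates the digital human processing operation to be performed by the terminal device, and/or the indication information includes second call additional service capability information and third call additional service capability information, the second call additional service capability information indicating the digital human service capability of the communication server and the third call additional service capability information indicating the digital human service capability of the at least one peer terminal device; and the terminal device performs at least one digital human processing operation, or performs no digital human processing operation, according to the indication information.
This solution makes clear which operations the user's terminal device needs to perform when a digital human service is provided in a real-time interactive scenario, thereby enabling the digital human service to be provided in real-time interactive call scenarios.
In a possible implementation, the digital human processing operation includes one or more of a capture operation, a reconstruction operation, and a rendering operation. The capture operation includes: obtaining driving parameters of the user corresponding to the terminal device, where the driving parameters include at least one of the lip shape, expression, movement, and depth information of the user. The reconstruction operation includes: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and the digital human model of that user, where the digital human image sequence includes multiple frames of images of the driven digital human model. The rendering operation includes: generating the call screen content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.
In this way, the technology stack involved in providing a digital human service in a real-time interactive scenario (capture, reconstruction, and rendering operations) is made explicit, as is the correspondence between the processing capability of the terminal device and the digital human processing operations it can provide.
In a possible implementation, the indication information indicates the second call additional service capability information and/or the third call additional service capability information; the terminal device performing at least one digital human processing operation according to the indication information includes: the terminal device determines at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information; and the terminal device performs the at least one operation.
In this way, the terminal device itself can decide which digital human processing operation or operations it performs.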
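In a possible implementation, the first call additional service capability information includes one or more of the following:
information indicating whether the terminal device stores the digital human model of the user corresponding to the terminal device;
information indicating whether the terminal device stores the virtual scene of the user corresponding to the terminal device;
information indicating whether the terminal device can perform the reconstruction operation;
information indicating the driving parameters that the terminal device can provide;
information indicating whether the terminal device can perform the rendering operation;
information indicating whether the terminal device needs the communication server to return the user's call screen content;
information indicating whether the terminal device provides viewing angle information.
It can be understood that the above are merely examples and not specific limitations.
In a possible implementation, the second call additional service capability information includes one or more of the following:
information indicating whether the communication server stores the digital human model of the user corresponding to the terminal device;
information indicating whether the communication server stores the virtual scene of the user corresponding to the terminal device;
information indicating whether the communication server can perform the reconstruction operation;
information indicating the driving parameters that the communication server can receive;
information indicating the driving parameters that the communication server can extract;
information indicating whether the communication server can perform the rendering operation;
information indicating whether the communication server can return the user's call screen content to the terminal device.
It can be understood that the above are merely examples and not specific limitations.
In a possible implementation, the third call additional service capability information includes one or more of the following:
information indicating whether the at least one peer terminal device can perform the rendering operation;
information indicating whether the at least one peer terminal device provides viewing angle information.
It can be understood that the above are merely examples and not specific limitations.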
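As an illustration of how capability information could feed a division-of-labor decision, the Python sketch below models a small subset of the capability items and assigns each digital human processing operation to a device. The `Capabilities` class, the field names, and above all the "prefer the terminal, otherwise fall back to the server" policy are hypothetical; the application does not prescribe a particular decision rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical subset of the capability items for one device."""
    can_capture: bool       # can provide (terminal) or extract (server) driving parameters
    can_reconstruct: bool
    can_render: bool


OPERATIONS = ("capture", "reconstruct", "render")


def assign_operations(terminal, server):
    """Hypothetical policy: prefer the terminal, fall back to the server."""
    plan = {}
    for op in OPERATIONS:
        flag = "can_" + op
        if getattr(terminal, flag):
            plan[op] = "terminal"
        elif getattr(server, flag):
            plan[op] = "server"
        else:
            plan[op] = None  # neither side can perform the operation
    return plan
```

With a terminal that can capture and render but not reconstruct, and a server that can reconstruct, this policy yields capture and rendering on the terminal and reconstruction on the server.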
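In the embodiments of this application, different data transmission channels are established between the terminal device and the communication server according to the division of labor between them.
In a possible implementation, the terminal device performs the reconstruction operation, and the method further includes: the terminal device establishes a first channel with the communication server, where the first channel is used to transmit the digital human model of the user corresponding to the terminal device.
In a possible implementation, the terminal device performs the reconstruction operation and the communication server performs the rendering operation, and the method further includes: the terminal device establishes a second channel with the communication server, where the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.
In a possible implementation, the terminal device performs the capture operation and the communication server performs the reconstruction operation, and the method further includes: the terminal device establishes a third channel with the communication server, where the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.
In a possible implementation, the communication server performs the rendering operation, the first call additional service capability information includes information indicating that the terminal device stores the virtual scene of the user corresponding to the terminal device, and the call screen content of the user corresponding to the terminal device includes the user's virtual scene picture content; the method further includes: the terminal device establishes a fourth channel with the communication server, where the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.
In a possible implementation, the first call additional service capability information includes information indicating that the terminal device needs the communication server to return the user's call screen content, and the method further includes: the terminal device establishes a fifth channel with the communication server, where the fifth channel is used to transmit the call screen content of the user corresponding to the terminal device.
In a possible implementation, the method further includes: the terminal device establishes a sixth channel with the communication server, where the sixth channel is used to transmit the viewing angle information of the user corresponding to the terminal device, and the first call additional service capability information includes information indicating whether the terminal device provides viewing angle information; and/or, the terminal device establishes a seventh channel or an eighth channel with the communication server, where the seventh channel is used to transmit the audio of the user corresponding to the terminal device and the eighth channel is used to transmit the video of the user corresponding to the terminal device.
Based on the above implementations, data transmission channels can be established flexibly according to the division of labor between the terminal device and the communication server, meeting the different transmission requirements of different data in digital human communication.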
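The channel conditions above can be collected into one lookup. The Python sketch below returns the ordinal numbers of the channels to establish. The function name, the set-of-strings representation of the division of labor, and the keys of the `caps` dictionary are assumptions made for illustration. Treating the sixth channel as needed when the terminal provides viewing angle information, and audio and video as present by default, are likewise assumptions.

```python
def channels_to_establish(terminal_ops, server_ops, caps):
    """Return the channel numbers (1 to 8) implied by the division of labor.

    terminal_ops / server_ops: sets drawn from {"capture", "reconstruct", "render"}.
    caps: dict of booleans taken from the first capability information (hypothetical keys).
    """
    needed = []
    if "reconstruct" in terminal_ops:
        needed.append(1)  # digital human model
    if "reconstruct" in terminal_ops and "render" in server_ops:
        needed.append(2)  # digital human image sequence
    if "capture" in terminal_ops and "reconstruct" in server_ops:
        needed.append(3)  # driving parameters
    if "render" in server_ops and caps.get("has_virtual_scene", False):
        needed.append(4)  # virtual scene
    if caps.get("needs_echo", False):
        needed.append(5)  # call screen content returned to the terminal
    if caps.get("provides_view_angle", False):
        needed.append(6)  # viewing angle information
    if caps.get("audio", True):
        needed.append(7)  # audio
    if caps.get("video", True):
        needed.append(8)  # video
    return needed
```

For a terminal that captures and renders while the server reconstructs, only the third, seventh and eighth channels result from these rules.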
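In a possible implementation, the terminal device performs the capture operation; obtaining the driving parameters of the user corresponding to the terminal device includes: the terminal device collects the driving parameters of the user corresponding to the terminal device by using a sensor; and/or, the terminal device determines the driving parameters of the user corresponding to the terminal device according to the video and/or audio of that user.
It can be understood that the above two ways of obtaining the driving parameters are merely examples, and other implementations are possible in practice.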
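According to a fourth aspect, a method for providing a call additional service is provided. The method may be applied to a communication server or to a chip in a communication server. Taking application to a communication server as an example, the communication server serves a call between a terminal device and at least one peer terminal device, and the method includes: the communication server receives first call additional service capability information from the terminal device, where the first call additional service capability information indicates the digital human service capability of the terminal device; and the communication server sends indication information to the terminal device, where the indication information indicates the digital human processing operation to be performed by the terminal device, and/or the indication information includes second call additional service capability information and third call additional service capability information, the second call additional service capability information indicating the digital human service capability of the communication server and the third call additional service capability information indicating the digital human service capability of the at least one peer terminal device.
In a possible implementation, the digital human processing operation includes one or more of a capture operation, a reconstruction operation, and a rendering operation. The capture operation includes: obtaining driving parameters of the user corresponding to the terminal device, where the driving parameters include at least one of the lip shape, expression, movement, and depth information of the user. The reconstruction operation includes: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and the digital human model of that user, where the digital human image sequence includes multiple frames of images of the driven digital human model. The rendering operation includes: generating the call screen content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.
In a possible implementation, the method further includes: the communication server performs at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first call additional service capability information, the second call additional service capability information, and the third call additional service capability information.
In this way, the operations that the communication server needs to perform when a digital human service is provided in a real-time interactive scenario are made clear, thereby enabling the digital human service to be provided in real-time interactive call scenarios. Moreover, having the communication server perform digital human processing operations lowers the requirements that digital human communication places on the terminal device, which helps digital human communication spread in real-time interactive call scenarios.
In a possible implementation, the method further includes: the communication server receives the third call additional service capability information from the at least one peer terminal device.
In this way, the communication server can obtain the third call additional service capability information and then take the digital human service capability of the peer terminal device into account when determining the division of labor among the communication server, the terminal device, and the peer terminal device.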
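In a possible implementation, the first call additional service capability information includes one or more of the following:
information indicating whether the terminal device stores the digital human model of the user corresponding to the terminal device;
information indicating whether the terminal device stores the virtual scene of the user corresponding to the terminal device;
information indicating whether the terminal device can perform the reconstruction operation;
information indicating the driving parameters that the terminal device can provide;
information indicating whether the terminal device can perform the rendering operation;
information indicating whether the terminal device needs the communication server to return the user's call screen content;
information indicating whether the terminal device provides viewing angle information.
In a possible implementation, the second call additional service capability information includes one or more of the following:
information indicating whether the communication server stores the digital human model of the user corresponding to the terminal device;
information indicating whether the communication server stores the virtual scene of the user corresponding to the terminal device;
information indicating whether the communication server can perform the reconstruction operation;
information indicating the driving parameters that the communication server can receive;
information indicating the driving parameters that the communication server can extract;
information indicating whether the communication server can perform the rendering operation;
information indicating whether the communication server can return the user's call screen content to the terminal device.
In a possible implementation, the third call additional service capability information includes one or more of the following:
information indicating whether the at least one peer terminal device can perform the rendering operation;
information indicating whether the at least one peer terminal device provides viewing angle information.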
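In a possible implementation, the terminal device performs the reconstruction operation, and the method further includes: the communication server establishes a first channel with the terminal device, where the first channel is used to transmit the digital human model of the user corresponding to the terminal device.
In a possible implementation, the terminal device performs the reconstruction operation and the communication server performs the rendering operation, and the method further includes: the communication server establishes a second channel with the terminal device, where the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.
In a possible implementation, the terminal device performs the capture operation and the communication server performs the reconstruction operation, and the method further includes: the communication server establishes a third channel with the terminal device, where the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.
In a possible implementation, the communication server performs the rendering operation, the first call additional service capability information includes information indicating that the terminal device stores the virtual scene of the user corresponding to the terminal device, and the call screen content of the user corresponding to the terminal device includes the user's virtual scene picture content; the method further includes: the terminal device establishes a fourth channel with the communication server, where the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.
In a possible implementation, the first call additional service capability information includes information indicating that the terminal device needs the communication server to return the user's call screen content, and the method further includes: the communication server establishes a fifth channel with the terminal device, where the fifth channel is used to transmit the call screen content of the user corresponding to the terminal device.
In a possible implementation, the method further includes: the communication server establishes a sixth channel with the terminal device, where the sixth channel is used to transmit the viewing angle information of the user corresponding to the terminal device, and the first call additional service capability information includes information indicating whether the terminal device provides viewing angle information; and/or, the communication server establishes a seventh channel or an eighth channel with the terminal device, where the seventh channel is used to transmit the audio of the user corresponding to the terminal device and the eighth channel is used to transmit the video of the user corresponding to the terminal device.
In a possible implementation, the communication server performs the capture operation; obtaining the driving parameters of the user corresponding to the terminal device includes: the communication server receives, from the terminal device, the driving parameters of the user corresponding to the terminal device; and/or, the communication server receives, from the terminal device, the video and/or audio of the user corresponding to the terminal device, and determines the driving parameters of that user according to the video and/or audio.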
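According to a fifth aspect, a method for providing a call additional service is provided. The method may be applied to a communication server or to a chip in a communication server. Taking application to a communication server as an example, the communication server serves a call between a terminal device and at least one peer terminal device, and the method includes: the communication server receives a first request from the terminal device, where the first request is used to request a digital human model;
the communication server determines whether the user corresponding to the terminal device is a legal user of the digital human model, where the user corresponding to the terminal device is the user operating the terminal device;
after determining that the user corresponding to the terminal device is a legal user of the digital human model, the communication server sends the digital human model to the terminal device in response to the first request, where the digital human model is used by the user corresponding to the terminal device to conduct the call in a digital human image.
In the above solution, the communication server sends the digital human model to the terminal device only after determining that the user corresponding to the terminal device is a legal user of the digital human model. This prevents the digital human model from being used illegally and improves the security of providing a digital human service in real-time interactive call scenarios.
In a possible implementation, the communication server determining whether the user corresponding to the terminal device is a legal user of the digital human model includes:
the communication server obtains the features of the user corresponding to the terminal device and the check features associated with the digital human model, where the check features are the features of the legal user of the digital human model, and determines, according to the features of the user and the check features, whether the user corresponding to the terminal device is a legal user of the digital human model; or, the communication server receives, from the terminal device, a verification result carrying a first digital signature, verifies the first digital signature, and, after the first digital signature passes verification, determines according to the verification result whether the user corresponding to the terminal device is a legal user of the digital human model.
In this manner, the legality of the user's identity can be verified either by the terminal device or by the network device, which makes the solution more flexible.
In a possible implementation, the features of the user corresponding to the terminal device include one or more of the user's face, fingerprint, voiceprint, or iris, and the check features associated with the digital human model include one or more of the face, fingerprint, voiceprint, or iris of the legal user of the digital human model.
It can be understood that the above are merely examples and not limitations.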
一种可能的实施方式中,方法还包括:通信服务器获取数字人内容中的人物形象的特征,数字人内容包括数字人模型的多帧图像;通信服务器根据数字人内容中的人物形象的特征、数字人模型关联的校验特征判断数字人内容中的人物形象是否与数字人模型匹配;通信服务器在确定数字人内容中的人物形象与数字人模型匹配之后,向至少一个对端终端设备发送数字人内容。
通过该方式,可以验证重建前的数字人模型是否和重建后的数字人(即数字人内容中的人物形象)匹配,进一步提高了数字人服务的安全性。
一种可能的实施方式中,数字人内容中的人物形象的特征包括人物形象的人脸和/或声纹;数字人模型关联的校验特征包括:数字人模型的合法用户的人脸和/或声纹。
可以理解,以上几种仅为示例而非限定。
一种可能的实施方式中,方法还包括:通信服务器接收携带第二数字签名的数字人模型、校验特征、数字人模型的合法用户的身份信息;其中,第二数字签名为数字人审核设备的数字签名;通信服务器使用数字人审核设备的公钥验证第二数字签名;在验证第二数字签名通过之后,通信服务器根据合法用户的身份信息将数字人模型、校验特征与合法用户在通信服务器的开户信息进行绑定。
通过该实施方式,可以实现通信服务器将数字人模型与数字人模型的合法用户的开户信息进行绑定,以便于后续用于验证用户的身份;并且,只有被数字人审核设备审核通过的数字人模型才能在实时交互的通话场景中应用,进一步提高了数字人服务的安全性。
一种可能的实施方式中,数字人模型还携带第三数字签名;其中,第三数字签名为数字人模型制作设备的数字签名;方法还包括:通信服务器保存数字人模型、校验特征之前,使用数字人模型制作设备的公钥验证第三数字签名;在验证第二数字签名、第三数字签名 通过之后,通信服务器根据合法用户的身份信息将数字人模型、校验特征与合法用户在通信服务器的开户信息进行绑定。
通过该方式,可以验证数字人模型的未被篡改性,进一步提高了数字人服务的安全性。
第六方面,提供一种提供通话附加服务的方法,可以应用于数字人模型制作设备或数字人模型制作设备中的芯片,以方法应用于数字人模型制作设备为例,方法包括:数字人模型制作设备使用数字人审核设备的第一公钥对终端设备对应的用户的数字人模型、校验特征以及身份信息进行加密,将加密后的终端设备对应的用户的数字人模型、校验特征以及身份信息发送到数字人审核设备;数字人模型制作设备接收来自数字人审核设备的第二数字签名、数字人模型、校验特征以及身份信息;数字人模型制作设备将第二数字签名、数字人模型、校验特征以及身份信息发送到通信服务器,其中数字人模型用于终端设备对应的用户以数字人形象通话。
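In a possible implementation, the method further includes: the digital human model production device uses its own private key to add a third digital signature to the digital human model, the verification feature, and the identity information; the digital human model production device sends the second digital signature, the third digital signature, the digital human model, the verification feature, and the identity information to the communication server.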
第七方面,提供一种提供通话附加服务的方法,可以应用于数字人审核设备或数字人审核设备中的芯片,以方法应用于数字人审核设备为例,方法包括:数字人审核设备接收来自数字人模型制作设备的加密后的终端设备对应的用户的数字人模型、校验特征以及身份信息;数字人审核设备使用数字人审核设备的第一私钥对加密后的终端设备对应的用户的数字人模型、校验特征以及身份信息进行解密;对终端设备对应的用户的数字人模型的合法性进行审核;在审核通过之后,使用数字人审核设备的第二私钥对数字人模型、校验特征以及身份信息添加第二数字签名;数字人审核设备将第二数字签名、数字人模型、校验特征以及身份信息发送给数字人模型制作设备,其中数字人模型用于终端设备对应的用户以数字人形象通话。
In an eighth aspect, a communication apparatus is provided, including modules for implementing the method according to any one of the first to seventh aspects, or according to any possible implementation of any one of the first to seventh aspects.
第九方面,提供一种通信装置,包括处理器和存储器,处理器与存储器耦合;存储器,用于存储程序指令;处理器,用于读取存储器中存储的程序指令,以实现如第一方面或第一方面任一种可能的实施方式或第二方面或第二方面任一种可能的实施方式或第三方面或第三方面任一种可能的实施方式或第四方面或第四方面任一种可能的实施方式或第五方面或第五方面任一种可能的实施方式或第六方面或第六方面任一种可能的实施方式或第七方面或第七方面任一种可能的实施方式中所述的方法。
第十方面,提供一种计算机可读存储介质,存储介质中存储有计算机程序或指令,当计算机程序或指令被通信装置执行时,实现如第一方面或第一方面任一种可能的实施方式或第二方面或第二方面任一种可能的实施方式或第三方面或第三方面任一种可能的实施方式或第四方面或第四方面任一种可能的实施方式或第五方面或第五方面任一种可能的实施方式或第六方面或第六方面任一种可能的实施方式或第七方面或第七方面任一种可能的实施方式中所述的方法。
第十一方面,提供一种计算机程序产品,包括指令,当其在计算机上运行时,使得如第一方面或第一方面任一种可能的实施方式或第二方面或第二方面任一种可能的实施方式或第三方面或第三方面任一种可能的实施方式或第四方面或第四方面任一种可能的实施方式或第五方面或第五方面任一种可能的实施方式或第六方面或第六方面任一种可能的实施方式或第七方面或第七方面任一种可能的实施方式中所述的方法被执行。
第十二方面,提供一种通话系统,包括第一终端设备、第二终端设备、以及为第一终端设备与第二终端设备之间的通话提供服务的通信服务器;
第一终端设备用于执行如第一方面或第一方面任一种可能的实施方式中所述方法;
通信服务器用于执行如第二方面或第二方面任一种可能的实施方式中所述方法;
第二终端设备用于基于第一终端设备对应的用户的数字人内容在通话应用界面呈现第一终端设备对应的用户的数字人形象。
第十三方面,提供一种通话系统,包括第一终端设备、第二终端设备以及为第一终端设备与第二终端设备之间的通话提供服务的通信服务器;
第一终端设备用于执行如第三方面或第三方面任一种可能的实施方式中所述的方法;
通信服务器用于执行如第四方面或第四方面任一种可能的实施方式中所述的方法;
第二终端设备用于基于第一终端设备对应的用户的数字人内容在通话应用界面呈现第一终端设备对应的用户的数字人形象。
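In a fourteenth aspect, a call system is provided, including a communication server, a digital human model production device, and a digital human review device;
the communication server is configured to perform the method according to the fifth aspect or any possible implementation of the fifth aspect;
the digital human model production device is configured to perform the method according to the sixth aspect or any possible implementation of the sixth aspect;
the digital human review device is configured to perform the method according to the seventh aspect or any possible implementation of the seventh aspect.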
图1A~图1G为本申请实施例提供的几种通话的场景示意图;
图2为本申请实施例提供的一种提供通话附加服务的方法的流程图;
图3A为本申请实施例提供的数字人技术栈的示意图;
图3B为本申请实施例提供的另一种提供通话附加服务的方法的流程图;
图3C为本申请实施例提供的另一种提供通话附加服务的方法的流程图;
图4为本申请实施例提供的另一种提供通话附加服务的方法的流程图;
图5为本申请实施例提供的另一种提供通话附加服务的方法的流程图;
图6为本申请实施例提供的几种数据传输通道的类型的示意图;
图7为本申请实施例提供的几种可能的密钥的示意图;
图8A为本申请实施例提供的一种认证鉴权方法的流程图;
图8B为本申请实施例提供的一种认证鉴权方法的流程图;
图8C为本申请实施例提供的一种认证鉴权方法的流程图;
图8D为本申请实施例提供一种数字资产购买方法的流程图;
图9A~图9C为本申请实施例提供的一种具体实施例的示意图;
图10A~图10C为本申请实施例提供的一个具体实施例的示意图;
图11A~图11C为本申请实施例提供的另一个具体实施例的示意图;
图12A~图12C为本申请实施例提供的另一个具体实施例的示意图;
图13A~图13D为本申请实施例提供的另一个具体实施例的示意图;
图14A~图14C为本申请实施例提供的另一个具体实施例的示意图;
图15A~图15C为本申请实施例提供的另一个具体实施例的示意图;
图16A~图16C为本申请实施例提供的另一个具体实施例的示意图;
图17A~图17B为本申请实施例提供的另一个具体实施例的示意图;
图18A~图18B为本申请实施例提供的另一个具体实施例的示意图;
图19为本申请实施例提供的一种具体的提供通话附加服务的方法的流程图;
图20为本申请实施例提供的一种通信装置的结构示意图;
图21为本申请实施例提供的另一种通信装置的结构示意图。
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
以下,对本申请涉及的技术概念进行描述。
1)本文涉及的“数字人”是指将计算机设备生成的人物特征数据通过电子设备(如手机、电脑、虚拟现实(Virtual Reality,VR)/增强现实(Augmented Reality,简称AR)眼镜等)呈现出来的虚拟人物;由于数字人的人物特征数据可以基于真人的人物特征数据产生,数字人可以具有与真人相同或相近的相貌、性别等人物特征。在有些场合中,“数字人”也被称为“虚拟人”。
2)本文涉及的“数字人服务”,也可以称为“数字人增值服务”,是指一种将真人的外形、动作或声音等人物特征映射到数字人的软件服务,也即将真人的动态人物特征(包括外形、动作、声音等)实时地在电子设备中通过数字人的形象进行再现,例如以二维或三维视频的方式来再现。如此,该服务使得数字人拥有与真人相近的表情、行为方式和语言表达方式。
可以理解,本申请实施例中的真人形象可以包括真人的外貌、表情、动作、声音等。
以两个终端设备(包括终端设备A和终端设备B)进行视频通话为例:在终端设备A和终端设备B视频通话的过程中,如果终端设备A没有使用数字人服务,则用户b(这里的用户b是指使用终端设备B进行视频通话的用户)在终端设备B的通话应用界面中看到的通话主体为用户a的真人形象(这里的用户a是指使用终端设备A进行视频通话的用户),如果终端设备A使用数字人服务,则用户b在终端设备B的通话应用界面中看到的通话主体为用户a的数字人形象。同理,如果终端设备B没有使用数字人服务,则用户a在终端设备A的通话应用界面中看到的通话主体可以为用户b的真人形象,如果终端设备B使用数字人服务,则用户a在终端设备A的通话应用界面中看到的通话主体为用户b的数字人形象。其中,通话应用界面包括但不限于是:终端设备系统自带的电话应用(Application,APP)中的界面,即时聊天软件中的界面,游戏应用中的界面,等等。
进一步的,在终端设备A和终端设备B视频通话的过程中,如果终端设备A使用数字人服务,用户a的数字人的行为可以随着用户a的行为变化而变化,例如用户a举起左手,则用户a的数字人也举起左手,用户a眨眼,则用户a的数字人也眨眼,等等。同理,在终端设备A和终端设备B视频通话的过程中,如果终端设备B使用数字人服务,用户b的数字人的行为可以随着用户b的行为变化而变化,例如用户b举起左手,则用户b的数字人也举起左手,用户b眨眼,则用户b的数字人也眨眼,等等。
可以理解,本申请实施例中数字人服务包括但不限于是在至少一个对端终端设备的通话应用界面中呈现用户的数字人形象,另外还可以包括:在至少一个对端终端设备的通话应用界面中呈现用户的虚拟场景画面,在至少一个对端终端设备的通话应用界面中呈现用于修饰用户的数字人的道具,在至少一个对端终端设备播放用户的虚拟声音,等等。
3)本文涉及的“数字人模型”,是指数字人的静态数据,主要包括数字人的表面信息,可以通过拍摄、结构光扫描等技术手段收集到。数字人服务基于真人的驱动参数(如表情、动作等)和数字人模型可以生成数字人动态数据(这个过程称为“重建”,具体可参考下文描述),数字人动态数据经过渲染后可通过显示设备进行显示,即显示设备可显示出数字人。
4)本文涉及的“数字人动态数据”,是指用以显示虚拟人物的动态数据,该数据可以以文件的方式被存储在设备存储空间中,该文件描述了虚拟人物的一到多帧的二维或三维的图像,因此“数字人动态数据”也可称为“数字人图像序列”。当该文件在电子设备中被渲染后再输入到显示设备中,显示设备可以通过显示屏呈现所述一到多帧的二维或三维的图像。该文件可以是二进制形式,也可以是文本形式,本申请不作限定。
5)本文涉及的“用户”,可以理解为终端设备的用户。用户可以操作终端设备,使用终端设备提供的各项功能,例如拨通通话的功能、接听通话的功能等。
6)本文涉及的“终端设备”,也可以称为终端、用户设备(user equipment,UE)、移动台、移动终端等。该终端设备可以是支持通话功能的任何设备,例如是手机、电脑、虚拟现实(Virtual Reality,VR)设备,增强现实(Augmented Reality,简称AR)设备、可穿戴设备、车辆、无人机、直升机、飞机、轮船、机器人、机械臂、智能家居设备等。本申请的实施例对终端设备所采用的具体通话网络和终端设备的具体设备形态不做限定。该终端设备可以基于电话线路实现通话,或者基于IP线路实现通话,或者基于其它技术实现通话,本申请的实施例对终端设备所采用的具体通话技术不做限定。
7)本文涉及的用户的场景包括用户的真实场景和用户的虚拟场景,其中真实场景是指用户所处的真实环境,而用户的虚拟场景并非用户所处的真实环境,例如可以是通过建模软件生成的二维或三维的环境模型,或者是从网络上下载的环境数据等。
本申请实施例可以应用于各类实时交互的场景中,例如通话、在线游戏、直播等通话场景等。为了便于理解和说明,本文主要以通话场景为例。
参见图1A,为本申请实施例提供的一种通话的场景示意图,图1A所示的通话系统中包括终端设备A,终端设备B,终端设备A和终端设备B可以经过通信服务器建立通话连接,并进行通话。
可以理解,终端设备A和终端设备B中主动发起通话的终端设备可以定义为主叫终端设备(可简称为“主叫终端”或“主叫”),与主叫终端设备相对的终端设备(即接收通话的设备)可以定义为被叫终端设备(可简称为“被叫终端”或“被叫”)。例如,以终端设 备A主动发起通话为例,终端设备A呼叫终端设备B的电话号码,则终端设备A为主叫终端设备,终端设备B为被叫终端设备。当然,具体实施时终端设备B也可以主动发起呼叫,即终端设备B为主叫终端设备,终端设备A为被叫终端设备。
图1A是以两个终端设备进行通话的场景为例,本申请实施例不限于两个终端设备进行通话的场景,也可以应用于两个以上的终端设备进行通话的场景,例如视频会议场景。
参见图1B,为本申请实施例提供的另一种通话的场景示意图,图1B所示的通话系统中包括终端设备C、终端设备D和终端设备E,终端设备C、终端设备D和终端设备E可以经过通信服务器建立通话连接,并进行通话。
类似的,终端设备C、终端设备D和终端设备E中主动发起呼叫的终端设备定义为主叫终端设备(例如终端设备C),与主叫相对,除主叫以外的其它终端设备可以定义为被叫终端设备(如终端设备D、终端设备E等)。
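It can be understood that, for any terminal device, every other terminal device in a call with it can be regarded as a "peer terminal device" of that terminal device. For example, in Figure 1A, terminal device B is the peer terminal device of terminal device A, and terminal device A is the peer terminal device of terminal device B. In Figure 1B, terminal device D and terminal device E are both peer terminal devices of terminal device C; terminal device C and terminal device E are both peer terminal devices of terminal device D; and terminal device C and terminal device D are both peer terminal devices of terminal device E.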
当然,实际应用中参与通话的终端设备的数量还可以更多,此处不再一一列举。
本申请实施例中的通信服务器可以是为终端设备提供的通话服务的通信网络中的一个或多个设备。可以理解,图1A、图1B仅示出了一个通信服务器,实际不限于此。
一种可能的实现方式中,通信服务器可以是网际协议(internet protocol,IP)多媒体子系统(IP multimedia subsystem,IMS)网络中的设备。
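The IMS network is a network system in an IP network for providing multimedia services; through the IMS network, terminal devices can be provided with a variety of multimedia services, such as voice calls and video calls. The IMS network may include one or more network elements, for example a call session control function (CSCF) network element.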
CSCF网元是IMS网络内部的功能实体,是整个IMS网络的核心,主要负责处理多媒体呼叫会话过程中的信令控制。CSCF网元根据功能进一步可分为服务-呼叫控制功能(serving-call session control function,S-CSCF网元)、查询-呼叫会话控制功能(interrogating-call session control function,I-CSCF网元)、代理-呼叫会话控制功能(proxy-call session control function,P-CSCF网元)等。其中,P-CSCF网元为IMS网络的边缘网络节点,P-CSCF网元在IMS网络中的作用类似于执行代理服务,无论是来自终端设备的信息或者发送给终端设备的信息,均需通过P-CSCF网元转发;S-CSCF网元是IMS网络的业务处理节点,负责终端设备的IMS网络注册以及相关的主叫、被叫业务处理;I-CSCF网元可以连接S-CSCF网元和P-CSCF网元,用于为终端设备提供到归属网络的入口,当终端设备漫游到其他网络时,向P-CSCF网元发送消息,P-CSCF网元可以将来自终端设备的消息转发给I-CSCF网元,通过I-CSCF网元将来自终端设备的消息发送给S-CSCF网元。需要说明的是,P-CSCF网元、S-CSCF网元、I-CSCF网元可以独立配置于不同实体,也可以集成于同一实体。以下,为了便于理解和说明,P-CSCF网元、S-CSCF网元、I-CSCF网元统称为CSCF网元。
本申请实施例还提供一种媒体服务(Media Server,MS)网元,用于为终端设备提供数字人相关的服务,如数字人服务等。其中,MS网元的具体实现可以是应用服务 (Application Server,AS)网元的一种,也可以是新定义的一种网元,本申请不做限制。
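The MS network element may be deployed in the IMS network: as shown in Figure 1C, the MS network element is a separate network element communicatively connected to the CSCF network element; or, as shown in Figure 1D, the MS network element may be integrated with the CSCF network element. The MS network element may also be deployed outside the IMS network, as shown in Figure 1E. When the communication server needs to implement multiple functions related to the digital human service, multiple MS network elements may be deployed at the same time, each responsible for a different function; alternatively, a single MS network element may be deployed to handle multiple functions. The embodiments of this application do not limit the specific deployment of the MS network element.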
可以理解,图1C~图1E是以终端设备A、终端设备B归属于同一个IMS网络为例,在具体应用中,终端设备A、终端设备B也可以分别归属于不同的IMS网络。如图1F所示,终端设备A归属于IMS-1网络、终端设备B归属于IMS-2网络,IMS-1网络、IMS-2网络均部署有CSCF网元和MS网元,IMS-1网络、IMS-2网络相互配合为终端设备A、终端设备B提供通话服务。可选的,IMS-1网络和IMS-2网络分别对应不同的运营商(例如,由不同的运营商部署或维护)。
另一种可能的实现方式中,通信服务器可以是非IMS网络中的设备。例如,参见图1G所示,通信网络还可以基于私有云或公有云或数据中心搭建,通信服务器可以是该私有云或公有云或数据中心中的MS网元,MS网元的具体实现例如是即时通信服务器等。本申请对通信网络不做具体限制。
为了便于理解和说明,本文主要以通信网络是IMS网络为例。
本申请实施例中,MS网元可以在CSCF网元的配合下,为终端设备提供数字人服务。
不同的用户,或者用户在不同场景中,对数字人服务的需求不同。例如,在家里进行通话的用户,对数字人服务的需求较大;在办公室进行通话的用户,对数字人服务的需求小。例如,用户和熟人通话时,对数字人服务的需求小,用户和陌生人通话时,对数字人服务的需求大。
可以理解,本申请实施例中的“通话”,可以是视频通话,也可以是其它形式的通话,例如,语音通话,即时聊天,等等,本申请不做限制。例如,在视频通话场景中,通话双方可以在通话应用中看到对方的视频,如果对方使用了数字人服务,则可以在对方的视频中看到对方的数字人形象。例如,在语音聊天通话中,通话双方可以在通话应用中听到对方的语音,如果对方使用了数字人服务,则还可以看到对方的数字人图像,该数字人图像的呈现方式可以是视频,或动态图像,或静态图像,等等,本申请不做限制。
为了便于描述,在接下来的实施例中,主要以两个终端设备进行视频通话为例。
以下介绍将数字人引入到大规模的实时通信场景时,如何按照用户的需求向对端提供本端的数字人内容,以及如何按照用户的需求在本端呈现对端提供的数字人内容。
参见图2,为本申请实施例提供的一种提供通话附加服务的方法的流程图,以方法应用于以图1C所示的场景为例,包括:
S21、终端设备A获取第一通话附加服务需求信息。
其中,第一通话附加服务需求信息包括终端设备A的数字人服务需求信息,例如,第一通话附加服务需求信息可以表示:终端设备A对应的用户a要求在至少一个对端终端设备的通话应用界面中呈现用户a的数字人形象。
为了便于描述,本申请实施例以一个对端终端设备“终端设备B”为例,其它对端终端设备的实现方法可以参考终端设备B的实现方法。
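In a possible implementation, terminal device A obtaining the first call add-on service demand information includes: receiving an operation input by user a, and generating the first call add-on service demand information according to the operation. The operation indicates that user a requires user a's digital human image to be presented in the call application interface of terminal device B. It can be understood that this application does not specifically limit the operation input by the user; for example, the operation may be tapping a control corresponding to the digital human service on the display interface of terminal device A, or ticking a menu item corresponding to the digital human service on the display interface of terminal device A, and so on.
In another possible implementation, terminal device A obtaining the first call add-on service demand information includes: obtaining system setting information, and determining the first call add-on service demand information according to the system setting information, where the system setting information indicates that user a requires user a's digital human image to be presented in the call application interface of at least terminal device B.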
S22、终端设备A根据第一通话附加服务需求信息使至少一个对端终端设备(如终端设备B)基于用户a的数字人内容在通话应用界面呈现用户a的数字人形象。
其中,数字人内容,可以是可视化的数字人图像,或者是可视化的数字人图像经过编码后的结果,本申请不做限制。终端设备B基于数字人内容,可以在终端设备B的显示屏幕上以图像的形式呈现用户的数字人形象。
其中,通信服务器是指为终端设备A提供数字人服务的通信服务器,例如可以是图1C所示的MS网元。
在本申请实施例中,生成数字人内容的操作可以由终端设备A执行,也可以由通信服务器执行,还可以由终端设备B执行,本申请不做限制。
一种可能的设计中,终端设备根据第一通话附加服务需求信息使通信服务器向终端设备B发送用户的数字人内容,以使终端设备B基于用户的数字人内容在通话应用界面呈现用户的数字人形象。
例如,终端设备根据第一通话附加服务需求信息生成用户的数字人内容;终端设备向通信服务器发送用户的数字人内容,以使通信服务器向终端设备B发送用户的数字人内容;终端设备B收到数字人内容后,基于数字人内容在通话应用界面中呈现用户a的数字人形象;或者,
例如,终端设备向通信服务器发送第一通话附加服务需求信息,以使通信服务器生成用户的数字人内容以及向终端设备B发送用户的数字人内容;终端设备B收到数字人内容后,基于数字人内容在通话应用界面中呈现用户a的数字人形象。
另一种可能的设计中,终端设备根据第一通话附加服务需求信息使终端设备B生成用户的数字人内容,以使终端设备B基于用户的数字人内容在通话应用界面呈现用户的数字人形象。
例如,终端设备通过通信服务器向终端设备B发送第一通话附加服务需求信息(通信服务器先接收终端设备发送的第一通话附加服务需求信息,然后再将附加服务需求信息发送给终端设备B),终端设备B根据第一通话附加服务需求信息生成用户的数字人内容,进而基于用户的数字人内容在通话应用界面呈现用户的数字人形象。
基于以上方案,可以实现按照用户的需求在对端终端设备的通话应用界面中呈现用户的数字人形象,可以提高用户体验。
作为一种可选的实施方式,第一通话附加服务需求信息还可以用于表示终端设备A对应的用户a要求在终端设备B的通话应用界面中呈现用户a的虚拟场景画面。
可以理解,与用户a的虚拟场景画面相对的是用户a的真实场景画面。其中,真实场景画面是指用户a真实所处环境对应的图像,例如,用户处于室外环境,则用户a的真实场景画面为终端设备真实可采集到室外图像。相应的,用户a的虚拟场景画面则不是用户 a真实所处环境对应的图像,例如是用户a从网络上下载的会议室图像。
在本申请实施例中,除非有特别说明,“图像”与“画面”可以互换。
相应的,终端设备A还可以根据第一通话附加服务需求信息使通信服务器向终端设备B发送用户的虚拟场景内容,以使终端设备B基于用户a的虚拟场景内容在通话应用界面呈现用户a的虚拟场景画面;或者,终端设备A还可以根据第一通话附加服务需求信息使终端设备B生成用户a的虚拟场景内容,以使终端设备B基于用户a的虚拟场景内容在通话应用界面呈现用户a的虚拟场景画面。
其中,虚拟场景内容,可以是可视化的虚拟场景图像,或者是可视化的虚拟场景图像经过编码后的结果,本申请不做限制。终端设备B基于虚拟场景内容,可以在终端设备B的显示屏幕上以图像的形式呈现用户a的虚拟场景画面。
例如,终端设备A根据第一通话附加服务需求信息生成用户a的虚拟场景内容;终端设备A向通信服务器发送用户a的虚拟场景内容,以使通信服务器向终端设备B发送用户a的虚拟场景内容;或者,
例如,终端设备A向通信服务器发送第一通话附加服务需求信息,以使通信服务器生成用户a的虚拟场景内容以及向终端设备B发送用户a的虚拟场景内容;或者,
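For example, terminal device A sends the first call add-on service demand information to terminal device B through the communication server, so that terminal device B generates user a's virtual scene content according to the first call add-on service demand information.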
需要说明的是,通话过程中,终端设备B显示的用户a的通话画面中可以同时包括用户a的数字人形象和虚拟场景画面,两者共同组成用户a的通话画面。因此,若用户a要求在终端设备B的通话应用界面中呈现用户a的数字人形象和虚拟场景画面,则对端终端设备通话应用界面呈现出的画面为用户a数字人活动在用户a的虚拟场景中。相应的,用户a的数字人内容和虚拟场景内容共同组成用户a的通话画面内容。
基于以上方案,可以实现按照用户的需求在对端终端设备的通话应用界面中呈现用户的虚拟场景画面,可以提高用户体验。
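As an optional implementation, before S22, the communication server needs to establish the call connection between terminal device A and terminal device B. For example, the CSCF network element in the communication server triggers the MS network element to establish the call connection between terminal device A and terminal device B. This includes: the MS network element establishes a data transmission channel between the MS network element and terminal device A according to terminal device A's address information, port information, and the like, and establishes a data transmission channel between the MS network element and terminal device B according to terminal device B's address information, port information, and the like. The data transmission channels are described later in this document.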
作为一种可选的实施方式,通话是双向的,因此终端设备B也可以执行上述终端设备A执行的方法。
示例性的,终端设备B获取第二通话附加服务需求信息,其中第二通话附加服务需求信息用于表示终端设备B对应的用户b要求在终端设备A的通话应用界面中呈现用户b的数字人形象。终端设备B根据第二通话附加服务需求信息使通信服务器向终端设备A发送用户b的数字人内容,以使终端设备A基于用户b的数字人内容在通话应用界面呈现用户b的数字人形象;或者,终端设备B根据第二通话附加服务需求信息使终端设备A生成用户b的数字人内容,以使终端设备A基于用户b的数字人内容在通话应用界面呈现用户b的数字人形象。具体实现方法可以参考终端设备A的实现方法,此处不再赘述。
基于以上方案,可以按照通话双方的要求为通话双方提供数字人服务,可以进一步提 高用户体验。
作为一种可选的实施方式,终端设备A还可以获取第三通话附加服务需求信息,其中第三通话附加服务需求信息用于指示用户a要求将用户b对应的通话主体(可以是用户b的真人形象,也可以是用户b的数字人形象)呈现在用户a或用户b对应的场景画面中。
例如,终端设备A从通信服务器接收用户b的数字人内容后,根据用户b的数字人内容和用户a的虚拟场景内容,合成用户b的通话画面内容,然后显示用户b的通话画面内容。用户a可以在终端设备A的通话应用界面中看到用户b对应的数字人形象活动在用户a的虚拟场景中。
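For example, the communication server generates user b's digital human content, synthesizes user b's call picture content from user b's digital human content and user b's real scene content, and sends the synthesized call picture content of user b to terminal device A. After receiving the synthesized call picture content of user b from the communication server, terminal device A displays it in the call application interface of terminal device A. User a can then see, in the call application interface of terminal device A, user b's digital human image moving in user b's real scene.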
当然,以上两种仅为举例,实际还可以有其它实现方式。
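Similarly, because a call is bidirectional, the above method also applies to terminal device B. For example, terminal device B may further obtain fourth call add-on service demand information, where the fourth call add-on service demand information indicates that user b requires the call subject corresponding to user a (which may be user a's real-person image or user a's digital human image) to be presented in the scene picture corresponding to user a or user b. For the specific method, refer to the implementation on terminal device A; details are not repeated here.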
可选的,当第一通话附加服务需求信息和第四通话附加服务需求信息矛盾时,可以根据第四通话附加服务需求信息在终端设备B的通话应用界面中呈现用户a的通话画面内容。例如,用户a要求在终端设备B的通话应用界面中呈现用户a的虚拟场景画面,用户b要求将用户a对应的数字人形象呈现在用户b对应的虚拟场景画面中,终端设备B呈现用户b对应的虚拟场景画面。
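The optional precedence rule above (the viewing side's demand may override the presenting side's demand when the two conflict) can be written as a one-line choice. The function and parameter names below are hypothetical, and the rule is only one possible policy that the text allows.

```python
from typing import Optional


def scene_shown_on_terminal_b(first_demand_scene: Optional[str],
                              fourth_demand_scene: Optional[str]) -> Optional[str]:
    """Pick the scene terminal device B presents for user a.

    first_demand_scene:  scene user a asks to be shown in (first demand information).
    fourth_demand_scene: scene user b asks to see user a in (fourth demand information).
    When user b expresses a demand, it takes precedence; otherwise user a's demand is used.
    """
    if fourth_demand_scene is not None:
        return fourth_demand_scene
    return first_demand_scene
```

With the example from the text, `scene_shown_on_terminal_b("a_virtual", "b_virtual")` selects user b's virtual scene.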
示例性的,参见表1,为用户a、用户b提供给对端的场景以及本端观看的场景的几种示例,其中用户a、用户b均是以使用自身的数字人作为通话主体与对端通话为例。表1中“用户a提供的场景”,表示用户b要观看用户a的场景时,第一通话附加服务需求信息指示的用户a要求在终端设备B的通话应用界面中呈现的用户a的虚拟场景或真实场景,“用户b提供的场景”表示用户a要观看用户b的场景时,第二通话附加服务需求信息指示的用户b要求在终端设备A的通话应用界面中呈现的用户b的虚拟场景或真实场景,“用户a观看的场景”表示第三通话附加服务需求信息指示的用户a要求在终端设备A的通话应用界面中呈现的用户a的场景或用户b的场景,“用户b观看的场景”表示第四通话附加服务需求信息指示的用户b要求在终端设备B的通话应用界面中呈现的用户a的场景或用户b的场景。
表1
基于以上方案,终端设备可以按照用户的需求呈现对端终端设备的通话画面内容,可以进一步提高用户体验。
作为一种可选的实施方式,第一通话附加服务需求信息、第二通话附加服务需求信息、第三通话附加服务需求信息等可以携带于会话描述协议(Session Description Protocol,SDP)消息或者会话初始协议(Session Initiation Protocol,SIP)消息中。例如,第一通话附加服务需求信息可以承载在SIP消息头域中。CSCF网元收到携带有第一通话附加服务需求信息的SIP消息后,可以直接将SIP消息转发给MS网元(即透传方式),也可以解析SIP消息携带的数据内容,对数据内容重新封装(如添加终端设备A的地址信息、端口信息等)后,再转发给MS网元。基于该实施方式,可以节省资源开销。
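As an illustration of carrying demand information in a SIP header field, the sketch below encodes and decodes a single header line. The header name `X-Call-Addon-Demand` and the `key=value;key=value` parameter syntax are assumptions made for this example; the application does not name a header or define a format.

```python
HEADER_NAME = "X-Call-Addon-Demand"  # hypothetical header name, not defined by the application


def encode_demand_header(demand: dict) -> str:
    """Serialize demand items into one header line (keys sorted for a stable result)."""
    params = ";".join(f"{key}={value}" for key, value in sorted(demand.items()))
    return f"{HEADER_NAME}: {params}"


def parse_demand_header(line: str) -> dict:
    """Parse a header line produced by encode_demand_header back into a dict."""
    name, _, value = line.partition(":")
    if name.strip() != HEADER_NAME:
        raise ValueError(f"not a {HEADER_NAME} header: {line!r}")
    demand = {}
    for part in value.strip().split(";"):
        if part:
            key, _, val = part.partition("=")
            demand[key.strip()] = val.strip()
    return demand
```

A CSCF network element that forwards transparently would pass such a line through unchanged; one that re-encapsulates could parse it, add items such as address and port information, and encode it again.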
以下介绍在大规模的实时通信场景中,用户具有使用数字人服务需求的情况下,通信服务器以及各终端设备为用户提供数字人服务的具体实现过程。
参见图3A,数字人的技术栈主要涉及建模、捕捉、重建和渲染等。相应的,本申请实施例中用于实现数字人服务的操作包括建模、捕捉、重建和渲染等中的一项或多项。
1)建模操作:即制作数字人模型,该数字人模型的形象与真人的形象对应,具有与真人相似的外观。在具体实现时,数字人模型可以是一种二进制文件,描述了真人的二维或三维的图像。目前,建模主要是线下制作,通过相机拍摄、结构光扫描等技术手段收集到建模对象(即真人)的表面信息形成数字人模型。
本申请实施例将数字人引入到大规模的实时通信场景时,主要涉及到以下几个部分的分工:
2)捕捉操作:是指记录并处理人或其他物体动作的技术。它广泛应用于娱乐、体育、医疗应用、计算机视觉以及机器人技术等诸多领域。在数字人开发领域,它通常是记录人类的动作、表情等,并将其转换为可以驱动数字模型的动作,从而生成二维或三维的计算机动画。当它捕捉面部或手指的细微动作时,它通常被称为性能捕获(performance capture)。在许多领域,动作捕捉有时也被称为运动跟踪(motion tracking)。
在本申请实施例提供的数字人服务场景中,捕捉操作主要用于获取用户的驱动参数,驱动参数包括但不限于是用户唇形、表情、动作、深度信息等中的至少一种。
本申请实施例中,捕捉操作的具体实现方式可以有多种,包括但不限于以下几种:
方式1、从视频流中提取驱动参数;
例如,从终端设备采集的用户的视频中记录用户的表情信息、动作信息等。
方式2、从音频中提取驱动参数;
例如,从终端设备采集的用户的音频中分析真人的喜怒哀乐,生成表情信息等。
方式3、使用传感器采集驱动参数;
例如,终端设备中的陀螺仪采集真人的动作信息等;
例如,终端设备上的深度相机采集真人的空间位置信息等。
可以看出,在上述方式2和方式3中,终端设备可以不采集用户的视频。
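It can be understood that the above three ways of extracting driving parameters are only examples rather than specific limitations; other ways are also possible in practice.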
3)重建操作:将静态的数字人模型转化为动态的数字人的过程。具体实现方式可以是根据驱动参数和数字人模型,生成数字人动态数据(即用以显示虚拟人物的动态数据),例如生成数字人图像序列,该数字人图像序列包括一组有时序的图像帧(图像帧可以以文件的方式被存储在设备存储空间中,该文件描述了虚拟人物的多帧的二维或三维的图像),当该数字人图像序列加载到显示设备中时,显示设备可以通过显示屏呈现所述多帧的二维或三维的图像,使得数字人表现出与真人相似的动作、表情等行为。
可以理解,重建操作之后获得图像帧中可以仅包括数字人的图像,即可以没有背景图像。
4)渲染操作:是指以软件由模型生成图像的过程。模型是在计算机中用语言或者数据结构进行严格定义的二维/三维物体或虚拟场景的描述,它包括几何、视点、纹理、照明和阴影等信息。
在本申请实施例中,渲染操作可以包括将数字人的图像与场景图像进行合成(或者说合并、叠加)的过程,即让每帧数字人图像都与场景图像进行融合的过程,融合得到的每个图像帧包括数字人图像和场景图像。可以理解,融合后的不同图像帧中的数字人可以不同,场景也可以不同。当融合后的多个图像帧按照时序依次被加载到显示设备中时,显示设备呈现的画面为数字人在场景中活动,呈现与真人类似的外形和行为。在本申请实施例中,在通话场景中为用户提供数字人服务时,渲染操作后得到的图像即为用户呈现给对端的通话画面内容中的图像。
其中,场景图像可以是真实场景对应的图像,例如真人(用户)所在真实环境中的背景图像,也可以是虚拟场景图像,例如预先编排的背景图像或者从网络上下载的背景图像等,本申请不做限制。
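The compositing step of the rendering operation can be sketched with plain nested lists. This is a simplified stand-in, assuming each frame is a grid of pixel values and that the reconstructed digital human frame uses `None` wherever there is no digital human (that is, no background); real renderers work on textures and alpha channels instead.

```python
def composite(human_frame, scene_frame):
    """Overlay one digital human frame onto one scene frame of the same size."""
    if len(human_frame) != len(scene_frame):
        raise ValueError("frames must have the same height")
    merged = []
    for human_row, scene_row in zip(human_frame, scene_frame):
        if len(human_row) != len(scene_row):
            raise ValueError("frames must have the same width")
        # Keep the scene pixel wherever the digital human frame is transparent.
        merged.append([s if h is None else h for h, s in zip(human_row, scene_row)])
    return merged


def render_sequence(human_frames, scene_frame):
    """Composite every frame of a digital human image sequence onto the same scene image."""
    return [composite(frame, scene_frame) for frame in human_frames]
```

Playing the merged frames in order gives the picture of the digital human moving in the scene.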
在本申请实施例中,还可以结合观察者的角度来执行渲染操作。其中观察者是指最终观看通话画面内容的用户,例如用户a看用户b的通话画面内容时,用户a为观察者,在对用户b的数字人进行渲染操作时,可以根据用户a的视角信息合成用户b的通话画面内容,使得用户b的通话画面内容能够更好地满足用户a的观看需求。类似的,用户b看用户a的通话画面内容时,用户b为观察者,在对用户a的数字人进行渲染操作时,可以根据用户b的视角信息合成用户a的通话画面内容,使得用户a的通话画面内容能够更好地满足用户b的观看需求。
进一步的,渲染操作还可以包括将数字资产与数字人图像、场景图像进行融合。也就是说,渲染操作后得到的图像中,还可以包括用户的数字资产等。其中,数字资产包括但不限于是虚拟衣服、虚拟场景或者其它用于修饰用户的数字人的道具。
在本申请实施例中,上述捕捉操作、重建操作、渲染操作中的各项操作,可以由通信服务器(如MS网元)执行,也可以由终端设备执行,本申请对此不做限制。
以为终端设备A的用户a提供数字人服务为例:
示例1、当捕捉操作在用户a的终端设备A执行时,终端设备A从终端设备A采集的视频、音频等中获取用户a的驱动参数。
当捕捉在MS网元执行时,MS网元从终端设备A采集的视频、音频等中获取用户a的驱动参数。
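Example 2: When the reconstruction operation is performed on user a's terminal device A, terminal device A maps user a's expressions and/or actions onto user a's digital human model according to user a's driving parameters, obtaining user a's digital human image sequence. User a's driving parameters may be obtained by terminal device A through the capture operation, or provided to terminal device A by the MS network element. Terminal device A may read the digital human model locally (for example, when the digital human model is stored on terminal device A), or download it from the MS network element (for example, when the digital human model is stored on the MS network element).
When the reconstruction operation is performed on the MS network element, the MS network element maps user a's expressions and/or actions onto user a's digital human model according to user a's driving parameters, obtaining user a's digital human image sequence. User a's driving parameters may be obtained by the MS network element through the capture operation, or provided to the MS network element by terminal device A. The MS network element may obtain the digital human model from terminal device A (for example, when the digital human model is stored on terminal device A), or read it locally (for example, when the digital human model is stored on the MS network element).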
示例3、当渲染操作在用户a的终端设备A执行时,终端设备A将用户a的数字人图像叠加到用户a的场景图像,得到用户a的通话画面内容;其中,用户a的数字人图像可以是终端设备A通过执行重建操作得到,也可以是MS网元提供给终端设备A。
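When the rendering operation is performed on the MS network element, the MS network element superimposes user a's digital human image onto user a's scene image to obtain user a's call picture content; user a's digital human image may be obtained by the MS network element through the reconstruction operation, or provided to the MS network element by terminal device A.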
当渲染操作在终端设备B执行时,终端设备B将用户a的数字人图像叠加到用户a的场景图像,得到用户a的通话画面内容;其中,用户a的数字人图像可以是MS网元提供给终端设备B,也可以是终端设备A提供给终端设备B。
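It can be understood that, when a terminal device is in a call with at least one peer terminal device and the user of that terminal device requires the digital human service, the division of labor between the communication server and the terminal devices needs to be made explicit, for example by specifying which device (the terminal device, the communication server, or the peer terminal device) performs each of the above operations.
Referring to Figure 3B, which is a flowchart of another method for providing a call add-on service according to an embodiment of this application. Taking the method as applied to the scenario shown in Figure 1C, and taking provision of the digital human service for terminal device A as an example, the method includes: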
S31、终端设备A发送第一通话附加服务能力信息,相应的,通信服务器接收第一通话附加服务能力信息。
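The communication server is, for example, the MS network element. The call add-on service capability information of terminal device A may first reach the CSCF network element and then be forwarded by the CSCF network element to the MS network element, as shown in Figure 3C.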
第一通话附加服务能力信息包括终端设备A的数字人服务能力信息。例如,第一通话附加服务能力信息用于指示终端设备A具备的数字人服务能力。
示例性的,第一通话附加服务能力信息可以包括以下一项或多项:
1)用于指示终端设备A是/否保存有终端设备A对应的用户的数字人模型的信息;
2)用于指示终端设备A是/否保存有终端设备A对应的用户的虚拟场景的信息;
3)用于指示终端设备A是/否能够执行重建操作的信息;
4)用于指示终端设备A能够提供的驱动参数的信息(如音频,视频,表情,肢体动作等);
5)用于指示终端设备A是/否能够执行渲染操作的信息;
6)用于指示终端设备A是/否需要通信服务器回传终端设备A的通话画面内容的信息;
7)用于指示终端设备A是/否提供视角信息的信息。
可以理解,以上几种仅为示例而非具体限定。
S32、通信服务器向终端设备A发送指示信息,相应的,终端设备A接收通信服务器发送的指示信息。
其中,指示信息可以由MS网元发送给CSCF网元,然后经由CSCF网元转发到终端设备A,如图3C所示。
一种可能的实现方式中,终端设备A需要执行的操作可以由MS网元来决策。例如,MS网元根据第一通话附加服务能力信息、第二通话附加服务能力信息、第三通话附加服务能力信息中的至少一项确定终端设备执行的数字人处理操作;然后MS网元经由CSCF网元向终端设备A发送指示信息,指示信息用于指示终端设备A执行的数字人处理操作,终端设备A执行的数字人处理操作例如为捕捉操作、重建操作、渲染操作中的至少一个操作,或者为“无”或“0”(即不执行数字人处理操作)。
第二通话附加服务能力信息包括通信服务器的数字人服务能力信息。例如,第二通话附加服务能力信息用于指示通信服务器具备的数字人服务能力。
示例性的,通信服务器(MS网元)的通话附加服务能力信息包括以下一项或多项:
1)用于指示通信服务器是/否保存有终端设备A对应的用户的数字人模型的信息;
2)用于指示通信服务器是/否保存有终端设备A对应的用户的虚拟场景的信息;
3)用于指示通信服务器是/否能够执行重建操作的信息;
4)用于指示通信服务器能够接收的驱动参数的信息;
5)用于指示通信服务器能够提取的驱动参数的信息;
6)用于指示通信服务器是/否能够执行渲染操作的信息;
7)用于指示通信服务器是/否能够向终端设备A回传终端设备A的通话画面内容的信息。
可以理解,以上几种仅为示例而非具体限定。
第三通话附加服务能力信息包括至少一个对端终端设备的数字人服务能力信息。例如,第三通话附加服务能力信息用于指示至少一个对端终端设备具备的数字人服务能力。
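For example, when there is only one peer terminal device, such as terminal device B, the third call add-on service capability information indicates the digital human service capability of terminal device B. The third call add-on service capability information includes one or more of the following: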
1)用于指示终端设备B是/否能够执行渲染操作的信息;
2)用于指示终端设备B是/否提供视角信息的信息。
可以理解,以上几种仅为示例而非具体限定。
可选的,终端设备B还发送第三通话附加服务能力信息,通信服务器接收第三通话附加服务能力信息。
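In another possible implementation, the operations performed by terminal device A may be decided by terminal device B. For example, terminal device B determines the digital human processing operations to be performed by terminal device A according to the first call add-on service capability information, the second call add-on service capability information, and the third call add-on service capability information; terminal device B then sends indication information to the MS network element via the CSCF network element, and the MS network element forwards the indication information to terminal device A via the CSCF network element. The indication information indicates the digital human processing operations to be performed by terminal device A, which are, for example, at least one of the capture operation, the reconstruction operation, and the rendering operation, or "none" or "0" (that is, no digital human processing operation is performed).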
另一种可能的实现方式中,终端设备A需要执行的操作可以由终端设备A自身决策。相应的,指示信息可以用于指示:第二通话附加服务能力信息、第三通话附加服务能力信息。
S33、终端设备A根据指示信息执行至少一个数字人处理操作或不执行数字人处理操作。
相应的,如果指示信息用于指示终端设备A执行捕捉操作、重建操作、渲染操作中的至少一个操作,则终端设备A直接执行指示信息指示的至少一个操作。
如果终端设备A执行的数字人处理操作为“无”或“0”(即不执行数字人处理操作),则终端设备A不执行数字人处理操作。
如果指示信息用于指示第二通话附加服务能力信息和第三通话附加服务能力信息;则终端设备A根据第一通话附加服务能力信息、第二通话附加服务能力信息、第三通话附加服务能力信息中的至少一项确定终端设备A执行的数字人处理操作。
基于以上方法,可以明确在为用户提供数字人服务的过程中,用户对应的终端设备需要执行的操作。
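One way the deciding party (the MS network element, terminal device B, or terminal device A itself) might turn capability information into a division of labor is sketched below. The decision policy, the dictionary keys, and the function name are all assumptions made for illustration; the application leaves the policy open.

```python
OPERATIONS = ("capture", "reconstruct", "render")


def split_operations(terminal: dict, server: dict) -> dict:
    """Assign each digital human processing operation to the terminal device or the server.

    Assumed policy:
    - capture runs on the terminal when it offers a driving parameter the server accepts;
    - reconstruct runs on the terminal when it can reconstruct and holds the model;
    - render runs on the terminal only when it also reconstructs and can render;
    - everything else is left to the communication server.
    """
    terminal_ops = []
    offered = set(terminal.get("driving_params", ()))
    accepted = set(server.get("accepted_driving_params", ()))
    if offered & accepted:
        terminal_ops.append("capture")
    if terminal.get("can_reconstruct") and terminal.get("has_model"):
        terminal_ops.append("reconstruct")
    if "reconstruct" in terminal_ops and terminal.get("can_render"):
        terminal_ops.append("render")
    server_ops = [op for op in OPERATIONS if op not in terminal_ops]
    return {"terminal": terminal_ops, "server": server_ops}
```

An empty `terminal` list corresponds to the "none" or "0" indication, where the terminal device performs no digital human processing operation.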
可以理解,终端设备A与终端设备B通话的过程中(即已建立通话连接),终端设备A执行该至少一个操作。上述S31~S32则可以发生在终端设备A和终端设备B开始通话之前,例如可以在呼叫阶段执行上述S31~S32。
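As an optional implementation, the communication server may perform at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first call add-on service capability information, the second call add-on service capability information, and the third call add-on service capability information. It can be understood that terminal device A and the communication server perform different digital human processing operations.
For example, the terminal device performs the capture operation and the communication server performs the reconstruction and rendering operations; or the terminal device performs the capture and reconstruction operations and the communication server performs the rendering operation; and so on.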
可以理解,如果终端设备B对应的用户b要求用户a的数字人出现在用户b的(真实或虚拟)场景中,则渲染操作还可以由终端设备B来执行。
可以理解,捕捉操作、重建操作、渲染操作也可以全部由终端设备A或通信服务器执行。
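It can be understood that the embodiment shown in S31 to S33 takes provision of the digital human service for terminal device A as an example. In practice, a call is bidirectional: during a call, the digital human service may be provided only for user a, only for user b, or for both user a and user b. The same method can therefore also be applied to provide the digital human service for terminal device B. As an optional implementation, the first call add-on service capability information, the second call add-on service capability information, the third call add-on service capability information, the indication information, and the like may be carried in an SDP message or a SIP message, for example in a SIP message header field.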
可以理解,S31~S33所示的实施例可以与S21~S22所示的实施例相互结合实施。例如, 在呼叫阶段,终端设备A可以在同一个消息中携带终端设备A的能力(如第一通话附加服务能力信息)、用户a关于数字人服务的需求(如第一通话附加服务需求信息、第三通话附加服务需求信息等),进而将终端设备A的能力和用户a的需求通知通信服务器和终端设备B。例如,在呼叫接通后,终端设备A和终端设备B通话的过程中,终端设备A和通信服务器相互配合完成捕捉操作、重建操作、渲染操作等操作,进而实现终端设备A使通信服务器向终端设备B,发送用户a的通话画面内容(包括数字人内容和/或虚拟场景内容)。
可以理解,在S21~S22所示实施例中,生成数字人内容的操作,可以包括捕捉操作、重建操作、渲染操作中的全部操作,也可以仅包括其中的部分操作(例如仅包括渲染操作),本申请实施例对此不做限制。进一步的,在S21~S22所示实施例中所述的“数字人内容”可以是静态的数字人数据(如数字人模型),也可以是动态的数字人数据(即重建操作后得到的数据,如数字人图像序列),还可以是包含了数字人形象的通话画面内容(即渲染操作后得到的数据),等等,本申请不做限制。类似的,虚拟场景内容可以仅包括虚拟场景对应的图像,也可以是包含了虚拟场景画面的通话画面内容,等等,本申请不做限制。
示例性的,参见图4,为本申请实施例提供的另一种通信方法的流程图,该方法以终端设备A为主叫终端设备、终端设备B为被叫终端设备为例。该方法包括:
S401、终端设备A向通信服务器(以MS网元为例)发送消息A,其中携带如下几种参数:用户a的需求(Expect)、终端设备A与终端设备B间的协商内容(Caller-Callee)、终端设备A与通信服务器间的协商内容(Caller-CN);
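S402. The communication server sends message B to terminal device B, carrying the following parameters: user a's demand, the negotiation content between terminal device A and terminal device B, and the negotiation content between the communication server and terminal device B (CN-Callee);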
S403、终端设备B结合用户b的需求、终端设备B的通话附加服务能力信息以及消息B中携带的参数,确定终端设备B的协商结果(Callee);
S404、终端设备B向通信服务器发送消息C,其中携带如下几种参数:终端设备B协商结果;
S405、通信服务器确定通信服务器与终端设备A间的协商结果(CN-Caller);
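S406. The communication server sends message D to terminal device A, carrying the following parameters: terminal device B's negotiation result, and the negotiation result between the communication server and terminal device A;
S407. Terminal device A confirms terminal device A's negotiation result (Caller) according to terminal device B's negotiation result and the negotiation result between the communication server and terminal device A;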
S408、终端设备A向通信服务器发送消息E,其中携带如下参数:终端设备A的协商结果;
S409、通信服务器向终端设备B发送消息F,其中携带如下参数:终端设备A的协商结果。
上述步骤S401~S409中涉及的相关参数及说明如下表2所示: 表2
可以理解,图4是以终端设备A为主叫终端设备,终端设备B为被叫终端设备,且主叫终端设备(即终端设备A)先发起协商为例,在实际应用中,也可以是被叫终端设备(如终端设备B)先发起协商,或者通信服务器先发起协商,等等,本申请对此不做限制。
在具体实现时,图4中通信服务器执行的功能可以由MS网元完成。例如,参见图5所示,终端设备A发送给通信服务器的消息(如消息A)先到达CSCF网元(P_CSCF网元、I_CSCF网元、S_CSCF网元中的任一种),然后由CSCF网元转发给MS网元;终端设备B发送给通信服务器的消息(如消息C)先到达CSCF网元,然后由CSCF网元转发给MS网元;通信服务器发送给终端设备A的消息(如消息D)从MS网元到CSCF网元, 然后由CSCF网元转发给终端设备A;通信服务器发送给终端设备B的消息(如消息F)从MS网元到CSCF网元,然后由CSCF网元转发给终端设备B。可以理解,CSCF网元在转发消息时,可以是透传的方式将消息转发给MS网元(即不解析消息携带的数据内容,直接转发),也可以采用非透传的方式转发(例如解析消息携带的数据内容,对数据内容重新封装后,再转发给MS网元),本申请不做限制。
可选的,网络的CSCF网元在收到消息A之后,还可以对用户a进行人号一致认证,在确定用户a当前的用户与其号码绑定的用户一致之后,再进行后续的流程。类似的,网络的CSCF网元在收到消息C之后,还可以对用户b进行人号一致认证,在确定用户b当前的用户与其号码绑定的用户一致之后,再进行后续的流程。
可以理解,图5中不同传输阶段中的相同消息标识(例如步骤1的消息A和步骤3的消息A)仅仅用于指示消息中携带的协商参数相同,在实际实施时,携带相同协商参数的不同消息可以有不同的名称,以及还可以携带其它不同的数据内容,本申请不做限制。
可以理解,图5是以终端设备A和终端设备B归属于同一IMS网络为例,在具体应用中,终端设备A和终端设备B还可以由不同的IMS网络提供服务,在这种情况下,则需要更多的网元参与通信,例如终端设备A对应一套CSCF网元、MS网元,终端设备B对应另一套CSCF网元、MS网元,终端设备A与终端设备B之间的传输参数需要经过两套CSCF网元、MS网元。
作为一种可选的实施方式,在参与通话的终端设备(如终端设备A、终端设备B)以及为终端设备提供通话服务的通信服务器(如MS网元)各自明确自身的分工之后,通信服务器(如MS网元)可以建立通信服务器(如MS网元)与终端设备之间的数据传输通道,用于传输通话过程中终端设备和通信服务器(如MS网元)需要交互的数据。
示例性的,参见图6,本申请实施例涉及的数据传输通道的类型包括但不限于以下几种:
1)digitalman_model_channel:数字人模型通道,用于传输未重建的数字人模型;
2)digitalman_channel:数字人通道,用于传输重建好的数字人动态数据(即数字人图像序列);
3)action_channel:驱动参数通道,用于传输驱动参数;
4)scene_channel:虚拟场景通道,用于传输虚拟场景,即数字人的虚拟背景画面数据;
5)backhaul_channel:回传通道,用于传输对端视角下的本端通话画面内容;
6)viewpoint_channel:视角信息通道,用于传输视角信息;
视角信息用于描述观察者的视角。例如,当用户a看用户b的通话画面内容,用户a的视角信息为用户a的空间位置信息;例如,当用户b看用户a的通话画面内容,用户b的视角信息为用户b的空间位置信息。
7)audio_channel:音频通道,用于传输音频;
8)video_channel:视频通道,用于传输视频,该视频可以是包含真人形象的视频(如终端设备采集的原始视频),也可以是包含数字人形象的视频(如由数字人处理操作得到的视频);
需要说明的是,上述几种通道类型的划分为逻辑上的划分,在物理上,上述各通道可以对应同一物理通道,也可以分别对应不同的物理通道。例如,上述不同通道的五元组信息(源IP、源端口、目的IP、目的端口、传输协议)不同;或者,不同通道可以合并,例 如驱动参数通道与视角信息通道合并,即共用相同的五元组信息。
可选的,通信服务器(MS网元)与终端设备之间需要建立的数据传输通道的类型,与通信服务器和终端设备之间的分工情况相关联。
以终端设备A与终端设备B通话过程中终端设备A需要使用数字人服务为例:
示例1、终端设备A执行重建操作,终端设备A与通信服务器之间建立第一通道(digitalman_model_channel),第一通道用于传输终端设备A对应的用户的数字人模型。
示例2、终端设备A执行重建操作,且通信服务器或终端设备B执行渲染操作,终端设备A与通信服务器之间建立第二通道(digitalman_channel),第二通道用于传输终端设备A对应的用户a的数字人图像序列。
示例3、终端设备A执行捕捉操作且通信服务器执行重建操作,终端设备A与通信服务器之间建立第三通道(action_channel),第三通道用于传输终端设备A对应的用户的驱动参数。
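Example 4: The communication server performs the rendering operation, the scene image is user a's virtual scene image, and the virtual scene image is stored on terminal device A. A fourth channel (scene_channel) is established between the communication server and terminal device A, and is used to transmit the virtual scene of user a corresponding to terminal device A. It can be understood that if the virtual scene is stored on the communication server and the rendering operation is performed on the communication server, this channel need not be established.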
示例5、终端设备A需要通信服务器回传终端设备A的通话画面内容,通信服务器与终端设备A之间建立第五通道(backhaul_channel),第五通道用于通信服务器向终端设备A回传终端设备A对应的用户的通话画面内容。
示例6、通信服务器执行渲染操作且终端设备B的通话附加服务能力信息包括用于指示终端设备B提供的视角信息,通信服务器与终端设备B之间建立第六通道(viewpoint_channel),第六通道用于传输终端设备B提供的视角信息。
示例7、若捕捉操作在通信服务器执行,且通信服务器从用户a的音频中捕捉用户的驱动参数,则可以在终端设备A和通信服务器之间建立第七通道(audio_channel)。若捕捉操作在通信服务器执行,且通信服务器从用户a的视频中捕捉用户的驱动参数,则可以在终端设备A和通信服务器之间建立第八通道(video_channel)。
可以理解,以上对各通道的建立场景进行了举例(不同的分工情况下,建立的通道类型可以不同),但不限制具体的建立场景。
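Examples 1 to 7 above amount to a mapping from the division of labor to a set of channel types. The sketch below encodes that mapping; the channel names come from the list above, while the function name, its parameters, and the use of plain string sets for the operations are assumptions for illustration.

```python
from enum import Enum


class Channel(Enum):
    MODEL = "digitalman_model_channel"
    DIGITAL_HUMAN = "digitalman_channel"
    ACTION = "action_channel"
    SCENE = "scene_channel"
    BACKHAUL = "backhaul_channel"
    VIEWPOINT = "viewpoint_channel"
    AUDIO = "audio_channel"
    VIDEO = "video_channel"


def channels_for(terminal_ops, server_ops, *, peer_renders=False,
                 scene_on_terminal=False, needs_backhaul=False,
                 peer_provides_viewpoint=False, server_capture_from=()):
    """Return the channel types suggested by Examples 1 to 7 for one division of labor."""
    needed = set()
    if "reconstruct" in terminal_ops:
        needed.add(Channel.MODEL)                      # Example 1
        if "render" in server_ops or peer_renders:
            needed.add(Channel.DIGITAL_HUMAN)          # Example 2
    if "capture" in terminal_ops and "reconstruct" in server_ops:
        needed.add(Channel.ACTION)                     # Example 3
    if "render" in server_ops:
        if scene_on_terminal:
            needed.add(Channel.SCENE)                  # Example 4
        if peer_provides_viewpoint:
            needed.add(Channel.VIEWPOINT)              # Example 6
    if needs_backhaul:
        needed.add(Channel.BACKHAUL)                   # Example 5
    if "capture" in server_ops:                        # Example 7
        if "audio" in server_capture_from:
            needed.add(Channel.AUDIO)
        if "video" in server_capture_from:
            needed.add(Channel.VIDEO)
    return needed
```

These are logical channels only; as noted earlier, several of them may share one five-tuple in practice.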
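In some possible designs, scenarios need not be distinguished; instead, some or all of the above channels are established by default. For example, video_channel, audio_channel, and action_channel may be established in every scenario, which reduces changes to the existing call flow. As another example, every one of the above channels may be established in every scenario (that is, all of the above channels are established regardless of the division of labor), which reduces the complexity of the solution and improves its applicability. It can be understood that the above are only examples rather than limitations, and more types of data transmission channels can be added according to the types of data that need to be transmitted.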
基于以上实施方式,可以根据通信服务器、终端设备的数字人服务能力建立通信服务器与终端设备之间的数据传输通道,满足数字人通信过程中不同数据对传输的不同要求。
以下介绍将数字人引入到大规模的实时通信场景时,如何保障数字人服务的安全性。
数字人服务在带给人们更好的体验的同时,也给通话安全、反诈治理工作带来了新的风险与挑战。为了确保数字人服务在通话场景中的合法使用,在制作(即制作数字人模型)、审核(即审核数字人模型)、管理(即管理数字人模型)、通信(用户使用数字人服务)等 过程中均需进行相应的认证和/或鉴权(简称“认证鉴权”)。
示例性的,本申请实施例涉及的认证鉴权场景包括但不限于下表3所示的几个场景。
表3
可以理解,以上几种认证鉴权场景仅为示例而非限定。
参见图7,为以上几种认证鉴权场景中可能涉及的几种密钥(M1、M2、M3、M4、M5、M6、M7、M8)的示意图。各密钥的用途如下:
M1:数字人审核设备使用私钥M1解密数字人模型制作设备提供的数字人模型、用户信息密文后进行审核。
用户信息密文例如包括但不限于用户的身份信息、校验特征等。
M2:数字人模型制作设备使用公钥M2将制作好的数字人模型、用户信息加密后发给数字人审核设备。
M3、M6:数字人模型制作设备使用私钥M3将制作好的数字人模型、用户信息数字签名,并使用公钥M6加密后发给管理中心。
M4、M5:管理中心使用私钥M5解密数字人模型制作设备提供的数字人模型、用户信息密文,并使用公钥M4验证数字签名。
M8:
用途1:用户使用数字人服务之前,用户到数字人模型制作设备请求制作数字人模型时,采集自己的校验特征(指纹、声纹、人脸等)并使用公钥M8进行加密后发送给数字人模型制作设备;
用途2:用户使用数字人服务之前,用户到管理中心管理(含下载)自己的数字人模型时,采集自己的校验特征(指纹、声纹、人脸等)并使用公钥M8进行加密后发送给管理中心;
用途3:用户使用数字人服务时,终端设备采集用户的校验特征(指纹、声纹、人脸等),使用公钥M8进行加密后发送给网络。
M7:
用途1:用户使用数字人服务之前,数字人模型制作设备在制作数字人模型之前,需要将用户提供的身份信息、校验特征等密文发送到用户身份认证设备(例如户籍管理中心的设备),用户身份认证设备则使用私钥M7对校验特征进行解密后认证;
用途2:用户使用数字人服务之前,数字人管理中心设备在用户使用数字人模型之前,需要将用户提供的身份信息、校验特征等密文发送到用户身份认证设备,用户身份认证设备则使用私钥M7对校验特征进行解密后认证;
用途3:用户使用数字人服务时,网络将用户提供的身份信息、校验特征等密文发送到用户身份认证设备,用户身份认证设备则使用私钥M7对身份信息、校验特征等进行解密后认证。
M9:用户使用数字人服务时,终端设备对用户进行合法性认证,并使用私钥M9对认证结果添加数字签名后,将认证结果发送到网络。
M10:用户使用数字人服务时,网络使用公钥M10验证终端设备发送的认证结果数字签名,验证通过则说明认证结果未被篡改。
可以理解,以上几种密钥仅为示例而非限定。
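The following uses Figure 8A to illustrate the authentication and authorization involved in producing, reviewing, and managing a digital human.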
参见图8A,本申请实施例提供一种认证鉴权方法,包括:
A1.数字人审核设备创建公私密钥对K1&K2、K3&K4;
其中,数字人审核设备具体可以是一个应用服务(Application Server,AS)网元,可以位于为用户的终端设备提供通话服务的移动网络(如该终端设备归属的IMS网络)中,也可以位于该移动网络之外,本申请不做限制。
K1为私钥,K2为公钥;K3为私钥,K4为公钥。
A2.数字人管理中心设备创建公私密钥对K5&K6;
其中,数字人管理中心设备具体可以是一个AS网元,可以位于为终端设备提供通话服务的移动网络中,也可以位于该移动网络之外,本申请不做限制。为了便于描述,本文以数字人管理中心设备位于该移动网络中为例。
在具体实现时,数字人管理中心设备可以与用于提供数字人服务的网元(如图1C所示的MS网元)集成在一个实体中,也可以分别集成在不同实体中,本申请不做限制。当数字人管理中心设备与MS网元集成在不同实体中时,MS网元可以从数字人管理中心设备获取数字人模型、与数字人模型相关的信息(如数字人模型对应的身份信息、校验特征等)。
K5为私钥,K6为公钥。
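A3. The digital human management center device issues a digital certificate (containing public key K6) to the digital human model production device.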
数字人模型制作设备具体可以是一个AS网元,可以位于为终端设备提供通话服务的移动网络中,也可以位于该移动网络之外,本申请不做限制。
A4.数字人审核设备向数字人模型制作设备发放数字证书(内含公钥K2);
A5.数字人审核设备向数字人管理中心设备发放数字证书(内含公钥K4);
A6.用户向数字人模型制作设备申请制作数字人(携带用户的身份信息、校验特征);
[根据细则91更正 10.10.2022]
其中,用户的身份信息例如是用户的姓名、电话号码等;用户的校验特征例如是用户的人脸图像、指纹、虹膜等信息。
A7.数字人模型制作设备对用户的身份信息进行验证,假设验证通过;
例如,验证身份的真实性、合法性等。
A8.数字人模型制作设备为用户制作对应的数字人模型,并使用公钥K2对用户的数字人模型、身份信息、校验特征进行加密;
A9.数字人模型制作设备将加密后的数字人模型、身份信息、校验特征发送给数字人审核设备;
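A10. The digital human review device decrypts using private key K1, obtaining the decrypted digital human model, identity information, and verification feature;
A11. The digital human review device reviews the user's digital human model, identity information, and verification feature, and after the review passes, uses private key K3 to add a digital signature to the digital human model, identity information, and verification feature;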
例如,审核用户的身份的真实性、合法性等;审核用户的数字人模型的合法性等;例如,审核用户的数字人模型是否与用户的校验特征匹配,等等。
A12.数字人审核设备向数字人模型制作设备返回审核响应消息,其中携带数字人审核设备的数字签名;
A13.数字人模型制作设备使用公钥K6对数字人审核设备返回的数字人模型、身份信 息、校验特征、数字人审核设备数字签名进行加密;
A14.数字人模型制作设备将加密后的数字人模型、身份信息、校验特征、数字人审核设备数字签名发送给数字人管理中心设备;
A15.数字人管理中心设备使用私钥K5解密,获得用户的数字人模型、身份信息、校验特征、数字人审核设备数字签名;然后使用公钥K4验证数字人审核设备的数字签名。
[根据细则91更正 10.10.2022]
如果数字人管理中心设备验证数字人审核设备的数字签名通过,则确定用户的数字人模型、身份信息、校验特征是经过数字人审核设备审核通过的,具有合法性,能够被该用户用于发起/接收基于数字人的通话。数字人管理中心设备将该用户的数字人模型、身份信息、校验特征与该用户的开户信息(如电话号码等)进行绑定。
[根据细则91更正 10.10.2022]
例如数字人管理中心设备将该用户的数字人模型、身份信息、校验特征存到该用户的开户信息中,或者数字人管理中心设备存储该用户的数字人模型、身份信息、校验特征与该用户的开户信息(如电话号码等)的映射关系,或者数字人管理中心设备将该用户的数字人模型、身份信息、校验特征与该用户的开户信息(如电话号码等)的映射关系存储到网络的其它网元中,本申请不做限制。
在本申请实施例中,还可以在通话画面内容中叠加用户的数字资产,可以进一步提高用户体验。
通过以上流程,可以保证只有经过数字人审核设备审核的数字人才能进入网络中被用户使用,以及可以防止数字人被篡改,规范化数字人的制作、审核、使用流程。
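Steps A8 to A15 can be traced with a symbolic model of the keys. The sketch below is not cryptography: `Sealed` and `Signature` are hypothetical stand-ins that only record which key was used, so that the order of encrypt, decrypt, sign, and verify operations and the effect of tampering can be followed. A real deployment would use an actual public-key library.

```python
import hashlib
from dataclasses import dataclass

# Private key name -> matching public key name, as in steps A1 and A2.
KEY_PAIRS = {"K1": "K2", "K3": "K4", "K5": "K6"}


@dataclass(frozen=True)
class Sealed:
    public_key: str   # name of the public key used to "encrypt"
    payload: tuple


@dataclass(frozen=True)
class Signature:
    private_key: str  # name of the private key used to "sign" (symbolic only)
    digest: str


def encrypt(public_key, payload):
    return Sealed(public_key, payload)


def decrypt(private_key, sealed):
    if KEY_PAIRS.get(private_key) != sealed.public_key:
        raise ValueError("wrong private key")
    return sealed.payload


def digest_of(payload):
    return hashlib.sha256(repr(payload).encode()).hexdigest()


def sign(private_key, payload):
    return Signature(private_key, digest_of(payload))


def verify(public_key, payload, signature):
    return (KEY_PAIRS.get(signature.private_key) == public_key
            and signature.digest == digest_of(payload))


def run_flow(model, identity, feature, tamper=False):
    record = (model, identity, feature)
    to_reviewer = encrypt("K2", record)                # A8, A9: production device -> review device
    reviewed = decrypt("K1", to_reviewer)              # A10: review device decrypts
    review_sig = sign("K3", reviewed)                  # A11, A12: review device signs and returns
    to_center = encrypt("K6", (reviewed, review_sig))  # A13, A14: production device -> management center
    received, sig = decrypt("K5", to_center)           # A15: management center decrypts
    if tamper:
        received = ("forged-model",) + received[1:]    # simulate a model swapped after review
    return verify("K4", received, sig)                 # A15: check the review device's signature
```

Only when `run_flow` returns `True` would the management center bind the model and verification feature to the user's subscription information.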
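The following uses Figure 8B to illustrate the authentication and authorization involved in producing, reviewing, and managing digital assets.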
参见图8B,本申请实施例还提供一种认证鉴权方法,包括:
B1、数字资产审核设备创建公私密钥对K7&K8、K9&K10;
其中,数字资产审核设备具体可以是一个AS网元,可以位于为用户提供通话服务的移动网络中,也可以位于该移动网络之外,本申请不做限制。在具体实现时,数字资产审核设备可以与上述数字人审核设备集成在一起,也可以分开,本申请不做限制。K7为私钥,K8为公钥;K9为私钥,K10为公钥。
B2、数字资产管理中心设备创建公私密钥对K11&K12;
其中,数字资产管理中心设备具体可以是一个AS网元,可以位于为终端设备提供通话服务的移动网络中,也可以位于该移动网络之外,本申请不做限制。为了便于描述,本文以数字资产管理中心设备位于该移动网络中为例。
在具体实现时,数字资产管理中心设备可以与用于提供数字人服务的网元(如图1C所示的MS网元)集成在一起,也可以分开为不同网元,本申请不做限制。
K11为私钥,K12为公钥。在具体实现时,数字资产管理中心设备可以与上述数字人管理中心设备集成在一起,也可以分开,本申请不做限制。
B3. The digital asset management center device issues a digital certificate (containing public key K12) to the digital asset production device.
数字资产制作设备具体可以是一个AS网元,可以位于为终端设备提供通话服务的移动网络中,也可以位于该移动网络之外,本申请不做限制。在具体实现时,数字资产制作设备可以与上述数字人模型制作设备集成在一起,也可以分开,本申请不做限制。
B4、数字资产审核设备向数字资产制作设备发放数字证书(内含公钥K8);
B5、数字资产审核设备向数字资产管理中心设备发放数字证书(内含公钥K10);
B6. The digital asset production device produces a digital asset and encrypts the digital asset using public key K8;
B7、数字资产制作设备将加密后的数字资产发送给数字资产审核设备;
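B8. The digital asset review device decrypts using private key K7, obtaining the decrypted digital asset;
B9. The digital asset review device reviews the digital asset, and after the review passes, uses private key K9 to add a digital signature to the digital asset;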
例如,审核数字资产的合法性,等等。
B10、数字资产审核设备向数字资产制作设备返回审核响应消息,其中携带审核设备的数字签名;
B11、数字资产制作设备使用公钥K12对数字资产审核设备返回的数字资产、数字签名进行加密;
B12、数字资产制作设备将加密后的数字资产、数字签名发送给数字资产管理中心设备;
B13、数字资产管理中心设备使用私钥K11解密,获得数字资产、数字签名;然后使用公钥K10验证数字签名。
如果数字资产管理中心设备验证数字资产审核设备的数字签名通过,则确定数字资产是经过数字资产审核设备审核通过的,具有合法性。
通过以上流程,可以保证只有经过数字资产审核设备审核的数字资产才能进入网络中被用户使用,以及可以防止数字资产被篡改,规范化数字资产的制作、审核、使用流程。
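The following uses Figure 8C to illustrate the authentication and authorization involved when a digital human is used during communication.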
以为第一终端设备(第一终端设备例如是图1A或图1B中的任一终端设备)提供数字人服务为例,参见图8C,本申请实施例提供一种认证鉴权方法,方法包括:
C1、获取第一终端设备的第一用户的校验特征,获取第一终端设备的第一电话号码绑定的数字人物模型的校验特征;
其中,第一用户为当前使用第一终端设备的用户,第一终端设备可以采集到第一用户的校验特征,第一用户的校验特征例如是第一用户的指纹、声纹、虹膜或人脸等。
其中,触发第一终端设备采集第一用户的校验特征的场景可以是第一终端设备接收到用户输入的预设操作(例如收到用户拨打电话或接听电话的操作),也可以是第一终端设备接收到网络发送的相关指示,本申请不做限定。
C2、判断第一用户的校验特征与数字人物模型的校验特征是否匹配;若第一用户的校验特征与数字人物模型的校验特征匹配,则执行C3;否则,执行C4;
根据上文图8A相关实施例的介绍,通信服务器中存储每个用户的数字人模型的校验特征(指纹、声纹、虹膜或人脸等)。所以可以通过将第一用户的校验特征与数字人物模型的校验特征进行比较,判断校验特征是否匹配来确定第一用户的身份是否与数字人物模型绑定的身份一致,即第一用户是否有使用该数字人物模型的权限。
C3、允许第一终端设备使用数字人服务,例如在通话过程中以第一用户的数字人形象作为第一用户的通讯主体呈现给对端终端设备;
C4、不允许第一终端设备使用数字人服务。
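Not allowing the first terminal device to use the digital human service may mean allowing the first terminal device to use the ordinary call service (that is, providing the first user's original call picture content to the peer), or not allowing the first terminal device to use any call service (including digital human calls and ordinary calls). This application does not limit this.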
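Steps C1 to C4 can be sketched as a single decision function. The application does not say how two verification features are compared; the cosine similarity, the 0.9 threshold, and the returned labels below are assumptions chosen only to make the example concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authorize(user_feature, model_feature, threshold=0.9, fallback="plain_call"):
    """C2 to C4: allow the digital human service only when the two features match.

    fallback selects what C4 means: "plain_call" (ordinary call still allowed)
    or "reject" (no call service at all).
    """
    if len(user_feature) != len(model_feature):
        raise ValueError("feature vectors must have the same length")
    if cosine_similarity(user_feature, model_feature) >= threshold:
        return "digital_human_call"  # C3
    return fallback                  # C4
```

The same function could run on the first terminal device or on the communication server, matching the two placements described in the text.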
可以理解,图8C所示的方法可以由第一终端设备执行,也可以由通信服务器执行,本申请不做限定。
当图8C所示的方法由通信服务器执行时,具体可以由IMS网络中的CSCF网元来执行。进一步,可以由IMS网络中或IMS网络之外的专门负责数字人服务的网元(例如是图1C所示的MS网元)来触发CSCF网元执行上述认证授权过程。
例如,MS网元检测到第一电话号码的用户的数字人服务被触发,则向CSCF网元发送通知信息,通知CSCF网元验证第一电话号码对应的第一终端设备当前通话的用户(简称当前用户)(即第一用户)的身份,进而CSCF网元从第一终端设备获取第一用户的校验特征,并从数字人管理中心设备获取第一终端设备的电话号码绑定的数字人物模型的校验特征,然后执行验证过程,CSCF网元在验证第一终端设备当前用户的身份通过(即第一用户的校验特征与数字人物模型的校验特征匹配)之后,再通知MS网元继续执行为用户提供数字人服务的相关流程。可选的,第一电话号码的用户的数字人服务被触发,具体可以是MS网元收到来自第一终端设备下载数字人物模型的请求,CSCF网元在验证第一终端设备当前用户的身份通过之后,MS网元再将数字人模型下发到第一终端设备。
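When the flow shown in Figure 8C is performed by the first terminal device, after obtaining the verification result (match or mismatch), the first terminal device uses its private key to add a digital signature to the verification result and uploads the signed verification result to the network. The CSCF network element in the network verifies the digital signature using the first terminal device's public key and, after the signature is verified, determines the identity of the current user of the first terminal device according to the uploaded verification result.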
参见图8D,本申请实施例提供一种数字资产购买方法,包括:
D1.数字资产管理中心创建公私密钥对K13&K14,其中K13为私钥,K14为公钥;
D2.数字资产管理中心给MS网元发放数字证书(内含公钥K14);
D3.终端设备向数字资产管理中心发起购买数字资产的请求(其中携带终端设备对应的用户身份信息);
D4.数字资产管理中心验证用户身份信息;
例如,验证身份信息的真实性(是否开户)、合法性(如是否成年)等。
D5.数字资产管理中心使用私钥K13对数字资产、用户身份信息添加数字签名;
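D6. The digital asset management center records, in the user's asset library, the digital asset identifier (Identity Document, ID) corresponding to the digital asset and the digital signature, and associates the digital asset ID with the digital signature;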
D7.MS网元检测到用户的数字人服务被触发;
例如,MS网元接收到CSCF网元发送的消息,指示用户要求在通话的过程中使用用户的数字人。
D8.MS网元向数字资产管理中心发送获取用户的数字资产的请求(携带数字资产ID);
D9.数字资产管理中心根据MS网元发送的数字资产ID查找对应的数字资产,以及与该资产关联的数字签名;
D10.数字资产管理中心向MS网元返回数字资产、数字签名;
D11. The MS network element verifies the digital signature using public key K14;
D12.若验证数字签名通过,则继续该用户的数字人服务的相关流程。
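For example, the digital asset is superimposed on the call picture corresponding to the user and rendered, and the rendered call picture content is sent to the other users in the call with that user.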
通过以上流程,可以实现用户在通话过程中使用其购买的数字资产,有效防止数字资 产被他人非法盗用等问题,规范化数字资产的购买、使用流程。
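The sign-then-verify core of steps D5 to D11 can be illustrated with textbook RSA on small numbers. This is a toy under loud assumptions: the primes, the unpadded signing, and the `AssetCenter` class are illustrative only and are not secure; the private exponent `D` plays the role of K13 and the pair (`N`, `E`) plays the role of K14. It requires Python 3.8 or later for `pow(E, -1, ...)`.

```python
import hashlib

# Two Mersenne primes; far too small for real use, chosen only so the example runs instantly.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                          # public exponent (with N, stands in for K14)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (stands in for K13)


def _digest(asset: bytes, user_id: str) -> int:
    h = hashlib.sha256(asset + b"|" + user_id.encode()).digest()
    return int.from_bytes(h, "big") % N


def sign(asset: bytes, user_id: str) -> int:
    """D5: the management center signs the asset together with the buyer's identity."""
    return pow(_digest(asset, user_id), D, N)


def verify(asset: bytes, user_id: str, signature: int) -> bool:
    """D11: the MS network element checks the signature with the public key."""
    return pow(signature, E, N) == _digest(asset, user_id)


class AssetCenter:
    """Hypothetical stand-in for the digital asset management center's asset library."""

    def __init__(self):
        self._library = {}

    def purchase(self, asset_id: str, asset: bytes, user_id: str) -> None:
        # D3 to D6: after identity checks (omitted), sign and record under the asset ID.
        self._library[asset_id] = (asset, sign(asset, user_id))

    def fetch(self, asset_id: str):
        # D8 to D10: return the asset and its associated signature.
        return self._library[asset_id]
```

Because the signature covers the buyer's identity, an asset fetched for a different user fails verification, which is the anti-theft property the flow aims for.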
可以理解,以上各实施方式可以分别单独实施,也可以相互结合实施。
以下列举几种可能的具体示例。在以下示例中,除非有特别说明之外,网络均指网络中的MS网元。
以下示例1至示例10,以主叫终端设备为终端设备A(主叫用户为用户a),被叫终端设备为终端设备B(被叫用户为用户b),用户a和用户b均使用数字人服务为例。
示例1、捕捉、重建、渲染、认证鉴权在通信服务器实现。
具体以数字人模型存储在通信服务器,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均支持视角可变(即主叫终端设备支持根据主叫用户的视角呈现通话画面、被叫终端设备支持根据被叫用户的视角呈现通话画面)为例。
主被叫终端设备与通信服务器之间需要传输的数据类型包括如图9A所示。
其中,主叫终端设备与通信服务器之间的数据传输包括:
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频(即主叫用户的原始视频,包含主叫用户的真人形象);
主叫终端设备向通信服务器发送主叫场景;
主叫终端设备向通信服务器发送主叫视角信息;
通信服务器向主叫终端设备回传主叫渲染视频(包含主叫用户的数字人形象);
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫渲染视频(包含被叫用户的数字人形象)。
其中,被叫终端设备与通信服务器之间的数据传输包括:
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频(即被叫用户的原始视频,包含被叫用户的真人形象);
被叫终端设备向通信服务器发送被叫场景;
被叫终端设备向通信服务器发送被叫视角信息;
通信服务器向被叫终端设备回传被叫渲染视频;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫渲染视频(包含主叫用户的数字人形象和主叫用户的真实场景)。
可以理解,为了便于说明,在后文中,主叫用户的原始视频(包含主叫用户的真人形象和真实场景的视频)以“主叫视频”描述,经过数字人处理操作得到的主叫用户的视频(包含主叫用户的数字人形象和/或虚拟场景画面)以“主叫渲染视频”描述,被叫用户的原始视频(包含被叫用户的真人形象和真实场景的视频)以“被叫视频”描述,经过数字人处理操作得到的被叫用户的视频(包含被叫用户的数字人形象和/或虚拟场景画面)以“被叫渲染视频”描述。
参见图9B~图9C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
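S901. The calling terminal device sends message A (a SIP message) to the communication server. Message A carries the following parameters: the calling user's demand, the negotiation content between the calling terminal device and the called terminal device, and the negotiation content between the calling terminal device and the communication server;
S902. The communication server performs person-number consistency authentication for the caller, that is, verifies whether the current user of the calling terminal device is the account holder of the telephone number corresponding to the calling terminal device; after the verification passes, the next step S903 is performed;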
S903,通信服务器向被叫终端设备发送消息B(消息类型为SIP消息),其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S904,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S905,被叫终端设备向通信服务器发送消息C(消息类型为SIP消息),其中携带如下几种参数:被叫终端设备协商结果;
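S906. The communication server performs person-number consistency authentication for the callee, that is, verifies whether the current user of the called terminal device is the account holder of the telephone number corresponding to the called terminal device; after the verification passes, the communication server determines the negotiation result between the communication server and the calling terminal device;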
S907,通信服务器向主叫终端设备发送消息D(消息类型为SIP消息),其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S908,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S909.主叫终端设备向通信服务器发送消息E(消息类型为SIP消息),其中携带如下参数:主叫终端设备的协商结果;
S910,通信服务器向被叫终端设备发送消息F(消息类型为SIP消息),其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S911,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息为用于指示参数交互完成;
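S912. The calling terminal device establishes transmission channels with the communication server;
S913. The called terminal device establishes transmission channels with the communication server;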
需要创建的数据传输通道类型包括:audio_channel、video_channel、backhaul_channel、viewpoint_channel。
S914.1,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S914.2,主叫终端设备基于video_channel向通信服务器发送主叫视频;
S914.3,主叫终端设备基于viewpoint_channel向通信服务器发送主叫视角信息;
可以理解,S914.1、S914.2、S914.3不区分先后顺序。
S915,通信服务器从视频中捕捉驱动参数(如表情、动作等信息)与场景信息,再基于声音、表情等信息重建主叫用户数字人;
S916,将重建的主叫用户数字人与主叫用户场景进行渲染,得到渲染画面;
S917,对渲染画面进行编码,得到主叫渲染视频;
S918.1,通信服务器基于backhaul_channel向主叫终端设备发送主叫渲染视频;
S918.2,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S918.3,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
可以理解,S918.1、S918.2、S918.3不区分先后顺序。
被叫终端设备可以播放主叫的通话画面内容。例如,被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S919.1,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S919.2,被叫终端设备基于video_channel向通信服务器发送被叫视频;
S919.3,被叫终端设备基于viewpoint_channel向通信服务器发送被叫视角信息;
可以理解,S919.1、S919.2、S919.3不区分先后顺序。
S920,通信服务器从视频中捕捉驱动参数与场景信息,再基于驱动参数重建被叫用户数字人;
S921,将重建的被叫用户数字人与被叫用户场景进行渲染,得到渲染画面;
S922,对渲染画面进行编码,得到被叫渲染视频;
S923.1,通信服务器基于backhaul_channel向被叫终端设备发送被叫渲染视频;
S923.2,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S923.3,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频。
可以理解,S923.1、S923.2、S923.3不区分先后顺序。
主叫终端设备可以播放被叫的通话画面内容。例如,主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例1中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、重建、渲染、认证鉴权(如人号一致认证)都在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
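示例1中S901~S911的协商流程可以用下面的Python片段粗略示意。这只是一个便于理解的草图,并非本申请的实现:函数名`negotiate`、用布尔值表示人号一致认证结果、以及认证失败时直接终止流程的处理方式均为此处的假设(原文只说明了验证通过后的后续步骤);消息A~F携带的具体参数见表2,这里只保留消息名。

```python
from typing import List, Tuple

# (发送方, 接收方, 消息名),仅用于示意
Message = Tuple[str, str, str]


def negotiate(caller_auth_ok: bool, callee_auth_ok: bool) -> List[Message]:
    """返回协商阶段的消息序列;认证失败时提前结束(此行为为假设)。"""
    trace: List[Message] = [("caller", "server", "A")]   # S901
    if not caller_auth_ok:                               # S902 主叫人号一致认证
        return trace
    trace.append(("server", "callee", "B"))              # S903
    trace.append(("callee", "server", "C"))              # S904~S905
    if not callee_auth_ok:                               # S906 被叫人号一致认证
        return trace
    trace.append(("server", "caller", "D"))              # S907
    trace.append(("caller", "server", "E"))              # S908~S909
    trace.append(("server", "callee", "F"))              # S910
    trace.append(("callee", "server", "OK"))             # S911
    trace.append(("server", "caller", "OK"))             # S911
    return trace
```

例如,`negotiate(True, True)`会依次给出消息A、B、C、D、E、F以及两条“OK”消息,之后才进入S912、S913的通道建立。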
示例2、捕捉在终端设备实现,认证鉴权、重建、渲染在通信服务器实现。
具体以数字人模型存储在通信服务器,捕捉在终端设备实现,重建和渲染在通信服务器实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
主被叫用户与通信服务器之间需要传输的数据类型如图10A所示。
其中,主叫终端设备与通信服务器之间的数据传输包括:
主叫终端设备向通信服务器发送主叫驱动参数;
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频;
主叫终端设备向通信服务器发送主叫场景;
通信服务器向主叫终端设备回传主叫渲染视频;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输包括:
被叫终端设备向通信服务器发送被叫驱动参数;
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频;
被叫终端设备向通信服务器发送被叫场景;
通信服务器向被叫终端设备回传被叫渲染视频;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫渲染视频。
图10B~图10C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1001,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1002,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1003;
S1003,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1004,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1005,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1006,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1007,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1008,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1009,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1010,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1011,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1012,主叫终端设备与通信服务器建立传输通道;S1013,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、backhaul_channel、action_channel。
S1014.1,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S1014.2,主叫终端设备基于video_channel向通信服务器发送主叫视频;
可以理解,S1014.1、S1014.2不区分先后顺序。
S1015.1,主叫终端设备捕捉主叫驱动参数;
S1015.2,主叫终端设备基于action_channel将主叫驱动参数发送给通信服务器;
S1016,通信服务器基于主叫驱动参数重建主叫数字人;
S1017,通信服务器将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;
S1018,通信服务器对渲染画面进行编码,得到主叫渲染视频;
S1019.1,通信服务器基于backhaul_channel向主叫终端设备发送主叫渲染视频;
S1019.2,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S1019.3,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
可以理解,S1019.1、S1019.2、S1019.3不区分先后顺序。
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1020.1,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S1020.2,被叫终端设备基于video_channel向通信服务器发送被叫视频;
可以理解,S1020.1、S1020.2不区分先后顺序。
S1021.1,被叫终端设备捕捉被叫驱动参数;
S1021.2,被叫终端设备基于action_channel将被叫驱动参数发送给通信服务器;
S1022,通信服务器基于被叫驱动参数重建被叫用户数字人;
S1023,通信服务器将重建的被叫用户数字人与被叫用户场景进行渲染,得到渲染画面;
S1024,通信服务器对渲染画面进行编码,得到被叫渲染视频;
S1025.1,通信服务器基于backhaul_channel向被叫终端设备发送被叫渲染视频;
S1025.2,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S1025.3,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频。
可以理解,S1025.1、S1025.2、S1025.3不区分先后顺序。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例2中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉在终端设备实现,而重建、渲染、认证鉴权(如人号一致认证)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
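示例2中主叫侧的数据面流程(S1014~S1019)可以粗略示意如下。这是一个假设性的草图:函数名`caller_media_flow`和各字符串载荷都是为说明而取的占位名,重建、渲染、编码在这里只用一个占位字符串代表,并不反映实际的媒体处理。

```python
from typing import List, Tuple

# (发送方, 接收方, 通道, 载荷),载荷仅为占位字符串
Send = Tuple[str, str, str, str]


def caller_media_flow() -> List[Send]:
    """按示例2的分工列出主叫方向的一轮数据发送。"""
    sends: List[Send] = [
        ("caller", "server", "audio_channel", "caller_audio"),            # S1014.1
        ("caller", "server", "video_channel", "caller_video"),            # S1014.2
        # S1015.1 在终端设备捕捉驱动参数,S1015.2 通过action_channel上传
        ("caller", "server", "action_channel", "caller_driving_params"),
    ]
    # S1016~S1018:通信服务器重建、渲染、编码(此处仅用占位结果表示)
    rendered = "caller_rendered_video"
    sends += [
        ("server", "caller", "backhaul_channel", rendered),               # S1019.1
        ("server", "callee", "audio_channel", "caller_audio"),            # S1019.2
        ("server", "callee", "video_channel", rendered),                  # S1019.3
    ]
    return sends
```

该草图用到的通道集合与本示例需要创建的四类通道一致;被叫方向(S1020~S1025)与之对称。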
示例3、重建在终端设备实现,认证鉴权、捕捉、渲染在通信服务器实现。
具体以数字人模型存储在通信服务器,捕捉和渲染在通信服务器实现,重建在终端设备实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
主被叫用户与通信服务器之间需要传输的数据类型如图11A所示。
其中,主叫终端设备与通信服务器之间的数据传输包括:
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频;
主叫终端设备向通信服务器发送主叫场景;
通信服务器向主叫终端设备回传主叫渲染视频;
通信服务器向主叫终端设备发送主叫数字人模型;
通信服务器向主叫终端设备发送主叫驱动参数;
主叫终端设备向通信服务器发送主叫重建好的主叫数字人(即主叫的数字人动态数据);
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输包括:
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频;
被叫终端设备向通信服务器发送被叫场景;
通信服务器向被叫终端设备回传被叫渲染视频;
通信服务器向被叫终端设备发送被叫数字人模型;
通信服务器向被叫终端设备发送被叫驱动参数;
被叫终端设备向通信服务器发送被叫重建好的被叫数字人(即被叫的数字人动态数据);
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫渲染视频。
图11B~图11C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1101,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1102,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1103;
S1103,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1104,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1105,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1106,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1107,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1108,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1109,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1110,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1111,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1112,主叫终端设备与通信服务器建立传输通道;S1113,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、backhaul_channel、action_channel、digitalman_channel、digitalman_model_channel。
S1114.1,主叫终端设备基于digitalman_model_channel从通信服务器下载主叫数字人模型;
S1114.2,被叫终端设备基于digitalman_model_channel从通信服务器下载被叫数字人模型;
可以理解,S1114.1、S1114.2不区分先后顺序。
S1115.1,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S1115.2,主叫终端设备基于video_channel向通信服务器发送主叫视频;
可以理解,S1115.1、S1115.2不区分先后顺序。
S1116,通信服务器捕捉主叫驱动参数;
S1117,通信服务器基于action_channel将主叫驱动参数发送给主叫终端设备;
S1118,主叫终端设备基于主叫驱动参数重建主叫数字人;
S1119,主叫终端设备基于digitalman_channel将重建好的主叫数字人发送给通信服务器;
S1121,通信服务器对主叫数字人进行一致性校验;
S1122,在主叫数字人一致性校验通过之后,通信服务器将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1123.1,通信服务器基于backhaul_channel向主叫终端设备发送主叫渲染视频;
S1123.2,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S1123.3,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
可以理解,S1123.1、S1123.2、S1123.3不区分先后顺序。
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1124.1,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S1124.2,被叫终端设备基于video_channel向通信服务器发送被叫视频;
可以理解,S1124.1、S1124.2不区分先后顺序。
S1125,通信服务器捕捉被叫驱动参数;
S1126,通信服务器基于action_channel将被叫驱动参数发送给被叫终端设备;
S1127,被叫终端设备基于被叫驱动参数重建被叫数字人;
S1128,被叫终端设备基于digitalman_channel将重建好的被叫数字人发送给通信服务器;
S1129,通信服务器对被叫数字人进行一致性校验;
S1130,在被叫数字人一致性校验通过之后,通信服务器将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1131.1,通信服务器基于backhaul_channel向被叫终端设备发送被叫渲染视频;
S1131.2,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S1131.3,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频。
可以理解,S1131.1、S1131.2、S1131.3不区分先后顺序。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例3中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得重建在终端设备实现,而捕捉、渲染、认证鉴权(如人号一致认证)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
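示例3中“通信服务器捕捉、终端设备重建、通信服务器校验并渲染”的往返过程可以示意如下。需要强调的是,原文没有说明数字人一致性校验的具体做法,下面用“比较数字人所用模型标识与通信服务器下发的模型标识”来代替,纯属为了让示例可运行而做的假设;`DigitalHuman`、`model_id`等名称同样是假设的。

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DigitalHuman:
    model_id: str                     # 假设字段:重建所用数字人模型的标识
    driving_params: Tuple[str, ...]   # 驱动参数(占位)


def capture_on_server(video: str) -> Tuple[str, ...]:
    """S1116:通信服务器从视频中捕捉驱动参数(占位实现)。"""
    return (f"expression({video})", f"motion({video})")


def reconstruct_on_terminal(model_id: str, params: Tuple[str, ...]) -> DigitalHuman:
    """S1118:终端设备基于已下载的模型和驱动参数重建数字人。"""
    return DigitalHuman(model_id, params)


def render_on_server(issued_model_id: str, dh: DigitalHuman, scene: str) -> Optional[str]:
    """S1121~S1122:校验通过后才渲染;校验方式为假设。"""
    if dh.model_id != issued_model_id:
        return None                   # 校验不通过时不渲染(处理方式为假设)
    return f"rendered({dh.model_id}, {scene})"
```

这样划分后,驱动参数经action_channel下行、重建结果经digitalman_channel上行,渲染仍然只在校验通过后由通信服务器完成。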
示例4、捕捉、重建在终端设备实现,认证鉴权、渲染在通信服务器实现。
具体以数字人模型存储在通信服务器,渲染在通信服务器实现,捕捉、重建在终端设备实现,主被叫用户看到的都是对端真实场景,不支持视角可变为例。
主被叫用户与通信服务器之间需要传输的数据类型如图12A所示。
其中,主叫终端设备与通信服务器之间的数据传输包括:
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频;
主叫终端设备向通信服务器发送主叫场景;
主叫终端设备向通信服务器发送主叫视角信息;
通信服务器向主叫终端设备回传主叫渲染视频;
通信服务器向主叫终端设备发送主叫数字人模型;
主叫终端设备向通信服务器发送主叫重建好的主叫数字人;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输包括:
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频;
被叫终端设备向通信服务器发送被叫场景;
被叫终端设备向通信服务器发送被叫视角信息;
通信服务器向被叫终端设备回传被叫渲染视频;
通信服务器向被叫终端设备发送被叫数字人模型;
被叫终端设备向通信服务器发送被叫重建好的被叫数字人;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫渲染视频。
图12B~图12C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1201,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1202,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1203;
S1203,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1204,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1205,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1206,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1207,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1208,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1209,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1210,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1211,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1212,主叫终端设备与通信服务器建立传输通道;S1213,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、backhaul_channel、digitalman_channel、digitalman_model_channel。
S1214.1,主叫终端设备基于digitalman_model_channel从通信服务器下载主叫数字人模型;
S1214.2,被叫终端设备基于digitalman_model_channel从通信服务器下载被叫数字人模型;
可以理解,S1214.1、S1214.2不区分先后顺序。
S1215,主叫终端设备捕捉主叫驱动参数;
S1216,主叫终端设备基于主叫驱动参数重建主叫数字人;
S1217.1,主叫终端设备基于digitalman_channel将重建好的主叫数字人发送给通信服务器;
S1217.2,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S1217.3,主叫终端设备基于video_channel向通信服务器发送主叫视频;
可以理解,S1217.1、S1217.2、S1217.3不区分先后顺序。
S1218,通信服务器对主叫数字人进行一致性校验;
S1219,在主叫数字人一致性校验通过之后,通信服务器将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1220.1,通信服务器基于backhaul_channel向主叫终端设备发送主叫渲染视频;
S1220.2,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S1220.3,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
可以理解,S1220.1、S1220.2、S1220.3不区分先后顺序。
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1221,被叫终端设备捕捉被叫驱动参数;
S1222,被叫终端设备基于被叫驱动参数重建被叫数字人;
S1223.1,被叫终端设备基于digitalman_channel将重建好的被叫数字人发送给通信服务器;
S1223.2,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S1223.3,被叫终端设备基于video_channel向通信服务器发送被叫视频;
可以理解,S1223.1、S1223.2、S1223.3不区分先后顺序。
S1224,通信服务器对被叫数字人进行一致性校验;
S1225,在被叫数字人一致性校验通过之后,通信服务器将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1226.1,通信服务器基于backhaul_channel向被叫终端设备发送被叫渲染视频;
S1226.2,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S1226.3,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频。
可以理解,S1226.1、S1226.2、S1226.3不区分先后顺序。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例4中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、重建在终端设备实现,而渲染、认证鉴权(如人号一致认证)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例5、捕捉、重建、渲染在终端设备实现,认证鉴权在通信服务器实现。
具体以数字人模型存储在通信服务器,捕捉、重建和渲染都在终端设备实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备支持视角可变为例。
当渲染能力在终端设备实现时,终端设备既可以渲染本端显示内容(客户端渲染模式),也可以渲染对端显示的内容(服务端渲染模式)。其中,被叫终端设备客户端渲染模式表示被叫用户客户端渲染被叫用户看到的内容,如主叫用户数字人在被叫用户真实/虚拟场景中;被叫终端设备服务端渲染模式表示被叫用户客户端渲染主叫用户看到的内容,如被叫用户数字人在被叫用户真实/虚拟场景中;主叫终端设备客户端渲染模式表示主叫用户客户端渲染主叫用户看到的内容,如被叫用户数字人在主叫用户真实/虚拟场景中;主叫终端设备服务端渲染模式表示主叫用户客户端渲染被叫用户看到的内容,如主叫用户数字人在主叫用户真实/虚拟场景中。在通信过程中,可根据主被叫用户的需求、主被叫终端设备的能力确定主被叫终端设备模式。主被叫终端设备渲染模式可不一致。
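上面四种“终端角色 + 渲染模式”的组合可以归纳成一个小函数。下面的草图仅依据上文列举的例子整理:`RenderTarget`、`render_target`等名称为假设,场景一栏取的是上文各例中所举的情形(本端场景),实际通话中场景的选择取决于主被叫用户的需求。

```python
from typing import NamedTuple


class RenderTarget(NamedTuple):
    viewer: str            # 渲染结果给谁看
    digital_human_of: str  # 画面中是谁的数字人
    scene_of: str          # 上文例子中所用的是谁的场景


def peer(role: str) -> str:
    """返回对端角色。"""
    return "callee" if role == "caller" else "caller"


def render_target(role: str, mode: str) -> RenderTarget:
    """role为"caller"或"callee",mode为"client"(客户端渲染)或"server"(服务端渲染)。"""
    if role not in ("caller", "callee"):
        raise ValueError(f"unknown role: {role}")
    if mode == "client":
        # 渲染本端用户看到的内容:对端数字人在本端场景中
        return RenderTarget(viewer=role, digital_human_of=peer(role), scene_of=role)
    if mode == "server":
        # 渲染对端用户看到的内容:本端数字人在本端场景中
        return RenderTarget(viewer=peer(role), digital_human_of=role, scene_of=role)
    raise ValueError(f"unknown mode: {mode}")
```

例如,`render_target("callee", "client")`表示被叫终端设备渲染被叫用户自己看到的画面,画面中是主叫用户的数字人。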
1、若主被叫终端设备都是客户端渲染模式,参见图13A,为主被叫用户与通信服务器之间需要传输的数据类型的示例。
其中,主叫终端设备与通信服务器之间的数据传输可以包括:
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频;
通信服务器向主叫终端设备发送主叫场景;
主叫终端设备向通信服务器发送主叫驱动参数;
通信服务器向主叫终端设备发送被叫数字人模型;
通信服务器向主叫终端设备发送被叫驱动参数;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫视频;
通信服务器向主叫终端设备发送被叫场景;
主叫终端设备向通信服务器发送被叫渲染视频;
通信服务器向主叫终端设备发送主叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输包括:
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频;
通信服务器向被叫终端设备发送被叫场景;
被叫终端设备向通信服务器发送被叫驱动参数;
通信服务器向被叫终端设备发送主叫数字人模型;
通信服务器向被叫终端设备发送主叫驱动参数;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫视频;
通信服务器向被叫终端设备发送主叫场景;
被叫终端设备向通信服务器发送主叫渲染视频;
通信服务器向被叫终端设备发送被叫渲染视频。
若被叫用户看主叫用户数字人在被叫用户真实/虚拟场景,主被叫终端设备都是客户端渲染模式,则:主叫终端设备与通信服务器将主叫用户数字人模型、主叫用户数字人驱动参数传输到被叫终端设备;被叫终端设备根据主叫用户数字人模型与主叫用户数字人驱动参数重建主叫用户数字人;被叫终端设备根据主叫用户数字人与被叫用户真实/虚拟场景、视角,进行渲染并显示。
若被叫用户看主叫用户数字人在主叫用户真实/虚拟场景,主被叫终端设备都是客户端渲染模式,则:主叫终端设备与通信服务器将主叫用户数字人模型+主叫用户数字人驱动参数传输到被叫终端设备;主叫终端设备与通信服务器将主叫用户真实/虚拟场景、视角传输到被叫终端设备;被叫终端设备根据主叫用户数字人模型与主叫用户数字人驱动参数重建主叫用户数字人;被叫终端设备根据主叫用户数字人与主叫用户真实/虚拟场景、视角,进行渲染并显示。
为了让主被叫用户看到自己在对端的呈现形态,主被叫终端设备还可以与通信服务器间建立两条渲染流通道,一条用于上传对方在本端的渲染结果,一条用于接收本端在对方的渲染结果。
可以理解,主叫终端设备从通信服务器接收主叫用户场景是主叫用户虚拟场景保存在通信服务器中才需要;被叫终端设备从通信服务器接收被叫用户场景是被叫用户虚拟场景保存在通信服务器中才需要。
2、若主被叫终端设备都是服务端渲染模式,参见图13B,为主被叫用户与通信服务器之间需要传输的数据类型的示例。
其中,主叫终端设备与通信服务器之间的数据传输可以包括:
主叫终端设备向通信服务器发送主叫音频;
通信服务器向主叫终端设备发送主叫场景;
通信服务器向主叫终端设备发送被叫数字人模型;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫渲染视频;
通信服务器向主叫终端设备发送被叫场景;
通信服务器向主叫终端设备发送被叫视角信息;
主叫终端设备向通信服务器发送主叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输可以包括:
被叫终端设备向通信服务器发送被叫音频;
通信服务器向被叫终端设备发送被叫场景;
通信服务器向被叫终端设备发送主叫数字人模型;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫渲染视频;
通信服务器向被叫终端设备发送主叫场景;
通信服务器向被叫终端设备发送主叫视角信息;
被叫终端设备向通信服务器发送被叫渲染视频。
若被叫用户看主叫用户数字人在被叫用户真实/虚拟场景,主被叫终端设备都是服务端渲染模式,则:被叫终端设备与通信服务器将被叫用户真实/虚拟场景、视角传输到主叫终端设备;主叫终端设备根据主叫用户数字人模型与主叫用户数字人驱动参数重建主叫用户数字人;主叫终端设备根据主叫用户数字人与被叫用户真实/虚拟场景、视角,进行渲染;主叫终端设备将渲染结果编码并传输到被叫终端设备;被叫终端设备解码并显示渲染结果。
若被叫用户看主叫用户数字人在主叫用户真实/虚拟场景,主被叫终端设备都是服务端渲染模式,则:被叫终端设备与通信服务器将被叫用户视角传输到主叫终端设备;主叫终端设备根据主叫用户数字人模型与主叫用户数字人驱动参数重建主叫用户数字人;主叫终端设备根据主叫用户数字人与主叫用户真实/虚拟场景、视角,进行渲染;主叫终端设备将渲染结果编码并传输到被叫终端设备;被叫终端设备解码并显示渲染结果。
3、主被叫终端设备的渲染模式也可以不一致,例如主叫终端设备为服务端渲染模式、被叫终端设备为客户端渲染模式,或者被叫终端设备为服务端渲染模式、主叫终端设备为客户端渲染模式。在被叫用户看主叫用户数字人在主叫用户虚拟场景,主叫用户看被叫用户数字人在主叫用户真实场景时,主叫用户和被叫用户的渲染视频都在主叫终端设备处理,对主叫终端设备性能要求高。具体流程与通道不再赘述。
以主被叫用户均看到对端数字人在对端真实场景为例,图13C~图13D示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1301,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1302,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1303;
S1303,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1304,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1305,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1306,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1307,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1308,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1309,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1310,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1311,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1312,主叫终端设备与通信服务器建立传输通道;S1313,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、digitalman_model_channel、viewpoint_channel。
S1314.1,主叫终端设备基于digitalman_model_channel从通信服务器下载主叫数字人模型;
S1314.2,被叫终端设备基于digitalman_model_channel从通信服务器下载被叫数字人模型;
可以理解,S1314.1、S1314.2不区分先后顺序。
S1315,主叫终端设备捕捉主叫驱动参数;
S1316,主叫终端设备基于主叫驱动参数重建主叫数字人;
S1317,主叫终端设备将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1318.1,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S1318.2,主叫终端设备基于video_channel向通信服务器发送主叫渲染视频;
S1318.3,主叫终端设备基于viewpoint_channel向通信服务器发送主叫视角信息;
可以理解,S1318.1、S1318.2、S1318.3不区分先后顺序。
S1319,通信服务器对主叫数字人进行一致性校验,假设一致性校验通过;
S1320.1,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S1320.2,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
S1320.3,通信服务器基于viewpoint_channel向被叫终端设备发送主叫视角信息;
可以理解,S1320.1、S1320.2、S1320.3不区分先后顺序。
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1321,被叫终端设备捕捉被叫驱动参数;
S1322,被叫终端设备基于被叫驱动参数重建被叫数字人;
S1323,被叫终端设备将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1324.1,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S1324.2,被叫终端设备基于video_channel向通信服务器发送被叫渲染视频;
S1324.3,被叫终端设备基于viewpoint_channel向通信服务器发送被叫视角信息;
可以理解,S1324.1、S1324.2、S1324.3不区分先后顺序。
S1325,通信服务器对被叫数字人进行一致性校验,假设一致性校验通过;
S1326.1,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S1326.2,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频;
S1326.3,通信服务器基于viewpoint_channel向主叫终端设备发送被叫视角信息。
可以理解,S1326.1、S1326.2、S1326.3不区分先后顺序。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例5中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、重建、渲染都在终端设备实现,而认证鉴权(如人号一致认证)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例6、重建、渲染在终端设备实现,认证鉴权、捕捉在通信服务器实现。
具体以数字人模型存储在通信服务器,捕捉在通信服务器实现,重建、渲染在终端设备实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
相比捕捉、重建、渲染在终端设备实现,捕捉在通信服务器实现后,主被叫终端设备不用再上报主叫用户/被叫用户的驱动参数,而是由通信服务器根据主被叫用户的音频和/或视频进行捕捉,生成驱动参数,并下发给主被叫终端设备,其余无区别。
仍以主被叫用户均看到对端数字人在对端真实场景为例,主被叫用户与通信服务器之间需要传输的数据类型如图14A所示。
其中,主叫终端设备与通信服务器之间的数据传输可以包括:
主叫终端设备向通信服务器发送主叫音频;
主叫终端设备向通信服务器发送主叫视频;
通信服务器向主叫终端设备发送主叫场景;
通信服务器向主叫终端设备发送被叫数字人模型;
通信服务器向主叫终端设备发送被叫驱动参数;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫视频;
通信服务器向主叫终端设备发送被叫场景;
主叫终端设备向通信服务器发送被叫渲染视频;
通信服务器向主叫终端设备发送主叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输可以包括:
被叫终端设备向通信服务器发送被叫音频;
被叫终端设备向通信服务器发送被叫视频;
通信服务器向被叫终端设备发送被叫场景;
通信服务器向被叫终端设备发送主叫数字人模型;
通信服务器向被叫终端设备发送主叫驱动参数;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫视频;
通信服务器向被叫终端设备发送主叫场景;
被叫终端设备向通信服务器发送主叫渲染视频;
通信服务器向被叫终端设备发送被叫渲染视频。
图14B~图14C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1401,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1402,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1403;
S1403,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1404,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1405,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1406,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1407,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1408,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1409,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1410,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1411,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1412,主叫终端设备与通信服务器建立传输通道;S1413,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:digitalman_model_channel、audio_channel、video_channel、action_channel。
S1414.1,主叫终端设备基于digitalman_model_channel从通信服务器下载主叫数字人模型;
S1414.2,被叫终端设备基于digitalman_model_channel从通信服务器下载被叫数字人模型;
可以理解,S1414.1、S1414.2不区分先后顺序。
S1415,主叫终端设备基于video_channel向通信服务器发送主叫视频;
S1416,通信服务器基于主叫视频捕捉主叫驱动参数;
S1417,通信服务器基于action_channel向主叫终端设备发送主叫驱动参数;
S1418,主叫终端设备基于主叫驱动参数重建主叫数字人;主叫终端设备将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1419.1,主叫终端设备基于audio_channel向通信服务器发送主叫音频;
S1419.2,主叫终端设备基于video_channel向通信服务器发送主叫渲染视频;
可以理解,S1419.1、S1419.2不区分先后顺序。
S1420,通信服务器对主叫数字人进行一致性校验,假设一致性校验通过;
S1421.1,通信服务器基于audio_channel向被叫终端设备发送主叫音频;
S1421.2,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频;
可以理解,S1421.1、S1421.2不区分先后顺序。
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1422,被叫终端设备基于video_channel向通信服务器发送被叫视频;
S1423,通信服务器基于被叫视频捕捉被叫驱动参数;
S1424,通信服务器基于action_channel向被叫终端设备发送被叫驱动参数;
S1425,被叫终端设备基于被叫驱动参数重建被叫数字人;被叫终端设备将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1426.1,被叫终端设备基于audio_channel向通信服务器发送被叫音频;
S1426.2,被叫终端设备基于video_channel向通信服务器发送被叫渲染视频;
可以理解,S1426.1、S1426.2不区分先后顺序。
S1427,通信服务器对被叫数字人进行一致性校验,假设一致性校验通过;
S1428.1,通信服务器基于audio_channel向主叫终端设备发送被叫音频;
S1428.2,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频。
可以理解,S1428.1、S1428.2不区分先后顺序。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例6中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得重建、渲染都在终端设备实现,而捕捉、认证鉴权(如人号一致认证)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例7、捕捉、渲染在终端设备实现,认证鉴权、重建在通信服务器实现。
具体以数字人模型存储在通信服务器,捕捉和渲染在终端设备实现,重建在通信服务器实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
本示例重建在通信服务器实现后,通信服务器不再需要将数字人模型发送给主被叫终端设备,而是通信服务器根据主被叫用户数字人驱动参数重建主被叫用户数字人后,发给主被叫终端设备。其余无区别。
仍以主被叫用户均看到对端数字人在对端真实场景为例,主被叫用户与通信服务器之间需要传输的数据类型如图15A所示。
其中,主叫终端设备与通信服务器之间的数据传输可以包括:
主叫终端设备向通信服务器发送主叫视频;
主叫终端设备向通信服务器发送主叫场景;
通信服务器向主叫终端设备发送主叫驱动参数;
通信服务器向主叫终端设备发送被叫数字人;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫视频;
通信服务器向主叫终端设备发送被叫场景;
主叫终端设备向通信服务器发送被叫渲染视频;
通信服务器向主叫终端设备发送主叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输可以包括:
被叫终端设备向通信服务器发送被叫视频;
被叫终端设备向通信服务器发送被叫场景;
通信服务器向被叫终端设备发送被叫驱动参数;
通信服务器向被叫终端设备发送主叫数字人;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫视频;
通信服务器向被叫终端设备发送主叫场景;
被叫终端设备向通信服务器发送主叫渲染视频;
通信服务器向被叫终端设备发送被叫渲染视频。
图15B~图15C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1501,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1502,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1503;
S1503,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1504,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1505,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1506,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1507,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1508,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1509,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1510,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1511,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1512,主叫终端设备与通信服务器建立传输通道;S1513,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、action_channel、digitalman_channel。
S1514,主叫终端设备捕捉主叫驱动参数;
S1515,主叫终端设备基于action_channel向通信服务器发送主叫驱动参数,基于audio_channel向通信服务器发送主叫音频;
S1516,通信服务器基于主叫驱动参数、主叫音频重建主叫数字人;
S1517,通信服务器基于digitalman_channel将重建好的主叫数字人发送给主叫终端设备;
S1518,主叫终端设备将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1519,主叫终端设备基于video_channel向通信服务器发送主叫渲染视频;
S1520,通信服务器对主叫数字人进行一致性校验,假设校验通过;
S1521,在主叫数字人一致性校验通过之后,通信服务器基于audio_channel向被叫终端设备发送主叫音频;基于video_channel向被叫终端设备发送主叫渲染视频;
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1522,被叫终端设备捕捉被叫驱动参数;
S1523,被叫终端设备基于action_channel向通信服务器发送被叫驱动参数,基于audio_channel向通信服务器发送被叫音频;
S1524,通信服务器基于被叫驱动参数、被叫音频重建被叫数字人;
S1525,通信服务器基于digitalman_channel将重建好的被叫数字人发送给被叫终端设备;
S1526,被叫终端设备将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1527,被叫终端设备基于video_channel向通信服务器发送被叫渲染视频;
S1528,通信服务器对被叫数字人进行一致性校验,假设校验通过;
S1529,在被叫数字人一致性校验通过之后,通信服务器基于audio_channel向主叫终端设备发送被叫音频;基于video_channel向主叫终端设备发送被叫渲染视频。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例7中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、渲染都在终端设备实现,而重建、认证鉴权(如人号一致认证、数字人一致性校验)在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例8、渲染在终端设备实现,认证鉴权、捕捉、重建在通信服务器实现。
具体以数字人模型存储在通信服务器,渲染在终端设备实现,捕捉、重建在通信服务器实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
本示例中,捕捉与重建都在通信服务器实现,主被叫终端设备不用再上报主叫用户/被叫用户的数字人驱动参数,而是由通信服务器根据主被叫用户的音视频进行捕捉,生成数字人驱动参数,并使用主被叫用户输入模型重建,生成主被叫用户数字人,发给主被叫终端设备。其余无区别。
仍以主被叫用户均看到对端数字人在对端真实场景为例,主被叫用户与通信服务器之间需要传输的数据类型如图16A所示。
其中,主叫终端设备与通信服务器之间的数据传输可以包括:
主叫终端设备向通信服务器发送主叫视频;
主叫终端设备向通信服务器发送主叫场景;
通信服务器向主叫终端设备发送被叫数字人;
通信服务器向主叫终端设备发送被叫音频;
通信服务器向主叫终端设备发送被叫视频;
通信服务器向主叫终端设备发送被叫场景;
主叫终端设备向通信服务器发送被叫渲染视频;
通信服务器向主叫终端设备发送主叫渲染视频。
其中,被叫终端设备与通信服务器之间的数据传输可以包括:
被叫终端设备向通信服务器发送被叫视频;
被叫终端设备向通信服务器发送被叫场景;
通信服务器向被叫终端设备发送主叫数字人;
通信服务器向被叫终端设备发送主叫音频;
通信服务器向被叫终端设备发送主叫视频;
通信服务器向被叫终端设备发送主叫场景;
被叫终端设备向通信服务器发送主叫渲染视频;
通信服务器向被叫终端设备发送被叫渲染视频。
图16B~图16C示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1601,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1602,通信服务器进行主叫人号一致认证,即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人,验证通过之后再执行下一步S1603;
S1603,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1604,被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1605,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1606,通信服务器进行被叫人号一致认证,即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人;验证通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1607,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1608,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1609,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1610,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1611,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1612,主叫终端设备与通信服务器建立传输通道;S1613,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、digitalman_channel。
S1614,主叫终端设备基于video_channel向通信服务器发送主叫视频,基于audio_channel向通信服务器发送主叫音频;
S1615,通信服务器基于主叫视频捕捉主叫驱动参数;基于主叫驱动参数、主叫音频重建主叫数字人;
S1616,通信服务器基于digitalman_channel将重建好的主叫数字人发送给主叫终端设备;
S1617,主叫终端设备将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1618,主叫终端设备基于video_channel向通信服务器发送主叫渲染视频;
S1619,通信服务器对主叫数字人进行一致性校验,假设校验通过;
S1620,在主叫数字人一致性校验通过之后,通信服务器基于audio_channel向被叫终端设备发送主叫音频;基于video_channel向被叫终端设备发送主叫渲染视频;
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1621,被叫终端设备基于video_channel向通信服务器发送被叫视频,基于audio_channel向通信服务器发送被叫音频;
S1622,通信服务器基于被叫视频捕捉被叫驱动参数;基于被叫驱动参数、被叫音频重建被叫数字人;
S1623,通信服务器基于digitalman_channel将重建好的被叫数字人发送给被叫终端设备;
S1624,被叫终端设备将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1625,被叫终端设备基于video_channel向通信服务器发送被叫渲染视频;
S1626,通信服务器对被叫数字人进行一致性校验,假设校验通过;
S1627,在被叫数字人一致性校验通过之后,通信服务器基于audio_channel向主叫终端设备发送被叫音频;基于video_channel向主叫终端设备发送被叫渲染视频。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例8中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、重建和认证鉴权(如人号一致认证、数字人一致性校验)都在通信服务器实现,而渲染在终端设备实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例9、认证鉴权、捕捉、重建、渲染在终端设备实现。
具体以数字人模型存储在终端设备,捕捉、重建、渲染在终端设备实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
图17A~图17B示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1700,主叫终端设备认证主叫用户的身份(例如基于指纹、人脸、声纹等校验特征认证用户身份,确保当前用户的身份信息与主叫数字人模型绑定的用户身份信息一致);
S1701,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1702,通信服务器进行主叫人号一致认证(即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人)、防篡改校验(如使用数字人审核设备的公钥或数字人模型制作设备的公钥验证主叫用户对应的数字人模型、身份信息、校验特征等信息的数字签名)等,以上验证均通过之后再执行下一步S1703;
S1703,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1704,被叫终端设备认证被叫用户的身份(例如基于指纹、人脸、声纹等校验特征认证用户身份,确保当前用户的身份信息与被叫数字人模型绑定的用户身份信息一致);被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1705,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1706,通信服务器进行被叫人号一致认证(即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人)、防篡改校验(如使用数字人审核设备的公钥或数字人模型制作设备的公钥验证被叫用户对应的数字人模型、身份信息、校验特征等信息的数字签名)等,以上验证均通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1707,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1708,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1709,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1710,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1711,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1712,主叫终端设备与通信服务器建立传输通道;S1713,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel。
S1714,主叫终端设备捕捉主叫驱动参数;基于主叫驱动参数、主叫音频重建主叫数字人;将重建的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1715,基于video_channel向通信服务器发送主叫渲染视频,基于audio_channel向通信服务器发送主叫音频;
S1716,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频,基于audio_channel向被叫终端设备发送主叫音频;
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1717,被叫终端设备捕捉被叫驱动参数;基于被叫驱动参数、被叫音频重建被叫数字人;将重建的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1718,基于video_channel向通信服务器发送被叫渲染视频,基于audio_channel向通信服务器发送被叫音频;
S1719,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频,基于audio_channel向主叫终端设备发送被叫音频。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例9中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得捕捉、重建、渲染和认证鉴权(如用户身份认证)都在终端设备实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
示例10、捕捉、重建在终端设备实现,渲染在通信服务器实现。
具体以数字人模型存储在终端设备,渲染在通信服务器实现,捕捉、重建在终端设备实现,主被叫用户看到的都是对端真实场景,主叫终端设备和被叫终端设备均不支持视角可变为例。
图18A~图18B示出了主叫用户、被叫用户、通信服务器之间的通信流程,流程说明如下:
S1800,主叫终端设备认证主叫用户的身份(例如基于指纹、人脸、声纹等校验特征认证用户身份,确保当前用户的身份信息与主叫数字人模型绑定的用户身份信息一致);
S1801,主叫终端设备向通信服务器发送消息A,消息A中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、主叫终端设备与通信服务器间的协商内容;
S1802,通信服务器进行主叫人号一致认证(即验证主叫终端设备的当前用户是否是主叫终端设备对应的电话号码的开户人)、防篡改校验(如使用数字人审核设备的公钥或数字人模型制作设备的公钥验证主叫用户对应的数字人模型、身份信息、校验特征等信息的数字签名)等,以上验证均通过之后再执行下一步S1803;
S1803,通信服务器向被叫终端设备发送消息B,其中携带如下几种参数:主叫用户的需求、主叫终端设备与被叫终端设备间的协商内容、通信服务器与被叫终端设备间的协商内容;
S1804,被叫终端设备认证被叫用户的身份(例如基于指纹、人脸、声纹等校验特征认证用户身份,确保当前用户的身份信息与被叫数字人模型绑定的用户身份信息一致);被叫终端设备结合被叫用户的需求、被叫终端设备的通话附加服务能力信息以及消息B中携带的参数,确定被叫终端设备的协商结果;
S1805,被叫终端设备向通信服务器发送消息C,其中携带如下几种参数:被叫终端设备协商结果;
S1806,通信服务器进行被叫人号一致认证(即验证被叫终端设备的当前用户是否是被叫终端设备对应的电话号码的开户人)、防篡改校验(如使用数字人审核设备的公钥或数字人模型制作设备的公钥验证被叫用户对应的数字人模型、身份信息、校验特征等信息的数字签名)等,以上验证均通过之后,确定通信服务器与主叫终端设备间的协商结果;
S1807,通信服务器向主叫终端设备发送消息D,其中携带如下几种参数:被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果;
S1808,主叫终端设备根据被叫终端设备的协商结果、通信服务器与主叫终端设备间的协商结果,确认主叫终端设备的协商结果;
S1809,主叫终端设备向通信服务器发送消息E,其中携带如下参数:主叫终端设备的协商结果;
S1810,通信服务器向被叫终端设备发送消息F,其中携带如下参数:主叫终端设备的协商结果。
以上步骤中涉及的参数的说明可参考表2,此处不再赘述。
S1811,被叫终端设备向通信服务器发送“OK”消息,通信服务器向主叫终端设备发送“OK”消息,其中“OK”消息用于指示参数交互完成;
S1812,主叫终端设备与通信服务器建立传输通道;S1813,被叫终端设备与通信服务器建立传输通道;
需要创建的数据传输通道类型包括:audio_channel、video_channel、digitalman_channel。
S1814,主叫终端设备捕捉主叫驱动参数;基于主叫驱动参数、主叫音频重建主叫数字人;
S1815,主叫终端设备基于digitalman_channel向通信服务器发送重建好的主叫数字人;基于video_channel向通信服务器发送主叫视频,基于audio_channel向通信服务器发送主叫音频;
S1816,通信服务器将重建好的主叫数字人与主叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到主叫渲染视频;
S1817,通信服务器基于video_channel向被叫终端设备发送主叫渲染视频,基于audio_channel向被叫终端设备发送主叫音频;
被叫终端设备收到主叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放主叫音频。
主叫终端设备收到主叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放主叫音频,实现主叫用户回看本端的通话画面内容。
S1818,被叫终端设备捕捉被叫驱动参数;基于被叫驱动参数、被叫音频重建被叫数字人;
S1819,被叫终端设备基于digitalman_channel向通信服务器发送重建好的被叫数字人;基于video_channel向通信服务器发送被叫视频,基于audio_channel向通信服务器发送被叫音频;
S1820,通信服务器将重建好的被叫数字人与被叫用户场景进行渲染,得到渲染画面;对渲染画面进行编码,得到被叫渲染视频;
S1821,通信服务器基于video_channel向主叫终端设备发送被叫渲染视频,基于audio_channel向主叫终端设备发送被叫音频。
主叫终端设备收到被叫渲染视频之后,就可以将其加载到显示屏上进行显示,同时还可以播放被叫音频。
被叫终端设备收到被叫渲染视频之后,也可以将其加载到显示屏上进行显示,同时还可以播放被叫音频,实现被叫用户回看本端的通话画面内容。
在上述示例10中,终端设备可以向通信服务器以及对端终端设备表达用户的通话需求,实现了按照用户的需求向对端提供本端的通话画面内容以及按照用户的需求在本端呈现对端提供的通话画面内容的技术效果,即主被叫用户均以数字人形象进行通话,以及主被叫用户均看到对端数字人在对端真实场景中;另外通信服务器、终端设备通过交互通话附加服务能力信息,明确了在为用户提供数字人服务的过程中,终端设备和通信服务器的分工情况(即各设备需要执行哪些操作),使得一部分认证鉴权(如身份认证)、捕捉、重建在终端设备实现,另一部分认证鉴权(防篡改校验、人号一致认证等)、渲染在通信服务器实现,并根据分工情况建立数字人服务所需的数据传输通道;并且,通过认证鉴权,保证了数字人服务在通话场景中被安全、合法地使用。
可以理解,上述示例1~示例10中通信服务器执行的相关功能(如建立数据传输通道、捕捉、重建、渲染、认证鉴权等)的操作可以由CSCF网元配合MS网元完成,上述各个数据传输通道可以为MS网元和主被叫终端之间的数据传输通道。例如,对于上述示例5中图13D所示的方法,通信服务器所执行的方法步骤的具体细化可以如图19所示。
可以理解,以上各示例可以分别单独实施,也可以相互结合实施。
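为便于对照,下面的草图把示例1~示例10的分工情况和各示例列出的通道类型整理成表,并尝试从分工推导通道。表中数据取自各示例的文字描述;而`derived_channels`中的推导规则是对这十个示例归纳出的观察,并非本申请给出的规则,`Split`等名称也是假设的。backhaul_channel未纳入推导:示例1~示例4列出了该通道,而同样在通信服务器渲染的示例10没有列出。

```python
from typing import Dict, FrozenSet, NamedTuple

T, S = "terminal", "server"


class Split(NamedTuple):
    capture: str      # 捕捉在哪一侧实现
    reconstruct: str  # 重建在哪一侧实现
    render: str       # 渲染在哪一侧实现
    model_at: str     # 数字人模型存储在哪一侧
    viewpoint: bool   # 是否支持视角可变


SPLITS: Dict[int, Split] = {
    1: Split(S, S, S, S, True),   2: Split(T, S, S, S, False),
    3: Split(S, T, S, S, False),  4: Split(T, T, S, S, False),
    5: Split(T, T, T, S, True),   6: Split(S, T, T, S, False),
    7: Split(T, S, T, S, False),  8: Split(S, S, T, S, False),
    9: Split(T, T, T, T, False), 10: Split(T, T, S, T, False),
}

A, V, B = "audio_channel", "video_channel", "backhaul_channel"
ACT, DH, DHM, VP = ("action_channel", "digitalman_channel",
                    "digitalman_model_channel", "viewpoint_channel")

# 各示例“需要创建的数据传输通道类型”原样摘录
LISTED: Dict[int, FrozenSet[str]] = {
    1: frozenset({A, V, B, VP}),       2: frozenset({A, V, B, ACT}),
    3: frozenset({A, V, B, ACT, DH, DHM}), 4: frozenset({A, V, B, DH, DHM}),
    5: frozenset({A, V, DHM, VP}),     6: frozenset({DHM, A, V, ACT}),
    7: frozenset({A, V, ACT, DH}),     8: frozenset({A, V, DH}),
    9: frozenset({A, V}),             10: frozenset({A, V, DH}),
}


def derived_channels(s: Split) -> FrozenSet[str]:
    """按归纳出的规则推导通道(不含backhaul_channel)。"""
    ch = {A, V}
    if s.capture != s.reconstruct:
        ch.add(ACT)   # 驱动参数需要跨设备传输
    if s.reconstruct != s.render:
        ch.add(DH)    # 重建好的数字人需要跨设备传输
    if s.model_at == S and s.reconstruct == T:
        ch.add(DHM)   # 终端设备需要从通信服务器下载模型
    if s.viewpoint:
        ch.add(VP)    # 支持视角可变时传输视角信息
    return frozenset(ch)
```

对这十个示例而言,推导结果与各示例列出的通道(去掉backhaul_channel后)一致,可以帮助快速判断某种分工下需要哪些通道。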
可以理解,为了实现上述方法实施例中功能,上述方法实施例中各个设备包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
以下结合图20和图21介绍本申请实施例提供的装置。
图20和图21为本申请的实施例提供的可能的通信装置的结构示意图。这些通信装置可以用于实现上述方法实施例中任一终端设备或通信服务器的功能,因此也能实现上述方法实施例所具备的有益效果。在本申请的实施例中,该通信装置可以是如图1A至图1G所示的主叫终端设备、被叫终端设备或通信服务器中的一个,也可以是应用于主叫终端设备、被叫终端设备或通信服务器的模块(如芯片)。
如图20所示,通信装置2000包括处理单元2010和收发单元2020。通信装置2000用于实现上述方法实施例中主叫终端设备、被叫终端设备或通信服务器的功能。
如图21所示,通信装置2100包括处理器2110和接口电路2120。处理器2110和接口电路2120之间相互耦合。可以理解,接口电路2120可以为收发器或输入输出接口。可选的,通信装置2100还可以包括存储器2130,用于存储处理器2110执行的指令或存储处理器2110运行指令所需要的输入数据或存储处理器2110运行指令后产生的数据。
可以理解,本申请的实施例中的处理器可以是中央处理单元(Central Processing Unit,CPU),还可以是其它通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其它可编程逻辑器件、晶体管逻辑器件,硬件部件或者其任意组合。通用处理器可以是微处理器,也可以是任何常规的处理器。
基于相同的技术构思,本申请实施例还提供一种计算机可读存储介质,存储介质中存储有计算机程序或指令,当计算机程序或指令被通信装置执行时,实现上述方法实施例中任一终端设备或通信服务器所执行的方法。
基于相同的技术构思,本申请实施例还提供一种计算机程序产品,包括指令,当其在计算机上运行时,使得上述方法实施例中任一终端设备或通信服务器所执行的方法被执行。
本申请的实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器、闪存、只读存储器、可编程只读存储器、可擦除可编程只读存储器、电可擦除可编程只读存储器、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于网络设备或终端设备中。当然,处理器和存储介质也可以作为分立组件存在于网络设备或终端设备中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘;还可以是半导体介质,例如,固态硬盘。该计算机可读存储介质可以是易失性或非易失性存储介质,或可包括易失性和非易失性两种类型的存储介质。
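In the embodiments of this application, unless otherwise stated or there is a logical conflict, the terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and the technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.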
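In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. In the text of this application, the character "/" generally indicates an "or" relationship between the associated objects before and after it; in the formulas of this application, the character "/" indicates a "division" relationship between the associated objects before and after it. "Including at least one of A, B, and C" may indicate: including A; including B; including C; including A and B; including A and C; including B and C; or including A, B, and C.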
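The enumeration above is simply the set of non-empty subsets of the listed items. As a quick check, the following Python sketch (the helper name `at_least_one` is chosen here for illustration) lists them with the standard `itertools.combinations` function.

```python
from itertools import combinations


def at_least_one(items):
    """Return every non-empty combination of items, smallest combinations first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


# "At least one of A, B, and C" covers seven cases.
cases_abc = at_least_one(["A", "B", "C"])

# "A and/or B" covers three cases: A alone, B alone, and both.
cases_ab = at_least_one(["A", "B"])
```

For n items there are 2^n - 1 such cases, which gives 7 for three items and 3 for two.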
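It can be understood that the various numerals used in the embodiments of this application are merely for ease of description and are not intended to limit the scope of the embodiments of this application. The sequence numbers of the foregoing processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and inherent logic.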
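A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.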
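This application is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to this application. It should be understood that computer program instructions can implement each procedure and/or block in the flowcharts and/or block diagrams, and any combination of procedures and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.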
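These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.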
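These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.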
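Clearly, a person skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to include them as well.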
Claims (51)
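- A method for providing an additional call service, applied to a terminal device, wherein the terminal device conducts a call with at least one peer terminal device through a service provided by a communication server, characterized in that the method comprises: obtaining, by the terminal device, additional call service requirement information, wherein the additional call service requirement information indicates that a user corresponding to the terminal device requests that a digital human avatar of the user be presented in a call application interface of the at least one peer terminal device; and causing, by the terminal device according to the additional call service requirement information, the communication server to send digital human content of the user to the at least one peer terminal device, so that the at least one peer terminal device presents the digital human avatar of the user in the call application interface based on the digital human content of the user; or causing, by the terminal device according to the additional call service requirement information, the at least one peer terminal device to generate the digital human content of the user, so that the at least one peer terminal device presents the digital human avatar of the user in the call application interface based on the digital human content of the user.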
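- The method according to claim 1, characterized in that the causing, by the terminal device according to the additional call service requirement information, the communication server to send the digital human content of the user to the at least one peer terminal device comprises: generating, by the terminal device, the digital human content of the user according to the additional call service requirement information, and sending, by the terminal device, the digital human content of the user to the communication server, so that the communication server sends the digital human content of the user to the at least one peer terminal device; or sending, by the terminal device, the additional call service requirement information to the communication server, so that the communication server generates the digital human content of the user and sends the digital human content of the user to the at least one peer terminal device.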
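- The method according to claim 1, characterized in that the causing, by the terminal device according to the additional call service requirement information, the at least one peer terminal device to generate the digital human content of the user comprises: sending, by the terminal device, the additional call service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the digital human content of the user.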
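- The method according to any one of claims 1 to 3, characterized in that the additional call service requirement information further indicates that the user corresponding to the terminal device requests that a virtual scene picture of the user be presented in the call application interface of the at least one peer terminal device, and the method further comprises: causing, by the terminal device according to the additional call service requirement information, the communication server to send virtual scene content of the user to the at least one peer terminal device, so that the at least one peer terminal device presents the virtual scene picture of the user in the call application interface based on the virtual scene content of the user; or causing, by the terminal device according to the additional call service requirement information, the at least one peer terminal device to generate the virtual scene content of the user, so that the at least one peer terminal device presents the virtual scene picture of the user in the call application interface based on the virtual scene content of the user.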
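- The method according to claim 4, characterized in that the causing, by the terminal device according to the additional call service requirement information, the communication server to send the virtual scene content of the user to the at least one peer terminal device comprises: generating, by the terminal device, the virtual scene content of the user according to the additional call service requirement information, and sending, by the terminal device, the virtual scene content of the user to the communication server, so that the communication server sends the virtual scene content of the user to the at least one peer terminal device; or sending, by the terminal device, the additional call service requirement information to the communication server, so that the communication server generates the virtual scene content of the user and sends the virtual scene content of the user to the at least one peer terminal device.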
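- The method according to claim 4, characterized in that the causing, by the terminal device according to the additional call service requirement information, the at least one peer terminal device to generate the virtual scene content of the user comprises: sending, by the terminal device, the additional call service requirement information to the at least one peer terminal device.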
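- A method for providing an additional call service, applied to a communication server, wherein the communication server is configured to provide a service for a call between a terminal device and at least one peer terminal device, characterized in that the method comprises: receiving, by the communication server, additional call service requirement information from the terminal device, wherein the additional call service requirement information indicates that a user corresponding to the terminal device requests that a digital human avatar of the user be presented in a call application interface of the at least one peer terminal device; and sending, by the communication server according to the additional call service requirement information, digital human content of the user generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the digital human avatar of the user in the call application interface based on the digital human content of the user; or sending, by the communication server, the additional call service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the digital human content of the user and presents the digital human avatar of the user in the call application interface based on the digital human content of the user.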
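- The method according to claim 7, characterized in that the additional call service requirement information further indicates that the user corresponding to the terminal device requests that a virtual scene picture of the user be presented in the call application interface of the at least one peer terminal device, and the method further comprises: sending, by the communication server according to the additional call service requirement information, virtual scene content of the user generated by the communication server to the at least one peer terminal device, so that the at least one peer terminal device presents the virtual scene picture of the user in the call application interface based on the virtual scene content of the user; or sending, by the communication server, the additional call service requirement information to the at least one peer terminal device, so that the at least one peer terminal device generates the virtual scene content of the user and presents the virtual scene picture of the user in the call application interface based on the virtual scene content of the user.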
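- A method for providing an additional call service, applied to a terminal device, wherein the terminal device conducts a call with at least one peer terminal device through a service provided by a communication server, characterized in that the method comprises: sending, by the terminal device, first additional call service capability information to the communication server, wherein the first additional call service capability information indicates a digital human service capability of the terminal device; receiving, by the terminal device, indication information sent by the communication server, wherein the indication information indicates a digital human processing operation to be performed by the terminal device, and/or the indication information comprises second additional call service capability information and third additional call service capability information, the second additional call service capability information indicating a digital human service capability of the communication server, and the third additional call service capability information indicating a digital human service capability of the at least one peer terminal device; and performing, by the terminal device according to the indication information, at least one digital human processing operation or no digital human processing operation.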
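- The method according to claim 9, characterized in that the digital human processing operation comprises one or more of a capture operation, a reconstruction operation, and a rendering operation; the capture operation comprises: obtaining driving parameters of the user corresponding to the terminal device, wherein the driving parameters comprise at least one of lip shape, expression, motion, and depth information of the user corresponding to the terminal device; the reconstruction operation comprises: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and a digital human model of the user corresponding to the terminal device, wherein the digital human image sequence comprises a plurality of frames of images of the digital human model after being driven; and the rendering operation comprises: generating call picture content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.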
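- The method according to claim 10, characterized in that the indication information indicates the second additional call service capability information and/or the third additional call service capability information; and the performing, by the terminal device according to the indication information, at least one digital human processing operation comprises: determining, by the terminal device, at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first additional call service capability information, the second additional call service capability information, and the third additional call service capability information; and performing, by the terminal device, the at least one operation.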
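- The method according to claim 10 or 11, characterized in that the first additional call service capability information comprises one or more of the following: information indicating whether the terminal device stores the digital human model of the user corresponding to the terminal device; information indicating whether the terminal device stores a virtual scene of the user corresponding to the terminal device; information indicating whether the terminal device is capable of performing the reconstruction operation; information indicating the driving parameters that the terminal device is capable of providing; information indicating whether the terminal device is capable of performing the rendering operation; information indicating whether the terminal device needs the communication server to return the call picture content of the user; and information indicating whether the terminal device provides viewing angle information.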
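- The method according to claim 10 or 11, characterized in that the second additional call service capability information comprises one or more of the following: information indicating whether the communication server stores the digital human model of the user corresponding to the terminal device; information indicating whether the communication server stores a virtual scene of the user corresponding to the terminal device; information indicating whether the communication server is capable of performing the reconstruction operation; information indicating the driving parameters that the communication server is capable of receiving; information indicating the driving parameters that the communication server is capable of extracting; information indicating whether the communication server is capable of performing the rendering operation; and information indicating whether the communication server is capable of returning the call picture content of the user to the terminal device.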
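- The method according to claim 10 or 11, characterized in that the third additional call service capability information comprises one or more of the following: information indicating whether the at least one peer terminal device is capable of performing the rendering operation; and information indicating whether the at least one peer terminal device provides viewing angle information.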
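- The method according to any one of claims 10 to 14, characterized in that the terminal device performs the reconstruction operation, and the method further comprises: establishing, by the terminal device, a first channel with the communication server, wherein the first channel is used to transmit the digital human model of the user corresponding to the terminal device.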
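- The method according to any one of claims 10 to 14, characterized in that the terminal device performs the reconstruction operation and the communication server performs the rendering operation, and the method further comprises: establishing, by the terminal device, a second channel with the communication server, wherein the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.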
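- The method according to any one of claims 10 to 14, characterized in that the terminal device performs the capture operation and the communication server performs the reconstruction operation, and the method further comprises: establishing, by the terminal device, a third channel with the communication server, wherein the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.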
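- The method according to any one of claims 10 to 14, characterized in that the communication server performs the rendering operation, the first additional call service capability information comprises information indicating that the terminal device stores a virtual scene of the user corresponding to the terminal device, and the call picture content of the user corresponding to the terminal device comprises virtual scene picture content of the user; and the method further comprises: establishing, by the terminal device, a fourth channel with the communication server, wherein the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.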
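- The method according to any one of claims 10 to 14, characterized in that the first additional call service capability information comprises information indicating that the terminal device needs the communication server to return the call picture content of the user, and the method further comprises: establishing, by the terminal device, a fifth channel with the communication server, wherein the fifth channel is used to transmit the call picture content of the user corresponding to the terminal device.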
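- The method according to any one of claims 10 to 14, characterized in that the method further comprises: establishing, by the terminal device, a sixth channel with the communication server, wherein the sixth channel is used to transmit viewing angle information of the user corresponding to the terminal device, and the first additional call service capability information comprises information indicating whether the terminal device provides viewing angle information; and/or establishing, by the terminal device, a seventh channel or an eighth channel with the communication server, wherein the seventh channel is used to transmit audio of the user corresponding to the terminal device, and the eighth channel is used to transmit video of the user corresponding to the terminal device.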
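- The method according to any one of claims 10 to 20, characterized in that the terminal device performs the capture operation, and the obtaining driving parameters of the user corresponding to the terminal device comprises: collecting, by the terminal device using a sensor, the driving parameters of the user corresponding to the terminal device; and/or determining, by the terminal device, the driving parameters of the user corresponding to the terminal device according to video and/or audio of the user corresponding to the terminal device.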
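- A method for providing an additional call service, applied to a communication server, wherein the communication server is configured to provide a service for a call between a terminal device and at least one peer terminal device, characterized in that the method comprises: receiving, by the communication server, first additional call service capability information from the terminal device, wherein the first additional call service capability information indicates a digital human service capability of the terminal device; and sending, by the communication server, indication information to the terminal device, wherein the indication information indicates a digital human processing operation to be performed by the terminal device, and/or the indication information comprises second additional call service capability information and third additional call service capability information, the second additional call service capability information indicating a digital human service capability of the communication server, and the third additional call service capability information indicating a digital human service capability of the at least one peer terminal device.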
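- The method according to claim 22, characterized in that the digital human processing operation comprises one or more of a capture operation, a reconstruction operation, and a rendering operation; the capture operation comprises: obtaining driving parameters of the user corresponding to the terminal device, wherein the driving parameters comprise at least one of lip shape, expression, motion, and depth information of the user corresponding to the terminal device; the reconstruction operation comprises: generating a digital human image sequence according to the driving parameters of the user corresponding to the terminal device and a digital human model of the user corresponding to the terminal device, wherein the digital human image sequence comprises a plurality of frames of images of the digital human model after being driven; and the rendering operation comprises: generating call picture content of the user corresponding to the terminal device according to the digital human image sequence and a scene image sequence.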
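- The method according to claim 23, characterized in that the method further comprises: performing, by the communication server, at least one of the capture operation, the reconstruction operation, and the rendering operation according to at least one of the first additional call service capability information, the second additional call service capability information, and the third additional call service capability information.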
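- The method according to any one of claims 23 to 24, characterized in that the method further comprises: receiving, by the communication server, the third additional call service capability information from the at least one peer terminal device.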
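- The method according to any one of claims 23 to 25, characterized in that the first additional call service capability information comprises one or more of the following: information indicating whether the terminal device stores the digital human model of the user corresponding to the terminal device; information indicating whether the terminal device stores a virtual scene of the user corresponding to the terminal device; information indicating whether the terminal device is capable of performing the reconstruction operation; information indicating the driving parameters that the terminal device is capable of providing; information indicating whether the terminal device is capable of performing the rendering operation; information indicating whether the terminal device needs the communication server to return the call picture content of the user; and information indicating whether the terminal device provides viewing angle information.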
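- The method according to any one of claims 23 to 25, characterized in that the second additional call service capability information comprises one or more of the following: information indicating whether the communication server stores the digital human model of the user corresponding to the terminal device; information indicating whether the communication server stores a virtual scene of the user corresponding to the terminal device; information indicating whether the communication server is capable of performing the reconstruction operation; information indicating the driving parameters that the communication server is capable of receiving; information indicating the driving parameters that the communication server is capable of extracting; information indicating whether the communication server is capable of performing the rendering operation; and information indicating whether the communication server is capable of returning the call picture content of the user to the terminal device.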
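- The method according to any one of claims 23 to 25, characterized in that the third additional call service capability information comprises one or more of the following: information indicating whether the at least one peer terminal device is capable of performing the rendering operation; and information indicating whether the at least one peer terminal device provides viewing angle information.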
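- The method according to any one of claims 23 to 28, characterized in that the terminal device performs the reconstruction operation, and the method further comprises: establishing, by the communication server, a first channel with the terminal device, wherein the first channel is used to transmit the digital human model of the user corresponding to the terminal device.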
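- The method according to any one of claims 23 to 28, characterized in that the terminal device performs the reconstruction operation and the communication server performs the rendering operation, and the method further comprises: establishing, by the communication server, a second channel with the terminal device, wherein the second channel is used to transmit the digital human image sequence of the user corresponding to the terminal device.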
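- The method according to any one of claims 23 to 28, characterized in that the terminal device performs the capture operation and the communication server performs the reconstruction operation, and the method further comprises: establishing, by the communication server, a third channel with the terminal device, wherein the third channel is used to transmit the driving parameters of the user corresponding to the terminal device.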
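- The method according to any one of claims 23 to 28, characterized in that the communication server performs the rendering operation, the first additional call service capability information comprises information indicating that the terminal device stores a virtual scene of the user corresponding to the terminal device, and the call picture content of the user corresponding to the terminal device comprises virtual scene picture content of the user; and the method further comprises: establishing a fourth channel between the terminal device and the communication server, wherein the fourth channel is used to transmit the virtual scene of the user corresponding to the terminal device.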
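- The method according to any one of claims 23 to 28, characterized in that the first additional call service capability information comprises information indicating that the terminal device needs the communication server to return the call picture content of the user, and the method further comprises: establishing, by the communication server, a fifth channel with the terminal device, wherein the fifth channel is used to transmit the call picture content of the user corresponding to the terminal device.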
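- The method according to any one of claims 23 to 28, characterized in that the method further comprises: establishing, by the communication server, a sixth channel with the terminal device, wherein the sixth channel is used to transmit viewing angle information of the user corresponding to the terminal device, and the first additional call service capability information comprises information indicating whether the terminal device provides viewing angle information; and/or establishing, by the communication server, a seventh channel or an eighth channel with the terminal device, wherein the seventh channel is used to transmit audio of the user corresponding to the terminal device, and the eighth channel is used to transmit video of the user corresponding to the terminal device.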
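- The method according to any one of claims 23 to 34, characterized in that the communication server performs the capture operation, and the obtaining driving parameters of the user corresponding to the terminal device comprises: receiving, by the communication server from the terminal device, the driving parameters of the user corresponding to the terminal device; and/or receiving, by the communication server from the terminal device, video and/or audio of the user corresponding to the terminal device, and determining the driving parameters of the user corresponding to the terminal device according to the video and/or audio.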
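- A method for providing an additional call service, applied to a communication server, wherein the communication server is configured to provide a service for a call between a terminal device and at least one peer terminal device, characterized in that the method comprises: receiving, by the communication server, a first request from the terminal device, wherein the first request is used to request a digital human model; determining, by the communication server, whether a user corresponding to the terminal device is a legitimate user of the digital human model, wherein the user corresponding to the terminal device is the user operating the terminal device; and after determining that the user corresponding to the terminal device is a legitimate user of the digital human model, sending, by the communication server in response to the first request, the digital human model to the terminal device, wherein the digital human model is used by the user corresponding to the terminal device to conduct a call with a digital human avatar.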
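- The method according to claim 36, characterized in that the determining, by the communication server, whether the user corresponding to the terminal device is a legitimate user of the digital human model comprises: obtaining, by the communication server, a feature of the user corresponding to the terminal device and a verification feature associated with the digital human model, wherein the verification feature is a feature of the legitimate user of the digital human model; and determining, by the communication server according to the feature of the user corresponding to the terminal device and the verification feature, whether the user corresponding to the terminal device is a legitimate user of the digital human model; or receiving, by the communication server from the terminal device, a verification result carrying a first digital signature; and verifying, by the communication server, the first digital signature, and after the first digital signature passes verification, determining according to the verification result whether the user corresponding to the terminal device is a legitimate user of the digital human model.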
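- The method according to claim 37, characterized in that the feature of the user corresponding to the terminal device comprises one or more of the face, fingerprint, voiceprint, or iris of the user corresponding to the terminal device, and the verification feature associated with the digital human model comprises one or more of the face, fingerprint, voiceprint, or iris of the legitimate user of the digital human model.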
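- The method according to claim 36, characterized in that the method further comprises: obtaining, by the communication server, a feature of a character in digital human content, wherein the digital human content comprises a plurality of frames of images of the digital human model; determining, by the communication server according to the feature of the character in the digital human content and the verification feature associated with the digital human model, whether the character in the digital human content matches the digital human model; and after determining that the character in the digital human content matches the digital human model, sending, by the communication server, the digital human content to the at least one peer terminal device.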
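- The method according to claim 39, characterized in that the feature of the character in the digital human content comprises the face and/or voiceprint of the character; and the verification feature associated with the digital human model comprises the face and/or voiceprint of the legitimate user of the digital human model.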
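- The method according to any one of claims 36 to 40, characterized in that the method further comprises: receiving, by the communication server, the digital human model carrying a second digital signature, the verification feature, and identity information of the legitimate user of the digital human model, wherein the second digital signature is a digital signature of a digital human review device; verifying, by the communication server, the second digital signature using a public key of the digital human review device; and after the second digital signature passes verification, binding, by the communication server according to the identity information of the legitimate user, the digital human model and the verification feature to account information of the legitimate user at the communication server.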
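- The method according to claim 41, characterized in that the digital human model further carries a third digital signature, wherein the third digital signature is a digital signature of a digital human model production device; and the method further comprises: before storing the digital human model and the verification feature, verifying, by the communication server, the third digital signature using a public key of the digital human model production device; and after the second digital signature and the third digital signature pass verification, binding, by the communication server according to the identity information of the legitimate user, the digital human model and the verification feature to the account information of the legitimate user at the communication server.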
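- A method for providing an additional call service, applied to a digital human model production device, characterized in that the method comprises: encrypting, by the digital human model production device using a first public key of a digital human review device, a digital human model, a verification feature, and identity information of a user corresponding to a terminal device, and sending the encrypted digital human model, verification feature, and identity information of the user corresponding to the terminal device to the digital human review device; receiving, by the digital human model production device from the digital human review device, a second digital signature, the digital human model, the verification feature, and the identity information; and sending, by the digital human model production device, the second digital signature, the digital human model, the verification feature, and the identity information to a communication server, wherein the digital human model is used by the user corresponding to the terminal device to conduct a call with a digital human avatar.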
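- The method according to claim 43, characterized in that the method further comprises: adding, by the digital human model production device using a private key of the digital human model production device, a third digital signature to the digital human model, the verification feature, and the identity information; and sending, by the digital human model production device, the second digital signature, the third digital signature, the digital human model, the verification feature, and the identity information to the communication server.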
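- A method for providing an additional call service, applied to a digital human review device, characterized in that the method comprises: receiving, by the digital human review device from a digital human model production device, an encrypted digital human model, verification feature, and identity information of a user corresponding to a terminal device; decrypting, by the digital human review device using a first private key of the digital human review device, the encrypted digital human model, verification feature, and identity information of the user corresponding to the terminal device; reviewing the legitimacy of the digital human model of the user corresponding to the terminal device; after the review is passed, adding a second digital signature to the digital human model, the verification feature, and the identity information using a second private key of the digital human review device; and sending, by the digital human review device, the second digital signature, the digital human model, the verification feature, and the identity information to the digital human model production device, wherein the digital human model is used by the user corresponding to the terminal device to conduct a call with a digital human avatar.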
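- A communication apparatus, characterized by comprising a processor and a memory, wherein the processor is coupled to the memory; the memory is configured to store program instructions; and the processor is configured to read the program instructions stored in the memory to implement the method according to any one of claims 1 to 6, 7 to 8, 9 to 21, 22 to 35, 36 to 42, 43 to 44, or 45.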
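- A call system, characterized by comprising a first terminal device, a second terminal device, and a communication server that provides a service for a call between the first terminal device and the second terminal device, wherein the first terminal device is configured to perform the method according to any one of claims 1 to 6; the communication server is configured to perform the method according to any one of claims 7 to 8; and the second terminal device is configured to present, in a call application interface, a digital human avatar of a user corresponding to the first terminal device based on digital human content of the user corresponding to the first terminal device.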
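- A call system, characterized by comprising a first terminal device, a second terminal device, and a communication server that provides a service for a call between the first terminal device and the second terminal device, wherein the first terminal device is configured to perform the method according to any one of claims 9 to 21; the communication server is configured to perform the method according to any one of claims 22 to 35; and the second terminal device is configured to present, in a call application interface, a digital human avatar of a user corresponding to the first terminal device based on digital human content of the user corresponding to the first terminal device.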
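- A call system, characterized by comprising a communication server, a digital human model production device, and a digital human review device, wherein the communication server is configured to perform the method according to any one of claims 36 to 42; the digital human model production device is configured to perform the method according to any one of claims 43 to 44; and the digital human review device is configured to perform the method according to claim 45.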
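- A computer-readable storage medium, characterized in that the storage medium stores a computer program or instructions, and when the computer program or instructions are executed by a communication apparatus, the method according to any one of claims 1 to 6, 7 to 8, 9 to 21, 22 to 35, 36 to 42, 43 to 44, or 45 is implemented.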
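- A computer program product, characterized by comprising instructions that, when run on a computer, cause the method according to any one of claims 1 to 6, 7 to 8, 9 to 21, 22 to 35, 36 to 42, 43 to 44, or 45 to be performed.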
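The signature chain recited in the claims (the digital human review device adds a second digital signature, the digital human model production device adds a third, and the communication server verifies both before binding the model to the user's account) can be sketched as follows. This is an illustrative toy only: it uses textbook RSA over small Mersenne primes and SHA-256 from the Python standard library, which is not secure and is not stated anywhere in the application; the application does not name a signature algorithm. The encryption step with the review device's first public key is omitted, and all function names, the bundle layout, and the `accounts` dictionary are assumptions.

```python
import hashlib
import json

E = 65537  # public exponent, chosen here for illustration


def make_keypair(p: int, q: int):
    """Build a toy RSA key pair from two primes (far too small for real use)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (n, E), (n, d)


def canonical(bundle: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(bundle, sort_keys=True).encode("utf-8")


def digest(message: bytes, n: int) -> int:
    """SHA-256 digest reduced modulo n so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private_key, message: bytes) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(public_key, message: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


def server_accept(bundle, second_sig, third_sig, reviewer_pub, producer_pub, accounts) -> bool:
    """Verify both signatures, then bind model and verification feature to the account."""
    message = canonical(bundle)
    if not verify(reviewer_pub, message, second_sig):
        return False  # second digital signature (review device) failed
    if not verify(producer_pub, message, third_sig):
        return False  # third digital signature (production device) failed
    accounts[bundle["identity"]] = {
        "model": bundle["model"],
        "verification_feature": bundle["verification_feature"],
    }
    return True


# Hypothetical keys for the two signing devices, built from known Mersenne primes.
reviewer_pub, reviewer_priv = make_keypair(2**127 - 1, 2**89 - 1)
producer_pub, producer_priv = make_keypair(2**107 - 1, 2**61 - 1)
```

In use, the review device would call `sign(reviewer_priv, canonical(bundle))` after its legitimacy review, the production device would call `sign(producer_priv, canonical(bundle))`, and the server would call `server_accept` with both signatures. A real deployment would rely on a vetted cryptographic library and a standardized signature scheme instead of this sketch.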
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22863478.8A EP4387314A4 (en) | 2021-09-04 | 2022-08-30 | METHOD, APPARATUS AND SYSTEM FOR PROVIDING ADDITIONAL CALL SERVICE |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111034920.1 | 2021-09-04 | ||
| CN202111034920 | 2021-09-04 | ||
| CN202111632624.1 | 2021-12-29 | ||
| CN202111632624.1A CN115767577A (zh) | 2021-09-04 | 2021-12-29 | 一种提供通话附加服务的方法、装置及系统 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023030343A1 (zh) | 2023-03-09 |
Family
ID=85332834
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/115986 Ceased WO2023030343A1 (zh) | 2021-09-04 | 2022-08-30 | 一种提供通话附加服务的方法、装置及系统 |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP4387314A4 (zh) |
| CN (1) | CN115767577A (zh) |
| WO (1) | WO2023030343A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120074890A (zh) * | 2025-02-07 | 2025-05-30 | 西安华为技术有限公司 | 一种通信的方法及装置 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117135145A (zh) * | 2023-08-23 | 2023-11-28 | 中国移动通信有限公司研究院 | 通话方法、装置及电子设备 |
| CN117560527B (zh) * | 2023-11-22 | 2024-08-13 | 北京风平智能科技有限公司 | 一种数字人aigc视频安全防伪方法及装置 |
| CN119865484A (zh) * | 2024-09-30 | 2025-04-22 | 中国电信股份有限公司技术创新中心 | 数据通信方法、装置、通信设备、可读存储介质和程序产品 |
| CN120074889B (zh) * | 2025-02-07 | 2026-03-27 | 西安华为技术有限公司 | 一种通信的方法及装置 |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1747546A (zh) * | 2004-09-07 | 2006-03-15 | 乐金电子(中国)研究开发中心有限公司 | 移动通信终端视频通话时的视频效果提供装置及方法 |
| WO2009149764A1 (en) * | 2008-06-13 | 2009-12-17 | Nokia Siemens Networks Oy | Method, apparatus, system, related computer program product and data structure for terminal capability indication |
| US20130258040A1 (en) * | 2012-04-02 | 2013-10-03 | Argela Yazilim ve Bilisim Teknolojileri San. ve Tic. A.S. | Interactive Avatars for Telecommunication Systems |
| CN103368816A (zh) * | 2012-03-29 | 2013-10-23 | 深圳市腾讯计算机系统有限公司 | 基于虚拟人物形象的即时通讯方法及系统 |
| US20140176662A1 (en) * | 2012-12-20 | 2014-06-26 | Verizon Patent And Licensing Inc. | Static and dynamic video calling avatars |
| CN106803921A (zh) * | 2017-03-20 | 2017-06-06 | 深圳市丰巨泰科电子有限公司 | 基于ar技术的即时音视频通信方法及装置 |
| CN108881784A (zh) * | 2017-05-12 | 2018-11-23 | 腾讯科技(深圳)有限公司 | 虚拟场景实现方法、装置、终端及服务器 |
| CN111614967A (zh) * | 2019-12-25 | 2020-09-01 | 北京达佳互联信息技术有限公司 | 虚拟形象直播方法、装置、电子设备及存储介质 |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109802931B (zh) * | 2017-11-17 | 2021-08-06 | 腾讯科技(深圳)有限公司 | 一种通信处理方法、终端及存储介质 |
- 2021
  - 2021-12-29 CN CN202111632624.1A patent/CN115767577A/zh active Pending
- 2022
  - 2022-08-30 EP EP22863478.8A patent/EP4387314A4/en active Pending
  - 2022-08-30 WO PCT/CN2022/115986 patent/WO2023030343A1/zh not_active Ceased
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120074890A (zh) * | 2025-02-07 | 2025-05-30 | 西安华为技术有限公司 | 一种通信的方法及装置 |
| CN120074890B (zh) * | 2025-02-07 | 2026-02-03 | 西安华为技术有限公司 | 一种通信的方法及装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4387314A4 (en) | 2024-12-11 |
| CN115767577A (zh) | 2023-03-07 |
| EP4387314A1 (en) | 2024-06-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2023030343A1 (zh) | 一种提供通话附加服务的方法、装置及系统 | |
| CN107820039B (zh) | 用于虚拟环视会议体验的方法和装置 | |
| AU2010216298B2 (en) | Video sharing | |
| US8917306B2 (en) | Previewing video data in a video communication environment | |
| CN105493501A (zh) | 虚拟视觉相机 | |
| KR20220031894A (ko) | 데이터 스트림을 동기화하기 위한 시스템 및 방법 | |
| WO2015090147A1 (zh) | 虚拟视频通话方法和终端 | |
| CN103262529A (zh) | 实时多媒体通讯中媒体流可伸缩复合的系统和方法 | |
| CN106575265A (zh) | 直播系统 | |
| CN111641829B (zh) | 视频处理方法及装置、系统、存储介质和电子设备 | |
| WO2022048651A1 (zh) | 合拍方法、装置、电子设备及计算机可读存储介质 | |
| US11082756B2 (en) | Crowdsource recording and sharing of media files | |
| EP3272127B1 (en) | Video-based social interaction system | |
| US11553216B2 (en) | Systems and methods of facilitating live streaming of content on multiple social media platforms | |
| GB2530984A (en) | Apparatus, method and computer program product for scene synthesis | |
| CN103581113A (zh) | 通信数据的发送方法、系统及接收装置 | |
| WO2019227426A1 (zh) | 多媒体数据处理方法、装置和设备/终端/服务器 | |
| KR101799199B1 (ko) | Vr콘텐츠 제공 시스템, 서버 및 방법 | |
| US20260095606A1 (en) | Methods and systems of facilitating streaming of events | |
| US20250252647A1 (en) | 3d digital virtual character generation with facial feature preservation | |
| US20250265787A1 (en) | Exchanging avatar data for extended reality (xr) communication sessions | |
| US20250252673A1 (en) | 3d digital virtual character generation with virtual triangles | |
| CN120825473A (zh) | 通信方法、通信装置、计算机设备、计算机可读存储介质及计算机程序产品 | |
| CN121531161A (zh) | 视频传输方法、装置、电子设备及介质 | |
| CN116456122A (zh) | 直播视频数据处理方法、装置和系统 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22863478; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022863478; Country of ref document: EP; Effective date: 20240314 |
| | NENP | Non-entry into the national phase | Ref country code: DE |