WO2024148751A1 - Method and device for generating a scene description file - Google Patents

Method and device for generating a scene description file

Info

Publication number
WO2024148751A1
Authority
WO
WIPO (PCT)
Prior art keywords
description
target
scene
accessor
cache
Prior art date
Legal status
Ceased
Application number
PCT/CN2023/097873
Other languages
English (en)
French (fr)
Inventor
张伟
杨付正
杨鹤杰
王之奎
李斌
张雯
Current Assignee
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to EP23915515.3A (publication EP4557723A4)
Priority to KR1020257003775A (publication KR20250034972A)
Priority to CN202380034331.4A (publication CN119013965A)
Priority to JP2025507589A (publication JP2025529756A)
Publication of WO2024148751A1
Priority to US19/033,804 (publication US20250166234A1)


Classifications

    • G06T9/001 Image coding; model-based coding, e.g. wire frame
    • G06T9/40 Image coding; tree coding, e.g. quadtree, octree
    • G06T13/20 Three-dimensional [3D] animation
    • G06T13/40 Three-dimensional [3D] animation of characters, e.g. humans, animals or virtual beings
    • G06T15/005 Three-dimensional [3D] image rendering; general purpose rendering architectures
    • G06T15/04 Three-dimensional [3D] image rendering; texture mapping
    • G06T17/20 Three-dimensional [3D] modelling; finite element generation, e.g. wire-frame surface description, tesselation
    • G06F40/211 Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • H04N19/70 Video coding characterised by syntax aspects, e.g. related to compression standards
    • H04N19/597 Predictive video coding specially adapted for multi-view video sequence encoding
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H04N21/23412 Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • Some embodiments of the present application relate to the field of video processing technology, and in particular, to a method and device for generating a scene description file.
  • Point Cloud refers to a collection of massive three-dimensional points.
  • the compression standards for point clouds mainly include Geometry-based Point Cloud Compression (G-PCC) and Video-based Point Cloud Compression (V-PCC).
  • the current mainstream immersive media mainly include point clouds, 3D meshes, 6DoF panoramic videos, MPEG immersive videos (MIV), etc.
  • in a 3D scene, multiple types of immersive media often exist at the same time.
  • different types of rendering engines have arisen according to the types and number of codecs they support.
  • the Moving Picture Experts Group (MPEG) initiated the formulation of the MPEG scene description standard, with the standard number ISO/IEC 23090-14.
  • This standard mainly solves the problem of cross-platform description of MPEG media (including codecs developed by MPEG, MPEG file formats, and MPEG transmission mechanisms) in 3D scenes.
  • the extensions made to the first version of the ISO/IEC 23090-14 MPEG-I scene description standard have met the key requirements of immersive scene description solutions.
  • the current scene description standard does not support media files of type G-PCC coded point cloud.
  • point clouds are an important form of 3D media, and G-PCC is one of the current mainstream point cloud compression algorithms. Therefore, it is of great significance and value to support media files of the G-PCC coded point cloud type in the scene description framework.
  • some embodiments of the present application provide a method for generating a scene description file, including:
  • in a case where the type of a target media file in the three-dimensional scene to be rendered is a geometry-based point cloud compression (G-PCC) coded point cloud, generating a target media description module corresponding to the target media file according to description information of the target media file;
  • adding the target media description module to the media list of the MPEG media of the scene description file of the three-dimensional scene to be rendered.
  • some embodiments of the present application provide a device for generating a scene description file, including:
  • a memory configured to store a computer program; and
  • a processor configured to, when calling the computer program, enable the scene description file generation device to implement the scene description file generation method described in the first aspect.
  • FIG. 1 is a schematic diagram showing the structure of an immersive media description framework in some embodiments.
  • FIG. 2 is a schematic diagram showing the structure of a scene description file in some embodiments.
  • FIG. 3 is a schematic diagram showing the structure of a scene description file in some other embodiments of the present application.
  • FIG. 4 is a schematic diagram showing the structure of a G-PCC encoder in some embodiments.
  • FIG. 5 is a schematic diagram showing a LOD partitioning process in some embodiments.
  • FIG. 6 is a schematic diagram showing a lifting transformation process in some embodiments.
  • FIG. 7 is a schematic diagram showing a RAHT transformation process in some embodiments.
  • FIG. 8 is a schematic diagram showing the structure of a G-PCC decoder in some embodiments.
  • FIG. 9 is a schematic diagram showing the structure of a scene description file in some other embodiments.
  • FIG. 10 is a schematic diagram showing the structure of a scene description file in some other embodiments.
  • FIG. 11 is a schematic diagram showing a pipeline corresponding to a media file of the G-PCC coded point cloud type in some embodiments.
  • FIG. 12 is a flowchart showing the steps of a method for generating a scene description file in some embodiments.
  • FIG. 13 is a flowchart showing the steps of a method for parsing a scene description file in some embodiments.
  • FIG. 14 is a flowchart showing the steps of a method for processing a media file in some embodiments.
  • FIG. 15 is a flowchart showing the steps of a method for rendering a three-dimensional scene in some embodiments.
  • FIG. 16 is a flowchart showing the steps of a cache management method in some embodiments.
  • FIG. 17 is an interactive flowchart of a method for rendering a three-dimensional scene in some embodiments.
  • Some embodiments of the present application involve scene description of immersive media.
  • the scene description framework of immersive media decouples the access and processing of media files from the rendering of media files, and designs a media access function (Media Access Function, MAF) 12 to be responsible for the access and processing of media files.
  • a media access function application programming interface (Application Programming Interface, API) is designed; the display engine 11 and the media access function 12 exchange commands through the media access function API.
  • the display engine 11 can issue commands to the media access function 12 through the media access function API, and the media access function 12 can also request commands from the display engine 11 through the media access function API.
  • the general workflow of the scene description framework of immersive media includes: 1) The display engine 11 obtains the scene description file (Scene Description Documents) provided by the immersive media service provider. 2) The display engine 11 parses the scene description file to obtain the access address of the media file, the attribute information of the media file (media type, codec parameters, etc.), the format requirements for the processed media file, and other parameters or information, and calls the media access function API to pass all or part of the information obtained by parsing the scene description file to the media access function 12.
  • 3) The media access function 12 requests to download the specified media file from the media resource server or obtains the specified media file locally, establishes a corresponding pipeline for the media file, and then decapsulates, decrypts, decodes, and post-processes the media file in the pipeline to convert the media file from the encapsulation format into the format specified by the display engine 11.
  • 4) The pipeline stores the output data obtained after all processing in the specified cache.
  • 5) The display engine 11 reads the fully processed data from the specified cache and renders the media file according to the data read from the cache.
  • the scene description file is used to describe the structure of the three-dimensional scene (its features can be described by a three-dimensional mesh), texture (such as texture mapping, etc.), animation (rotation, translation), camera viewpoint position (rendering perspective), and other contents.
  • GL Transmission Format 2.0 (glTF2.0) has been identified as a candidate format for a scene description file that can meet the requirements of MPEG-Immersive (MPEG-I) and 6-degrees of freedom (6DoF) applications.
  • glTF2.0 is described in the GL Transmission Format (glTF) version 2.0 of the Khronos Group available at github.com/KhronosGroup/glTF/tree/master/specification/2.0#specifying-extensions.
  • FIG. 2 is a schematic diagram of the structure of a scene description file in the glTF2.0 scene description standard (ISO/IEC 12113).
  • the scene description file in the glTF2.0 scene description standard includes but is not limited to: scene description module (scene) 201, node description module (node) 202, mesh description module (mesh) 203, accessor description module (accessor) 204, cache slice description module (bufferView) 205, buffer description module (buffer) 206, camera description module (camera) 207, lighting description module (light) 208, material description module (material) 209, texture description module (texture) 210, sampler description module (sampler) 211, texture map description module (image) 212, animation description module (animation) 213, and skin description module (skin) 214.
  • the scene description module (scene) 201 in the scene description file shown in FIG2 is used to describe the three-dimensional scene contained in the scene description file.
  • a scene description file may contain any number of three-dimensional scenes, and each three-dimensional scene is represented by a scene description module 201.
  • the scene description modules 201 are in parallel with each other, that is, the three-dimensional scenes are in parallel with each other.
  • the node description module (node) 202 in the scene description file shown in FIG2 is a description module at the next level of the scene description module 201, and is used to describe the objects contained in the three-dimensional scene described by the scene description module 201. There may be many specific objects in each three-dimensional scene, such as virtual digital humans, three-dimensional objects in the near distance, and background images in the far distance. The scene description file describes these specific objects through the node description module 202. Each node description module 202 can represent an object or a group of objects consisting of several objects. The relationship between the node description modules 202 reflects the relationship between the various components in the three-dimensional scene described by the scene description module 201.
  • a scene described by a scene description module 201 can contain one or more nodes.
  • the relationship between multiple nodes can be a parallel relationship or a hierarchical relationship; that is, there is a relationship of containing and being contained between the node description modules 202, which allows multiple specific objects to be described together or separately. If a node is contained by another node, the contained node is called a child node (children), and the child node is represented by "children" instead of "node". By flexibly combining nodes and child nodes, a hierarchical node structure can be formed to express rich scene content.
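  • As an illustrative sketch of such a hierarchy in glTF2.0 JSON (the node names and mesh indices here are hypothetical), a parent node can group two child nodes via the "children" array:

```json
{
  "nodes": [
    { "name": "group", "children": [1, 2] },
    { "name": "object_a", "mesh": 0 },
    { "name": "object_b", "mesh": 1 }
  ]
}
```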
  • the mesh description module (mesh) 203 in the scene description file shown in FIG2 is a description module of the next level of the node description module 202, and is used to describe the characteristics of the object represented by the node description module 202.
  • the mesh description module 203 is a set of one or more primitives, each of which may include attributes, and the attributes of a primitive define the data required by the graphics processing unit (GPU) for rendering.
  • the attributes may include: position (three-dimensional coordinates), normal (normal vector), tangent (tangent vector), texcoord_n (texture coordinates), color_n (color: RGB or RGBA), joints_n (attributes related to the skin description module 214), and weights_n (attributes related to the skin description module 214), etc.
  • the access address (Uniform Resource Identifier, URI) of the media file is specified in the scene description file, and the data in the media file can be downloaded when it is needed, thereby realizing the separation of the scene description file and the media file.
  • the mesh description module 203 does not store media data, but stores the index value of the accessor description module (accessor) 204 corresponding to each attribute, and points to the corresponding data in the cache slice (bufferView) of the buffer (buffer) through the accessor description module 204.
  • the scene description file and the media file may be merged to form a binary file, thereby reducing the types and number of files.
  • there may be a syntax element "mode" in the primitives of the mesh description module 203.
  • the value of "position” is 1, pointing to the accessor description module 204 with index 1, and finally pointing to the vertex coordinate data stored in the buffer;
  • the value of "color_0” is 2, pointing to the accessor description module 204 with index 2, and finally pointing to the color data stored in the buffer.
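  • The indexing described above can be sketched as the following glTF2.0 JSON fragment (note that the glTF2.0 specification spells attribute names in uppercase, e.g. POSITION and COLOR_0; the accessor indices follow the example above):

```json
{
  "meshes": [{
    "primitives": [{
      "mode": 4,
      "attributes": {
        "POSITION": 1,
        "COLOR_0": 2
      }
    }]
  }]
}
```

  • Here "mode": 4 declares that the primitive is rendered as triangles, while the attribute values 1 and 2 are indices into the accessors array rather than the data itself.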
  • the accessor description module (accessor) 204, the cache slice description module (bufferView) 205 and the cache description module (buffer) 206 in the scene description file shown in FIG2 jointly implement the layer-by-layer refined indexing of the data of the media file by the mesh description module 203.
  • the mesh description module 203 does not store specific media data, but stores the index value of the corresponding accessor description module 204, and accesses the specific media data through the accessor described by the accessor description module 204 indexed by that index value.
  • the indexing process of the media data by the mesh description module 203 includes: first, the mesh description module 203 stores the index value of the corresponding accessor description module 204, and the index value declared by its syntax element points to the corresponding accessor description module 204; then, the accessor description module 204 points to the corresponding cache slice description module 205; finally, the cache slice description module 205 points to the corresponding cache description module 206.
  • the cache description module 206 in the scene description file shown in FIG2 is mainly responsible for pointing to the corresponding media file, including the URI of the media file, the byte length of the media file and other information, and is used to describe the buffer for caching the media data of the media file.
  • a buffer can be divided into one or more cache slices.
  • the cache slice description module 205 is mainly responsible for partial access to the media data in the buffer, including the starting byte offset of the access data and the byte length of the access data, etc.
  • the accessor description module 204 is mainly responsible for adding additional information to the partial data delineated in the cache slice description module 205, such as the data type, the number of data of this type, the numerical range of data of this type, etc.
  • Such a three-layer structure can realize the function of retrieving partial data from a media file, which is conducive to the accurate retrieval of data and also convenient for reducing the number of media files.
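  • This three-layer index chain can be sketched as the following glTF2.0 JSON fragment (the file name, byte lengths, and counts are hypothetical; componentType 5126 is the glTF code for 32-bit floats):

```json
{
  "accessors": [{
    "bufferView": 0,
    "byteOffset": 0,
    "componentType": 5126,
    "count": 1000,
    "type": "VEC3",
    "min": [-1.0, -1.0, -1.0],
    "max": [1.0, 1.0, 1.0]
  }],
  "bufferViews": [{
    "buffer": 0,
    "byteOffset": 0,
    "byteLength": 12000
  }],
  "buffers": [{
    "uri": "media_data.bin",
    "byteLength": 12000
  }]
}
```

  • The accessor adds type information (1000 three-component float vectors), the bufferView delineates 12000 bytes within the buffer, and the buffer points to the media file via its URI.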
  • the camera description module (camera) 207 in the scene description file shown in FIG2 is a next-level description module of the node description module 202, and is used to describe the viewpoint, viewing angle, and other information related to visual viewing when the user views the object described by the node description module 202.
  • the node description module 202 can also point to the camera description module 207.
  • the light description module (light) 208 in the scene description file shown in FIG. 2 is a next-level description module of the node description module 202 , and is used to describe light intensity, ambient light color, light direction, light source position and other light-related information of the object described by the node description module 202 .
  • the material description module (material) 209 in the scene description file shown in FIG2 is a description module at the next level of the mesh description module 203, and is used to describe the material information of the three-dimensional object described by the mesh description module 203.
  • this process can also be referred to as texture mapping or adding textures.
  • the scene description file in the glTF2.0 scene description standard also uses this description module.
  • the material description module 209 uses a set of general parameters to define the material to describe the material information of the geometric objects appearing in the three-dimensional scene.
  • the material description module 209 generally uses the metal-roughness model to describe the material of the virtual object, and the material characteristic parameters based on the metal-roughness model are represented by the widely used physically based rendering (PBR) material. Based on this, the material description module 209 makes a detailed description of the metal-roughness material attribute of the object.
  • PBR physically based rendering
  • the syntax elements in the metal-roughness (material.pbrMetallicRoughness) of the material description module 209 are defined as shown in Table 5 below:
  • each attribute in the metal-roughness of the material description module 209 can be defined using factors and/or textures (e.g., baseColorTexture and baseColorFactor). If no texture is given, it can be determined that all corresponding texture components in this material model have a value of 1.0. If both factors and textures are present, the factor value acts as a linear multiplier for the corresponding texture value. Texture binding is defined by the index of the texture object and the optional texture coordinate index.
  • by parsing the material description module 209, it is possible to determine, through the material name syntax element and its value ("name":"gold"), that the current material is named "gold"; through the base color syntax element under the pbrMetallicRoughness object and its value ("baseColorFactor":[1.000,0.766,0.336,1.0]), that the base color of the current material is [1.000,0.766,0.336,1.0]; through the metallic syntax element and its value ("metallicFactor":1.0), that the metallic value of the current material is 1.0; and through the roughness syntax element and its value ("roughnessFactor":0.0), that the roughness value of the current material is 0.0.
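  • The "gold" material described above corresponds to the following glTF2.0 JSON fragment (reconstructed from the values quoted in the example):

```json
{
  "materials": [{
    "name": "gold",
    "pbrMetallicRoughness": {
      "baseColorFactor": [1.000, 0.766, 0.336, 1.0],
      "metallicFactor": 1.0,
      "roughnessFactor": 0.0
    }
  }]
}
```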
  • the texture description module (texture) 210 in the scene description file shown in FIG2 is a next-level description module of the material description module 209. Texture is an important aspect of giving an object a realistic appearance: it can be used to define the main color of the object and other characteristics used in the material definition, in order to accurately describe the appearance of the rendered object.
  • the material itself can define multiple texture objects, which can be used as textures of virtual objects during rendering and can be used to encode different material properties.
  • the texture description module 210 uses sampler syntax elements and texture map syntax element indexes to reference a sampler description module (sampler) 211 and a texture map description module (image) 212.
  • the texture map description module 212 contains a uniform resource identifier (URI), which links to the texture map or binary file package actually used by the texture description module 210.
  • the sampler description module 211 is used to describe the filtering and packaging mode of the texture.
  • the respective responsibilities and cooperation relationships of the material description module 209, the texture description module 210, the sampler description module 211 and the texture map description module 212 include: the material description module 209 and the texture description module 210 together define the color and physical information of the object surface.
  • the sampler description module 211 defines how to attach the texture map to the object surface.
  • the texture description module 210 specifies the sampler description module 211 and the texture map description module 212; the addition of textures is implemented through the texture map description module 212, which uses a URI for identification and indexing, or uses the accessor description module 204 to access data.
  • the sampler description module 211 implements the specific adjustment and packaging of textures.
  • the definition of the syntax elements in the texture description module 210 is shown in Table 6 below:
  • the definition of the syntax elements in the sampler (texture.sampler) of the texture description module 210 is shown in Table 7 below:
  • for example, the following is a JSON example of a material description module 209, a texture description module 210, a sampler description module 211, and a texture map description module 212:
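  • Such an example may look like the following sketch (the texture map file name is hypothetical; the numeric sampler values are standard OpenGL enums reused by glTF2.0: 9729 = LINEAR, 9987 = LINEAR_MIPMAP_LINEAR, 10497 = REPEAT):

```json
{
  "materials": [{
    "pbrMetallicRoughness": {
      "baseColorTexture": { "index": 0 }
    }
  }],
  "textures": [{ "sampler": 0, "source": 0 }],
  "samplers": [{
    "magFilter": 9729,
    "minFilter": 9987,
    "wrapS": 10497,
    "wrapT": 10497
  }],
  "images": [{ "uri": "texture.png" }]
}
```

  • The material references texture 0, which in turn references sampler 0 (filtering and wrapping mode) and image 0 (the texture map identified by its URI).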
  • the animation description module (animation) 213 in the scene description file shown in FIG2 is a description module of the next level of the node description module 202, and is used to describe the animation information added to the object described by the node description module 202.
  • animation can be added to the object described by the node description module 202. Therefore, the description level of the animation description module 213 in the scene description file is specified by the node description module 202; that is, the animation description module 213 is a description module at the next level of the node description module 202, and the animation description module 213 also has a corresponding relationship with the mesh description module 203.
  • the animation description module 213 can describe animation in three ways: position movement, angle rotation, and size scaling, and can also specify the start and end time of the animation and the implementation method of the animation. For example, if an animation is added to a mesh description module 203 representing a three-dimensional object, the three-dimensional object represented by the mesh description module 203 can complete the specified animation process within the specified time window through the fusion of position movement, angle rotation, and size scaling.
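  • In glTF2.0 JSON, such an animation is expressed through channels and samplers; the following sketch (the node and accessor indices are hypothetical) moves node 0 by linearly interpolating translation keyframes, where accessor 4 holds the keyframe times and accessor 5 holds the target positions:

```json
{
  "animations": [{
    "channels": [{
      "sampler": 0,
      "target": { "node": 0, "path": "translation" }
    }],
    "samplers": [{
      "input": 4,
      "interpolation": "LINEAR",
      "output": 5
    }]
  }]
}
```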
  • the skin description module (skin) 214 in the scene description file shown in FIG2 is a description module at the next level of the node description module 202, and is used to describe the motion cooperation relationship between the skeleton added to the node described by the node description module 202 and the mesh representing the surface information of the object.
  • when the node described by the node description module 202 represents an object with a large degree of freedom of movement, such as a person, an animal, or a machine, a skeleton can be filled into the interior of the object in order to optimize the motion performance of such objects, and the three-dimensional mesh representing the surface information of the object conceptually becomes the skin.
  • the description level of the skin description module 214 is specified by the node description module 202; that is, the skin description module 214 is a description module at the next level of the node description module 202, and the skin description module 214 has a corresponding relationship with the mesh description module 203.
  • Each description module of the scene description file in the above glTF2.0 scene description standard only has the most basic ability to describe three-dimensional objects. There are problems such as not supporting dynamic three-dimensional immersive media, not supporting audio files, and not supporting scene updates.
  • glTF also provides each of its objects with an optional extensions attribute (extensions), allowing any part of a glTF file to be extended to achieve more complete functionality. The scene description module (scene), node description module (node), mesh description module (mesh), accessor description module (accessor), cache description module (buffer), animation description module (animation), etc., as well as the syntax elements defined inside them, all have optional extensions attributes to support functional extensions based on glTF 2.0.
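As an illustration of this extensions mechanism, the following Python sketch builds a minimal glTF-style JSON document in which both the file level and a scene carry an optional extensions object. This is a sketch only: the asset is not complete, and the MPEG_media payload shown is an empty placeholder rather than the extension's full schema.

```python
import json

# Minimal sketch (not a complete glTF asset): any glTF 2.0 object may carry an
# optional "extensions" attribute keyed by extension name; the MPEG_media
# payload below is an empty placeholder, not the extension's full schema.
gltf = {
    "asset": {"version": "2.0"},
    "extensionsUsed": ["MPEG_media"],   # names every extension used in the file
    "scenes": [{
        "nodes": [0],
        "extensions": {}                # scene-level extensions would go here
    }],
    "nodes": [{"mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "extensions": {
        "MPEG_media": {"media": []}     # top-level extension object
    },
}
print(json.dumps(gltf, indent=2)[:40])
```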
  • MPEG: Moving Picture Experts Group
  • The MPEG-I Scene Description standard has the standard number ISO/IEC 23090-14.
  • This standard mainly solves the cross-platform description problem of MPEG media (including MPEG-developed codecs, MPEG file formats, and MPEG transmission mechanisms) in 3D scenes.
  • the MPEG#128 meeting decided to develop the MPEG-I Scene Description standard based on glTF2.0 (ISO/IEC 12113).
  • the first version of the MPEG scene description standard has been developed and is in the FDIS voting stage.
  • The MPEG scene description standard adds corresponding extensions to address requirements not yet realized in the cross-platform description of three-dimensional scenes, including interactivity, AR anchoring, user and avatar representation, haptic support, and extended support for immersive media codecs.
  • the first version of the MPEG scene description standard has been formulated mainly to define the following contents:
  • the MPEG scene description standard defines a scene description file format for describing immersive 3D scenes. This format combines the original glTF2.0 (ISO/IEC 12113) content and makes a series of extensions based on it.
  • MPEG scene description defines a scene description framework and application programming interfaces (APIs) for inter-module collaboration, which decouple the acquisition and processing of immersive media from the media rendering process. This is beneficial for adapting immersive media to different network conditions, partially acquiring immersive media files, accessing different levels of detail of immersive media, and adjusting content quality. Decoupling the acquisition and processing of immersive media from the rendering process is the key to achieving cross-platform description of three-dimensional scenes.
  • API: application programming interface
  • MPEG scene description proposes a series of extensions based on the ISO Base Media File Format (ISOBMFF) (ISO/IEC 14496-12) for carrying immersive media content.
  • ISOBMFF: ISO Base Media File Format
  • the scene description file is extended in the MPEG scene description standard based on the scene description file shown in FIG2 .
  • the extension of the scene description file in the MPEG scene description standard can be divided into two groups:
  • the first group of extensions includes: MPEG media (MPEG_media) 301, MPEG time-varying accessor (MPEG_accessor_timed) 302 and MPEG circular buffer (MPEG_buffer_circular) 303.
  • MPEG media 301 is an independent extension used to reference external media sources
  • MPEG time-varying accessor 302 is an extension of the accessor level used to access time-varying media
  • MPEG circular buffer is an extension of the buffer level used to support circular buffers.
  • the first group of extensions provides a basic description and format of the media in the scene, meeting the basic requirements of describing time-varying immersive media in the scene description framework.
  • MPEG time-varying accessor (MPEG_accessor_timed) 302 is used to access time-varying media.
  • Because the glTF 2.0 scene description standard does not support time-varying media, the scene description file must be updated whenever media data needs to change over time. For example, if the texture map on the surface of an object is to change over time under the glTF 2.0 scene description standard, the scene description file must be updated. Frequent updates require frequent parsing, processing, and transmission of scene description files, which increases the performance overhead of the three-dimensional scene rendering process. MPEG therefore designed the MPEG time-varying accessor (MPEG_accessor_timed) 302, whose parameters can change over time so as to change the way media data is accessed. The accessed data thus changes over time without frequent parsing, processing, and transmission of scene description files.
  • MPEG_accessor_timed: MPEG time-varying accessor
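The mechanism can be sketched as follows in Python. The class and method names here (CircularBuffer, TimedAccessor, push_frame, read_frame) are hypothetical illustrations of the idea behind MPEG_buffer_circular and MPEG_accessor_timed: an accessor whose read offset changes per frame over a circular buffer, so the accessed data changes over time while the scene description file stays fixed. They are not the API defined by the standard.

```python
# Conceptual sketch of a time-varying accessor over a circular buffer; the
# class names are hypothetical illustrations of the mechanism behind
# MPEG_buffer_circular and MPEG_accessor_timed, not the standard's API.

class CircularBuffer:
    def __init__(self, frame_size: int, frame_count: int):
        self.frame_size = frame_size
        self.frame_count = frame_count
        self.data = bytearray(frame_size * frame_count)
        self.write_index = 0

    def push_frame(self, frame: bytes) -> None:
        """Producer side (media pipeline): write the next decoded frame."""
        assert len(frame) == self.frame_size
        start = (self.write_index % self.frame_count) * self.frame_size
        self.data[start:start + self.frame_size] = frame
        self.write_index += 1


class TimedAccessor:
    """Accessor whose read offset changes over time, so the accessed data
    changes per frame while the scene description file stays unchanged."""

    def __init__(self, buffer: CircularBuffer):
        self.buffer = buffer

    def read_frame(self, frame_number: int) -> bytes:
        # Only the byte offset varies with time; no scene file update is needed.
        start = (frame_number % self.buffer.frame_count) * self.buffer.frame_size
        return bytes(self.buffer.data[start:start + self.buffer.frame_size])


buf = CircularBuffer(frame_size=4, frame_count=3)
acc = TimedAccessor(buf)
for i in range(5):                    # push 5 frames through a 3-slot buffer
    buf.push_frame(bytes([i] * 4))
print(acc.read_frame(4))              # frame 4 occupies slot 4 % 3 == 1
```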
  • the second group of extensions includes: MPEG dynamic scene (MPEG_scene_dynamic) 304, MPEG texture (MPEG_texture_video) 305, MPEG audio space (MPEG_audio_spatial) 306, MPEG viewport recommendation (MPEG_viewport_recommended) 307, MPEG mesh mapping (MPEG_mesh_linking) 308 and MPEG animation time (MPEG_animation_timing) 309.
  • MPEG dynamic scene: MPEG_scene_dynamic
  • MPEG texture: MPEG_texture_video
  • MPEG audio space: MPEG_audio_spatial
  • MPEG viewport recommendation: MPEG_viewport_recommended
  • MPEG mesh mapping: MPEG_mesh_linking
  • MPEG animation time: MPEG_animation_timing
  • MPEG_scene_dynamic 304 is a scene level extension to support dynamic scene updates
  • MPEG_texture_video 305 is a texture level extension to support textures in video form
  • MPEG_audio_spatial 306 is a node level and camera level extension to support spatial 3D audio
  • MPEG_viewport_recommended 307 is a scene level extension to support the description of recommended viewing angles in two-dimensional display
  • MPEG_mesh_linking 308 is a mesh level extension to support linking two meshes and providing mapping information
  • MPEG_animation_timing 309 is a scene level extension to support controlling the animation timeline.
  • the MPEG media in the MPEG scene description file is used to describe the type of media files and to provide necessary instructions for MPEG type media files so that these MPEG type media files can be used later.
  • the definition of the first level syntax elements of MPEG media is shown in Table 8 below:
  • ISO/IEC 23090-14 also defines the transmission format for the delivery of scene description files and data related to the glTF 2.0 extension.
  • ISO/IEC 23090-14 defines how to encapsulate glTF files and related data as non-time-varying and time-varying data (for example, as track samples) in ISOBMFF files.
  • MPEG_scene_dynamic, MPEG_mesh_linking, and MPEG_animation_timing provide a specific form of time-varying data to the display engine, and the display engine 11 should perform corresponding operations based on this changing information.
  • ISO/IEC 23090-14 also defines the format of each extended time-varying data and how to encapsulate it in the ISOBMFF file.
  • MPEG media: MPEG_media
  • MPEG_media allows reference to external media streams delivered via protocols such as RTP/SRTP, MPEG-DASH, etc.
  • URL: Uniform Resource Locator
  • The URL referencing scheme requires the presence of a stream identifier in the query part of the URL, but does not specify a particular type of identifier, allowing the use of the Media Stream Identification scheme (RFC 5888), the labeling scheme (RFC 4575), or a zero-based indexing scheme.
  • the main functions of the display engine 11 include obtaining a scene description file and parsing the obtained scene description file to obtain the composition structure of the three-dimensional scene to be rendered and the detailed information in the three-dimensional scene to be rendered, and rendering and displaying the three-dimensional scene to be rendered according to the information obtained by parsing the scene description file.
  • The specific workflow and principle of the display engine 11 are not limited in the embodiments of the present application, as long as the display engine 11 can parse the scene description document, issue instructions to the media access function 12 through the media access function API, issue instructions to the cache management module 13 through the cache API, and retrieve the processed data from the cache to complete the rendering and display of the three-dimensional scene and the objects therein.
  • The media access function 12 can receive instructions from the display engine 11 and complete the access and processing of media files according to those instructions. Specifically, after a media file is obtained, it is processed; the processing of different types of media files differs greatly. To support a wide range of media types while keeping the media access function efficient, a variety of pipelines are designed in the media access function, and during processing the pipeline matching the media type is enabled.
  • The input of a pipeline is a media file downloaded from a server or read from local storage. Such media files often have a complex structure and cannot be used directly by the display engine 11, so the main function of the pipeline is to process the data of the media file so that it meets the requirements of the display engine 11.
  • the media data processed by the pipeline needs to be delivered to the display engine 11 for use in a standardized arrangement structure, which requires the participation of the cache API and the cache management module 13.
  • the cache API and cache management realize the creation of corresponding caches according to the format of the processed media data, and are responsible for the subsequent management of the cache, such as update, release and other operations.
  • the cache management module 13 can communicate with the media access function 12 through the cache API, and can also communicate with the display engine 11. The goal of communicating with the display engine 11 and/or the media access function 12 is to achieve cache management.
  • In some embodiments, the display engine 11 first sends the cache management instructions to the media access function 12 through the media access function API, and the media access function 12 then sends the cache management instructions to the cache management module 13 through the cache API.
  • In other embodiments, the display engine 11 only needs to send the cache management description information parsed from the scene description document directly to the cache management module 13 through the cache API.
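The two cache-signaling paths above can be sketched as follows. All class and method names (CacheManager, MediaAccessFunction, DisplayEngine, and their methods) are hypothetical illustrations of the module roles described in the text, not the cache API or media access function API defined by the standard.

```python
# Minimal sketch of the two cache-signaling paths; all names are hypothetical
# illustrations of the module roles, not the standard's APIs.

class CacheManager:
    def __init__(self):
        self.caches = {}

    def allocate(self, name: str, size: int) -> None:
        self.caches[name] = bytearray(size)

    def release(self, name: str) -> None:
        self.caches.pop(name, None)


class MediaAccessFunction:
    def __init__(self, cache_manager: CacheManager):
        self.cache_manager = cache_manager

    def handle_cache_instruction(self, name: str, size: int) -> None:
        # Path 1: display engine -> media access function API -> cache API
        self.cache_manager.allocate(name, size)


class DisplayEngine:
    def __init__(self, maf: MediaAccessFunction, cache_manager: CacheManager):
        self.maf = maf
        self.cache_manager = cache_manager

    def setup_cache_via_maf(self, name: str, size: int) -> None:
        self.maf.handle_cache_instruction(name, size)

    def setup_cache_directly(self, name: str, size: int) -> None:
        # Path 2: display engine -> cache API, bypassing the media access function
        self.cache_manager.allocate(name, size)


cm = CacheManager()
engine = DisplayEngine(MediaAccessFunction(cm), cm)
engine.setup_cache_via_maf("geometry", 1024)
engine.setup_cache_directly("color", 512)
print(sorted(cm.caches))   # ['color', 'geometry']
```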
  • the above embodiments introduce the basic process of rendering a three-dimensional scene including immersive media using a scene description framework, as well as the content and function of each functional module or file in the scene description framework.
  • the immersive media in the three-dimensional scene can be a point cloud-based media file, a three-dimensional grid-based media file, a 6DoF-based media file, an MIV media file, etc.
  • Some embodiments of the present application involve rendering a three-dimensional scene including a point cloud based on a scene description framework, so the following first describes the point cloud-related content.
  • Point cloud refers to a collection of massive three-dimensional points. After obtaining the spatial coordinates of each sampling point on the surface of an object, a collection of points is obtained, which is called a point cloud. In addition to geometric coordinates, the points in the point cloud may also include some other attribute information, such as color, normal vector, reflectivity, transparency, material type, etc. Point cloud can be obtained in a variety of ways. In some embodiments, the implementation method of obtaining point cloud includes: using a camera array with a known fixed position in space to observe an object, and using the two-dimensional image obtained by the camera array to obtain a three-dimensional representation of the object using some related algorithms, thereby obtaining the point cloud corresponding to the object.
  • the implementation method of obtaining point cloud includes: using a laser radar scanning device to obtain the point cloud corresponding to the object.
  • the sensor of the laser radar scanning device records the electromagnetic waves emitted by the radar and reflected by the surface of the object, thereby obtaining the object volume information, and obtaining the point cloud corresponding to the object according to the object volume information.
  • the implementation method of obtaining point cloud may also include: using artificial intelligence or computer vision algorithms to create three-dimensional volume information based on two-dimensional images, thereby obtaining the point cloud corresponding to the object.
  • Point cloud provides a high-precision 3D expression for the fine digitization of the physical world and is widely used in 3D modeling, smart cities, autonomous navigation systems, augmented reality and other fields.
  • G-PCC Geometry-based Point Cloud Compression
  • V-PCC Video-based Point Cloud Compression
  • the G-PCC encoder 400 can be divided into two parts: a geometry encoding module 41 and an attribute encoding module 42 .
  • the geometry encoding module 41 can be further divided into an octree-based geometry encoding unit 411 and a prediction tree-based geometry encoding unit 412 .
  • The main steps by which the geometry encoding module 41 of the G-PCC encoder encodes the geometric information of the point cloud to be encoded include: S401, extracting the geometric information (positions) of the point cloud to be encoded; S402, performing coordinate conversion on the geometric information so that the point cloud to be encoded is entirely contained in a bounding box; S403, voxelizing the converted geometric information, in which the converted geometric information is first quantized to scale the point cloud to be encoded.
  • Quantizing the converted geometric information also requires determining, according to parameters, whether to remove duplicate points; the combined process of quantization and duplicate-point removal is called voxelization.
  • After voxelization of the geometric information is completed, it is encoded by the octree-based geometric encoding unit 411 or the prediction-tree-based geometric encoding unit 412 to obtain the geometric information code stream of the point cloud to be encoded.
  • The coding process of the prediction-tree-based geometric coding unit 412 includes: S406, constructing a prediction tree structure. This includes sorting the points in the point cloud to be coded (the sorting methods include unordered, Morton order, azimuth order, and radial distance order) and constructing the prediction tree structure using one of two different methods: a high-latency slow method or a low-latency fast method.
  • S407 based on the structure of the prediction tree, traverse each node in the prediction tree, predict the geometric position information of the node by selecting different prediction modes to obtain the prediction residual, and quantize the geometric prediction residual using the quantization parameter.
  • S408, arithmetic coding, including: through continuous iteration, arithmetically coding the prediction residuals of the prediction tree node position information, the prediction tree structure, the quantization parameters, etc., to generate a binary geometric information code stream.
  • the process of encoding the attribute information of the point cloud to be encoded by the attribute encoding module 42 of the G-PCC encoder mainly includes: S408, extracting the attribute information (attributes) in the point cloud to be encoded; S409, performing attribute prediction on the attribute information; S410, performing lifting transformation on the attribute information; S411, performing region adaptive hierarchical transformation (RAHT) on the attribute information; S412, quantizing the coefficients of the RAHT transformation and the coefficients of the lifting transformation; S413, performing arithmetic coding on the quantized coefficients of the RAHT transformation and the coefficients of the lifting transformation to obtain the attribute information code stream.
  • Step S414: reconstruct the geometric information according to the geometric code stream, and match the original attribute information (attributes) with the reconstructed geometric information.
  • S415 recolor the geometric information.
  • The recoloring in step S415 uses the original point cloud to assign attribute information to the reconstructed point cloud; the goal is to make the attribute values of the reconstructed point cloud as similar as possible to the attribute values of the point cloud to be encoded, so as to minimize the error.
  • the attribute prediction algorithm is an algorithm that uses the weighted sum of the reconstructed attribute values of the points that have been reconstructed in the three-dimensional space to obtain the predicted attribute value of the current point to be predicted.
  • the attribute prediction algorithm can effectively remove the redundancy of the attribute space, thereby achieving the purpose of compressing the attribute information.
  • The implementation of attribute prediction may include: first, hierarchically dividing the point cloud to be encoded by the level of detail (LOD) algorithm to establish a hierarchical structure of the point cloud to be encoded; second, encoding and decoding the lower-level points first, and using the lower-level points together with reconstructed points of the same level to predict the higher-level points, thereby realizing progressive encoding.
  • LOD: level of detail
  • The implementation of hierarchically dividing the point cloud to be encoded by the LOD algorithm may include: first, all points in the point cloud to be encoded are marked as unvisited, and the visited point set, denoted V, is initially empty. Then all unvisited points in the point cloud are looped over: for the current point, the minimum distance D from the current point to the visited point set V is calculated; if D is less than the threshold distance, the current point is ignored, otherwise the current point is marked as visited and added to the visited point set V and the current level. Finally, the points in each level are merged with those of all preceding levels to obtain the hierarchical structure of the point cloud to be encoded.
  • Suppose the point cloud to be encoded includes points P0 to P9.
  • points P0, P2, P4, and P5 are added to the visited point set V and level R0 in sequence.
  • points P1, P3, and P8 are added to the visited point set V and level R1 in sequence.
  • points P6, P7, and P9 are added to the visited point set V and level R2 in sequence.
  • the points in each level and all levels before each level are merged to obtain a hierarchical structure of the point cloud to be encoded including three levels.
  • The first level is LOD0, including points P0, P2, P4, and P5;
  • the second level is LOD1, including points P0, P2, P4, P5, P1, P3, and P8;
  • the third level is LOD2, including points P0, P2, P4, P5, P1, P3, P8, P6, P7, and P9.
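The hierarchical division above can be sketched in Python as follows, assuming one distance threshold per level. The point coordinates and thresholds below are illustrative placeholders (not the P0 to P9 example above, whose coordinates are not given).

```python
import math

def lod_partition(points, thresholds):
    """Sketch of the LOD division described above: each level greedily keeps
    points whose minimum distance to the visited set V is at least the
    per-level threshold, then the levels are cumulatively merged."""
    visited = []                          # indices in the visited point set V
    levels = []
    remaining = list(range(len(points)))  # all points start as unvisited
    for threshold in thresholds:
        level, deferred = [], []
        for idx in remaining:
            d = min((math.dist(points[idx], points[j]) for j in visited),
                    default=float("inf"))
            if d < threshold:
                deferred.append(idx)      # too close: ignored at this level
            else:
                visited.append(idx)
                level.append(idx)
        levels.append(level)
        remaining = deferred
    # merge each level with all preceding levels: LOD0 is contained in LOD1, etc.
    lods, acc = [], []
    for level in levels:
        acc = acc + level
        lods.append(acc)
    return lods

pts = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (1, 0, 0), (0, 1, 0), (0.2, 0, 0)]
lods = lod_partition(pts, thresholds=[2.0, 0.5, 0.0])
print(lods)   # → [[0, 1, 2], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5]]
```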
  • the lifting transformation is built on the prediction transformation and includes three parts: segmentation, prediction and update.
  • the segmentation module 61 spatially divides the point cloud to be encoded into two parts: a high-level point cloud H(N) and a low-level point cloud L(N).
  • the update module 63 defines and recursively updates the influence weight of each point based on the prediction residual D(N) and the distance between the predicted point and its neighboring points.
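The three lifting stages can be sketched on a one-dimensional attribute signal as follows. The parity-based split, two-neighbor average prediction, and fixed 1/4 update weights are simplifications chosen for illustration; the actual G-PCC lifting transform uses distance-based neighbor weights, but the split/predict/update structure is the same.

```python
# Simplified sketch of the three lifting stages (split, predict, update) on a
# 1-D attribute signal; the real G-PCC lifting uses distance-based neighbor
# weights, but the structure is the same.

def lifting_forward(signal):
    low = signal[0::2]                 # L(N): low-level (kept) samples
    high = signal[1::2]                # H(N): samples to be predicted
    # predict: each high sample from its two low-level neighbors
    residual = []
    for i, h in enumerate(high):
        left = low[i]
        right = low[i + 1] if i + 1 < len(low) else low[i]
        residual.append(h - (left + right) / 2)   # prediction residual D(N)
    # update: feed part of the residual back into the low band (weight 1/4 each)
    updated = list(low)
    for i, d in enumerate(residual):
        updated[i] += d / 4
        if i + 1 < len(updated):
            updated[i + 1] += d / 4
    return updated, residual

low, res = lifting_forward([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(low, res)   # [1.0, 3.0, 5.25] [0.0, 0.0, 1.0]
```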
  • RAHT transform is a hierarchical region adaptive transform algorithm based on Haar wavelet transform. Based on the hierarchical tree structure, the occupied child nodes in the same parent node are recursively transformed in a bottom-up manner along each dimension, the low-frequency coefficients obtained by the transformation are passed to the next level of the transformation process, and the high-frequency coefficients are quantized and entropy encoded.
  • the above RAHT transformation can be implemented by RAHT transformation based on upsampling prediction.
  • In the RAHT transformation based on upsampling prediction, the overall tree traversal of the RAHT transformation is changed from bottom-up to top-down, and the transformation is still performed within 2×2×2 blocks.
  • The transformation process includes: first, performing the RAHT transformation on the voxel block 71 along the first direction. If there is an adjacent voxel block in the first direction, the two are RAHT-transformed to obtain the weighted average (DC coefficient) and residual (AC coefficient) of the attribute values of the two adjacent points.
  • DC coefficient: weighted average
  • AC coefficient: residual
  • The resulting DC coefficient serves as the attribute information of the parent-node voxel block 122, and the next layer of the RAHT transformation is performed; the AC coefficient is retained for final encoding. If there is no adjacent point, the attribute value of the voxel block 71 is passed directly to the second-layer parent node. The second-layer RAHT transformation is performed along the second direction: if there is an adjacent voxel block in the second direction, the two are RAHT-transformed to obtain the weighted average (DC coefficient) and residual (AC coefficient) of the attribute values of the two adjacent points.
  • the third-layer RAHT transformation is performed along the third direction, and the parent node voxel block 73 with three color depths is obtained as the child node of the next layer in the octree, and then the RAHT transformation is performed cyclically along the first direction, the second direction, and the third direction until there is only one parent node in the entire point cloud to be encoded.
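A single two-point RAHT step can be sketched as follows, using one common formulation of the weighted Haar butterfly (the exact normalization in the G-PCC specification may differ):

```python
import math

def raht_pair(a1, w1, a2, w2):
    """One 2-point RAHT butterfly: two occupied neighbor blocks are merged
    into a weighted-average DC coefficient (passed up to the parent node)
    and an AC residual coefficient (kept for quantization and entropy
    coding); the parent inherits the combined weight."""
    s = math.sqrt(w1 + w2)
    dc = (math.sqrt(w1) * a1 + math.sqrt(w2) * a2) / s
    ac = (-math.sqrt(w2) * a1 + math.sqrt(w1) * a2) / s
    return dc, ac, w1 + w2

# Two adjacent voxels with equal weight: DC is a scaled average, AC a scaled
# difference, and the transform preserves energy (dc^2 + ac^2 == a1^2 + a2^2).
dc, ac, w = raht_pair(10.0, 1, 14.0, 1)
print(round(dc, 4), round(ac, 4), w)   # 16.9706 2.8284 2
```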
  • the G-PCC decoder 800 may be divided into a geometry decoding module 81 and an attribute decoding module 82 .
  • the geometry decoding module 81 may be further divided into an octree-based geometry decoding unit 811 and a prediction tree-based geometry decoding unit 812 .
  • the main steps of the G-PCC decoder decoding the geometric information code stream through the octree-based geometric decoding unit 811 of the geometric decoding module 81 include: S801, arithmetic decoding; S802, octree synthesis; S803, surface fitting; S804, geometry reconstruction; S805, inverse coordinate conversion steps to obtain the geometric information of the point cloud.
  • The geometric decoding of the octree-based geometric decoding unit 811 includes: in breadth-first traversal order, the occupancy code of each node is obtained by continuous parsing, and the nodes are divided in turn until 1×1×1 unit cubes are obtained, at which point the division stops; the number of points contained in each leaf node is obtained by parsing, and finally the geometrically reconstructed point cloud information is restored.
  • the main steps of the G-PCC decoder decoding the geometric information code stream through the prediction tree-based geometric decoding unit 812 of the geometric decoding module 81 include: S801, arithmetic decoding; S806, reconstruction of the prediction tree; S807, residual calculation; S804, reconstruction of geometry; S805, inverse coordinate conversion steps to obtain the geometric information of the point cloud.
  • the main steps of attribute decoding based on the attribute decoding module 82 of the G-PCC decoder 800 include: S808, arithmetic decoding; S809, inverse quantization; executing steps S810 and S811, or executing step S812; S810, attribute prediction; S811, lifting transformation; S812, RAHT-based inverse transformation; S813, color inverse transformation to obtain the attribute information of the point cloud. Finally, the three-dimensional image model of the point cloud data to be encoded is restored based on the geometric information and attribute information.
  • The main steps by which the G-PCC decoder decodes the attribute information code stream based on the attribute decoding module 82 are the inverse of the main steps by which the G-PCC encoder encodes the attribute information based on the attribute encoding module 42, and will not be repeated here.
  • Some embodiments of the present application provide a scene description framework that supports point cloud code streams obtained by the G-PCC compression standard, and the specific contents include: scene description file support for media files of type G-PCC encoded point cloud, media access function API support for media files of type G-PCC encoded point cloud, media access function support for media files of type G-PCC encoded point cloud, cache API support for media files of type G-PCC encoded point cloud, cache management support for media files of type G-PCC encoded point cloud, and other contents.
  • The process of rendering a media file of type G-PCC coded point cloud in a three-dimensional scene based on a scene description framework includes: first, the display engine obtains the scene description file by downloading it or reading it locally. The scene description file contains the description information of the entire three-dimensional scene and of the media files of type G-PCC coded point cloud contained in the scene.
  • the description information of the media file of the type G-PCC coded point cloud may include the access address of the media file of the type G-PCC coded point cloud, the storage format of the processed decoded data of the media file of the type G-PCC coded point cloud, the playback time and playback frame rate of the media file of the type G-PCC coded point cloud, etc.
  • After the display engine parses the scene description file, it passes the description information of the media file of type G-PCC coded point cloud contained in the scene description to the media access function through the media access function API.
  • the display engine calls the cache management module through the cache API to allocate the cache, and can also pass the cache information to the media access function, and the media access function calls the cache management module through the cache API to allocate the cache.
  • the media access function first requests the server to download the media file of the type G-PCC coded point cloud, or reads the media file of the type G-PCC coded point cloud from the local file.
  • After obtaining the media file of type G-PCC coded point cloud, the media access function creates and starts the corresponding pipeline to process the media file.
  • the input of the pipeline is the encapsulated file of the media file of type G-PCC coded point cloud.
  • the pipeline performs decapsulation, G-PCC decoding, post-processing and other processes in sequence, and then stores the processed data into the specified cache.
  • the display engine obtains the decoded data of the media file of type G-PCC coded point cloud from the specified cache, and renders and displays the three-dimensional scene according to the data obtained in the cache.
  • the following describes the scene description file, media access function API, media access function, cache API, and cache management of media files supporting the G-PCC coded point cloud type.
  • some embodiments of the present application extend the values of the syntax elements in the MPEG media (MPEG_media) of the scene description file, and the specific extension includes at least one of the following:
  • Extension 1: The media type syntax element (MPEG_media.media.alternatives.mimeType), used in the alternatives (MPEG_media.media.alternatives) of the media list (media) of the MPEG media (MPEG_media) of the scene description file to declare the encapsulation format of the media file, is extended.
  • the extension of the media type syntax element (mimeType) includes: extending the media type syntax element (mimeType) with a value "application/mp4" associated with the G-PCC coded point cloud.
  • the value of the media type syntax element (mimeType) is "application/mp4".
  • Extension 2: The value of the first track index syntax element (MPEG_media.media.alternatives.tracks.track), used to declare the track information of the media file in the track array (MPEG_media.media.alternatives.tracks) of the alternatives of the media list (media) of the MPEG media (MPEG_media) of the scene description file, is extended.
  • The extension of the first track index syntax element includes: when G-PCC data is referenced by the scene description file as an item in the track array of the alternatives of the media list of the MPEG media, and the referenced item complies with the provisions on tracks in the ISO Base Media File Format (ISOBMFF), then for G-PCC data encapsulated in a single track, the track referenced in the MPEG media is the G-PCC code stream track, while for G-PCC data encapsulated in multiple tracks, the track referenced in the MPEG media is the G-PCC geometry code stream track.
  • ISOBMFF: ISO Base Media File Format
  • Extension 3: The codec parameter syntax element (MPEG_media.media.alternatives.tracks.codecs), used to describe the encoding and decoding parameters of the media data contained in the code stream tracks in the track array (tracks) of the alternatives (alternatives) of the media list (media) of the MPEG media (MPEG_media) of the scene description file, is extended.
  • The specific extension includes: extending the codec parameters, defined in IETF RFC 6381, of the media files contained in the code stream track.
  • The codec parameter syntax element (codecs) can be represented by a comma-separated list of codec values. Accordingly, the extension of the value of the codec parameter syntax element (codecs) includes: when the type of the media file is G-PCC encoded point cloud, the value of the codec parameter syntax element (codecs) should be set in accordance with the ISO/IEC 23090-18 Carriage of Geometry-based Point Cloud Compression Data standard.
  • the "codecs" attribute of the preselection signaling should be set to 'gpc1', indicating that the preselected media is based on a geometric point cloud;
  • the "codecs" attribute of the Main G-PCC Adaptation Set should be set to 'gpcb' or 'gpeb', indicating that the adaptation set contains G-PCC Tile basic track data.
  • the "codecs" attribute of the Main G-PCC adaptationsset should be set to 'gpcb'.
  • the "codecs” attribute of the Main G-PCC Adaptation Set should be set to 'gpeb'.
  • When G-PCC Tile preselection signaling is used in an MPD file, the "codecs" attribute of the preselection signaling shall be set to 'gpt1', indicating that the preselected media is a geometry-based point cloud tile.
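Combining the three extensions, an MPEG_media entry for a G-PCC coded point cloud might look like the following sketch. The field layout follows the syntax element paths quoted above (MPEG_media.media.alternatives.tracks), while the name, uri, and track values are illustrative placeholders, not values mandated by the standard.

```python
import json

# Sketch of an MPEG_media entry for a G-PCC coded point cloud; the "name",
# "uri", and "track" values are illustrative placeholders.
mpeg_media = {
    "MPEG_media": {
        "media": [{
            "name": "point_cloud_sequence",
            "alternatives": [{
                "mimeType": "application/mp4",   # Extension 1
                "uri": "example.mp4",
                "tracks": [{
                    "track": "#trackIndex=0",    # Extension 2: G-PCC code stream track
                    "codecs": "gpc1"             # Extension 3: ISO/IEC 23090-18 value
                }]
            }]
        }]
    }
}
print(json.dumps(mpeg_media, indent=2)[:40])
```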
  • some implementations of the present application extend the values of the syntax elements in the MPEG media (MPEG_media) in the scene description file, and the specific extension includes extending one or more of the following items shown in Table 12:
  • At least one of the above extensions 1 to 3 is performed on the syntax element values in the MPEG media (MPEG_media) in the scene description file, so that the MPEG media (MPEG_media) part in the scene description file supports media files of the type G-PCC coded point cloud.
  • a method for describing scenes and nodes in a scene description file containing a media file of type G-PCC coded point cloud includes: when a three-dimensional scene contains a media file of type G-PCC coded point cloud, the scene and node description method is used to describe the overall structure of the three-dimensional scene and the structural hierarchy and position of the media file of type G-PCC coded point cloud in the three-dimensional scene.
  • the description method using a scene description module and a node description module is used to describe the overall structure of the three-dimensional scene and the structural hierarchy and position of the media file of type G-PCC coded point cloud in the three-dimensional scene, including: one three-dimensional scene is described using one scene description module.
  • Each scene description file can describe one or more three-dimensional scenes, and the three-dimensional scenes can only be in a parallel relationship, not a hierarchical relationship.
  • the nodes can be in a parallel relationship or a hierarchical relationship.
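The scene/node layout described above can be illustrated with a minimal glTF-style fragment; the node names and index values below are made up for the example.

```python
import json

# Minimal sketch of the scene/node structure described above: scenes sit
# side by side (no hierarchy between scenes), while nodes may nest through
# their "children" index lists.  All names and indices are illustrative.
scene_description = {
    "scene": 0,
    "scenes": [{"nodes": [0]}],                 # parallel scenes only
    "nodes": [
        {"name": "root_node", "children": [1]}, # nodes may be hierarchical
        {"name": "gpcc_node", "mesh": 0},
    ],
}

# The file form is plain JSON, as in glTF 2.0; round-trip it to show that.
text = json.dumps(scene_description, indent=2)
parsed = json.loads(text)
root = parsed["nodes"][parsed["scenes"][parsed["scene"]]["nodes"][0]]
```

Following the index chain scene → scenes → nodes is exactly how a display engine walks from the scene declaration down to the mounted content.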
  • a method for describing a three-dimensional mesh in a scene description file of a media file of type G-PCC encoded point cloud including: reusing the syntax elements in the attributes (mesh.primitives.attributes) of the primitives of the mesh description module to describe various types of data of the media file of type G-PCC encoded point cloud.
  • a point cloud is a scattered data structure: a collection of many scattered points forms a point cloud. Describing a media file of type G-PCC encoded point cloud is therefore equivalent to describing the data at each point in the point cloud.
  • each point in a media file of type G-PCC encoded point cloud has two types of information: geometric information and attribute information.
  • the geometric information represents the three-dimensional coordinates of the point in space
  • the attribute information represents the color, reflectivity, normal direction, and other information attached to the point. Since the data contained in the points of a media file of type G-PCC encoded point cloud are similar to the attributes that can be declared by the syntax elements contained in the attributes of the primitives of the mesh description module, when describing the data contained in the points of a media file of type G-PCC encoded point cloud in the mesh description module (mesh), the syntax elements in the attributes (mesh.primitives.attribute) of the primitives (primitives) of the mesh description module (mesh) can be reused to describe the data contained in the points of the media file of type G-PCC encoded point cloud.
  • the value of the position syntax element (position, the first table item in Table 1 above) in the attribute of the primitive of the mesh description module is a three-dimensional vector composed of floating point numbers.
  • such a data structure can also represent the geometric information of the G-PCC coded point cloud. Therefore, the position syntax element (position) in the attributes (mesh.primitives.attribute) of the primitives of the mesh description module can be reused to represent the geometric information of the points in the media file of the type G-PCC coded point cloud.
  • the color value of a point in the media file of the type G-PCC coded point cloud can also be represented by reusing the color syntax element (color_n, the fifth table item in Table 1 above) in the attributes (mesh.primitives.attribute) of the primitives of the mesh description module.
  • the normal vector of a point in the media file of the type G-PCC coded point cloud can also be represented by reusing the normal vector syntax element (normal, the third table item in Table 1 above) in the attributes (mesh.primitives.attribute) of the primitives of the mesh description module.
  • the set of syntax elements supported in the attributes of the primitives of the mesh description module of the scene description file, as specified in the ISO/IEC 23090-14 MPEG-I scene description standard, is defined as the first syntax element set. The description method for the three-dimensional mesh of a media file of the type G-PCC coded point cloud then includes: based on the syntax elements in the first syntax element set, adding syntax elements corresponding to the various types of data possessed by the three-dimensional mesh to the attributes of the primitives of the mesh description module corresponding to the three-dimensional mesh.
  • Table 13 lists the methods of describing some of the data on the points in the media file of the type G-PCC coded point cloud with syntax elements in the attributes (mesh.primitives.attribute) of the primitives of the mesh description module:
  • the G-PCC coded point cloud data may also include other data.
  • Other data of the G-PCC coded point cloud can also be described by reusing the syntax elements in the attributes of the primitives of the mesh description module, such as texture coordinates (texcoord_n), joints (joints_n), weights (weights_n), etc.
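The attribute reuse described above amounts to a fixed mapping from point-cloud data types to existing glTF attribute names. A minimal sketch follows; the mapping reflects the reuse described in the text, while the helper function and its name are illustrative.

```python
# Reuse of existing mesh.primitives.attributes names for G-PCC point data,
# as described above: POSITION carries the per-point geometry (three floats),
# while COLOR_n, NORMAL, TEXCOORD_n, JOINTS_n, WEIGHTS_n carry attribute data.
POINT_DATA_TO_GLTF_ATTRIBUTE = {
    "geometry": "POSITION",          # 3D coordinates of each point
    "color": "COLOR_0",              # per-point color value
    "normal": "NORMAL",              # per-point normal vector
    "texture_coordinates": "TEXCOORD_0",
    "joints": "JOINTS_0",
    "weights": "WEIGHTS_0",
}

def primitive_attributes(point_data_types, accessor_indices):
    """Build a mesh.primitives.attributes dict from G-PCC point data types."""
    return {POINT_DATA_TO_GLTF_ATTRIBUTE[t]: i
            for t, i in zip(point_data_types, accessor_indices)}
```

Each attribute value is the index of the accessor that describes where and in what format the corresponding data is cached.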
  • a method for describing a three-dimensional mesh of a media file of type G-PCC coded point cloud comprising: adding a target extension array to an extension list of primitives (mesh.primitives.extensions) of a mesh description module, and adding syntax elements corresponding to various types of data contained in the three-dimensional mesh in the media file of type G-PCC coded point cloud in the target extension array, and describing geometric information, color data, normal vector and other data associated with each vertex of the three-dimensional mesh in the media file of type G-PCC coded point cloud through the syntax elements corresponding to various types of data.
  • adding syntax elements corresponding to the various types of data contained in the corresponding three-dimensional mesh to the target extension array includes: adding syntax elements corresponding to the various types of data contained in the corresponding three-dimensional mesh to the target extension array based on the syntax elements in a first syntax element set.
  • the first syntax element set is the set of syntax elements supported by the attributes of the primitives of the mesh description module of a scene description file as specified in the ISO/IEC 23090-14 MPEG-I scene description standard.
  • syntax elements corresponding to each type of data contained in the corresponding three-dimensional mesh may also be added to the target extension array based on a second syntax element set composed of preset syntax elements corresponding to the G-PCC coded point cloud.
  • syntax element used to represent the geometric information associated with each vertex is defined as the first syntax element
  • syntax element used to represent the color data associated with each vertex is defined as the second syntax element
  • syntax element used to represent the normal vector associated with each vertex is defined as the third syntax element.
  • some of the syntax elements added to the target extension array of the extension list (mesh.primitives.extensions) of the primitives of the mesh description module include:
  • FIG9 is a schematic diagram of a scene description file structure after adding a target extension array to the extension list (mesh.primitives.extensions) of the primitives of the mesh description module and extending the first syntax element, the second syntax element and the third syntax element in the target extension array based on the above embodiment.
  • the scene description file includes but is not limited to the following modules: MPEG media (MPEG_media) 901, scene description module (scene) 902, node description module (node) 903, mesh description module (mesh) 904, accessor description module (accessor) 905, buffer slice description module (bufferView) 906, buffer description module (buffer) 907, skin description module (skin) 908, animation description module (animation) 909, camera description module (camera) 910, material description module (material) 911, texture description module (texture) 912, sampler description module (sampler) 913 and texture map description module (image) 914.
  • the extension list (mesh.primitives.extensions) of the primitives of the mesh description module 904 includes a target extension array 9000; the extended syntax elements in the target extension array 9000 include: a first syntax element 9001 for representing the geometric information associated with each vertex, a second syntax element 9002 for representing the color data associated with each vertex, and a third syntax element 9003 for representing the normal vector associated with each vertex.
  • the functions, accessor types, data types and other information of other elements in the scene description file shown in FIG9 are similar to those in the scene description file shown in FIG3, and are not described in detail here.
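To make the FIG. 9 structure concrete, the fragment below sketches a primitive carrying a target extension array. The extension name "MPEG_gpcc_example", the element names, and the accessor indices are all placeholders; the embodiment does not fix them.

```python
import json

# Sketch of a primitive carrying the target extension array described above.
# "MPEG_gpcc_example" is a placeholder extension name; the first, second and
# third syntax elements map geometry, color and normal data to accessors.
primitive = {
    "extensions": {
        "MPEG_gpcc_example": {
            "POSITION": 0,   # first syntax element: per-vertex geometry
            "COLOR_0": 1,    # second syntax element: per-vertex color
            "NORMAL": 2,     # third syntax element: per-vertex normal
        }
    }
}

# Round-trip through JSON, as the scene description file is plain JSON.
ext = json.loads(json.dumps(primitive))["extensions"]["MPEG_gpcc_example"]
```

Placing the elements in an extension array rather than in mesh.primitives.attributes keeps the base glTF primitive untouched for parsers that do not understand the extension.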
  • a mesh description method for a media file of type G-PCC coded point cloud, including: pre-configuring syntax elements corresponding to the various types of data of the G-PCC coded point cloud, and, based on the pre-configured syntax elements corresponding to the various types of data of the G-PCC coded point cloud, adding syntax elements corresponding to the various types of data to the attributes of the primitives of the mesh description module corresponding to the three-dimensional mesh in the G-PCC coded point cloud.
  • the syntax elements corresponding to various types of data of the preconfigured G-PCC coded point cloud include: a fourth syntax element for representing geometric information associated with each vertex, a fifth syntax element for representing color data associated with each vertex, and a sixth syntax element for representing a normal vector associated with each vertex.
  • syntax elements corresponding to various types of data are added to the attributes of primitives of a mesh description module corresponding to the three-dimensional mesh in the G-PCC coded point cloud, including: adding at least one of the fourth syntax element, the fifth syntax element, and the sixth syntax element to the attributes of the primitives of the mesh description module corresponding to the three-dimensional mesh in the G-PCC coded point cloud.
  • the syntax element corresponding to the G-PCC coded point cloud for representing the geometric information associated with each vertex is defined as the fourth syntax element
  • the syntax element corresponding to the G-PCC coded point cloud for representing the color data associated with each vertex is defined as the fifth syntax element
  • the syntax element corresponding to the G-PCC coded point cloud for representing the normal vector associated with each vertex is defined as the sixth syntax element.
  • the description method of some of the syntax elements in the attributes of the primitives of the mesh description module includes:
  • Fig. 10 is a schematic diagram of the structure of a scene description file after the syntax elements in the attributes (mesh.primitives.attribute) of the primitives of the mesh description module are expanded based on the above embodiment.
  • the scene description file includes but is not limited to the following modules: MPEG media (MPEG_media) 101, scene description module (scene) 102, node description module (node) 103, mesh description module (mesh) 104, accessor description module (accessor) 105, buffer slice description module (bufferView) 106, buffer description module (buffer) 107, skin description module (skin) 108, animation description module (animation) 109, camera description module (camera) 110, material description module (material) 111, texture description module (texture) 112, sampler description module (sampler) 113 and texture map description module (image) 114.
  • the primitive attributes (mesh.primitives.attribute) of the mesh description module 104 include: the fourth syntax element 1041 for indicating the geometric information associated with each vertex, the fifth syntax element 1042 for indicating the color data associated with each vertex, and the sixth syntax element 1043 for indicating the normal vector associated with each vertex.
  • the functions, accessor types, data types and other information of other elements in the scene description file shown in FIG10 are similar to those in the scene description file shown in FIG3, and are not described in detail here.
  • the scene description file describes a three-dimensional scene containing a media file of type G-PCC coded point cloud
  • the syntax elements in the attributes of the primitives of the mesh description module are reused to describe the G-PCC coded point cloud data, or a target extension array is added to the primitives of the mesh description module or new syntax elements are extended in the attributes of the primitives of the mesh description module to describe the media file of type G-PCC coded point cloud
  • since the mesh description module (mesh) would otherwise have to contain the large number of points of the G-PCC coded point cloud, each of which contains at least geometric information and attribute information, it is inconvenient to store the data of the media file of type G-PCC coded point cloud directly in the scene description framework. Instead, a link to the media file of type G-PCC coded point cloud is given in the scene description framework, and the media file is downloaded when the data of the G-PCC coded point cloud is needed.
  • the scene description file may also be merged with a media file of the type of G-PCC coded point cloud to form a binary file to reduce the types and number of files.
  • the media file of type G-PCC coded point cloud needs to be specified in the buffer description module (buffer), but the Uniform Resource Locator (URL) of the media file of type G-PCC coded point cloud is not directly added in the buffer description module. Instead, the value of the media index syntax element (media) in the MPEG circular buffer (MPEG_buffer_circular) in the buffer description module (buffer) points to the media description module corresponding to the media file of type G-PCC coded point cloud in the MPEG media (MPEG_media).
  • if the value of the uniform resource identifier syntax element (uri) in the alternatives of the media description module corresponding to the media file of type G-PCC coded point cloud in the media list (media) of the MPEG media (MPEG_media) is "http://www.example.com/G-PCCexample.mp4", and it is the first media description module in the MPEG media, then the value of the media index syntax element (media) of the MPEG circular buffer (MPEG_buffer_circular) can be set to "0". In this way, the MPEG circular buffer in the buffer description module indexes the link of the first media file in the MPEG media; that is, the media description module corresponding to the media file of type G-PCC coded point cloud in the MPEG media (MPEG_media) is indexed through the media index syntax element (MPEG_buffer_circular.media) of the buffer description module (buffer).
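The indirection described above (buffer, then MPEG_buffer_circular.media, then the media description module, then its uri) can be followed mechanically. Below is a minimal sketch using the example values from the text; the field layout is simplified and the helper function is illustrative.

```python
# Resolving a buffer's media index to the media file's URI, following the
# chain described above.  Values mirror the example given in the text.
doc = {
    "MPEG_media": {
        "media": [
            {"name": "G-PCCexample",
             "alternatives": [{"uri": "http://www.example.com/G-PCCexample.mp4"}]}
        ]
    },
    "buffers": [
        {"extensions": {"MPEG_buffer_circular": {"media": 0}}}
    ],
}

def buffer_media_uri(doc, buffer_index):
    """Follow MPEG_buffer_circular.media into MPEG_media to find the URI."""
    circ = doc["buffers"][buffer_index]["extensions"]["MPEG_buffer_circular"]
    media = doc["MPEG_media"]["media"][circ["media"]]
    return media["alternatives"][0]["uri"]
```

This is why the buffer description module never stores the URL directly: the index keeps the buffer declaration stable even if the media list changes.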
  • a buffer (buffer) description method for a media file of type G-PCC coded point cloud, including: indicating, through the value of the second track index syntax element (track) in the track array (tracks) of the MPEG circular buffer (MPEG_buffer_circular) of the buffer description module (buffer), the track information of the data cached in the buffer.
  • the MPEG circular buffer (MPEG_buffer_circular) is used to reduce the cache space required while still ensuring that the data is cached.
  • the MPEG circular buffer can be regarded as an ordinary buffer whose head and tail are connected to form a ring; writing data to the circular buffer and reading data from the circular buffer rely on a write pointer and a read pointer, realizing a simultaneous writing and reading process.
  • the syntax elements contained in the MPEG circular buffer are shown in Table 16:
  • the value of the media index syntax element (media) in Table 16 is the index value of the media description module corresponding to the media file of the type G-PCC coded point cloud declared in MPEG media (MPEG_media), so that the media file of the type G-PCC coded point cloud can be indexed in the buffer description module (buffer). Based on the setting rules for the value of the track index syntax element (tracks) in Table 16, its value is the index value of one or more code stream tracks of the media file of the type G-PCC coded point cloud, so that the decoded data of the one or more code stream tracks can be cached in the corresponding buffer.
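The head-to-tail behaviour of the circular buffer can be sketched with explicit read and write pointers. This is a toy single-threaded model for illustration, not the normative MPEG_buffer_circular handler.

```python
# Toy circular buffer: write and read pointers wrap modulo the capacity, so
# a producer can keep writing frames while a consumer reads them, as
# described above.  "count" plays the role of MPEG_buffer_circular.count.
class CircularBuffer:
    def __init__(self, count):
        self.slots = [None] * count
        self.read = 0
        self.write = 0
        self.filled = 0

    def push(self, frame):
        if self.filled == len(self.slots):
            raise BufferError("circular buffer full")
        self.slots[self.write] = frame
        self.write = (self.write + 1) % len(self.slots)   # wrap head to tail
        self.filled += 1

    def pop(self):
        if self.filled == 0:
            raise BufferError("circular buffer empty")
        frame = self.slots[self.read]
        self.read = (self.read + 1) % len(self.slots)     # wrap head to tail
        self.filled -= 1
        return frame
```

Because only `count` slots ever exist, the media access function can keep feeding decoded frames without the cache growing with the length of the stream.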
  • a method for describing materials (material), texture (texture), sampler (sampler) and texture map (image) of a media file of type G-PCC encoded point cloud including: when a scene description file is used to describe a three-dimensional scene of a G-PCC encoded point cloud, materials (material), texture (texture), sampler (sampler) and texture map (image) are not used to describe the three-dimensional scene.
  • the G-PCC encoded point cloud is a scattered topological structure; it does not actually have the concept of a surface, and the various additional information is represented directly on the points, whereas material, texture, sampler and image are all attachment information for surfaces. Therefore, only the definitions of material, texture, sampler and image are retained, and material, texture, sampler and image are not used to describe the three-dimensional scene.
  • a camera description module (camera) description method for a media file of type G-PCC encoded point cloud, including: defining the viewpoint, viewing angle and other viewing-related visual information of a node in a three-dimensional scene through the camera description module.
  • an animation description module (animation) description method for a media file of type G-PCC coded point cloud including: adding animation to a node description module (node) in a three-dimensional scene through an animation description module (animation).
  • the animation description module may describe the animation added to the node description module (node) through one or more of position movement, angle rotation, and size scaling.
  • the animation description module can also indicate at least one of the start time, duration, end time and implementation method of the animation added to the node description module (node).
  • a description method of a skin description module (skin) for a media file of type G-PCC encoded point cloud includes: defining the movement and deformation relationship between a mesh (mesh) in a node description module (node) and the corresponding bone through the skin description module (skin).
  • scene description file supporting media files of the G-PCC coded point cloud type provided in an embodiment of the present application is described below in conjunction with a specific scene description file.
  • the pair of curly brackets between line 1 and line 118 contain the main contents of the scene description file supporting media files of type G-PCC coded point cloud, which includes: digital asset description module (asset), extension description module (extensionUsed), MPEG media (MPEG_media), scene declaration (scene), scene list (scenes), node list (nodes), mesh list (meshes), accessor list (accessors), buffer slice list (bufferViews), and buffer list (buffers).
  • Digital asset description module (asset): The digital asset description module is lines 2 to 4. From the "version":"2.0" in line 3 of the digital asset description module, it can be determined that the scene description file is written based on glTF 2.0, which is also the version of the scene description file. From the parsing perspective, the display engine can determine which parser should be selected to parse the scene description file based on the digital asset description module.
  • Used extension description module (extensionUsed): The used extension description module is lines 6 to 10. Since it includes three syntax elements, MPEG media (MPEG_media), MPEG circular buffer (MPEG_buffer_circular) and MPEG time-varying accessor (MPEG_accessor_timed), it can be determined that the scene description file uses three MPEG extensions: MPEG media, MPEG circular buffer, and MPEG time-varying accessor. From the parsing perspective, the display engine can know in advance, based on the content of this module, that the extension items involved in subsequent parsing include MPEG media, MPEG circular buffer, and MPEG time-varying accessor.
  • MPEG media is lines 12 to 34.
  • the track information of the media file of type G-PCC coded point cloud is indicated; the codec parameters of the media file of type G-PCC coded point cloud are indicated by "codecs":"gpc1" in line 26; the name of the media file of type G-PCC coded point cloud is indicated by "name":"G-PCCexample" in line 16; that the media file of type G-PCC coded point cloud should be played automatically is indicated by "autoplay":true in line 17; and that the media file of type G-PCC coded point cloud should be played in a loop is indicated by "loop":true in line 18.
  • the display engine can determine that there is a media file of type G-PCC coded point cloud in the 3D scene to be rendered by parsing MPEG media, and learn the method of accessing and parsing the media file of type G-PCC coded point cloud.
  • Scene declaration (scene): The scene declaration is line 36. Because a scene description file can theoretically include multiple 3D scenes, the scene description file first points out, through the scene declaration "scene":0 on line 36, that the 3D scene to be subsequently processed and rendered based on the scene description file is the first 3D scene in the scene list, that is, the 3D scene enclosed by the curly brackets on lines 39 to 43.
  • Scene list (scenes): The scene list is lines 38 to 44.
  • the scene list contains only one curly bracket, indicating that the scene list only includes one scene description module.
  • the scene description file only contains one 3D scene.
  • the "nodes":[0] in lines 40 to 42 in the curly bracket indicates that the 3D scene only includes one node, and the index value of the node description module corresponding to the node is 0.
  • the content of the scene list clarifies that the entire scene description framework should select the first 3D scene in the scene list (the 3D scene with index 0) for subsequent processing and rendering, clarifies the overall structure of the 3D scene, and points to the next layer of more detailed node description modules (node).
  • Node list (nodes): The node list is lines 46 to 51.
  • the node list contains only one curly bracket, indicating that the node list includes only one node description module, and the three-dimensional scene has only one node, and the node is the same node as the node with an index value of 0 in the node description module in the scene description module, and the two are associated through indexing.
  • the name of the node, "G-PCCexample_node", is indicated by "name":"G-PCCexample_node" in line 48
  • "mesh":0 in line 49 indicates that the content mounted on the node is the three-dimensional mesh corresponding to the first mesh description module in the mesh list, which corresponds to the mesh description module of the next layer.
  • the content of the node list indicates that the content mounted on the node is a three-dimensional mesh, and that the three-dimensional mesh is the three-dimensional mesh corresponding to the first mesh description module in the mesh list.
  • the mesh list is lines 53 to 66.
  • the mesh list contains only one curly bracket, indicating that the mesh list includes only one mesh description module.
  • the three-dimensional scene has only one three-dimensional mesh, and the three-dimensional mesh is the same three-dimensional mesh as the three-dimensional mesh with an index value of 0 in the node description module.
  • in the curly brackets (mesh description module) describing the three-dimensional mesh, the name of the three-dimensional mesh is indicated by "name":"G-PCCexample_mesh" on line 55, which is used only as an identification mark.
  • the "primitives" on line 56 indicates that the three-dimensional mesh has primitives.
  • Buffer list (buffers): The buffer list is lines 106 to 117.
  • the buffer list contains only one curly bracket, indicating that the scene description file only includes one buffer description module, and the display of the 3D scene only needs to access one media file.
  • the MPEG circular buffer (MPEG_buffer_circular) extension is used, indicating that the buffer is a circular buffer modified using the MPEG extension.
  • the "media:0" in line 112 indicates that the data source in the circular buffer is the media file corresponding to the first media description module declared in the MPEG media in the previous text.
  • the track with index 1 is not limited here: it can be the only track of a media file of type G-PCC coded point cloud in a single-track package, or it can be a geometry code stream track of a media file of type G-PCC coded point cloud in a multi-track package.
  • the syntax element "count”:5 in the MPEG circular buffer it can also be determined that the MPEG circular buffer has five storage links.
  • the syntax element "by teLength”:15000 it can also be determined that the byte length (capacity) of the MPEG ring buffer is 15000 bytes.
  • the buffer list realizes the correspondence of the media files of the type G-PCC coded point cloud declared in the MPEG media to the buffer, or in other words, the buffer references the media files of the type G-PCC coded point cloud that were only declared but not used before.
  • the media files of the type G-PCC coded point cloud referenced here are unprocessed G-PCC encapsulated files.
  • the G-PCC encapsulated files need to be processed by the media access function to extract the position coordinates (position) and color values (color_0) mentioned in the grid description module, which can be directly used for rendering.
  • Buffer slice list (bufferViews): The buffer slice list is lines 93 to 104.
  • the buffer slice list contains two parallel curly brackets. Combined with the fact that there is only one buffer determined by the buffer description module, it means that the buffer used to store the media file of type G-PCC coded point cloud is divided into two cache slices, and the point cloud data in the media file of type G-PCC coded point cloud is stored in two cache slices.
  • the buffer description module with index 0 is first pointed to by "buffer":0 in line 95, that is, the only buffer description module mentioned in the buffer list; then the data slice range of the corresponding cache slice is limited to the first 12,000 bytes by the two parameters byte length (byteLength) and byte offset (byteOffset) in lines 96 and 94.
  • the content in the second curly bracket is similar to the first curly bracket, except that the data slice range is defined as the last 3,000 bytes. From a parsing perspective, the cache slice list groups the point cloud data in the media file of type G-PCC encoded point cloud, which is conducive to the detailed definition of the subsequent accessor description module.
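The partition of the 15000-byte buffer into a 12000-byte slice and a 3000-byte slice can be checked with a few lines. The values are taken from the example file; the helper function is illustrative.

```python
# The example buffer holds 15000 bytes, split into two bufferViews:
# the first 12000 bytes and the last 3000 bytes.
buffer = {"byteLength": 15000}
buffer_views = [
    {"buffer": 0, "byteOffset": 0,     "byteLength": 12000},
    {"buffer": 0, "byteOffset": 12000, "byteLength": 3000},
]

def views_fit_buffer(buffer, views):
    """Check that every bufferView lies inside the buffer it references."""
    return all(v["byteOffset"] + v["byteLength"] <= buffer["byteLength"]
               for v in views)
```

A loader would typically run such a bounds check before handing slice ranges to the accessors, since an out-of-range view would otherwise fault at render time.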
  • Accessor list (accessors): The accessor list is lines 68 to 91. The structure of the accessor list is similar to that of the cache slice list, and both contain two parallel curly braces, indicating that the accessor list includes two accessor description modules. The display of the three-dimensional scene requires access to media data through two accessors.
  • both curly braces (accessor description modules) contain the extension MPEG time-varying accessor (MPEG_accessor_timed), indicating that these two accessors point to time-varying media defined by MPEG.
  • the data format stored in the accessor is a three-dimensional vector composed of 32-bit floating point numbers. "count":1000 indicates that there are 1000 data elements that need to be accessed through an accessor of this format. Each 32-bit floating point number occupies 4 bytes; therefore, the accessor corresponding to the accessor description module contains 12000 bytes of data, which corresponds to the setting in the cache slice description module with an index value of 0.
  • the content in the second curly brace (the second accessor description module) is similar.
  • the index value of the cache slice description module is changed to 1, and the data type is redefined. From the parsing perspective, the accessor list (accessors) completes the definition of the data required for rendering; for example, the data types missing from the cache slice description module and the cache description module are defined in the corresponding accessor description module.
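The 12000-byte figure quoted above follows directly from the accessor fields: count times components-per-element times bytes-per-component. Below is a minimal sketch of that arithmetic using glTF 2.0 component type codes (5126 is FLOAT); the helper function itself is illustrative.

```python
# Byte size implied by an accessor: count x components-per-element x
# bytes-per-component, with the codes defined by glTF 2.0.
COMPONENT_BYTES = {5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4}
TYPE_COMPONENTS = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4}

def accessor_byte_size(accessor):
    """Bytes of tightly packed data an accessor describes."""
    return (accessor["count"]
            * TYPE_COMPONENTS[accessor["type"]]
            * COMPONENT_BYTES[accessor["componentType"]])

# The example's position accessor: 1000 three-float vectors.
position_accessor = {"componentType": 5126, "type": "VEC3", "count": 1000}
```

Matching this computed size against the referenced bufferView's byteLength is exactly the cross-check the text performs between the accessor list and the cache slice list.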
  • the main function of the display engine supporting media files of the type G-PCC coded point cloud is similar to the main function of the display engine in the workflow of the scene description framework for immersive media described above, including: 1. being able to parse the scene description file of the media file of the type G-PCC coded point cloud to obtain the corresponding rendering method of the three-dimensional scene; 2. being able to pass media access instructions or media data processing instructions to the media access function through the media access function API, wherein the media access instructions or media data processing instructions come from the parsing result of the scene description file of the media file of the type G-PCC coded point cloud; 3.
  • the display engine can obtain a method for rendering a three-dimensional scene including a media file of type G-PCC coded point cloud by parsing the scene description file, and it needs to pass the method for rendering the three-dimensional scene to the media access function, or send instructions to the media access function based on the method for rendering the three-dimensional scene.
  • the process of passing the method for rendering the three-dimensional scene to the media access function or sending instructions to the media access function based on the method for rendering the three-dimensional scene is implemented through the media access function API.
  • the display engine may send a media access instruction or a media data processing instruction to the media access function through the media access function API.
  • the media access instruction or the media data processing instruction sent by the display engine to the media access function through the media access function API comes from the parsing result of the scene description file of the media file of the type G-PCC coded point cloud, and the media access instruction or the media data processing instruction may include: the index of the media file of the type G-PCC coded point cloud, the URL of the media file of the type G-PCC coded point cloud, the attribute information of the media file of the type G-PCC coded point cloud, the display time window of the media file of the type G-PCC coded point cloud, the format requirements for the processed media file of the type G-PCC coded point cloud, etc.
  • the media access function can also request media access instructions or media data processing instructions from the display engine through the media access function API.
  • after the media access function receives a media access instruction or media data processing instruction issued by the display engine through the media access function API, it executes that instruction, for example: obtaining a media file of the G-PCC coded point cloud type, establishing a suitable pipeline for the media file, allocating a suitable cache for the processed media file, etc.
  • the media access function obtains a media file of a G-PCC coded point cloud type, including: using a network transmission service to download the media file of a G-PCC coded point cloud type from a server.
  • the media access function obtains a media file of a G-PCC coded point cloud type, including: reading the media file of a G-PCC coded point cloud type from a local storage space.
  • after the media access function obtains the media file of the G-PCC coded point cloud type, it needs to process that file. There are significant differences in the processing of different types of media files. To achieve wide media type support while maintaining the efficiency of the media access function, a variety of pipelines are designed within the media access function, and when processing a media file only the pipeline that matches its media type is enabled.
  • the media access function needs to establish a corresponding pipeline for the media file of type G-PCC coded point cloud, and perform decapsulation, G-PCC decoding, post-processing and other processes on the media file of type G-PCC coded point cloud through the established pipeline to complete the processing of the media file of type G-PCC coded point cloud, and process the media file data of type G-PCC coded point cloud into a data format that can be used for direct rendering by the display engine.
  • FIG. 11 is a schematic diagram of the structure of the pipeline corresponding to the G-PCC coded point cloud in some embodiments of the present application.
  • the pipeline 1100 supporting media files of the type of G-PCC coded point cloud includes: an input module 111 , a decapsulation module 112 , a geometry decoder 113 , an attribute decoder 114 , a first post-processing module 115 , and a second post-processing module 116 .
  • the input module 111 is used to receive a G-PCC encapsulation file and input the G-PCC encapsulation file into the decapsulation module 112.
  • the G-PCC encapsulation file is a file obtained by encapsulating the G-PCC code stream obtained by G-PCC encoding the point cloud data. Since the G-PCC encapsulation file is presented in the form of a track, what the input module 111 receives is the track code stream of the G-PCC encapsulation file.
  • the G-PCC encapsulation file can be a single track or a multi-track. Therefore, in the embodiment of the present application, the G-PCC encapsulation file received by the input module 111 can be a single track or a multi-track, and the embodiment of the present application does not limit this.
  • the decapsulation module 112 is used to decapsulate the G-PCC encapsulation file input by the input module 111 to obtain a G-PCC code stream (including a geometry information code stream and an attribute information code stream), input the geometry information code stream to the geometry decoder 113, and input the attribute information code stream to the attribute decoder 114. It should be noted that, with the development of relevant technologies, the G-PCC code stream may also carry code streams of other information. In that case, the decapsulation module 112 decapsulates the G-PCC encapsulation file to obtain the code streams of the other information and inputs them into the corresponding decoders.
  • the geometry decoder 113 is used to decode the geometry information code stream output by the decapsulation module 112 to obtain the geometry information of the point cloud.
  • the main steps of the geometry decoder 113 decoding the geometry information code stream include: obtaining the geometry information of the point cloud through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction, inverse coordinate conversion, etc.
  • the specific implementation of the geometry decoder 113 decoding the geometry information code stream can refer to the workflow of the geometry decoding module 81 in Figure 8, which will not be described in detail here.
  • the attribute decoder 114 is used to decode the attribute information code stream input by the decapsulation module 112 to obtain the attribute information of the point cloud.
  • the main steps of the attribute decoder 114 decoding the attribute information code stream include: attribute prediction, lifting, the inverse operation of the RAHT transform, etc., to obtain the attribute information.
  • the specific implementation of the attribute decoder 114 decoding the attribute information code stream can refer to the workflow of the attribute decoding module 82 in Figure 8, which will not be described in detail here.
  • the first post-processing module 115 is used to process the geometric information output by the geometry decoder 113. After the decoding of the geometry information code stream is completed, the geometric information of the points in the G-PCC encoded point cloud can be obtained, and in some cases this geometric information can be directly used by the display engine. However, because the scene description framework neither imposes many restrictions on the display engine nor defines it specifically, a wide variety of display engines may appear, and these engines may have different requirements for input data. Therefore, the first post-processing module 115 is added after geometry decoding to ensure that the geometric information output by the pipeline is usable by any display engine. In some embodiments, the processing performed by the first post-processing module 115 includes converting the format of the geometric information.
  • the second post-processing module 116 is used to process the attribute information output by the attribute decoder 114. After the decoding of the attribute information code stream is completed, the attribute information of the points in the G-PCC encoded point cloud can be obtained, and in some cases this attribute information can be directly used by the display engine. However, because the scene description framework neither imposes many restrictions on the display engine nor defines it specifically, a wide variety of display engines may appear, and these engines may have different requirements for input data. Therefore, the second post-processing module 116 is added after attribute decoding to ensure that the attribute information output by the pipeline is usable by any display engine. In some embodiments, the processing performed by the second post-processing module 116 includes converting the format of the attribute information.
  • the processed geometric information output by the first post-processing module 115 and the processed attribute information output by the second post-processing module 116 are written into the buffer 117, so that the display engine 118 reads the geometric information and attribute information from the buffer as needed, and renders and displays the G-PCC encoded point cloud in the three-dimensional scene based on the read geometric information and attribute information.
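The dataflow of pipeline 1100 in FIG. 11 can be sketched schematically. The stage functions below are placeholders standing in for the real modules (decapsulation 112, decoders 113/114, post-processing 115/116); a real G-PCC decoder is far more involved.

```python
# Schematic sketch of the G-PCC pipeline dataflow; stage bodies are
# hypothetical stand-ins, not a real decoder implementation.
def decapsulate(gpcc_file):                    # decapsulation module 112
    return gpcc_file["geometry_stream"], gpcc_file["attribute_stream"]

def decode_geometry(stream):                   # geometry decoder 113
    return [("xyz", unit) for unit in stream]

def decode_attributes(stream):                 # attribute decoder 114
    return [("rgb", unit) for unit in stream]

def post_process(decoded):                     # post-processing modules 115/116
    return list(decoded)                       # e.g. format conversion

def run_pipeline(gpcc_file, buffer):
    geo, attr = decapsulate(gpcc_file)         # track stream from input module 111
    buffer["geometry"] = post_process(decode_geometry(geo))
    buffer["attributes"] = post_process(decode_attributes(attr))

buffer = {}                                    # buffer 117, read by display engine 118
run_pipeline({"geometry_stream": [0, 1], "attribute_stream": [0, 1]}, buffer)
print(sorted(buffer))  # ['attributes', 'geometry']
```

The key structural point is that geometry and attribute streams are decoded independently and both land in the buffer the display engine reads from.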
  • after the media access function completes the processing of the G-PCC encoded point cloud data through the pipeline, it also needs to deliver the processed data to the display engine in a standardized arrangement structure. This requires the processed G-PCC encoded point cloud data to be correctly stored in the cache. This work is completed by the cache management module, which obtains cache management instructions from the media access function or the display engine through the cache API.
  • the media access function may send a cache management instruction to the cache management module via a cache API, wherein the cache management instruction is a cache management instruction sent by the display engine to the media access function via the media access function API.
  • the display engine may send cache management instructions to the cache management module via a cache API.
  • the cache management module can communicate with the media access function through the cache API, and can also communicate with the display engine through the cache API, and the purpose of communicating with the media access function or the display engine is to achieve cache management.
  • the display engine needs to send the cache management instruction to the media access function through the media access function API first, and the media access function then sends the cache management instruction to the cache management module through the cache API;
  • the display engine only needs to generate the cache management instruction based on the cache management information parsed from the scene description file, and send it to the cache management module through the cache API.
  • the cache management instruction may include one or more of an instruction to create a cache, an instruction to update a cache, and an instruction to release a cache.
  • the processed G-PCC encoded point cloud data needs to be delivered to the display engine in a standardized arrangement structure. This requires the processed G-PCC encoded point cloud data to be correctly stored in the cache, and this task is the responsibility of the cache management module.
  • the cache management module implements management operations such as cache creation, update, and release, and the operation instructions are received through the cache API.
  • the cache management rules are recorded in the scene description document, parsed by the display engine, and finally issued to the cache management module by the display engine or the media access function.
  • the role of cache management is to manage these caches so that they match the format of the processed media data without disrupting the processed media data.
  • the specific design method of the cache management module should be based on the design of the display engine and the media access function.
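The create / update / release operations described above can be sketched as a minimal cache management module behind a cache-API-like interface. All names here are hypothetical; the real module is driven by cache management instructions parsed from the scene description file.

```python
# Minimal sketch of a cache management module. Instructions to create,
# update, and release caches arrive through the cache API from either
# the display engine or the media access function.
class CacheManager:
    def __init__(self):
        self.caches = {}

    def create(self, cache_id, byte_length):
        # instruction to create a cache of a given size
        self.caches[cache_id] = bytearray(byte_length)

    def update(self, cache_id, offset, data):
        # instruction to update a cache with processed media data
        self.caches[cache_id][offset:offset + len(data)] = data

    def release(self, cache_id):
        # instruction to release a cache
        del self.caches[cache_id]

mgr = CacheManager()
mgr.create("geometry", 12)
mgr.update("geometry", 0, b"xyz")
print(bytes(mgr.caches["geometry"][:3]))  # b'xyz'
mgr.release("geometry")
print(len(mgr.caches))  # 0
```

The design point is that the module only arranges bytes; it must match the format of the processed media data without altering it.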
  • some embodiments of the present application provide a method for generating a scene description file.
  • the method for generating a scene description file includes the following steps S121 to S123:
  • S121 Determine the type of the media file in the 3D scene to be rendered.
  • the types of media files in the embodiments of the present application may include one or more of: G-PCC encoded point cloud, V-PCC encoded point cloud, tactile media files, 6DoF video, MIV video, etc., and the scene may include any number of media files of the same type.
  • the three-dimensional scene to be rendered may include only one media file of the type G-PCC encoded point cloud.
  • the three-dimensional scene to be rendered may include a media file of the type G-PCC encoded point cloud and a media file of the type V-PCC encoded point cloud.
  • the three-dimensional scene to be rendered may include two media files of the type G-PCC encoded point cloud and a tactile media file.
  • in step S121, if the type of the target media file in the three-dimensional scene to be rendered is a G-PCC coded point cloud, the following step S122 is performed:
  • S122 Generate a target description module corresponding to the target media file according to the description information of the target media file.
  • the description information of the target media file includes: one or more of: the name of the target media file, whether the target media file needs to be played automatically, whether the target media file needs to be played in a loop, the encapsulation format of the target media file, the type of the code stream of the target media file, the encoding parameters of the target media file, etc.
  • the above step S122 (generating a target description module corresponding to the target media file according to the description information of the target media file) includes at least one of the following steps 1221 to 1229:
  • Step 1221 Add a media name syntax element (name) in the target media description module, and set the value of the media name syntax element according to the name of the target media file.
  • the media name syntax element in the target media description module is "name"
  • the name of the target media file is "G-PCCexample”
  • add the syntax element "name” in the target media description module and set the value of the syntax element "name” to "G-PCCexample”.
  • Step 1222 Add an autoplay syntax element (autoplay) in the target media description module, and set the value of the autoplay syntax element according to whether the target media file needs to be automatically played.
  • the automatic play syntax element in the target media description module is "autoplay"
  • the target media file needs to be played automatically
  • the syntax element "autoplay” is added to the target media description module, and the value of the syntax element "autoplay” is set to "true”.
  • the syntax element "autoplay” is added to the target media description module and the value of the syntax element "autoplay” is set to "false”.
  • Step 1223 Add a loop playback syntax element (loop) in the target media description module, and set the value of the loop playback syntax element according to whether the target media file needs to be looped.
  • the loop playback syntax element in the target media description module is "loop"
  • the target media file needs to be played in a loop
  • the syntax element "loop” is added to the target media description module, and the value of the syntax element "loop” is set to "true”.
  • the syntax element "loop” is added to the target media description module, and the value of the syntax element "loop" is set to "false”.
  • Step 1224 Add alternatives in the target media description module.
  • Step 1225 Add a media type syntax element (mimeType) to the alternatives, and set the value of the media type syntax element to the encapsulation format value corresponding to the G-PCC coded point cloud.
  • mimeType: media type syntax element
  • the encapsulation format corresponding to the G-PCC encoded point cloud is MP4, and the encapsulation format value corresponding to the G-PCC encoded point cloud is: application/mp4.
  • the media type syntax element is "mimeType"
  • the encapsulation format value corresponding to the G-PCC encoded point cloud is "application/mp4"
  • the syntax element "mimeType” is added to the alternatives of the target media description module, and the value of the syntax element "mimeType” is set to "application/mp4".
  • Step 1226 Add a uniform resource identifier syntax element (URI) to the alternatives, and set the value of the uniform resource identifier syntax element to the access address of the target media file.
  • URI uniform resource identifier syntax element
  • the uniform resource identifier syntax element is "uri"
  • the access address of the target media file is "http://www.exp.com/G-PCCexp.mp4"
  • the syntax element "uri” is added to the alternatives of the target media description module, and the value of the syntax element "uri” is set to "http://www.exp.com/G-PCCexp.mp4".
  • Step 1227 Add an array of tracks to the alternatives.
  • Step 1228 Add a first track index syntax element (track) to the track array (tracks) of the options (alternatives) of the target media description module, and set the value of the first track index syntax element (track) according to the encapsulation method of the target media file.
  • setting the value of the first track index syntax element (track) according to the encapsulation method of the target media file includes:
  • if the target media file is a single-track encapsulation file, setting the value of the first track index syntax element to the index value of the code stream track of the target media file;
  • if the target media file is a multi-track encapsulation file, setting the value of the first track index syntax element to the index value of the geometry code stream track of the target media file.
  • the encapsulation method of the G-PCC coded point cloud includes single-track encapsulation and multi-track encapsulation.
  • single-track encapsulation refers to the encapsulation method of encapsulating the geometric code stream and attribute code stream of the G-PCC coded point cloud in the same code stream track
  • multi-track encapsulation refers to the encapsulation method of encapsulating the geometric code stream and attribute code stream of the G-PCC coded point cloud in multiple code stream tracks respectively.
  • Step 1229 Add a codec parameter syntax element (codecs) to the track array (tracks) of the alternatives of the target media description module, and set the value of the codec parameter syntax element according to the encoding parameters of the target media file, the type of the code stream of the target media file, and the ISO/IEC 23090-18 G-PCC data transmission standard.
  • codecs codec parameter syntax element
  • the ISO/IEC 23090-18 G-PCC data transmission standard stipulates that when the G-PCC coded point cloud is encapsulated in DASH, when the G-PCC pre-selection signaling is used in the MPD file, the "codecs" attribute of the pre-selection signaling should be set to 'gpc1', indicating that the pre-selected media is a point cloud based on geometry; when there are multiple G-PCC Tile tracks in the G-PCC container, the "codecs" attribute of the Main G-PCC Adaptation Set should be set to 'gpcb' or 'gpeb', indicating that the adaptation set contains G-PCC Tile basic track data.
  • the "codecs" attribute of the Main G-PCC adaptivesset should be set to 'gpcb'.
  • the "codecs" attribute of the Main G-PCC Adaptation Set should be set to 'gpeb'.
  • the "codecs” attribute of the preselection signaling should be set to 'gpt1', indicating that the preselected media is a point cloud fragment based on geometry.
  • the value of "codecs" in "tracks” of "alternatives” of the target media description module can be set to 'gpc1'.
  • the encapsulation format value corresponding to the G-PCC coded point cloud is "application/mp4"
  • the name of the target media file is "G-PCCexample”
  • the target media file is automatically played and looped
  • the access address of the target media file is: http://www.exp.com/G-PCCexp.mp4
  • the target media file is a single-track encapsulation file and the index value of the code stream track of the target media file is 1
  • the target media file is encapsulated using DASH and the G-PCC pre-selected signaling is used in the MPD file
  • the target media description module corresponding to the target media file can be as follows:
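Steps 1221 through 1229 can be sketched as a helper that assembles the target media description module from the description information listed above. The helper name is hypothetical, and the exact nesting of "track" and "codecs" inside the "tracks" array follows this description; a published MPEG_media extension may lay these fields out differently.

```python
# Hypothetical assembly of the target media description module
# from the description information of steps 1221-1229.
def make_media_description(name, autoplay, loop, uri, track_index,
                           mime_type="application/mp4", codecs="gpc1"):
    return {
        "name": name,                         # step 1221: media name
        "autoplay": autoplay,                 # step 1222: automatic play
        "loop": loop,                         # step 1223: loop playback
        "alternatives": [{                    # step 1224: alternatives
            "mimeType": mime_type,            # step 1225: encapsulation format
            "uri": uri,                       # step 1226: access address
            "tracks": [{                      # step 1227: track array
                "track": track_index,         # step 1228: track index
                "codecs": codecs,             # step 1229: codec parameters
            }],
        }],
    }

media = make_media_description(
    "G-PCCexample", True, True,
    "http://www.exp.com/G-PCCexp.mp4", 1)
print(media["alternatives"][0]["tracks"][0]["codecs"])  # gpc1
```

With the example values above (single-track file, track index 1, DASH with G-PCC pre-selection signaling), "codecs" takes the value 'gpc1'.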
  • the target media description module is a media description module generated based on the description information of the target media file.
  • the encapsulation format value corresponding to the G-PCC coded point cloud is application/mp4
  • the name of the target media file is "G-PCCexample1”
  • the target media file is automatically played and looped
  • the access address of the target media file is "uri”: http://www.exp.com/G-PCCexp.mp4
  • the target media file is a single-track encapsulation file
  • the index value of the code stream track of the target media file is 1
  • the target media file is encapsulated with DASH and the G-PCC pre-selection signaling is used in the MPD file
  • the MPEG media of the scene description file can be as follows:
  • the three-dimensional scene to be rendered may also include multiple media files, and the type of one or more media files among the multiple media files is G-PCC coded point cloud.
  • the scene description file it is necessary to add a media description module corresponding to the media file of the type of G-PCC coded point cloud according to the above embodiment, and add media description modules corresponding to other types of media files according to the generation method of scene description files of other types of media files.
  • the media files in the three-dimensional scene to be rendered include a target media file of type G-PCC coded point cloud and a tactile media file
  • the encapsulation format value corresponding to the G-PCC coded point cloud is "application/mp4"
  • the name of the target media file is "G-PCCexample”
  • the target media file is automatically played and looped
  • the access address of the target media file is "uri”: http://www.exp.com/G-PCCexp.mp4
  • the target media file is a single-track encapsulation file
  • the index value of the bitstream track of the target media file is 1
  • the target media file is encapsulated with DASH and G-PCC pre-selection signaling is used in the MPD file
  • the MPEG media of the scene description file can be as follows:
  • the media list (media) of the MPEG media includes two curly brackets.
  • the first curly bracket (lines n+2 to n+18) encompasses the media description module corresponding to the target media file of type G-PCC coded point cloud, and the second curly bracket (lines n+19 to n+35) encompasses the media description module corresponding to the tactile media file.
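The two-entry media list described above can be sketched as follows. The G-PCC entry reuses the example values from this description; the tactile entry's name, mimeType, and uri are purely hypothetical placeholders.

```python
# Sketch of the MPEG media section with two media description modules:
# one for the G-PCC target media file and one for a tactile media file
# (the tactile entry's values are hypothetical).
mpeg_media = {
    "MPEG_media": {
        "media": [
            {   # media description module for the G-PCC coded point cloud
                "name": "G-PCCexample",
                "autoplay": True,
                "loop": True,
                "alternatives": [{
                    "mimeType": "application/mp4",
                    "uri": "http://www.exp.com/G-PCCexp.mp4",
                    "tracks": [{"track": 1, "codecs": "gpc1"}],
                }],
            },
            {   # media description module for the tactile media file
                "name": "haptic_example",
                "alternatives": [{
                    "mimeType": "application/mp4",
                    "uri": "http://www.exp.com/haptic_exp.mp4",
                }],
            },
        ]
    }
}
print(len(mpeg_media["MPEG_media"]["media"]))  # 2
```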
  • when generating a scene description file for a three-dimensional scene to be rendered, the method first determines the type of the media files in the scene; when the type of the target media file is a G-PCC coded point cloud, it generates a target media description module corresponding to the target media file according to the description information of the target media file, and adds the target media description module to the media list of the MPEG media in the scene description file of the three-dimensional scene to be rendered.
  • when the media files in the three-dimensional scene to be rendered include a target media file of the G-PCC coded point cloud type, the method can generate a target media description module according to the description information of the target media file and add it to the media list of the MPEG media in the scene description file of the three-dimensional scene to be rendered.
  • with the target media description module added to the media list, the MPEG media of the scene description file contains a media description module corresponding to the target media file. Therefore, the embodiment of the present application can generate a scene description file for a three-dimensional scene that includes media of the G-PCC coded point cloud type, thereby realizing the scene description file's support for media files of the G-PCC coded point cloud type.
  • the method for generating a scene description file further includes:
  • a target scene description module (scene) corresponding to the three-dimensional scene to be rendered is added to the scene list (scenes) of the scene description file, and an index value of a node description module corresponding to a node in the scene to be rendered is added to the node list (nodes) of the target scene description module.
  • the three-dimensional scene to be rendered includes two nodes, and the index values of the node description modules (node) corresponding to the two nodes are 0 and 1 respectively, then the target scene description module corresponding to the three-dimensional scene to be rendered added in the scene description file can be as follows:
  • the 3D scene to be rendered includes two nodes, and the index values of the node description modules corresponding to the two nodes are 0 and 1 respectively, so two index values 0 and 1 are added to the node list (nodes) of the scene description module corresponding to the 3D scene to be rendered.
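For the two-node example above, the target scene description module can be sketched as follows; the structure mirrors the glTF-style scene list, with the two node description module index values 0 and 1.

```python
# Sketch of the scene list: the target scene description module lists
# the index values of the node description modules for the two nodes.
scene_description = {
    "scenes": [
        {"nodes": [0, 1]}   # target scene description module
    ],
}
print(scene_description["scenes"][0]["nodes"])  # [0, 1]
```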
  • the method for generating a scene description file further includes:
  • the method for generating a scene description file further includes:
  • a node name syntax element (name) is added to the node description module, and a value of the node name syntax element (name) in the corresponding node description module is set according to the name of the node.
  • the three-dimensional scene to be rendered includes two nodes, the names of the two nodes are G-PCCexp_node1 and G-PCCexp_node2, the index values of the mesh description module corresponding to the three-dimensional mesh contained in the node G-PCCexp_node1 are 0 and 1 respectively, and the index value of the mesh description module corresponding to the three-dimensional mesh contained in the node G-PCCexp_node2 is 2, then the node list (nodes) part of the scene description file can be as follows:
  • the node list (nodes) of the scene description file corresponding to the 3D scene to be rendered includes two node description modules, the first node description module is the content enclosed by the curly braces of lines n+2 to n+5, and the second node description module is the content enclosed by the curly braces of lines n+6 to n+9.
  • the value of the node name syntax element (name) in the first node description module is set to the name of the corresponding node "G-PCCexp_node1"
  • the value of the mesh index syntax element (mesh) in the first node description module is set to the index values 0 and 1 of the mesh description module of the 3D mesh mounted on the corresponding node
  • the value of the node name syntax element (name) in the second node description module is set to the name of the corresponding node "G-PCCexp_node2”
  • the value of the mesh index syntax element (mesh) in the second node description module is set to the index value 2 of the mesh description module of the 3D mesh mounted on the corresponding node.
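The two node description modules above can be sketched as follows. Note that the first node carries two mesh description module index values; standard glTF allows only a single "mesh" index per node, so the list-valued field here is specific to this description.

```python
# Sketch of the node list: two node description modules with their
# names and mesh index syntax element values as described above.
nodes = [
    {"name": "G-PCCexp_node1", "mesh": [0, 1]},  # meshes 0 and 1 mounted
    {"name": "G-PCCexp_node2", "mesh": 2},       # mesh 2 mounted
]
print([n["name"] for n in nodes])  # ['G-PCCexp_node1', 'G-PCCexp_node2']
```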
  • the method for generating a scene description file further includes:
  • a mesh description module (mesh) corresponding to the three-dimensional mesh in the scene to be rendered is added to the mesh list (meshes) of the scene description file, syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module are added to the mesh description module, and the value of the syntax element corresponding to each type of data is set to the index value of the accessor description module corresponding to the accessor for accessing each type of data.
  • the data contained in the three-dimensional grid may include: one or more of: geometric coordinates (position), color value (color), normal vector (normal), tangent vector (tangent), texture coordinates (texcoord), joints (joints), and weights (weights).
  • adding syntax elements corresponding to various types of data contained in the three-dimensional grid corresponding to the grid description module in the grid description module includes:
  • Add an extension list (extensions) to the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh in the target media file, add a target extension array to the extension list (extensions), and add syntax elements corresponding to each type of data contained in the corresponding three-dimensional mesh in the target extension array.
  • the target extension array may be MPEG_primitve_GPCC.
  • adding syntax elements corresponding to various types of data contained in the corresponding three-dimensional grid to the target extension array includes: adding syntax elements corresponding to various types of data contained in the corresponding three-dimensional grid to the target extension array based on syntax elements in a first syntax element set.
  • the first syntax element set is a set of syntax elements supported by the attributes of primitives of a grid description module of a scene description file specified in the ISO/IEC 23090-14 MPEG-I scene description standard.
  • the syntax elements supported by the attributes of the primitives of the grid description module of the scene description file specified in the ISO/IEC 23090-14 MPEG-I scene description standard include: position, color_n, normal, tangent, texcoord, joints, weights, so the first syntax element set is: ⁇ position, color_n, normal, tangent, texcoord, joints, weights ⁇ .
  • a certain three-dimensional mesh includes geometric coordinates and color data
  • the index value of the accessor description module corresponding to the accessor for accessing the geometric coordinates is 0, and the index value of the accessor description module corresponding to the accessor for accessing the color data is 1.
  • adding syntax elements corresponding to each type of data contained in the corresponding three-dimensional grid to the target extension array includes: based on a preset second syntax element set composed of syntax elements corresponding to the G-PCC coded point cloud, adding syntax elements corresponding to each type of data contained in the corresponding three-dimensional grid to the target extension array.
  • the syntax elements corresponding to the G-PCC coded point cloud may include: G-PCC_position, G-PCC_color_n, G-PCC_normal, G-PCC_tangent, G-PCC_texcoord, G-PCC_joints, G-PCC_weights, and accordingly, the second syntax element set is: ⁇ G-PCC_position, G-PCC_color_n, G-PCC_normal, G-PCC_tangent, G-PCC_texcoord, G-PCC_joints, G-PCC_weights ⁇ .
  • a certain three-dimensional mesh includes geometric coordinates and color data
  • the index value of the accessor description module corresponding to the accessor for accessing the geometric coordinates is 0, and the index value of the accessor description module corresponding to the accessor for accessing the color data is 1.
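For the geometric-coordinates-plus-color example above, a primitive carrying the target extension array with syntax elements from the second syntax element set can be sketched as follows. The extension array name is reproduced as written in this description, and the accessor index values 0 and 1 follow the example.

```python
# Sketch of a mesh primitive with the target extension array holding
# G-PCC-specific syntax elements from the second syntax element set.
primitive = {
    "extensions": {
        "MPEG_primitve_GPCC": {
            "G-PCC_position": 0,  # accessor for geometric coordinates
            "G-PCC_color_0": 1,   # accessor for color data
        }
    }
}
ext = primitive["extensions"]["MPEG_primitve_GPCC"]
print(ext["G-PCC_position"], ext["G-PCC_color_0"])  # 0 1
```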
  • syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module are added in the mesh description module, including: adding syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module in the attributes of the primitives of the mesh description module.
  • adding syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module to the attributes of the primitives of the mesh description module includes: adding syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module to the attributes of the primitives of the mesh description module based on the first syntax element set.
  • the first syntax element set is a set of syntax elements supported in the attributes of the primitives of the mesh description module of the scene description file specified in the ISO/IEC 23090-14 MPEG-I scene description standard.
  • syntax elements are added to the attributes of primitives in the corresponding mesh description module based on the syntax elements in the same syntax element set.
  • a certain three-dimensional mesh includes geometric coordinates and color data
  • the index value of the accessor description module corresponding to the accessor for accessing the geometric coordinates is 1
  • the index value of the accessor description module corresponding to the accessor for accessing the color data is 2.
  • syntax elements corresponding to various types of data contained in the three-dimensional mesh corresponding to the mesh description module are added to the attributes of the primitives of the mesh description module, including: based on the syntax elements in the first syntax element set, syntax elements corresponding to various types of data contained in the corresponding three-dimensional mesh are added to the attributes of the primitives of the first mesh description module; based on the syntax elements in the second syntax element set, syntax elements corresponding to various types of data contained in the corresponding three-dimensional mesh are added to the attributes of the primitives of the second mesh description module.
  • the first grid description module is a grid description module corresponding to the three-dimensional grid in the media file of the G-PCC encoded point cloud type
  • the second grid description module is a grid description module corresponding to the three-dimensional grid in the media file of a type other than the G-PCC encoded point cloud.
  • the first syntax element set is a set of syntax elements supported in the attributes of primitives of the grid description module of the scene description file specified in the ISO/IEC 23090-14 MPEG-I scene description standard;
  • the second syntax element set is a set of syntax elements corresponding to the G-PCC coded point cloud of preset values.
  • for the three-dimensional mesh in a media file of the G-PCC coded point cloud type, based on the syntax elements in the first syntax element set, syntax elements corresponding to various types of data contained therein are added to the attributes of the primitives of the corresponding mesh description module; for the three-dimensional mesh in a media file of a type that is not G-PCC coded point cloud, based on the syntax elements in the second syntax element set, syntax elements corresponding to various types of data contained therein are added to the attributes of the primitives of the corresponding mesh description module.
  • the scene description file includes two 3D meshes, the names of which are GPCCexample_mesh1 and GPCCexample_mesh2.
  • GPCCexample_mesh1 is not a three-dimensional mesh in a media file of the G-PCC type and includes geometric coordinates and color data; the index value of the accessor description module corresponding to the accessor for accessing the geometric coordinates of GPCCexample_mesh1 is 0, and the index value of the accessor description module corresponding to the accessor for accessing the color data of GPCCexample_mesh1 is 1.
  • GPCCexample_mesh2 is a three-dimensional mesh in a media file of the G-PCC type and includes geometric coordinates and color data; the index value of the accessor description module corresponding to the accessor for accessing the geometric coordinates of GPCCexample_mesh2 is 2, and the index value of the accessor description module corresponding to the accessor for accessing the color data of GPCCexample_mesh2 is 3.
  • the mesh list (meshes) in the scene description file can be as follows:
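The mesh list itself did not survive this extraction. A plausible sketch consistent with the example above (the attribute-key casing, and the assignment of the plain versus G-PCC-prefixed syntax elements to the two meshes, are assumptions here — the translated text is ambiguous on the latter):

```json
"meshes": [
  {
    "name": "GPCCexample_mesh1",
    "primitives": [
      { "attributes": { "position": 0, "color_0": 1 } }
    ]
  },
  {
    "name": "GPCCexample_mesh2",
    "primitives": [
      { "attributes": { "G-PCC_position": 2, "G-PCC_color_0": 3 } }
    ]
  }
]
```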
  • the method for generating a scene description file further includes:
  • the value of the grid name syntax element (name) in the grid description module corresponding to the three-dimensional grid is set according to the name of the three-dimensional grid.
  • the method for generating a scene description file further includes:
  • the syntax elements included in the attributes of the primitives of the mesh description module corresponding to the three-dimensional mesh are set according to the data types included in the three-dimensional mesh.
  • the method for generating a scene description file further includes:
  • the value of the syntax element used to describe the topological type of the three-dimensional mesh in the mesh description module corresponding to the three-dimensional mesh is set.
  • the syntax element used to describe the topological type of the three-dimensional mesh in the mesh description module corresponding to the three-dimensional mesh is "mode”.
  • the method for generating a scene description file further includes:
  • An accessor description module (accessor) corresponding to a target accessor is added to the accessor list (accessors) of the scene description file, wherein the target accessor is an accessor for accessing decoded data of the target media file.
  • the method for generating a scene description file further includes: adding a buffer description module (buffer) corresponding to a target buffer in a buffer list (buffers) of the scene description file, wherein the target buffer is a buffer for storing decoded data of the target media file.
  • adding a buffer description module (buffer) corresponding to the target buffer in the buffer list (buffers) of the scene description file comprises at least one of the following steps a1 to a5:
  • Step a1 Add a byte length syntax element (byteLength) in the buffer description module corresponding to the target buffer, and set the value of the byte length syntax element to the byte length of the target media file.
  • the value of the byte length syntax element ("byteLength") in the buffer description module is set to "15000".
  • Step a2 adding an MPEG circular buffer (MPEG_buffer_circular) to the buffer description module corresponding to the target buffer.
  • Step a3 Add a link number syntax element (count) in the MPEG ring buffer, and set the corresponding value of the link number syntax element (count) according to the number of stored links in the target buffer.
  • the "count” and its value in the circular buffer are set to: “count”:8.
  • Step a4 Add a media index syntax element (media) in the MPEG ring buffer, and set the value of the media index syntax element (media) according to the index value of the target media description module.
  • the index value of the target media description module is 0, then the "media” and its value in the description module of the ring buffer are set to: “media”:0.
  • Step a5 Add a second track index syntax element (tracks) in the MPEG circular buffer, and set the value of the second track index syntax element (tracks) according to the track index value of the source data of the data stored in the target buffer.
  • the buffer description module corresponding to the target buffer added in the buffer list of the scene description file can be as follows:
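The buffer listing itself is not reproduced in this text. A plausible sketch combining the values from steps a1 to a5 above (the string format of the "tracks" entry is an assumption, not taken from the source):

```json
"buffers": [
  {
    "byteLength": 15000,
    "extensions": {
      "MPEG_buffer_circular": {
        "count": 8,
        "media": 0,
        "tracks": ["#trackIndex=0"]
      }
    }
  }
]
```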
  • the method for generating a scene description file further includes: adding a cache slice description module (bufferView) corresponding to the cache slice of the target cache to the cache slice list (bufferViews) of the scene description file.
  • adding a cache slice description module corresponding to the cache slice of the target cache to the cache slice list of the scene description file includes at least one of the following steps b1 to b3:
  • Step b1 add a buffer index syntax element (buffer) in the cache slice description module corresponding to the cache slice of the target buffer, and set the value of the buffer index syntax element (buffer) according to the index value of the buffer description module corresponding to the target buffer to which the cache slice belongs.
  • the "buffer” and its value in the cache slice description module are set to: “buffer”:2.
  • Step b2 add a second byte length syntax element (byteLength) in the cache slice description module corresponding to the cache slice of the target cache, and set the value of the second byte length syntax element (byteLength) according to the capacity of the cache slice.
  • Step b3 add an offset syntax element (byteOffset) in the cache slice description module corresponding to the cache slice of the target cache, and set the value of the offset syntax element according to the offset of the storage data of the corresponding cache slice.
  • the cache slice description module corresponding to the cache slice of the target cache is added in the cache slice list (bufferViews) of the scene description file, including each item in the above steps b1 to b3, the index value of the cache description module corresponding to a certain target cache is 1, the capacity of the target cache is 8000, and the target cache includes two cache slices, the capacity of the first cache slice is 6000, the offset is 0, the capacity of the second cache slice is 2000, and the offset is 6001, then the cache slice description module corresponding to the cache slice of the target cache is added to the cache slice list of the scene description file as follows:
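The bufferViews listing itself is elided here; a sketch using exactly the values stated in the bullet above (index 1, capacities 6000 and 2000, offsets 0 and 6001 as given) could look like:

```json
"bufferViews": [
  { "buffer": 1, "byteLength": 6000, "byteOffset": 0 },
  { "buffer": 1, "byteLength": 2000, "byteOffset": 6001 }
]
```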
  • the method for generating a scene description file further includes: adding an accessor description module corresponding to a target accessor in an accessor list (accessors) of the scene description file, wherein the target accessor is an accessor for accessing decoded data of the target media file.
  • adding the accessor description module corresponding to the target accessor to the accessor list (accessors) of the scene description file includes at least one of the following steps c1 to c6:
  • Step c1 add a data type syntax element (componentType) in the accessor description module corresponding to the target accessor, and set the value of the corresponding data type syntax element according to the type of data accessed by the target accessor.
  • the data type syntax element and its value in the accessor description module corresponding to the accessor are set to: "componentType": 5126.
  • Step c2 Add an accessor type syntax element (type) in the accessor description module corresponding to the target accessor, and set the value of the accessor type syntax element according to the preconfigured accessor type.
  • the accessor type syntax element (type) and its value in the accessor description module corresponding to the accessor are set to: “type”:"VEC3".
  • Step c3 Add a data quantity syntax element (count) in the accessor description module corresponding to the target accessor, and set the value of the data quantity syntax element according to the quantity of data accessed by the target accessor.
  • Step c4 Add an MPEG time-varying accessor (MPEG_accessor_timed) to the accessor description module corresponding to the target accessor.
  • Step c5 add a cache slice index syntax element (bufferView) in the MPEG time-varying accessor, and set the value of the cache slice index syntax element according to the index value of the cache slice description module corresponding to the cache slice storing the data accessed by the target accessor.
  • the buffer slice index syntax element and its value in the MPEG time-varying accessor of the accessor description module corresponding to the target accessor are set to: "bufferView":3.
  • Step c6 Add a time-varying syntax element (immutable) in the MPEG time-varying accessor, and set the value of the time-varying syntax element according to whether the value of the syntax element in the corresponding target accessor changes with time.
  • the time-varying syntax element and its value in the MPEG time-varying accessor of the accessor description module corresponding to the target accessor are set to: "immutable”:true; when the value of a syntax element within a target accessor changes with time, the time-varying syntax element and its value in the MPEG time-varying accessor of the accessor description module corresponding to the target accessor are set to: "immutable”:false.
  • the accessor description module corresponding to the target accessor for accessing the data in the cache slice of the target cache added in the accessor list (accessors) of the scene description file includes each item in the above steps c1 to c6, the type of data accessed by a certain target accessor is 5121, the accessor type of the target accessor is VEC2, the amount of data accessed by the target accessor is 4000, the index value of the cache slice description module corresponding to the cache slice storing the data to be accessed by the target accessor is 1, and the value of the syntax element in the corresponding accessor does not change with time, then the accessor description module corresponding to the target accessor added in the accessor list (accessors) of the scene description file can be as follows:
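The accessor listing itself is elided; a sketch assembled from the values in the bullet above (componentType 5121, type VEC2, count 4000, bufferView 1, immutable since the accessed values do not change with time) could look like:

```json
"accessors": [
  {
    "componentType": 5121,
    "type": "VEC2",
    "count": 4000,
    "extensions": {
      "MPEG_accessor_timed": {
        "bufferView": 1,
        "immutable": true
      }
    }
  }
]
```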
  • the method for generating a scene description file further includes:
  • a digital asset description module (asset) is added to the scene description file, a version syntax element (version) is added to the digital asset description module, and when the scene description file is a scene description document written based on glTF 2.0, the value of the version syntax element is set to 2.0.
  • the digital asset description module added to the scene description file may be as follows:
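The elided asset listing is fully determined by the description above; for a glTF 2.0-based scene description document it would read:

```json
"asset": { "version": "2.0" }
```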
  • the method for generating a scene description file further includes:
  • extension usage description module (extensionsUsed) is added to the scene description file, and an extension of the MPEG scene description file of the glTF2.0 version used by the scene description file is added to the extension usage description module.
  • the MPEG extensions used in the scene description file include: MPEG media (MPEG_media), MPEG circular buffer (MPEG_buffer_circular) and MPEG time-varying accessor (MPEG_accessor_timed), and the extended usage description module added in the scene description file can be as follows:
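The elided extensionsUsed listing follows directly from the three extensions named above:

```json
"extensionsUsed": [
  "MPEG_media",
  "MPEG_buffer_circular",
  "MPEG_accessor_timed"
]
```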
  • the method for generating a scene description file further includes:
  • a scene declaration (scene) is added to the scene description file, and the value of the scene declaration is set to the index value of the scene description module corresponding to the scene to be rendered.
  • the index value of the scene description module corresponding to the scene to be rendered is 0, and the scene declaration added to the scene description file may be as follows:
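With the index value 0 given above, the elided scene declaration would simply be:

```json
"scene": 0
```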
  • Some embodiments of the present application also provide a method for parsing a scene description file. As shown in FIG. 13 , the method for parsing a scene description file includes the following steps S131 to S133:
  • the three-dimensional scene to be rendered includes a target media file of the type of G-PCC coded point cloud.
  • the 3D scene to be rendered in the embodiment of the present application may include one or more media files, and when the 3D scene to be rendered includes multiple media files, the type of one or more media files in the multiple media files may be G-PCC coded point cloud.
  • the parsing method provided in the embodiment of the present application may be performed on the target media files of the type of G-PCC coded point cloud respectively.
  • S132 Acquire a target media description module corresponding to the target media file from a media list (media) of the MPEG media (MPEG_media) of the scene description file.
  • the target media description module corresponding to the target media file may be as follows:
  • S133 Acquire description information of the target media file according to the target media description module.
  • the above step S133 (obtaining description information of the target media file according to the target media description module) includes at least one of the following steps 1331 to 1337:
  • Step 1331 Obtain the name of the target media file according to the value of the media name syntax element (name) in the target media description module.
  • the media name syntax element in the target media description module and its value are: "name”:"GPCCexample", then it can be determined that the name of the target media file is: GPCCexample.
  • Step 1332 Determine whether the target media file needs to be played automatically according to the value of the automatic play syntax element (autoplay) in the target media description module.
  • whether the target media file needs to be played automatically is determined based on the value of the autoplay syntax element (autoplay) in the target media description module, including: when the autoplay syntax element (autoplay) in the target media description module and its value are: "autoplay":true, it is determined that the target media file needs to be played automatically; and when the autoplay syntax element (autoplay) in the target media description module and its value are: "autoplay”:false, it is determined that the target media file does not need to be played automatically.
  • Step 1333 Determine whether the target media file needs to be played in a loop according to the value of the loop playback syntax element (loop) in the target media description module.
  • whether the target media file needs to be played in a loop is determined based on the value of the loop playback syntax element (loop) in the target media description module, including: when the loop playback syntax element (loop) in the target media description module and its value are: "loop":true, it is determined that the target media file needs to be played in a loop; and when the loop playback syntax element (loop) in the target media description module and its value are: "loop":false, it is determined that the target media file does not need to be played in a loop.
  • Step 1334 Obtain the encapsulation format of the target media file according to the value of the media type syntax element (mimeType) in the alternatives of the target media description module.
  • the value of the media type syntax element (mimeType) in the media description module corresponding to the media file will be set to the encapsulation format value corresponding to the G-PCC encoded point cloud, and the encapsulation format value corresponding to the G-PCC encoded point cloud can be: "application/mp4". Therefore, when the encapsulation format value corresponding to the G-PCC encoded point cloud is: "application/mp4", the encapsulation format of the target media file can be obtained as MP4.
  • Step 1335 Obtain the access address of the target media file according to the value of the unique address identifier syntax element (URI) in the alternatives of the target media description module.
  • the unique address identifier syntax element (uri) in the alternatives of the target media description module and its value is: "uri”:"http://www.example.com/GPCCexample.mp4", then it can be determined that the access address of the target media file is: http://www.example.com/GPCCexample.mp4.
  • Step 1336 Obtain the track information of the target media file according to the value of the first track index syntax element (track) in the track array (tracks) of the alternatives (alternatives) of the target media description module.
  • the track information of the target media file is obtained according to the value of the first track index syntax element (track) in the track array (tracks) of the options (alternatives) of the target media description module, including: when the encapsulation file of the target media file is a single-track encapsulation file, the value of the first track index syntax element is determined as the index value of the codestream track of the target media file; when the target media file is a multi-track encapsulation file, the value of the first track index syntax element is determined as the index value of the geometric codestream track of the target media file.
  • Step 1337 Determine the type of code stream and decoding parameters of the target media file according to the value of the codec parameter syntax element (codecs) in the track array (tracks) of the options (alternatives) of the target media description module and the ISO/IEC 23090-18 G-PCC data transmission standard.
  • the above step 1337 (determining the type of code stream and decoding parameters of the target media file according to the value of the codec parameter syntax element (codecs) in the track array (tracks) of the options (alternatives) of the target media description module and the ISO/IEC 23090-18 G-PCC data transmission standard) includes the following steps 13371 and 13372:
  • Step 13371 determine the type of code stream and encoding parameters of the target media file according to the value of the codec parameter syntax element (codecs) in the track array (tracks) of the options (alternatives) of the target media description module and the ISO/IEC 23090-18 G-PCC data transmission standard.
  • the ISO/IEC 23090-18 G-PCC data transmission standard specifies that when a G-PCC coded point cloud is encapsulated using DASH, when G-PCC preselection signaling is used in the MPD file, the codecs attribute of the preselection signaling should be set to 'gpc1', indicating that the preselected media is a point cloud based on geometry; when there are multiple G-PCC Tile tracks in a G-PCC container, the "codecs" attribute of the Main G-PCC Adaptation Set should be set to 'gpcb' or 'gpeb', indicating that the adaptation set contains G-PCC Tile basic track data.
  • the "codecs" attribute of the Main G-PCC Adaptation Set should be set to 'gpcb'.
  • the "codecs" attribute of the Main G-PCC Adaptation Set shall be set to 'gpeb'.
  • the "codecs” attribute of the preselection signaling shall be set to 'gpt1', indicating that the preselected media is a point cloud fragment based on geometry.
  • the value of "codecs" in "tracks" of "alternatives" of the target media description module shall be set to 'gpc1'. Therefore, the encapsulation method and encoding parameters of the target media file can be determined according to the value of the codec parameter syntax element (codecs) in the track array (tracks) of the options (alternatives) of the target media description module and the ISO/IEC 23090-18 G-PCC data transmission standard.
  • Step 13372 Determine decoding parameters of the target media file according to encoding parameters of the target media file.
  • the decoding parameters of the target media file can be determined according to the encoding parameters of the target media file.
  • target media description module corresponding to the target media file is as follows:
  • the description information of the target media file obtained by the target media description module includes: the name of the target media file is: AAAA, the target media file does not need to be played automatically, but needs to be played in a loop; the encapsulation format of the target media file is MP4, and the access address of the target media file is: http://www.bbbb.com/AAAA.mp4; the reference track of the target media file is the code stream track with an index value of 0, the encapsulation/decapsulation method of the target media file is MP4, and the encoding and decoding parameter of the target media file is gpc1.
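The target media description module itself is elided in this text. A plausible sketch assembled from the description information listed above (the "#trackIndex=0" track-reference string is an assumption, not taken from the source):

```json
"extensions": {
  "MPEG_media": {
    "media": [
      {
        "name": "AAAA",
        "autoplay": false,
        "loop": true,
        "alternatives": [
          {
            "mimeType": "application/mp4",
            "uri": "http://www.bbbb.com/AAAA.mp4",
            "tracks": [
              { "track": "#trackIndex=0", "codecs": "gpc1" }
            ]
          }
        ]
      }
    ]
  }
}
```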
  • the scene description file parsing method provided in the embodiment of the present application can obtain the target media description module corresponding to the target media file from the media list of the MPEG media of the scene description file after obtaining the scene description file of the to-be-rendered three-dimensional scene including the target media file of the type of G-PCC coded point cloud, and obtain the description information of the target media file according to the target media description module.
  • the embodiment of the present application can obtain the description information of the target media file according to the target media description module, and then render and display the to-be-rendered three-dimensional scene including the target media file of the type of G-PCC coded point cloud based on the description information of the target media file, the embodiment of the present application provides a method capable of parsing the scene description file of the three-dimensional scene including the media file of the type of G-PCC coded point cloud, and realizes the parsing of the scene description file of the three-dimensional scene including the G-PCC coded point cloud.
  • the scene description file parsing method provided in the above embodiment further includes:
  • a target scene description module (scene) corresponding to the to-be-rendered three-dimensional scene is obtained from a scene list (scenes) of the scene description file, and description information of the to-be-rendered three-dimensional scene is obtained according to the target scene description module.
  • a scene declaration (scene) and its declared index value can be obtained from the scene description file, and a target scene description module corresponding to the three-dimensional scene to be rendered can be obtained from the scene list of the scene description file based on the scene declaration and its declared index value.
  • the first scene description module can be obtained from the scene list of the scene description file according to the scene declaration and its declared index value as the target scene description module corresponding to the three-dimensional scene to be rendered.
  • obtaining description information of the three-dimensional scene to be rendered according to the target scene description module includes: determining the index value of the node description module corresponding to the node in the three-dimensional scene to be rendered according to the index value declared by the node index list (nodes) of the target scene description module.
  • the target scene description module is as follows:
  • the target scene description module to be described can be determined according to the index value declared in the node index list (nodes) of the target scene description module.
  • the three-dimensional scene to be rendered includes two nodes, the index value of the node description module corresponding to one node is 0 (the first node description module in the node list), and the index value of the node description module corresponding to the other node is 1 (the second node description module in the node list).
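The target scene description module itself is elided; with the two node index values 0 and 1 given above, it would plausibly read:

```json
"scenes": [
  { "nodes": [0, 1] }
]
```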
  • the scene description file parsing method further includes:
  • the node description module corresponding to the node in the three-dimensional scene to be rendered is obtained from the node list (nodes) of the scene description file; and according to the node description module corresponding to the node in the three-dimensional scene to be rendered, the description information of the node in the three-dimensional scene to be rendered is obtained.
  • the first node description module is obtained from the node list of the scene description file as the node description module corresponding to the node in the three-dimensional scene to be rendered.
  • the index values declared in the node index list of the target scene description module include 0 and 1
  • the first node description module and the second node description module are obtained from the node list of the scene description file as the node description modules corresponding to the nodes in the three-dimensional scene to be rendered.
  • obtaining description information of nodes in the three-dimensional scene to be rendered according to a node description module corresponding to the nodes in the three-dimensional scene to be rendered includes at least one of the following steps a1 and a2:
  • Step a1 Obtain the name of the node in the three-dimensional scene to be rendered according to the value of the node name syntax element (name) in the node description module corresponding to the node in the three-dimensional scene to be rendered.
  • Step a2 determining the index value of the mesh description module corresponding to the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered according to the index value declared in the mesh index list in the node description module corresponding to the node in the three-dimensional scene to be rendered.
  • a node description module corresponding to a certain node is as follows:
  • step a1 it can be determined that the name of the node is: GPCCexample_node, and based on the above step a2, it can be determined that the index values of the grid description module corresponding to the three-dimensional grid mounted on the node are 0 and 1 respectively.
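The node description module itself is elided. A plausible sketch from steps a1 and a2 above (note: core glTF 2.0 gives a node a single "mesh" index; the "meshes" index list shown here follows this document's description of a mesh index list and is therefore an assumption about its serialized form):

```json
"nodes": [
  {
    "name": "GPCCexample_node",
    "meshes": [0, 1]
  }
]
```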
  • the scene description file parsing method after determining the index value of the mesh description module corresponding to the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered, also includes: obtaining the mesh description module corresponding to the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered from the mesh list (meshes) of the scene description file according to the index value of the mesh description module corresponding to the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered; and obtaining the description information of the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered according to the mesh description module corresponding to the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered.
  • the first grid description module is obtained from the grid list of the scene description file as the grid description module corresponding to the three-dimensional grid mounted on the node corresponding to the node description module.
  • the index values declared in the grid index list of a node description module include 1 and 2
  • the second grid description module and the third grid description module are obtained from the grid list of the above file as the grid description modules corresponding to the three-dimensional grid mounted on the node corresponding to the node description module.
  • obtaining the description information of the three-dimensional mesh mounted on the node in the three-dimensional scene to be rendered includes at least one of the following steps b1 to b4:
  • Step b1 Obtain the name of the three-dimensional grid according to the grid name syntax element (name) in the grid description module corresponding to the three-dimensional grid.
  • Step b2 Acquire the data type included in the three-dimensional grid according to the data type syntax element in the grid description module corresponding to the three-dimensional grid.
  • the above-mentioned step b2 (obtaining the data types included in the three-dimensional grid according to the data type syntax elements in the mesh description module corresponding to the three-dimensional grid) includes: obtaining the data types included in the three-dimensional grid according to the data type syntax elements in the target extension array of the extension list (extensions) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional grid.
  • the target extension array may be MPEG_primitve_GPCC.
  • extension list of primitives of the mesh description module corresponding to a certain three-dimensional mesh is as follows:
  • the three-dimensional grid includes position coordinates according to the position coordinate syntax element (position) in the target extension array (MPEG_primitve_GPCC) of the extension list (extensions) of the primitives (primitives) of the grid description module corresponding to the three-dimensional grid, that the three-dimensional grid includes color values according to the color value syntax element (color_0) in the target extension array (MPEG_primitve_GPCC) of the extension list (extensions) of the primitives (primitives) of the grid description module corresponding to the three-dimensional grid, and that the three-dimensional grid includes normal vectors according to the normal vector syntax element (normal) in the target extension array (MPEG_primitve_GPCC) of the extension list (extensions) of the primitives (primitives) of the grid description module corresponding to the three-dimensional grid.
  • extension list of primitives of a grid description module corresponding to a certain three-dimensional grid is as follows:
• the three-dimensional grid can be determined to include position coordinates according to the position coordinate syntax element (G-PCC_position) in the target extension array (MPEG_primitve_GPCC) of the extension list (extensions) of the primitives (primitives) of the grid description module corresponding to the three-dimensional grid, to include color values according to the color value syntax element (G-PCC_color_0) in the same target extension array, and to include normal vectors according to the normal vector syntax element (G-PCC_normal) in the same target extension array.
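A minimal sketch of step b2, assuming a primitive whose extension list carries the MPEG_primitve_GPCC target extension array with the syntax elements named above (the exact JSON layout and the `gpcc_data_types` helper are assumptions for illustration):

```python
# Hypothetical primitive; the MPEG_primitve_GPCC layout is assumed from the text.
primitive = {
    "extensions": {
        "MPEG_primitve_GPCC": {
            "G-PCC_position": 0,   # accessor index for position coordinates
            "G-PCC_color_0": 1,    # accessor index for color values
            "G-PCC_normal": 2,     # accessor index for normal vectors
        }
    }
}

def gpcc_data_types(prim):
    """Return the data types declared in the target extension array,
    accepting both the prefixed and unprefixed syntax element spellings."""
    ext = prim.get("extensions", {}).get("MPEG_primitve_GPCC", {})
    types = []
    if "G-PCC_position" in ext or "position" in ext:
        types.append("position")
    if "G-PCC_color_0" in ext or "color_0" in ext:
        types.append("color")
    if "G-PCC_normal" in ext or "normal" in ext:
        types.append("normal")
    return types
```

The values of the syntax elements double as accessor indices, which is what step b3 below reads out.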
  • the above-mentioned step b2 (obtaining the data type included in the three-dimensional grid according to the data type syntax element in the grid description module corresponding to the three-dimensional grid) includes: obtaining the data type included in the three-dimensional grid according to the data type syntax element in the attributes of the primitives of the grid description module corresponding to the three-dimensional grid.
  • the attributes of the primitives of the mesh description module corresponding to a certain three-dimensional mesh are as follows:
  • the three-dimensional mesh includes position coordinates according to the position coordinate syntax element (position) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh, that the three-dimensional mesh includes color values according to the color value syntax element (color_0) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh, and that the three-dimensional mesh includes normal vectors according to the normal vector syntax element (normal) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh.
  • the attributes of the primitives of the mesh description module corresponding to a certain three-dimensional mesh are as follows:
  • the three-dimensional mesh includes position coordinates according to the position coordinate syntax element (G-PCC_position) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh, that the three-dimensional mesh includes color values according to the color value syntax element (G-PCC_color_0) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh, and that the three-dimensional mesh includes normal vectors according to the normal vector syntax element (G-PCC_normal) in the attributes (attributes) of the primitives (primitives) of the mesh description module corresponding to the three-dimensional mesh.
  • Step b3 Obtain the index value of the accessor description module corresponding to the accessor for accessing the data of the type of the three-dimensional grid according to the value of the data type syntax element.
  • the value of the position coordinate syntax element (G-PCC_position) is 0, so the index value of the accessor description module corresponding to the accessor used to access the position coordinates of the three-dimensional mesh is 0 (the first accessor in the accessor list), the value of the color value syntax element (G-PCC_color_0) is 1, so the index value of the accessor description module corresponding to the accessor used to access the color value of the three-dimensional mesh is 1 (the second accessor in the accessor list), and the value of the normal vector syntax element (G-PCC_normal) is 2, so the index value of the accessor description module corresponding to the accessor used to access the normal vector of the three-dimensional mesh is 2 (the third accessor in the accessor list).
• Step b4 Determine the type of the topological structure of the three-dimensional mesh according to the value of the mode syntax element (mode) in the mesh description module corresponding to the three-dimensional mesh.
• when the value of the mode syntax element is 0, the type of the topological structure of the three-dimensional mesh can be determined as scattered points; when the value of the mode syntax element is 1, the type of the topological structure of the three-dimensional mesh can be determined as lines; when the value of the mode syntax element is 4, the type of the topological structure of the three-dimensional mesh can be determined as triangles.
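The mode-to-topology mapping of step b4 can be sketched as follows (only the values 0, 1, and 4 named above are mapped; other glTF mode values are treated as unsupported here):

```python
def topology_type(mode):
    """Map the value of the mode syntax element to a topology type,
    per the three cases described in step b4."""
    mapping = {0: "scattered points", 1: "lines", 4: "triangles"}
    return mapping.get(mode, "unsupported")
```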
  • a grid description module corresponding to a certain three-dimensional grid is as follows:
  • the description information of the three-dimensional mesh obtained according to the mesh description module corresponding to the three-dimensional mesh includes: the name of the three-dimensional mesh is: G-PCCexample_mesh; the topological type of the three-dimensional mesh is scattered points; the three-dimensional mesh includes three types of data, namely position coordinates, color values and normal vectors, the index value of the accessor description module corresponding to the accessor for accessing the position coordinates of the three-dimensional mesh is 0, the index value of the accessor description module corresponding to the accessor for accessing the color value of the three-dimensional mesh is 1, and the index value of the accessor description module corresponding to the accessor for accessing the normal vector of the three-dimensional mesh is 2.
  • the method further includes:
  • the second accessor description module is obtained from the accessor list of the scene description file as the accessor description module corresponding to the accessor used to access the color value of the three-dimensional grid.
  • obtaining description information of accessors for accessing various types of data of a three-dimensional grid according to accessor description modules corresponding to accessors for accessing various types of data of a three-dimensional grid includes at least one of the following steps c1 to c6:
  • Step c1 Determine the type of data accessed by the accessor according to the value of the data type syntax element (componentType) in the accessor description module.
  • the data type syntax element in the accessor description module corresponding to the accessor used to access the normal vector of a three-dimensional grid is: "componentType": 5126, then it can be determined that the type of the data accessed by the accessor corresponding to the accessor description module (the normal vector of the three-dimensional grid) is a 32-bit floating point number (float).
• Step c2 Determine the type of the accessor according to the value of the accessor type syntax element (type) in the accessor description module.
• the accessor type syntax element in the accessor description module corresponding to the accessor for accessing the position coordinates of a three-dimensional grid is: "type": "VEC3", then it can be determined that the type of the accessor corresponding to the accessor description module is a three-dimensional vector.
  • Step c3 Determine the number of data accessed by the accessor according to the value of the data number syntax element (count) in the accessor description module.
  • the data quantity syntax element in the accessor description module corresponding to the accessor for accessing the color value of a three-dimensional grid is "count": 1000, then it can be determined that the quantity of data (the color value of the three-dimensional grid) accessed by the accessor corresponding to the accessor description module is 1000.
  • Step c4 Determine whether the accessor is a time-varying accessor based on MPEG extension modification according to whether the accessor description module contains an MPEG time-varying accessor (MPEG_accessor_timed).
  • whether the accessor is a time-varying accessor modified based on MPEG extension is determined based on whether the accessor description module contains an MPEG time-varying accessor, including: if the accessor description module contains an MPEG time-varying accessor, then the accessor is determined to be a time-varying accessor modified based on MPEG extension, and if the accessor description module does not contain an MPEG time-varying accessor, then the accessor is determined not to be a time-varying accessor modified based on MPEG extension.
  • Step c5 determining the index value of the cache slice description module corresponding to the cache slice storing the data accessed by the accessor according to the value of the cache slice index syntax element (bufferView) in the MPEG time-varying accessor (MPEG_accessor_timed) of the accessor description module.
  • Step c6 Determine whether the value of the syntax element in the accessor changes with time based on the value of the time-varying syntax element (immutable) in the MPEG time-varying accessor of the accessor description module.
• whether the value of the syntax element in the accessor changes with time is determined based on the value of the time-varying syntax element (immutable) in the MPEG time-varying accessor of the accessor description module, including: if the time-varying syntax element in the MPEG time-varying accessor of the accessor description module and its value are: "immutable": true, then it is determined that the value of the syntax element in the accessor does not change with time, and if the time-varying syntax element in the MPEG time-varying accessor of the accessor description module and its value are: "immutable": false, then it is determined that the value of the syntax element in the accessor will change with time.
  • the accessor description module corresponding to a certain accessor is as follows:
• the description information of the accessor obtained from the accessor description module corresponding to the accessor includes: the componentType of the data accessed by the accessor is 5123 (unsigned short); the accessor type is scalar (SCALAR); the number of data accessed by the accessor is 1000; the accessor is a time-varying accessor modified based on the MPEG extension; the data accessed by the accessor is cached in the cache slice corresponding to the second cache slice description module in the cache slice list; the value of the syntax element in the accessor does not change with time.
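Steps c1 to c6 can be sketched as one parsing routine over a hypothetical accessor description module (5123 and 5126 are the glTF componentType codes for unsigned short and float; the `describe_accessor` helper is illustrative):

```python
# glTF componentType codes (per the glTF 2.0 specification).
COMPONENT_TYPES = {5120: "byte", 5121: "unsigned byte", 5122: "short",
                   5123: "unsigned short", 5125: "unsigned int", 5126: "float"}

# Hypothetical accessor description module matching the example described above.
accessor = {
    "componentType": 5123,
    "type": "SCALAR",
    "count": 1000,
    "extensions": {
        "MPEG_accessor_timed": {"bufferView": 1, "immutable": True}
    },
}

def describe_accessor(acc):
    """Collect the description information read out by steps c1 to c6."""
    timed = acc.get("extensions", {}).get("MPEG_accessor_timed")
    return {
        "component_type": COMPONENT_TYPES[acc["componentType"]],  # step c1
        "accessor_type": acc["type"],                             # step c2
        "count": acc["count"],                                    # step c3
        "is_timed": timed is not None,                            # step c4
        "buffer_view": timed["bufferView"] if timed else None,    # step c5
        "immutable": timed["immutable"] if timed else None,       # step c6
    }
```

A bufferView value of 1 in the MPEG time-varying accessor points at the second cache slice description module in the cache slice list, as in the example above.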
  • Step d obtaining a buffer description module in a buffer list (buffers) of the scene description file.
  • Step e Get the value of the media index syntax element (media) in the buffer description module.
  • Step f determining the buffer description module whose value of the media index syntax element is the same as the index value of the target media description module as the target buffer description module corresponding to the target buffer for caching the decoded data of the target media file.
  • the buffer description module whose value of the media index syntax element is 0 is determined as the target buffer description module corresponding to the target buffer for caching the decoded data of the target media file.
  • the number of target buffers for caching the decoded data of the target media file may be one or more, and this embodiment of the present application does not impose any limitation on this.
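Steps d to f amount to a filter over the buffer list; a sketch, assuming the media index syntax element sits inside the MPEG_buffer_circular extension as described below (the buffer contents are invented for illustration):

```python
# Hypothetical buffer list from a scene description file.
buffers = [
    {"byteLength": 8000,
     "extensions": {"MPEG_buffer_circular": {"media": 0, "count": 5, "tracks": [1]}}},
    {"byteLength": 15000,
     "extensions": {"MPEG_buffer_circular": {"media": 1, "count": 3, "tracks": [0]}}},
]

def target_buffers(buffer_list, media_index):
    """Return every buffer description module whose media index syntax element
    matches the index value of the target media description module
    (there may be one or more target buffers)."""
    return [b for b in buffer_list
            if b.get("extensions", {})
                .get("MPEG_buffer_circular", {})
                .get("media") == media_index]
```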
  • Step g obtaining description information of the target buffer according to the target buffer description module.
  • Step g1 Obtain the capacity of the target buffer according to the value of the first byte length syntax element (byteLength) in the target buffer description module.
  • the capacity of the target buffer is 15000 bytes.
  • Step g2 Determine whether the target buffer is a circular buffer based on MPEG extension modification according to whether the target buffer description module includes an MPEG circular buffer (MPEG_buffer_circular).
  • whether the target buffer is a ring buffer modified based on MPEG extension is determined according to whether the target buffer description module includes an MPEG ring buffer, including: if the target buffer description module includes an MPEG ring buffer, it is determined that the target buffer is a ring buffer modified based on MPEG extension, and if the target buffer description module does not include an MPEG ring buffer, it is determined that the target buffer is not a ring buffer modified based on MPEG extension.
  • Step g3 obtaining the number of storage links of the MPEG circular buffer according to the value of the link number syntax element (count) in the MPEG circular buffer of the target buffer description module.
• the link number syntax element in the MPEG ring buffer of the target buffer description module and its value are: "count": 5, it can be determined that the MPEG ring buffer includes 5 storage links.
  • Step g4 According to the value of the second track index syntax element (tracks) in the MPEG circular buffer of the target buffer description module, obtain the track index value of the source data cached by the MPEG circular buffer.
  • a buffer description module corresponding to a certain buffer is as follows:
  • the description information of the buffer can be obtained, including: the capacity of the buffer is 8000 bytes; the buffer is a circular buffer based on MPEG extension modification, the number of storage links of the circular buffer is 5, the media file stored in the circular buffer is the second media file declared in the MPEG media, and the track index value of the source data cached by the circular buffer is 1.
  • the scene description file parsing method provided in the above embodiment further includes the following steps h to k:
  • Step h Obtain the cache slice description module in the cache slice list (bufferViews) of the scene description file.
  • Step i Get the value of the buffer index syntax element (buffer) in the cache slice description module.
• Step j determine the cache slice description module whose value of the buffer index syntax element is the same as the index value of the target cache description module as the cache slice description module corresponding to the cache slice of the target cache.
• the cache slice description module whose value of the buffer index syntax element is 1 is determined as the cache slice description module corresponding to the cache slice of the target cache.
  • the number of cache slices of the target cache may be one or more, which is not limited in the embodiment of the present application.
  • Step k acquiring description information of the cache slice of the target cache according to the cache slice description module corresponding to the cache slice of the target cache.
  • obtaining description information of the cache slice of the target cache according to the cache slice description module corresponding to the cache slice of the target cache includes at least one of the following steps k1 and k2:
  • Step k1 obtaining the capacity of the cache slice of the target cache according to the value of the second byte length syntax element (byteLength) in the cache slice description module corresponding to the cache slice of the target cache.
  • the second byte length syntax element and its value in the cache slice description module corresponding to a cache slice of the target cache are: "byteLength": 12000, it can be determined that the capacity of the cache slice of the target cache is 12000 bytes.
  • Step k2 Obtain the offset of the cache slice of the target cache according to the value of the offset syntax element (byteOffset) in the cache slice description module corresponding to the cache slice of the target cache.
  • the offset syntax element and its value in the cache slice description module corresponding to a cache slice of the target cache are: "byteOffset": 0, it can be determined that the offset of the cache slice of the target cache is 0 bytes.
  • the cache slice description module corresponding to a cache slice is as follows:
  • the description information of the cache slice can be obtained, including: the cache slice is a cache slice of the cache corresponding to the second cache description module in the cache list, and the capacity of the cache slice is 8000 bytes; the offset of the cache slice is 0, that is, the data range cached by the cache slice is the first 8000 bytes.
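Steps h to k (selecting the cache slices of the target cache and reading their capacity and offset) can be sketched as follows (the buffer_views data is invented for illustration):

```python
# Hypothetical cache slice list (bufferViews) from a scene description file.
buffer_views = [
    {"buffer": 0, "byteLength": 12000, "byteOffset": 0},
    {"buffer": 1, "byteLength": 8000, "byteOffset": 0},
    {"buffer": 1, "byteLength": 4000, "byteOffset": 8000},
]

def slices_of_buffer(views, buffer_index):
    """Return (capacity, offset) for each cache slice whose buffer index
    syntax element matches the index value of the target cache."""
    return [(v["byteLength"], v["byteOffset"])        # steps k1 and k2
            for v in views if v["buffer"] == buffer_index]
```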
• the scene description file parsing method provided in the above embodiment further includes the following steps l to o:
• Step l Obtain an accessor description module in an accessor list (accessors) of the scene description file.
  • Step m obtain the value of the cache slice index syntax element (bufferView) in the accessor description module.
  • Step n determine the accessor description module whose value of the cache slice index syntax element is the same as the index value of the cache slice description module corresponding to the cache slice of the target cache as the accessor description module corresponding to the accessor used to access the data in the cache slice of the target cache.
  • the accessor description module whose value of the cache slice index syntax element is 2 is determined as the accessor description module corresponding to the accessor used to access the data in the cache slice of the target cache.
  • Step o According to the accessor description module corresponding to the accessor used to access the data in the cache slice of the target cache, obtain the description information of the accessor used to access the data in the cache slice of the target cache.
  • obtaining description information of an accessor for accessing data in a cache slice of the target cache according to an accessor description module corresponding to the accessor for accessing data in a cache slice of the target cache includes at least one of the following steps o1 to o6:
  • Step o1 Determine the type of data accessed by the accessor according to the value of the data type syntax element (componentType) in the accessor description module.
  • Step o2 Determine the type of the accessor according to the value of the accessor type syntax element (type) in the accessor description module.
  • Step o3 Determine the number of data accessed by the accessor according to the value of the data number syntax element (count) in the accessor description module.
  • Step o4 Determine whether the accessor is a time-varying accessor based on MPEG extension modification according to whether the accessor description module contains an MPEG time-varying accessor (MPEG_accessor_timed).
  • Step o5 determining the index value of the cache slice description module corresponding to the cache slice storing the data accessed by the accessor according to the value of the cache slice index syntax element (bufferView) in the MPEG time-varying accessor of the accessor description module.
  • Step o6 Determine whether the value of the syntax element in the accessor changes with time based on the value of the time-varying syntax element (immutable) in the MPEG time-varying accessor of the accessor description module.
  • the implementation of the above steps o1 to o6 can refer to the implementation of the above steps c1 to c6, and to avoid redundancy, they will not be described in detail here.
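The reverse lookup of steps l to n, finding the accessor description modules that reference a given cache slice, can be sketched along the same lines (the accessor list is invented for illustration):

```python
# Hypothetical accessor list from a scene description file.
accessors = [
    {"bufferView": 0, "componentType": 5126, "type": "VEC3", "count": 1000},
    {"bufferView": 2, "componentType": 5123, "type": "SCALAR", "count": 1000},
]

def accessors_for_view(accessor_list, view_index):
    """Return accessor description modules whose cache slice index syntax
    element matches the index value of the target cache slice description
    module (step n)."""
    return [a for a in accessor_list if a.get("bufferView") == view_index]
```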
  • Some embodiments of the present application further provide a method for rendering a three-dimensional scene.
  • the execution subject of the method for rendering a three-dimensional scene is a display engine in an immersive media description framework. As shown in FIG. 14 , the method for rendering a three-dimensional scene includes the following steps:
  • the three-dimensional scene to be rendered includes a target media file of the type of G-PCC coded point cloud.
  • the implementation method of obtaining the scene description file of the three-dimensional scene to be rendered includes: sending a request message for requesting the scene description file of the three-dimensional scene to be rendered to a media resource server, and receiving a request response sent by the media resource server and carrying the scene description file of the three-dimensional scene to be rendered.
  • S143 Send description information of the target media file to the media access function.
  • the media access function can obtain the target media file according to the description information of the target media file, process the target media file to obtain the decoded data of the target media file, and write the decoded data of the target media file into the target buffer.
  • the display engine sends the description information of the target media file to the media access function, including: the display engine may send the description information of the target media file to the media access function through a media access function API.
• the decoded data of the target media file read from the target buffer is data that has been completely processed by the media access function and can be directly used for rendering the three-dimensional scene to be rendered.
  • S145 Render the to-be-rendered 3D scene based on the decoded data of the target media file.
  • the rendering method of a three-dimensional scene after obtaining a scene description file of a three-dimensional scene to be rendered including a target media file of a type of G-PCC coded point cloud, first obtains description information of the target media file according to a media description module corresponding to the target media file in a media list of MPEG media of the scene description file, and sends the description information of the target media file to a media access function, so that the media access function obtains the target media file according to the description information of the target media file, processes the target media file to obtain decoded data of the target media file, writes the decoded data of the target media file into a target buffer, reads the decoded data of the target media file from the target buffer, and renders the three-dimensional scene to be rendered based on the decoded data of the target media file.
  • the display engine can obtain the description information of the target media file according to the target media description module, send the description information of the target media file to the media access function, read the decoded data of the target media file of the type G-PCC coded point cloud, and render the three-dimensional scene to be rendered based on the decoded data of the target media file
  • the embodiment of the present application provides a rendering method for rendering a three-dimensional scene to be rendered including a media file of the type G-PCC coded point cloud, and realizes the rendering of the media file of the type G-PCC coded point cloud based on the scene description file.
  • Some embodiments of the present application further provide a method for processing a media file.
  • the execution subject of the method for processing a media file is a media access function in an immersive media description framework.
  • the method for processing a media file includes the following steps:
  • S151 Receive description information of a target media file, description information of a target buffer, and description information of a cache slice of the target buffer sent by a display engine.
  • the target media file is a media file of a G-PCC coded point cloud type
  • the target buffer is a buffer for caching decoded data of the target media file.
  • the description information of the target media file may include at least one of the following:
  • the description information of the target buffer may include at least one of the following:
• the capacity of the buffer, whether it is an MPEG circular buffer, the number of storage links of the circular buffer, the index value of the media description module corresponding to the target media file, and the track index value of the source data of the data cached by the circular buffer.
  • the description information of the cache slice of the target cache may include at least one of the following:
• the cache to which the cache slice belongs, the capacity of the cache slice, and the offset of the cache slice.
  • receiving description information of a target media file, description information of a target buffer, and description information of a cache slice of the target buffer sent by a display engine includes:
  • the description information of the target media file, the description information of the target buffer, and the description information of the cache slice of the target buffer sent by the display engine are received through a media access function API.
  • S152 Obtain decoding data of the target media file according to the description information of the target media file.
  • the media access function obtains the decoded data of the target media file according to the description information of the target media file, including:
  • a target pipeline for processing the target media file is created according to the description information of the target media file, the target media file is acquired through the target pipeline, and the target media file is decapsulated and decoded to acquire decoded data of the target media file.
• obtaining the target media file through the target pipeline, and decapsulating and decoding the target media file to obtain decoded data of the target media file includes: obtaining the target media file through the input module of the target pipeline, and inputting the target media file into the decapsulation module of the target pipeline; decapsulating the target media file through the decapsulation module to obtain a geometry code stream and an attribute code stream of the target media file; decoding the geometry code stream through the geometry decoder of the target pipeline to obtain geometry decoding data of the target media file; and decoding the attribute code stream through the attribute decoder of the target pipeline to obtain attribute decoding data of the target media file.
  • acquiring the target media file through the target pipeline, and decapsulating and decoding the target media file to obtain decoded data of the target media file further includes: after acquiring the geometric decoding data of the target media file, processing the geometric decoding data through a first post-processing module of the target pipeline, and after acquiring the attribute decoding data of the target media file, processing the attribute decoding data through a second post-processing module of the target pipeline.
  • processing the geometric decoding data through the first post-processing module of the target pipeline may include: format conversion of the geometric decoding data through the first post-processing module of the target pipeline
  • processing the attribute decoding data through the second post-processing module of the target pipeline may include: format conversion of the attribute decoding data through the second post-processing module of the target pipeline.
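The target pipeline of S152 can be sketched as a chain of stages; every stage below is a stub standing in for the real decapsulation module and G-PCC geometry/attribute decoders, which the text does not specify:

```python
def process_target_media_file(media_file):
    """Sketch of the target pipeline: input -> decapsulation ->
    geometry/attribute decoding -> post-processing (all stubs)."""
    # Decapsulation module: split the file into geometry and attribute code streams.
    geometry_stream = media_file["geometry"]
    attribute_stream = media_file["attributes"]
    # Geometry decoder + first post-processing module (format conversion stub).
    geometry_decoded = [tuple(point) for point in geometry_stream]
    # Attribute decoder + second post-processing module (format conversion stub).
    attribute_decoded = [tuple(color) for color in attribute_stream]
    return {"geometry": geometry_decoded, "attributes": attribute_decoded}

# Invented stand-in for an already-decapsulated G-PCC media file.
decoded = process_target_media_file(
    {"geometry": [[0, 0, 0], [1, 2, 3]],
     "attributes": [[255, 0, 0], [0, 255, 0]]})
```

The two decoded streams would then be written into the target buffer per S153.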
  • S153 Write the decoded data of the target media file into the target buffer according to the description information of the target buffer and the description information of the cache slice of the target buffer.
  • the display engine may read the decoded data of the target media file from the target buffer according to the description information of the target buffer and the description information of the cache slice of the target buffer.
  • the method further comprises: obtaining decoded data of the target media file, and rendering a to-be-rendered three-dimensional scene including the target media file based on the decoded data of the target media file.
  • the media file processing method provided in the embodiment of the present application obtains the decoded data corresponding to the target media file according to the description information of the target media file after receiving the description information of the target media file of the type of G-PCC coded point cloud sent by the display engine, the description information of the target buffer for caching the decoded data of the target media file, and the description information of the cache slices of the target buffer.
  • the decoded data of the target media file is written into the target buffer according to the description information of the target buffer and the description information of the cache slices of the target buffer.
• the display engine can read the decoded data of the target media file from the target buffer according to the description information of the target buffer and the description information of the cache slices of the target buffer, and render the three-dimensional scene to be rendered including the target media file based on the decoded data of the target media file. Therefore, the embodiment of the present application can support rendering media files of the type of G-PCC coded point cloud in the scene description framework.
  • Some embodiments of the present application further provide a cache management method.
  • the execution subject of the cache management method is a cache management module in the immersive media description framework. As shown in FIG. 16 , the cache management method includes the following steps:
  • the target buffer is a buffer for caching a target media file
  • the target media file is a media file of a G-PCC coded point cloud type.
  • the description information of the target buffer may include at least one of the following:
• the capacity of the buffer, whether it is an MPEG circular buffer, the number of storage links of the circular buffer, the index value of the media description module corresponding to the media file (the target media file) cached by the circular buffer, and the track index value of the source data of the data cached by the circular buffer.
  • the description information of the cache slice of the target cache may include at least one of the following:
  • the cache to which the cache slice belongs, the capacity of the cache slice, and the offset of the cache slice.
  • S162: Create the target buffer according to the description information of the target buffer.
  • For example, if the description information of the target buffer includes: the capacity of the target buffer is 8000 bytes, the target buffer is a circular buffer based on the MPEG extension, the number of storage links of the circular buffer is 3, the media file stored in the circular buffer is the first media file declared in the MPEG media, and the track index value of the source data cached by the circular buffer is 1, then the cache management module creates a circular buffer with a capacity of 8000 bytes and 3 storage links as the target buffer.
  • For example, if the description information of the first cache slice includes: a capacity of 6000 bytes and an offset of 0, and the description information of the second cache slice includes: a capacity of 2000 bytes and an offset of 6001, then the target buffer is divided into two cache slices: the first cache slice has a capacity of 6000 bytes and is used to cache the first 6000 bytes of the decoded data of the target media file, and the second cache slice has a capacity of 2000 bytes and is used to cache bytes 6001 to 8000 of the decoded data of the target media file.
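The buffer-creation and slice-division steps above can be sketched in a few lines of Python. This is a minimal illustration, not the standard's API: all class and field names are assumptions, and zero-based offsets are used (so the second slice here simply starts where the first ends).

```python
# Hypothetical sketch of the cache management steps: create a buffer from its
# description information, then divide it into cache slices (bufferViews).
from dataclasses import dataclass

@dataclass
class BufferDescription:
    byte_length: int      # capacity of the buffer in bytes
    is_circular: bool     # whether it is an MPEG circular buffer
    link_count: int       # number of storage links of the circular buffer
    media_index: int      # index of the corresponding media description module
    track_index: int      # track index of the source data

@dataclass
class BufferViewDescription:
    buffer: int           # index of the cache the slice belongs to
    byte_length: int      # capacity of the cache slice
    byte_offset: int      # offset of the cache slice within the buffer

def create_buffer(desc: BufferDescription) -> bytearray:
    """Allocate the target buffer according to its description information."""
    return bytearray(desc.byte_length)

def slice_buffer(buf: bytearray, views: list) -> list:
    """Divide the buffer into cache slices according to the slice descriptions."""
    return [memoryview(buf)[v.byte_offset:v.byte_offset + v.byte_length]
            for v in views]

# Example matching the text: an 8000-byte circular buffer with two slices.
desc = BufferDescription(byte_length=8000, is_circular=True, link_count=3,
                         media_index=0, track_index=1)
buf = create_buffer(desc)
views = [BufferViewDescription(buffer=0, byte_length=6000, byte_offset=0),
         BufferViewDescription(buffer=0, byte_length=2000, byte_offset=6000)]
slices = slice_buffer(buf, views)
```

Each returned `memoryview` is a zero-copy window into the same underlying buffer, so a write into a slice is immediately visible to a reader holding the whole buffer.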
  • the media access function can write the decoded data of the target media file into the target buffer
  • the display engine can read the decoded data of the target media file from the target buffer
  • the display engine can render the to-be-rendered three-dimensional scene including the target media file based on the decoded data of the target media file.
  • After receiving the description information of the target buffer, the cache management method provided in the embodiments of the present application can create the target buffer according to the description information of the target buffer and divide the target buffer into cache slices according to the description information of the cache slices of the target buffer. The media access function can then write the decoded data of a media file of the G-PCC coded point cloud type into the target buffer, and the display engine can read the decoded data of the target media file from the target buffer and render the to-be-rendered three-dimensional scene including the target media file based on that decoded data. Therefore, the embodiments of the present application can support rendering media files of the G-PCC coded point cloud type in the scene description framework.
  • Some embodiments of the present application also provide a method for rendering a three-dimensional scene, which includes: a scene description file parsing method and a three-dimensional scene rendering method executed by a display engine, a media file processing method executed by a media access function, and a cache management method executed by a cache management module.
  • the display engine obtains a scene description file of a 3D scene to be rendered.
  • the three-dimensional scene to be rendered includes a target media file of the type of G-PCC coded point cloud.
  • the display engine obtains a scene description file of a scene to be rendered, including: the display engine downloads the scene description file from a server using a network transmission service.
  • the display engine obtains a scene description file of a scene to be rendered, including: reading the scene description file from a local storage space.
  • the display engine obtains the media description module corresponding to each media file from the media list (media) of the MPEG media (MPEG_media) of the scene description file (including: obtaining the media description module corresponding to the target media file from the media list of the MPEG media of the scene description file).
  • the display engine obtains description information of each media file according to the media description module corresponding to each media file (including: obtaining description information of the target media file according to the media description module corresponding to the target media file).
  • the description information of the media file includes at least one of the following:
  • the name of the media file, whether the media file is played automatically, whether the media file is played in a loop, the packaging format of the media file, the access address of the media file, the track information of the packaging file of the media file, and the encoding and decoding parameters of the media file.
  • the implementation manner in which the display engine obtains the description information of the target media file according to the media description module corresponding to the target media file can refer to the implementation manner in which the media description module of the target media file is parsed in the above-mentioned scene description parsing method. To avoid redundancy, it will not be described in detail here.
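As an illustration of how a display engine might pull per-media description information out of the media list of the MPEG media of a scene description file, the following Python sketch parses a glTF-style JSON fragment. The document layout follows the MPEG_media extension convention, but the concrete field values (URI, track identifier, media name) are invented example data, not taken from the patent or the standard.

```python
# Parse a hypothetical scene description fragment and extract the media
# description module for the first media file declared in MPEG_media.
import json

scene_description = json.loads("""
{
  "extensions": {
    "MPEG_media": {
      "media": [
        {
          "name": "gpcc_object",
          "autoplay": true,
          "loop": true,
          "alternatives": [
            {
              "mimeType": "application/mp4",
              "uri": "http://example.com/pointcloud.mp4",
              "tracks": [ { "track": "#track_ID=1" } ]
            }
          ]
        }
      ]
    }
  }
}
""")

def media_descriptions(doc: dict) -> list:
    """Return the media description modules from the MPEG_media media list."""
    return doc.get("extensions", {}).get("MPEG_media", {}).get("media", [])

# Description information of the target media file: name, autoplay, loop,
# access address, and track information, as listed in the text above.
target = media_descriptions(scene_description)[0]
```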
  • the display engine sends description information of each media file to the media access function (including: sending description information of the target media file to the media access function).
  • the media access function receives the description information of each media file sent by the display engine (including: receiving the description information of the target media file sent by the display engine).
  • the display engine sends the description information of each media file to the media access function, including: the display engine sends the description information of each media file to the media access function through the media access function API.
  • the media access function receives the description information of each media file sent by the display engine, including: the media access function receives the description information of each media file sent by the display engine through the media access function API.
  • the media access function creates a pipeline for processing each media file according to the description information of each media file.
  • the method further comprises: creating a target pipeline for processing the target media file according to the target media file description information.
  • the target pipeline includes: an input module, a decapsulation module and a decoding module. The input module is used to obtain the target media file (the encapsulated file); the decapsulation module is used to decapsulate the target media file to obtain the code stream of the target media file (which may be a G-PCC code stream encapsulated in a single track, or a G-PCC geometry code stream and a G-PCC attribute code stream encapsulated in multiple tracks); the decoding module includes a decoder, a geometry decoder and an attribute decoder. When the code stream of the target media file is a G-PCC code stream encapsulated in a single track, the decoding module decodes the G-PCC code stream through the decoder to obtain the decoded data of the target media file; when the code stream of the target media file is a G-PCC geometry code stream and a G-PCC attribute code stream encapsulated in multiple tracks, the decoding module decodes the G-PCC geometry code stream through the geometry decoder and decodes the G-PCC attribute code stream through the attribute decoder to obtain the decoded data of the target media file.
  • the target pipeline also includes: a first post-processing module and a second post-processing module; the first post-processing module is used to perform post-processing such as format conversion on the geometric data obtained by decoding the G-PCC geometric code stream, and the second post-processing module is used to perform post-processing such as format conversion on the attribute data obtained by decoding the G-PCC attribute code stream.
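The single-track versus multi-track branching of the pipeline can be sketched as follows. This is a hypothetical illustration with the decoders and post-processing reduced to string-producing stubs; it shows only the control flow, not an actual G-PCC decoder.

```python
# Toy sketch of the target pipeline: input -> decapsulation -> decoding.
# A single-track G-PCC code stream goes through one decoder; multi-track
# geometry/attribute code streams go through separate decoders.
def run_pipeline(media_file: dict) -> dict:
    # Input module: obtain the encapsulated target media file (stubbed).
    encapsulated = media_file["data"]
    # Decapsulation module: split the file into its code stream tracks.
    tracks = encapsulated["tracks"]
    if len(tracks) == 1:
        # Single track: the decoder produces the decoded data directly.
        return {"decoded": f"decoded({tracks[0]})"}
    # Multiple tracks: geometry and attribute streams decoded separately,
    # then post-processed (e.g. format conversion) and combined.
    geometry = f"geometry_decoded({tracks[0]})"
    attributes = f"attribute_decoded({tracks[1]})"
    return {"decoded": {"geometry": geometry, "attributes": attributes}}

single = run_pipeline({"data": {"tracks": ["gpcc_bitstream"]}})
multi = run_pipeline({"data": {"tracks": ["geom_bitstream", "attr_bitstream"]}})
```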
  • the media access function obtains each media file through the pipeline corresponding to each media file, and decapsulates and decodes each media file to obtain the decoded data corresponding to each media file (including: obtaining the target media file through the target pipeline, and decapsulating and decoding the target media file to obtain the decoded data corresponding to the target media file).
  • the description information of the target media file includes an access address of the target media file
  • the media access function obtains the decoded data of the target media file according to the description information of the target media file, including:
  • the media access function obtains the target media file according to the access address of the target media file.
  • the media access function obtains the target media file according to the access address of the target media file, including: the media access function sends a media resource request to a media resource server according to the access address of the target media file, and receives a media resource response carrying the target media file sent by the media resource server.
  • the media access function obtains the target media file according to the access address of the target media file, including: the media access function reads the target media file from a preset storage space according to the access address of the target media file.
  • the description information of the target media file further includes index values of each code stream track of the target media file; and the media access function obtains the decoded data of the target media file according to the description information of the target media file, including:
  • the media access function decapsulates the target media file according to the encapsulation format of the target media file, and obtains the code streams of each code stream track of the target media file.
  • the description information of the target media file further includes the type and the encoding and decoding parameters of the code stream of the target media file; and the media access function obtains the decoded data of the target media file according to the description information of the target media file, including:
  • the media access function decodes the code streams of each code stream track of the target media file according to the code stream type and the encoding and decoding parameters of the target media file to obtain the decoded data of the target media file.
  • the display engine obtains each buffer description module in the buffer list (buffers) of the scene description file (including: obtaining, from the buffer list of the scene description file, the buffer description module corresponding to the target buffer for caching the decoded data of the target media file).
  • the display engine obtains description information of each buffer according to the buffer description module corresponding to each buffer (including: obtaining description information of the target buffer according to the buffer description module corresponding to the target buffer).
  • the description information of the cache may include at least one of the following:
  • the capacity of the buffer (byte length), the access address of the data cached in the buffer, whether it is an MPEG circular buffer, the number of storage links in the circular buffer, the index value of the media description module corresponding to the media file cached in the circular buffer, and the track index value of the source data of the data cached in the circular buffer.
  • the display engine obtains each cache slice description module in the cache slice list (bufferViews) of the scene description file (including: obtaining the cache slice description module corresponding to the cache slices of the target buffer from the cache slice list of the scene description file).
  • the display engine obtains description information of the cache slices of each cache according to the cache slice description module corresponding to the cache slice of each cache (including: obtaining description information of the cache slice of the target cache according to the cache slice description module corresponding to the cache slice of the target cache).
  • the description information of the cache slice may include at least one of the following:
  • the cache to which the cache slice belongs, the capacity of the cache slice, and the offset of the cache slice.
  • the display engine obtains each accessor description module in the accessor list (accessors) of the scene description file (including: obtaining the accessor description module corresponding to the target accessor for accessing the decoded data of the target media file from the accessor list of the scene description file).
  • the display engine obtains description information of each accessor according to the accessor description module corresponding to each accessor (including: obtaining description information of the target accessor for accessing the decoded data of the target media file according to the accessor description module corresponding to the target accessor).
  • the description information of the accessor may include at least one of the following:
  • the embodiments of the present application can send the description information of each cache, the description information of the cache slices of each cache, and the description information of each accessor to the media access function and the cache management module through the following solution 1.
  • the implementation of solution 1 includes the following steps a and b:
  • Step a: the display engine sends the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the media access function (including: the display engine sends the description information of the target buffer, the description information of the cache slice of the target buffer, and the description information of the target accessor to the media access function).
  • the media access function receives description information of each buffer and description information of the cache slices of each buffer sent by the display engine (including: the media access function receives description information of the target buffer, description information of the cache slices of the target buffer and description information of the accessor sent by the display engine).
  • the implementation of the above step a (the display engine sends the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor to the media access function) may be: the display engine sends the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor to the media access function through the media access function API.
  • the implementation method of the media access function receiving the description information of each buffer sent by the display engine can be: the media access function receives the description information of each buffer sent by the display engine, the description information of the cache slices of each buffer and the description information of each accessor through the media access function API.
  • Step b: the media access function sends the description information of each cache, the description information of the cache slice of each cache, and the description information of each accessor to the cache management module (including: the media access function sends the description information of the target cache, the description information of the cache slice of the target cache, and the description information of the target accessor to the cache management module).
  • the cache management module receives the description information of each cache, the description information of the cache slices of each cache, and the description information of each accessor sent by the media access function (including: the cache management module receives the description information of the target cache, the description information of the cache slices of the target cache, and the description information of the target accessor sent by the media access function).
  • the implementation of the above step b may include: the media access function sends the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the cache management module through the cache API.
  • the implementation of the cache management module receiving the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor sent by the media access function may include: the cache management module receives the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor sent by the media access function through the cache API.
  • another implementation of solution 1 includes the following steps c and d:
  • Step c: the display engine sends the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the media access function (including: the display engine sends the description information of the target buffer, the description information of the cache slice of the target buffer, and the description information of the target accessor to the media access function).
  • the media access function receives description information of each buffer, description information of the cache slices of each buffer, and description information of each accessor sent by the display engine (including: the media access function receives description information of the target buffer, description information of the cache slices of the target buffer, and description information of the accessor sent by the display engine).
  • Step d: the display engine sends the description information of each cache, the description information of the cache slice of each cache, and the description information of each accessor to the cache management module (including: the display engine sends the description information of the target cache, the description information of the cache slice of the target cache, and the description information of the target accessor to the cache management module).
  • the cache management module receives description information of each cache, description information of cache slices of each cache, and description information of each accessor sent by the display engine.
  • the implementation of the above step d (the display engine sends the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor to the cache management module) may be: the display engine sends the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor to the cache management module through the cache API.
  • the implementation method of the cache management module receiving the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor sent by the display engine may include: the cache management module receives the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor sent by the display engine through the cache API.
  • the embodiments of the present application can also send the description information of each cache, the description information of the cache slices of each cache, and the description information of each accessor to the media access function, and send the description information of each cache and the description information of the cache slices of each cache to the cache management module, through the following solution 2.
  • the implementation method of solution 2 (sending the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the media access function, and sending the description information of each buffer and the description information of the cache slice of each buffer to the cache management module) includes the following steps e and f:
  • Step e: the display engine sends the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the media access function (including: the display engine sends the description information of the target buffer, the description information of the cache slice of the target buffer, and the description information of the target accessor to the media access function).
  • the media access function receives description information of each buffer and description information of the cache slices of each buffer sent by the display engine (including: the media access function receives description information of the target buffer, description information of the cache slices of the target buffer and description information of the accessor sent by the display engine).
  • Step f: the display engine sends the description information of each buffer and the description information of the cache slices of each buffer to the cache management module (including: the display engine sends the description information of the target buffer and the description information of the cache slices of the target buffer to the cache management module).
  • another implementation of solution 2 (sending the description information of each buffer, the description information of the cache slices of each buffer, and the description information of each accessor to the media access function, and sending the description information of each buffer and the description information of the cache slices of each buffer to the cache management module) includes the following steps g and h:
  • Step g: the display engine sends the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor to the media access function (including: the display engine sends the description information of the target buffer, the description information of the cache slice of the target buffer, and the description information of the target accessor to the media access function).
  • the media access function receives description information of each buffer and description information of the cache slices of each buffer sent by the display engine (including: the media access function receives description information of the target buffer, description information of the cache slices of the target buffer and description information of the accessor sent by the display engine).
  • Step h: the media access function sends the description information of each buffer and the description information of the cache slices of each buffer to the cache management module (including: the media access function sends the description information of the target buffer and the description information of the cache slices of the target buffer to the cache management module).
  • the cache management module receives the description information of each cache and the description information of the cache slices of each cache sent by the media access function (including: the cache management module receives the description information of the target cache and the description information of the cache slices of the target cache sent by the media access function).
  • the cache management module creates each cache according to the description information of each cache (including: creating the target cache according to the description information of the target cache).
  • the media access function writes the decoded data corresponding to each media file into the buffer corresponding to each media file according to the description information of each buffer, the description information of the cache slice of each buffer, and the description information of each accessor (including: the media access function writes the decoded data of the target media file into the target buffer according to the description information of the target buffer, the description information of the cache slice of the target buffer, and the description information of the target accessor).
  • the display engine obtains a scene description module corresponding to the to-be-rendered three-dimensional scene from the scene list of the scene description file.
  • the display engine obtains description information of the three-dimensional scene to be rendered according to the scene description module corresponding to the three-dimensional scene to be rendered.
  • the display engine obtains the node description module corresponding to each node in the three-dimensional scene to be rendered from the node list of the scene description file according to the index value of the node description module corresponding to each node in the three-dimensional scene to be rendered.
  • the display engine obtains description information of each node in the three-dimensional scene to be rendered according to the node description module corresponding to each node in the three-dimensional scene to be rendered.
  • the description information of any node includes the index value of the mesh description module corresponding to the three-dimensional mesh mounted on the node.
  • the description information of any node also includes the name of the node.
  • the display engine obtains the mesh description module corresponding to the three-dimensional mesh in the three-dimensional scene to be rendered from the mesh list of the scene description file according to the index value of the mesh description module corresponding to the three-dimensional mesh mounted on each node in the three-dimensional scene to be rendered.
  • the display engine obtains the data types contained in the three-dimensional mesh in the three-dimensional scene to be rendered and the accessors for accessing the various types of data of each three-dimensional mesh in the three-dimensional scene to be rendered according to the mesh description module corresponding to the three-dimensional mesh in the three-dimensional scene to be rendered.
  • the method further includes: acquiring the name and topological structure type of the three-dimensional mesh in the three-dimensional scene to be rendered according to a mesh description module corresponding to the three-dimensional mesh in the three-dimensional scene to be rendered.
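The traversal described above — scene to nodes to meshes, collecting the data types of each mesh and the accessors that access them — can be sketched with a plausible glTF-style document. The layout and attribute names (POSITION, COLOR_0) follow glTF conventions, but the concrete example document is an invented illustration.

```python
# Walk scene -> nodes -> meshes and map each data type of each mesh to the
# index of the accessor used to access that data.
doc = {
    "scenes": [ { "nodes": [0] } ],
    "nodes": [ { "name": "point_cloud_node", "mesh": 0 } ],
    "meshes": [
        { "name": "gpcc_mesh",
          "primitives": [ { "mode": 0,   # points topology
                            "attributes": { "POSITION": 0, "COLOR_0": 1 } } ] }
    ],
}

def mesh_accessors(doc: dict, scene_index: int = 0) -> dict:
    """Collect attribute-name -> accessor-index pairs for all meshes in a scene."""
    result = {}
    for node_index in doc["scenes"][scene_index]["nodes"]:
        node = doc["nodes"][node_index]
        if "mesh" not in node:
            continue  # node with no three-dimensional mesh mounted
        mesh = doc["meshes"][node["mesh"]]
        for primitive in mesh["primitives"]:
            result.update(primitive["attributes"])
    return result

accessors = mesh_accessors(doc)
```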
  • the display engine creates each accessor according to the description information of each accessor (including creating an accessor for accessing each type of data of each three-dimensional mesh in the three-dimensional scene to be rendered according to the description information of the accessor for accessing each type of data of each three-dimensional mesh in the three-dimensional scene to be rendered).
  • the display engine reads the decoded data of each media file from the buffer corresponding to each media file through each accessor (including: reading various types of data of each three-dimensional mesh in the three-dimensional scene to be rendered from the target buffer through the accessors for accessing the various types of data of each three-dimensional mesh in the three-dimensional scene to be rendered).
  • the display engine renders the to-be-rendered 3D scene based on the decoded data of each media file.
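The final read step — using accessor metadata (offset, component type, element type, count) to pull typed data out of a buffer — can be illustrated as follows. `struct`-based unpacking here is a stand-in for whatever typed view a real display engine uses; the two example points are invented data, not output of an actual G-PCC pipeline.

```python
# Read typed decoded data from a buffer through accessor-style metadata:
# here, `count` VEC3 float32 elements starting at a byte offset.
import struct

def read_accessor(buffer: bytes, byte_offset: int, count: int) -> list:
    """Read `count` VEC3 float32 elements (12 bytes each) from the buffer."""
    out = []
    for i in range(count):
        start = byte_offset + i * 12  # 3 components * 4 bytes per float32
        out.append(struct.unpack_from("<3f", buffer, start))
    return out

# Two example points as the media access function might have written them.
buf = struct.pack("<6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
points = read_accessor(buf, 0, 2)
```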
  • some embodiments of the present application provide a device for generating a scene description file, the device for generating a scene description file comprising:
  • a memory configured to store a computer program
  • the processor is configured to, when calling the computer program, enable the scene description file generation device to implement the scene description file generation method described in any of the above embodiments.
  • some embodiments of the present application provide a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a computing device, the computing device implements the method for generating a scene description file described in any of the above embodiments.
  • some embodiments of the present application provide a computer program product, which, when executed on a computer, enables the computer to implement the method for generating a scene description file described in any of the above embodiments.


Abstract

Some embodiments of the present application provide a method and apparatus for generating a scene description file, relating to the technical field of video processing. The method includes: determining the type of a media file in a three-dimensional scene to be rendered; when the type of a target media file in the three-dimensional scene to be rendered is geometry-based point cloud compression (G-PCC) coded point cloud, generating a target description module corresponding to the target media file according to the description information of the target media file; and adding the target media description module to the media list of the MPEG media of the scene description file of the three-dimensional scene to be rendered.

Description

Method and apparatus for generating a scene description file
This application claims priority to Chinese patent application No. 202310036790.8, filed on January 10, 2023, and Chinese patent application No. 202310474240.4, filed on April 27, 2023, the contents of which are incorporated herein by reference.
Technical Field
Some embodiments of the present application relate to the technical field of video processing, and in particular to a method and apparatus for generating a scene description file.
Background
A point cloud is a collection of a massive number of three-dimensional points. The main compression standards for point clouds are geometry-based point cloud compression (G-PCC) and video-based point cloud compression (V-PCC).
With the development of immersive media and applications, the types of immersive media keep increasing. The current mainstream immersive media mainly include point clouds, three-dimensional meshes (Mesh), 6DoF panoramic video, MPEG Immersive Video (MIV), and the like. In a three-dimensional scene, multiple types of immersive media often coexist. This requires the rendering engine to support encoding and decoding of multiple different types of immersive media, and different kinds of rendering engines have emerged depending on the types and number of codecs they support. Rendering engines designed by different vendors support different media types. To achieve a cross-platform description of three-dimensional scenes composed of different kinds of media, the Moving Picture Experts Group (MPEG) launched the development of the MPEG scene description standard, with standard number ISO/IEC 23090-14. This standard mainly solves the problem of cross-platform description of MPEG media (including codecs, file formats, and transport mechanisms developed by MPEG) in three-dimensional scenes. The extensions made in the first edition of the ISO/IEC 23090-14 MPEG-I scene description standard have met the key requirements of immersive scene description solutions; however, the current scene description standard does not support media files of the G-PCC coded point cloud type. Point clouds are an important form of three-dimensional media, and G-PCC is one of the current mainstream point cloud compression algorithms, so supporting media files of the G-PCC coded point cloud type in the scene description framework is of great significance and value.
Summary
In a first aspect, some embodiments of the present application provide a method for generating a scene description file, including:
determining the type of a media file in a three-dimensional scene to be rendered;
when the type of a target media file in the three-dimensional scene to be rendered is geometry-based point cloud compression (G-PCC) coded point cloud, generating a target description module corresponding to the target media file according to the description information of the target media file;
adding the target media description module to the media list of the MPEG media of the scene description file of the three-dimensional scene to be rendered.
In a second aspect, some embodiments of the present application provide an apparatus for generating a scene description file, including:
a memory configured to store a computer program;
a processor configured to, when calling the computer program, enable the apparatus for generating a scene description file to implement the method for generating a scene description file described in the first aspect.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an immersive media description framework in some embodiments;
FIG. 2 is a schematic structural diagram of a scene description file in some embodiments;
FIG. 3 is a schematic structural diagram of a scene description file in other embodiments of the present application;
FIG. 4 is a schematic structural diagram of a G-PCC encoder in some embodiments;
FIG. 5 is a schematic diagram of an LOD division process in some embodiments;
FIG. 6 is a schematic diagram of a lifting transform process in some embodiments;
FIG. 7 is a schematic diagram of a RAHT transform process in some embodiments;
FIG. 8 is a schematic structural diagram of a G-PCC decoder in some embodiments;
FIG. 9 is a schematic structural diagram of a scene description file in other embodiments;
FIG. 10 is a schematic structural diagram of a scene description file in other embodiments;
FIG. 11 is a schematic diagram of a pipeline corresponding to a media file of the G-PCC coded point cloud type provided by some embodiments;
FIG. 12 is a flowchart of the steps of a method for generating a scene description file in some embodiments;
FIG. 13 is a flowchart of the steps of a method for parsing a scene description file in some embodiments;
FIG. 14 is a flowchart of the steps of a media file processing method in some embodiments;
FIG. 15 is a flowchart of the steps of a method for rendering a three-dimensional scene in some embodiments;
FIG. 16 is a flowchart of the steps of a cache management method in some embodiments;
FIG. 17 is an interaction flowchart of a method for rendering a three-dimensional scene in some embodiments.
Detailed Description
To make the purpose and implementations of the present application clearer, the exemplary implementations of the present application will be described clearly and completely below with reference to the accompanying drawings of the exemplary embodiments. Obviously, the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
It should be noted that the brief explanations of terms in the present application are only for the convenience of understanding the implementations described below, and are not intended to limit the implementations of the present application. Unless otherwise stated, these terms should be understood according to their ordinary and usual meanings.
The terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a product or device that includes a series of components is not necessarily limited to all the components clearly listed, but may include other components that are not clearly listed or that are inherent to such products or devices.
References in the specification to "some implementations", "some embodiments", and the like indicate that the described implementation or embodiment may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes that particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same implementation. In addition, when a particular feature, structure, or characteristic is described in connection with one embodiment, it is within the knowledge of those skilled in the art to implement such a feature, structure, or characteristic in connection with other implementations, whether or not explicitly described herein.
本申请一些实施例涉及沉浸式媒体的场景描述。参照图1所示的沉浸式媒体的场景描述框架,为了使显示引擎11能够专注于媒体渲染,沉浸式媒体的场景描述框架将媒体文件的访问与处理和媒体文件的渲染进行了解耦,且设计了媒体接入函数(Media Access Function,MAF)12来负责媒体文件的访问与处理功能。同时设计了媒体接入函数应用程序编程接口 (Application Programming Interface,API),显示引擎11与媒体接入函数12之间通过媒体接入函数API进行指令交互。显示引擎11可以通过媒体接入函数API向媒体接入函数12下达指令,媒体接入函数12也可以通过媒体接入函数API向显示引擎11请求指令。
沉浸式媒体的场景描述框架的一般工作流程包括:1)、显示引擎11获取沉浸式媒体服务方提供的场景描述文件(Scene Description Documents)。2)、显示引擎11解析场景描述文件,获取媒体文件的访问地址、媒体文件的属性信息(媒体类型及编解码参数等)和经过处理的媒体文件的格式要求等参数或信息,并调用媒体接入函数API将解析场景描述文件得到的全部或部分信息传递给媒体接入函数12。3)、媒体接入函数12根据显示引擎11传递的信息,从媒体资源服务器请求下载指定的媒体文件或者从本地获取指定的媒体文件,并为该媒体文件建立相应的管线(pipeline),随后在管线中对媒体文件进行解封装、解密、解码、后处理等处理,以将媒体文件从封装格式转换为显示引擎11规定的格式。4)、管线将完成所有处理得到的输出数据存放到指定的缓存中。5)、最后,显示引擎11在指定的缓存中读取经过完备处理的数据,并根据从缓存中读取的数据进行媒体文件的渲染。
以下进一步对沉浸式媒体的场景描述框架所涉及的文件和功能模块进行说明。
一、场景描述文件
在沉浸式媒体的场景描述框架的工作流程中,场景描述文件用于描述三维场景的结构(可用三维网格描述其特征)、纹理(如纹理贴图等)、动画(旋转、平移)、相机视点位置(渲染视角)等内容。
相关技术领域中,GL传输格式2.0(glTF2.0)已被确定为一种场景描述文件的候选格式,可以满足动态图像专家组-浸入式(MPEG-Immersive,MPEG-I)和六自由度(Degrees of Freedom,6DoF)应用的需求。例如,在github.com/KhronosGroup/glTF/tree/master/specification/2.0#specifying-extensions可获得的Khronos Group的GL传输格式(glTF)版本2.0中,描述了glTF2.0。参照图2所示,图2为glTF2.0场景描述标准(ISO/IEC 12113)中的场景描述文件的结构示意图。如图2所示,glTF2.0场景描述标准中的场景描述文件包括但不限于:场景描述模块(scene)201、节点描述模块(node)202、网格描述模块(mesh)203、访问器描述模块(accessor)204、缓存切片描述模块(bufferView)205、缓存器描述模块(buffer)206、相机描述模块(camera)207、光照描述模块(light)208、材质描述模块(material)209、纹理描述模块(texture)210、采样器描述模块(sampler)211以及纹理贴图描述模块(image)212、动画描述模块(animation)213、蒙皮描述模块(skin)214。
图2所示场景描述文件中的场景描述模块(scene)201用于描述场景描述文件中包含的三维场景。一个场景描述文件中可能包含任意数量的三维场景,每一个三维场景分别使用一个场景描述模块201进行表示。场景描述模块201与场景描述模块201之间为并列关系,即三维场景和三维场景之间为并列关系。
图2所示场景描述文件中的节点描述模块(node)202为场景描述模块201的下一层级描述模块,用于描述场景描述模块201所描述的三维场景中所包含的物体。每一个三维场景中都可能存在许多具体物体,比如虚拟数字人、近处的三维物体、远处的背景图片等,场景描述文件会通过节点描述模块202对这些具体物体进行描述。每个节点描述模块202可以表示 一个物体或者是数个物体组成的一组物体集合,节点描述模块202之间的关系反映了场景描述模块201所描述的三维场景中各个组成部分之间的关系,一个场景描述模块201描述的场景中可以包含一个或多个节点。多个节点之间可以是并列关系或是层级关系,即节点描述模块202之间存在包含与被包含的关系,这使得多个具体物体可以集合在一起进行描述,也可以分别对多个具体物体进行描述。如果一个节点被另一个节点包含,则被包含的节点称为子节点(children),对于子节点使用"children"替换"node"进行表示。灵活使用节点与子节点进行组合,就可以构成层级化的节点结构,从而表达丰富的场景内容。
图2所示场景描述文件中的网格描述模块(mesh)203为节点描述模块202的下一层级描述模块,用于描述节点描述模块202所代表的物体的特征。网格描述模块203为由一个或多个基元(primitives)组成的集合,每个基元又可以包括一个属性(attributes),基元的属性定义了图形处理器(Graphics Processing Unit,GPU)渲染时需要使用的属性。属性可以包括:position(三维坐标),normal(法向量),tangent(切向量),texcoord_n(纹理坐标),color_n(颜色:RGB或RGBA),joints_n(与蒙皮描述模块214相关的属性)和weights_n(与蒙皮描述模块214相关的属性)等。由于网格描述模块203中包含的顶点数量非常大,每个顶点又包含了多种属性信息,因此不便将媒体文件中包含的大量媒体数据直接存储在场景描述文件的网格描述模块203中,而是在场景描述文件中指出了媒体文件的访问地址(Uniform Resource Identifier,URI),当需要取用媒体文件中的数据时再进行下载即可,从而实现场景描述文件与媒体文件的分离。所以一般情况下网格描述模块203中不存储媒体数据,而是存储每一个属性对应的访问器描述模块(accessor)204的索引值,并通过访问器描述模块204指向缓存器(buffer)的缓存切片(bufferView)中相应的数据。
在一些实施例中,还可以将场景描述文件与媒体文件融合起来,形成一个二进制文件,从而减少文件的种类与数量。
此外,网格描述模块203的基元中还可能有模式(mode)这一语法元素。模式语法元素用于描述图形处理器(Graphics Processing Unit,GPU)绘制三维网格时的拓扑结构,比如mode=0代表散点,mode=1代表线,mode=4代表三角形等。
示例性的,以下为一个网格描述模块203的JSON示例:
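一个可能的形式如下(示意性片段,属性索引取值为假设;glTF2.0规范中基元属性名采用大写形式,如POSITION、COLOR_0):

```json
{
  "meshes": [
    {
      "primitives": [
        {
          "attributes": {
            "POSITION": 1,
            "COLOR_0": 2
          },
          "mode": 0
        }
      ]
    }
  ]
}
```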
在上述网格描述模块203中"position"的值为1,指向了索引为1的访问器描述模块204,并最终指向缓存器中存储的顶点坐标数据;"color_0"的值为2,指向了索引为2的访问器描述模块204,并最终指向了缓存器中存储的颜色数据。
网格描述模块203的基元的属性(mesh.primitives.attributes)中的语法元素的定义如下表1所示:
表1
网格描述模块203的基元的属性(mesh.primitives.attributes)中索引的访问器的类型的定义如下表2所示:
表2
网格描述模块203的基元的属性(mesh.primitives.attributes)中的数据类型的定义如下表3所示:
表3
图2所示场景描述文件中的访问器描述模块(accessor)204、缓存切片描述模块(bufferView)205以及缓存器描述模块(buffer)206共同实现网格描述模块203对媒体文件的数据的逐层精细化的索引。如上所述,网格描述模块203中不存储具体的媒体数据,而是存储对应的访问器描述模块204的索引值,并通过索引值所索引的访问器描述模块204描述的访问器来访问具体的媒体数据。网格描述模块203对媒体数据的索引过程包括:首先网格描述模块203中的语法元素声明的索引值会指向对应的访问器描述模块204;然后,访问器描述模块204又会指向对应的缓存切片描述模块205;最后,缓存切片描述模块205再指向对应的缓存器描述模块206。图2所示场景描述文件中的缓存器描述模块206主要负责指向对应的媒体文件,包含了媒体文件的URI,媒体文件的字节长度等信息,用于描述缓存媒体文件的媒体数据的缓存器。一个缓存器可以被划分为一个或多个缓存切片,缓存切片描述模块205主要负责对缓存器中的媒体数据进行部分访问,包含了访问数据的起始字节偏移与访问数据的字节长度等,通过缓存切片描述模块205以及缓存器描述模块206可以实现对媒体文件的数据的部分访问。访问器描述模块204主要负责为缓存切片描述模块205中划定的部分数据增添附加信息,比如数据类型,该类型数据的数量,该类型数据的数值范围等。这样的三层结构可以实现从一个媒体文件中取用部分数据的功能,有利于数据的精准取用,也便于减少媒体文件的数量。
图2所示场景描述文件中的相机描述模块(camera)207为节点描述模块202的下一层级描述模块,用于描述用户观看节点描述模块202所描述的物体时的视点、视角等与视觉观看相关的信息。为了使用户能够置身于三维场景之中,且能够观看三维场景,节点描述模块202还可以指向相机描述模块207,并通过相机描述模块207描述用户观看节点描述模块202所描述的物体时的视点、视角等与视觉观看相关的信息。
图2所示场景描述文件中的光照描述模块(light)208为节点描述模块202的下一层级描述模块,用于描述节点描述模块202所描述的物体的光照强度、环境光颜色、光照方向、光源位置等与光照相关的信息。
图2所示场景描述文件中的材质描述模块(material)209为网格描述模块203的下一层级描述模块,用于描述网格描述模块203所描述的三维物体的材质信息。在描述三维物体时,仅借助网格描述模块203描述三维物体的几何信息,或单调地定义三维物体颜色和/或位置,是无法提高三维物体的真实感的,这就需要在三维物体的表面附加更多的信息。对于三维网格模型等三维模型技术,这一过程也可以简称为纹理贴图或添加纹理等。glTF2.0场景描述标准中的场景描述文件也沿用了这一描述模块。材质描述模块209使用一组通用参数定义材质,以描述三维场景中出现的几何对象的材质信息。该材质描述模块209普遍使用金属-粗糙度模型(metallic-roughness)描述虚拟物体的材质,基于金属-粗糙度模型的材质特性参数采用广泛使用的基于物理的渲染(Physically Based Rendering,PBR)的材质表示。基于此,材质描述模块209对物体的金属-粗糙度材质属性做了详细的说明,材质描述模块209中的语法元素的定义如表4所示:
表4
在一些实施例中,材质描述模块209的金属-粗糙度(material.PbrMetarialRoughness)中的语法元素的定义如下表5所示:
表5
材质描述模块209的金属-粗糙度中的每个属性的值可以使用因子和/或纹理(例如baseColorTexture和baseColorFactor)定义。如果未给定纹理,则可以确定此材质模型中的所有相应纹理组件的值均为1.0。如果同时存在因子和纹理,则因子值充当相应纹理值的线性乘数。纹理绑定由纹理对象的索引和可选的纹理坐标索引定义。
示例性的,以下为一个材质描述模块209的JSON示例:
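一个可能的形式如下(依据下文解析内容给出的示意性片段):

```json
{
  "materials": [
    {
      "name": "gold",
      "pbrMetallicRoughness": {
        "baseColorFactor": [1.000, 0.766, 0.336, 1.0],
        "metallicFactor": 1.0,
        "roughnessFactor": 0.0
      }
    }
  ]
}
```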
解析上述材质描述模块209,可以通过材质名称语法元素及其值("name":"gold"),确定当前材质命名为"gold",再通过pbrMetallicRoughness数组下的颜色语法元素及其值("baseColorFactor":[1.000,0.766,0.336,1.0]),确定当前材质的基础颜色的值为[1.000,0.766,0.336,1.0],通过pbrMetallicRoughness数组下金属性语法元素及其值("metallicFactor":1.0),确定当前材质的金属性值为"1.0",通过pbrMetallicRoughness数组下的粗糙度语法元素及其值("roughnessFactor":0.0),确定当前材质的粗糙度值为"0.0"。
图2所示场景描述文件中的纹理描述模块(texture)210为材质描述模块209的下一层级描述模块,用于描述材质描述模块209所描述的三维物体的颜色以及材质定义中使用的其他特性。纹理是赋予对象真实外观的一个重要方面。通过纹理可以定义对象的主颜色以及材质定义中使用的其他特性,以便精确描述渲染对象的外观。材质本身可以定义多个纹理对象,这些纹理对象可以在渲染期间用作虚拟物体的纹理,并且可以用于编码不同的材质属性。纹理描述模块210使用采样器语法元素和纹理贴图语法元素索引来引用一个采样器描述模块(sampler)211和一个纹理贴图描述模块(image)212。纹理贴图描述模块212包含一个统一资源标识符(Uniform Resource Identifier,URI),该URI链接到纹理描述模块210实际使用的纹理贴图或二进制文件包。而采样器描述模块211则是用于描述纹理的过滤和包装模式。材质描述模块209、纹理描述模块210、采样器描述模块211以及纹理贴图描述模块212各自的职责与协作关系包括:材质描述模块209与纹理描述模块210一起定义了物体表面的颜色和物理信息。采样器描述模块211定义了如何将纹理贴图贴在物体表面。纹理描述模块210指定了 采样器描述模块211和纹理贴图描述模块212,通过纹理贴图描述模块212实现了添加纹理,而纹理贴图描述模块212则使用URI进行标识和索引,并使用访问器描述模块204进行数据的访问。采样器描述模块211则实现了纹理的具体调整和包装。纹理描述模块210中的语法元素的定义如下表6所示:
表6
在一些实施例中,纹理描述模块210的sample(texture.sample)中的语法元素的定义如下表7所示:
表7
示例性的,以下为一个材质描述模块209、纹理描述模块210、采样器描述模块211以及纹理贴图描述模块212的JSON示例:
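一种可能的组合形式如下(示意性片段,其中texture.png等取值为假设):

```json
{
  "materials": [
    {
      "pbrMetallicRoughness": {
        "baseColorTexture": { "index": 0 },
        "metallicFactor": 0.0,
        "roughnessFactor": 1.0
      }
    }
  ],
  "textures": [
    { "sampler": 0, "source": 0 }
  ],
  "samplers": [
    { "magFilter": 9729, "minFilter": 9987, "wrapS": 33648, "wrapT": 33648 }
  ],
  "images": [
    { "uri": "texture.png" }
  ]
}
```

其中,纹理描述模块通过sampler与source两个索引分别引用采样器描述模块与纹理贴图描述模块;magFilter的取值9729对应LINEAR,minFilter的取值9987对应LINEAR_MIPMAP_LINEAR,wrapS/wrapT的取值33648对应MIRRORED_REPEAT。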

图2所示场景描述文件中的动画描述模块(animation)213为节点描述模块202的下一层级描述模块,用于描述为节点描述模块202所描述的物体添加的动画信息。为了让节点描述模块202所代表的物体不被限制在静止的状态,可以为节点描述模块202所描述的物体增添动画,因此场景描述文件中的动画描述模块213这一描述层级是被节点描述模块202所指定的,即动画描述模块213是节点描述模块202的下一层级描述模块,动画描述模块213同样与网格描述模块203具有对应关系。动画描述模块213可通过位置移动、角度旋转、大小缩放三种方式进行动画的描述,同时可以规定动画的开始、结束时间以及动画的实现方式。比如为一个代表三维物体的网格描述模块203增添一个动画,则该网格描述模块203所代表的三维物体就可以在指定的时间窗口内,通过位置移动、角度旋转、大小缩放的融合,完成规定的动画过程。
图2所示场景描述文件中的蒙皮描述模块(skin)214为节点描述模块202的下一层级描述模块,用于描述为节点描述模块202所描述的节点所添加的骨骼与代表物体表面信息的网格之间的运动协作关系。当节点描述模块202描述的节点代表了人、动物、机械等运动自由度较大的物体时,为了优化这些物体的运动表现效果,可以向物体的内部填充骨骼,代表物体表面信息的三维网格此时在概念上就成为了蒙皮。蒙皮描述模块214这一描述层级是被节点描述模块202所指定的,即蒙皮描述模块214是节点描述模块202的下一层级描述模块,蒙皮描述模块214与网格描述模块203具有对应关系。以骨骼的运动带动物体表面的网格进行运动,再结合仿真仿生的设计,可以实现较为逼真的运动效果,比如当人的手进行握拳动作时,表面的皮肤会随着内部的骨骼发生拉伸、遮盖等变化,此时在手部模型内预先填充骨骼,再定义骨骼与蒙皮之间的协作关系,就可以实现对这一动作的逼真模拟。
上述glTF2.0场景描述标准中的场景描述文件各个描述模块只是具备了最基本的描述三维物体的能力,存在不能支持动态的三维沉浸式媒体,不支持音频文件,不支持场景更新等问题。glTF同时也声明了其每个对象属性下都有一个可选的扩展对象属性(extensions),允许在其任何部分使用extensions进行扩展以达到更完善的功能。包括场景描述模块(scene)、节点描述模块(node)、网格描述模块(mesh)、访问器描述模块(accessor)、缓存描述模块(buffer)、动画描述模块(animation)等及其这些内部定义的语法元素都有可选的扩展对象属性,以支持在glTF2.0的基础上进行一定的功能扩展。
目前不同供应商设计的渲染引擎支持的媒体类型不同,为了实现对不同种类媒体所组成的三维场景的跨平台描述,动态图像专家组(Moving Picture Experts Group,MPEG)启动了MPEG场景描述标准的制订,标准号为ISO/IEC 23090-14。该标准主要解决MPEG media(包括MPEG制订的编解码器、MPEG文件格式、MPEG传输机制)在三维场景中的跨平台描述问题。
MPEG#128次会议决议以glTF2.0(ISO/IEC 12113)作为基础,制订MPEG-I Scene Description标准。目前已制订出第一版的MPEG场景描述标准,处于FDIS投票阶段。MPEG场景描述标准在第一版标准的基础上,通过添加相应扩展,以解决三维场景跨平台描述中仍未实现的需求,包括交互性、AR锚定、用户和化身表示、触觉支持,以及扩展对沉浸式媒体编解码器的支持等。
已制订出的第一版MPEG场景描述标准主要制订了以下内容:
1)MPEG场景描述标准定义了一种用于描述沉浸式三维场景的场景描述文件格式,此格式结合了原来glTF2.0(ISO/IEC 12113)的内容并在其基础上进行了一系列扩展。
2)MPEG场景描述定义了一个场景描述框架以及其中用于模块间协作的应用程序编程接口(Application Program Interface,API),实现了沉浸式媒体的获取与处理过程同媒体渲染过程的解耦,有益于实现进行沉浸式媒体对不同网络条件的适应、部分获取沉浸式媒体文件、针对沉浸式媒体的不同细节层次的访问以及内容质量的调整等方面的优化。沉浸式媒体的获取与处理过程同沉浸式媒体渲染过程的解耦是实现三维场景跨平台描述的关键。
3)MPEG场景描述提出了一系列基于国际标准化组织基本媒体文件格式(International Standardization Organization Base Media File Format,ISOBMFF)(ISO/IEC 14496-12)的扩展用于传输沉浸式媒体内容。
参照图3所示,在图2所示场景描述文件的基础上MPEG场景描述标准中对场景描述文件进行了扩展,相比于glTF2.0场景描述标准中的场景描述文件(图2所示场景描述文件),MPEG场景描述标准中的场景描述文件的扩展可以分为两组:
第一组扩展包括:MPEG媒体(MPEG_media)301、MPEG时变访问器(MPEG_accessor_timed)302以及MPEG环形缓存器(MPEG_buffer_circular)303。其中,MPEG媒体301为独立扩展,用于引用外部媒体源;MPEG时变访问器302为访问器层级的扩展,用于访问时变媒体;MPEG环形缓存器为缓存器层级的扩展用于支持循环缓存器。第一组扩展提供了对场景中媒体的基本描述和格式,满足了在场景描述框架中描述时变的沉浸式媒体的基本需求。其中,MPEG时变访问器(MPEG_accessor_timed)302是用来访问时变媒体的。由于glTF2.0的场景描述标准中不支持时变媒体,因此在需要媒体数据随时间变化时,需要通过更新glTF2.0的场景描述标准下的场景描述文件来实现。例如:在glTF2.0的场景描述标准中需要更新物体表面的纹理贴图,使物体表面的纹理贴图能够随着时间而变化,则必须要更新glTF2.0的场景描述标准下的场景描述文件。而频繁更新场景描述文件需要频繁解析、处理和传输场景描述文件,增加三维场景渲染过程中的性能开销。基于此,MPEG设计了MPEG时变访问器(MPEG_accessor_timed)302,MPEG时变访问器内的参数可以随时间变化,以改变媒体数据的访问方式,实现了访问到的数据随时间变化,从而避免了频繁解析、处理和传输场景描述文件。
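示例性的,一个使用MPEG时变访问器扩展的访问器描述模块可以示意如下(示意性片段,各索引值为假设,具体语法以ISO/IEC 23090-14的定义为准):

```json
{
  "accessors": [
    {
      "componentType": 5126,
      "type": "VEC3",
      "count": 1000,
      "extensions": {
        "MPEG_accessor_timed": {
          "bufferView": 0,
          "immutable": true
        }
      }
    }
  ]
}
```

其中,MPEG_accessor_timed内的bufferView指向承载时变数据的缓存切片,使访问器的参数可以随缓存中数据的更新而变化。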
第二组扩展包括:MPEG动态场景(MPEG_scene_dynamic)304、MPEG纹理(MPEG_texture_video)305、MPEG音频空间(MPEG_audio_spatial)306、MPEG视角推荐(MPEG_viewport_recommended)307、MPEG网格映射(MPEG_mesh_linking)308以及MPEG动画时间(MPEG_animation_timing)309。其中,MPEG_scene_dynamic 304为场景层级扩展,用于支持动态场景更新;MPEG_texture_video 305为纹理层级扩展,用于支持视频形式的纹理;MPEG_audio_spatial 306,为节点层级和相机层级扩展,用于支持空间3D音频,MPEG_viewport_recommended 307为场景层级扩展,用于支持在二维显示时描述推荐视角,MPEG_mesh_linking 308为网格层级扩展,用于支持链接两个网格并提供映射信息;MPEG_animation_timing 309为场景层级扩展,用于支持控制动画时间线。
下面将对上述各个扩展部分进行详细展开说明:
MPEG场景描述文件中的MPEG媒体用于对媒体文件的类型进行描述,并对MPEG类型的媒体文件进行必要的说明,以便后续取用这些MPEG类型的媒体文件。其中,MPEG媒体的第一层级的语法元素的定义如下表8所示:
表8
MPEG媒体的媒体列表(MPEG_media.media)中的语法元素的定义如下表9所示:
表9
MPEG媒体的媒体列表中的可选项(MPEG_media.alternatives)中的语法元素的定义如下表10所示:
表10
MPEG媒体的媒体列表的可选项中的轨道数组(MPEG_media.alternatives.tracks)中的语法元素的定义如下表11所示:
表11
此外,基于ISOBMFF(ISO/IEC 14496-12),ISO/IEC 23090-14还定义了与场景描述文件的交付以及与glTF 2.0扩展相关的数据交付的传输格式。为了便于将场景描述文件交付给客户端,ISO/IEC 23090-14定义了如何将glTF文件和相关数据作为非时变和时变的数据(例如,作为轨道样本)封装在ISOBMFF文件中。MPEG_scene_dynamic、MPEG_mesh_linking、MPEG_animation_timing向显示引擎提供了特定形式的时变数据,显示引擎11应根据这些变化的信息进行相应的操作。ISO/IEC 23090-14还定义了每个扩展的时变数据的格式,以及将其封装在ISOBMFF文件中的方式。MPEG媒体(MPEG_media)允许引用通过RTP/SRTP、MPEG-DASH等协议传递的外部媒体流。为了允许在不知道实际协议方案、主机名或端口值的情况下对媒体流进行寻址,ISO/IEC 23090-14定义了一个新的统一资源定位符(Uniform Resource Locator,URL)方案。该方案要求在查询部分中存在一个流标识符,但没有指定特定类型的标识符,允许使用媒体流标识方案(Media Stream Identification scheme,RFC5888)、标记方案(labeling scheme,RFC4575)或基于0的索引方案。
二、显示引擎
参照图1所示,在沉浸式媒体的场景描述框架的工作流程中,显示引擎11的主要作用包括获取场景描述文件,并对获取到的场景描述文件进行解析,以获取待渲染三维场景的组成结构和待渲染三维场景中的细节信息,以及根据解析场景描述文件得到的信息进行待渲染三维场景的渲染与展示。本申请实施例中不对显示引擎11具体工作流程和原理进行限制,以显示引擎11能够解析场景描述文档,并通过媒体接入函数API向媒体接入函数12下达指令,通过缓存API向缓存管理模块13下达指令,以及从缓存中取用经过处理的数据并完成三维场景及其中物体的渲染与展示为准。
三、媒体接入函数
在沉浸式媒体的场景描述框架的工作流程中,媒体接入函数12可以接收来自显示引擎11的指令,并根据显示引擎11发送的指令完成媒体文件的访问与处理功能。具体包括:获取到媒体文件后,对媒体文件进行处理。不同类型的媒体文件的处理过程存在较大差异,为了实现广泛的媒体类型支持,也考虑到媒体接入函数的工作效率,于是在媒体接入函数中设计了多种管线,在处理过程中启用匹配于媒体类型的管线即可。
管线的输入是从服务器下载的媒体文件或本地存储控件读取的媒体文件,这些媒体文件往往具有较为复杂的结构,无法直接被显示引擎11使用,所以管线的主要功能就是对这种媒体文件的数据进行处理,使媒体文件的数据符合显示引擎11的要求。
在沉浸式媒体的场景描述框架的工作流程中,管线处理完成的媒体数据需要以规范的排列结构交付给显示引擎11使用,这就需要缓存API与缓存管理模块13的参与,缓存API与缓存管理模块13实现了根据经过处理的媒体数据的格式,创建相应的缓存,并负责对缓存的后续管理,比如更新、释放等操作。缓存管理模块13可以通过缓存API与媒体接入函数12进行通信,也可以与显示引擎11进行通信,与显示引擎11和/或媒体接入函数12通信的目标都是实现缓存的管理。当缓存管理模块13与媒体接入函数12进行通信时,需要显示引擎11将缓存管理的相关指令先通过媒体接入函数API发送给媒体接入函数12,媒体接入函数12再将缓存管理的相关指令通过缓存API发送给缓存管理模块13。当缓存管理模块13与显示引擎11通信时,只需要显示引擎11将场景描述文档中解析出来的缓存管理描述信息直接通过缓存API发送给缓存管理模块13即可。
上述实施例介绍场景描述框架渲染包括沉浸式媒体的三维场景的基本流程,以及场景描述框架中各个功能模块或文件的内容及作用。三维场景中的沉浸式媒体可以为基于点云媒体文件、基于三维网格的媒体文件、基于6DoF的媒体文件、MIV媒体文件等。本申请一些实施例涉及基于场景描述框架渲染包括点云的三维场景,因此以下首先对点云相关内容进行说明。
点云是指海量三维点的集合,在获取物体表面每个采样点的空间坐标后,得到的是一个点的集合,称之为点云。点云中的点除了几何坐标以外,还可以包括一些其它属性信息,比如颜色,法向量,反射率,透明度,材质类型等。点云可以由多种方式得到。在一些实施例中,获取点云的实现方式包括:使用空间中已知固定位置的相机阵列来观察一个物体,并用相机阵列中拍摄得到的二维图像使用一些相关的算法得到物体的三维表示,从而获取物体对应的点云。在另一些实施例中,获取点云的实现方式包括:使用激光雷达扫描设备获取物体对应的点云。激光雷达扫描设备的传感器会记录雷达发出的电磁波被物体表面反射的电磁波,从而得到物体体积信息,以及根据物体体积信息获取物体对应的点云。在另一些实施例中,获取点云的实现方式还可以包括:通过使用人工智能或计算机视觉算法,根据二维图像创建三维体积信息,从而获取物体对应的点云。
点云为物理世界精细数字化提供了高精度的三维表达方式,广泛应用于三维建模、智慧城市、自主导航系统、增强现实等领域,但由于数据海量、非结构化、密度不均等特点,点云的存储和传输面临巨大的挑战。所以需要对点云进行高效的压缩,目前针对点云的压缩标准主要有基于几何的点云压缩(Geometry-based Point Cloud Compression,G-PCC)和基于视频的点云压缩(Video-based Point Cloud Compression,V-PCC)两种。以下进一步对G-PCC的原理及相关算法进行说明。
参照图4所示,G-PCC编码器400可以分为几何编码模块41和属性编码模块42两部分,几何编码模块41又可以进一步分为基于八叉树(Octree)的几何编码单元411和基于预测树的几何编码单元412。
如图4所示,G-PCC编码器的几何编码模块41编码待编码点云的几何信息的主要编码步骤包括:S401、提取待编码点云中的几何信息(positions);S402、对几何信息进行坐标转换,使待编码点云全都包含在一个包围盒(bounding box)中;S403、对坐标转换后的几何信息进行体素化。即首先对坐标转换后的几何信息进行量化,以对待编码点云进行缩放。由于量化取整,会使得待编码点云中的部分点的位置相同,因此对坐标转换后的几何信息进行量化还需要根据参数来决定是否移除重复点,而量化和移除重复点这一过程称为体素化过程。完成几何信息的体素化之后,分别通过基于八叉树的几何编码单元411和基于预测树的几何编码单元412进行编码,以得到待编码点云的几何信息码流。
基于八叉树的几何编码单元411的编码流程包括:S404、树划分;包括:按照广度优先遍历(Breath First Search)的顺序不断对包围盒进行树划分(八叉树/四叉树/二叉树),并对每个节点的占位码进行编码。即,将包围盒依次划分得到子立方体,对非空的(包含点云中的点)的子立方体继续进行划分,直到划分得到的叶子结点为1x1x1的单位立方体时停止。其次对叶子节点中所包含的点数进行编码,最终完成几何八叉树的编码,生成二进制码流。S405、基于三角池(trisoup,triangle soup)对几何信息进行表面拟合。表面拟合同样也先进行八叉树划分,但不需要将待编码点云逐级划分到边长为1x1x1的单位立方体,而是划分到子块(block)的边长为预设值时停止划分,然后基于每个子块中点云的分布所形成的表面,得到该表面与子块的十二条边所产生的至多十二个交点(vertex),依次编码每个子块的交点坐标,生成二进制码流。
基于预测树的几何编码单元412的编码流程包括:S406、构建预测树结构。包括:对待编码点云中的点进行排序,排序方式包括:无序、莫顿序、方位角序和径向距离序等方式,以及利用两种不同的方式(高时延慢速方式和低时延快速方式)构建预测树结构。S407、基于预测树的结构,遍历预测树中的每个节点,通过选取不同的预测模式对节点的几何位置信息进行预测得到预测残差,并且利用量化参数对几何预测残差进行量化。S408、算术编码;包括:通过不断迭代,对预测树节点位置信息的预测残差、预测树结构以及量化参数等进行算术编码,生成二进制几何信息码流。
如图4所示,G-PCC编码器属性编码模块42编码待编码点云的属性信息的流程主要包括:S408、提取待编码点云中的属性信息(attributes);S409、对属性信息进行属性预测;S410、对属性信息进行提升(Lifting)变换;S411、对属性信息进行区域自适应层次变换(Region Adaptive Hierarchical Transform,RAHT)变换;S412、对RAHT变换的系数和提升变换的系数进行量化;S413、对量化后的RAHT变换的系数和提升变换的系数进行算术编码,以得到属性信息码流。此外,由于属性编码模块42是基于重建的几何信息进行处理的,因此在有损几何编码完成后,需要执行如下步骤S414和S415:步骤S414、根据几何码流重建几何信息,以及对原始属性信息(attributes)和重建几何信息进行匹配。步骤S415、对几何信息进行重新着色。其中,步骤S415中的重新着色部分就是利用原始点云为重建点云赋予属性信息,目标是为了使重建点云的属性值与待编码点云的属性值尽可能的相似,使误差最小化。
属性预测算法是一种利用三维空间中已经重建的点的重建属性值加权求和得到当前待预测点的预测属性值的算法。属性预测算法可以有效的去除属性空间冗余,从而达到压缩属性信息的目的。在一些实施例中,属性预测的实现方式可以包括:首先通过层次细节(Level of Detail,LOD)算法对待编码点云进行层次划分,建立待编码点云的层级结构。其次,对低层级的点先进行编解码以及利用低层级的点和同一层级已重建的点对高层级的点进行预测,从而实现渐进式编码。其中,通过LOD算法对待编码点云进行层次划分的实现方式可以包括:首先将待编码点云中的所有点均标记为未访问,并将已访问点集表示为V,初始状态时,已访问点集V为空。循环遍历待编码点云中的所有未访问的点,计算当前点到访问点集V的最小距离D,如果D小于阈值距离则忽略当前点,否则将当前点标记为已访问,并添加到已访问点集V和当前子空间中。最后合并各个子空间和各个子空间之前的全部子空间中的点,得到待编码点云的层级结构。
示例性的,参照图5所示,图5中以待编码点云包括点P0~P9为例示出。基于距离的LOD划分过程,在第一次循环遍历时,依次将点P0、P2、P4、P5添加到了已访问点集V和层级R0中,在第二次循环遍历时,依次将点P1、P3、P8添加到了已访问点集V和层级R1,在第三次循环遍历时,完成了全部点的遍历,且依次将点P6、P7、P9添加到了已访问点集V和层级R2,最后合并各个层级和各个层级之前的全部层级中的点,得到待编码点云的层级结构包括三个层级。其中,第一层级为LOD0,包括:点P0、P2、P4、P5;第二层级为LOD1,包括:点P0、P2、P4、P5、P1、P3、P8;第三层级为LOD2,包括P0、P2、P4、P5、P1、P3、P8、P6、P7、P9。
提升变换建立在预测变换之上,包含了分割、预测和更新三个部分。参照图6所示,分割模块61将待编码点云进行空间分割为高层次点云H(N)和低层次点云L(N)两部分,高层次点云H(N)和低层次点云L(N)之间有一定的相关性。预测模块62利用低层次点云L(N)的属性对高层次点云H(N)进行属性预测,得到预测残差为:D(N)=H(N)-P(N),其中,P(N)为预测模块62对低层次点云L(N)进行预测后输出的特征。在分割模块61和预测模块62的过程中,由于LOD划分中的预测策略使得较低层LOD层中的点更有影响力,因此更新模块63基于预测残差D(N)、预测点与其邻近点之间的距离来定义和递归更新每个点的影响权重。其中,递归更新是指多次进行提升变换,且上一次提升变换的输出数据为下一次提升变换的输入数据;基于预测残差D(N)、预测点与其邻近点之间的距离来定义和递归更新每个点的影响权重,包括:基于预测残差D(N)、预测点与其邻近点之间的距离以及公式L′(N)=L(N)+U(N)定义和递归更新每个点的影响权重;U(N)为更新模块63对预测残差D(N)进行预测后输出的特征。
RAHT变换是一种基于哈尔小波变换的分层区域自适应变换算法。基于分层的树结构,在同一个父结点中对占用子节点以自下而上的方式沿着每一个维度进行递归变换,将变换得到的低频系数传递给变换过程的下一级,以及对高频系数进行量化和熵编码。
在一些实施例中,可以通过基于上采样预测的RAHT变换实现上述RAHT变换。在基于上采样预测的RAHT变换中,RAHT变换整体树结构由自下而上改为自上而下,变换仍在2×2×2的块内进行。参照图7所示,在一个2×2×2的块内,变换流程包括:首先在第一方向上对体素块71进行RAHT变换。若第一方向上存在相邻的体素块,则二者进行RAHT变换,得到相邻两点属性值的加权平均值(DC系数)与残差(AC系数)。其中,得到的DC系数作为父节点的体素块72的属性信息存在,并进行下一层的RAHT变换;而AC系数保留起来,用于最后的编码。若不存在相邻点,则将该体素块71的属性值直接传递给第二层父节点。第二层RAHT变换时,沿第二方向进行,若第二方向上存在相邻体素块,二者进行RAHT变换,并得到相邻两点属性值的加权平均值(DC系数)与残差(AC系数)。之后,第三层RAHT变换沿第三方向进行,并得到三种颜色深度相间的父节点体素块73作为八叉树中下一层的子节点,再沿第一方向、第二方向、第三方向循环进行RAHT变换,直至整个待编码点云只存在一个父节点为止。
参照图8所示,G-PCC解码器800可以分为几何解码模块81和属性解码模块82,几何解码模块81又可以进一步分为基于八叉树的几何解码单元811以及基于预测树的几何解码单元812。
如图8所示,G-PCC解码器通过几何解码模块81的基于八叉树的几何解码单元811对几何信息码流进行解码的主要步骤包括:S801、算术解码;S802、八叉树合成;S803、表面拟合;S804、重建几何;S805、逆坐标转换的步骤,得到点云的几何信息。其中,基于八叉树的几何解码单元811的几何解码包括:按照广度优先遍历的顺序,通过不断解析得到每个节点的占位码,并且依次不断划分节点,直至划分得到1x1x1的单位立方体时停止划分,解析得到每个叶子节点中包含的点数,最终恢复得到几何重构点云信息。G-PCC解码器通过几何解码模块81的基于预测树的几何解码单元812对几何信息码流进行解码的主要步骤包括:S801、算术解码;S806、重构预测树;S807、残差计算;S804、重建几何;S805、逆坐标转换的步骤,得到点云的几何信息。基于G-PCC解码器800的属性解码模块82进行属性解码的主要步骤包括:S808、算术解码;S809、反量化;执行步骤S810和S811,或者执行步骤S812;S810、进行属性预测;S811、提升变换;S812、基于RAHT的逆变换;S813、进行颜色反变换得到点云的属性信息。最后,基于几何信息和属性信息还原待编码的点云数据的三维图像模型。G-PCC解码器基于属性解码模块82对属性信息码流进行解码的主要步骤与G-PCC编码器基于属性编码模块42对属性信息进行编码的主要步骤互为逆过程,在此不再赘述。
目前,第一版ISO/IEC 23090-14 MPEG-I场景描述标准所做的扩展已满足了沉浸式场景描述解决方案的关键需求,现在致力于解决与虚拟场景的交互、AR锚定、用户虚拟人的表示、触觉支持以及对沉浸式编解码器的支持等需求。点云是3D环境中一种重要的沉浸式三维媒体形式,因此在场景描述标准中支持点云媒体的表示是场景描述的重要内容,基于几何的点云压缩算法(G-PCC)是目前主流的点云压缩算法之一,在场景描述中支持类型为G-PCC编码点云的媒体文件具有重要意义和价值。
本申请一些实施例提供了一种支持包括由G-PCC压缩标准得到的点云码流在内的场景描述框架,具体内容包括:场景描述文件对类型为G-PCC编码点云的媒体文件的支持、媒体接入函数API对类型为G-PCC编码点云的媒体文件的支持、媒体接入函数对类型为G-PCC编码点云的媒体文件的支持、缓存API对类型为G-PCC编码点云的媒体文件的支持、缓存管理对类型为G-PCC编码点云的媒体文件的支持等内容。
基于场景描述框架渲染三维场景中的类型为G-PCC编码点云的媒体文件的过程包括:首 先,显示引擎通过下载或者本地读取等方式获取场景描述文件。其中,场景描述文件包含了对整个三维场景以及场景中包含的该类型为G-PCC编码点云的媒体文件的描述信息,该类型为G-PCC编码点云的媒体文件的描述信息可能包含了类型为G-PCC编码点云的媒体文件的访问地址、经过处理的类型为G-PCC编码点云的媒体文件的解码数据的存储格式、类型为G-PCC编码点云的媒体文件的播放时间、播放帧率等。显示引擎解析场景描述文件后,通过媒体接入函数API将场景描述中包含的类型为G-PCC编码点云的媒体文件的描述信息传递给媒体接入函数。同时,显示引擎通过缓存API调用缓存管理模块分配缓存,也可将缓存信息传递给媒体接入函数,由媒体接入函数通过缓存API调用缓存管理模块分配缓存。媒体接入函数接收到显示引擎传递的描述信息后,首先向服务器请求下载该类型为G-PCC编码点云的媒体文件,或者从本地文件中读取类型为G-PCC编码点云的媒体文件。在获取类型为G-PCC编码点云的媒体文件之后,媒体接入函数创建并启动对应的管线对类型为G-PCC编码点云的媒体文件进行处理。管线的输入是类型为G-PCC编码点云的媒体文件的封装文件,管线依次进行解封装、G-PCC解码、后处理等过程,随后将处理完毕的数据存入指定的缓存。最终,显示引擎从指定缓存中获取类型为G-PCC编码点云的媒体文件的解码数据,以及根据缓存中获取的数据进行三维场景的渲染和显示。
以下分别对支持类型为G-PCC编码点云的媒体文件的场景描述文件、媒体接入函数API、媒体接入函数、缓存API、缓存管理进行说明。
一、支持类型为G-PCC编码点云的媒体文件的场景描述文件
为了使场景描述文件能够正确地描述类型为G-PCC编码点云的媒体文件,本申请一些实施例对场景描述文件的MPEG媒体(MPEG_media)内的语法元素的取值进行了扩展,具体扩展包括以下至少一项:
扩展1、对场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)的可选项(MPEG_media.media.alternatives)中用于声明媒体文件的封装格式的媒体类型语法元素(MPEG_media.media.alternatives.mimeType)进行了扩展。对媒体类型语法元素(mimeType)的扩展包括:为媒体类型语法元素(mimeType)扩展与G-PCC编码点云相关联的值"application/mp4"。当媒体文件的类型为G-PCC编码点云时,使媒体类型语法元素(mimeType)的取值为"application/mp4"。例如:mimeType:application/mp4。
扩展2、对场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)的可选项的轨道数组(MPEG_media.media.alternatives.tracks)中用于声明媒体文件的轨道信息的第一轨道索引语法元素(MPEG_media.media.alternatives.tracks.track)的值进行扩展。对第一轨道索引语法元素(MPEG_media.media.alternatives.tracks.track)的扩展包括:当G-PCC数据作为MPEG媒体的媒体列表的可选项的轨道数组中的一项被场景描述文件引用并且被引用的项符合国际标准化组织基本媒体文件格式(International Standardization Organization Base Media File Format,ISOBMFF)中关于track的规定时:对于单轨道封装的G-PCC数据,在MPEG媒体中引用的轨道为G-PCC码流轨道,对于多轨道封装的G-PCC数据,在MPEG媒体中引用的轨道为G-PCC几何码流轨道。
扩展3、对场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)的可选项(alternatives)的轨道数组(tracks)中用于说明码流轨道中包含的媒体数据的编解码参数的编解码参数语法元素(MPEG_media.media.alternatives.tracks.codecs)进行扩展。具体扩展包括:扩展在IETF RFC 6381中定义的,包含在码流轨道中的媒体文件的编解码参数。当码流轨道包含多个不同类型的编解码参数(例如:采用DASH封装G-PCC编码点云时,AdaptationSet包含具有不同编解码器的表示)时,编解码参数语法元素(codecs)可以用逗号分隔的编解码器值列表来表示,因此对编解码参数语法元素(codecs)的值的扩展包括:当媒体文件的类型为G-PCC编码点云时,编解码参数语法元素(codecs)的值应遵循ISO/IEC 23090-18 G-PCC数据传输(Carriage of Geometry-based Point Cloud Compression Data)标准中的规定进行设置。例如:在G-PCC数据采用DASH封装时,当媒体展示描述(Media Presentation Description,MPD)文件中使用G-PCC预选信令时,预选信令的"codecs"属性应设置为'gpc1',表示预选媒体是基于几何的点云;当G-PCC容器中存在多个G-PCC Tile轨道时,Main G-PCC Adaptation Set的"codecs"属性应被设置为'gpcb'或'gpeb',表示适配集包含G-PCC Tile基本轨道数据。当Tile Component Adaptation Sets只向单个G-PCC组件数据发送信号时,Main G-PCC Adaptation Set的"codecs"属性应被设置为'gpcb'。当Tile Component Adaptation Sets向所有G-PCC组件数据发送信号时,Main G-PCC Adaptation Set的"codecs"属性应被设置为'gpeb'。当MPD文件中使用G-PCC Tile预选信令时,预选信令的"codecs"属性应设置为'gpt1',表示预选媒体是基于几何的点云碎片。
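基于上述扩展1至扩展3,MPEG媒体中一个类型为G-PCC编码点云的媒体文件的声明可以示意如下(示意性片段,uri、轨道索引等取值为假设):

```json
{
  "extensions": {
    "MPEG_media": {
      "media": [
        {
          "name": "G-PCCexample",
          "autoplay": true,
          "loop": true,
          "alternatives": [
            {
              "mimeType": "application/mp4",
              "uri": "http://www.example.com/G-PCCexample.mp4",
              "tracks": [
                { "track": "#trackIndex=1", "codecs": "gpc1" }
              ]
            }
          ]
        }
      ]
    }
  }
}
```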
综上所述,为了使场景描述文件能够正确地描述类型为G-PCC编码点云的媒体文件,本申请一些实施对场景描述文件中的MPEG媒体(MPEG_media)内的语法元素的值进行了扩展,具体扩展包括扩展如下表12所示中的一项或多项:
表12
对场景描述文件中MPEG媒体(MPEG_media)内语法元素取值执行上述扩展1~扩展3中的至少一项,从而使场景描述文件中的MPEG媒体(MPEG_media)部分支持类型为G-PCC编码点云的媒体文件。
在一些实施例中,包含类型为G-PCC编码点云的媒体文件的场景描述文件中的场景与节点的描述方法,包括:当三维场景中包含类型为G-PCC编码点云的媒体文件时,使用场景和节点的描述方法对三维场景的总体结构以及类型为G-PCC编码点云的媒体文件在三维场景中所处的结构层次和位置进行描述。其中,使用场景描述模块和节点描述模块的描述方法对三维场景的总体结构以及类型为G-PCC编码点云的媒体文件在三维场景中所处的结构层次和位置进行描述,包括:一个三维场景使用一个场景描述模块进行描述。每个场景描述文件可以描述一个或多个三维场景,三维场景之间只能为并列关系,不能为层级关系。节点之间可以是并列关系或是层级关系。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的场景描述文件中的三维网格的描述方法,包括:复用网格描述模块的基元的属性(mesh.primitives.attributes)中的语法元素描述类型为G-PCC编码点云的媒体文件的各个种类的数据。具体的,由于点云是一种散点化的数据结构,众多散点集合起来就是点云,因此描述一个类型为G-PCC编码点云的媒体文件便相当于描述点云中的每个点上具有的数据。一般来说,类型为G-PCC编码点云的媒体文件中的每个点具有几何信息和属性信息两类信息,几何信息代表了点在空间中的三维坐标,属性信息代表了附着在点上的颜色、反射率、法线方向等信息。由于类型为G-PCC编码点云的媒体文件的点上具有的数据与网格描述模块的基元的属性中所包含的语法元素可以声明的属性是类似的,所以在网格描述模块(mesh)中描述类型为G-PCC编码点云的媒体文件中的点上具有的数据时,可以复用网格描述模块(mesh)的基元(primitives)的属性(mesh.primitives.attribute)中的语法元素描述类型为G-PCC编码点云的媒体文件中的点上具有的数据。
例如:网格描述模块的基元的属性中的位置语法元素(position,上表1中的第一个表项)的值是一个由浮点数构成的三维向量,这样的数据结构同样可以表示G-PCC编码点云的几何信息,因此复用网格描述模块的基元的属性(mesh.primitives.attribute)中的位置语法元素(position)表示类型为G-PCC编码点云的媒体文件中的点上的几何信息。再例如:类型为G-PCC编码点云的媒体文件中的点的颜色值也可以通过复用网格描述模块的基元的属性(mesh.primitives.attribute)中的颜色语法元素(color_n,上表1中的第五个表项)进行表示。再例如:类型为G-PCC编码点云的媒体文件中的点的法向量也可以通过复用网格描述模块的基元的属性(mesh.primitives.attribute)中的法向量语法元素(normal,上表1中的第三个表项)进行表示。
将ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性中支持的语法元素组成的集合定义为第一语法元素集合,则支持类型为G-PCC编码点云的媒体文件的三维网格的描述方法,包括:基于第一语法元素集合中的语法元素,在三维网格对应的网格描述模块的基元的属性中添加三维网格具有的各个种类的数据对应的语法元素。如下表13所示,表13列出了类型为G-PCC编码点云的媒体文件中的点上的部分数据复用网格描述模块的基元的属性(mesh.primitives.attribute)中的语法元素进行描述的方法:
表13
需要说明的是,上述实施例及表13仅列出了部分G-PCC编码点云的数据复用网格描述模块的基元的属性中的语法元素进行描述的方法,G-PCC编码点云数据还可以包括其它数据, G-PCC编码点云的其它数据也可复用网格描述模块的基元的属性中的语法元素进行描述。例如:纹理坐标(texcoord_n)、关节(joints_n)、权重(weights_n)等。
在另一些实施例中,支持类型为G-PCC编码点云的媒体文件的三维网格描述方法,包括:在网格描述模块的基元的扩展列表(mesh.primitives.extensions)内添加目标扩展数组,并在目标扩展数组中添加类型为G-PCC编码点云的媒体文件中的三维网格所包含的各个种类的数据对应的语法元素,并分别通过各个种类的数据对应的语法元素描述类型为G-PCC编码点云的媒体文件中的三维网格的每个顶点关联的几何信息、颜色数据以及法向量等数据。
在一些实施例中,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于第一语法元素集合中的语法元素,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。其中,所述第一语法元素集合为ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性中支持的语法元素组成的集合。
在一些实施例中,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于预设置的G-PCC编码点云对应的语法元素组成的第二语法元素集合,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。
将用于表示每个顶点关联的几何信息的语法元素定义为第一语法元素,用于表示每个顶点关联的颜色数据的语法元素定义为第二语法元素,用于表示每个顶点关联的法向量的语法元素定义为第三语法元素,则如下表14所示,部分网格描述模块的基元的扩展列表(mesh.primitives.extensions)的目标扩展数组中添加的语法元素包括:
表14
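按照该实施例,网格描述模块的基元的扩展列表中的目标扩展数组可以示意如下(示意性片段,其中扩展数组的名称EXT_gpcc_example以及各语法元素的写法均为假设,实际名称和语法以表14的定义为准):

```json
{
  "primitives": [
    {
      "mode": 0,
      "extensions": {
        "EXT_gpcc_example": {
          "POSITION": 0,
          "COLOR_0": 1,
          "NORMAL": 2
        }
      }
    }
  ]
}
```

其中,POSITION、COLOR_0、NORMAL分别对应上文所述的第一语法元素、第二语法元素与第三语法元素,其值为各自指向的访问器描述模块的索引值。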
参照图9所示,图9为基于上述实施例在网格描述模块的基元的扩展列表(mesh.primitives.extensions)内添加目标扩展数组,并在目标扩展数组内扩展第一语法元素、第二语法元素以及第三语法元素后的场景描述文件的结构示意图。场景描述文件中包括但不限于以下模块:MPEG媒体(MPEG_media)901,场景描述模块(scene)902、节点描述模块(node)903、网格描述模块(mesh)904、访问器描述模块(accessor)905、缓存切片描述模块(bufferView)906、缓存器描述模块(buffer)907、蒙皮描述模块(skin)908、动画描述模块(animation)909、相机描述模块(camera)910、材质描述模块(material)911、纹理描述模块(texture)912、采样器描述模块(sampler)913以及纹理贴图描述模块(image)914。其中,网格描述模块904的基元的扩展列表内包括目标扩展数组9000,目标扩展数组9000内扩展的语法元素包括:用于表示每个顶点关联的几何信息的第一语法元素9001、用于表示每个顶点关联的颜色数据的第二语法元素9002以及用于表示每个顶点关联的法向量的第三语法元素9003。除上述扩展以外,图9所示场景描述文件中的其它元素的作用、访问器类型、数据类型等信息与在图3所示场景描述文件中类似,在此不再赘述。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的网格描述方法,包括:预配置G-PCC编码点云的各个种类的数据对应的语法元素,并基于预配置的G-PCC编码点云的各个种类的数据对应的语法元素,在所述G-PCC编码点云中的三维网格对应的网格描述模块的基元的属性中添加各个种类的数据对应的语法元素。
示例性的,预配置的G-PCC编码点云的各个种类的数据对应的语法元素包括:用于表示每个顶点关联的几何信息的第四语法元素,用于表示每个顶点关联的颜色数据的第五语法元素,用于表示每个顶点关联的法向量的第六语法元素,则基于预配置的G-PCC编码点云的各个种类的数据对应的语法元素,在所述G-PCC编码点云中的三维网格对应的网格描述模块的基元的属性中添加各个种类的数据对应的语法元素,包括:在所述G-PCC编码点云中的三维网格对应的网格描述模块的基元的属性中添加第四语法元素、第五语法元素以及第六语法元素中的至少一个。
将G-PCC编码点云对应的用于表示每个顶点关联的几何信息的语法元素定义为第四语法元素,G-PCC编码点云对应的用于表示每个顶点关联的颜色数据的语法元素定义为第五语法元素,G-PCC编码点云对应的用于表示每个顶点关联的法向量的语法元素定义为第六语法元素,则如下表15所示,部分网格描述模块的基元(primitives)的属性(attribute)中的语法元素的描述方法包括:
表15
参照图10所示,图10为基于上述实施例对网格描述模块的基元的属性(mesh.primitives.attribute)内的语法元素进行扩展后的场景描述文件的结构示意图。场景描述文件中包括但不限于以下模块:MPEG媒体(MPEG_media)101,场景描述模块(scene)102、节点描述模块(node)103、网格描述模块(mesh)104、访问器描述模块(accessor)105、缓存切片描述模块(bufferView)106、缓存器描述模块(buffer)107、蒙皮描述模块(skin)108、动画描述模块(animation)109、相机描述模块(camera)110、材质描述模块(material)111、纹理描述模块(texture)112、采样器描述模块(sampler)113以及纹理贴图描述模块(image)114。其中,网格描述模块104的基元的属性(mesh.primitives.attribute)包括:扩展的用于表示每个顶点关联的几何信息的第四语法元素1041、用于表示每个顶点关联的颜色数据的第五语法元素1042以及用于表示每个顶点关联的法向量的第六语法元素1043。除上述扩展以外,图10所示场景描述文件中的其它元素的作用、访问器类型、数据类型等信息与在图3所示场景描述文件中类似,在此不再赘述。
还需要说明的是,当场景描述文件对包含类型为G-PCC编码点云的媒体文件的三维场景进行描述时,无论是复用网格描述模块的基元的属性中的语法元素对G-PCC编码点云数据进行描述,还是在网格描述模块的基元添加目标扩展数组或在网格描述模块的基元的属性中扩展新的语法元素对类型为G-PCC编码点云的媒体文件进行描述,网格描述模块(mesh)中都会包含大量G-PCC编码点云中的点,每个点又至少包含了几何信息和属性信息,因此不便将类型为G-PCC编码点云的媒体文件的数据直接存储在场景描述框架中,而是在场景描述框架中指出类型为G-PCC编码点云的媒体文件的链接,当需要取用G-PCC编码点云的数据时再进行媒体文件下载。
在一些实施例中,也可以将场景描述文件与类型为G-PCC编码点云的媒体文件融合起来,形成一个二进制文件,以减少文件的种类与数目。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的访问器描述模块(accessor)、缓存切片描述模块(bufferView)、缓存器描述模块(buffer)的描述方法,包括:通过缓存器描述模块(buffer)的MPEG环形缓存器(MPEG_buffer_circular)的媒体索引语法元素(media)声明的索引值指向MPEG媒体(MPEG_media)中类型为G-PCC编码点云的媒体文件对应的媒体描述模块。
即,类型为G-PCC编码点云的媒体文件需要在缓存器描述模块(buffer)内指定,但并非在缓存器描述模块中直接添加类型为G-PCC编码点云的媒体文件的统一资源定位符(Uniform Resource Locator,URL),而是通过缓存器描述模块(buffer)中的MPEG环形缓存器(MPEG_buffer_circular)中的媒体索引语法元素(media)的值指向MPEG媒体(MPEG_media)中类型为G-PCC编码点云的媒体文件对应的媒体描述模块。
示例性的,MPEG媒体(MPEG_media)的媒体列表(media)中类型为G-PCC编码点云的媒体文件对应的媒体描述模块的可选项中的统一资源标识符语法元素(uri)的值为:"http://www.example.com/G-PCCexample.mp4",且为MPEG媒体中的第一个媒体描述模块,则可以设置MPEG环形缓存器(MPEG_buffer_circular)的媒体索引语法元素(media)的值为"0",从而在缓存器描述模块中的MPEG环形缓存器中索引MPEG媒体中的第一个媒体文件的链接,以通过缓存器描述模块(buffer)的MPEG环形缓存器(MPEG_buffer_circular.media)中的媒体索引语法元素(media)索引MPEG媒体(MPEG_media)中类型为G-PCC编码点云的媒体文件对应的媒体描述模块。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的访问器(accessor)、缓存切片(bufferView)、缓存器(buffer)描述方法,包括:通过缓存器描述模块(buffer)的MPEG环形缓存器(MPEG_buffer_circular)的轨道数组(tracks)的第二轨道索引语法元素(track)的值指示缓存器所缓存的数据的轨道信息。
在glTF2.0的基础上,MPEG提出的场景描述技术中提出了名称为MPEG环形缓存器(MPEG_buffer_circular)的扩展。MPEG环形缓存器用于在保证数据缓存的前提下减少需要使用的缓存器的数量。MPEG环形缓存器可视为将普通缓存器的头尾相接,形成了一个环,而向环形缓存器写入数据和读取环形缓存器中的数据则依靠写入指针和读取指针,实现了写入和读取同时进行的工作过程。MPEG环形缓存器(MPEG_buffer_circular)中包含的语法元素如表16所示:
表16
即,基于表16中的语法元素"media"的值的设置规则,使表16中的媒体索引语法元素(media)的取值为MPEG媒体(MPEG_media)中声明的类型为G-PCC编码点云的媒体文件对应的媒体描述模块的索引值,即可在缓存器描述模块(buffer)内对类型为G-PCC编码点云的媒体文件进行索引,基于表16中的轨道索引语法元素(tracks)的值的设置规则,使表16中的轨道索引语法元素(tracks)的取值为类型为G-PCC编码点云的媒体文件的一个或多个码流轨道的索引值,即可在对应的缓存器中缓存该一个或多个码流轨道的解码数据。
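示例性的,一个引用类型为G-PCC编码点云的媒体文件的缓存器描述模块可以示意如下(示意性片段,各取值为假设):

```json
{
  "buffers": [
    {
      "byteLength": 15000,
      "extensions": {
        "MPEG_buffer_circular": {
          "count": 5,
          "media": 0,
          "tracks": ["#trackIndex=1"]
        }
      }
    }
  ]
}
```

其中,media的值0索引MPEG媒体的媒体列表中的第一个媒体描述模块,tracks指示所引用的码流轨道,count指示环形缓存器中存储环节的数量。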
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的材质(material)、纹理(texture)、采样器(sampler)以及纹理贴图(image)描述方法,包括:当场景描述文件用于对G-PCC编码点云的三维场景进行描述时,不使用材质(material)、纹理(texture)、采样器(sampler)以及纹理贴图(image)对三维场景进行描述。
因为G-PCC编码点云为散点化的拓扑结构,实际上并不具有面的概念,各种附加信息也都是直接在点上表示出来的,而material、texture、sampler以及image都是针对面的附加信息,因此仅保留material、texture、sampler以及image的定义,但不使用material、texture、sampler以及image对三维场景进行描述。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的相机描述模块(camera)描述方法,包括:通过相机描述模块定义三维场景中的节点的视点、视角等与观看相关的视觉信息。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的动画描述模块(animation)描述方法,包括:通过动画描述模块(animation)描述为三维场景中的节点描述模块(node)添加的动画。
在一些实施例中,动画描述模块可以通过位置移动、角度旋转、大小缩放中的一种或多种对为节点描述模块(node)添加的动画进行描述。
在一些实施例中,动画描述模块还可以指示为节点描述模块(node)添加的动画的开始时 间、结束时间以及动画的实现方式中的至少一项。
即,在支持类型为G-PCC编码点云的媒体文件的场景描述文件中,同样可以为代表三维场景中的物体的节点增添动画。动画描述模块(animation)则通过位置移动、角度旋转、大小缩放三种方式对为节点增添的动画进行描述,同时也可以规定动画的开始、结束时间以及动画的实现方式。
在一些实施例中,支持类型为G-PCC编码点云的媒体文件的蒙皮描述模块(skin)的描述方法,包括:通过蒙皮描述模块(skin)定义节点描述模块(node)中的网格(mesh)与对应的骨骼之间的运动和形变关系。
基于上述实施例对场景描述文件中的动态图像专家组媒体(MPEG_media),场景描述模块(scene)、节点描述模块(node)、网格描述模块(mesh)、访问器描述模块(accessor)、缓存切片描述模块(bufferView)、缓存器描述模块(buffer)、蒙皮描述模块(skin)、动画描述模块(animation)、相机描述模块(camera)、材质描述模块(material)、纹理描述模块(texture)、采样器描述模块(sampler)以及纹理贴图描述模块(image)的改进和扩展,场景描述文件已能够正确地描述类型为G-PCC编码点云的媒体文件。
示例性的,以下结合一个具体的场景描述文件对本申请实施例提供的支持类型为G-PCC编码点云的媒体文件的场景描述文件进行说明。


上述示例中的第1行与第118行的一对大括号内包含了支持类型为G-PCC编码点云的媒体文件的场景描述文件的主要内容,支持类型为G-PCC编码点云的媒体文件的场景描述文件包括:数字资产描述模块(asset)、使用扩展描述模块(extensionUsed)、MPEG媒体(MPEG_media)、场景声明(scene)、场景列表(scenes)、节点列表(nodes)、网格列表(meshes)、访问器列表(accessors)、缓存切片列表(bufferViews)、缓存器列表(buffers)。以下分别对各个部分的内容以及在解析角度各个列表所包含的信息进行说明。
1、数字资产描述模块(asset):数字资产描述模块为第2~4行。由数字资产描述模块第3行的"version":"2.0",可以确定该场景描述文件是基于glTF2.0版本编写的,该版本也是场景描述标准的参考版本。从解析角度看,显示引擎可以根据数字资产描述模块确定应该选取哪种解析器来解析该场景描述文件。
2、使用扩展描述模块(extensionUsed):使用扩展描述模块为第6~10行。由于使用扩展描述模块包括:MPEG媒体(MPEG_media)、MPEG环形缓存器(MPEG_buffer_circular)以及MPEG时变访问器(MPEG_accessor_timed)三个语法元素,因此可以确定该场景描述文件中使用了MPEG媒体、MPEG环形缓存器、MPEG时变访问器三种MPEG扩展。从解析角度看,显示引擎可以根据使用扩展描述模块的内容提前获知了后续解析所涉及的扩展项目包括:MPEG媒体、MPEG环形缓存器、MPEG时变访问器。
3、MPEG媒体(MPEG_media):MPEG媒体为第12~34行。MPEG媒体实现了对三维场景中包含的类型为G-PCC编码点云的媒体文件的声明,并通过第21行的媒体类型语法元素及其值"mimeType":"application/mp4"指出了包括类型为G-PCC编码点云的媒体文件的媒体文件的封装格式,通过第22行的"uri":"http://www.exp.com/G-PCCexp.mp4"指出了类型为G-PCC编码点云的媒体文件的访问地址,通过第25行的"track":"trackIndex=1"指出了类型为G-PCC编码点云的媒体文件的轨道信息,通过第26行的"codecs":"gpc1"指出了类型为G-PCC编码点云的媒体文件的编解码参数,通过第16行的"name":"G-PCCexample"指出了类型为G-PCC编码点云的媒体文件的名称,通过第17行的"autoplay":true指出了类型为G-PCC编码点云的媒体文件应该自动播放,通过第18行的"loop":true指出了类型为G-PCC编码点云的媒体文件应该循环播放。从解析角度看,显示引擎通过解析MPEG媒体可以确定待渲染三维场景中存在一个类型为G-PCC编码点云的媒体文件,并且获知了访问及解析该类型为G-PCC编码点云的媒体文件的方法。
4、场景声明(scene):场景声明为第36行。因为一个场景描述文件在理论上可以包括多个三维场景,所以上述场景描述文件中首先通过第36行的场景声明及其"scene":0,指出了基于该场景描述文件,后续处理和渲染的三维场景为场景列表中的第一个三维场景,即第39~43行的大括号所囊括的三维场景。
5、场景列表(scenes):场景列表为第38~44行。场景列表中仅包含了一个大括号,说明场景列表中仅包括一个场景描述模块,该场景描述文件中仅包含了一个三维场景,在该大括号内通过第40~42行的"nodes":[0]指出了该三维场景中仅包括一个节点,且该节点对应的节点描述模块的索引值为0。从解析角度看,场景列表的内容明确了整个场景描述框架应该选取场景列表中的第一个三维场景(索引为0的三维场景)进行后续处理和渲染,明确了三维场景的总体结构,并且指向了下一层更详细的节点描述模块(node)。
6、节点列表(nodes):节点列表为第46~51行。节点列表中仅包含了一个大括号,说明节点列表中仅包括一个节点描述模块,该三维场景中只具有一个节点,且该节点与场景描述模块中节点描述模块的索引值为0的节点是同一个节点,两者通过索引的方式实现关联。在表示该节点的大括号中,通过第48行的"name":"G-PCCexample_node"指出了节点的名称为"G-PCCexample_node",通过第49行的"mesh":0指出了节点上所挂载的内容为网格列表中的第一个网格描述模块对应的三维网格,这与下一层的网格描述模块是对应的。从解析角度看,该节点列表的内容指出了节点上所挂载的内容是一个三维网格,以及该三维网格为网格列表中的第一个网格描述模块对应的三维网格。
7、网格列表(meshes):网格列表为第53~66行,网格列表中仅包含了一个大括号,说明该网格列表中仅包括一个网格描述模块,该三维场景中只具有一个三维网格,且该三维网格与节点描述模块中的索引值为0的三维网格是同一三维网格。在描述该三维网格的大括号(网格描述模块)中,通过第55行的"name":"G-PCCexample_mesh"指出了该三维网格的名称为"G-PCCexample_mesh",该名称仅用作辨识标记。通过第56行的"primitives"指出了该三维网格具有基元(primitives)。分别通过第58行的"attributes"和第62行的"mode",指出了基元存在属性(attribute)和模式(mode)两类信息,分别通过第59行的"position"和第60行的"color_0",指出了三维网格具有几何坐标和颜色数据;分别通过第59行的"position":0和第60行的"color_0":1,指示几何坐标对应的访问器为访问器列表中的第一个访问器描述模块对应的访问器,颜色数据对应的访问器为访问器列表中的第二个访问器描述模块对应的访问器。此外,通过第62行的"mode":0还可以确定该三维网格的拓扑为散点结构。从解析角度看,该网格列表明确了场景描述文件中的三维网格具有的实际数据种类和三维网格的拓扑类型。
8、缓存器列表(buffers):缓存器列表为第106~117行。缓存器列表中仅包含了一个大括号,说明了该场景描述文件中仅包括一个缓存器描述模块,该三维场景的显示仅需要访问一个媒体文件。在该大括号中,使用了MPEG环形缓存器(MPEG_buffer_circular)这一扩展,说明了该缓存器是一个使用MPEG扩展进行改造的环形缓存器。第112行的"media":0说明了该环形缓存器中的数据来源是前文中MPEG媒体内声明的第一个媒体描述模块对应的媒体文件,第113行的"tracks":"#trackIndex=1"说明了访问媒体文件时应该参考索引值为1的轨道,在这里不对索引为1的轨道做限定,它可以是单轨道封装的类型为G-PCC编码点云的媒体文件的唯一轨道,也可以是多轨道封装的类型为G-PCC编码点云的媒体文件的几何码流轨道。此外,根据MPEG环形缓存器中的语法元素"count":5,还可以确定MPEG环形缓存器具有五个存储环节,根据MPEG环形缓存器中的语法元素"byteLength":15000,还可以确定MPEG环形缓存器的字节长度(容量)为15000字节。从解析角度看,缓存器列表实现了将MPEG媒体中声明的类型为G-PCC编码点云的媒体文件对应到缓存器,或者说实现了缓存器对此前仅声明但未使用的类型为G-PCC编码点云的媒体文件进行了引用。需要说明的是,在该处引用的类型为G-PCC编码点云的媒体文件是未经处理的G-PCC封装文件,G-PCC封装文件需要经过媒体接入函数的处理,才能够提取出网格描述模块中提到过的位置坐标(position)和颜色值(color_0)这样的可直接用于渲染的信息。
9、缓存切片列表(bufferViews):缓存切片列表为第93~104行。缓存切片列表中包含了两个并列的大括号,结合缓存器描述模块确定的仅有一个缓存器,说明用于存储类型为G-PCC编码点云的媒体文件的缓存器被分为了两个缓存切片,类型为G-PCC编码点云的媒体文件中的点云数据存储在两个缓存切片中。在第一个大括号(第一个缓存切片描述模块)中,首先通过95行的buffer:0指向了索引为0的缓存器描述模块,即缓存器列表中提到的唯一缓存器描述模块,然后通过96和94行的字节长度(byteLength)和字节偏移(byteOffset)两个参数限定了对应的缓存切片的数据切片范围为前12000个字节。第二个大括号(第二个缓存切片描述模块)中的内容与第一个大括号是类似的,只是将数据切片范围定义为后3000个字节。从解析角度看,缓存切片列表将类型为G-PCC编码点云的媒体文件中的点云数据进行了分组,有利于后续访问器描述模块的细化定义。
10、访问器列表(accessors):访问器列表为第68~91行。访问器列表与缓存切片列表的结构类似,都包含了两个并列的大括号,说明访问器列表包括两个访问器描述模块,该三维场景的显示需要通过两个访问器进行媒体数据的访问。此外,两个大括号(访问器描述模块)中都具有MPEG时变访问器(MPEG_accessor_timed)这一扩展,说明这两个访问器指向的都是MPEG定义的时变媒体。在第一个大括号中,MPEG时变访问器中的内容指向了索引值为0的缓存切片描述模块。在第一个大括号(第一个访问器描述模块)中,还通过70行的"componentType":5126和71行的"type":"VEC3"说明了,该访问器中存储的数据格式是32位浮点数构成的三维向量,"count":1000说明了需要通过这样格式的访问器访问的数据有1000个,每个32位浮点数占用4个字节,因此该访问器描述模块对应的访问器包含了12000字节的数据,这与索引值为0的缓存切片描述模块中的设定是相应的。第二个大括号(第二个访问器描述模块)中的内容也是类似的,将缓存切片描述模块的索引值更换为了1,且重新定义了数据类型。从解析角度看,访问器列表(accessors)完善了对渲染所需数据的完全定义,比如在缓存切片描述模块和缓存器描述模块中缺少的数据类型就在对应的访问器描述模块中进行了定义。
二、支持类型为G-PCC编码点云的媒体文件的显示引擎
在沉浸式媒体的场景描述框架的工作流程中,支持类型为G-PCC编码点云的媒体文件的显示引擎的主要功能,与前文描述的显示引擎的主要功能类似,包括:1、能够解析类型为G-PCC编码点云的媒体文件的场景描述文件,获取相应的三维场景的渲染方法;2、能够通过媒体接入函数API与媒体接入函数传递媒体访问指令或媒体数据处理指令;其中,媒体访问指令或媒体数据处理指令来自于对类型为G-PCC编码点云的媒体文件的场景描述文件的解析结果;3、能够通过缓存API向缓存管理模块发送缓存管理指令;4、能够从缓存中取用经过处理的G-PCC编码点云数据,并根据读取的数据完成三维场景及三维场景中物体的渲染与展示。需要说明的是,处理过程的细节这里不做详细展开。
三、支持类型为G-PCC编码点云的媒体文件的媒体接入函数API
在沉浸式媒体的场景描述框架的工作流程中,显示引擎可以通过解析场景描述文件获取渲染包含类型为G-PCC编码点云的媒体文件的三维场景的方法,且需要将渲染三维场景的方法传递给媒体接入函数或基于渲染三维场景的方法向媒体接入函数发送指令,而将渲染三维场景的方法传递给媒体接入函数或基于渲染三维场景的方法向媒体接入函数发送指令的过程就是通过媒体接入函数API实现的。
在一些实施例中,显示引擎可以通过媒体接入函数API向媒体接入函数发送媒体访问指令或媒体数据处理指令。其中,显示引擎通过媒体接入函数API向媒体接入函数发送的媒体访问指令或媒体数据处理指令来自于对类型为G-PCC编码点云的媒体文件的场景描述文件的解析结果,媒体访问指令或媒体数据处理指令可以包括:类型为G-PCC编码点云的媒体文件的索引、类型为G-PCC编码点云的媒体文件的URL、类型为G-PCC编码点云的媒体文件的属性信息、类型为G-PCC编码点云的媒体文件的展示时间窗口、对经过处理的类型为G-PCC编码点云的媒体文件的格式要求等。
在一些实施例中,媒体接入函数也可以通过媒体接入函数API向显示引擎请求媒体访问指令或媒体数据处理指令。
四、支持类型为G-PCC编码点云的媒体文件的媒体接入函数
在沉浸式媒体的场景描述框架的工作流程中,媒体接入函数接收到显示引擎通过媒体接入函数API下发的媒体访问指令或媒体数据处理指令后,会执行显示引擎通过媒体接入函数API下发的媒体访问指令或媒体数据处理指令。例如:获取类型为G-PCC编码点云的媒体文件、为类型为G-PCC编码点云的媒体文件建立合适的管线、为经过处理的类型为G-PCC编码点云的媒体文件分配合适的缓存等。
在一些实施例中,媒体接入函数获取类型为G-PCC编码点云的媒体文件,包括:使用网络传输服务从服务器中下载类型为G-PCC编码点云的媒体文件。
在一些实施例中,媒体接入函数获取类型为G-PCC编码点云的媒体文件,包括:从本地存储空间中读取类型为G-PCC编码点云的媒体文件。
媒体接入函数获取类型为G-PCC编码点云的媒体文件后,需要对类型为G-PCC编码点云的媒体文件进行处理。不同类型的媒体文件的处理过程存在较大差异,为了实现广泛的媒体类型支持,也考虑到媒体接入函数的工作效率,于是在媒体接入函数中设计了多种管线,在处理媒体文件的过程中只启用匹配于媒体类型的管线即可。当媒体文件为类型为G-PCC编码点云的媒体文件时,媒体接入函数需要为类型为G-PCC编码点云的媒体文件建立相应的管线,并通过建立的管线对类型为G-PCC编码点云的媒体文件进行解封装、G-PCC解码、后处理等过程,以完成对类型为G-PCC编码点云的媒体文件的处理,将类型为G-PCC编码点云的媒体文件数据处理为可被显示引擎用于直接渲染的数据格式。
参照图11所示,图11为本申请一些实施例中G-PCC编码点云对应的管线的结构示意图。如图11所示,支持类型为G-PCC编码点云的媒体文件的管线1100包括:输入模块111、解封装模块112、几何解码器113、属性解码器114、第一后处理模块115、第二后处理模块116。
输入模块111用于接收G-PCC封装文件,并将G-PCC封装文件输入解封装模块112中。其中,G-PCC封装文件是对点云数据进行G-PCC编码得到的G-PCC码流进行封装得到的文件。由于G-PCC封装文件是以轨道的形式呈现的,所以输入模块111接收到的是G-PCC封装文件的轨道码流。此外,由G-PCC码流的封装规则可知,G-PCC封装文件可以是单轨道的,也可以是多轨道的,因此本申请实施例中输入模块111接收到的G-PCC封装文件可以是单轨道的,也可以是多轨道的,本申请实施例对此不做限定。
解封装模块112用于对输入模块111输入的G-PCC封装文件进行解封装，获取G-PCC码流(包括几何信息码流和属性信息码流)，将几何信息码流输入几何解码器113，以及将属性信息码流输入属性解码器114。需要说明的是，随着相关技术的发展，G-PCC码流还可能会增加其它信息的码流，当G-PCC码流还包括其它信息的码流时，解封装模块112会对G-PCC封装文件进行解封装获取其它信息的码流，并将其它信息的码流输入对应的解码器。
几何解码器113用于对解封装模块112输出的几何信息码流进行解码,获取点云的几何信息。其中,几何解码器113对几何信息码流进行解码的主要步骤包括:通过算术解码、八叉树合成、表面拟合、重建几何、逆坐标转换等,得到点云的几何信息。几何解码器113对几何信息码流进行解码的具体实现可以参照图8中几何解码模块81的工作流程,在此不再详细说明。
属性解码器114用于对解封装模块112输出的属性信息码流进行解码，获取点云的属性信息。其中，属性解码器114对属性信息码流进行解码的主要步骤包括：属性预测、提升和RAHT变换的逆运算等，得到点云的属性信息。属性解码器114对属性信息码流进行解码的具体实现可以参照图8中属性解码模块82的工作流程，在此不再详细说明。
第一后处理模块115,用于对几何解码器113输出的几何信息进行处理。在完成几何信息码流的解码后,可以得到G-PCC编码点云中的点的几何信息,且在一些情况下得到的几何信息已能够被显示引擎直接使用的,但是由于场景描述框架不对显示引擎进行过多的限制或对其进行专门的定义,所以可能会出现种类繁多的显示引擎。这些不同的显示引擎对输入数据的要求可能存在不同,所以在完成几何信息码流的解码后,增添了第一后处理模块115,从而保证管线的输出的几何信息对于任何显示引擎都是可用的。在一些实施例中,第一后处理模块115对几何信息进行处理包括:对几何信息进行格式转换。
第二后处理模块116，用于对属性解码器114输出的属性信息进行处理。在完成属性信息码流的解码后，可以得到G-PCC编码点云中的点的属性信息，且在一些情况下属性信息已能够被显示引擎直接使用，但是由于场景描述框架不对显示引擎进行过多的限制或对其进行专门的定义，所以可能会出现种类繁多的显示引擎。这些不同的显示引擎对输入数据的要求可能存在不同，所以在完成属性信息码流的解码后，增添了第二后处理模块116，从而保证管线输出的属性信息对于任何显示引擎都是可用的。在一些实施例中，第二后处理模块116对属性信息进行处理包括：对属性信息进行格式转换。
最后，将第一后处理模块115输出的处理后的几何信息和第二后处理模块116输出的处理后的属性信息写入缓存器117，以便显示引擎118根据需要从缓存器中读取几何信息和属性信息，以及根据读取的几何信息和属性信息渲染并展示三维场景中的G-PCC编码点云。
五、支持类型为G-PCC编码点云的媒体文件的缓存API
媒体接入函数通过管线完成了对G-PCC编码点云数据的处理后，还需要将经过处理的数据以规范的排列结构交付给显示引擎，这需要将经过处理的G-PCC编码点云数据正确地存储在缓存中，该工作由缓存管理模块完成，但是缓存管理模块需要通过缓存API从媒体接入函数或显示引擎获取缓存管理指令。
在一些实施例中,媒体接入函数可以通过缓存API向缓存管理模块发送缓存管理指令。其中,所述缓存管理指令为显示引擎通过媒体接入函数API向媒体接入函数发送的缓存管理指令。
在一些实施例中,显示引擎可以通过缓存API向缓存管理模块发送缓存管理指令。
即，缓存管理模块可以通过缓存API与媒体接入函数进行通信，也可以通过缓存API与显示引擎进行通信，且与媒体接入函数或显示引擎进行通信的目的都是实现缓存的管理。当缓存管理模块通过缓存API与媒体接入函数进行通信时，需要显示引擎将缓存管理指令先通过媒体接入函数API发送给媒体接入函数，媒体接入函数再通过缓存API将缓存管理指令发送给缓存管理模块；当缓存管理模块通过缓存API与显示引擎通信时，只需要显示引擎根据场景描述文件中解析出来的缓存管理信息生成缓存管理指令，并通过缓存API发送给缓存管理模块即可。
在一些实施例中,缓存管理指令可以包括:创建缓存的指令、更新缓存的指令、释放缓存的指令中的一个或多个。
六、支持类型为G-PCC编码点云的媒体文件的缓存管理模块
在沉浸式媒体的场景描述框架的工作流程中,媒体接入函数通过管线完成了对G-PCC编码点云数据的处理后,经过处理的G-PCC编码点云数据需要以规范的排列结构交付给显示引擎,这需要将经过处理的G-PCC编码点云数据正确地存储在缓存器中,该工作由缓存管理模块负责。
缓存管理模块实现了缓存的创建、更新、释放等管理操作，操作的指令通过缓存API接收。缓存管理的规则在场景描述文档中进行了记录，并通过显示引擎进行解析，最终由显示引擎或媒体接入函数下达给了缓存管理模块。当媒体文件被媒体接入函数处理完毕后，需要存储在合适的缓存中，再被显示引擎进行取用，缓存管理的作用就在于管理这些缓存，使其与经过处理的媒体数据的格式相匹配，而不会打乱经过处理的媒体数据。缓存管理模块的具体设计方法应该以显示引擎以及媒体接入函数的设计为参照。
在上述内容的基础上,本申请一些实施例提供了一种场景描述文件的生成方法,参照图12所示,该场景描述文件的生成方法,包括如下步骤S121~S123:
S121、确定待渲染三维场景中的媒体文件的类型。
本申请实施例中的媒体文件的类型可以包括：G-PCC编码点云、V-PCC编码点云、触觉媒体文件、6DoF视频、MIV视频等中的一种或多种，且同一种类型的媒体文件可以包括任意数量个。例如：所述待渲染三维场景中可以仅包括一个类型为G-PCC编码点云的媒体文件。再例如：所述待渲染三维场景中可以包括一个类型为G-PCC编码点云的媒体文件以及一个类型为V-PCC编码点云的媒体文件。再例如：所述待渲染三维场景中可以包括两个类型为G-PCC编码点云的媒体文件以及一个触觉媒体文件。
在上步骤S121中,若所述待渲染三维场景中的目标媒体文件的类型为G-PCC编码点云,则执行如下步骤S122:
S122、根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标媒体描述模块。
在一些实施例中,所述目标媒体文件的描述信息包括:所述目标媒体文件的名称、所述目标媒体文件是否需要自动播放、所述目标媒体文件是否需要循环播放、所述目标媒体文件的封装格式、所述目标媒体文件的码流的类型、所述目标媒体文件的编码参数等中的一项或多项。
在一些实施例中，上述步骤S122(根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标媒体描述模块)包括如下步骤1221~1229中的至少一项：
步骤1221、在所述目标媒体描述模块中添加媒体名称语法元素(name),并根据所述目标媒体文件的名称设置所述媒体名称语法元素的值。
例如:所述目标媒体描述模块中的媒体名称语法元素为"name",所述目标媒体文件的名称为"G-PCCexample",则在所述目标媒体描述模块中添加语法元素"name",并将语法元素"name"的值设置为"G-PCCexample"。
步骤1222、在所述目标媒体描述模块中添加自动播放语法元素(autoplay),并根据所述目标媒体文件是否需要自动播放设置所述自动播放语法元素的值。
例如：所述目标媒体描述模块中的自动播放语法元素为"autoplay"，所述目标媒体文件需要自动播放，则在所述目标媒体描述模块中添加语法元素"autoplay"，并将语法元素"autoplay"的值设置为"true"。
再例如:所述目标媒体描述模块中的自动播放语法元素为"autoplay",所述目标媒体文件不需要自动播放,则在所述目标媒体描述模块中添加语法元素"autoplay",并将语法元素"autoplay"的值设置为"false"。
步骤1223、在所述目标媒体描述模块中添加循环播放语法元素(loop)，并根据所述目标媒体文件是否需要循环播放设置所述循环播放语法元素的值。
例如：所述目标媒体描述模块中的循环播放语法元素为"loop"，所述目标媒体文件需要循环播放，则在所述目标媒体描述模块中添加语法元素"loop"，并将语法元素"loop"的值设置为"true"。
再例如：所述目标媒体描述模块中的循环播放语法元素为"loop"，所述目标媒体文件不需要循环播放，则在所述目标媒体描述模块中添加语法元素"loop"，并将语法元素"loop"的值设置为"false"。
步骤1224、在所述目标媒体描述模块中添加可选项(alternatives)。
步骤1225、在所述可选项(alternatives)中添加媒体类型语法元素(mimeType),并将所述媒体类型语法元素的值设置为G-PCC编码点云对应的封装格式值。
在一些实施例中,G-PCC编码点云对应的封装格式为MP4,G-PCC编码点云对应的封装格式值为:application/mp4。
示例性的，当所述媒体类型语法元素为"mimeType"，所述G-PCC编码点云对应的封装格式值为"application/mp4"，则在所述目标媒体描述模块的可选项中添加语法元素"mimeType"，并将语法元素"mimeType"的值设置为"application/mp4"。
步骤1226、在所述可选项(alternatives)中添加统一资源标识符语法元素(uri),并将所述统一资源标识符语法元素的值设置为所述目标媒体文件的访问地址。
例如:所述统一资源标识符语法元素为"uri",所述目标媒体文件的访问地址为"http://www.exp.com/G-PCCexp.mp4",则在所述目标媒体描述模块的可选项中添加语法元素"uri",并将语法元素"uri"的值设置为http://www.exp.com/G-PCCexp.mp4。
步骤1227、在所述可选项(alternatives)中添加轨道数组(tracks)。
步骤1228、在所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中添加第一轨道索引语法元素(track),并根据所述目标媒体文件的封装方式设置所述第一轨道索引语法元素(track)的值。
在一些实施例中,根据所述目标媒体文件的封装方式设置所述第一轨道索引语法元素(track)的值,包括:
当所述目标媒体文件为单轨道封装文件时,将所述第一轨道索引语法元素的值设置为所述目标媒体文件的码流轨道的索引值;
当所述目标媒体文件为多轨道封装文件时,将所述第一轨道索引语法元素的值设置为所述目标媒体文件的几何码流轨道的索引值。
即,当编码后的G-PCC编码点云作为MPEG_media.alternative.tracks中的一项被场景描述文件引用,并且被引用的项符合ISOBMFF中track的规定时:对于单轨道封装的G-PCC数据,在MPEG_media中引用的轨道就是G-PCC码流轨道。例如,G-PCC数据被ISOBMFF封装成 一条MIHS轨道,则在MPEG_media中引用的轨道就是这条码流轨道。对于多轨道封装的G-PCC数据,在MPEG_media中引用的轨道为G-PCC几何码流轨道。
本申请实施例中，所述G-PCC编码点云的封装方式包括单轨道封装和多轨道封装。其中，单轨道封装是指将G-PCC编码点云的几何码流和属性码流封装在同一码流轨道中的封装方式，而多轨道封装是指将G-PCC编码点云的几何码流和属性码流分别封装在多个码流轨道中的封装方式。
步骤1229、在所述目标媒体描述模块的可选项的轨道数组中添加编解码参数语法元素(codecs),并根据所述目标媒体文件的编码参数、所述目标媒体文件的码流的类型以及ISO/IEC 23090-18 G-PCC数据传输标准设置所述编解码参数语法元素的值。
例如：ISO/IEC 23090-18 G-PCC数据传输标准规定，在G-PCC编码点云采用DASH封装时，当MPD文件中使用G-PCC预选信令时，预选信令的“codecs”属性应设置为'gpc1'，表示预选媒体是基于几何的点云；当G-PCC容器中存在多个G-PCC Tile轨道时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpcb'或'gpeb'，表示适配集包含G-PCC Tile基本轨道数据。当Tile Component Adaptation Sets只向单个G-PCC组件数据发送信号时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpcb'。当Tile Component Adaptation Sets向所有G-PCC组件数据发送信号时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpeb'。当MPD文件中使用G-PCC Tile预选信令时，预选信令的“codecs”属性应设置为'gpt1'，表示预选媒体是基于几何的点云碎片。因此，可以在G-PCC编码点云采用DASH封装且MPD文件中使用G-PCC预选信令时，将所述目标媒体描述模块的"alternatives"的"tracks"中的"codecs"的值设置为'gpc1'。
示例性的，当所述待渲染三维场景中的媒体文件仅包括一个类型为G-PCC编码点云的目标媒体文件、G-PCC编码点云对应的封装格式值为"application/mp4"，所述目标媒体文件的名称为"G-PCCexample"、所述目标媒体文件自动播放且循环播放、所述目标媒体文件的访问地址为：http://www.exp.com/G-PCCexp.mp4、所述目标媒体文件为单轨道封装文件且所述目标媒体文件的码流轨道的索引值为1、所述目标媒体文件采用DASH封装且MPD文件中使用G-PCC预选信令时，所述目标媒体文件对应的目标媒体描述模块可以如下所示：

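基于上述参数，下面给出一个构造该目标媒体描述模块的示意性Python片段。其中JSON字段的嵌套组织方式（例如MPEG_media置于extensions之下）为示意性假设，仅供参考，并非规范实现：

```python
import json

# 示意性示例：按照步骤1221~1229构造目标媒体描述模块（字段组织为假设性写法）
media_entry = {
    "name": "G-PCCexample",            # 步骤1221：媒体名称
    "autoplay": True,                  # 步骤1222：需要自动播放
    "loop": True,                      # 步骤1223：需要循环播放
    "alternatives": [{                 # 步骤1224：可选项
        "mimeType": "application/mp4",               # 步骤1225：封装格式值
        "uri": "http://www.exp.com/G-PCCexp.mp4",    # 步骤1226：访问地址
        "tracks": [{                   # 步骤1227：轨道数组
            "track": "#trackIndex=1",  # 步骤1228：单轨道封装，码流轨道索引值为1
            "codecs": "gpc1"           # 步骤1229：DASH封装且使用G-PCC预选信令
        }]
    }]
}

# MPEG_media扩展挂载在场景描述文件顶层extensions下（示意性假设）
scene_fragment = {"extensions": {"MPEG_media": {"media": [media_entry]}}}
print(json.dumps(scene_fragment, ensure_ascii=False, indent=2))
```

该片段输出的JSON即对应步骤S123中添加到MPEG媒体的媒体列表中的目标媒体描述模块。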
S123、在所述待渲染三维场景的场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)中添加所述目标媒体描述模块。
其中,所述目标媒体描述模块为基于所述目标媒体文件的描述信息生成的媒体描述模块。
示例性的,当所述待渲染三维场景中的媒体文件仅包括一个类型为G-PCC编码点云的目标媒体文件,所述G-PCC编码点云对应的封装格式值为application/mp4、所述目标媒体文件的名称为"G-PCCexample1"、所述目标媒体文件自动播放且循环播放、所述目标媒体文件的访问地址为"uri":http://www.exp.com/G-PCCexp.mp4、所述目标媒体文件为单轨道封装文件,且所述目标媒体文件的码流轨道的索引值为1、所述目标媒体文件用DASH封装且MPD文件中使用G-PCC预选信令,则所述场景描述文件的MPEG媒体可以如下所示:
在一些实施例中，所述待渲染三维场景中还可以包括多个媒体文件，且多个媒体文件中的一个或多个媒体文件的类型为G-PCC编码点云，在生成所述场景描述文件时，需要根据上述实施例添加类型为G-PCC编码点云的媒体文件对应的媒体描述模块，以及根据其它类型的媒体文件的场景描述文件生成方式添加其它类型的媒体文件对应的媒体描述模块。
示例性的,当所述待渲染三维场景中的媒体文件包括一个类型为G-PCC编码点云的目标媒体文件以及一个触觉媒体文件,所述G-PCC编码点云对应的封装格式值为"application/mp4"、所述目标媒体文件的名称为"G-PCCexample"、所述目标媒体文件自动播放且循环播放、所述目标媒体文件的访问地址为"uri":http://www.exp.com/G-PCCexp.mp4、所述目标媒体文件为单轨道封装文件,且所述目标媒体文件的码流轨道的索引值为1、所述目标媒体文件用DASH封装且MPD文件中使用G-PCC预选信令,则所述场景描述文件的MPEG媒体可以如下所示:

在上述示例中,MPEG媒体的媒体列表(media)中包括两个大括号,第一个大括号(第n+2~n+18行)囊括了类型为G-PCC编码点云的目标媒体文件对应的媒体描述模块,第二个大括号(第n+19~n+35行)囊括了触觉媒体文件对应的媒体描述模块。
本申请实施例提供的场景描述文件的生成方法在生成待渲染三维场景的场景描述文件时，首先确定待渲染三维场景中的媒体文件的类型，并在所述待渲染三维场景中的目标媒体文件的类型为G-PCC编码点云时，根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标媒体描述模块，以及在所述待渲染三维场景的场景描述文件的MPEG媒体的媒体列表中添加所述目标媒体描述模块。由于本申请实施例可以在所述待渲染三维场景中的媒体文件包括类型为G-PCC编码点云的目标媒体文件时，在所述场景描述文件的MPEG媒体的媒体列表中添加目标媒体文件对应的媒体描述模块，因此本申请实施例可以生成包括类型为G-PCC编码点云的三维场景的场景描述文件，实现了场景描述文件对类型为G-PCC编码点云的媒体文件的支持。
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件的场景列表(scenes)中添加所述待渲染三维场景对应的目标场景描述模块(scene),以及在所述目标场景描述模块的节点列表(nodes)中添加所述待渲染场景中的节点对应的节点描述模块的索引值。
例如：所述待渲染三维场景包括两个节点，且该两个节点对应的节点描述模块(node)的索引值分别为0和1，则在所述场景描述文件中添加的所述待渲染三维场景对应的目标场景描述模块可以如下所示：
在上述示例中,待渲染三维场景包括两个节点,且该两个节点对应的节点描述模块的索引值分别为0和1,因此在所述待渲染三维场景对应的场景描述模块的节点列表(nodes)中添加0和1两个索引值。
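上述目标场景描述模块可以示意为如下Python片段（JSON组织方式为示意性假设）：

```python
import json

# 示意性示例：场景列表(scenes)中的目标场景描述模块，其节点列表声明了索引0和1
scenes_fragment = {"scenes": [{"nodes": [0, 1]}]}
print(json.dumps(scenes_fragment))
```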
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件的节点列表(nodes)中添加所述待渲染场景中的节点对应的节点描述模块,以及在所述节点描述模块的网格索引列表(mesh)中添加所述节点挂载的三维网格对应的网格描述模块的索引值。
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述节点描述模块添加节点名称语法元素(name),以及根据所述节点的名称设置对应的节点描述模块中的节点名称语法元素(name)的值。
例如:所述待渲染三维场景包括两个节点,该两个节点的名称分别为G-PCCexp_node1和G-PCCexp_node2,该节点G-PCCexp_node1包含的三维网格对应的网格描述模块的索引值分别为0、1,节点G-PCCexp_node2包含的三维网格对应的网格描述模块的索引值为2,则该场景描述文件的节点列表(nodes)部分可以如下所示:

在上述示例中,待渲染三维场景对应的场景描述文件的节点列表(nodes)中包括两个节点描述模块,第一个节点描述模块为第n+2~n+5行的大括号所囊括的内容,第二个节点描述模块为第n+6~n+9行的大括号所囊括的内容。第一个节点描述模块中的节点名称语法元素(name)的值被设置为了对应节点的名称"G-PCCexp_node1",第一个节点描述模块中的网格索引语法元素(mesh)的值被设置为了对应节点挂载的三维网格的网格描述模块的索引值0和1,第二个节点描述模块中的节点名称语法元素(name)的值被设置为了对应节点的名称"G-PCCexp_node2",第二个节点描述模块中的网格索引语法元素(mesh)的值被设置为了对应节点挂载的三维网格的网格描述模块的索引值2。
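上述两个节点描述模块可以示意为如下Python片段。需要说明的是，这里按本文的描述将网格索引语法元素(mesh)写为索引值列表；标准glTF 2.0中node.mesh为单个索引值，此处仅为示意性假设：

```python
import json

# 示意性示例：节点列表(nodes)，与上文描述的两个节点描述模块对应
nodes_fragment = {
    "nodes": [
        {"name": "G-PCCexp_node1", "mesh": [0, 1]},  # 挂载索引值为0和1的三维网格
        {"name": "G-PCCexp_node2", "mesh": [2]},     # 挂载索引值为2的三维网格
    ]
}
print(json.dumps(nodes_fragment))
```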
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件的网格列表(meshes)中添加所述待渲染场景中的三维网格对应的网格描述模块(mesh),在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,以及将各个种类的数据对应的语法元素的值设置为用于访问各个种类的数据的访问器对应的访问器描述模块的索引值。
本申请实施例中,三维网格所包含的数据可以包括:几何坐标(position)、颜色值(color)、法向量(normal)、切向量(tangent)、纹理坐标(texcoord)、关节(joints)、权重(weights)中的一个或多个。
在一些实施例中,在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:
在所述目标媒体文件中的三维网格对应的网格描述模块的基元(primitives)中添加扩展列表(extensions),在所述扩展列表(extensions)中添加目标扩展数组以及在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。
在一些实施例中,所述目标扩展数组可以为MPEG_primitve_GPCC。
在一些实施例中,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于第一语法元素集合中的语法元素,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。其中,所述第一语法元素集合为ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性中支持的语法元素组成的集合。
具体的,ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性支持的语法元素,包括:position、color_n、normal、tangent、texcoord、joints、weights,因此第一语法元素集合为:{position,color_n,normal,tangent,texcoord,joints,weights}。
示例性的,某一三维网格包括几何坐标和颜色数据,用于访问所述几何坐标的访问器对应的访问器描述模块的索引值为0,用于访问所述颜色数据的访问器对应的访问器描述模块的索引值为1,则基于第一语法元素集合,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素之后,该三维网格对应的网格描述模块可以如下所示:

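基于第一语法元素集合的该网格描述模块可以示意为如下Python片段。其中目标扩展数组MPEG_primitve_GPCC在基元的扩展列表中的嵌套位置为示意性假设：

```python
import json

# 示意性示例：在基元(primitives)的扩展列表(extensions)的目标扩展数组中，
# 基于第一语法元素集合添加position与color_0，其值为对应访问器描述模块的索引值
mesh_fragment = {
    "meshes": [{
        "primitives": [{
            "extensions": {
                "MPEG_primitve_GPCC": {
                    "position": 0,   # 用于访问几何坐标的访问器描述模块的索引值
                    "color_0": 1     # 用于访问颜色数据的访问器描述模块的索引值
                }
            }
        }]
    }]
}
print(json.dumps(mesh_fragment))
```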
在一些实施例中,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于预设置的G-PCC编码点云对应的语法元素组成的第二语法元素集合,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。
示例性的,G-PCC编码点云对应的语法元素可以包括:G-PCC_position、G-PCC_color_n、G-PCC_normal、G-PCC_tangent、G-PCC_texcoord、G-PCC_joints、G-PCC_weights,相应的,第二语法元素集合为:{G-PCC_position、G-PCC_color_n、G-PCC_normal、G-PCC_tangent、G-PCC_texcoord、G-PCC_joints、G-PCC_weights}。
示例性的,某一三维网格包括几何坐标和颜色数据,用于访问所述几何坐标的访问器对应的访问器描述模块的索引值为0,用于访问所述颜色数据的访问器对应的访问器描述模块的索引值为1,则基于第二语法元素集合,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素之后,该三维网格对应的网格描述模块可以如下所示:
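基于第二语法元素集合的该网格描述模块可以示意为如下Python片段，与前一示例的差别仅在于语法元素带有G-PCC_前缀（扩展数组的嵌套位置同样为示意性假设）：

```python
import json

# 示意性示例：在目标扩展数组中基于第二语法元素集合（G-PCC_前缀）添加语法元素
mesh_fragment = {
    "meshes": [{
        "primitives": [{
            "extensions": {
                "MPEG_primitve_GPCC": {
                    "G-PCC_position": 0,  # 用于访问几何坐标的访问器描述模块的索引值
                    "G-PCC_color_0": 1    # 用于访问颜色数据的访问器描述模块的索引值
                }
            }
        }]
    }]
}
print(json.dumps(mesh_fragment))
```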
在一些实施例中,在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:在所述网格描述模块的基元(primitives)的属性(attributes)中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素。
在一些实施例中,在所述网格描述模块的基元(primitives)的属性(attributes)中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于所述第一语法元素集合,在所述网格描述模块的基元(primitives)的属性(attributes)中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素。其中,所述第一语法元素集合为ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性中支持的语法元素组成的集合。
即,针对场景描述文件中的所有三维网格(包括类型为G-PCC的媒体文件中的三维网格和其它类型的媒体文件中的三维网格),基于相同的语法元素集合中的语法元素在对应的网格描述模块的基元(primitives)的属性(attributes)中添加语法元素。
示例性的,某一三维网格包括几何坐标和颜色数据,用于访问所述几何坐标的访问器对应的访问器描述模块的索引值为1,用于访问所述颜色数据的访问器对应的访问器描述模块的索引值为2,则基于所述第一语法元素集合,在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素之后,该三维网格对应的网格描述模块可以如下所示:
在一些实施例中,在所述网格描述模块的基元(primitives)的属性(attributes)中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:基于第一语法元素集合中的语法元素,在第一网格描述模块的基元的属性中添加对应的三维网格所包含的各个种类的数据对应的语法元素,以及基于第二语法元素集合中的语法元素,在第二网格描述模块的基元的属性中添加对应的三维网格所包含的各个种类的数据对应的语法元素。
其中,所述第一网格描述模块为类型为G-PCC编码点云的媒体文件中的三维网格对应的网格描述模块,所述第二网格描述模块类型不为G-PCC编码点云的媒体文件中的三维网格对应的网格描述模块。
在一些实施例中,所述第一语法元素集合为ISO/IEC 23090-14 MPEG-I场景描述标准中规定的场景描述文件的网格描述模块的基元的属性中支持的语法元素组成的集合;所述第二语法元素集合为预设值的G-PCC编码点云对应的语法元素组成的集合。
即,在网格描述模块的基元的属性中添加对应的三维网格所包含的各个种类的数据对应的语法元素时,需要根据三维网格是否属于类型为G-PCC的媒体文件中的三维网格,将场景描述文件中的三维网格分为两类,对于类型为G-PCC编码点云的媒体文件中的三维网格,基于第一语法元素集合中的语法元素,在对应的网格描述模块的基元的属性中添加其所包含的各个种类的数据对应的语法元素;对于类型不为G-PCC编码点云的媒体文件中的三维网格,基于第二语法元素集合中的语法元素,在对应的网格描述模块的基元的属性中添加其所包含的各个种类的数据对应的语法元素。
示例性的,场景描述文件中包括两个三维网格,该两个三维网格的名称分别为GPCCexample_mesh1和GPCCexample_mesh2。其中,GPCCexample_mesh1不属于类型为G-PCC的媒体文件中的三维网格,包括几何坐标和颜色数据,用于访问GPCCexample_mesh1 的几何坐标的访问器对应的访问器描述模块的索引值为0,用于访问GPCCexample_mesh1的颜色数据的访问器对应的访问器描述模块的索引值为1,GPCCexample_mesh2属于类型为G-PCC的媒体文件中的三维网格,包括几何坐标和颜色数据,用于访问GPCCexample_mesh2的几何坐标的访问器对应的访问器描述模块的索引值为2,用于访问GPCCexample_mesh2的颜色数据的访问器对应的访问器描述模块的索引值为3,则基于上述实施例在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素之后,该所述场景描述文件中的网格列表(meshes)可以如下所示:
在一些实施例中,所述场景描述文件的生成方法还包括:
根据三维网格的名称设置三维网格对应的网格描述模块中的网格名称语法元素(name)的值。
在一些实施例中,所述场景描述文件的生成方法还包括:
根据三维网格所包含的数据种类设置三维网格对应的网格描述模块的基元的属性中包含的语法元素。
在一些实施例中,所述场景描述文件的生成方法还包括:
根据三维网格的拓扑结构的类型,设置三维网格对应的网格描述模块中用于描述三维网格的拓扑类型的语法元素的值。
在一些实施例中,三维网格对应的网格描述模块中用于描述三维网格的拓扑类型的语法元素为"mode"。
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件的访问器列表(accessors)中添加目标访问器对应的访问器描述模块(accessor)。其中，所述目标访问器为用于访问所述目标媒体文件的解码数据的访问器。
在一些实施例中,所述场景描述文件的生成方法还包括:在所述场景描述文件的缓存器列表(buffers)中添加目标缓存器对应的缓存器描述模块(buffer)。其中,所述目标缓存器为用于存储所述目标媒体文件的解码数据的缓存器。
在一些实施例中,在所述场景描述文件的缓存器列表(buffers)中添加目标缓存器对应的缓存器描述模块(buffer)包括以下步骤a1至a5中的至少一项:
步骤a1、在所述目标缓存器对应的缓存器描述模块中添加字节长度语法元素(byteLength),并将所述字节长度语法元素的值设置为所述目标媒体文件的字节长度。
示例性的，当所述G-PCC编码点云的数据量为15000，则将所述缓存器描述模块中的"byteLength"的值设置为"15000"。
步骤a2、在所述目标缓存器对应的缓存器描述模块中添加MPEG环形缓存器(MPEG_buffer_circular)。
步骤a3、在所述MPEG环形缓存器中添加环节数量语法元素(count),并根据目标缓存器的存储环节数量设置对应的所述环节数量语法元素(count)的值。
例如:所述环形缓存器的存储环节数量为8,则将所述环形缓存器中的"count"及其值设置为:"count":8。
步骤a4、在所述MPEG环形缓存器中添加媒体索引语法元素(media),并根据所述目标媒体描述模块的索引值设置所述媒体索引语法元素(media)的值。
例如:所述目标媒体描述模块的索引值为0,则将所述环形缓存器的描述模块中的"media"及其值设置为:"media":0。
步骤a5、在所述MPEG环形缓存器中添加第二轨道索引语法元素(tracks),并根据目标缓存器存储的数据的源数据的轨道索引值设置所述第二轨道索引语法元素(tracks)的值。
例如:所述环形缓存器存储的数据所属的码流轨道的索引值为1,则可以将所述环形缓存器的描述模块中的"tracks"及其值设置为:"tracks":"#trackIndex=1"。
示例性的,若在所述场景描述文件的缓存器列表中添加目标缓存器对应的缓存器描述模块包括上步骤a1~a5中的每一项,且目标媒体文件的字节长度为9000,某一目标缓存器的存储环节数量为8,所述目标媒体文件对应的媒体描述模块的索引值为1,所述MPEG环形缓存器存储的数据的源数据的轨道索引值为1,则在所述场景描述文件的缓存器列表中添加的目标缓存器对应的缓存器描述模块可以如下所示:
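上述目标缓存器描述模块可以示意为如下Python片段。其中MPEG_buffer_circular在缓存器描述模块中的嵌套位置（置于extensions之下）为示意性假设：

```python
import json

# 示意性示例：按步骤a1~a5构造的目标缓存器描述模块
buffer_module = {
    "byteLength": 9000,                  # 步骤a1：目标媒体文件的字节长度
    "extensions": {
        "MPEG_buffer_circular": {        # 步骤a2：MPEG环形缓存器扩展
            "count": 8,                  # 步骤a3：存储环节数量
            "media": 1,                  # 步骤a4：目标媒体描述模块的索引值
            "tracks": ["#trackIndex=1"]  # 步骤a5：源数据的轨道索引值
        }
    }
}
buffers_fragment = {"buffers": [buffer_module]}
print(json.dumps(buffers_fragment))
```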
在一些实施例中，所述场景描述文件的生成方法还包括：在所述场景描述文件的缓存切片列表(bufferViews)中添加目标缓存器的缓存切片对应的缓存切片描述模块。
在一些实施例中,在所述场景描述文件的缓存切片列表中添加所述目标缓存器的缓存切片对应的缓存切片描述模块,包括如下步骤b1~b3中的至少一项:
步骤b1、在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加缓存器索引语法元素(buffer),并根据缓存切片所属的目标缓存器对应的缓存器描述模块的索引值设置所述缓存器索引语法元素(buffer)的值。
例如:某一缓存器对应的缓存器描述模块的索引值为2,则将该缓存切片描述模块中的"buffer"及其值设置为:"buffer":2。
步骤b2、在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加第二字节长度语法元素(byteLength),并根据缓存切片的容量设置所述第二字节长度语法元素(byteLength)的值。
步骤b3、在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加偏移量语法元素(byteOffset),并根据对应缓存切片的存储数据的偏移量设置所述偏移量语法元素的值。
例如：当所述缓存器的某一缓存器切片的数据范围为[1,12000]时，则基于上述步骤b2和步骤b3将该缓存器切片对应的缓存器切片描述模块中的"byteLength"及其值设置为："byteLength":12000，将该缓存器切片对应的缓存器切片描述模块中的"byteOffset"及其值设置为："byteOffset":0；当所述缓存器的某一缓存器切片的数据范围为[12001,15000]时，则基于上述步骤b2和步骤b3将该缓存器切片对应的缓存器切片描述模块中的"byteLength"及其值设置为："byteLength":3000，将该缓存器切片对应的缓存器切片描述模块中的"byteOffset"及其值设置为："byteOffset":12000。
示例性的,若在所述场景描述文件的缓存切片列表(bufferViews)中添加所述目标缓存器的缓存切片对应的缓存切片描述模块包括上步骤b1~b3中的每一项,某一目标缓存器对应的缓存器描述模块的索引值为1,目标缓存器的容量为8000,且目标缓存器包括两个缓存切片,第一个缓存切片的容量为6000,偏移量为0,第二个缓存切片的容量为2000,偏移量为6001,则在所述场景描述文件的缓存切片列表中添加所述目标缓存器的缓存切片对应的缓存切片描述模块可以如下所示:
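上述两个缓存切片描述模块可以示意为如下Python片段。其中第二个切片的偏移量按本文描述取6001；若按从0起始的字节偏移计，通常应为6000，此处遵循原文数值，仅为示意：

```python
import json

# 示意性示例：目标缓存器(缓存器描述模块索引值为1)的两个缓存切片描述模块
buffer_views_fragment = {
    "bufferViews": [
        {"buffer": 1, "byteLength": 6000, "byteOffset": 0},     # 第一个缓存切片
        {"buffer": 1, "byteLength": 2000, "byteOffset": 6001},  # 第二个缓存切片
    ]
}
print(json.dumps(buffer_views_fragment))
```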
在一些实施例中,所述场景描述文件的生成方法还包括:在所述场景描述文件的访问器列表(accessors)中添加目标访问器对应的访问器描述模块。其中,所述目标访问器为用于访问所述目标媒体文件的解码数据的访问器。
在一些实施例中，在所述场景描述文件的访问器列表(accessors)中添加目标访问器对应的访问器描述模块，包括如下步骤c1~c6中的至少一项：
步骤c1、在所述目标访问器对应的访问器描述模块中添加数据类型语法元素(componentType),并根据目标访问器所访问的数据的类型设置对应的所述数据类型语法元素的值。
例如:当某一访问器访问的数据的类型为5126,则将该访问器对应的访问器描述模块中的数据类型语法元素及其值设置为:"componentType":5126。
步骤c2、在所述目标访问器对应的访问器描述模块中添加访问器类型语法元素(type),并根据预配置的访问器类型设置所述访问器类型语法元素的值。
例如:当某一访问器访问的访问器类型为"VEC3",则将该访问器对应的访问器描述模块中的访问器类型语法元素(type)及其值设置为:"type":"VEC3"。
步骤c3、在所述目标访问器对应的访问器描述模块中添加数据数量语法元素(count)，并根据目标访问器所访问的数据的数量设置所述数据数量语法元素的值。
步骤c4、在所述目标访问器对应的访问器描述模块中添加MPEG时变访问器(MPEG_accessor_timed)。
步骤c5、在所述MPEG时变访问器中添加缓存切片索引语法元素(bufferView)，并根据存储目标访问器所访问的数据的缓存切片对应的缓存切片描述模块的索引值设置所述缓存切片索引语法元素的值。
例如:当某一访问器所访问的数据所属的缓存器切片对应的缓存器切片描述模块的索引值为3,则将该目标访问器对应的访问器描述模块的MPEG时变访问器中的缓存切片索引语法元素及其值设置为:"bufferView":3。
步骤c6、在所述MPEG时变访问器中添加时变语法元素(immutable),并根据对应的目标访问器内的语法元素的取值是否随时间变化设置所述时变语法元素的值。
在一些实施例中，当某一目标访问器内的语法元素的取值不随时间变化，则将该目标访问器对应的访问器描述模块的MPEG时变访问器中的时变语法元素及其值设置为："immutable":true，当某一目标访问器内的语法元素的取值会随时间变化，则将该目标访问器对应的访问器描述模块的MPEG时变访问器中的时变语法元素及其值设置为："immutable":false。
示例性的,若在所述场景描述文件的访问器列表(accessors)中添加用于访问所述目标缓存器的缓存切片中的数据的所述目标访问器对应的访问器描述模块包括上步骤c1~c6中的每一项,某一目标访问器所访问的数据的类型为5121,该目标访问器的访问器类型为VEC2,该目标访问器访问的数据的数量为4000,该储存该目标访问器所需要访问的数据的缓存切片对应的缓存切片描述模块的索引值为1,且对应的访问器内的语法元素的取值不随时间变化,则在所述场景描述文件的访问器列表(accessors)中添加的该目标访问器对应的访问器描述模块可以如下所示:

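上述目标访问器描述模块可以示意为如下Python片段。其中MPEG_accessor_timed在访问器描述模块中的嵌套位置（置于extensions之下）为示意性假设：

```python
import json

# 示意性示例：按步骤c1~c6构造的目标访问器描述模块
accessor_module = {
    "componentType": 5121,   # 步骤c1：数据类型（glTF中5121对应无符号8位整数）
    "type": "VEC2",          # 步骤c2：访问器类型为二维向量
    "count": 4000,           # 步骤c3：所访问的数据的数量
    "extensions": {
        "MPEG_accessor_timed": {   # 步骤c4：MPEG时变访问器扩展
            "bufferView": 1,       # 步骤c5：缓存切片描述模块的索引值
            "immutable": True      # 步骤c6：语法元素取值不随时间变化
        }
    }
}
print(json.dumps({"accessors": [accessor_module]}))
```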
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件中添加数字资产描述模块(asset)，在所述数字资产描述模块中添加版本语法元素(version)，以及在所述场景描述文件为基于glTF2.0版本编写的场景描述文件时，将所述版本语法元素的值设置为2.0。
示例性的,在所述场景描述文件添加的数字资产描述模块可以如下所示:
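该数字资产描述模块可以示意为如下Python片段：

```python
import json

# 示意性示例：基于glTF2.0版本编写的场景描述文件的数字资产描述模块
asset_fragment = {"asset": {"version": "2.0"}}
print(json.dumps(asset_fragment))
```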
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件中添加扩展使用描述模块(extensionsUsed),并在所述扩展使用描述模块中添加所述场景描述文件使用的MPEG对glTF2.0版本的场景描述文件的扩展。
示例性的,场景描述文件中使用的MPEG扩展包括:MPEG媒体(MPEG_media)、MPEG环形缓存器(MPEG_buffer_circular)以及MPEG时变访问器(MPEG_accessor_timed),则在所述场景描述文件中添加的扩展使用描述模块可以如下所示:
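该扩展使用描述模块可以示意为如下Python片段：

```python
import json

# 示意性示例：扩展使用描述模块，列出本文用到的三个MPEG扩展
extensions_used_fragment = {
    "extensionsUsed": ["MPEG_media", "MPEG_buffer_circular", "MPEG_accessor_timed"]
}
print(json.dumps(extensions_used_fragment))
```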
在一些实施例中,所述场景描述文件的生成方法还包括:
在所述场景描述文件中添加场景声明(scene),并将所述场景声明的值设置为所述待渲染场景对应的场景描述模块的索引值。
示例性的,待渲染场景对应的场景描述模块的索引值为0,则在所述场景描述文件中添加场景声明可以如下所示:
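该场景声明可以示意为如下Python片段：

```python
import json

# 示意性示例：场景声明，其值为待渲染场景对应的场景描述模块的索引值
scene_declaration = {"scene": 0}
print(json.dumps(scene_declaration))
```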
本申请一些实施例还提供了一种场景描述文件的解析方法，参照图13所示，该场景描述文件的解析方法，包括如下步骤S131~S133：
S131、获取待渲染三维场景的场景描述文件。
其中,所述待渲染三维场景中包括类型为G-PCC编码点云的目标媒体文件。
本申请实施例中的待渲染三维场景中可以包括一个或多个媒体文件,且当所述待渲染三维场景中包括多个媒体文件时,所述多个媒体文件中的一个或多个媒体文件的类型可以为G-PCC编码点云。当所述待渲染三维场景包括多个类型为G-PCC编码点云的目标媒体文件时,可以分别对类型为G-PCC编码点云的目标媒体文件执行本申请实施例提供的解析方法。
S132、从所述场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)中获取所述目标媒体文件对应的目标媒体描述模块。
示例性的,所述目标媒体文件对应的目标媒体描述模块可以如下所示:

S133、根据所述目标媒体描述模块获取所述目标媒体文件的描述信息。
在一些实施例中,上步骤S133(根据所述目标媒体描述模块获取所述目标媒体文件的描述信息)包括如下步骤1331~1337中的至少一项:
步骤1331、根据所述目标媒体描述模块中的媒体名称语法元素(name)的值获取所述目标媒体文件的名称。
例如:所述目标媒体描述模块中的媒体名称语法元素及其值为:"name":"GPCCexample",则可以确定所述目标媒体文件的名称为:GPCCexample。
步骤1332、根据所述目标媒体描述模块中的自动播放语法元素(autoplay)的值确定所述目标媒体文件是否需要自动播放。
在一些实施例中,根据所述目标媒体描述模块中的自动播放语法元素(autoplay)的值确定所述目标媒体文件是否需要自动播放,包括:当所述目标媒体描述模块中的自动播放语法元素(autoplay)及其值为:"autoplay":true,则确定所述目标媒体文件需要自动播放;而当所述目标媒体描述模块中的自动播放语法元素(autoplay)及其值为:"autoplay":false,则确定所述目标媒体文件不需要自动播放。
步骤1333、根据所述目标媒体描述模块中的循环播放语法元素(loop)的值确定所述目标媒体文件是否需要循环播放。
在一些实施例中,根据所述目标媒体描述模块中的循环播放语法元素(loop)的值确定所述目标媒体文件是否需要循环播放,包括:当所述目标媒体描述模块中的循环播放语法元素(loop)及其值为:"loop":true,则确定所述目标媒体文件需要循环播放;而当所述目标媒体描述模块中的循环播放语法元素(loop)及其值为:"loop":false,则确定所述目标媒体文件不需要循环播放。
步骤1334、根据所述目标媒体描述模块的可选项(alternatives)中的媒体类型语法元素(mimeType)的值获取所述目标媒体文件的封装格式。
由于媒体文件的类型为G-PCC编码点云时，媒体文件对应的媒体描述模块中的媒体类型语法元素(mimeType)的值会被设置为G-PCC编码点云对应的封装格式值，且G-PCC编码点云对应的封装格式值可以为："application/mp4"，因此当媒体类型语法元素的值为"application/mp4"时，可以确定所述目标媒体文件的封装格式为MP4。
步骤1335、根据所述目标媒体描述模块的可选项(alternatives)中的统一资源标识符语法元素(uri)的值获取所述目标媒体文件的访问地址。
例如：所述目标媒体描述模块的可选项(alternatives)中的统一资源标识符语法元素(uri)及其值为："uri":"http://www.example.com/GPCCexample.mp4"，则可以确定所述目标媒体文件的访问地址为：http://www.example.com/GPCCexample.mp4。
步骤1336、根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的第一轨道索引语法元素(track)的值获取所述目标媒体文件的轨道信息。
在一些实施例中,根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的第一轨道索引语法元素(track)的值获取所述目标媒体文件的轨道信息,包括:当所述目标媒体文件的封装文件为单轨道封装文件时,将所述第一轨道索引语法元素的值确定为所述目标媒体文件的码流轨道的索引值;当所述目标媒体文件为多轨道封装文件时,将所述第一轨道索引语法元素的值确定为所述目标媒体文件的几何码流轨道的索引值。
步骤1337、根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的编解码参数语法元素(codecs)的值以及ISO/IEC 23090-18 G-PCC数据传输标准确定所述目标媒体文件的码流的类型和解码参数。
在一些实施例中，上步骤1337(根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的编解码参数语法元素(codecs)的值以及ISO/IEC 23090-18 G-PCC数据传输标准确定所述目标媒体文件的码流的类型和解码参数)包括如下步骤13371和13372：
步骤13371、根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的编解码参数语法元素(codecs)的值以及ISO/IEC 23090-18 G-PCC数据传输标准确定所述目标媒体文件的码流的类型和编码参数。
ISO/IEC 23090-18 G-PCC数据传输标准规定，在G-PCC编码点云采用DASH封装时，当MPD文件中使用G-PCC预选信令时，预选信令的codecs属性应设置为'gpc1'，表示预选媒体是基于几何的点云；当G-PCC容器中存在多个G-PCC Tile轨道时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpcb'或'gpeb'，表示适配集包含G-PCC Tile基本轨道数据。当Tile Component Adaptation Sets只向单个G-PCC组件数据发送信号时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpcb'。当Tile Component Adaptation Sets向所有G-PCC组件数据发送信号时，Main G-PCC Adaptation Set的“codecs”属性应被设置为'gpeb'。当MPD文件中使用G-PCC Tile预选信令时，预选信令的“codecs”属性应设置为'gpt1'，表示预选媒体是基于几何的点云碎片，则可以在G-PCC编码点云采用DASH封装且MPD文件中使用G-PCC预选信令时，将所述目标媒体描述模块的"alternatives"的"tracks"中的"codecs"的值设置为'gpc1'。因此根据所述目标媒体描述模块的可选项(alternatives)的轨道数组(tracks)中的编解码参数语法元素(codecs)的值以及ISO/IEC 23090-18 G-PCC数据传输标准可以确定所述目标媒体文件的封装方式和编码参数。
步骤13372、根据所述目标媒体文件的编码参数确定所述目标媒体文件的解码参数。
由于所述目标媒体文件的解码过程与所述目标媒体文件的编码过程互为逆操作,因此可以根据所述目标媒体文件的编码参数确定所述目标媒体文件的解码参数。
示例性的,当所述目标媒体文件对应的目标媒体描述模块如下所示:

则,根据所述目标媒体描述模块获取的所述目标媒体文件的描述信息包括:所述目标媒体文件的名称为:AAAA,所述目标媒体文件不需要自动播放,但需要循环播放;所述目标媒体文件的封装格式为MP4,所述目标媒体文件的访问地址为:http://www.bbbb.com/AAAA.mp4;所述目标媒体文件的参考轨道为索引值为0的码流轨道,所述目标媒体文件的封装/解封装方式为MP4,所述目标媒体文件的编解码参数为gpc1。
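上述解析过程可以示意为如下Python片段。其中媒体描述模块的字段组织方式与前文生成侧示例一致，为示意性假设，解析函数parse_media_entry亦为假设性示意：

```python
# 示意性示例：按步骤1331~1337从目标媒体描述模块中提取描述信息
target_media = {
    "name": "AAAA", "autoplay": False, "loop": True,
    "alternatives": [{
        "mimeType": "application/mp4",
        "uri": "http://www.bbbb.com/AAAA.mp4",
        "tracks": [{"track": "#trackIndex=0", "codecs": "gpc1"}],
    }],
}

def parse_media_entry(entry):
    alt = entry["alternatives"][0]
    return {
        "name": entry.get("name"),                 # 步骤1331：媒体名称
        "autoplay": entry.get("autoplay", False),  # 步骤1332：是否自动播放
        "loop": entry.get("loop", False),          # 步骤1333：是否循环播放
        # 步骤1334：由封装格式值确定封装格式
        "container": "MP4" if alt["mimeType"] == "application/mp4" else alt["mimeType"],
        "uri": alt["uri"],                         # 步骤1335：访问地址
        "track": alt["tracks"][0]["track"],        # 步骤1336：参考轨道
        "codecs": alt["tracks"][0]["codecs"],      # 步骤1337：编解码参数
    }

info = parse_media_entry(target_media)
```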
本申请实施例提供的场景描述文件的解析方法在获取包括类型为G-PCC编码点云的目标媒体文件的待渲染三维场景的场景描述文件后,可以从所述场景描述文件的MPEG媒体的媒体列表中获取所述目标媒体文件对应的目标媒体描述模块,以及根据所述目标媒体描述模块获取所述目标媒体文件的描述信息。由于本申请实施例提供的场景描述文件的解析方法可以根据所述目标媒体描述模块获取所述目标媒体文件的描述信息,进而基于目标媒体文件的描述信息渲染和显示包括类型为G-PCC编码点云的目标媒体文件的待渲染三维场景,因此本申请实施例提供了一种能够解析包括类型为G-PCC编码点云的媒体文件的三维场景的场景描述文件的方法,实现了解析包括G-PCC编码点云的三维场景的场景描述文件。
在一些实施例中,上述实施例提供的场景描述文件的解析方法还包括:
从所述场景描述文件的场景列表(scenes)中获取所述待渲染三维场景对应的目标场景描述模块(scene),以及根据所述目标场景描述模块获取所述待渲染三维场景的描述信息。
在一些实施例中,可以从所述场景描述文件中获取场景声明(scene)及其声明的索引值,并根据场景声明及其声明的索引值从所述场景描述文件的场景列表中获取所述待渲染三维场景对应的目标场景描述模块。
例如:场景声明及其声明的索引值为:"scene":0,则可以根据场景声明及其声明的索引值从所述场景描述文件的场景列表获取第一个场景描述模块作为所述待渲染三维场景对应的目标场景描述模块。
在一些实施例中,根据所述目标场景描述模块获取所述待渲染三维场景的描述信息,包括:根据所述目标场景描述模块的节点索引列表(nodes)声明的索引值确定所述待渲染三维场景中的节点对应的节点描述模块的索引值。
示例性的,所述目标场景描述模块如下所示:
则，根据所述目标场景描述模块的节点索引列表(nodes)声明的索引值可以确定所述待渲染三维场景中包括两个节点，一个节点对应的节点描述模块的索引值为0(节点列表中的第一个节点描述模块)，另一个节点对应的节点描述模块的索引值为1(节点列表中的第二个节点描述模块)。
在一些实施例中,在根据所述目标场景描述模块的节点索引列表(nodes)声明的索引值确定所述待渲染三维场景中的节点对应的节点描述模块的索引值之后,上述实施例提供的场景描述文件的解析方法还包括:
根据所述待渲染三维场景中的节点对应的节点描述模块的索引值,从所述场景描述文件的节点列表(nodes)中获取所述待渲染三维场景中的节点对应的节点描述模块,以及根据所述待渲染三维场景中的节点对应的节点描述模块,获取所述待渲染三维场景中的节点的描述信息。
例如:当所述目标场景描述模块的节点索引列表声明的索引值仅包括0,则从所述场景描述文件的节点列表中获取第一个节点描述模块作为所述待渲染三维场景中的节点对应的节点描述模块。
再例如:当所述目标场景描述模块的节点索引列表声明的索引值包括0和1,则从所述场景描述文件的节点列表中获取第一个节点描述模块和第二个节点描述模块作为所述待渲染三维场景中的节点对应的节点描述模块。
在一些实施例中,根据所述待渲染三维场景中的节点对应的节点描述模块,获取所述待渲染三维场景中的节点的描述信息,包括如下步骤a1和步骤a2中的至少一项:
步骤a1、根据所述待渲染三维场景中的节点对应的节点描述模块中的节点名称语法元素(name)的值,获取所述待渲染三维场景中的节点的名称。
步骤a2、根据所述待渲染三维场景中的节点对应的节点描述模块中的网格索引列表声明的索引值,确定所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块的索引值。
示例性的,当某一节点对应的节点描述模块如下所示:
则,基于上述步骤a1可以确定该节点的名称为:GPCCexample_node,基于上步骤a2可以确定该节点挂载的三维网格对应的网格描述模块的索引值分别为0和1。
在一些实施例中,在确定所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块的索引值之后,上述实施例提供的场景描述文件的解析方法还包括:根据所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块的索引值,从所述场景描述文件的网格列表(meshes)中获取所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块;以及根据所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块,获取所述待渲染三维场景中的节点挂载的三维网格的描述信息。
例如:当某一节点描述模块的网格索引列表声明的索引值仅包括0,则从所述场景描述文件的网格列表中获取第一个网格描述模块作为该节点描述模块对应的节点上挂载的三维网格对应的网格描述模块。
再例如：当某一节点描述模块的网格索引列表声明的索引值包括1和2，则从所述场景描述文件的网格列表中获取第二个网格描述模块和第三个网格描述模块作为该节点描述模块对应的节点上挂载的三维网格对应的网格描述模块。
在一些实施例中,根据所述待渲染三维场景中的节点挂载的三维网格对应的网格描述模块,获取所述待渲染三维场景中的节点挂载的三维网格的描述信息包括如下步骤b1~步骤b4中的至少一项:
步骤b1、根据三维网格对应的网格描述模块中的网格名称语法元素(name)获取三维网格的名称。
步骤b2、根据三维网格对应的网格描述模块中的数据种类语法元素获取三维网格所包括的数据种类。
在一些实施例中,上述步骤b2(根据三维网格对应的网格描述模块中的数据种类语法元素获取三维网格所包括的数据种类),包括:根据三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组中的数据种类语法元素获取三维网格所包括的数据种类。
在一些实施例中,所述目标扩展数组可以为MPEG_primitve_GPCC。
例如:某一三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)如下所示:
则,可以根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组(MPEG_primitve_GPCC)中的位置坐标语法元素(position),确定三维网格包括位置坐标,根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组(MPEG_primitve_GPCC)中的颜色值语法元素(color_0),确定该三维网格包括颜色值,以及根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组(MPEG_primitve_GPCC)中的法向量语法元素(normal),确定该三维网格包括法向量。
再例如：某一三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)如下所示：
则,可以根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组(MPEG_primitve_GPCC)中的位置坐标语法元素(G-PCC_position),确定三维网格包括位置坐标,根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表 (extensions)的目标扩展数组(MPEG_primitve_GPCC)中的颜色值语法元素(G-PCC_color_0),确定该三维网格包括颜色值,以及根据该三维网格对应的网格描述模块的基元(primitives)的扩展列表(extensions)的目标扩展数组(MPEG_primitve_GPCC)中的法向量语法元素(G-PCC_normal),确定该三维网格包括法向量。
在一些实施例中,上述步骤b2(根据三维网格对应的网格描述模块中的数据种类语法元素获取三维网格所包括的数据种类),包括:根据三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的数据种类语法元素获取三维网格所包括的数据种类。
例如:某一三维网格对应的网格描述模块的基元(primitives)的属性(attributes)如下所示:
则,可以根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的位置坐标语法元素(position),确定三维网格包括位置坐标,根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的颜色值语法元素(color_0),确定该三维网格包括颜色值,以及根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的法向量语法元素(normal),确定该三维网格包括法向量。
再例如:某一三维网格对应的网格描述模块的基元(primitives)的属性(attributes)如下所示:
则,可以根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的位置坐标语法元素(G-PCC_position),确定三维网格包括位置坐标,根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的颜色值语法元素(G-PCC_color_0),确定该三维网格包括颜色值,以及根据该三维网格对应的网格描述模块的基元(primitives)的属性(attributes)中的法向量语法元素(G-PCC_normal),确定该三维网格包括法向量。
步骤b3、根据数据种类语法元素的值获取用于访问三维网格的种类的数据的访问器对应的访问器描述模块的索引值。
承上示例所述,位置坐标语法元素(G-PCC_position)的值为0,因此确定用于访问该三维网格的位置坐标的访问器对应的访问器描述模块的索引值为0(访问器列表中的第一个访问器),颜色值语法元素(G-PCC_color_0)的值为1,因此确定用于访问该三维网格的颜色值的访问器对应的访问器描述模块的索引值为1(访问器列表中的第二个访问器),法向量语法元素(G-PCC_normal)的值为2,因此确定用于访问该三维网格的法向量的访问器对应的访问器描述模块的索引值为2(访问器列表中的第三个访问器)。
步骤b4、根据三维网格对应的网格描述模块中的模式语法元素(mode)的值获取三维网格的拓扑结构的类型。
示例性的,当模式语法元素的值为0时,可以确定三维网格的拓扑结构的类型为散点,模式语法元素的值为1时,可以确定三维网格的拓扑结构的类型为线,当模式语法元素的值为4时,可以确定三维网格的拓扑结构的类型为三角形。
示例性的,某一三维网格对应的网格描述模块如下所示:
则,根据该三维网格对应的网格描述模块获取的该三维网格的描述信息包括:该三维网格的名称为:G-PCCexample_mesh;该三维网格的拓扑类型为散点;该三维网格包括三个种类的数据,分别为位置坐标、颜色值以及法向量,用于访问该三维网格的位置坐标的访问器对应的访问器描述模块的索引值为0,用于访问该三维网格的颜色值的访问器对应的访问器描述模块的索引值为1,用于访问该三维网格的法向量的访问器对应的访问器描述模块的索引值为2。
在一些实施例中,在根据数据种类语法元素的值获取用于访问三维网格的种类的数据的访问器对应的访问器描述模块的索引值之后,所述方法还包括:
根据用于访问三维网格的种类的数据的访问器对应的访问器描述模块的索引值,从所述场景描述文件的访问器列表中获取用于访问三维网格的各个种类的数据的访问器对应的访问器描述模块,以及根据用于访问三维网格的各个种类的数据的访问器对应的访问器描述模块,获取用于访问三维网格的各个种类的数据的访问器的描述信息。
例如：用于访问该三维网格的颜色值的访问器对应的访问器描述模块的索引值为1，则从所述场景描述文件的访问器列表中获取第二个访问器描述模块作为用于访问该三维网格的颜色值的访问器对应的访问器描述模块。
在一些实施例中,根据用于访问三维网格的各个种类的数据的访问器对应的访问器描述模块,获取用于访问三维网格的各个种类的数据的访问器的描述信息,包括以下步骤c1~步骤c6中的至少一项:
步骤c1、根据访问器描述模块中的数据类型语法元素(componentType)的值确定访问器所访问的数据的类型。
例如：用于访问某一三维网格的法向量的访问器对应的访问器描述模块中的数据类型语法元素及其值为："componentType":5126，则可以确定该访问器描述模块对应的访问器所访问的数据(该三维网格的法向量)的类型为32位的浮点数(float)。
步骤c2、根据访问器描述模块中的访问器类型语法元素(type)的值确定访问器的类型。
例如：用于访问某一三维网格的位置坐标的访问器对应的访问器描述模块中的访问器类型语法元素及其值为："type":VEC3，则可以确定该访问器描述模块对应的访问器的类型为三维向量。
步骤c3、根据访问器描述模块中的数据数量语法元素(count)的值确定访问器所访问的数据的数量。
例如：用于访问某一三维网格的颜色值的访问器对应的访问器描述模块中的数据数量语法元素及其值为："count":1000，则可以确定该访问器描述模块对应的访问器所访问的数据(该三维网格的颜色值)的数量为1000。
步骤c4、根据访问器描述模块中是否包含MPEG时变访问器(MPEG_accessor_timed)确定访问器是否为基于MPEG扩展改造的时变访问器。
在一些实施例中,根据访问器描述模块中是否包含MPEG时变访问器确定访问器是否为基于MPEG扩展改造的时变访问器,包括:若访问器描述模块中包含MPEG时变访问器,则确定访问器为基于MPEG扩展改造的时变访问器,而若访问器描述模块中不包含MPEG时变访问器,则确定访问器不为基于MPEG扩展改造的时变访问器。
步骤c5、根据访问器描述模块的MPEG时变访问器(MPEG_accessor_timed)中的缓存切片索引语法元素(bufferView)的值确定存储访问器所访问的数据的缓存切片对应的缓存切片描述模块的索引值。
例如：用于访问某一三维网格的法向量的访问器对应的访问器描述模块的MPEG时变访问器中的缓存切片索引语法元素及其值为："bufferView":0，则可以确定该访问器描述模块对应的访问器所访问的数据(该三维网格的法向量)存储于缓存切片列表中的第一个缓存切片描述模块对应的缓存切片中。
步骤c6、根据访问器描述模块的MPEG时变访问器中的时变语法元素(immutable)的值,确定访问器内的语法元素的取值是否随时间变化。
在一些实施例中,根据访问器描述模块的MPEG时变访问器中的时变语法元素(immutable)的值,确定访问器内的语法元素的取值是否随时间变化,包括:若访问器描述模块的MPEG时变访问器中的时变语法元素及其值为:"immutable":true,则确定访问器内的语法元素的取值不随时间变化,而若访问器描述模块的MPEG时变访问器中的时变语法元素及其值为:"immutable":false,则确定访问器内的语法元素的取值会随时间变化。
示例性的,某一访问器对应的访问器描述模块如下所示:
则，根据该访问器对应的访问器描述模块获取的该访问器的描述信息包括：该访问器所访问的数据的类型为5123；该访问器类型为标量(SCALAR)；该访问器所访问的数据的数量为1000；该访问器为基于MPEG扩展改造的时变访问器；访问器所访问的数据缓存于缓存切片列表中的第二个缓存切片描述模块对应的缓存切片中；该访问器内的语法元素的取值不随时间变化。
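上述步骤c1~c6的解析过程可以示意为如下Python片段（访问器描述模块的字段组织为示意性假设）：

```python
# 示意性示例：按步骤c1~c6解析某一访问器描述模块
accessor = {
    "componentType": 5123, "type": "SCALAR", "count": 1000,
    "extensions": {"MPEG_accessor_timed": {"bufferView": 1, "immutable": True}},
}

component_type = accessor["componentType"]                    # 步骤c1：数据类型
accessor_type = accessor["type"]                              # 步骤c2：访问器类型
data_count = accessor["count"]                                # 步骤c3：数据数量
timed = accessor.get("extensions", {}).get("MPEG_accessor_timed")
is_timed = timed is not None                                  # 步骤c4：是否为MPEG时变访问器
buffer_view_index = timed["bufferView"] if is_timed else None # 步骤c5：缓存切片索引值
is_immutable = timed["immutable"] if is_timed else None       # 步骤c6：取值是否不随时间变化
```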
在一些实施例中,上述实施例提供场景描述文件解析方法还包括如下步骤d至步骤g:
步骤d、获取所述场景描述文件的缓存器列表(buffers)中的缓存器描述模块。
步骤e、获取缓存器描述模块中的媒体索引语法元素(media)的值。
步骤f、将所述媒体索引语法元素的值与所述目标媒体描述模块的索引值相同的缓存器描述模块确定为用于缓存所述目标媒体文件的解码数据的目标缓存器对应的目标缓存器描述模块。
示例性的,当所述目标媒体描述模块的索引值为0,则将所述媒体索引语法元素的值为0的缓存器描述模块确定为用于缓存所述目标媒体文件的解码数据的目标缓存器对应的目标缓存器描述模块。
需要说明的是,用于缓存所述目标媒体文件的解码数据的目标缓存器的数量可以为一个也可以为多个,本申请实施例对此不做限制。
步骤g、根据所述目标缓存器描述模块获取所述目标缓存器的描述信息。
在一些实施例中,根据所述目标缓存器描述模块获取所述目标缓存器的描述信息包括以下步骤g1~步骤g4中的至少一项:
步骤g1、根据所述目标缓存器描述模块中的第一字节长度语法元素(byteLength)的值,获取所述目标缓存器的容量。
例如:所述目标缓存器描述模块中的第一字节长度语法元素及其值为"byteLength":15000,则可以确定所述目标缓存器的容量为15000字节。
步骤g2、根据所述目标缓存器描述模块中是否包含MPEG环形缓存器(MPEG_buffer_circular)确定所述目标缓存器是否为基于MPEG扩展改造的环形缓存器。
在一些实施例中,根据所述目标缓存器描述模块中是否包含MPEG环形缓存器确定所述目标缓存器是否为基于MPEG扩展改造的环形缓存器,包括:若所述目标缓存器描述模块中包含MPEG环形缓存器确定所述目标缓存器为基于MPEG扩展改造的环形缓存器,而若所述目标缓存器描述模块中不包含MPEG环形缓存器确定所述目标缓存器不为基于MPEG扩展改造的环形缓存器。
步骤g3、根据所述目标缓存器描述模块的MPEG环形缓存器中的环节数量语法元素(count)的值,获取所述MPEG环形缓存器的存储环节的数量。
例如：所述目标缓存器描述模块的MPEG环形缓存器中的环节数量语法元素及其值为："count":8，则可以确定所述MPEG环形缓存器包括8个存储环节。
步骤g4、根据所述目标缓存器描述模块的MPEG环形缓存器中的第二轨道索引语法元素(tracks)的值,获取所述MPEG环形缓存器所缓存的数据的源数据的轨道索引值。
示例性的,某一缓存器对应的缓存器描述模块如下所示:

则,根据该缓存器对应的缓存器描述模块可以获取该缓存器的描述信息包括:该缓存器的容量为8000字节;该缓存器为基于MPEG扩展改造的环形缓存器,环形缓存器的存储环节数量为5,环形缓存器存储的媒体文件为MPEG媒体中声明的第二个媒体文件,环形缓存器所缓存的数据的源数据的轨道索引值为1。
在一些实施例中,上述实施例提供场景描述文件解析方法还包括如下步骤h至步骤k:
步骤h、获取所述场景描述文件的缓存切片列表(bufferViews)中的缓存切片描述模块。
步骤i、获取缓存切片描述模块中的缓存器索引语法元素(buffer)的值。
步骤j、将所述缓存器索引语法元素的值与所述目标缓存器描述模块的索引值相同的缓存切片描述模块确定为所述目标缓存器的缓存切片对应的缓存切片描述模块。
示例性的，当所述目标缓存器描述模块的索引值为1，则将所述缓存器索引语法元素的值为1的缓存切片描述模块确定为所述目标缓存器的缓存切片对应的缓存切片描述模块。
需要说明的是,所述目标缓存器的缓存切片的数量可以为一个或多个,本申请实施例对此不做限定。
步骤k、根据所述目标缓存器的缓存切片对应的缓存切片描述模块,获取所述目标缓存器的缓存切片的描述信息。
在一些实施例中,根据所述目标缓存器的缓存切片对应的缓存切片描述模块,获取所述目标缓存器的缓存切片的描述信息,包括以下步骤k1和步骤k2中的至少一项:
步骤k1、根据所述目标缓存器的缓存切片对应的缓存切片描述模块中的第二字节长度语法元素(byteLength)的值,获取所述目标缓存器的缓存切片的容量。
例如:所述目标缓存器的某一缓存切片对应的缓存切片描述模块中的第二字节长度语法元素及其值为:"byteLength":12000,则可以确定所述目标缓存器的该缓存切片的容量为12000字节。
步骤k2、根据所述目标缓存器的缓存切片对应的缓存切片描述模块中的偏移量语法元素(byteOffset)的值,获取所述目标缓存器的缓存切片的偏移量。
例如:所述目标缓存器的某一缓存切片对应的缓存切片描述模块中的偏移量语法元素及其值为:"byteOffset":0,则可以确定所述目标缓存器的该缓存切片的偏移量为0字节。
示例性的，某一缓存切片对应的缓存切片描述模块如下所示：
则,根据该缓存切片对应的缓存切片描述模块可以获取该缓存切片的描述信息包括:该缓存切片为缓存器列表中的第二缓存器描述模块对应的缓存器的缓存切片,该缓存切片的容量为8000字节;该缓存切片的偏移量为0,即该缓存切片缓存的数据范围为前8000个字节。
在一些实施例中,上述实施例提供场景描述文件解析方法还包括如下步骤l至步骤o:
步骤l、获取所述场景描述文件的访问器列表(accessor)中的访问器描述模块。
步骤m、获取访问器描述模块中的缓存切片索引语法元素(bufferView)的值。
步骤n、将所述缓存切片索引语法元素的值与所述目标缓存器的缓存切片对应的缓存切片描述模块的索引值相同的访问器描述模块,确定为用于对所述目标缓存器的缓存切片中的数据进行访问的访问器对应的访问器描述模块。
例如：所述目标缓存器的某一缓存切片对应的缓存切片描述模块的索引值为2，则将所述缓存切片索引语法元素的值为2的访问器描述模块确定为用于对所述目标缓存器的该缓存切片中的数据进行访问的访问器对应的访问器描述模块。
步骤o、根据用于对所述目标缓存器的缓存切片中的数据进行访问的访问器对应的访问器描述模块,获取用于对所述目标缓存器的缓存切片中的数据进行访问的访问器的描述信息。
在一些实施例中,根据用于对所述目标缓存器的缓存切片中的数据进行访问的访问器对应的访问器描述模块,获取用于对所述目标缓存器的缓存切片中的数据进行访问的访问器的描述信息,包括以下步骤o1~步骤o6中的至少一项:
步骤o1、根据访问器描述模块中的数据类型语法元素(componentType)的值确定访问器所访问的数据的类型。
步骤o2、根据访问器描述模块中的访问器类型语法元素(type)的值确定访问器的类型。
步骤o3、根据访问器描述模块中的数据数量语法元素(count)的值确定访问器所访问的数据的数量。
步骤o4、根据访问器描述模块中是否包含MPEG时变访问器(MPEG_accessor_timed)确定访问器是否为基于MPEG扩展改造的时变访问器。
步骤o5、根据访问器描述模块的MPEG时变访问器中的缓存切片索引语法元素(bufferView)的值确定存储访问器所访问的数据的缓存切片对应的缓存切片描述模块的索引值。
步骤o6、根据访问器描述模块的MPEG时变访问器中的时变语法元素(immutable)的值,确定访问器内的语法元素的取值是否随时间变化。
上步骤o1~步骤o6的实现方式可以参照上述步骤c1~步骤c6的实现方式,为避免赘述,此处不再详细说明。
本申请一些实施例还提供一种三维场景的渲染方法,该三维场景的渲染方法的执行主体为沉浸式媒体描述框架中的显示引擎,参照图14所示,该三维场景的渲染方法包括如下步骤:
S141、获取待渲染三维场景的场景描述文件。
其中,所述待渲染三维场景中包括类型为G-PCC编码点云的目标媒体文件。
在一些实施例中,获取待渲染三维场景的场景描述文件的实现方式包括:向媒体资源服务器发送用于请求所述待渲染三维场景的场景描述文件的请求信息,以及接收所述媒体资源服务器发送的携带有所述待渲染三维场景的场景描述文件的请求响应。
S142、根据所述场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)中所述目标媒体文件对应的媒体描述模块,获取所述目标媒体文件的描述信息。
在一些实施例中，所述目标媒体文件的描述信息包括：所述目标媒体文件的名称、所述目标媒体文件是否需要自动播放、所述目标媒体文件是否需要循环播放、所述目标媒体文件的封装格式、所述目标媒体文件的码流的类型、所述目标媒体文件的编码参数等中的一项或多项。
根据所述目标媒体文件对应的媒体描述模块获取所述目标媒体文件的描述信息的实现方式可以参照上述场景描述文件解析方法中解析所述目标媒体文件的媒体描述模块的实现方式,为避免赘述,此处不再详细说明。
S143、向媒体接入函数发送所述目标媒体文件的描述信息。
显示引擎向媒体接入函数发送所述目标媒体文件的描述信息后,媒体接入函数可以根据所述目标媒体文件的描述信息获取所述目标媒体文件,对所述目标媒体文件进行处理获取所述目标媒体文件的解码数据,以及将所述目标媒体文件的解码数据写入目标缓存器。
在一些实施例中,显示引擎向媒体接入函数发送所述目标媒体文件的描述信息,包括:显示引擎可以通过媒体接入函数API向媒体接入函数发送所述目标媒体文件的描述信息。
在一些实施例中,显示引擎向媒体接入函数发送所述目标媒体文件的描述信息,包括:显示引擎向媒体接入函数发送携带有所述目标媒体文件的描述信息的媒体文件处理指令。
S144、从所述目标缓存器中读取所述目标媒体文件的解码数据。
即,从目标缓存器中读取经过媒体接入函数完备处理的、可以直接用于进行待渲染三维场景渲染的数据。
S145、基于所述目标媒体文件的解码数据对所述待渲染三维场景进行渲染。
本申请实施例提供的三维场景的渲染方法在获取包括类型为G-PCC编码点云的目标媒体文件的待渲染三维场景的场景描述文件后，首先根据所述场景描述文件的MPEG媒体的媒体列表中所述目标媒体文件对应的媒体描述模块获取所述目标媒体文件的描述信息，并向媒体接入函数发送所述目标媒体文件的描述信息，以使媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件，对所述目标媒体文件进行处理获取所述目标媒体文件的解码数据，以及将所述目标媒体文件的解码数据写入目标缓存器，再从所述目标缓存器中读取所述目标媒体文件的解码数据，以及基于所述目标媒体文件的解码数据对所述待渲染三维场景进行渲染。由于本申请实施例提供的三维场景的渲染方法中，显示引擎可以根据所述目标媒体描述模块获取所述目标媒体文件的描述信息，向媒体接入函数发送目标媒体文件的描述信息，读取类型为G-PCC编码点云的目标媒体文件的解码数据以及基于所述目标媒体文件的解码数据对所述待渲染三维场景进行渲染，因此本申请实施例提供了一种渲染包括类型为G-PCC编码点云的媒体文件的待渲染三维场景的渲染方法，实现了基于场景描述文件渲染类型为G-PCC编码点云的媒体文件。
本申请一些实施例还提供一种媒体文件的处理方法,该媒体文件的处理方法的执行主体为沉浸式媒体描述框架中的媒体接入函数,参照图15所示,该媒体文件的处理方法包括如下步骤:
S151、接收显示引擎发送的目标媒体文件的描述信息、目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息。
其中,所述目标媒体文件为类型为G-PCC编码点云的媒体文件,所述目标缓存器为用于缓存所述目标媒体文件的解码数据的缓存器。
在一些实施例中,所述目标媒体文件的描述信息可以包括以下至少一项:
所述目标媒体文件的名称、所述目标媒体文件是否需要自动播放、所述目标媒体文件是否需要循环播放、所述目标媒体文件的封装格式、所述目标媒体文件的码流的类型、所述目标媒体文件的编码参数。
在一些实施例中,所述目标缓存器的描述信息可以包括以下至少一项:
缓存器的容量、是否为MPEG的环形缓存器、环形缓存器的存储环节数量、所述目标媒体文件对应的媒体描述模块的索引值、环形缓存器所缓存的数据的源数据的轨道索引值。
在一些实施例中,所述目标缓存器的缓存切片的描述信息可以包括以下至少一项:
缓存切片所属的缓存器、缓存切片的容量、缓存切片的偏移量。
在一些实施例中,接收显示引擎发送的目标媒体文件的描述信息、目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息,包括:
通过媒体接入函数API接收所述显示引擎发送的所述目标媒体文件的描述信息、所述目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息。
S152、根据所述目标媒体文件的描述信息获取所述目标媒体文件的解码数据。
在一些实施例中,媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件的解码数据,包括:
根据所述目标媒体文件的描述信息创建用于处理所述目标媒体文件的目标管线,以及通过所述目标管线获取所述目标媒体文件,并对所述目标媒体文件进行解封装和解码,以获取所述目标媒体文件的解码数据。
在一些实施例中,所述通过所述目标管线获取所述目标媒体文件,并对所述目标媒体文件进行解封装和解码,以获取所述目标媒体文件的解码数据,包括:通过所述目标管线的输入模块获取所述目标媒体文件,并将所述目标媒体文件输入所述目标管线的解封装模块;通过所述解封装模块对所述目标媒体文件进行解封装,获取所述目标媒体文件的几何码流和属性码流;通过所述目标管线的几何解码器对所述几何码流进行解码,获取所述目标媒体文件的几何解码数据;通过所述目标管线的属性解码器对所述属性码流进行解码,获取所述目标媒体文件的属性解码数据。
在一些实施例中,所述通过所述目标管线获取所述目标媒体文件,并对所述目标媒体文件进行解封装和解码,以获取所述目标媒体文件的解码数据,还包括:在获取所述目标媒体文件的几何解码数据之后,通过所述目标管线的第一后处理模块对所述几何解码数据进行处理,以及在获取所述目标媒体文件的属性解码数据之后,通过所述目标管线的第二后处理模块对所述属性解码数据进行处理。
示例性的,通过所述目标管线的第一后处理模块对所述几何解码数据进行处理可以包括:通过所述目标管线的第一后处理模块对所述几何解码数据进行格式转换,通过所述目标管线的第二后处理模块对所述属性解码数据进行处理可以包括:通过所述目标管线的第二后处理模块对所述属性解码数据进行格式转换。
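上述“输入→解封装→几何/属性解码→后处理”的管线流程可粗略示意如下(各模块均为占位实现,假设媒体文件已拆分为几何码流与属性码流;实际的 G-PCC 解封装与解码远比此复杂,此处仅体现数据流向):

```python
# 假设性示例:S152 中目标管线的处理流程示意。
def demux(media_file):
    """解封装模块(占位):从封装文件中分离出几何码流与属性码流。"""
    return media_file["geometry_bitstream"], media_file["attribute_bitstream"]

def decode_geometry(bitstream):
    """几何解码器(占位):返回点的几何解码数据。"""
    return {"positions": bitstream["payload"]}

def decode_attribute(bitstream):
    """属性解码器(占位):返回点的属性解码数据(如颜色)。"""
    return {"colors": bitstream["payload"]}

def postprocess(data):
    """后处理模块(占位):此处以格式转换为例,原样返回。"""
    return data

def run_target_pipeline(media_file):
    """按“解封装→几何/属性解码→后处理”的顺序串联各模块。"""
    geo_bs, attr_bs = demux(media_file)
    geo = postprocess(decode_geometry(geo_bs))      # 第一后处理模块
    attr = postprocess(decode_attribute(attr_bs))   # 第二后处理模块
    return {**geo, **attr}
```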
S153、根据所述目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息,将所述目标媒体文件的解码数据写入所述目标缓存器中。
将所述目标媒体文件的解码数据写入所述目标缓存器中后,显示引擎可以根据所述目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息,从所述目标缓存器中读取所述目标媒体文件的解码数据,以及基于所述目标媒体文件的解码数据对包括所述目标媒体文件的待渲染三维场景进行渲染。
本申请实施例提供的媒体文件的处理方法在接收到显示引擎发送的类型为G-PCC编码点云的目标媒体文件的描述信息、用于缓存所述目标媒体文件的解码数据的目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息后,根据所述目标媒体文件的描述信息获取所述目标媒体文件对应的解码数据,并根据所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,将所述目标媒体文件的解码数据写入所述目标缓存器中,因此显示引擎可以根据所述目标缓存器的描述信息以及所述目标缓存器的缓存切片的描述信息,从所述目标缓存器中读取所述目标媒体文件的解码数据,以及基于所述目标媒体文件的解码数据对包括所述目标媒体文件的待渲染三维场景进行渲染,因此本申请实施例可以支持在场景描述框架中渲染类型为G-PCC编码点云的媒体文件。
本申请一些实施例还提供一种缓存管理方法,该缓存管理方法的执行主体为沉浸式媒体描述框架中的缓存管理模块,参照图16所示,该缓存管理方法包括如下步骤:
S161、接收目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息。
其中,所述目标缓存器为用于缓存目标媒体文件的缓存器,所述目标媒体文件为类型为G-PCC编码点云的媒体文件。
在一些实施例中,所述目标缓存器的描述信息可以包括以下至少一项:
缓存器的容量、是否为MPEG的环形缓存器、环形缓存器的存储环节数量、环形缓存器所缓存的媒体文件(所述目标媒体文件)对应的媒体描述模块的索引值、环形缓存器所缓存的数据的源数据的轨道索引值。
在一些实施例中,所述目标缓存器的缓存切片的描述信息可以包括以下至少一项:
缓存切片所属的缓存器、缓存切片的容量、缓存切片的偏移量。
S162、根据所述目标缓存器的描述信息创建所述目标缓存器。
例如:目标缓存器的描述信息包括:目标缓存器的容量为8000字节;目标缓存器为基于MPEG扩展改造的环形缓存器,环形缓存器的存储环节数量为3,环形缓存器存储的媒体文件为MPEG媒体中声明的第一个媒体文件,环形缓存器所缓存的数据的源数据的轨道索引值为1,则所述缓存管理模块创建一个容量为8000字节、包含3个存储环节的环形缓存器作为所述目标缓存器。
S163、根据所述目标缓存器的缓存切片的描述信息对所述目标缓存器进行缓存切片的划分。
承上实施例所述,若所述环形缓存器包括两个缓存切片,第一个缓存切片的描述信息包括:容量为6000字节,偏移量为0,第二个缓存切片的描述信息包括:容量为2000字节,偏移量为6000,则将所述目标缓存器划分为2个缓存切片,第一个缓存切片的容量为6000字节,用于缓存所述目标媒体文件的解码数据的前6000个字节的数据,第二个缓存切片的容量为2000字节,用于缓存所述目标媒体文件的解码数据的第6001~8000个字节的数据。
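承接上例,创建环形缓存器并划分缓存切片的过程可以示意如下(实现细节为假设性示意,缓存切片的偏移量按自 0 起算的字节偏移表示,类名与函数名均为举例):

```python
# 假设性示例:按 S162~S163 的描述创建容量 8000 字节、含 3 个存储环节的
# 环形缓存器,并划分为 6000/2000 字节的两个缓存切片。
class CircularBuffer:
    def __init__(self, byte_length, ring_count):
        # 每个存储环节以一段定长字节区模拟
        self.rings = [bytearray(byte_length) for _ in range(ring_count)]
        self.byte_length = byte_length
        self.slices = []  # 缓存切片列表,元素为 (偏移量, 容量)

    def add_slice(self, offset, length):
        assert offset + length <= self.byte_length, "缓存切片超出缓存器容量"
        self.slices.append((offset, length))

def create_target_buffer(buffer_info, slice_infos):
    """根据缓存器描述信息创建缓存器,并按缓存切片描述信息划分切片。"""
    buf = CircularBuffer(buffer_info["byteLength"], buffer_info["ring_count"])
    for s in slice_infos:
        buf.add_slice(s["byteOffset"], s["byteLength"])
    return buf

target = create_target_buffer(
    {"byteLength": 8000, "ring_count": 3},
    [{"byteOffset": 0, "byteLength": 6000},
     {"byteOffset": 6000, "byteLength": 2000}])
```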
在缓存管理模块根据所述目标缓存器的缓存切片的描述信息对所述目标缓存器进行缓存切片的划分后,媒体接入函数可以将所述目标媒体文件的解码数据写入所述目标缓存器中,显示引擎可以从所述目标缓存器中读取所述目标媒体文件的解码数据,以及基于所述目标媒体文件的解码数据对包括所述目标媒体文件的待渲染三维场景进行渲染。
本申请实施例提供的缓存管理方法在接收到目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息后,可以根据所述目标缓存器的描述信息创建所述目标缓存器,并根据所述目标缓存器的缓存切片的描述信息对所述目标缓存器进行缓存切片的划分,因此媒体接入函数可以将类型为G-PCC编码点云的目标媒体文件的解码数据写入所述目标缓存器中,显示引擎可以从所述目标缓存器中读取所述目标媒体文件的解码数据,以及基于所述目标媒体文件的解码数据对包括所述目标媒体文件的待渲染三维场景进行渲染,因此本申请实施例可以支持在场景描述框架中渲染类型为G-PCC编码点云的媒体文件。
本申请一些实施例还提供了一种三维场景的渲染方法,该三维场景的渲染方法包括:显示引擎所执行的场景描述文件解析方法和三维场景的渲染方法、媒体接入函数所执行的媒体文件的处理方法以及缓存管理模块所执行的缓存管理方法。参照图17所示,该方法包括如下步骤:
S1701、显示引擎获取待渲染三维场景的场景描述文件。
其中,所述待渲染三维场景中包括类型为G-PCC编码点云的目标媒体文件。
在一些实施例中,显示引擎获取待渲染场景的场景描述文件,包括:显示引擎使用网络传输服务从服务器中下载所述场景描述文件。
在一些实施例中,显示引擎获取待渲染场景的场景描述文件,包括:从本地存储空间中读取所述场景描述文件。
S1702、显示引擎从所述场景描述文件的MPEG媒体(MPEG_media)的媒体列表(media)中获取各个媒体文件对应的媒体描述模块(包括:从所述场景描述文件的MPEG媒体的媒体列表中获取所述目标媒体文件对应的媒体描述模块)。
S1703、显示引擎根据各个媒体文件对应的媒体描述模块获取各个媒体文件的描述信息(包括:根据所述目标媒体文件对应的媒体描述模块获取所述目标媒体文件的描述信息)。
在一些实施例中,媒体文件的描述信息包括以下至少一项:
媒体文件的名称、媒体文件是否自动播放、媒体文件是否循环播放、媒体文件的封装格式、媒体文件的访问地址、媒体文件的封装文件的轨道信息、媒体文件的编解码参数。
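作为示意,一个 G-PCC 媒体文件在 MPEG_media 媒体列表中的媒体描述模块及其描述信息的提取过程可以表示如下(字段名参照 MPEG-I 场景描述中 MPEG_media 的惯用写法,具体取值与访问地址均为假设性举例):

```python
# 假设性示例:MPEG_media 媒体列表中目标媒体文件对应的媒体描述模块。
gpcc_media_entry = {
    "name": "pointcloud_0",      # 媒体文件的名称
    "autoplay": True,            # 媒体文件是否自动播放
    "loop": True,                # 媒体文件是否循环播放
    "alternatives": [{           # 可选项
        "mimeType": "application/mp4",        # 封装格式
        "uri": "https://example.com/pc.mp4",  # 访问地址(示例地址)
        "tracks": [{
            "track": "#trackIndex=1",         # 封装文件的轨道信息
            "codecs": "gpc1",                 # 编解码参数(示意值)
        }],
    }],
}

def get_media_description(entry):
    """从媒体描述模块中提取 S1703 所述的描述信息。"""
    alt = entry["alternatives"][0]
    return {
        "name": entry["name"],
        "autoplay": entry.get("autoplay", False),
        "loop": entry.get("loop", False),
        "mimeType": alt["mimeType"],
        "uri": alt["uri"],
        "tracks": alt["tracks"],
    }
```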
显示引擎根据所述目标媒体文件对应的媒体描述模块获取所述目标媒体文件的描述信息的实现方式可以参照上述场景描述文件解析方法中解析所述目标媒体文件的媒体描述模块的实现方式,为避免赘述,此处不再详细说明。
S1704、显示引擎向媒体接入函数发送各个媒体文件的描述信息(包括:向媒体接入函数发送所述目标媒体文件的描述信息)。
相应的,媒体接入函数接收所述显示引擎发送的各个媒体文件的描述信息(包括:接收所述显示引擎发送的所述目标媒体文件的描述信息)。
在一些实施例中,显示引擎向媒体接入函数发送各个媒体文件的描述信息,包括:显示引擎通过媒体接入函数API向媒体接入函数发送各个媒体文件的描述信息。
在一些实施例中,媒体接入函数接收所述显示引擎发送的各个媒体文件的描述信息,包括:媒体接入函数通过媒体接入函数API接收所述显示引擎发送的各个媒体文件的描述信息。
S1705、媒体接入函数根据各个媒体文件的描述信息创建用于处理各个媒体文件的管线(包括:根据所述目标媒体文件的描述信息创建用于处理所述目标媒体文件的目标管线)。
在一些实施例中,所述目标管线,包括:输入模块、解封装模块以及解码模块;所述输入模块用于获取所述目标媒体文件(封装文件),所述解封装模块用于对所述目标媒体文件进行解封装,获取所述目标媒体文件的码流(可能为单轨道封装的G-PCC码流,也可能为多轨道封装的G-PCC几何码流和G-PCC属性码流),所述解码模块包括解码器、几何解码器以及属性解码器,当目标媒体文件的码流为单轨道封装的G-PCC码流时,解码模块通过解码器对所述G-PCC码流进行解码获取所述目标媒体文件的解码数据,当目标媒体文件的码流为多轨道封装的G-PCC几何码流和G-PCC属性码流时,分别通过几何解码器和属性解码器对所述G-PCC几何码流和G-PCC属性码流进行解码,获取所述目标媒体文件的几何数据和属性数据,以获取所述目标媒体文件的解码数据。
在一些实施例中,所述目标管线还包括:第一后处理模块和第二后处理模块;所述第一后处理模块用于对解码G-PCC几何码流得到的几何数据进行格式转换等后处理,所述第二后处理模块用于对解码G-PCC属性码流得到的属性数据进行格式转换等后处理。
S1706、媒体接入函数通过各个媒体文件对应的管线获取各个媒体文件,并对各个媒体文件进行解封装和解码,以获取各个媒体文件对应的解码数据(包括:通过所述目标管线获取所述目标媒体文件,并对所述目标媒体文件进行解封装和解码,以获取所述目标媒体文件对应的解码数据)。
在一些实施例中,所述目标媒体文件的描述信息包括所述目标媒体文件的访问地址,媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件的解码数据,包括:
媒体接入函数根据所述目标媒体文件的访问地址获取所述目标媒体文件。
在一些实施例中,所述媒体接入函数根据所述目标媒体文件的访问地址获取所述目标媒体文件,包括:媒体接入函数根据所述目标媒体文件的访问地址向媒体资源服务器发送媒体资源请求,并接收所述媒体资源服务器发送的携带有所述目标媒体文件的媒体资源响应。
在一些实施例中,所述媒体接入函数根据所述目标媒体文件的访问地址获取所述目标媒体文件,包括:媒体接入函数根据所述目标媒体文件的访问地址从预设存储空间中读取所述目标媒体文件。
在一些实施例中,所述目标媒体文件的描述信息还包括所述目标媒体文件的各个码流轨道的索引值;媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件的解码数据,包括:
媒体接入函数根据所述目标媒体文件的封装格式对所述目标媒体文件进行解封装,获取所述目标媒体文件的各个码流轨道的码流。
在一些实施例中,所述目标媒体文件的描述信息还包括所述目标媒体文件的码流的类型和编解码参数;媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件的解码数据,包括:
媒体接入函数根据所述目标媒体文件的码流的类型和编解码参数对所述目标媒体文件的各个码流轨道的码流进行解码,获取所述目标媒体文件的解码数据。
S1707、显示引擎获取所述场景描述文件的缓存器列表(buffers)中的各个缓存器描述模块(包括:从所述场景描述文件的缓存器列表中获取用于缓存所述目标媒体文件的解码数据的目标缓存器对应的缓存器描述模块)。
S1708、显示引擎根据各个缓存器对应的缓存器描述模块获取各个缓存器的描述信息(包括:根据所述目标缓存器对应的缓存器描述模块获取所述目标缓存器的描述信息)。
在一些实施例中,缓存器的描述信息可以包括以下至少一项:
缓存器的容量(字节长度)、缓存器所缓存的数据的访问地址、是否为MPEG环形缓存器、环形缓存器的存储环节数量、环形缓存器所缓存的媒体文件对应的媒体描述模块的索引值、环形缓存器所缓存的数据的源数据的轨道索引值。
S1709、显示引擎获取所述场景描述文件的缓存切片列表(bufferViews)中的各个缓存切片描述模块(包括:从所述场景描述文件的缓存切片列表中获取所述目标缓存器的缓存切片对应的缓存切片描述模块)。
S1710、显示引擎根据各个缓存器的缓存切片对应的缓存切片描述模块获取各个缓存器的缓存切片的描述信息(包括:根据所述目标缓存器的缓存切片对应的缓存切片描述模块获取所述目标缓存器的缓存切片的描述信息)。
在一些实施例中,缓存切片的描述信息可以包括以下至少一项:
缓存切片所属的缓存器、缓存切片的容量、缓存切片的偏移量。
S1711、显示引擎获取所述场景描述文件的访问器列表(accessors)中的各个访问器描述模块(包括:从所述场景描述文件的访问器列表中获取用于访问所述目标媒体文件的解码数据的目标访问器对应的访问器描述模块)。
S1712、显示引擎根据各个访问器对应的访问器描述模块获取各个访问器的描述信息(包括:根据所述目标访问器对应的访问器描述模块获取用于访问所述目标媒体文件的解码数据的目标访问器的描述信息)。
在一些实施例中,访问器的描述信息可以包括以下至少一项:
访问器所访问的缓存切片、访问器所访问的数据的数据类型、访问器的类型、访问器所访问的数据的数量、是否为MPEG时变访问器、时变访问器所访问的缓存切片、访问器参数是否随时间变化。
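作为示意,一个带有 MPEG 时变访问器扩展的访问器描述模块及其描述信息的提取过程可以表示如下(字段名参照 glTF 访问器与 MPEG_accessor_timed 扩展的惯用写法,具体取值为假设性举例):

```python
# 假设性示例:访问器列表中的一个访问器描述模块(取值仅作说明)。
accessor_module = {
    "bufferView": 0,          # 访问器所访问的缓存切片
    "componentType": 5126,    # 所访问的数据的数据类型(5126 对应 FLOAT)
    "type": "VEC3",           # 访问器的类型
    "count": 1024,            # 所访问的数据的数量
    "extensions": {
        "MPEG_accessor_timed": {          # MPEG 时变访问器
            "bufferView": 1,              # 时变访问器所访问的缓存切片
            "immutable": False,           # 访问器参数是否随时间变化
        }
    },
}

def parse_accessor(module):
    """从访问器描述模块中提取描述信息,并判断其是否为时变访问器。"""
    timed = module.get("extensions", {}).get("MPEG_accessor_timed")
    return {
        "bufferView": module["bufferView"],
        "componentType": module["componentType"],
        "type": module["type"],
        "count": module["count"],
        "is_timed": timed is not None,
        "timed_bufferView": timed["bufferView"] if timed else None,
    }
```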
在一些实施例中,在上述步骤S1707~S1712之后,本申请实施例可以通过如下方案一将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数和缓存管理模块。
在一些实施例中,方案一(将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数和缓存管理模块)的实现方式包括如下步骤a和步骤b:
步骤a、显示引擎向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:显示引擎向媒体接入函数发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,媒体接入函数接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:媒体接入函数接收显示引擎发送的所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
在一些实施例中,上述步骤a(显示引擎向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息)的实现方式可以为:显示引擎通过媒体接入函数API向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
相应的,媒体接入函数接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息的实现方式可以为:媒体接入函数通过媒体接入函数API接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
步骤b、媒体接入函数向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:媒体接入函数向缓存管理模块发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,缓存管理模块接收媒体接入函数发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:缓存管理模块接收媒体接入函数发送的所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
在一些实施例中,上步骤b(媒体接入函数向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息)的实现方式可以包括:媒体接入函数通过缓存API向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。相应的,缓存管理模块接收媒体接入函数发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息的实现方式可以包括:缓存管理模块通过缓存API接收媒体接入函数发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
在一些实施例中,方案一(将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数和缓存管理模块)的实现方式包括如下步骤c和步骤d:
步骤c、显示引擎向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:显示引擎向媒体接入函数发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,媒体接入函数接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:媒体接入函数接收显示引擎发送的所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
步骤d、显示引擎向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:显示引擎向缓存管理模块发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,缓存管理模块接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
在一些实施例中,上述步骤d(显示引擎向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息)的实现方式可以包括:显示引擎通过缓存API向缓存管理模块发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
相应的,缓存管理模块接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息的实现方式可以包括:缓存管理模块通过缓存API接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息。
在一些实施例中,在上述步骤S1707~S1712之后,本申请实施例可以通过如下方案二将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数,以及将各个缓存器的描述信息和各个缓存器的缓存切片的描述信息发送至缓存管理模块。
在一些实施例中,方案二(将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数,以及将各个缓存器的描述信息和各个缓存器的缓存切片的描述信息发送至缓存管理模块)的实现方式包括如下步骤e和步骤f:
步骤e、显示引擎向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:显示引擎向媒体接入函数发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,媒体接入函数接收显示引擎发送的各个缓存器的描述信息和各个缓存器的缓存切片的描述信息(包括:媒体接入函数接收显示引擎发送的所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述访问器的描述信息)。
步骤f、显示引擎向缓存管理模块发送各个缓存器的描述信息和各个缓存器的缓存切片的描述信息(包括:显示引擎向缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息)。
相应的,缓存管理模块接收显示引擎发送的各个缓存器的描述信息和各个缓存器的缓存切片的描述信息(包括:缓存管理模块接收显示引擎发送的所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息)。
在一些实施例中,方案二(将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数,以及将各个缓存器的描述信息和各个缓存器的缓存切片的描述信息发送至缓存管理模块)的实现方式包括如下步骤g和步骤h:
步骤g、显示引擎向媒体接入函数发送各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:显示引擎向媒体接入函数发送所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
相应的,媒体接入函数接收显示引擎发送的各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息(包括:媒体接入函数接收显示引擎发送的所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息)。
步骤h、媒体接入函数向缓存管理模块发送各个缓存器的描述信息和各个缓存器的缓存切片的描述信息(包括:媒体接入函数向缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息)。
相应的,缓存管理模块接收媒体接入函数发送的各个缓存器的描述信息和各个缓存器的缓存切片的描述信息(包括:缓存管理模块接收媒体接入函数发送的所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息)。
在通过上述方案一将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数和缓存管理模块或通过上述方案二将各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息发送至媒体接入函数,以及将各个缓存器的描述信息和各个缓存器的缓存切片的描述信息发送至缓存管理模块后,继续执行如下步骤:
S1713、缓存管理模块根据各个缓存器的描述信息创建各个缓存器(包括:根据所述目标缓存器的描述信息创建所述目标缓存器)。
S1714、缓存管理模块根据各个缓存器的缓存切片的描述信息对各个缓存器进行缓存切片的划分(包括:根据所述目标缓存器的缓存切片的描述信息,对所述目标缓存器进行缓存切片的划分)。
S1715、媒体接入函数根据各个缓存器的描述信息、各个缓存器的缓存切片的描述信息以及各个访问器的描述信息将各个媒体文件对应的解码数据写入各个媒体文件对应的缓存器中(包括:媒体接入函数根据所述目标缓存器的描述信息、所述目标缓存器的缓存切片的描述信息以及所述目标访问器的描述信息,将所述目标媒体文件的解码数据写入所述目标缓存器中)。
即,媒体接入函数根据缓存器的描述信息中的缓存器容量、缓存器的缓存切片的描述信息中的缓存切片容量、访问器的描述信息中的访问器类型、访问器的描述信息中的数据类型等信息,将媒体文件对应的解码数据以正确的排布方式写入缓存器。
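示意性地,依据访问器的数据类型与访问器类型计算解码数据排布所需字节数的过程可以写成如下形式(组件大小与元素分量数遵循 glTF 对 componentType 与 type 的取值约定,函数命名为假设,且未考虑对齐与 byteStride):

```python
# 假设性示例:根据访问器描述信息计算每个元素的字节数及所需的缓存容量。
# componentType 取值与字节数的对应关系(glTF 约定):
COMPONENT_SIZE = {5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4}
# 访问器类型与分量数量的对应关系(glTF 约定):
TYPE_COUNT = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4,
              "MAT2": 4, "MAT3": 9, "MAT4": 16}

def element_byte_size(component_type, accessor_type):
    """单个元素占用的字节数 = 组件字节数 × 分量数量。"""
    return COMPONENT_SIZE[component_type] * TYPE_COUNT[accessor_type]

def required_bytes(component_type, accessor_type, count):
    """写入 count 个元素所需的缓存切片容量(不考虑对齐与 stride)。"""
    return element_byte_size(component_type, accessor_type) * count
```

例如,以 FLOAT(5126)类型的 VEC3 访问器写入 500 个点的坐标时,所需容量为 12×500=6000 字节,与上文缓存切片示例中的容量相当。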
S1716、显示引擎从所述场景描述文件的场景列表中获取所述待渲染三维场景对应的场景描述模块。
S1717、显示引擎根据所述待渲染三维场景对应的场景描述模块,获取所述待渲染三维场景的描述信息。
其中,所述待渲染三维场景的描述信息包括所述待渲染三维场景中的各个节点对应的节点描述模块的索引值。
S1718、显示引擎根据所述待渲染三维场景中的各个节点对应的节点描述模块的索引值,从所述场景描述文件的节点列表中获取所述待渲染三维场景中的各个节点对应的节点描述模块。
S1719、显示引擎根据所述待渲染三维场景中的各个节点对应的节点描述模块获取所述待渲染三维场景中的各个节点的描述信息。
其中,任一节点的描述信息包括该节点挂载的三维网格对应的网格描述模块的索引值。
在一些实施例中,任一节点的描述信息还包括该节点的名称。
S1720、显示引擎根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值,从所述场景描述文件的网格列表中获取所述待渲染三维场景中的三维网格对应的网格描述模块。
S1721、显示引擎根据所述待渲染三维场景中的三维网格对应的网格描述模块,获取所述待渲染三维场景中的三维网格所包含的数据种类以及用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器。
在一些实施例中,所述方法还包括:根据所述待渲染三维场景中的三维网格对应的网格描述模块,获取所述待渲染三维场景中的三维网格的名称以及拓扑结构类型。
S1722、显示引擎根据各个访问器的描述信息创建各个访问器(包括根据用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器的描述信息,创建用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器)。
S1723、显示引擎通过各个访问器从各个媒体文件对应的缓存器中读取各个媒体文件的解码数据(包括:通过用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器,从所述目标缓存器存储的解码数据中读取所述待渲染三维场景中的各个三维网格的各个种类的数据)。
S1724、显示引擎基于各个媒体文件的解码数据对所述待渲染三维场景进行渲染。
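上述 S1701~S1724 的整体流程可以用如下极简编排示意(各参与方的具体行为以占位函数的形式传入,仅体现“获取→解码→渲染”的数据流向与显示引擎、媒体接入函数之间的分工,并非实际实现):

```python
# 假设性示例:图 17 所示流程的极简编排示意。
def render_scene(scene_description, fetch_media, decode, draw):
    """遍历 MPEG_media 媒体列表中的各个媒体文件并完成渲染。

    fetch_media: 媒体接入函数获取媒体文件(占位);
    decode: 管线对媒体文件解封装并解码(占位);
    draw: 显示引擎基于解码数据进行渲染(占位)。
    """
    results = []
    for media in scene_description["MPEG_media"]["media"]:
        raw = fetch_media(media)      # 媒体接入函数:获取媒体文件
        decoded = decode(raw)         # 管线:解封装与解码
        results.append(decoded)       # 经缓存器写入/读取(此处直接传递)
    return draw(results)              # 显示引擎:渲染
```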
本申请一些实施例还提供了一种场景描述文件的生成装置,该场景描述文件的生成装置包括:
存储器,被配置为存储计算机程序;
处理器,被配置为用于在调用计算机程序时,使得所述场景描述文件的生成装置实现上述任一实施例所述的场景描述文件的生成方法。
本申请一些实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,当所述计算机程序被计算设备执行时,使得所述计算设备实现上述任一实施例所述的场景描述文件的生成方法。
本申请一些实施例还提供了一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机实现上述任一实施例所述的场景描述文件的生成方法。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。
为了方便解释,已经结合具体的实施方式进行了上述说明。但是,上述示例性的讨论不是意图穷尽或者将实施方式限定到上述公开的具体形式。根据上述的教导,可以得到多种修改和变形。上述实施方式的选择和描述是为了更好的解释原理以及实际的应用,从而使得本领域技术人员更好的使用所述实施方式以及适于具体使用考虑的各种不同的变形的实施方式。

Claims (49)

  1. 一种场景描述文件的生成方法,包括:
    确定待渲染三维场景中的媒体文件的类型;
    当所述待渲染三维场景中的目标媒体文件的类型为基于几何的点云压缩G-PCC编码点云时,根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标描述模块;
    在所述待渲染三维场景的场景描述文件的MPEG媒体的媒体列表中添加所述目标媒体描述模块。
  2. 根据权利要求1所述的方法,所述根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标描述模块,包括:
    在所述目标媒体描述模块的可选项中添加媒体类型语法元素,并将所述媒体类型语法元素的值设置为G-PCC编码点云对应的封装格式值。
  3. 根据权利要求1所述的方法,所述根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标描述模块,包括:
    在所述目标媒体描述模块的可选项的轨道数组中添加第一轨道索引语法元素,并根据所述目标媒体文件的封装方式设置所述第一轨道索引语法元素的值。
  4. 根据权利要求3所述的方法,所述根据所述目标媒体文件的封装方式设置所述第一轨道索引语法元素的值,包括:
    当所述目标媒体文件为单轨道封装文件时,将所述第一轨道索引语法元素的值设置为所述目标媒体文件的码流轨道的索引值;
    当所述目标媒体文件为多轨道封装文件时,将所述第一轨道索引语法元素的值设置为所述目标媒体文件的几何码流轨道的索引值。
  5. 根据权利要求1所述的方法,所述根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标描述模块,包括:
    在所述目标媒体描述模块的可选项的轨道数组中添加编解码参数语法元素,并根据所述目标媒体文件的编码参数、所述目标媒体文件的码流的类型以及ISO/IEC 23090-18 G-PCC数据传输标准设置所述编解码参数语法元素的值。
  6. 根据权利要求1所述的方法,所述根据所述目标媒体文件的描述信息生成所述目标媒体文件对应的目标描述模块,包括:
    在所述目标媒体描述模块的可选项中添加统一资源标识符语法元素,并将所述统一资源标识符语法元素的值设置为所述目标媒体文件的访问地址。
  7. 根据权利要求1所述的方法,所述方法还包括:
    在所述场景描述文件的场景列表中添加所述待渲染三维场景对应的目标场景描述模块;
    在所述目标场景描述模块的节点索引列表中添加所述待渲染场景中的节点对应的节点描述模块的索引值。
  8. 根据权利要求1所述的方法,所述方法还包括:
    在所述场景描述文件的节点列表中添加所述待渲染场景中的节点对应的节点描述模块;
    在所述节点描述模块的网格索引列表中添加所述节点挂载的三维网格对应的网格描述模块的索引值。
  9. 根据权利要求1所述的方法,所述方法还包括:
    在所述场景描述文件的网格列表中添加所述待渲染场景中的三维网格对应的网格描述模块;
    在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素;
    将各个种类的数据对应的语法元素的值设置为用于访问各个种类的数据的访问器对应的访问器描述模块的索引值。
  10. 根据权利要求9所述的方法,所述在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:
    在所述目标媒体文件中的三维网格对应的网格描述模块的基元中添加扩展列表;
    在所述扩展列表中添加目标扩展数组;
    在所述目标扩展数组中添加对应的三维网格所包含的各个种类的数据对应的语法元素。
  11. 根据权利要求9所述的方法,所述在所述网格描述模块中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:
    在所述网格描述模块的基元的属性中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素。
  12. 根据权利要求11所述的方法,所述在所述网格描述模块的基元的属性中添加所述网格描述模块对应的三维网格所包含的各个种类的数据对应的语法元素,包括:
    基于第一语法元素集合中的语法元素,在第一网格描述模块的基元的属性中添加对应的三维网格所包含的各个种类的数据对应的语法元素,所述第一网格描述模块为类型为G-PCC编码点云的媒体文件中的三维网格对应的网格描述模块;
    基于第二语法元素集合中的语法元素,在第二网格描述模块的基元的属性中添加对应的三维网格所包含的各个种类的数据对应的语法元素,所述第二网格描述模块为类型不为G-PCC编码点云的媒体文件中的三维网格对应的网格描述模块。
  13. 根据权利要求1所述的方法,所述方法还包括:
    在所述场景描述文件的访问器列表中添加目标访问器对应的访问器描述模块;
    其中,所述目标访问器为用于访问所述目标媒体文件的解码数据的访问器。
  14. 根据权利要求13所述的方法,所述在所述场景描述文件的访问器列表中添加目标访问器对应的访问器描述模块,包括以下至少一项:
    在所述目标访问器对应的访问器描述模块中添加数据类型语法元素,并根据所述目标访问器所访问的数据的类型设置所述数据类型语法元素的值;
    在所述目标访问器对应的访问器描述模块中添加访问器类型语法元素,并根据所述目标访问器的类型设置所述访问器类型语法元素的值;
    在所述目标访问器对应的访问器描述模块中添加数据数量语法元素,并根据所述目标访问器所访问的数据的数量设置所述数据数量语法元素的值;
    在所述目标访问器对应的访问器描述模块中添加MPEG时变访问器;
    在所述MPEG时变访问器中添加缓存切片索引语法元素,并根据存储所述目标访问器所访问的数据的缓存切片对应的缓存切片描述模块的索引值设置所述缓存切片索引语法元素的值;
    在所述MPEG时变访问器中添加时变语法元素,并根据对应的目标访问器内的语法元素的取值是否随时间变化设置所述时变语法元素的值。
  15. 根据权利要求1所述的方法,所述方法还包括:
    在所述场景描述文件的缓存器列表中添加所述目标缓存器对应的缓存器描述模块;
    其中,所述目标缓存器为用于存储所述目标媒体文件的解码数据的缓存器。
  16. 根据权利要求15所述的方法,所述在所述场景描述文件的缓存器列表中添加目标缓存器对应的缓存器描述模块,包括:
    在所述缓存器描述模块中添加第一字节长度语法元素,并根据所述目标缓存器的容量设置对应的所述第一字节长度语法元素的值;
    在所述缓存器描述模块中添加MPEG环形缓存器;
    在所述MPEG环形缓存器中添加环节数量语法元素,并根据所述目标缓存器的存储环节数量设置对应的所述环节数量语法元素的值;
    在所述MPEG环形缓存器中添加媒体索引语法元素,并根据所述目标媒体描述模块的索引值设置所述媒体索引语法元素的值;
    在所述MPEG环形缓存器中添加第二轨道索引语法元素,并根据所述目标缓存器存储的数据的源数据的轨道索引值设置所述第二轨道索引语法元素的值。
  17. 根据权利要求15所述的方法,所述方法还包括:
    在所述场景描述文件的缓存切片列表中添加所述目标缓存器的缓存切片对应的缓存切片描述模块。
  18. 根据权利要求17所述的方法,所述在所述场景描述文件的缓存切片列表中添加所述目标缓存器的缓存切片对应的缓存切片描述模块,包括:
    在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加缓存器索引语法元素,并根据所述缓存切片所属的目标缓存器对应的缓存器描述模块的索引值设置所述缓存器索引语法元素的值;
    在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加第二字节长度语法元素,并根据所述缓存切片的容量设置所述第二字节长度语法元素的值;
    在所述目标缓存器的缓存切片对应的缓存切片描述模块中添加偏移量语法元素,并根据对应缓存切片的存储数据的偏移量设置所述偏移量语法元素的值。
  19. 一种场景描述文件的生成装置,包括:
    存储器,被配置为存储计算机程序;
    处理器,被配置为用于在调用计算机程序时,使得所述场景描述文件的生成装置实现权利要求1-18任一项所述的场景描述文件的生成方法。
  20. 一种场景描述文件的解析方法,包括:
    获取待渲染三维场景的场景描述文件,所述待渲染三维场景中包括类型为G-PCC编码点云的目标媒体文件;
    从所述场景描述文件的动态图像专家组MPEG媒体的媒体列表中获取所述目标媒体文件对应的目标媒体描述模块;
    根据所述目标媒体描述模块获取所述目标媒体文件的描述信息。
  21. 根据权利要求20所述的方法,所述根据所述目标媒体描述模块获取所述目标媒体文件的描述信息,包括以下至少一项:
    根据所述目标媒体描述模块中的媒体名称语法元素的值获取所述目标媒体文件的名称;
    根据所述目标媒体描述模块中的自动播放语法元素的值确定所述目标媒体文件是否需要自动播放;
    根据所述目标媒体描述模块中的循环播放语法元素的值确定所述目标媒体文件是否需要循环播放;
    根据所述目标媒体描述模块的可选项中的媒体类型语法元素的值获取所述目标媒体文件的封装格式;
    根据所述目标媒体描述模块的可选项中的统一资源标识符语法元素的值获取所述目标媒体文件的访问地址;
    根据所述目标媒体描述模块的可选项的轨道数组中的第一轨道索引语法元素的值获取所述目标媒体文件的轨道信息;
    根据所述目标媒体描述模块的可选项的轨道数组中的编解码参数语法元素的值以及ISO/IEC 23090-18 G-PCC数据传输标准确定所述目标媒体文件的码流的类型和解码参数。
  22. 根据权利要求20所述的方法,所述方法还包括:
    从所述场景描述文件的场景列表中获取所述待渲染三维场景对应的目标场景描述模块;
    根据所述目标场景描述模块获取所述待渲染三维场景的描述信息。
  23. 根据权利要求22所述的方法,所述根据所述目标场景描述模块获取所述待渲染三维场景的描述信息,包括:
    根据所述目标场景描述模块的节点索引列表声明的索引值确定所述待渲染三维场景中的各个节点对应的节点描述模块的索引值。
  24. 根据权利要求23所述的方法,在根据所述目标场景描述模块的节点索引列表声明的索引值确定所述待渲染三维场景中的各个节点对应的节点描述模块的索引值之后,所述方法还包括:
    根据所述待渲染三维场景中的各个节点对应的节点描述模块的索引值,从所述场景描述文件的节点列表中获取所述待渲染三维场景中的各个节点对应的节点描述模块;
    根据所述待渲染三维场景中的各个节点对应的节点描述模块,获取所述待渲染三维场景中的各个节点的描述信息。
  25. 根据权利要求24所述的方法,所述根据所述待渲染三维场景中的各个节点对应的节点描述模块,获取所述待渲染三维场景中的各个节点的描述信息,包括以下至少一项:
    根据所述待渲染三维场景中的各个节点对应的节点描述模块中的节点名称语法元素的值,获取所述待渲染三维场景中的各个节点的名称;
    根据所述待渲染三维场景中的各个节点对应的节点描述模块中的网格索引列表声明的索引值,确定所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值。
  26. 根据权利要求25所述的方法,在确定所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值之后,所述方法还包括:
    根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值, 从所述场景描述文件的网格列表中获取所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块;
    根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块,获取所述待渲染三维场景中的各个节点挂载的三维网格的描述信息。
  27. 根据权利要求26所述的方法,所述根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块获取所述待渲染三维场景中的各个节点挂载的三维网格的描述信息,包括以下至少一项:
    根据各个三维网格对应的网格描述模块中的网格名称语法元素获取各个三维网格的名称;
    根据各个三维网格对应的网格描述模块中的数据种类语法元素获取各个三维网格所包括的数据种类;
    根据各个数据种类语法元素的值获取用于访问各个三维网格的各个种类的数据的访问器对应的访问器描述模块的索引值;
    根据各个三维网格对应的网格描述模块中的模式语法元素的值获取各个三维网格的拓扑结构的类型。
  28. 根据权利要求27所述的方法,在根据各个数据种类语法元素的值获取用于访问各个三维网格的各个种类的数据的访问器对应的访问器描述模块的索引值之后,所述方法还包括:
    根据用于访问各个三维网格的各个种类的数据的访问器对应的访问器描述模块的索引值,从所述场景描述文件的访问器列表中获取用于访问各个三维网格的各个种类的数据的访问器对应的访问器描述模块;
    根据用于访问各个三维网格的各个种类的数据的访问器对应的访问器描述模块,获取用于访问各个三维网格的各个种类的数据的访问器的描述信息。
  29. 根据权利要求20所述的方法,所述方法还包括:
    获取所述场景描述文件的缓存器列表中的各个缓存器描述模块;
    获取各个缓存器描述模块中的媒体索引语法元素的值;
    将所述媒体索引语法元素的值与所述目标媒体描述模块的索引值相同的缓存器描述模块确定为用于缓存所述目标媒体文件的解码数据的目标缓存器对应的目标缓存器描述模块;
    根据所述目标缓存器描述模块获取所述目标缓存器的描述信息。
  30. 根据权利要求29所述的方法,所述根据所述目标缓存器描述模块获取所述目标缓存器的描述信息,包括以下至少一项:
    根据所述目标缓存器描述模块中的第一字节长度语法元素的值,获取所述目标缓存器的容量;
    根据所述目标缓存器描述模块中是否包含MPEG环形缓存器确定所述目标缓存器是否为基于MPEG扩展改造的环形缓存器;
    根据所述目标缓存器描述模块的MPEG环形缓存器中的环节数量语法元素的值,获取所述MPEG环形缓存器的存储环节的数量;
    根据所述目标缓存器描述模块的MPEG环形缓存器中的第二轨道索引语法元素的值,获取所述MPEG环形缓存器所缓存的数据的源数据的轨道索引值。
  31. 根据权利要求29所述的方法,所述方法还包括:
    获取所述场景描述文件的缓存切片列表中的各个缓存切片描述模块;
    获取各个缓存切片描述模块中的缓存器索引语法元素的值;
    将所述缓存器索引语法元素的值与所述目标缓存器描述模块的索引值相同的缓存切片描述模块确定为所述目标缓存器的缓存切片对应的缓存切片描述模块;
    根据所述目标缓存器的缓存切片对应的缓存切片描述模块,获取所述目标缓存器的缓存切片的描述信息。
  32. 根据权利要求31所述的方法,所述根据所述目标缓存器的缓存切片对应的缓存切片描述模块,获取所述目标缓存器的缓存切片的描述信息,包括以下至少一项:
    根据所述目标缓存器的缓存切片对应的缓存切片描述模块中的第二字节长度语法元素的值,获取所述目标缓存器的缓存切片的容量;
    根据所述目标缓存器的缓存切片对应的缓存切片描述模块中的偏移量语法元素的值,获取所述目标缓存器的缓存切片的偏移量。
  33. 根据权利要求31所述的方法,所述方法还包括:
    获取所述场景描述文件的访问器列表中的各个访问器描述模块;
    获取各个访问器描述模块中的缓存切片索引语法元素的值;
    将所述缓存切片索引语法元素的值与所述目标缓存器的缓存切片对应的缓存切片描述模块的索引值相同的访问器描述模块,确定为用于对所述目标缓存器的缓存切片中的数据进行访问的访问器对应的访问器描述模块;
    根据用于对所述目标缓存器的缓存切片中的数据进行访问的访问器对应的访问器描述模块,获取用于对所述目标缓存器的缓存切片中的数据进行访问的访问器的描述信息。
  34. 根据权利要求28或33所述的方法,根据访问器对应的访问器描述模块获取访问器的描述信息,包括以下至少一项:
    根据访问器描述模块中的数据类型语法元素的值确定访问器所访问的数据的类型;
    根据访问器描述模块中的访问器类型语法元素的值确定访问器的类型;
    根据访问器描述模块中的数据数量语法元素的值确定访问器所访问的数据的数量;
    根据访问器描述模块中是否包含MPEG时变访问器确定访问器是否为基于MPEG扩展改造的时变访问器;
    根据访问器描述模块的MPEG时变访问器中的缓存切片索引语法元素的值确定存储访问器所访问的数据的缓存切片对应的缓存切片描述模块的索引值;
    根据访问器描述模块的MPEG时变访问器中的时变语法元素的值,确定访问器内的语法元素的取值是否随时间变化。
  35. 一种场景描述文件的解析装置,包括:
    存储器,被配置为存储计算机程序;
    处理器,被配置为用于在调用计算机程序时,使得所述场景描述文件的解析装置实现权利要求20-34任一项所述的场景描述文件的解析方法。
  36. 一种三维场景的渲染方法,包括:
    获取待渲染三维场景的场景描述文件,所述待渲染三维场景中包括类型为基于几何的点云压缩G-PCC编码点云的目标媒体文件;
    根据所述场景描述文件的动态图像专家组MPEG媒体的媒体列表中所述目标媒体文件对应的媒体描述模块,获取所述目标媒体文件的描述信息;
    向媒体接入函数发送所述目标媒体文件的描述信息,以使所述媒体接入函数根据所述目标媒体文件的描述信息获取所述目标媒体文件,对所述目标媒体文件进行处理获取所述目标媒体文件的解码数据,以及将所述目标媒体文件的解码数据写入目标缓存器;
    从所述目标缓存器中读取所述目标媒体文件的解码数据;
    基于所述目标媒体文件的解码数据对所述待渲染三维场景进行渲染。
  37. 根据权利要求36所述的方法,所述向媒体接入函数发送所述目标媒体文件的描述信息,包括:
    通过媒体接入函数应用程序编程接口API向所述媒体接入函数发送所述目标媒体文件的描述信息。
  38. 根据权利要求36所述的方法,在从所述目标缓存器中读取所述目标媒体文件的解码数据之前,所述方法还包括:
    从所述场景描述文件的缓存器列表中获取所述目标缓存器对应的缓存器描述模块;
    根据所述目标缓存器对应的缓存器描述模块获取所述目标缓存器的描述信息;
    从所述场景描述文件的缓存切片列表中获取所述目标缓存器的缓存切片对应的缓存切片描述模块;
    根据所述目标缓存器的缓存切片对应的缓存切片描述模块获取所述目标缓存器的缓存切片的描述信息;
    向缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,以使所述缓存管理模块根据所述目标缓存器的描述信息创建所述目标缓存器以及根据所述目标缓存器的缓存切片的描述信息对所述目标缓存器进行缓存切片的划分。
  39. 根据权利要求38所述的方法,所述向缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,包括:
    通过缓存API向所述缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息。
  40. 根据权利要求39所述的方法,在从所述目标缓存器中读取所述目标媒体文件的解码数据之前,所述方法还包括:
    向所述媒体接入函数发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,以使所述媒体接入函数根据所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息将所述目标媒体文件的解码数据写入所述目标缓存器中。
  41. 根据权利要求40所述的方法,所述向所述媒体接入函数发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,包括:
    通过媒体接入函数API向所述媒体接入函数发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息。
  42. 根据权利要求38所述的方法,所述向缓存管理模块发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,包括:
    通过媒体接入函数API向所述媒体接入函数发送所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息,以使所述媒体接入函数将所述目标缓存器的描述信息和所述目标缓存器的缓存切片的描述信息转发至所述缓存管理模块。
  43. 根据权利要求38所述的方法,在从所述目标缓存器中读取所述目标媒体文件的解码数据之前,所述方法还包括:
    从所述场景描述文件的访问器列表中获取目标访问器对应的访问器描述模块,所述目标访问器为用于访问所述目标媒体文件的解码数据的访问器;
    根据所述目标访问器对应的访问器描述模块获取所述目标访问器的描述信息;
    向所述媒体接入函数发送所述目标访问器的描述信息,以使所述媒体接入函数根据所述目标访问器的描述信息将所述目标媒体文件的解码数据写入所述目标缓存器中。
  44. 根据权利要求36-43任一项所述的方法,所述从所述目标缓存器中读取所述目标媒体文件的解码数据,包括:
    通过用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器,从所述目标缓存器存储的所述目标媒体文件的解码数据中读取所述待渲染三维场景中的各个三维网格的各个种类的数据。
  45. 根据权利要求44所述的方法,在通过用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器,从所述目标缓存器存储的所述目标媒体文件的解码数据中读取所述待渲染三维场景中的各个三维网格的各个种类的数据之前,所述方法还包括:
    根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值,从所述场景描述文件的网格列表中获取所述待渲染三维场景中的各个三维网格对应的网格描述模块;
    根据所述待渲染三维场景中的各个三维网格对应的网格描述模块,获取所述待渲染三维场景中的各个三维网格所包含的数据种类以及用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器。
  46. 根据权利要求45所述的方法,在根据所述待渲染三维场景中的各个节点挂载的三维网格对应的网格描述模块的索引值,从所述场景描述文件的网格列表中获取所述待渲染三维场景中的各个三维网格对应的网格描述模块之前,所述方法还包括:
    根据所述待渲染三维场景中的各个节点对应的节点描述模块的索引值,从所述场景描述文件的节点列表中获取所述待渲染三维场景中的各个节点对应的节点描述模块;
    根据所述待渲染三维场景中的各个节点对应的节点描述模块,获取所述待渲染三维场景中的各个节点的描述信息;任一节点的描述信息包括该节点挂载的三维网格对应的网格描述模块的索引值。
  47. 根据权利要求46所述的方法,在根据所述待渲染三维场景中的各个节点对应的节点描述模块的索引值,从所述场景描述文件的节点列表中获取所述待渲染三维场景中的各个节点对应的节点描述模块之前,所述方法还包括:
    从所述场景描述文件的场景列表中获取所述待渲染三维场景对应的场景描述模块;
    根据所述待渲染三维场景对应的场景描述模块,获取所述待渲染三维场景的描述信息,所述待渲染三维场景的描述信息包括所述待渲染三维场景中的各个节点对应的节点描述模块的索引值。
  48. 根据权利要求47所述的方法,任一三维网格的描述信息还包括用于访问该三维网格的各个种类的数据的访问器对应的访问器描述模块的索引值,所述根据所述待渲染三维场景中的各个三维网格对应的网格描述模块,获取用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器,包括:
    根据用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器对应的访问器描述模块的索引值,从所述场景描述文件的访问器列表中获取用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器对应的访问器描述模块;
    根据用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器对应的访问器描述模块,获取用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器的描述信息;
    根据用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器的描述信息创建用于访问所述待渲染三维场景中的各个三维网格的各个种类的数据的访问器。
  49. 一种三维场景的渲染装置,包括:
    存储器,被配置为存储计算机程序;
    处理器,被配置为用于在调用计算机程序时,使得所述三维场景的渲染装置实现权利要求36-48任一项所述的三维场景的渲染方法。