WO2024257784A1 - Decoding device, decoding method, and encoding device - Google Patents
- Publication number
- WO2024257784A1 (PCT/JP2024/021291)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- dimensional
- presentation
- information
- unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
- G06T9/002—Image coding using neural networks
- G06T9/40—Tree coding, e.g. quadtree, octree
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N21/2365—Multiplexing of several video streams
- H04N21/44012—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
Definitions
- This disclosure relates to a decoding device, a decoding method, and an encoding device.
- 3D data can be acquired in a variety of ways, for example with distance sensors such as range finders, with stereo cameras, or with a combination of multiple monocular cameras.
- One method of expressing three-dimensional data is a method called a point cloud, which uses a group of points in three-dimensional space to represent the shape of a three-dimensional structure.
- In a point cloud, the position and color of each point are stored.
- Point clouds are expected to become the mainstream method of expressing three-dimensional data, but point clouds have a very large amount of data. Therefore, when storing or transmitting three-dimensional data, it is essential to compress the amount of data by encoding, just as with two-dimensional moving images (examples include MPEG-4 AVC or HEVC standardized by MPEG).
- There is also a known technology that uses three-dimensional map data to search for and display facilities located around a vehicle (see, for example, Patent Document 1).
- The present disclosure aims to provide a decoding device and the like that can appropriately present first presentation data and second presentation data based on first data and second data.
- In one aspect, a decoding device includes a circuit and a memory connected to the circuit. In operation, the circuit: acquires encoded data that includes first data representing a three-dimensional object, second data representing the three-dimensional object, encoding method information indicating which of a plurality of encoding methods is used for the second data, and identification information indicating the three-dimensional space that contains the three-dimensional object; decodes, based on the encoded data, the first data and the second data corresponding to that three-dimensional space; renders the first data to generate first presentation data for presentation; renders the second data to generate second presentation data for presentation; and switches the presentation from the generated second presentation data to the first presentation data.
- In another aspect, a decoding device decodes first data representing a three-dimensional object and includes a circuit and a memory connected to the circuit. In operation, the circuit decodes encoding method information that indicates a second encoding method, different from the first encoding method used for the first data, for data representing the three-dimensional object, and decodes second data encoded with the second encoding method indicated by the encoding method information. The second data is used to generate second presentation data for presentation.
- In yet another aspect, an encoding device encodes first data representing a three-dimensional object and includes a circuit and a memory connected to the circuit. In operation, the circuit generates encoding method information that indicates a second encoding method, different from the first encoding method used for the first data, for data representing the three-dimensional object; generates second data encoded with the second encoding method indicated by the encoding method information; and generates a bitstream including the encoding method information and the second data. The second data is used to generate second presentation data for presentation.
- The decoding device and the like disclosed herein can appropriately present the first presentation data and the second presentation data based on the first data and the second data.
- FIG. 1 is a diagram showing an example of a configuration of a three-dimensional data encoding/decoding system according to an embodiment.
- FIG. 2 is a diagram showing the configuration of point cloud data.
- FIG. 3 is a diagram showing an example of the structure of a data file in which information on point cloud data is described.
- FIG. 4 is a diagram showing the structure of three-dimensional mesh data.
- FIG. 5 is a diagram showing an example of the structure of a data file in which information on three-dimensional mesh data is described.
- FIG. 6 is a diagram for explaining a three-dimensional model.
- FIG. 7 is a diagram showing types of three-dimensional data.
- FIG. 8 is a diagram for explaining the encoding process of three-dimensional data.
- FIG. 9 is a diagram for explaining the decoding process of three-dimensional data.
- FIG. 10 is a two-dimensional schematic diagram of tiles and slices of three-dimensional data.
- FIG. 11 is a diagram showing an example of a terminal presentation screen that can be switched in response to a user request.
- FIG. 12 is a diagram showing an example of a terminal presentation screen that is automatically switched in response to a user operation.
- FIG. 13 is a block diagram showing an example of a functional configuration of the server and the terminal.
- FIG. 14 is a block diagram illustrating another example of the data generating unit of the server.
- FIG. 15 is a diagram for explaining the synchronization process of the coordinate systems.
- FIG. 16 is a diagram for explaining the synchronization process of the coordinate systems.
- FIG. 17 is a diagram for explaining the relationship between the three-dimensional space and the encoded data.
- FIG. 18 is a diagram illustrating an example of the syntax of the coding method unit.
- FIG. 19 is a diagram illustrating an example of the syntax of the encoded point cloud.
- FIG. 20 is a diagram showing an example of the syntax of the encoded mesh.
- FIG. 21 is a diagram illustrating an example of the syntax of the encoded 3D model.
- FIG. 22 is a diagram showing an example of the syntax of the three-dimensional data information.
- FIG. 23 is a diagram illustrating the data structure of the encoded point cloud.
- FIG. 24 is a diagram for explaining the data structure of the encoded mesh.
- FIG. 25 is a diagram for explaining the data structure of an encoded three-dimensional model.
- FIG. 26 is a two-dimensional diagram showing an example of a plurality of three-dimensional spaces.
- FIG. 27 is a diagram showing an example of a bounding box.
- FIG. 28 is a diagram showing an example of the syntax of the three-dimensional space information.
- FIG. 29 is a flowchart showing an example of partial decoding.
- FIG. 30 is a diagram showing an example of a three-dimensional spatial region that is a target of partial decoding.
- FIG. 31 is a diagram showing an example of the data structure of an encoded point cloud to be partially decoded.
- FIG. 32 is a diagram showing an example of the data structure of an encoded mesh to be partially decoded.
- FIG. 33 is a diagram showing an example of the data structure of an encoded 3D model to be partially decoded.
- FIG. 34 is a diagram showing an example of coordinate systems of different types of three-dimensional data that are not spatially synchronized.
- FIG. 35 is a diagram showing an example of the syntax of the three-dimensional data information.
- FIG. 36 is a diagram showing an example of the syntax of the three-dimensional space information.
- FIG. 37 is a diagram illustrating an example of a functional configuration of a terminal.
- FIG. 38 is a flowchart showing an example of the spatial synchronization process.
- FIG. 39 is a diagram illustrating an example of the configuration of a decoding device.
- FIG. 40 is a flowchart showing an example of a decoding method by the decoding device.
- FIG. 41 is a flowchart showing another example of a decoding method performed by the decoding device.
- FIG. 42 is a diagram showing an example of the configuration of an encoding device.
- FIG. 43 is a flowchart showing an example of an encoding method by the encoding device.
- A decoding device according to a first aspect includes a circuit and a memory connected to the circuit. In operation, the circuit: acquires encoded data that includes first data representing a three-dimensional object, second data representing the three-dimensional object, encoding method information indicating which of a plurality of encoding methods is used for the second data, and identification information indicating the three-dimensional space that contains the three-dimensional object; decodes, based on the encoded data, the first data and the second data corresponding to that three-dimensional space; renders the first data to generate first presentation data for presentation; renders the second data to generate second presentation data for presentation; and switches the presentation from the generated second presentation data to the first presentation data.
- With this configuration, the first presentation data and the second presentation data are generated from the first data and the second data corresponding to the same three-dimensional space, and the presentation is switched from the second presentation data to the first presentation data. The two pieces of data representing the three-dimensional object can therefore be switched without spatial misalignment, so the first presentation data and the second presentation data can be presented appropriately.
- A decoding device according to a second aspect is the decoding device according to the first aspect, in which the first data is point cloud data representing the three-dimensional object.
- With this configuration, the presentation switches from the second presentation data to the first presentation data, which is based on the point cloud data, so the two pieces of data representing the three-dimensional object can be switched and presented without spatial misalignment.
- A decoding device according to a third aspect is the decoding device according to the first or second aspect, in which the second data is mesh data representing the three-dimensional object.
- With this configuration, the presentation switches from the second presentation data, which is based on the mesh data, to the first presentation data, so the two pieces of data representing the three-dimensional object can be switched and presented without spatial misalignment.
- A decoding device according to a fourth aspect is the decoding device according to the first or second aspect, in which the second data is three-dimensional model data representing the three-dimensional object, and the three-dimensional model data represents a machine learning model obtained by training on a plurality of pairs of a line of sight and a two-dimensional image.
- With this configuration, the presentation switches from the second presentation data, which is based on the three-dimensional model data, to the first presentation data, so the two pieces of data representing the three-dimensional object can be switched and presented without spatial misalignment.
- A decoding device according to a fifth aspect is the decoding device according to the first or second aspect, in which the second data is a two-dimensional image of the three-dimensional object viewed from a predetermined line of sight.
- With this configuration, the presentation switches from the second presentation data, which is based on the two-dimensional image, to the first presentation data, so the two pieces of data representing the three-dimensional object can be switched and presented without spatial misalignment.
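A two-dimensional image of the object seen from a fixed line of sight could, as one possibility, be produced by projecting its 3D points onto an image plane. The sketch below assumes a simple pinhole camera at the origin looking along +z, with focal length `f` and principal point (`cx`, `cy`); the camera model and the `project` helper are illustrative assumptions, not the method of the disclosure.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(points: List[Point3], f: float, cx: float, cy: float) -> List[Pixel]:
    """Map 3D points to image coordinates with an assumed pinhole camera
    located at the origin and looking along +z (the fixed line of sight)."""
    pixels = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera, so not visible from this line of sight
        pixels.append((f * x / z + cx, f * y / z + cy))
    return pixels
```

For example, `project([(1.0, 2.0, 4.0)], f=2.0, cx=0.0, cy=0.0)` returns `[(0.5, 1.0)]`. A real renderer would also rasterize colors and resolve occlusion; this only maps positions.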
- A decoding device according to a sixth aspect is the decoding device according to any one of the first to fifth aspects, in which the circuit further acquires a request from a user to switch the presentation data and, in the presentation, switches from the second presentation data to the first presentation data in response to the request.
- A decoding device according to a seventh aspect is the decoding device according to any one of the first to fifth aspects, in which the circuit further receives a user operation that changes the presentation mode, changes the presentation mode in response to the operation, and switches from the second presentation data to the first presentation data in response to the change.
- A decoding device according to an eighth aspect is the decoding device according to any one of the first to fifth aspects, in which, in the acquisition, the circuit acquires the encoded data from an encoding device via a communication network and, in the presentation, switches from the second presentation data to the first presentation data according to the bandwidth of the communication network.
- With this configuration, switching can follow the bandwidth of the communication network. For example, when the bandwidth changes from below a specified bandwidth to at or above it, the presentation can switch from the second presentation data to the first presentation data.
- A decoding device according to a ninth aspect is the decoding device according to any one of the first to fifth aspects, in which, in the presentation, the circuit switches from the second presentation data to the first presentation data according to the available capacity of the circuit.
- With this configuration, switching can follow the available capacity of the circuit. For example, when the available capacity changes from below a specified capacity to at or above it, the presentation can switch from the second presentation data to the first presentation data.
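The switching triggers described above (a user request or presentation-mode change, network bandwidth, and available circuit capacity) can be sketched as a small decision function. The threshold values, the `SwitchPolicy` name, and the rule that any single trigger suffices are assumptions for illustration; the text only refers to a "specified" bandwidth or capacity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two presentation sources.
FIRST = "first"    # rendered from the first data (e.g. point cloud data)
SECOND = "second"  # rendered from the second data (e.g. mesh, 3D model, 2D image)


@dataclass
class SwitchPolicy:
    """Assumed thresholds standing in for the 'specified' values."""
    min_bandwidth_bps: float
    min_capacity: float


def choose_presentation(current: str, policy: SwitchPolicy,
                        bandwidth_bps: Optional[float] = None,
                        capacity: Optional[float] = None,
                        user_trigger: bool = False) -> str:
    """Return which presentation data to show next.

    Only the second -> first transition is modelled. `user_trigger` covers
    both an explicit switching request and a presentation-mode change.
    """
    if current != SECOND:
        return current
    if user_trigger:
        return FIRST
    if bandwidth_bps is not None and bandwidth_bps >= policy.min_bandwidth_bps:
        return FIRST
    if capacity is not None and capacity >= policy.min_capacity:
        return FIRST
    return SECOND
```

Because the check only runs while the second presentation data is shown (a state entered when resources were below the threshold), reaching the threshold corresponds to the "from below to at or above" change in the text. A production player would likely add hysteresis so it does not flap between the two.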
- A decoding device according to a tenth aspect is the decoding device according to any one of the first to ninth aspects, in which the encoded data includes synchronization information for synchronizing the coordinate system of the first data with the coordinate system of the second data, and, in the presentation, the circuit presents the first presentation data and the second presentation data based on the synchronization information.
- With this configuration, the presentation can switch from the second presentation data to the first presentation data after the coordinate systems of the two pieces of data have been aligned, so the two pieces of data representing the three-dimensional object can be switched and presented with minimal spatial misalignment.
- A decoding device according to an eleventh aspect is the decoding device according to the tenth aspect, in which the circuit further determines whether to synchronize the coordinate system of the first data with the coordinate system of the second data and, if it determines that they are to be synchronized, presents the first presentation data and the second presentation data based on the synchronization information.
- With this configuration, synchronization processing can be performed when it is needed and skipped when it is not, which can reduce the processing load.
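Synchronization information could, for example, carry a similarity transform (scale, rotation, translation) that maps coordinates of the second data into the coordinate system of the first data. The text does not specify its contents, so the `SyncInfo` fields below are hypothetical; the `needed` flag mirrors the option of skipping synchronization when it is unnecessary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class SyncInfo:
    """Hypothetical synchronization information: p' = scale * R @ p + t maps
    a point p of the second data into the first data's coordinate system."""
    scale: float
    rotation: Mat3      # row-major 3x3 rotation matrix R
    translation: Vec3   # translation t


def to_first_coords(p: Vec3, s: SyncInfo) -> Vec3:
    r = s.rotation
    rotated = [sum(r[i][j] * p[j] for j in range(3)) for i in range(3)]
    return (s.scale * rotated[0] + s.translation[0],
            s.scale * rotated[1] + s.translation[1],
            s.scale * rotated[2] + s.translation[2])


def synchronize(points: List[Vec3], sync: Optional[SyncInfo], needed: bool) -> List[Vec3]:
    # Skip the transform when synchronization is judged unnecessary
    # or when no synchronization information is available.
    if not needed or sync is None:
        return list(points)
    return [to_first_coords(p, sync) for p in points]
```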
- A decoding device according to a twelfth aspect is the decoding device according to any one of the first to eleventh aspects, in which the first data and the second data have a common configuration.
- With this configuration, the amount of encoded data, and therefore the required communication capacity, can be reduced.
- A decoding device according to a thirteenth aspect is the decoding device according to any one of the first to twelfth aspects, in which the encoded data includes spatial information for identifying the three-dimensional space that contains the three-dimensional object, and the circuit further obtains a target area indicating a partial area of the three-dimensional space, identifies, based on the spatial information, first overlapping data, which is the part of the first data that overlaps the target area, and decodes the identified first overlapping data in the decoding process.
- With this configuration, only the first overlapping data needs to be acquired, which reduces the amount of data acquired and thus the communication capacity. Decoding only the first overlapping data also reduces the processing load.
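One possible way to identify the data that overlaps a target area is an axis-aligned bounding-box test. The sketch assumes each independently decodable unit (for example a tile or slice) carries a bounding box in its spatial information; `Box`, `EncodedUnit`, and `select_overlapping` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    lo: Vec3  # minimum corner (x, y, z)
    hi: Vec3  # maximum corner (x, y, z)

    def overlaps(self, other: "Box") -> bool:
        # Axis-aligned boxes overlap iff their intervals overlap on every axis
        # (touching faces count as overlapping here).
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
                   for i in range(3))


@dataclass
class EncodedUnit:
    unit_id: int    # hypothetical identifier of a tile/slice in the bitstream
    bbox: Box       # bounding box from the spatial information
    payload: bytes  # still-encoded data


def select_overlapping(units: List[EncodedUnit], target: Box) -> List[EncodedUnit]:
    """Keep only the units whose bounding box intersects the target area;
    only these would then be fetched and decoded."""
    return [u for u in units if u.bbox.overlaps(target)]
```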
- A decoding method according to another aspect acquires encoded data that includes first data representing a three-dimensional object, second data representing the three-dimensional object, encoding method information indicating which of a plurality of encoding methods is used for the second data, and identification information indicating the three-dimensional space that contains the three-dimensional object; decodes the first data and the second data based on the encoded data; renders the first data to generate first presentation data for presentation; renders the second data to generate second presentation data for presentation; and switches the presentation from the generated second presentation data to the first presentation data.
- With this method, the first presentation data and the second presentation data are generated from the first data and the second data corresponding to the three-dimensional space, and the presentation is switched from the second presentation data to the first presentation data, so the data can be switched and presented without spatial misalignment. The first presentation data and the second presentation data can therefore be presented appropriately.
- A decoding device according to another aspect decodes first data representing a three-dimensional object and includes a circuit and a memory connected to the circuit. In operation, the circuit decodes encoding method information that indicates a second encoding method, different from the first encoding method used for the first data, for data representing the three-dimensional object, and decodes second data encoded with the second encoding method indicated by the encoding method information. The second data is used to generate second presentation data for presentation.
- With this configuration, the second data, encoded with the second encoding method indicated by the decoded encoding method information, is decoded, so the second data needed to generate second presentation data for appropriate presentation can be obtained.
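Decoding driven by encoding method information amounts to dispatching on a method code. The 1-byte field, the numeric codes, and the stub decoders below are assumptions; the actual syntax of the coding method unit (FIG. 18) is not reproduced in this text.

```python
from typing import Callable, Dict

# Hypothetical codes for the encoding method information.
POINT_CLOUD, MESH, MODEL_3D, IMAGE_2D = 0, 1, 2, 3


def decode_unit(unit: bytes, decoders: Dict[int, Callable[[bytes], object]]) -> object:
    """Read an assumed 1-byte encoding-method field, then pass the remaining
    bytes to the decoder registered for that method."""
    if not unit:
        raise ValueError("empty unit")
    method = unit[0]
    try:
        decoder = decoders[method]
    except KeyError:
        raise ValueError(f"unsupported encoding method {method}") from None
    return decoder(unit[1:])
```

A decoder that does not support a given second encoding method can detect this from the field alone and skip the unit, which is one practical benefit of signalling the method explicitly.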
- An encoding device according to another aspect encodes first data representing a three-dimensional object and includes a circuit and a memory connected to the circuit. In operation, the circuit generates encoding method information that indicates a second encoding method, different from the first encoding method used for the first data, for data representing the three-dimensional object; generates second data encoded with the second encoding method indicated by the encoding method information; and generates a bitstream including the encoding method information and the second data. The second data is used to generate second presentation data for presentation.
- With this configuration, a bitstream including the encoding method information and the second data is generated, so a decoding device that acquires the bitstream can obtain the second data and use it to generate second presentation data for appropriate presentation.
- Fig. 1 is a diagram showing an example of the configuration of a three-dimensional data encoding/decoding system according to this embodiment.
- The three-dimensional data encoding/decoding system includes a three-dimensional data encoding system 1001, a three-dimensional data decoding system 1002, a sensor terminal 1003, and an external connection unit 1004.
- The three-dimensional data encoding system 1001 generates encoded data or multiplexed data by encoding three-dimensional data.
- The three-dimensional data encoding system 1001 may be a three-dimensional data encoding device implemented as a single device, or a system implemented by multiple devices.
- The three-dimensional data encoding device may also include only some of the processing units included in the three-dimensional data encoding system 1001.
- The three-dimensional data encoding system 1001 includes a three-dimensional data generation system 1011, a presentation unit 1012, an encoding unit 1013, a multiplexing unit 1014, an input/output unit 1015, and a control unit 1016.
- The three-dimensional data generation system 1011 includes a sensor information acquisition unit 1017 and a three-dimensional data generation unit 1018.
- The sensor information acquisition unit 1017 acquires a sensor signal from the sensor terminal 1003 and outputs it to the three-dimensional data generation unit 1018.
- The three-dimensional data generation unit 1018 generates three-dimensional data from the sensor signal and outputs the three-dimensional data to the encoding unit 1013.
- The presentation unit 1012 presents the sensor signal or the three-dimensional data to the user. For example, the presentation unit 1012 displays information or an image based on the sensor signal or the three-dimensional data.
- The encoding unit 1013 encodes (compresses) the three-dimensional data and outputs the resulting encoded data, control information obtained during encoding, and other additional information to the multiplexing unit 1014.
- The additional information includes, for example, the sensor signal.
- The multiplexing unit 1014 generates multiplexed data by multiplexing the encoded data, the control information, and the additional information input from the encoding unit 1013.
- The format of the multiplexed data is, for example, a file format for storage or a packet format for transmission.
- The input/output unit 1015 (e.g., a communication unit or an interface) outputs the multiplexed data to the outside.
- The multiplexed data is stored in a storage unit such as an internal memory.
- The control unit 1016 (or application execution unit) controls each processing unit; that is, it controls encoding, multiplexing, and so on.
- The control unit 1016 may also control demultiplexing, decoding, or presentation.
- The sensor signal may be input to the encoding unit 1013 or the multiplexing unit 1014.
- The input/output unit 1015 may output the three-dimensional data or the encoded data directly to the outside.
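Combining encoded data, control information, and additional information into one stream, and splitting it again on the decoding side (the role of the demultiplexing unit), can be sketched with a simple type-length-value layout. The tag values and the 1-byte type / 4-byte big-endian length header are assumptions; real storage file formats and transmission packet formats define their own structures.

```python
import struct
from typing import List, Tuple

# Hypothetical type tags for the three kinds of data being multiplexed.
ENCODED, CONTROL, ADDITIONAL = 1, 2, 3
HEADER = ">BI"  # type: 1 byte, length: 4 bytes big-endian (5 bytes, no padding)


def multiplex(items: List[Tuple[int, bytes]]) -> bytes:
    """Pack (type, payload) pairs as type | length | payload."""
    out = bytearray()
    for kind, payload in items:
        out += struct.pack(HEADER, kind, len(payload))
        out += payload
    return bytes(out)


def demultiplex(data: bytes) -> List[Tuple[int, bytes]]:
    """Inverse of multiplex(): split the stream back into (type, payload) pairs."""
    items: List[Tuple[int, bytes]] = []
    pos = 0
    size = struct.calcsize(HEADER)
    while pos < len(data):
        if pos + size > len(data):
            raise ValueError("truncated header")
        kind, length = struct.unpack_from(HEADER, data, pos)
        pos += size
        if pos + length > len(data):
            raise ValueError("truncated payload")
        items.append((kind, data[pos:pos + length]))
        pos += length
    return items
```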
- The transmission signal (multiplexed data) output from the three-dimensional data encoding system 1001 is input to the three-dimensional data decoding system 1002 via the external connection unit 1004.
- The three-dimensional data decoding system 1002 generates three-dimensional data by decoding the encoded data or multiplexed data.
- The three-dimensional data decoding system 1002 may be a three-dimensional data decoding device implemented as a single device, or a system implemented by multiple devices.
- The three-dimensional data decoding device may also include only some of the processing units included in the three-dimensional data decoding system 1002.
- The three-dimensional data decoding system 1002 includes a sensor information acquisition unit 1021, an input/output unit 1022, a demultiplexing unit 1023, a decoding unit 1024, a presentation unit 1025, a user interface 1026, and a control unit 1027.
- The sensor information acquisition unit 1021 acquires a sensor signal from the sensor terminal 1003.
- The input/output unit 1022 acquires the transmission signal, extracts the multiplexed data (file format or packets) from it, and outputs the multiplexed data to the demultiplexing unit 1023.
- The demultiplexing unit 1023 obtains the encoded data, control information, and additional information from the multiplexed data and outputs them to the decoding unit 1024.
- The decoding unit 1024 reconstructs the point cloud data by decoding the encoded data.
- The presentation unit 1025 presents the point cloud data to the user. For example, the presentation unit 1025 displays information or an image based on the point cloud data.
- The user interface 1026 acquires instructions based on user operations.
- The control unit 1027 (or application execution unit) controls each processing unit; that is, it controls demultiplexing, decoding, presentation, and so on.
- The input/output unit 1022 may obtain the point cloud data or the encoded data directly from the outside.
- The presentation unit 1025 may obtain additional information such as the sensor signal and present information based on it.
- The presentation unit 1025 may perform presentation based on a user instruction acquired by the user interface 1026.
- The sensor terminal 1003 generates a sensor signal, which is information obtained by a sensor.
- The sensor terminal 1003 is a terminal equipped with a sensor or a camera; examples include a moving body such as an automobile, a flying object such as an airplane, a mobile terminal, and a camera.
- Sensor signals that can be acquired by the sensor terminal 1003 include, for example, (1) a signal indicating the distance between the sensor terminal 1003 and an object, or the reflectance of the object, obtained from LiDAR, millimeter-wave radar, or an infrared sensor, and (2) a signal indicating the distance between a camera and an object, or the reflectance of the object, obtained from multiple monocular camera images or stereo camera images.
- The sensor signal may also include the attitude, orientation, gyro (angular velocity), position (GPS information or altitude), speed, acceleration, and so on of the sensor.
- The sensor signal may also include temperature, air pressure, humidity, magnetism, and so on.
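For item (2), distance from a stereo camera is commonly obtained by triangulation: for a rectified stereo pair with focal length f (in pixels), baseline B, and disparity d (in pixels) of a matched point, the depth is Z = f·B/d. This is the standard textbook relation, not a method specified in this document; the function name is illustrative.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: distance between the two
    camera centres in metres; disparity_px: horizontal pixel shift of the
    same point between the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

For instance, f = 700 px, B = 0.12 m and d = 42 px give Z = 84 / 42 = 2.0 m (up to floating-point rounding). Depth grows as disparity shrinks, so distant objects are measured less precisely.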
- The external connection unit 1004 is implemented by an integrated circuit (LSI or IC), an external storage unit, communication with a cloud server via the Internet, broadcasting, or the like.
- Figure 2 is a diagram showing the structure of point cloud data.
- Figure 3 is a diagram showing an example of the structure of a data file in which information about point cloud data is described.
- Point cloud data includes data on multiple points.
- the data on each point includes location information (three-dimensional coordinates) and attribute information for that location information.
- a collection of multiple such points is called a point cloud.
- a point cloud can represent the three-dimensional shape of an object.
- Position information such as three-dimensional coordinates is sometimes called geometry.
- the data for each point may include attribute information of multiple attribute types.
- the attribute types are, for example, color or reflectance.
- One piece of attribute information may be associated with one piece of location information, or multiple pieces of attribute information with different attribute types may be associated with one piece of location information. Also, multiple pieces of attribute information of the same attribute type may be associated with one piece of location information.
- the data file configuration example shown in Figure 3 is an example where there is a one-to-one correspondence between position information and attribute information, and shows the position information and attribute information of N points that make up the point cloud data.
- the position information is, for example, information on the three axes x, y, and z.
- the attribute information is, for example, RGB color information.
- a typical example of such a data file is a PLY file.
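A point cloud with a one-to-one correspondence between position information and RGB attribute information, as in Figure 3, maps directly onto an ASCII PLY file. The following is a minimal sketch, not the document's own file layout. The `Point` class and `to_ascii_ply` function are hypothetical names, and the `red`/`green`/`blue` `uchar` property names follow a common PLY convention rather than anything stated here.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One point: position information (x, y, z) plus RGB attribute information."""
    x: float
    y: float
    z: float
    r: int
    g: int
    b: int


def to_ascii_ply(points: list[Point]) -> str:
    """Serialize N points as ASCII PLY: a header, then one line per point."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    body = [f"{p.x} {p.y} {p.z} {p.r} {p.g} {p.b}" for p in points]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    print(to_ascii_ply([Point(0.0, 1.0, 2.0, 255, 0, 0)]))
```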
- Figure 4 is a diagram showing the structure of three-dimensional mesh data.
- Figure 5 is a diagram showing an example of the structure of a data file in which information about the three-dimensional mesh data is described.
- Three-dimensional mesh data is a data format used in CG (Computer Graphics) that represents the three-dimensional shape of an object as a collection of multiple pieces of surface information. Each piece of surface information represents a polygon such as a triangle or a quadrangle. Three-dimensional mesh data is also called a polygon or polygon mesh.
- the components of three-dimensional mesh data are a three-dimensional point cloud, vertices (the three-dimensional points of that point cloud), edges that each connect two vertices, and faces that are each bounded by multiple edges.
- a three-dimensional point cloud is a collection of points that include position information in three-dimensional space and attribute information that corresponds to that position information. Note that three-dimensional points may simply be referred to as points.
- a vertex may have attribute information such as color information, reflectance, and normal vector for a three-dimensional point.
- the relationship between the vertices that make up an edge or face may be indicated by information called connectivity.
- a vertex may be expressed as a position.
- the front and back of a face may be expressed by the direction of the normal vector for the three-dimensional point.
- a vertex may also have attribute information for the face.
- An example of a mesh data file format is the OBJ (object) file format.
- In a mesh data file such as that shown in Figure 5, the position information G(1) to G(N) of the N vertices that make up the mesh and the vertex attribute information A(1) to A(N) are shown as vertex information.
- the vertex information does not have to include attribute information.
- the mesh data file in Figure 5 shows an example of three-dimensional mesh data having M pieces of attribute information A2.
- the number of vertices of a face is not limited to three, as long as the number of vertices is an integer equal to or greater than three. For example, if the face is a quadrangle, the number of vertices is four, and if the face is a polygon, the number of vertices is equal to the number of vertices that form the polygon.
- the attribute information A2 may be stored in a file separate from the mesh data file, in which case the mesh data file may include pointer information to that file.
- the attribute information may be stored in a two-dimensional attribute map file, and the attribute map file name and two-dimensional coordinates in the attribute map may be indicated by the attribute information A2 of the mesh data file.
- the attribute information A2 may be included in the mesh data file, or may be indicated in a file separate from the mesh data file, and either method can be used to specify attribute information for a three-dimensional point.
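The vertex list and the polygon list of a mesh data file can be illustrated with the text form of an OBJ file. This is a minimal sketch under stated assumptions: `to_obj` is a hypothetical helper, it writes positions ("v" lines) and faces ("f" lines) only, and it omits vertex attributes and attribute-map references such as A2. Faces may have any number of vertices equal to or greater than three, matching the rule above.

```python
Vertex = tuple[float, float, float]


def to_obj(vertices: list[Vertex], faces: list[tuple[int, ...]]) -> str:
    """Write vertex positions ("v") and polygon faces ("f") in OBJ text form.

    Faces are given as 0-based vertex indices; OBJ itself uses 1-based indices.
    """
    lines = [f"v {x} {y} {z}" for (x, y, z) in vertices]
    for face in faces:
        # A face needs at least three vertices (triangle, quadrangle, polygon...).
        if len(face) < 3:
            raise ValueError("a face needs at least three vertices")
        if any(i < 0 or i >= len(vertices) for i in face):
            raise IndexError("face refers to a vertex that does not exist")
        lines.append("f " + " ".join(str(i + 1) for i in face))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A unit square stored as a single quadrangle face.
    print(to_obj([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [(0, 1, 2, 3)]))
```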
- Figure 6 is a diagram to explain the three-dimensional model.
- a three-dimensional model is a model created based on two-dimensional or three-dimensional data.
- the three-dimensional model learning unit 1031 learns, for example, two-dimensional data (two-dimensional images) or three-dimensional data (point clouds or meshes) to generate a three-dimensional model, that is, a network model, such as a neural network, that has learned three-dimensional shapes and the attribute information corresponding to those shapes.
- the three-dimensional model learning unit 1031 may generate a three-dimensional model by learning using NeRF (Neural Radiance Fields) based on a two-dimensional image.
- the three-dimensional model learning unit 1031 may generate a three-dimensional model after converting the two-dimensional image into three-dimensional data by performing photogrammetry using the two-dimensional image.
- the three-dimensional model may be generated using three-dimensional data acquired by a sensor (distance sensor).
- the three-dimensional model data consists of the elements that make up the three-dimensional model, and includes information indicating the structure of the network model, feature values, etc.
- the three-dimensional model data includes, for example, information about the components of a neural network.
- Information about the components includes, for example, multiple layers such as an input layer, intermediate layer, and output layer, nodes in each layer, weighting coefficients for the nodes, transformation functions for the nodes, etc.
- the three-dimensional model encoding unit 1032 may encode the three-dimensional model data and transmit the encoded three-dimensional model data.
- the three-dimensional model decoding unit 1033 receives the transmitted encoded three-dimensional model data and decodes the three-dimensional model based on the encoded three-dimensional model data.
- the rendering reconstruction unit 1034 reconstructs (generates) two-dimensional data (two-dimensional image) or three-dimensional data (point cloud or mesh) based on the decoded three-dimensional model.
- the rendering reconstruction unit 1034 acquires viewpoint position or line of sight vector information, generates rendered two-dimensional data (two-dimensional image) based on the three-dimensional model and the viewpoint position or line of sight vector, and outputs the two-dimensional data.
- the generated two-dimensional data is a two-dimensional image of the three-dimensional object as seen from the viewpoint position, or as seen along the line of sight indicated by the line-of-sight vector.
- the three-dimensional object is the object represented by the three-dimensional data input to the three-dimensional model learning unit 1031, or the subject from which that three-dimensional data was obtained.
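The rendering reconstruction unit renders from a learned model, which is not reproduced here. The sketch below only illustrates the viewpoint and line-of-sight geometry: it projects three-dimensional points onto a two-dimensional image plane with a simple pinhole camera. The `project` function, the world `up` vector, and the `focal` length are assumptions for illustration, not parameters defined in the document.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    if n == 0:
        raise ValueError("zero-length vector (sight parallel to up?)")
    return (a[0] / n, a[1] / n, a[2] / n)


def project(points, viewpoint: Vec, sight: Vec, up: Vec = (0.0, 0.0, 1.0), focal: float = 1.0):
    """Return (u, v) image coordinates per point, or None if the point is behind the viewpoint."""
    forward = _normalize(sight)                 # line-of-sight direction
    right = _normalize(_cross(forward, up))     # image x axis
    cam_up = _cross(right, forward)             # image y axis
    result = []
    for p in points:
        d = _sub(p, viewpoint)
        depth = _dot(d, forward)
        if depth <= 0:
            result.append(None)                 # not visible along this line of sight
            continue
        result.append((focal * _dot(d, right) / depth, focal * _dot(d, cam_up) / depth))
    return result
```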
- Figure 7 is a diagram showing the types of three-dimensional data. As shown in Figure 7, three-dimensional data includes static objects and dynamic objects.
- a static object is three-dimensional data at a single arbitrary point in time.
- a dynamic object is three-dimensional data that changes over time.
- point cloud data at a certain moment in time will be referred to as a PCC frame, or frame.
- mesh data at a certain moment in time will be referred to as a mesh frame, or frame.
- the object may be three-dimensional data with a certain area restriction, such as ordinary video data, or it may be three-dimensional data with no area restriction, such as map information.
- the three-dimensional data may also have various point densities, for example sparse point cloud data, sparse mesh data, or dense point cloud data.
- Sensor information is acquired in various ways, such as distance sensors such as LIDAR or range finders, stereo cameras, or a combination of multiple monocular cameras.
- the three-dimensional data generation unit 1018 generates point cloud data based on the sensor information acquired by the sensor information acquisition unit 1017.
- the three-dimensional data generation unit 1018 generates position information (geometry information) as point cloud data, and adds attribute information for the position information to the position information.
- the three-dimensional data generating unit 1018 may process the point cloud data when generating position information or adding attribute information. For example, the three-dimensional data generating unit 1018 may reduce the amount of data by deleting point clouds with overlapping positions. The three-dimensional data generating unit 1018 may also convert the position information (such as by shifting the position, rotating, or normalizing), or process the point cloud data to generate mesh data. The three-dimensional data generating unit 1018 may also render the attribute information.
- the three-dimensional data generation system 1011 is included in the three-dimensional data encoding system 1001, but it may be provided independently outside the three-dimensional data encoding system 1001.
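Two of the processing steps mentioned for the three-dimensional data generation unit 1018, deleting points with overlapping positions and normalizing the position information, can be sketched as follows. This is a minimal illustration, assuming that "overlapping" means equal within a hypothetical `tolerance` and that normalization means shifting to the origin and scaling uniformly into a unit cube; the document does not specify either choice.

```python
def remove_duplicates(points, tolerance=1e-6):
    """Keep the first point in each tolerance-sized grid cell; later overlapping points are dropped."""
    seen = set()
    kept = []
    for p in points:
        key = tuple(round(c / tolerance) for c in p)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def normalize(points):
    """Shift so the minimum corner is at the origin, then scale uniformly into [0, 1]."""
    if not points:
        return []
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    # A single scale for all axes preserves the shape's aspect ratio.
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    return [tuple((p[i] - mins[i]) / extent for i in range(3)) for p in points]
```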
- the encoding unit 1013 generates encoded data by encoding the three-dimensional data based on a predefined encoding method.
- the encoding methods include G-PCC (an encoding method using position information), V-PCC (an encoding method using a video codec), Draco (a mesh encoding method), and V-DMC (a mesh encoding method).
- the encoding method is not limited to these methods, and may be, for example, a method of encoding a dynamic mesh, or another method that combines these methods.
- the decoding unit 1024 decodes the three-dimensional data by decoding the encoded data based on a predefined encoding method.
- the multiplexing unit 1014 generates multiplexed data by multiplexing the encoded data using an existing multiplexing method.
- the generated multiplexed data is transmitted or stored.
- the multiplexing unit 1014 multiplexes other media such as video, audio, subtitles, applications, files, or reference time information.
- the multiplexing unit 1014 may further multiplex attribute information related to sensor information or point cloud data.
- Multiplexing methods or file formats include ISOBMFF, MPEG-DASH, which is an ISOBMFF-based transmission method, MMT, MPEG-2 TS Systems, and RTP.
- the demultiplexing unit 1023 extracts the encoded data of the three-dimensional data, other media, time information, etc. from the multiplexed data.
- the input/output unit 1015 transmits the multiplexed data using a method suited to the transmission medium or storage medium, such as broadcasting or communication.
- the input/output unit 1015 may communicate with other devices via the Internet, or may communicate with a storage unit such as a cloud server.
- the communication protocol used may be HTTP, FTP, TCP, or UDP.
- a PULL type communication method or a PUSH type communication method may be used.
- Either wired transmission or wireless transmission may be used.
- for wired transmission, Ethernet (registered trademark), USB (registered trademark), RS-232C, HDMI, coaxial cable, etc. are used.
- for wireless transmission, wireless LAN, Wi-Fi (registered trademark), Bluetooth (registered trademark), millimeter waves, etc. are used.
- Broadcasting methods that are used include, for example, DVB-T2, DVB-S2, DVB-C2, ATSC3.0, and ISDB-S3.
- FIG. 8 is a diagram for explaining the process of encoding three-dimensional data.
- FIG. 9 is a diagram for explaining the process of decoding three-dimensional data.
- the data division unit 1041 divides the three-dimensional data into one or more three-dimensional spaces, and generates one or more pieces of divided three-dimensional data.
- the encoding unit 1042 may encode one or more divided three-dimensional data to generate encoded data.
- the data division unit 1041 and the encoding unit 1042 may be included in a single encoding device as components of the single encoding device, or may be included in separate devices.
- Each of the one or more three-dimensional spaces may be referred to as a tile or a space.
- the three-dimensional space is, for example, a bounding box.
- the three-dimensional data contained in each of the divided three-dimensional spaces may be referred to as a slice.
- a slice is divided three-dimensional data, and includes any of a point cloud, a mesh, and a three-dimensional model having position information (Geometry) or attribute information (Attribute).
- Each of the multiple slices is encoded by the encoding unit 1042 for each component, and is output as encoded data.
- the encoded data includes the encoded multiple slices.
- the decoding unit 1051 decodes one or more pieces of divided three-dimensional data (one or more slices) based on the encoded data.
- the data combining unit 1052 combines the one or more pieces of divided three-dimensional data to restore (generate) three-dimensional data.
- the decoding unit 1051 and the data combining unit 1052 may be included in one decoding device as components of the decoding device, or may be included in separate devices. The one or more pieces of divided three-dimensional data decoded by the decoding unit 1051 do not need to be combined.
- the decoding unit 1051 may decode a portion of the one or more pieces of divided three-dimensional data based on a portion of the encoded data, and output the decoded portion of the divided three-dimensional data. In this case, the decoding device does not need to have the data combining unit 1052.
- Figure 10 is a two-dimensional schematic diagram of tiles and slices of three-dimensional data.
- the encoding device may encode using the dependency relationships between the multiple slices, or may encode without using the dependency relationships.
- the encoding device can encode each slice independently, and the processing time can be reduced by encoding multiple slices using parallel processing.
- the decoding device can decode each slice independently, and the processing time can be reduced by decoding multiple slices using parallel processing.
- the decoding device can reduce the amount of processing by partial decoding, which decodes some of the multiple slices.
- when encoding using a dependency relationship, the encoding device signals an identifier indicating the dependency relationship and encodes the data in order, starting with the data on which other data depends. When multiple slices are encoded using a dependency relationship, the decoding device decodes the data in the same order, starting with the data on which other data depends, based on the identifier.
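The decoding order implied by the signaled dependency identifiers is a topological order, and slices with no remaining dependencies can be decoded in parallel. A minimal sketch, assuming the identifiers have already been parsed into a hypothetical mapping from each slice ID to the IDs it depends on; it uses Python's standard `graphlib.TopologicalSorter`, which raises `CycleError` if the dependencies are circular.

```python
from graphlib import TopologicalSorter


def decode_batches(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group slices into batches; every slice in a batch depends only on earlier batches."""
    ts = TopologicalSorter(dependencies)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # slices whose dependencies are all decoded
        batches.append(ready)           # these could be decoded in parallel
        ts.done(*ready)
    return batches


if __name__ == "__main__":
    # s2 and s3 reference s1; s4 references both s2 and s3.
    print(decode_batches({"s1": [], "s2": ["s1"], "s3": ["s1"], "s4": ["s2", "s3"]}))
```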
- the three-dimensional data may be divided into any number of divisions and any division method may be used.
- for example, the shapes of objects may be identified, and the three-dimensional points may be divided object by object.
- the three-dimensional data may also be divided based on the number of three-dimensional points contained in a slice. In other words, an upper limit on the number of three-dimensional points contained in one slice may be set.
- the three-dimensional data may also be divided based on whether or not each point is contained in a three-dimensional space (tile information) defined using map information or location information. The tiles may overlap one another.
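Two of the division rules above, dividing by three-dimensional space and capping the number of points per slice, can be combined in a short sketch. It assumes axis-aligned, non-overlapping cubic tiles of a hypothetical `tile_size` and a hypothetical `max_points` upper limit; overlapping tiles and object-based division are not covered.

```python
import math
from collections import defaultdict


def divide(points, tile_size: float, max_points: int):
    """Return (tile_key, slice_points) pairs, each slice holding at most max_points points."""
    tiles = defaultdict(list)
    for p in points:
        # Integer tile index along each axis.
        key = tuple(math.floor(c / tile_size) for c in p)
        tiles[key].append(p)
    slices = []
    for key in sorted(tiles):
        pts = tiles[key]
        for start in range(0, len(pts), max_points):
            slices.append((key, pts[start:start + max_points]))
    return slices
```

Because each slice is self-contained in this sketch, the slices could be encoded or decoded independently and in parallel.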
- the server accumulates multiple pieces of three-dimensional data for the same space.
- the server accumulates, for example, point cloud data and mesh data for the same space.
- the server is an example of an encoding device.
- the terminal switches the three-dimensional data acquired from the server based on the purpose of the terminal, and presents the switched three-dimensional data.
- the terminal may be, for example, a terminal that analyzes three-dimensional data. In this case, the terminal may switch the three-dimensional data to be presented based on the purpose, such as analysis or presentation, or user operation.
- the terminal is an example of a decoding device.
- the terminal may transmit a user's selection result to a server, receive (download) three-dimensional data based on the selection result from the server, and present the received three-dimensional data.
- the three-dimensional data may or may not be encoded by the server.
- the terminal may receive the encoded three-dimensional data from the server, decode the three-dimensional data based on the received encoded three-dimensional data, and present the decoded three-dimensional data.
- FIG. 11 shows an example of a terminal presentation screen that can be switched in response to a user request.
- the terminal presentation screen 1061 may be switched in response to a user request.
- the terminal presentation screen includes a point cloud button 1061a and a mesh button 1061b as a UI for accepting user requests.
- the point cloud button 1061a is a button for accepting a request to present an encoded point cloud (encoded point cloud data).
- the mesh button 1061b is a button for accepting a request to present an encoded mesh (encoded mesh data).
- when the point cloud button 1061a is pressed, the terminal accepts the request to present the encoded point cloud and notifies (transmits) the request to the server as the result of the user's selection of the data to be presented.
- when the mesh button 1061b is pressed, the terminal accepts the request to present the encoded mesh and notifies (transmits) the request to the server as the result of the user's selection of the data to be presented.
- presentation may be expressed as display.
- Meshes are characterized by the small processing load required for presentation, making them suitable for presentation.
- point clouds are characterized by the high accuracy of the positional information of 3D models represented by 3D point clouds, making them suitable for measurement. For example, a user can select a mesh when wishing to observe a 3D model, or a point cloud when wishing to make measurements. By selecting the type of 3D data appropriate for the required application, the user can reduce presentation processing or use the data for accurate measurements.
- FIG. 12 shows an example of a terminal presentation screen that is automatically switched in response to a user operation.
- the terminal performs a process of enlarging the three-dimensional data by accepting a user's operation.
- the terminal may present mesh data, as in terminal presentation screen 1062, at a magnification smaller than a predetermined magnification, and present point cloud data, as in terminal presentation screen 1063, at a magnification equal to or greater than the predetermined magnification.
- the terminal may present mesh data at a magnification smaller than the predetermined magnification and, when the user enlarges the mesh data to the predetermined magnification, download from the server the point cloud data corresponding to the portion of the three-dimensional data being focused on (the enlarged portion), and decode and present the downloaded point cloud data.
- the type of three-dimensional data to be presented is switched in response to an operation to enlarge or reduce the three-dimensional data, but the type of three-dimensional data to be presented may be switched by other operations.
- for example, when the user taps part of the presented mesh data, the terminal may switch to and present point cloud data corresponding to the tapped part.
- the terminal may present the point cloud data by superimposing it on the mesh data, or may switch the point cloud data from the mesh data and present it independently (i.e., without presenting the mesh data).
- the terminal may predict, before presentation at the predetermined magnification, which part will be enlarged, download the point cloud data corresponding to that part in advance, and decode and present the downloaded point cloud data.
- the terminal may present the downloaded point cloud data as it is without spatial synchronization.
- the terminal may present the point cloud data by aligning it with the mesh data (or the coordinate system of the mesh data) based on synchronization information regarding spatial synchronization.
- the terminal may present the point cloud data without aligning it. Note that, although an example of spatially synchronizing point cloud data with mesh data has been described above, the terminal may also present mesh data in spatial synchronization with the point cloud data.
- the terminal downloads only the necessary point cloud data, and then decodes and presents the downloaded point cloud data. This is expected to reduce the communication volume between the server and the terminal, reduce the processing load on the terminal, and reduce the latency until presentation.
- in general, mesh data has a small data size, whereas point cloud data has a large data size.
- the terminal can reduce the delay until initial presentation by first downloading the mesh data and presenting it initially.
- the three-dimensional data to be presented may be switched based on the communication network bandwidth.
- the terminal may present mesh data when the bandwidth is narrow and the network speed is slower than a predetermined speed, and present point cloud data when the network speed is equal to or faster than the predetermined speed.
- the three-dimensional data to be presented may also be switched depending on the capabilities of the terminal. For example, the three-dimensional data to be presented may be switched based on the processing performance of the terminal or the usage rate of the terminal's CPU.
- the terminal may switch from presenting point cloud data to presenting mesh data if the processing performance or CPU resources of the terminal are insufficient to process the point cloud data.
- while presenting mesh data, the terminal may switch from presenting mesh data to presenting point cloud data if it is determined that the processing performance or CPU resources of the terminal are sufficient to process the point cloud data.
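The document lists magnification, network speed, and terminal capability as separate criteria for switching between mesh data and point cloud data. The sketch below combines all three into one policy, presenting point cloud data only when every criterion allows it; combining them this way is an assumption, and the threshold values and the names `TerminalState`, `Thresholds`, and `choose_data_type` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminalState:
    magnification: float   # current zoom factor of the presentation screen
    network_mbps: float    # measured network speed
    cpu_usage: float       # CPU usage rate, 0.0 to 1.0


@dataclass
class Thresholds:
    # Hypothetical example values; the document only says "predetermined".
    magnification: float = 2.0
    network_mbps: float = 50.0
    max_cpu_usage: float = 0.8


def choose_data_type(state: TerminalState, th: Optional[Thresholds] = None) -> str:
    """Return "point_cloud" only when every criterion allows it, otherwise "mesh"."""
    th = th or Thresholds()
    if state.magnification < th.magnification:
        return "mesh"  # not enlarged enough to need point cloud accuracy
    if state.network_mbps < th.network_mbps:
        return "mesh"  # narrow bandwidth: keep the smaller mesh data
    if state.cpu_usage > th.max_cpu_usage:
        return "mesh"  # not enough CPU headroom to process a point cloud
    return "point_cloud"
```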
- the terminal may switch the type of 3D data presented, such as presenting the exterior of a building or map as mesh data, and presenting point cloud data when the data includes areas that require measurement, such as distortion, cracks, and warping of the building.
- the terminal can also switch the type of 3D data it presents, for example, presenting mesh data when it is desired to present the external appearance of spaces such as stadiums, halls, and factories, including lighting brightness, color, and atmosphere, and presenting point cloud data when it is desired to measure studio shapes, equipment layouts, and passageway clearances, etc.
- the terminal may also switch between presenting three-dimensional data of different resolutions.
- the terminal may switch between presenting multiple point cloud data with different resolutions, multiple mesh data with different resolutions, and the three-dimensional model. For example, the terminal may present low-resolution mesh data when presenting three-dimensional data that is far from the viewpoint, and switch to high-resolution mesh data when the data is closer. This can improve the accuracy of the three-dimensional data display.
- the terminal may present mesh data until a specific movement or location is identified, and then switch to presenting point cloud data of the corresponding body part after the specific movement or location is identified, and use the point cloud data to measure the three-dimensional shape.
- the terminal may switch between three-dimensional models with different resolutions in video games, etc.
- the terminal may, for example, use high-resolution three-dimensional data to present three-dimensional data of important parts, and low-resolution three-dimensional data to present three-dimensional data of unimportant parts. This can improve overall processing performance.
- whether each three-dimensional model is important or not is determined, for example, by whether it is related to the action of the video game (in a shooting game, the player, target, gun, etc. are important), or whether it is related to the player's choices (in a purchasing scene, product information is important, etc.). Note that whether each three-dimensional model is important or not may be set in advance for each video game.
- the terminal may switch between terrain data with different resolutions. This allows more detailed terrain information to be displayed as needed.
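Choosing a resolution from the distance to the viewpoint, with important parts always shown at high resolution, can be sketched as a small selection function. This is an illustration only, assuming three hypothetical resolution levels and hypothetical `near` and `far` distance thresholds; whether a model is important would come from per-game settings, as described above.

```python
import math


def select_level(object_pos, viewpoint, important: bool, near: float = 10.0, far: float = 50.0) -> str:
    """Pick "high", "medium", or "low" resolution for one three-dimensional model."""
    if important:
        return "high"  # e.g. player, target, or product information
    d = math.dist(object_pos, viewpoint)
    if d < near:
        return "high"
    if d < far:
        return "medium"
    return "low"
```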
- when displaying thumbnails on the web, the terminal may present low-resolution three-dimensional data, and when the three-dimensional data selected from the thumbnails is used for measurement or presentation in an application, high-resolution three-dimensional data may be used.
- when the three-dimensional data is displayed on a terminal with a two-dimensional display, it is presented as two-dimensional data showing the three-dimensional object (subject) as viewed from a specific viewpoint in a specific direction.
- the three-dimensional data may be divided into camera information indicating a specific viewpoint and a specific direction, and two-dimensional data when the three-dimensional object is viewed from the specific viewpoint in the specific direction.
- when display is the priority, the device presents mesh data, which is light to process, and when measurements are made, it replaces the mesh data with point cloud data suited to the measurement. This makes it possible to realize applications that must keep processing light.
- Figure 13 is a block diagram showing an example of the functional configuration of the server and the terminal.
- the server 1070 includes a data generation unit 1071, a synchronization unit 1075, a point cloud coding unit 1076, a mesh coding unit 1077, a model coding unit 1078, a multiplexing unit 1079, and a data extraction unit 1080.
- the data generation unit 1071 generates three-dimensional data based on at least one of two-dimensional data and three-dimensional data.
- the generated three-dimensional data includes at least two of point cloud data, mesh data, and three-dimensional model data.
- the data generation unit 1071 has a point cloud generation unit 1072, a mesh generation unit 1073, and a model generation unit 1074.
- the data generation unit 1071 only needs to have at least two of the point cloud generation unit 1072, the mesh generation unit 1073, and the model generation unit 1074.
- the point cloud generation unit 1072 generates point cloud data based on at least one of two-dimensional data and three-dimensional data.
- the mesh generation unit 1073 generates mesh data based on at least one of two-dimensional data and three-dimensional data.
- the model generation unit 1074 generates three-dimensional model data by machine learning based on at least one of the two-dimensional data and three-dimensional data.
- the two-dimensional data input to the data generation unit 1071 may be a two-dimensional image acquired by a camera.
- the three-dimensional data input to the data generation unit 1071 may be point cloud data acquired by a sensor such as LiDAR of a space such as a construction site, a factory, or an office.
- the data generation unit 1071 may generate color information corresponding to each point included in the point cloud data of the three-dimensional data as attribute information using a two-dimensional image of the two-dimensional data.
- the three-dimensional data generated by the data generation unit 1071 may be divided into arbitrary spaces.
- the point cloud data, mesh data, and three-dimensional model data may each be divided into arbitrary spaces.
- the synchronization unit 1075 synchronizes the spatial positions of the point cloud data, mesh data, and three-dimensional model data generated by the data generation unit 1071 or the time of each data (playback time, decode time, acquisition time, etc.).
- the synchronization unit 1075 may generate synchronization information for synchronization without synchronizing the point cloud data, mesh data, and three-dimensional model data.
- the synchronization unit 1075 may synchronize at least two of the three types of three-dimensional data (point cloud data, mesh data, and three-dimensional model data) generated by the data generation unit 1071, or generate synchronization information (a synchronization signal) for synchronizing them; it does not have to synchronize all three types.
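One possible form of time synchronization information is a table that pairs each mesh frame with the point cloud (PCC) frame whose playback time is closest. The sketch below assumes sorted integer timestamps in milliseconds and nearest-time matching with ties resolved toward the earlier frame; the table format and the names `nearest_frame` and `build_sync_table` are hypothetical, not a format defined in the document.

```python
from bisect import bisect_left


def nearest_frame(timestamps: list[int], t: int) -> int:
    """Index of the timestamp closest to t (ties go to the earlier frame)."""
    if not timestamps:
        raise ValueError("no frames to synchronize with")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if after - t < t - before else i - 1


def build_sync_table(mesh_times: list[int], pcc_times: list[int]) -> list[tuple[int, int]]:
    """Pair each mesh frame index with the closest point cloud frame index."""
    return [(k, nearest_frame(pcc_times, t)) for k, t in enumerate(mesh_times)]
```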
- the point cloud encoding unit 1076 encodes the point cloud data after the synchronization process is performed by the synchronization unit 1075. Note that the point cloud encoding unit 1076 does not have to encode the point cloud data.
- the point cloud data may be encoded in advance, or may be encoded in response to a request from the terminal 1090.
- the mesh encoding unit 1077 encodes the mesh data after the synchronization process is performed by the synchronization unit 1075.
- the model encoding unit 1078 encodes the three-dimensional model data after the synchronization process is performed by the synchronization unit 1075.
- the multiplexing unit 1079 multiplexes the encoded point cloud data (encoded point cloud), the encoded mesh data (encoded mesh data), the encoded three-dimensional model data, and the synchronization information using a predetermined format or a predetermined multiplexing method. Note that multiplexing by the multiplexing unit 1079 does not have to be performed. In this case, the server 1070 does not have to be equipped with the multiplexing unit 1079.
- the data extraction unit 1080 extracts a portion of the multiplexed three-dimensional data in response to a request from the terminal 1090, and transmits the extracted portion of the three-dimensional data to the terminal 1090.
- data extraction by the data extraction unit 1080 may not be performed.
- the server 1070 may not include the data extraction unit 1080.
- the server 1070 may transmit to the terminal 1090 the three-dimensional data multiplexed by the multiplexing unit 1079.
- the server 1070 may transmit to the terminal 1090 the coded point cloud data (coded point cloud), the coded mesh data (coded mesh), the coded three-dimensional model data (coded three-dimensional model), and the synchronization information, or may transmit to the terminal 1090 a bit stream including the coded point cloud data (coded point cloud), the coded mesh data (coded mesh), the coded three-dimensional model data (coded three-dimensional model), and the synchronization information.
- the terminal 1090 includes a control unit 1091, a decoding unit 1092, and a presentation unit 1093.
- the control unit 1091 transmits a request for a portion of the three-dimensional data to be presented to the server 1070.
- the control unit 1091 may also accept an operation by the user to identify the portion of the three-dimensional data.
- the decoding unit 1092 decodes a portion of the three-dimensional data based on the bit stream (encoded data) obtained from the server 1070.
- the presentation unit 1093 renders and presents a portion of the decoded three-dimensional data.
- the data generation unit 1071 in FIG. 13 may be realized by the data generation unit 1110 shown in FIG. 14.
- FIG. 14 is a block diagram showing another example of the data generation unit of the server.
- the data generation unit 1110 includes a point cloud generation unit 1111, a mesh generation unit 1112, and a model generation unit 1113.
- the point cloud generation unit 1111 has the same functions as the point cloud generation unit 1072.
- the point cloud generation unit 1111 acquires point cloud data obtained from the point cloud sensor 1101 and a two-dimensional image obtained from the camera 1102, and generates point cloud data based on the point cloud data and the two-dimensional image.
- the point cloud data generated by the point cloud generation unit 1111 includes position information of each point and attribute information corresponding to each point indicated by the position information, and includes attribute information (such as color information) extracted from the two-dimensional image.
- the mesh generation unit 1112 generates mesh data based on the point cloud data generated by the point cloud generation unit 1111.
- the model generation unit 1113 has the same functions as the model generation unit 1074.
- the model generation unit 1113 acquires point cloud data obtained from the point cloud sensor 1101 and two-dimensional images obtained from the camera 1102, and generates three-dimensional model data by performing machine learning based on the point cloud data and the two-dimensional images.
- the point cloud data, mesh data, and three-dimensional model data may be data generated independently of each other, as described in FIG. 13.
- the mesh data may be generated from the point cloud data, as described in FIG. 14. Conversely, the point cloud data may be generated from the mesh data.
- the point cloud data, mesh data, and three-dimensional model data may be generated by the server 1070, or may be generated by a sensor or a terminal 1090 equipped with a sensor.
- the sensor is, for example, a point cloud sensor 1101 and a camera 1102.
- Figures 15 and 16 are diagrams for explaining the process of synchronizing the coordinate systems.
- the origin position (origin coordinates) of the local coordinate system handled by a system may differ from the actual coordinates, depending on the system or coordinate system used.
- normally, the origin position is the same for the mesh data and the point cloud data; however, if the mesh data and the point cloud data are generated in different systems, their origin positions may differ.
- the origin coordinates of the mesh data coordinate system and the origin coordinates of the point cloud data coordinate system are both, for example, (x1, y1, z1) in world coordinates, so the mesh data coordinate system and the point cloud data coordinate system are the same. If these origin coordinates are not the same, they may be corrected so that they are.
- FIG. 15 shows a case where the three-dimensional spatial region for dividing the mesh data and the three-dimensional spatial region for dividing the point cloud data are the same. Specifically, the number of three-dimensional spatial regions, the size of the three-dimensional spatial regions, and the positions of the three-dimensional spatial regions are the same between the multiple three-dimensional spatial regions in the mesh data coordinate system and the multiple three-dimensional spatial regions in the point cloud coordinate system.
- the origin of the bounding box representing the three-dimensional space (black triangle mark) and the maximum value point of the bounding box (black square mark) are at the same positions in the coordinate system of the mesh data and in the coordinate system of the point cloud data; that is, the bounding box is the same in both coordinate systems.
- the number of three-dimensional spatial regions, the size of the three-dimensional spatial regions, and the positions of the three-dimensional spatial regions do not have to be completely identical between the multiple three-dimensional spatial regions in the mesh data coordinate system and the multiple three-dimensional spatial regions in the point cloud coordinate system.
- the three-dimensional spatial regions in the mesh data coordinate system may be divided into larger regions, and the three-dimensional spatial regions in the point cloud data coordinate system may be regions into which the three-dimensional spatial regions in the mesh data coordinate system are further divided.
- in this case, the unit formed by combining multiple three-dimensional spatial regions in the point cloud data coordinate system coincides with one three-dimensional spatial region in the mesh data coordinate system.
- Figure 17 is a diagram for explaining the relationship between three-dimensional space and encoded data.
- the three-dimensional data includes, for example, point cloud data, mesh data, and three-dimensional models.
- the encoding device encodes each of the three pieces of divided three-dimensional data, attaches a header to each, and creates data units.
- the header signals (assigns) the identifier of the space to which the encoded data of the data unit belongs (Space_ID), and the identifier of the data unit (DataUnit_ID).
- each data unit is further given a header that includes the identifier or the length information of the data unit, and is then unitized to generate an encoding method unit.
- Fig. 18 is a diagram showing an example of the syntax of the coding method unit.
- Fig. 19 is a diagram showing an example of the syntax of the encoded point cloud.
- Fig. 20 is a diagram showing an example of the syntax of the encoded mesh.
- Fig. 21 is a diagram showing an example of the syntax of the encoded 3D model.
- unit_type specifies the type of data unit stored in the encoding method unit.
- Length indicates the length of the data unit.
- data() indicates the body of the data unit.
- for the encoded point cloud, unit_type 0 indicates that the data unit is position information (geometry) of the encoded point cloud, 1 indicates that it is attribute information of the encoded point cloud, and 2 indicates that it is metadata of the encoded point cloud.
- for the encoded mesh, unit_type 0 indicates that the data unit is position information (geometry) of the encoded mesh, 1 indicates that it is attribute information of the encoded mesh, and 2 indicates that it is metadata of the encoded mesh.
- for the encoded 3D model, unit_type 0 indicates that the data unit is element 1 of the encoded 3D model, 1 indicates that it is element 2 of the encoded 3D model, and 2 indicates that it is metadata of the encoded 3D model.
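The (unit_type, length, data()) layout of an encoding method unit can be illustrated with a minimal serialize/parse sketch. The field widths (1-byte unit_type, 4-byte big-endian length) and the names CodingMethodUnit, parse_units, and describe are hypothetical assumptions for illustration, not the syntax defined in Figs. 18 to 21; the unit_type meanings follow the lists above.

```python
import struct
from dataclasses import dataclass

# unit_type meanings per codec, as listed in the text (Figs. 19 to 21).
UNIT_TYPES = {
    "point_cloud_codec_unit": {0: "geometry", 1: "attribute", 2: "metadata"},
    "mesh_codec_unit": {0: "geometry", 1: "attribute", 2: "metadata"},
    "model_codec_unit": {0: "element 1", 1: "element 2", 2: "metadata"},
}

# Assumed header layout: 1-byte unit_type followed by a 4-byte big-endian length.
HEADER = struct.Struct(">BI")


@dataclass
class CodingMethodUnit:
    unit_type: int
    data: bytes  # data(): the body of the data unit

    def pack(self) -> bytes:
        return HEADER.pack(self.unit_type, len(self.data)) + self.data


def parse_units(stream: bytes) -> list[CodingMethodUnit]:
    """Split a byte stream into consecutive (unit_type, length, data()) units."""
    units, pos = [], 0
    while pos < len(stream):
        unit_type, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        data = stream[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated data unit")
        units.append(CodingMethodUnit(unit_type, data))
        pos += length
    return units


def describe(codec: str, unit: CodingMethodUnit) -> str:
    """Interpret unit_type in the context of the enclosing codec unit type."""
    return UNIT_TYPES[codec].get(unit.unit_type, "unknown")
```

Because the same unit_type value means different things per codec, the sketch interprets it only together with the codec unit type (point_cloud_codec_unit, mesh_codec_unit, or model_codec_unit).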
- the syntax shown in Figures 19 to 21 is an example, and the configuration is not limited to it. Only part of the syntax may be used, types (categories) not described above may be added, or the order of the syntax elements may be changed.
- the encoding method unit may have a syntax configuration common to multiple encoding methods, as shown in Figure 18, carrying the unit_type, length, and data() shown in Figures 19 to 21.
- a header indicating the type of the encoding method unit may be added to the encoding method unit.
- the encoding method unit types include, for example, point_cloud_codec_unit for point cloud data, mesh_codec_unit for mesh data, and model_codec_unit for three-dimensional model data. This makes it possible to handle multiple encoding methods in an integrated manner.
- FIG. 22 shows an example of the syntax of three-dimensional data information.
- when multiple encoding methods are stored in one format, the syntax indicates the number of three-dimensional data included in the format (number_of_3Dformat) and the type of each three-dimensional data (format_type), and the data of each format may be stored. This makes it possible to handle multiple encoding methods or three-dimensional data in an integrated manner, and also to identify each of them.
- 3Ddata_info indicates the format structure information for storing multiple three-dimensional data.
- number_of_3Dformat indicates the number of 3D formats used.
- format_type indicates the format of the three-dimensional data to be stored. For example, format_type values may be defined as follows: 0 indicates point cloud data (point cloud), 1 indicates mesh data (mesh), 2 indicates G-PCC data (g-pcc), 3 indicates V-DMC data (v-dmc), and 4 indicates three-dimensional model data (3Dmodel).
- FIG. 23 is a diagram for explaining the data structure of an encoded point cloud.
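A minimal sketch of 3Ddata_info follows, assuming each field (number_of_3Dformat and every format_type) is a single unsigned byte; the actual field widths are not given in the text, and the function names are hypothetical. The enumeration values are the example format_type values listed above.

```python
from enum import IntEnum


class FormatType(IntEnum):
    # Example format_type values from the text.
    POINT_CLOUD = 0
    MESH = 1
    G_PCC = 2
    V_DMC = 3
    MODEL_3D = 4


def build_3d_data_info(formats: list[FormatType]) -> bytes:
    """Serialize number_of_3Dformat followed by one format_type per stored format.

    Assumption: every field is one unsigned byte.
    """
    return bytes([len(formats), *[int(f) for f in formats]])


def parse_3d_data_info(buf: bytes) -> list[FormatType]:
    count = buf[0]  # number_of_3Dformat
    if len(buf) < 1 + count:
        raise ValueError("3Ddata_info shorter than number_of_3Dformat implies")
    return [FormatType(b) for b in buf[1:1 + count]]
```

A decoder reading this header learns up front which formats follow, which is what lets it identify and dispatch multiple three-dimensional data types from one container.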
- FIG. 24 is a diagram for explaining the data structure of an encoded mesh.
- FIG. 25 is a diagram for explaining the data structure of an encoded three-dimensional model.
- the encoding device divides each of the multiple types of three-dimensional data into multiple pieces, one for each of the multiple spatial regions, and encodes each divided piece to generate encoded data.
- each piece of encoded data is given a header that contains at least one of data_unit_id and space_id.
- data_unit_id is an identifier that identifies a data unit within the encoded data, and is unique within the encoded data. Furthermore, space_id indicates identification information for a spatial region. If data_unit_id or space_id is common to multiple pieces of three-dimensional data, the same value is indicated in the multiple pieces of three-dimensional data.
- the data, headers, and other data may be included in a bitstream structure such as a data unit or an encoding method unit, or may be stored in a specified file format, for example in ISOBMFF boxes.
- Fig. 26 is a diagram showing, in two dimensions, an example of multiple three-dimensional spaces.
- Fig. 27 is a diagram showing an example of a bounding box.
- Fig. 28 is a diagram showing an example of the syntax of three-dimensional space information.
- 3Dspace_info is information that indicates the divided three-dimensional space. 3Dspace_info can be used for partial decoding.
- number_of_space indicates the number of divided three-dimensional spaces.
- space_id indicates the identifier of the divided three-dimensional space.
- the three-dimensional spatial information includes bounding box information for defining the bounding box shown in FIG. 27.
- Bounding box information includes bounding_box_xyz and bounding_box_whd.
- bounding_box_xyz indicates the coordinates of the reference point of the bounding box. In the example of Figure 27, it is expressed as x, y, and z coordinate values (x0, y0, z0).
- bounding_box_whd indicates the size of the bounding box. In the example of Figure 27, it is expressed as width w, height h, and depth d (w0, h0, d0).
- the three-dimensional spatial information may also include an identifier of a data unit for each piece of encoded data. However, the three-dimensional spatial information does not have to include the identifier. In other words, the identifier does not have to be signaled.
- pointcloud_id indicates the identifier of the data unit of the encoded point cloud for the space corresponding to space_id.
- mesh_id indicates the identifier of the data unit of the encoded mesh for the space corresponding to space_id.
- model_id indicates the identifier of a data unit of the encoded 3D model of the space corresponding to space_id.
- the identifier of the data unit for each piece of encoded data may be stored in the information indicating each space of the three-dimensional spatial information. This allows the three-dimensional spatial information to be associated with the divided three-dimensional encoded data.
- the three-dimensional spatial information may be associated with an identifier for the data unit for each encoded data by the space_id. In this case, the identifier for the data unit for each encoded data does not need to be stored.
- the division method, the origin of each divided space, and the bounding box size may be the same for the mesh data and the point cloud data, so that the three-dimensional spatial information of the point cloud data and the three-dimensional spatial information of the mesh data can be made common. Also, the same three-dimensional spatial information may be used for the point cloud data and the mesh data. In this way, the three-dimensional spatial information may be common or the same three-dimensional spatial information may be used between multiple different types of three-dimensional data. By commonizing the three-dimensional spatial information, it becomes easy to switch between different types of three-dimensional data (for example, switching the presentation or switching the transmission).
- three-dimensional spatial information does not need to be provided for each piece of three-dimensional data; one piece of three-dimensional spatial information can be shared by all pieces of three-dimensional data, so the amount of data of the three-dimensional spatial information can be reduced.
- the three-dimensional spatial information of the three-dimensional model may be synchronized with other types of three-dimensional data, or may be shared with the three-dimensional spatial information of other types of three-dimensional data.
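The structure of 3Dspace_info, and the sharing of a single instance between point cloud data and mesh data, can be sketched as below. The field names follow the text (space_id, bounding_box_xyz, bounding_box_whd, pointcloud_id, mesh_id, model_id); the regular-grid division and the helper name build_grid_space_info are hypothetical assumptions, since the text does not fix a division method.

```python
from dataclasses import dataclass
from typing import Optional

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class SpaceEntry:
    """One entry of a 3Dspace_info loop (field names follow the text)."""
    space_id: int
    bounding_box_xyz: Vec3  # reference point (x0, y0, z0)
    bounding_box_whd: Vec3  # size: width, height, depth (w0, h0, d0)
    # Optional data unit identifiers; None when data units are linked to
    # the space through space_id alone and the identifiers are not signaled.
    pointcloud_id: Optional[int] = None
    mesh_id: Optional[int] = None
    model_id: Optional[int] = None


def build_grid_space_info(origin: Vec3, cell: Vec3,
                          counts: tuple[int, int, int]) -> list[SpaceEntry]:
    """Divide a region into a regular grid of spaces (an assumed division method)."""
    nx, ny, nz = counts
    entries = []
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                xyz = (origin[0] + i * cell[0],
                       origin[1] + j * cell[1],
                       origin[2] + k * cell[2])
                entries.append(SpaceEntry(len(entries), xyz, cell))
    return entries


# One 3Dspace_info shared by both data types: each codec indexes its data
# units by the same space_id values.
shared_space_info = build_grid_space_info((0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (2, 1, 1))
point_cloud_units = {e.space_id: f"pc-unit-{e.space_id}" for e in shared_space_info}
mesh_units = {e.space_id: f"mesh-unit-{e.space_id}" for e in shared_space_info}
```

Because both unit tables are keyed by the same space_id values, switching the presented data type for a region is a lookup in a different table rather than a new spatial query.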
- Fig. 29 is a flowchart showing an example of partial decoding.
- Fig. 30 is a diagram showing an example of a three-dimensional spatial region that is the subject of partial decoding.
- Fig. 31 is a diagram showing an example of the data structure of an encoded point cloud to be partially decoded.
- Fig. 32 is a diagram showing an example of the data structure of an encoded mesh to be partially decoded.
- Fig. 33 is a diagram showing an example of the data structure of an encoded three-dimensional model to be partially decoded.
- the decoding device first determines the three-dimensional spatial region to be subjected to partial decoding (S1001).
- the decoding device uses the three-dimensional space information (3Dspace_info) to identify an area that overlaps with the target three-dimensional space area from the bounding box information of multiple three-dimensional space areas, and obtains the space_id corresponding to the identified area (S1002).
- the decoding device obtains and decodes the data unit having the obtained space_id from the encoded data (S1003). As a result, the decoding device performs partial decoding, which decodes a portion of the three-dimensional data. In partial decoding, the decoding device decodes only a portion of the three-dimensional data, without decoding all of the three-dimensional data.
- for example, the space_id of the three-dimensional space to be obtained is determined to be #2 from the three-dimensional space information.
- the decoding device may also obtain a data unit ID instead of a space_id from the three-dimensional spatial information, and obtain and partially decode a data unit having the obtained data unit ID.
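Steps S1001 to S1003 can be sketched as an axis-aligned bounding-box overlap test followed by a filter on space_id. Boxes are given as (bounding_box_xyz, bounding_box_whd) pairs. The names overlaps, DataUnit, and partial_decode are hypothetical, the rule that boxes touching only at a face do not overlap is an assumption, and the decode callable stands in for an actual point cloud, mesh, or model decoder.

```python
from dataclasses import dataclass
from typing import Callable

Vec3 = tuple[float, float, float]
Box = tuple[Vec3, Vec3]  # (bounding_box_xyz, bounding_box_whd)


def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned overlap test; boxes that only touch at a face do not count."""
    (ax, aw), (bx, bw) = a, b
    return all(ax[i] < bx[i] + bw[i] and bx[i] < ax[i] + aw[i] for i in range(3))


@dataclass
class DataUnit:
    space_id: int
    data_unit_id: int
    payload: bytes


def partial_decode(space_info: dict[int, Box], target: Box,
                   units: list[DataUnit], decode: Callable[[bytes], object]) -> list:
    # S1002: find the space_ids whose bounding boxes overlap the target region.
    wanted = {sid for sid, box in space_info.items() if overlaps(box, target)}
    # S1003: obtain and decode only the data units carrying those space_ids.
    return [decode(u.payload) for u in units if u.space_id in wanted]


space_info = {1: ((0, 0, 0), (10, 10, 10)),
              2: ((10, 0, 0), (10, 10, 10)),
              3: ((20, 0, 0), (10, 10, 10))}
target = ((12, 2, 2), (3, 3, 3))  # S1001: region chosen for partial decoding
units = [DataUnit(1, 10, b"a"), DataUnit(2, 11, b"b"), DataUnit(3, 12, b"c")]
```

With these values only space #2 overlaps the target, so only its data unit is decoded. Filtering on data_unit_id instead of space_id, as the text allows, would change only the membership test.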
- Figure 34 is a diagram showing an example of a coordinate system of different types of three-dimensional data where spatial synchronization is not achieved.
- Figure 35 is a diagram showing an example of the syntax of three-dimensional data information.
- Figure 36 is a diagram showing an example of the syntax of three-dimensional space information.
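- when the origin coordinates of the two coordinate systems differ, for example (x1, y1, z1) for the mesh data and (x2, y2, z2) for the point cloud data as in FIG. 34, the encoding device may align them by calculating the relative coordinate values (x1-x2, y1-y2, z1-z2) and correcting the coordinates with them.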
- the encoding device may notify (transmit) the calculated relative coordinate values to the decoding device (terminal) as synchronization information.
- the synchronization information may be indicated by the relative position of the point cloud data position (origin) relative to the mesh data position (origin), or may be indicated by the relative position of the mesh data position relative to the point cloud data position.
- the synchronization information may be indicated by the relative position of the positions (origins) of different types of three-dimensional data. When there are three or more types of three-dimensional data, the relative positions are calculated with respect to any one type of three-dimensional data taken as the reference.
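The relative-position form of the synchronization information can be sketched as follows: one data type is chosen as the reference, each type's origin offset is its origin minus the reference origin, and a point is moved into the reference frame by adding that offset. The function names and the dictionary representation are hypothetical assumptions for illustration.

```python
Vec3 = tuple[float, float, float]


def relative_offsets(origins: dict[str, Vec3], reference: str) -> dict[str, Vec3]:
    """Offset of each data type's origin relative to the reference type's origin.

    The reference type itself gets (0, 0, 0), so with three or more types
    every offset is expressed against the same single reference.
    """
    rx, ry, rz = origins[reference]
    return {name: (x - rx, y - ry, z - rz) for name, (x, y, z) in origins.items()}


def to_reference_frame(point: Vec3, offset: Vec3) -> Vec3:
    """Move a point from its data type's local frame into the reference frame."""
    return tuple(p + o for p, o in zip(point, offset))


# Mesh origin (x1, y1, z1) and point cloud origin (x2, y2, z2) in world coordinates.
origins = {"mesh": (5.0, 0.0, 0.0), "point_cloud": (2.0, 1.0, 0.0)}
offsets = relative_offsets(origins, reference="point_cloud")
```

With the point cloud as the reference, the mesh offset is (x1-x2, y1-y2, z1-z2) = (3.0, -1.0, 0.0), the relative coordinate value given in the text. Choosing the mesh as the reference instead simply negates it.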
- the three-dimensional data information may include space_sync_information, which indicates synchronization information, as shown in FIG. 35.
- the space_sync_information indicates synchronization information for three-dimensional space, and indicates, for example, the amount of deviation in three-dimensional space (the difference between the reference coordinates and the current coordinates, i.e., a relative value).
- synchronization information (space_sync_information) may be stored for each of the multiple formats.
- the data format that serves as the reference for synchronization may be placed first in the loop, and from the second iteration onwards, synchronization information (relative position information) with respect to that first format may be stored.
- the three-dimensional space information may also include space_sync_information, which indicates synchronization information, as shown in FIG. 36. If the spatial position is shifted for each three-dimensional space, the synchronization information may be stored in a loop for each three-dimensional space.
- Figure 37 shows an example of the functional configuration of a terminal.
- the terminal 1120 includes a decoding unit 1121 and a synchronous presentation unit 1122.
- the decoding unit 1121 decodes the synchronization information based on the three-dimensional data information or the three-dimensional space information.
- the synchronization presentation unit 1122 aligns and presents the three-dimensional data based on the synchronization information.
- Figure 38 is a flowchart showing an example of spatial synchronization processing.
- the system including the encoding device (server) and the decoding device (terminal) determines whether spatial synchronization is required between the point cloud data and the mesh data (S1011).
- if the system determines that spatial synchronization is necessary (Yes in S1011), it executes step S1012; if it determines that spatial synchronization is not necessary (No in S1011), it executes step S1013.
- in step S1012, the system determines whether spatial synchronization has been achieved between the point cloud data and the mesh data.
- if the system determines that spatial synchronization has been achieved (Yes in S1012), it executes step S1014; if it determines that spatial synchronization has not been achieved (No in S1012), it executes step S1015.
- in step S1013, the system presents the point cloud data and the mesh data without spatial synchronization.
- in step S1014, the system presents the point cloud data and the mesh data in spatial synchronization based on the synchronization information.
- in step S1015, the system presents the point cloud data and the mesh data as is.
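The branching in FIG. 38 reduces to two boolean decisions. This sketch maps them onto the three presentation steps; the names Presentation and choose_presentation are hypothetical, and how the two booleans are obtained (for example, requiring synchronization for measurement use) is left to the caller.

```python
from enum import Enum


class Presentation(Enum):
    UNSYNCHRONIZED = "S1013"  # present without spatial synchronization
    SYNCHRONIZED = "S1014"    # present aligned using the synchronization information
    AS_IS = "S1015"           # present the data as is


def choose_presentation(sync_required: bool, sync_achieved: bool) -> Presentation:
    """Follow steps S1011 and S1012 of the flowchart."""
    if not sync_required:              # S1011: No
        return Presentation.UNSYNCHRONIZED
    if sync_achieved:                  # S1012: Yes
        return Presentation.SYNCHRONIZED
    return Presentation.AS_IS          # S1012: No
```

Keeping the decision separate from the rendering step reflects the text's split: S1011 and S1012 may run on either the encoding device or the decoding device, while S1013 to S1015 run on the decoding device.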
- steps S1011 and S1012 may be performed by an encoding device or a decoding device.
- the steps S1013 to S1015 may be performed by a decoding device.
- Whether or not spatial synchronization is required may be switched depending on the application or use. For example, when using three-dimensional data for measurement purposes, accurate positioning is required, so it may be determined that spatial synchronization is required.
- a level of alignment may be specified, in which case the system (encoding device or decoding device) may change the accuracy of synchronization based on the level of alignment.
- time synchronization may be performed to synchronize the presentation time, the decoding time, or the acquisition time. At least one of spatial synchronization and time synchronization may be performed.
- parameters such as the color matrix, color bit depth, and HDR settings may be set to the same values for the multiple three-dimensional data.
- attribute information may be synchronized between multiple 3D data.
- switching of three-dimensional data has been described using different types of three-dimensional data such as point cloud data and mesh data as an example, but the multiple three-dimensional data to be switched is not limited to different types of three-dimensional data.
- the multiple three-dimensional data may be, for example, three-dimensional data with different resolutions, multiple point cloud data with different numbers of points, or multiple mesh data with different numbers of points or faces.
- the multiple three-dimensional data to be switched may be three or more pieces of three-dimensional data.
- the multiple pieces of three-dimensional data to be switched may be multiple pieces of point cloud data acquired at different times.
- the multiple pieces of three-dimensional data to be switched may include point cloud data before construction, point cloud data after construction, point cloud data 10 years later, modeled mesh data, etc., at a construction site.
- three-dimensional model data such as NeRF may be used.
- Three-dimensional model data is a model for presenting three-dimensional data, and may or may not be encoded. Multiple three-dimensional model data for the same space may be switched, or multiple three-dimensional model data may be switched.
- the spatial synchronization method described above can be used.
- by adding an identifier for three-dimensional model data to a data format that handles point cloud data and mesh data in an integrated manner, it becomes possible to handle three-dimensional data and three-dimensional model data in an integrated manner.
- the spatial information (origin, bounding box, division method, etc.) may likewise be made common for three-dimensional model data.
- the three-dimensional data to be processed may be switched in the following order: mesh data lower than the first resolution, mesh data higher than the first resolution, point cloud data lower than the second resolution, and point cloud data higher than the second resolution.
- the server may store data in a state in which multiple three-dimensional models have been encoded in advance, and extract the three-dimensional data corresponding to the request based on a request from the terminal.
- the server may encode the three-dimensional data corresponding to the request at the timing when the request is made from the terminal.
- the terminal may request the three-dimensional data to be processed in advance, and download the requested three-dimensional data from the server. This can reduce the time it takes for the terminal to present the data.
- the terminal can present divided data corresponding to the avatar's torso as a mesh and divided data corresponding to the face as a point cloud, enabling a more accurate presentation.
- point cloud data, mesh data, and three-dimensional model data have been given as examples of three-dimensional data representing a three-dimensional object, but this is not limiting.
- a three-dimensional object may be represented by multiple sets, each of which includes line-of-sight information indicating a line of sight and a two-dimensional image of the three-dimensional object as viewed from that line of sight.
- data including the multiple sets may be treated as a type of three-dimensional data.
- the three-dimensional data may be data in another format, such as Gaussian splatting data.
- FIG. 39 is a diagram showing an example of the configuration of a decoding device.
- FIG. 40 is a flowchart showing an example of a decoding method using the decoding device.
- the decoding device 1130 includes a circuit 1131 and a memory 1132 connected to the circuit 1131.
- Circuit 1131 performs the following operations:
- the circuit 1131 acquires encoded data that includes encoding method information (format) indicating one of multiple encoding methods, covering first data representing a three-dimensional object and second data representing the same three-dimensional object, together with identification information indicating the three-dimensional space that includes the three-dimensional object (S1021).
- the circuit 1131 decodes the first data and the second data corresponding to the three-dimensional space based on the encoded data (S1022).
- the circuit 1131 renders the first data to generate first presentation data for presentation (S1023).
- the circuit 1131 renders the second data to generate second presentation data for presentation (S1024).
- the circuit 1131 switches from the generated second presentation data to the first presentation data and presents it (S1025).
- the first presentation data and the second presentation data are, for example, two-dimensional data or three-dimensional data generated by the rendering reconstruction unit 1034.
- the first presentation data and the second presentation data are generated based on the first data and the second data corresponding to the three-dimensional space, and the second presentation data is switched to the first presentation data for presentation, so that the presentation can be performed without causing any spatial misalignment when switching between the two data representing the three-dimensional object. Therefore, the first presentation data and the second presentation data can be presented appropriately.
- the first data is point cloud data representing the three-dimensional object.
- the second presentation data is switched to the first presentation data, which is based on the point cloud data, so the two pieces of data representing the three-dimensional object can be switched and presented without causing any spatial misalignment.
- the second data is mesh data representing the three-dimensional object.
- the second presentation data based on mesh data is switched to the first presentation data, so that the two pieces of data representing the three-dimensional object can be switched and presented without causing any spatial misalignment.
- the second data is three-dimensional model data representing the three-dimensional object.
- the three-dimensional model data represents a machine learning model obtained by machine learning on a plurality of sets, each consisting of a line of sight and a two-dimensional image.
- the second presentation data based on the three-dimensional model data is switched to the first presentation data, so that the two pieces of data representing the three-dimensional object can be switched and presented without causing any spatial misalignment.
- the second data is a two-dimensional image of the three-dimensional object when viewed from a specific viewing direction.
- the second presentation data based on a two-dimensional image is switched to the first presentation data, so that the two pieces of data representing a three-dimensional object can be switched and presented without causing any spatial misalignment.
- the circuit further receives a presentation data switching request from the user.
- the circuit switches from the second presentation data to the first presentation data in response to the switching request.
- the circuit further receives an operation from a user to change the mode of presentation.
- the circuit changes the mode of presentation in response to the operation, and switches from the second presentation data to the first presentation data in response to the change.
- the circuit acquires the encoded data from an encoding device via a communication network.
- the circuit switches from the second presentation data to the first presentation data in accordance with the bandwidth of the communication network.
- switching can be performed according to the bandwidth of the communication network; for example, when the bandwidth of the communication network changes from less than a specified bandwidth to equal to or greater than the specified bandwidth, the second presentation data can be switched to the first presentation data for presentation.
- the circuit switches from the second presentation data to the first presentation data in the presentation depending on the available capabilities of the circuit.
- switching can be performed according to the available capacity of the circuit. For example, when the available capacity of the circuit changes from less than a specified capacity to equal to or greater than the specified capacity, the second presentation data can be switched to the first presentation data for presentation.
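Threshold-based switching on network bandwidth or available circuit capacity can be sketched as a ladder of representations, using the example order given earlier (lower-resolution mesh, higher-resolution mesh, lower-resolution point cloud, higher-resolution point cloud). The numeric "required budget" values and the names LADDER, select, and maybe_switch are hypothetical placeholders; the text specifies only that switching happens when the bandwidth or capacity crosses a specified threshold.

```python
# Presentation candidates from least to most demanding, following the example
# order in the text. The required budgets (bandwidth or capacity units) are
# hypothetical placeholders.
LADDER = [
    ("mesh, low resolution", 5.0),
    ("mesh, high resolution", 20.0),
    ("point cloud, low resolution", 50.0),
    ("point cloud, high resolution", 200.0),
]


def select(budget: float) -> str:
    """Return the most demanding candidate whose required budget fits."""
    chosen = LADDER[0][0]  # fall back to the cheapest representation
    for name, required in LADDER:
        if budget >= required:
            chosen = name
    return chosen


def maybe_switch(current: str, budget: float) -> tuple[str, bool]:
    """Switch only when the budget crosses a threshold that changes the selection."""
    target = select(budget)
    return target, target != current
```

For instance, when the measured budget rises from 25 to 60 while high-resolution mesh is shown, maybe_switch reports a change to the low-resolution point cloud, which corresponds to switching from the second presentation data to the first.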
- the encoded data includes synchronization information for synchronizing a coordinate system of the first data and a coordinate system of the second data.
- the circuit presents the first presentation data and the second presentation data based on the synchronization information.
- the second presentation data can be switched to the first presentation data after the coordinate systems of the first data and the second data have been aligned. This makes it possible to switch and present two pieces of data representing a three-dimensional object while minimizing spatial misalignment.
- the circuit further determines whether or not to synchronize the coordinate system of the first data with the coordinate system of the second data. If the circuit determines that the coordinate system of the first data is to be synchronized with the coordinate system of the second data, the circuit presents the first presentation data and the second presentation data based on the synchronization information during the presentation.
- synchronization processing can be performed when necessary, and can be skipped when it is not necessary. This has the potential to reduce the processing load.
- the first data and the second data each have a common configuration.
- the amount of encoded data can be reduced, and communication capacity can be reduced.
- the encoded data includes spatial information for identifying the three-dimensional space in which the three-dimensional object is included.
- the circuit further obtains a target area indicating a partial area of the three-dimensional space.
- the circuit identifies first duplicate data that is a part of the first data and that overlaps with the target area based on the spatial information. In the decoding, the circuit decodes the identified first duplicate data.
- the amount of data to be acquired can be reduced by acquiring only the first duplicate data, which reduces the required communication capacity. Also, for example, only the first duplicate data needs to be decoded, which reduces the processing load.
- the circuit 1131 may also operate according to the decoding method shown in the flowchart of FIG. 41.
- FIG. 41 is a flowchart showing another example of a decoding method performed by a decoding device.
- the circuit 1131 decodes encoding method information indicating a second encoding method for data representing the three-dimensional object, the second encoding method being different from the first encoding method of the first data (S1031).
- the circuit 1131 decodes second data in the second encoding method indicated by the encoding method information (S1032).
- the second data is used to generate second presentation data for presentation.
- the second data of the second encoding method indicated by the encoding method information obtained by decoding is decoded, so that the second data can be obtained for generating second presentation data for appropriate presentation.
- FIG. 42 is a diagram showing an example of the configuration of an encoding device.
- FIG. 43 is a flowchart showing an example of an encoding method performed by the encoding device.
- the encoding device 1140 includes a circuit 1141 and a memory 1142 connected to the circuit 1141.
- Circuit 1141 performs the following operations:
- the circuit 1141 generates encoding method information indicating a second encoding method for data representing the three-dimensional object, the second encoding method being different from the first encoding method of the first data (S1041).
- the circuit 1141 generates second data in the second encoding method indicated by the encoding method information (S1042).
- the circuit 1141 generates a bitstream that includes the encoding method information and the second data (S1043).
- the second data is used to generate second presentation data for presentation.
- since a bitstream including the encoding method information and the second data is generated, a decoding device that acquires the bitstream can obtain the second data for generating second presentation data suitable for presentation.
- each processing unit included in the encoding device, decoding device, server, terminal, etc. is typically realized as an LSI, which is an integrated circuit. These may be individually implemented as single chips, or may be integrated into a single chip that includes some or all of them.
- the integrated circuit is not limited to LSI, but may be realized by a dedicated circuit or a general-purpose processor. It is also possible to use an FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor that can reconfigure the connections and settings of the circuit cells inside the LSI.
- each component may be configured with dedicated hardware, or may be realized by executing a software program suitable for each component.
- Each component may be realized by a program execution unit such as a CPU or processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory.
- the present disclosure may also be realized as a decoding method executed by a decoding device, etc.
- the division of functional blocks in the block diagram is one example, and multiple functional blocks may be realized as one functional block, one functional block may be divided into multiple blocks, or some functions may be transferred to other functional blocks. Furthermore, the functions of multiple functional blocks having similar functions may be processed in parallel or in a time-shared manner by a single piece of hardware or software.
- This disclosure can be applied to a decoding device and a decoding method.
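The encoder and decoder steps above (generating a bitstream that carries encoding method information with the second data, decoding in the method that information indicates, and decoding only the part of the first data that overlaps a target area identified from spatial information) can be illustrated with a toy container. This is a minimal sketch under stated assumptions: the per-unit layout (a one-byte method ID, an axis-aligned bounding box standing in for the spatial information, and a length-prefixed payload), the `Method` IDs, and every helper name are hypothetical and are not the bitstream syntax of the disclosure. The "decoders" are placeholders that only label the payload.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Method(IntEnum):
    """Hypothetical IDs for the kinds of data named in the claims."""
    POINT_CLOUD = 0
    MESH = 1
    NEURAL_MODEL = 2
    IMAGE_2D = 3


# Spatial information modelled as an axis-aligned box: (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


@dataclass
class Unit:
    method: Method   # encoding method information
    bbox: Box        # spatial information of this unit
    payload: bytes   # coded data, left opaque here


# Per-unit header: 1-byte method ID, six float32 box coordinates, uint32 payload length.
_HEADER = struct.Struct("<B6fI")


def write_bitstream(units: List[Unit]) -> bytes:
    """Encoder side: emit method info, spatial info and payload for each unit."""
    out = bytearray()
    for u in units:
        out += _HEADER.pack(u.method, *u.bbox, len(u.payload))
        out += u.payload
    return bytes(out)


def read_bitstream(data: bytes) -> List[Unit]:
    """Decoder side: parse unit headers; payloads are not decoded yet."""
    units, pos = [], 0
    while pos < len(data):
        method, *box, size = _HEADER.unpack_from(data, pos)
        pos += _HEADER.size
        units.append(Unit(Method(method), tuple(box), data[pos:pos + size]))
        pos += size
    return units


def overlaps(a: Box, b: Box) -> bool:
    """True if two boxes intersect (shared faces count as overlap)."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


# Placeholder decoders keyed by method; a real system would call a point cloud,
# mesh, model or image decoder here.
DECODERS: Dict[Method, Callable[[bytes], str]] = {
    m: (lambda p, m=m: f"{m.name}:{p.decode()}") for m in Method
}


def decode_target_area(data: bytes, target: Box, method: Method) -> List[str]:
    """Decode only units of `method` whose spatial info overlaps `target`."""
    return [
        DECODERS[u.method](u.payload)
        for u in read_bitstream(data)
        if u.method == method and overlaps(u.bbox, target)
    ]


if __name__ == "__main__":
    stream = write_bitstream([
        Unit(Method.POINT_CLOUD, (0, 0, 0, 1, 1, 1), b"pc-a"),
        Unit(Method.POINT_CLOUD, (5, 5, 5, 6, 6, 6), b"pc-b"),
        Unit(Method.MESH, (0, 0, 0, 1, 1, 1), b"mesh-a"),
    ])
    print(decode_target_area(stream, (0.5, 0.5, 0.5, 2, 2, 2), Method.POINT_CLOUD))
    # ['POINT_CLOUD:pc-a']
```

Keeping the headers separate from the payloads lets the decoder skip non-overlapping units without touching their data, which is where the communication and processing savings described above would come from.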
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Processing Or Creating Images (AREA)
Description
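The configuration of the three-dimensional data encoding/decoding system according to this embodiment is described below. FIG. 1 is a diagram showing an example of the configuration of the three-dimensional data encoding/decoding system according to this embodiment. As shown in FIG. 1, the three-dimensional data encoding/decoding system includes three-dimensional data encoding system 1001, three-dimensional data decoding system 1002, sensor terminal 1003, and external connection unit 1004.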
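The above embodiment describes synchronizing multiple pieces of three-dimensional data using spatial synchronization as an example; alternatively, time synchronization that aligns the presentation time, decoding time, or acquisition time may be performed. At least one of spatial synchronization and time synchronization may be performed.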
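Spatial and time synchronization, as described above, can be pictured with a small sketch. It assumes, hypothetically, that the synchronization information is a row-major 4x4 homogeneous transform from the second data's coordinate system into the first data's, and that time synchronization pairs each frame with the frame whose timestamp is nearest. The text specifies neither the form of the synchronization information nor the pairing rule, and the function names are illustrative only.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical synchronization information: a row-major 4x4 homogeneous transform
# from the second data's coordinate system into the first data's.
Matrix4 = Sequence[Sequence[float]]


def to_first_coords(points: List[Point], m: Matrix4) -> List[Point]:
    """Spatial synchronization: express second-data points in first-data coordinates."""
    out = []
    for x, y, z in points:
        v = (x, y, z, 1.0)
        tx, ty, tz, w = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
        out.append((tx / w, ty / w, tz / w))
    return out


def pair_by_time(first_ts: List[int], second_ts: List[int]) -> List[Tuple[int, int]]:
    """Time synchronization: pair each first-data timestamp with the nearest
    second-data timestamp. `second_ts` must be non-empty and sorted ascending;
    ties go to the earlier timestamp."""
    pairs = []
    for t in first_ts:
        i = bisect_left(second_ts, t)
        neighbours = second_ts[max(i - 1, 0):i + 1]  # at most the two closest candidates
        pairs.append((t, min(neighbours, key=lambda s: abs(s - t))))
    return pairs
```

For example, with a transform that shifts x by 10, `to_first_coords([(1.0, 2.0, 3.0)], m)` yields `[(11.0, 2.0, 3.0)]`, and the same timestamp pairing could be applied to presentation, decoding, or acquisition times.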
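1001 Three-dimensional data encoding system
1002 Three-dimensional data decoding system
1003 Sensor terminal
1004 External connection unit
1011 Three-dimensional data generation system
1012 Presentation unit
1013 Encoding unit
1014 Multiplexing unit
1015 Input/output unit
1016 Control unit
1017 Sensor information acquisition unit
1018 Three-dimensional data generation unit
1021 Sensor information acquisition unit
1022 Input/output unit
1023 Demultiplexing unit
1024 Decoding unit
1025 Presentation unit
1026 User interface
1027 Control unit
1031 Three-dimensional model learning unit
1032 Three-dimensional model encoding unit
1033 Three-dimensional model decoding unit
1034 Rendering reconstruction unit
1041 Data division unit
1042 Encoding unit
1051 Decoding unit
1052 Data combination unit
1061 Terminal presentation screen
1061a Point cloud button
1061b Mesh button
1062 Terminal presentation screen
1063 Terminal presentation screen
1070 Server
1071 Data generation unit
1072 Point cloud generation unit
1073 Mesh generation unit
1074 Model generation unit
1075 Synchronization unit
1076 Point cloud encoding unit
1077 Mesh encoding unit
1078 Model encoding unit
1079 Multiplexing unit
1080 Data extraction unit
1090 Terminal
1101 Point cloud sensor
1102 Camera
1110 Data generation unit
1111 Point cloud generation unit
1112 Mesh generation unit
1113 Model generation unit
1120 Terminal
1121 Decoding unit
1122 Synchronized presentation unit
1130 Decoding device
1131 Circuit
1132 Memory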
Claims (16)
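1. A decoding device comprising:
a circuit; and
a memory connected to the circuit,
wherein, in operation, the circuit:
acquires encoded data including encoding method information and identification information, the encoding method information indicating one of encoding methods including first data representing a three-dimensional object and second data representing the three-dimensional object, the identification information indicating a three-dimensional space in which the three-dimensional object is included;
decodes, based on the encoded data, the first data and the second data corresponding to the three-dimensional space;
renders the first data to generate first presentation data for presentation;
renders the second data to generate second presentation data for presentation; and
switches presentation from the generated second presentation data to the first presentation data.
2. The decoding device according to claim 1, wherein the first data is point cloud data representing the three-dimensional object.
3. The decoding device according to claim 1, wherein the second data is mesh data representing the three-dimensional object.
4. The decoding device according to claim 1, wherein the second data is three-dimensional model data representing the three-dimensional object, and
the three-dimensional model data indicates a machine learning model obtained by machine learning on a plurality of sets each consisting of a line of sight and a two-dimensional image.
5. The decoding device according to claim 1, wherein the second data is a two-dimensional image of the three-dimensional object as viewed from a predetermined line-of-sight direction.
6. The decoding device according to any one of claims 1 to 5, wherein the circuit further:
acquires a request from a user to switch presentation data, and
in the presenting, switches presentation from the second presentation data to the first presentation data in response to the switching request.
7. The decoding device according to any one of claims 1 to 5, wherein the circuit further:
accepts an operation from a user for changing a mode of presentation, and
in the presenting, changes the mode of presentation in response to the operation and, in accordance with the change, switches presentation from the second presentation data to the first presentation data.
8. The decoding device according to any one of claims 1 to 5, wherein the circuit:
in the acquiring, acquires the encoded data from an encoding device via a communication network, and
in the presenting, switches presentation from the second presentation data to the first presentation data according to a bandwidth of the communication network.
9. The decoding device according to any one of claims 1 to 5, wherein the circuit, in the presenting, switches presentation from the second presentation data to the first presentation data according to an available capability of the circuit.
10. The decoding device according to any one of claims 1 to 5, wherein the encoded data includes synchronization information for synchronizing a coordinate system of the first data with a coordinate system of the second data, and
the circuit, in the presenting, presents the first presentation data and the second presentation data based on the synchronization information.
11. The decoding device according to claim 10, wherein the circuit further:
determines whether to synchronize the coordinate system of the first data with the coordinate system of the second data, and
when the circuit determines to synchronize the coordinate system of the first data with the coordinate system of the second data, the circuit, in the presenting, presents the first presentation data and the second presentation data based on the synchronization information.
12. The decoding device according to any one of claims 1 to 5, wherein the first data and the second data have a configuration common to both.
13. The decoding device according to any one of claims 1 to 5, wherein the encoded data includes spatial information for identifying the three-dimensional space in which the three-dimensional object is included, and
the circuit further:
acquires a target area indicating a partial region of the three-dimensional space;
identifies, based on the spatial information, first overlapping data that is part of the first data and overlaps the target area; and
in the decoding, decodes the identified first overlapping data.
14. A decoding method comprising:
acquiring encoded data including encoding method information and identification information, the encoding method information indicating one of encoding methods including first data representing a three-dimensional object and second data representing the three-dimensional object, the identification information indicating a three-dimensional space in which the three-dimensional object is included;
decoding the first data and the second data based on the encoded data;
rendering the first data to generate first presentation data for presentation;
rendering the second data to generate second presentation data for presentation; and
switching presentation from the generated second presentation data to the first presentation data.
15. A decoding device that decodes first data representing a three-dimensional object, the decoding device comprising:
a circuit; and
a memory connected to the circuit,
wherein, in operation, the circuit:
decodes encoding method information that represents the three-dimensional object and indicates a second encoding method different from a first encoding method of the first data; and
decodes second data of the second encoding method indicated by the encoding method information,
the second data being used to generate second presentation data for presentation.
16. An encoding device that encodes first data representing a three-dimensional object, the encoding device comprising:
a circuit; and
a memory connected to the circuit,
wherein, in operation, the circuit:
generates encoding method information that represents the three-dimensional object and indicates a second encoding method different from a first encoding method of the first data;
generates second data of the second encoding method indicated by the encoding method information; and
generates a bitstream including the encoding method information and the second data,
the second data being used to generate second presentation data for presentation.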
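Claims 1 and 6 to 9 recite switching presentation from the second presentation data to the first presentation data in response to a user switching request, a change in presentation mode, the communication network bandwidth, or the available circuit capability. The policy below is a hypothetical sketch of how such triggers might be combined. The threshold values, field names, the priority order, and the assumption that the first data is shown once bandwidth or capability reaches a threshold are all illustrative choices, not taken from the claims.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Presentation(Enum):
    FIRST = "first presentation data"    # e.g. rendered from point cloud data (claim 2)
    SECOND = "second presentation data"  # e.g. rendered from mesh, model or image data (claims 3 to 5)


@dataclass
class Conditions:
    user_switch_request: bool = False             # claim 6
    presentation_mode_changed: bool = False       # claim 7
    bandwidth_mbps: Optional[float] = None        # claim 8 (hypothetical unit)
    available_capability: Optional[float] = None  # claim 9 (hypothetical 0..1 score)


# Hypothetical requirements for presenting the first data; not from the source.
MIN_BANDWIDTH_MBPS = 50.0
MIN_CAPABILITY = 0.5


def choose_presentation(current: Presentation, c: Conditions) -> Presentation:
    """Return the presentation data to show next.

    Only the second-to-first switch recited in the claims is modelled;
    in every other case the current choice is kept.
    """
    if current is not Presentation.SECOND:
        return current
    if c.user_switch_request or c.presentation_mode_changed:
        return Presentation.FIRST
    if c.bandwidth_mbps is not None and c.bandwidth_mbps >= MIN_BANDWIDTH_MBPS:
        return Presentation.FIRST
    if c.available_capability is not None and c.available_capability >= MIN_CAPABILITY:
        return Presentation.FIRST
    return current
```

Explicit user actions are checked first here so that they are never overridden by measured conditions; a real terminal might weigh these triggers differently.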
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202480036898.XA CN121286014A (zh) | 2023-06-12 | 2024-06-12 | Decoding device, decoding method, and encoding device |
| EP24823399.1A EP4727133A1 (en) | 2023-06-12 | 2024-06-12 | Decoding device, decoding method, and encoding device |
| JP2025527958A JPWO2024257784A1 (ja) | 2023-06-12 | 2024-06-12 | |
| US19/404,294 US20260087678A1 (en) | 2023-06-12 | 2025-12-01 | Decoding device, decoding method, and encoding device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363472386P | 2023-06-12 | 2023-06-12 | |
| US63/472,386 | 2023-06-12 | | |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/404,294 Continuation US20260087678A1 (en) | 2023-06-12 | 2025-12-01 | Decoding device, decoding method, and encoding device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024257784A1 (ja) | 2024-12-19 |
Family ID: 93852252
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2024/021291 Ceased WO2024257784A1 (ja) | 2023-06-12 | 2024-06-12 | Decoding device, decoding method, and encoding device |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20260087678A1 (ja) |
| EP (1) | EP4727133A1 (ja) |
| JP (1) | JPWO2024257784A1 (ja) |
| CN (1) | CN121286014A (ja) |
| WO (1) | WO2024257784A1 (ja) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014020663A1 (ja) | 2012-07-30 | 2014-02-06 | Mitsubishi Electric Corporation | Map display device |
| JP2016513437A (ja) * | 2013-02-28 | 2016-05-12 | LG Electronics Inc. | Signal transmission/reception device and signal transmission/reception method |
2024
- 2024-06-12 EP: EP24823399.1A, published as EP4727133A1 (Active, Pending)
- 2024-06-12 CN: CN202480036898.XA, published as CN121286014A (Active, Pending)
- 2024-06-12 JP: JP2025527958A, published as JPWO2024257784A1 (Active, Pending)
- 2024-06-12 WO: PCT/JP2024/021291, published as WO2024257784A1 (Not active, Ceased)
2025
- 2025-12-01 US: US19/404,294, published as US20260087678A1 (Active, Pending)
Non-Patent Citations (1)
| Title |
|---|
| AOKI, SHUICHI; YUJI, OHKAWA; YOSHIRO, TAKIGUCHI: "Development of rendering system for immersive media for presentation on various types of devices", IPSJ SIG TECHNICAL REPORT (CSEC), vol. 2023-AVM-120, no. 14, 21 February 2023 (2023-02-21), pages 1 - 6, XP009559756, ISSN: 2188-8655 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN121286014A (zh) | 2026-01-06 |
| US20260087678A1 (en) | 2026-03-26 |
| EP4727133A1 (en) | 2026-04-15 |
| JPWO2024257784A1 (ja) | 2024-12-19 |
Similar Documents
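| Publication | Title |
|---|---|
| EP4290868B1 (en) | 3D object streaming method, device, and program |
| CN110419224B (zh) | Method for consuming video content, electronic device, and server |
| US20210006806A1 (en) | An apparatus, a method and a computer program for volumetric video |
| JP7771318B2 (ja) | Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device |
| CN116233493B (zh) | Data processing method, apparatus, and device for immersive media, and readable storage medium |
| KR102499904B1 (ko) | Methods and systems for generating a virtualized projection of a customized view of a real-world scene for inclusion within virtual reality media content |
| JP7656134B2 (ja) | Three-dimensional data processing method and three-dimensional data processing device |
| JP2025031761A (ja) | Three-dimensional data storage method, three-dimensional data acquisition method, three-dimensional data storage device, and three-dimensional data acquisition device |
| US20230206575A1 (en) | Rendering a virtual object in spatial alignment with a pose of an electronic device |
| CN114095737B (zh) | Media file encapsulation and decapsulation method, apparatus, device, and storage medium |
| US20260101063A1 (en) | Encoding device, decoding device, encoding method, and decoding method |
| WO2024257784A1 (ja) | Decoding device, decoding method, and encoding device |
| WO2024257786A1 (ja) | Decoding device, decoding method, encoding device, encoding method, and device |
| WO2025079587A1 (ja) | Encoding device, decoding device, encoding method, and decoding method |
| WO2025079598A1 (ja) | Encoding device, decoding device, encoding method, and decoding method |
| CN115481280B (zh) | Data processing method, apparatus, and device for volumetric video, and readable storage medium |
| WO2025079588A1 (ja) | Encoding device, decoding device, encoding method, and decoding method |
| JP2024165003A (ja) | Scene description editing device and program |
| HK40064620A (en) | Data processing method, apparatus, device and readable storage medium for immersive media |
| CN121151590A (zh) | Media file encapsulation and decapsulation method, apparatus, device, and storage medium |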
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24823399; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2025527958; Country of ref document: JP; Kind code of ref document: A |
| WWE | WIPO information: entry into national phase | Ref document number: 2025527958; Country of ref document: JP |
| ENP | Entry into the national phase | Ref document number: 2024823399; Country of ref document: EP; Effective date: 20260112 |
| WWE | WIPO information: entry into national phase | Ref document number: 2024823399; Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWP | WIPO information: published in national office | Ref document number: 2024823399; Country of ref document: EP |