WO2023003349A1 - Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
- Publication number
- WO2023003349A1 (PCT/KR2022/010606)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- point cloud
- cloud data
- data
- point
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g 3D video
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional [3D] objects
- G06V20/647—Three-dimensional [3D] objects by matching two-dimensional images to three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
Definitions
- Embodiments relate to a method and apparatus for processing point cloud content.
- Point cloud content is content expressed as a point cloud, which is a set of points belonging to a coordinate system representing a 3D space.
- Point cloud content can express three-dimensional media and is used to provide various services such as VR (Virtual Reality), AR (Augmented Reality), MR (Mixed Reality), and autonomous driving services. However, tens of thousands to hundreds of thousands of points are required to express point cloud content. Therefore, a method for efficiently processing a vast amount of point data is required.
- Embodiments provide an apparatus and method for efficiently processing point cloud data.
- Embodiments provide a point cloud data processing method and apparatus for solving latency and encoding/decoding complexity.
- A point cloud data transmission method according to embodiments may include encoding point cloud data and transmitting a bitstream including the point cloud data. A point cloud data reception method according to embodiments may include receiving a bitstream including point cloud data and decoding the point cloud data.
- Apparatus and method according to embodiments may process point cloud data with high efficiency.
- Devices and methods according to embodiments may provide a point cloud service of high quality.
- Devices and methods according to embodiments may provide point cloud content for providing general-purpose services such as VR services and autonomous driving services.
- FIG. 1 is a block diagram illustrating an example of a communication system 1 according to embodiments.
- FIG. 2 illustrates a block configuration diagram of a wireless communication system to which methods according to embodiments may be applied.
- FIG. 3 shows an example of a 3GPP signal transmission/reception method.
- FIG. 4 shows an example in which a physical channel is mapped into a self-contained slot according to embodiments.
- FIG. 5 shows an example of an ACK/NACK transmission process and a PUSCH transmission process.
- FIG. 6 shows a downlink structure for media transmission of a 5GMS service according to embodiments.
- FIG. 7 shows an example of a FLUS structure for Uplink service.
- FIG. 8 shows a point cloud data processing system according to embodiments.
- FIG. 9 shows an example of a point cloud data processing device according to embodiments.
- FIG. 10 shows an example of a point cloud data processing device according to embodiments.
- FIG. 11 shows an example of a point cloud data processing device according to embodiments.
- FIG. 12 shows an example of a point cloud data processing device according to embodiments.
- FIG. 13 shows an example of a point cloud data processing device according to embodiments.
- FIG. 14 shows an example of a point cloud data processing device according to embodiments.
- FIG. 15 shows a transmission structure for a UE on a visited network according to embodiments.
- FIG. 16 illustrates a call connection between UEs according to embodiments.
- FIG. 17 shows an apparatus for transmitting and receiving point cloud data according to embodiments.
- FIG. 18 shows an architecture for XR communication on a 5G network according to embodiments.
- FIG. 19 shows a structure for XR communication according to embodiments.
- FIG. 20 illustrates a protocol stack for XR interactive service on 3GPP 5G according to embodiments.
- FIG. 21 illustrates point-to-point XR videoconferencing according to embodiments.
- FIG. 22 shows an XR videoconferencing extension according to embodiments.
- FIG. 23 shows XR videoconferencing extensions according to embodiments.
- FIG. 24 shows an example of a point cloud encoder according to embodiments.
- FIG. 25 shows an example of a point cloud decoder according to embodiments.
- FIG. 26 shows an example of an operational flowchart of a transmitting device according to embodiments.
- FIG. 27 shows an example of an operation flowchart of a receiving device according to embodiments.
- FIG. 29 shows an example of filtering according to embodiments.
- FIG. 30 shows a vector configuration according to embodiments.
- FIG. 32 shows an example of generating an axis for an object of interactive point cloud data according to embodiments.
- FIG. 33 illustrates axis selection, estimation, transformation, angle generation, and rotation matrix generation according to an embodiment.
- FIG. 34 shows a method of converting point cloud data according to embodiments.
- FIG. 35 illustrates a camera point, an image point, and an image plane according to embodiments.
- FIG. 36 shows a criterion of point cloud data according to embodiments.
- FIG. 37 shows a relationship between a point, a camera, and a laser projector according to embodiments.
- FIG. 40 shows a method of obtaining a sampling eye according to embodiments.
- FIG. 41 shows normal vectors of a matrix for neighboring points according to embodiments.
- FIG. 42 shows an example of generating a planar reference axis from vectors related to the user's shoulder and axial vertebrae according to embodiments.
- FIG. 43 shows a face point source and an eye point source according to embodiments.
- FIG. 44 shows a vector for a source point according to embodiments.
- FIG. 49 shows metadata according to embodiments.
- FIG. 50 shows metadata according to embodiments.
- FIG. 51 shows a point cloud data transmission method according to embodiments.
- FIG. 52 shows a method for receiving point cloud data according to embodiments.
- FIG. 1 is a block diagram illustrating an example of a communication system 1 according to embodiments.
- a communication system 1 includes wireless devices 100a to 100f, a base station (BS) 200 and a network 300 .
- A base station (BS) 200 includes a fixed station, a Node B, an evolved-NodeB (eNB), a Next Generation NodeB (gNB), a base transceiver system (BTS), an access point (AP), a network or a 5G (5th generation) network node, an Artificial Intelligence (AI) system, a road side unit (RSU), a robot, an Augmented Reality/Virtual Reality (AR/VR) system, a server, and the like.
- A wireless device refers to a device that communicates with a base station and/or other wireless devices using a radio access technology (e.g., 5G New RAT (NR), Long Term Evolution (LTE)), and may be called a communication/wireless/5G device or a user equipment (UE).
- The wireless devices are not limited to the above embodiments and may include a robot 100a, vehicles 100b-1 and 100b-2, an XR (eXtended Reality) device 100c, a hand-held device 100d, a home appliance 100e, an Internet of Things (IoT) device 100f, and an AI device/server 400.
- the XR device 100c represents a device that provides XR content (eg, Augmented Reality (AR)/Virtual Reality (VR)/Mixed Reality (MR) content).
- an XR device may be referred to as an AR/VR/MR device.
- The XR device 100c may be implemented in the form of a Head-Mounted Device (HMD), a Head-Up Display (HUD) installed in a vehicle, a television, a smartphone, a computer, a wearable device, a home appliance, digital signage, a vehicle, a robot, and the like.
- the vehicles 100b-1 and 100b-2 are vehicles equipped with a wireless communication function, self-driving vehicles, vehicles capable of performing inter-vehicle communication, UAVs (Unmanned Aerial Vehicles) (eg, drones), and the like.
- the mobile device 100d may include a smart phone, a smart pad, a wearable device (eg, a smart watch, a smart glass), a computer (eg, a laptop computer), and the like.
- the home appliance 100e may include a TV, a refrigerator, a washing machine, and the like.
- the IoT device 100f may include a sensor, a smart meter, and the like.
- the wireless devices 100a to 100f may be connected to the network 300 through the base station 200 .
- the wireless devices 100a to 100f may be connected to the AI server 400 through the network 300 .
- the network 300 may be configured using a 3G network, a 4G (eg, LTE) network, a 5G (eg, NR) network, or a 6G network.
- the wireless devices 100a to 100f may communicate with each other through the base station 200/network 300, but may also communicate directly (eg, sidelink communication) without going through the base station/network.
- the wireless devices 100a to 100f/base station 200 and base station 200/base station 200 may transmit and receive radio signals through wireless communication/connection 150a, 150b, and 150c.
- Wireless communication/connection may be established through various radio access technologies (e.g., 5G NR) and includes uplink/downlink communication 150a between wireless devices and base stations, sidelink communication 150b (or D2D communication) between wireless devices, and communication 150c between base stations (e.g., relay, integrated access backhaul (IAB)).
- The wireless devices 100a to 100f and the base station 200 according to the embodiments may transmit and receive signals through various physical channels of the wireless communication/connection 150a, 150b, and 150c.
- To transmit and receive radio signals, at least one of a configuration information setting process, various signal processing processes (e.g., channel encoding/decoding, modulation/demodulation, resource mapping/demapping, etc.), and a resource allocation process may be performed.
- A user terminal (e.g., an XR device such as the XR device 100c of FIG. 1) according to embodiments may transmit specific information, including the XR data (or AR/VR data) required to provide XR content such as audio/video data, voice data, and surrounding information data, to a base station or another user terminal through a network.
- A user terminal may perform an initial access operation to a network. During the initial access process, the user terminal may perform a cell search to acquire downlink (DL) synchronization and may acquire system information.
- Downlink represents communication from a base station (eg, BS) or a transmitter that is part of the base station to a user equipment (UE) or a receiver included in the user equipment.
- a user terminal may perform a random access operation for accessing a network.
- the user terminal may transmit a preamble for uplink (UL) synchronization acquisition or UL data transmission, and may perform a random access response reception operation.
- Uplink represents communication from a UE or a transmitting unit that is part of a UE to a BS or a receiving unit that is part of a BS.
- the UE may perform an UL Grant reception operation to transmit specific information to the BS.
- The UL grant carries time/frequency resource scheduling information for uplink data transmission.
- a user terminal may transmit specific information to a base station through a 5G network based on a UL grant.
- a base station may perform XR content processing.
- the user terminal may perform a downlink grant (DL Grant) reception operation to receive a response to specific information through the 5G network.
- A downlink grant carries time/frequency resource scheduling information for receiving downlink data.
- the user terminal may receive a response to specific information through the network based on the downlink grant.
- FIG. 2 illustrates a block configuration diagram of a wireless communication system to which methods according to embodiments may be applied.
- the wireless communication system includes a first communication device 910 and/or a second communication device 920 .
- 'A and/or B' may be interpreted as having the same meaning as 'including at least one of A or B'.
- the first communication device may represent a BS and the second communication device may represent a UE (or the first communication device may represent a UE and the second communication device may represent a BS).
- The first communication device and the second communication device include processors 911 and 921, memories 914 and 924, one or more Tx/Rx radio frequency (RF) modules 915 and 925, Tx processors 912 and 922, Rx processors 913 and 923, and antennas 916 and 926. The Tx/Rx modules are also called transceivers.
- the processor 911 may perform a signal processing function of a layer higher than the physical layer (eg, layer 2 (L2)). For example, in the Downlink, or DL (communication from a first communication device to a second communication device), higher layer packets from the core network are provided to the processor 911 .
- The processor 911 provides multiplexing between logical channels and transport channels and radio resource allocation to the second communication device 920, and is responsible for signaling to the second communication device.
- The first communication device 910 and the second communication device 920 may further include processors (e.g., an audio/video encoder, an audio/video decoder, etc.).
- The processor according to the embodiments processes video data corresponding to various video standards (e.g., MPEG2, AVC, HEVC, and VVC) and audio data corresponding to various audio standards (e.g., MPEG 1 Layer 2 Audio, AC3, HE-AAC, E-AC-3, NGA, etc.).
- the processor may process XR data or XR media data processed using Video-based Point Cloud Compression (V-PCC) or Geometry-based Point Cloud Compression (G-PCC).
- a processor processing higher layer data may be implemented as a single processor or a single chip by being combined with the processors 911 and 921 .
- a processor processing upper layer data may be implemented as a separate chip or a separate processor from the processors 911 and 921 .
- the transmit (TX) processor 912 implements various signal processing functions for the L1 layer (ie, physical layer).
- the signal processing function of the physical layer can facilitate forward error correction (FEC) in the second communication device.
- the signal processing function of the physical layer includes coding and interleaving.
- a signal that has undergone encoding and interleaving is modulated into complex valued modulation symbols through scrambling and modulation.
- BPSK, QPSK, 16QAM, 64QAM, 256QAM, etc. may be used for modulation depending on the channel.
- Complex-valued modulation symbols (hereafter referred to as modulation symbols) are divided into parallel streams. Each stream is mapped to an OFDM subcarrier, multiplexed with a reference signal in the time and/or frequency domain, and combined using an Inverse Fast Fourier Transform (IFFT) to create a physical channel carrying a stream of time-domain OFDM symbols.
- OFDM symbol streams are spatially precoded to create multiple spatial streams.
- Each spatial stream may be provided to a different antenna 916 via a separate Tx/Rx module (or transceiver 915).
- Each Tx/Rx module can frequency upconvert each spatial stream to an RF carrier for transmission.
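- The modulation and IFFT steps of the transmit chain described above can be sketched as follows. This is a minimal illustration, not the embodiments' implementation: the QPSK bit labelling, the 8-subcarrier symbol size, and the naive inverse DFT (used in place of an optimized IFFT) are all assumptions made for the example.

```python
import cmath
import math

# Assumed Gray-coded QPSK labelling (bit pair -> unit-energy complex symbol).
A = 1 / math.sqrt(2)
QPSK = {
    (0, 0): complex(A, A),
    (0, 1): complex(A, -A),
    (1, 0): complex(-A, A),
    (1, 1): complex(-A, -A),
}


def qpsk_modulate(bits):
    """Map an even-length bit sequence to complex-valued modulation symbols."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def idft(freq):
    """Naive inverse DFT: one modulation symbol per subcarrier -> time samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical payload: 16 bits -> 8 QPSK symbols -> one 8-sample OFDM symbol.
bits = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0]
symbols = qpsk_modulate(bits)
ofdm_symbol = idft(symbols)
```

Reference-signal multiplexing, spatial precoding, and the cyclic prefix are omitted here to keep the sketch short.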
- each Tx/Rx module (or transceiver) 925 receives a signal of an RF carrier through each antenna 926 of each Tx/Rx module.
- Each Tx/Rx module restores the signal of the RF carrier to a baseband signal and provides it to the receive (RX) processor 923.
- the RX processor implements various signal processing functions of L1 (ie, physical layer).
- the RX processor may perform spatial processing on the information to recover any spatial stream destined for the second communication device.
- If multiple spatial streams are destined for the second communication device, they may be combined into a single OFDMA symbol stream by multiple RX processors.
- the RX processor converts the OFDM symbol stream, which is a time domain signal, into a frequency domain signal using a Fast Fourier Transform (FFT).
- the frequency domain signal includes a separate OFDM symbol stream for each subcarrier of the OFDM signal.
- the modulation symbols and reference signal on each subcarrier are recovered and demodulated by determining the most probable signal constellation points transmitted by the first communication device. These soft decisions may be based on channel estimate values.
- the soft decisions are decoded and deinterleaved to recover the data and control signals originally transmitted by the first communication device on the physical channel. Corresponding data and control signals are provided to processor 921 .
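- The constellation-point decision described above can be illustrated with a simplified sketch. It makes hard (nearest-point) decisions rather than the soft decisions mentioned in the text, and the QPSK labelling, the per-subcarrier channel estimates, and the noise values are hypothetical.

```python
import math

# Assumed QPSK constellation (symbol -> bit pair); the labelling is hypothetical.
A = 1 / math.sqrt(2)
CONSTELLATION = {
    complex(A, A): (0, 0),
    complex(A, -A): (0, 1),
    complex(-A, A): (1, 0),
    complex(-A, -A): (1, 1),
}


def equalize(received, channel_estimates):
    """Undo the per-subcarrier channel using the channel estimate values."""
    return [r / h for r, h in zip(received, channel_estimates)]


def detect_bits(symbols):
    """Pick the nearest constellation point for each symbol (hard decision)."""
    bits = []
    for s in symbols:
        nearest = min(CONSTELLATION, key=lambda point: abs(s - point))
        bits.extend(CONSTELLATION[nearest])
    return bits


# Two symbols sent over a made-up channel with a small amount of noise.
sent = [complex(A, A), complex(-A, -A)]
channel = [complex(0.8, 0.3), complex(0.5, -0.5)]
noise = [complex(0.05, 0.0), complex(0.0, -0.03)]
received = [s * h + n for s, h, n in zip(sent, channel, noise)]
recovered = detect_bits(equalize(received, channel))
```

A receiver that produces soft decisions would output a per-bit reliability value instead of the bit itself, which is what the decoder described above consumes.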
- the UL (communication from the second communication device to the first communication device) is handled in the first communication device 910 in a manner similar to that described with respect to the receiver function in the second communication device 920 .
- Each Tx/Rx module 925 receives a signal through a respective antenna 926.
- Each Tx/Rx module provides an RF carrier and information to the RX processor 923.
- Processor 921 may be associated with memory 924 that stores program codes and data. Memory may be referred to as a computer readable medium.
- FIGS. 3 to 5 show examples of one or more signal processing methods and/or operations for the L1 layer (i.e., the physical layer). The examples disclosed in FIGS. 3 to 5 may be the same as or similar to the signal processing methods and/or operations performed by the transmit (TX) processor 912 and/or the transmit (TX) processor 922 described with reference to FIG. 2.
- FIG. 3 shows an example of a 3GPP signal transmission/reception method.
- When the UE is powered on or enters a new cell, it may perform an initial cell search task such as synchronizing with a BS (S201).
- The UE may synchronize with the BS by receiving a primary synchronization channel (P-SCH) and a secondary synchronization channel (S-SCH) from the BS, and may obtain information such as a cell ID.
- the P-SCH and the S-SCH may be referred to as a primary synchronization signal (PSS) and a secondary synchronization signal (SSS), respectively.
- the UE may obtain intra-cell broadcast information by receiving a physical broadcast channel (PBCH) from the BS. Meanwhile, the UE may check the downlink channel state by receiving a downlink reference signal (DL RS) in the initial cell search step.
- the UE can obtain more detailed system information by receiving the PDSCH according to the PDCCH and the information carried on the PDCCH (S202).
- the UE may perform a random access procedure for the BS (steps S203 to S206).
- The UE may transmit a specific sequence as a preamble through the PRACH (S203 and S205) and receive a random access response (RAR) message for the preamble through a PDCCH and a corresponding PDSCH (S204 and S206).
- a contention resolution procedure may be additionally performed.
- the UE may perform PDCCH/PDSCH reception (S207) and PUSCH/PUCCH transmission (S208) as a general uplink/downlink signal transmission process.
- the UE receives DCI through the PDCCH.
- The UE monitors a set of PDCCH candidates at monitoring occasions configured in one or more control resource sets (CORESETs) on the serving cell according to the corresponding search space configurations.
- the set of PDCCH candidates to be monitored by the UE may be defined in terms of search space sets.
- a search space set according to embodiments may be a common search space set or a UE-specific search space set.
- a CORESET consists of a set of (physical) resource blocks having a time duration of 1 to 3 OFDM symbols.
- the network may configure the UE to have multiple CORESETs.
- the UE monitors PDCCH candidates in one or more search space sets.
- monitoring means attempting to decode PDCCH candidate(s) within the search space. If the UE succeeds in decoding one of the PDCCH candidates in the search space, the UE determines that it has detected a PDCCH in the corresponding PDCCH candidate, and can perform PDSCH reception or PUSCH transmission based on the DCI in the detected PDCCH.
- PDCCH may be used to schedule DL transmissions on PDSCH and UL transmissions on PUSCH.
- The DCI on the PDCCH may include a downlink assignment (i.e., a DL grant) that includes at least the modulation and coding format and resource allocation information related to the downlink shared channel, or a UL grant that includes the modulation and coding format and resource allocation information related to the uplink shared channel.
- the UE can acquire DL synchronization by detecting the SSB.
- the UE can identify the structure of the SSB burst set based on the detected SSB (time) index (SSB index, SSBI), and can detect the symbol/slot/half-frame boundary accordingly.
- the frame/half-frame number to which the detected SSB belongs may be identified using system frame number (SFN) information and half-frame indication information.
- the UE may obtain a 10-bit SFN for a frame to which the PBCH belongs from the PBCH.
- the UE may acquire 1-bit half-frame indication information to determine whether the corresponding PBCH belongs to the first half-frame or the second half-frame among frames.
- When the half-frame indication bit value is 0, the SSB to which the PBCH belongs is in the first half-frame within the frame. When the half-frame indication bit value is 1, the SSB to which the PBCH belongs is in the second half-frame within the frame.
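- As a rough illustration of how the 10-bit SFN and the 1-bit half-frame indication locate an SSB in time, the following sketch computes the start of the indicated half-frame. The 10 ms frame and 5 ms half-frame durations are assumptions taken from NR framing and are not stated in this text; the function name is hypothetical.

```python
FRAME_MS = 10       # assumed NR radio frame duration in milliseconds
HALF_FRAME_MS = 5   # assumed half-frame duration in milliseconds


def half_frame_start_ms(sfn, half_frame_bit):
    """Start time of the indicated half-frame, relative to SFN 0."""
    if not 0 <= sfn < 1024:
        raise ValueError("a 10-bit SFN must be in the range 0..1023")
    if half_frame_bit not in (0, 1):
        raise ValueError("the half-frame indication is a single bit")
    return sfn * FRAME_MS + half_frame_bit * HALF_FRAME_MS


first_half = half_frame_start_ms(3, 0)   # first half-frame of frame 3
second_half = half_frame_start_ms(3, 1)  # second half-frame of frame 3
```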
- the UE may acquire the SSBI of the SSB to which the PBCH belongs based on the DMRS sequence and the PBCH payload carried by the PBCH.
- Table G1 below shows the random access procedure of the UE.
- Step 1 PRACH preamble in UL * initial beam acquisition
- Step 4 Contention resolution on DL * Temporary C-RNTI on PDCCH for initial access
- the random access process is used for a variety of purposes.
- the random access procedure may be used for network initial access, handover, and UE-triggered UL data transmission.
- the UE may acquire UL synchronization and UL transmission resources through a random access procedure.
- the random access process is divided into a contention-based random access process and a contention-free random access process.
- FIG. 4 shows an example in which a physical channel is mapped into a self-contained slot according to embodiments.
- PDCCH may be transmitted in the DL control region, and PDSCH may be transmitted in the DL data region.
- PUCCH may be transmitted in the UL control region, and PUSCH may be transmitted in the UL data region.
- the GP provides a time gap between the base station and the UE in a process of switching from a transmission mode to a reception mode or a process of switching from a reception mode to a transmission mode. Some symbols at the time of transition from DL to UL within a subframe may be set as GPs.
- a PDCCH carries Downlink Control Information (DCI).
- The PDCCH carries the transmission format and resource allocation of the downlink shared channel (DL-SCH), resource allocation information for the uplink shared channel (UL-SCH), paging information for the paging channel (PCH), system information on the DL-SCH, resource allocation information for higher layer control messages such as a random access response transmitted on the PDSCH, transmission power control commands, and activation/cancellation of Configured Scheduling (CS).
- the DCI includes a cyclic redundancy check (CRC), and the CRC is masked/scrambled with various identifiers (eg, Radio Network Temporary Identifier, RNTI) according to the owner or usage of the PDCCH.
- If the PDCCH is for a specific UE, the CRC is masked with a terminal identifier (e.g., Cell-RNTI, C-RNTI). If the PDCCH is for paging, the CRC is masked with the Paging-RNTI (P-RNTI). If the PDCCH is related to system information (e.g., System Information Block, SIB), the CRC is masked with the System Information RNTI (SI-RNTI). If the PDCCH is for a random access response, the CRC is masked with the RA-RNTI (Random Access-RNTI).
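- The CRC masking described above can be sketched as an XOR of the CRC with the identifier, so that only a receiver checking with the matching RNTI sees a valid CRC. This is a simplified illustration: the 16-bit CRC from Python's `binascii.crc_hqx` is a stand-in for the CRC actually used for DCI, and the payload and RNTI values are hypothetical.

```python
import binascii


def attach_masked_crc(payload: bytes, rnti: int) -> bytes:
    """Append a CRC that has been masked (XORed) with a 16-bit RNTI."""
    crc = binascii.crc_hqx(payload, 0)  # stand-in 16-bit CRC, not the DCI CRC
    masked = crc ^ (rnti & 0xFFFF)
    return payload + masked.to_bytes(2, "big")


def crc_matches(codeword: bytes, rnti: int) -> bool:
    """Unmask the CRC with the RNTI being tested and compare it to the payload CRC."""
    payload = codeword[:-2]
    masked = int.from_bytes(codeword[-2:], "big")
    return (masked ^ (rnti & 0xFFFF)) == binascii.crc_hqx(payload, 0)


# Hypothetical DCI payload addressed to a hypothetical C-RNTI.
codeword = attach_masked_crc(b"\x12\x34\x56", 0x4601)
addressed_ue_accepts = crc_matches(codeword, 0x4601)
other_ue_accepts = crc_matches(codeword, 0x4602)
```

A UE that tests the codeword with a different RNTI fails the CRC check and discards the candidate, which is how one PDCCH can be addressed to a specific owner or usage.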
- The PDCCH is composed of 1, 2, 4, 8, or 16 Control Channel Elements (CCEs) according to the Aggregation Level (AL).
- CCE is a logical allocation unit used to provide a PDCCH of a predetermined code rate according to a radio channel state.
- CCE consists of six REGs (Resource Element Groups).
- REG is defined as one OFDM symbol and one (P)RB.
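- The CCE/REG relationships above imply some simple arithmetic, sketched below. The figure of 12 subcarriers per resource block is an assumption taken from NR and is not stated in this text; the function names and the example CORESET size are hypothetical.

```python
REGS_PER_CCE = 6            # from the text: a CCE consists of six REGs
SUBCARRIERS_PER_RB = 12     # assumption: one RB spans 12 subcarriers
AGGREGATION_LEVELS = (1, 2, 4, 8, 16)


def pdcch_resource_elements(aggregation_level):
    """Resource elements spanned by one PDCCH candidate (reference signals included)."""
    if aggregation_level not in AGGREGATION_LEVELS:
        raise ValueError("unsupported aggregation level")
    # One REG is one RB over one OFDM symbol, i.e. SUBCARRIERS_PER_RB elements.
    return aggregation_level * REGS_PER_CCE * SUBCARRIERS_PER_RB


def coreset_cce_count(num_rbs, num_symbols):
    """Number of whole CCEs that fit in a CORESET of the given size."""
    if not 1 <= num_symbols <= 3:
        raise ValueError("a CORESET spans 1 to 3 OFDM symbols")
    return (num_rbs * num_symbols) // REGS_PER_CCE


elements_at_al4 = pdcch_resource_elements(4)
cces_in_example_coreset = coreset_cce_count(48, 2)
```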
- the PDCCH is transmitted through a CORESET (Control Resource Set).
- CORESET is defined as a set of REGs with a given numerology (e.g., SCS, CP length, etc.).
- a plurality of CORESETs for one UE may overlap in the time/frequency domain.
- CORESET may be set through system information (eg, Master Information Block, MIB) or UE-specific upper layer (eg, Radio Resource Control, RRC, layer) signaling. Specifically, the number of RBs and the number of OFDM symbols constituting the CORESET (up to 3) may be set by higher layer signaling.
- the UE monitors PDCCH candidates.
- the PDCCH candidate indicates CCE(s) that the UE should monitor for PDCCH detection.
- Each PDCCH candidate is defined as 1, 2, 4, 8, or 16 CCEs according to AL.
- Monitoring includes (blind) decoding of PDCCH candidates.
- a set of PDCCH candidates monitored by the UE is defined as a PDCCH search space (Search Space, SS).
- the search space includes a Common Search Space (CSS) or a UE-specific search space (USS).
- the UE may obtain DCI by monitoring PDCCH candidates in one or more search spaces configured by MIB or higher layer signaling.
- Each CORESET is associated with one or more search spaces, and each search space is associated with one CORESET.
- a search space can be defined based on the following parameters.
- controlResourceSetId Indicates a CORESET related to the search space
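The one-to-many association between a CORESET and its search spaces might be modeled as below. Only `controlResourceSetId` comes from the text; the other field names and the helper function are hypothetical and serve only to illustrate the relationship.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SearchSpace:
    search_space_id: int        # hypothetical identifier for this search space
    controlResourceSetId: int   # the single CORESET this search space belongs to
    kind: str                   # "CSS" (common) or "USS" (UE-specific)


def group_by_coreset(search_spaces: Iterable[SearchSpace]) -> Dict[int, List[int]]:
    """Map each CORESET ID to the IDs of the search spaces associated with it."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for ss in search_spaces:
        grouped[ss.controlResourceSetId].append(ss.search_space_id)
    return dict(grouped)
```

Because each search space stores exactly one `controlResourceSetId`, a search space can never belong to two CORESETs, while a CORESET may collect several search spaces.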
- An opportunity (eg, time / frequency resource) to monitor PDCCH candidates is defined as a PDCCH (monitoring) opportunity.
- One or more PDCCH (monitoring) opportunities may be configured within a slot.
- UCI Uplink Control Information
- -HARQ (Hybrid Automatic Repeat request)-ACK (Acknowledgement): This is a response to a downlink data packet (eg, codeword) on the PDSCH. Indicates whether a downlink data packet has been successfully received. In response to a single codeword, 1 bit of HARQ-ACK may be transmitted, and 2 bits of HARQ-ACK may be transmitted in response to two codewords.
- HARQ-ACK responses include positive ACK (simply, ACK), negative ACK (NACK), DTX or NACK/DTX.
- HARQ-ACK is used interchangeably with HARQ ACK/NACK and ACK/NACK.
- MIMO-related feedback information includes a Rank Indicator (RI) and a Precoding Matrix Indicator (PMI).
- The PUSCH carries uplink data (e.g., UL-SCH transport block, UL-SCH TB) and/or uplink control information (UCI), and is transmitted based on a CP-OFDM (Cyclic Prefix - Orthogonal Frequency Division Multiplexing) waveform or a DFT-s-OFDM (Discrete Fourier Transform - spread - Orthogonal Frequency Division Multiplexing) waveform.
- DFT-s-OFDM Discrete Fourier Transform - spread - Orthogonal Frequency Division Multiplexing
- When transform precoding is not possible (e.g., transform precoding is disabled), the terminal transmits the PUSCH based on a CP-OFDM waveform; when transform precoding is possible (e.g., transform precoding is enabled), the terminal may transmit the PUSCH based on a CP-OFDM waveform or a DFT-s-OFDM waveform.
- PUSCH transmission may be dynamically scheduled by a UL grant in DCI, or semi-statically scheduled (configured grant) based on higher layer (e.g., RRC) signaling (and/or Layer 1 (L1) signaling (e.g., PDCCH)).
- PUSCH transmission may be performed on a codebook basis or a non-codebook basis.
- FIG. 5 shows an example of an ACK/NACK transmission process and a PUSCH transmission process.
- FIG. 5(a) shows an example of an ACK/NACK transmission process.
- the UE may detect the PDCCH in slot #n.
- the PDCCH includes downlink scheduling information (eg, DCI formats 1_0 and 1_1), and the PDCCH indicates a DL assignment-to-PDSCH offset (K0) and a PDSCH-HARQ-ACK reporting offset (K1).
- DCI formats 1_0 and 1_1 may include the following information.
- -Frequency domain resource assignment Represents a set of RBs allocated to PDSCH
- Time domain resource assignment: indicates K0 and the start position (e.g., OFDM symbol index) and length (e.g., number of OFDM symbols) of the PDSCH in the slot
- HARQ process ID (Identity) for data (eg, PDSCH, TB)
- the UE may receive PDSCH in slot #(n+K0) according to the scheduling information of slot #n, and then transmit UCI through PUCCH in slot #(n+K1).
- UCI includes a HARQ-ACK response for PDSCH. If the PDSCH is configured to transmit up to 1 TB, the HARQ-ACK response may consist of 1-bit. When the PDSCH is configured to transmit up to two TBs, the HARQ-ACK response may consist of 2-bits if spatial bundling is not configured and 1-bit if spatial bundling is configured.
- When slot #(n+K1) is designated as the HARQ-ACK transmission time for a plurality of PDSCHs, the UCI transmitted in slot #(n+K1) includes HARQ-ACK responses for the plurality of PDSCHs.
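The HARQ-ACK payload size rule above can be sketched as a small function. It assumes that spatial bundling combines the two per-TB results with a logical AND, which the text does not spell out; the function name is hypothetical.

```python
from typing import List, Sequence


def harq_ack_payload(tb_decoded_ok: Sequence[bool],
                     spatial_bundling: bool = False) -> List[int]:
    """Build HARQ-ACK bits (1 = ACK, 0 = NACK) for one PDSCH carrying 1 or 2 TBs."""
    if len(tb_decoded_ok) not in (1, 2):
        raise ValueError("a PDSCH carries one or two transport blocks")
    bits = [1 if ok else 0 for ok in tb_decoded_ok]
    if spatial_bundling and len(bits) == 2:
        # Assumption: bundling reports ACK only if both TBs were decoded.
        return [bits[0] & bits[1]]
    return bits
```

With two TBs and no bundling the response is 2 bits; with bundling it collapses to 1 bit, at the cost of retransmitting both TBs when only one failed.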
- MAC medium access control
- Each DL HARQ process manages state variables related to the number of transmissions of MAC PDUs (Protocol Data Units) in the buffer, HARQ feedback for the MAC PDUs in the buffer, and the current redundancy version.
- Each HARQ process is distinguished by a HARQ process ID.
- FIG. 5(b) shows an example of a PUSCH transmission process.
- the UE may detect the PDCCH in slot #n.
- the PDCCH includes uplink scheduling information (eg, DCI format 0_0, 0_1).
- DCI formats 0_0 and 0_1 may include the following information.
- -Frequency domain resource assignment Represents a set of RBs allocated to the PUSCH
- Time domain resource assignment Indicates the slot offset K2, the start position (eg, symbol index) and length (eg, number of OFDM symbols) of the PUSCH in the slot.
- the start symbol and length may be indicated through SLIV (Start and Length Indicator Value) or may be indicated separately.
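The joint encoding of start symbol and length into a single SLIV can be illustrated as follows. The sketch assumes a slot of 14 OFDM symbols (normal cyclic prefix) and uses the commonly cited SLIV formula; the function names are hypothetical.

```python
from typing import Tuple

SYMBOLS_PER_SLOT = 14  # assumption: normal cyclic prefix


def encode_sliv(start: int, length: int) -> int:
    """Encode start symbol S and length L into one SLIV value."""
    if not (0 <= start < SYMBOLS_PER_SLOT and 0 < length <= SYMBOLS_PER_SLOT - start):
        raise ValueError("allocation must fit inside one slot")
    if length - 1 <= 7:
        return SYMBOLS_PER_SLOT * (length - 1) + start
    return (SYMBOLS_PER_SLOT * (SYMBOLS_PER_SLOT - length + 1)
            + (SYMBOLS_PER_SLOT - 1 - start))


def decode_sliv(sliv: int) -> Tuple[int, int]:
    """Recover (start, length) by searching all valid allocations."""
    for start in range(SYMBOLS_PER_SLOT):
        for length in range(1, SYMBOLS_PER_SLOT - start + 1):
            if encode_sliv(start, length) == sliv:
                return start, length
    raise ValueError(f"invalid SLIV: {sliv}")
```

The exhaustive search in `decode_sliv` is chosen for clarity rather than speed; with at most 105 valid allocations per slot it remains cheap.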
- the UE may transmit PUSCH in slot #(n+K2) according to the scheduling information of slot #n.
- PUSCH includes UL-SCH TB.
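The slot offsets described for FIG. 5(a) and FIG. 5(b) reduce to simple additions on the slot index. The sketch below follows the text's notation, with K0, K1, and K2 all counted from the slot #n in which the PDCCH is detected; the function and key names are hypothetical.

```python
from typing import Dict


def downlink_timeline(n: int, k0: int, k1: int) -> Dict[str, int]:
    """Slots used in the ACK/NACK process: PDCCH, PDSCH, then HARQ-ACK on PUCCH."""
    return {"pdcch": n, "pdsch": n + k0, "pucch_harq_ack": n + k1}


def uplink_timeline(n: int, k2: int) -> Dict[str, int]:
    """Slots used in the PUSCH process: PDCCH with the UL grant, then PUSCH."""
    return {"pdcch": n, "pusch": n + k2}
```

For example, a PDCCH detected in slot 10 with K0 = 0 and K1 = 4 schedules the PDSCH in slot 10 and the HARQ-ACK report in slot 14.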
- Embodiments may be applied to a 5G-based media streaming (hereinafter referred to as 5GMS) system.
- the 5GMS structure is a system that supports MNO (Mobile Network Operator) and third party's media downlink streaming service.
- MNO Mobile Network Operator
- The 5GMS structure supports related network or UE functions and APIs, and provides backward compatibility regardless of whether MBMS is supported and/or whether the deployment is 5G standard or EUTRAN.
- Streaming, as used for media over 5G, is defined as the generation and delivery of time-continuous media, and a Streaming Point indicates that the transmitter and receiver directly transmit and consume the media.
- the 5GMS structure basically operates in downlink and uplink environments and has bidirectionality.
- the 5GMS service may use 3G, 4G, and 6G networks as well as 5G networks, and is not limited to the above-described embodiments.
- Embodiments may also provide a network slicing function according to service types.
- FIG. 6 shows a downlink structure for media transmission of a 5GMS service according to embodiments.
- FIG. 6 shows a media transmission structure for at least one of 4G, 5G, and 6G networks, illustrating a device and method capable of operating in a unidirectional downlink media streaming environment. Since it is a downlink system, media is produced in the network and the Trusted Media Function, and the media is delivered to the UE.
- Each block diagram is conceptually composed of a set of functions necessary for media transmission and reception.
- Inter-Connection Interface refers to a link for sharing or controlling a specific part of each media block, and is used when not all necessary element technologies are utilized. For example, 3rd party external application and operator application can be connected to enable communication through Inter-Connection Interface when functions such as information sharing (user data, media track, etc.) are required even though independent application operation is performed.
- Media includes all information and media, such as time-continuous, time-discontinuous, image, picture, video, audio, and text content, and additionally includes the format and size in which the corresponding media is to be transmitted.
- Sink in FIG. 6 represents a UE, a processor included in the UE (for example, the processor 911 for signal processing of a higher layer described in FIG. 2 ), or hardware constituting the UE.
- the sink according to the embodiments may perform a receiving operation in which a streaming service is received in the form of unicast from a source providing media to the sink.
- a sink according to embodiments may receive control information from a source and perform a signal processing operation based on the control information.
- Sink according to embodiments may receive media/metadata (eg, XR data or extended media data) from a source.
- Sink according to embodiments may include a 3rd Party External Application block, an Operator Application block, and/or a 5G Media Reception Function block.
- the 3rd Party External Application block and the Operator Application block represent UE Applications operating in the Sink stage.
- the 3rd Party External Application block according to the embodiments is an application operated by a third party that exists other than 4G, 5G, and 6G networks, and can drive API access of Sink.
- the 3rd Party External Application block according to embodiments may receive information using 4G, 5G, or 6G networks or through direct Point-to-Point Communication. Therefore, Sink's UE can receive additional services through Native or Download Installed Applications.
- the operator application block may manage applications (5G Media Player) associated with a media streaming driving environment including media applications. When the application is installed, Sink's UE can start accessing media services through API using Application Socket and send and receive related data information.
- the API enables data to be transmitted to a specific end-system through a session configuration using a socket.
- The socket connection may be established over a general TCP-based Internet connection.
- the sink can receive control/data information from the Cloud Edge and perform offloading to transmit the control/data information to the Cloud Edge.
- a sink may include an offloading management block. Offloading management according to embodiments may control operations of an operator application block and/or a 3rd party application block in order to control sink offloading.
- the 5G Media Reception Function block may receive operations related to offloading from the offloading management block, obtain media that can be received through 4G, 5G, and 6G networks, and process the media.
- a 5G Media Reception Function block may include a general Media Access Client block, a DRM Client block, a Media Decoder, a Media Rendering Presentation block, an XR Rendering block, an XR Media Processing block, and the like.
- the corresponding block is only an example, and the name and/or operation are not limited to the embodiments.
- the Media Access Client block may receive data, eg, a media segment, received through at least one or more of 4G, 5G, and 6G networks.
- the Media Access Client block may de-format (or decapsulate) various media transmission formats such as DASH, CMAF, and HLS.
- Data output from the Media Access Client block can be processed and displayed according to each decoding characteristic.
- the DRM Client block may determine whether to use the received data. For example, the DRM client block can perform a control operation so that authorized users can use media information within the access range.
- The Media Decoding block is a general audio/video decoder that can decode, among the deformatted data, audio/video data processed according to various standards (video standards such as MPEG2, AVC, HEVC, and VVC, and audio standards such as MPEG 1 Layer 2 Audio, AC3, HE-AAC, E-AC-3, and NGA).
- a Media Rendering Presentation block may render media suitable for a receiving device.
- a Media Rendering Presentation block according to embodiments may be included in a Media decoding block.
- An XR Media Processing block and an XR Rendering block according to embodiments are blocks for processing XR data among deformatted data (or decapsulated data).
- The XR Media Processing block (for example, the processor 911 described in FIG. 2 or a processor that processes higher layer data) can process XR media using the XR data received from the source or the information received from the offloading management block (for example, object information, position information, etc.).
- An XR rendering block according to embodiments may render and display XR media data among received media data.
- An XR Media Processing block and an XR rendering block according to embodiments may process and render point cloud data processed according to a Video-based Point Cloud Compression (V-PCC) or Geometry-based Point Cloud Compression (G-PCC) method.
- V-PCC Video-based Point Cloud Compression
- G-PCC Geometry-based Point Cloud Compression
- Source indicates a media server using at least one of 4G, 5G, and 6G networks or a UE capable of providing media and can perform Control Function and Server Function functions.
- the Server Function starts and hosts 4G, 5G, and 6G media services.
- 3rd Party Media Server refers to various media servers operated by third parties that exist outside of 4G, 5G, and 6G networks, and can be a Network External Media Application Server.
- External Server which is generally operated by a third party service, can equally perform media creation, encoding, formatting, etc. in a non-4G, 5G, or 6G network.
- the control function represents a network-based application function, and may include a control-oriented information delivery function when performing authentication of Sink and other media servers and media.
- the source can start a connection through the API connection of the internal application through the control function, form a media session, or perform other additional information requests.
- the source exchanges PCF information with other network functions through the control function.
- the source can check the external network capability using NEF through the control function and perform general monitoring and provisioning through the exposure process. Therefore, NEF can receive other network information and store the received information as structured data using a specific standardized interface. The stored information can be exposed/re-exposure to other networks and applications by NEF, and the information exposed in various network environments can be collected and used for analysis.
- an API control plane is formed, and when a session connection is made, an environment in which media can be transmitted is formed including tasks such as security (authentication, authorization, etc.).
- multiple APIs can be created or a Control Plane can be created through one API.
- APIs can be created from third party media servers, and Media Control Functions and APIs of UEs can form Media User Plane APIs.
- The Source can generate and deliver media in various ways to perform downlink media service functions, and may include all functions, from simply storing media to serving as a media relay, that can deliver media to the UE corresponding to the sink, the final destination.
- Modules or blocks inside Sink and Source may transmit and share information through an Inter-Connection Link and Inter-Connection Interface having bi-directionality.
- Embodiments describe a UL structure and method for transmitting media produced content in real time in a 5GMS system to social media, users, and servers.
- Uplink basically defines that media is not delivered to users in the form of distribution, but media is produced from the user terminal UE's point of view and delivered to the media server.
- the uplink system is configured in a form in which individual users directly provide content, so the use case and system structure to utilize the system configuration method handled by the terminal can be configured in a form different from that of the downlink.
- the FLUS system consists of a source entity that produces media and a sink entity that consumes media, and services such as voice, video, and text are delivered through 1:1 communication.
- the FLUS Source can be a single UE or multiple scattered UEs or Capture Devices. Since it is based on the 5G network, it can support 3GPP IMS/MTSI service, support IMS service through IMS Control Plane, and support service by complying with MTSI Service Policy regulations. If IMS/MTSI service is not supported, various user plane instantiation services can be supported through Network Assistance function for Uplink service.
- FIG. 7 shows an example of a FLUS structure for Uplink service.
- the FLUS structure may include the Source and Sink described in FIG. 6 .
- a Source according to embodiments may correspond to a UE.
- Sink according to embodiments may correspond to a UE or a network.
- Uplink is composed of Source and Sink according to media creation and delivery goals, and Source can be a terminal device, UE, and Sink can be another UE or network.
- a Source can receive media content from one or more Capture Devices.
- a Capture Device may or may not be connected as part of a UE. If the sink receiving the media exists in the UE rather than the network, the decoding and rendering functions are included in the UE, and the received media must be delivered to the corresponding function. Conversely, if the sink corresponds to the network, the received media can be delivered to the Processing or Distribution Sub-Function.
- More specifically, the F link is divided into Media Source and Sink (F-U end-points), Control Source and Sink (F-C end-points), Remote Controller and Remote Control Target (F-RC end-points), and Assistance Sender and Receiver (F-A end-points). All of these sources and sinks are classified as logical functions. Therefore, the corresponding functions may exist on the same physical device or may not exist on the same device due to separation of functions.
- Each function can also be separated into multiple physical devices and connected by different interfaces.
- Multiple F-A and F-RC points can exist in a single FLUS Source. Each point is independent of FLUS Sink and can be created according to Offered Service. As described above, the F Link Point assumes the security function of all sub-functions and links that exist in the F Point, and the corresponding authentication process can be included.
- FIG. 8 shows a point cloud data processing system according to embodiments.
- The point cloud processing system 1500 shown in FIG. 8 may include a transmitting device (e.g., a BS or UE described in FIGS. 1 to 7) that obtains, encodes, and transmits point cloud data, and a receiving device (e.g., the UE described in FIGS. 1 to 7) that receives and decodes video data to acquire point cloud data.
- point cloud data may be acquired through a process of capturing, synthesizing, or generating point cloud data (S1510).
- Through the acquisition process, data containing the 3D position (x, y, z) and properties (color, reflectance, transparency, etc.) of the points may be generated, e.g., a PLY (Polygon File format or the Stanford Triangle format) file.
- PLY Polygon File format or the Stanford Triangle format
- V-PCC Video-based Point Cloud Compression
- G-PCC Geometry-based Point Cloud Compression
- a geometry stream may be generated by reconstructing and encoding positional information of points, and an attribute stream may be generated by reconstructing and encoding attribute information (eg, color) associated with each point.
- attribute information eg, color
- V-PCC is compatible with 2D video, but recovering V-PCC processed data requires more data (e.g., geometry video, attribute video, occupancy map video, and auxiliary information) than G-PCC, so a longer delay may occur when providing a service.
- One or more output bit streams may be encapsulated in the form of a file (for example, a file format such as ISOBMFF) together with related metadata and transmitted through a network or a digital storage medium (S1530).
- point cloud related metadata itself may be encapsulated in a file.
- A device or processor (for example, the processor 911 or processor 921 described in FIG. 2, the upper layer processor, or the Sink described in FIG. 6 or the XR Media Processing block included in the Sink) according to the embodiments may decapsulate the received video data to obtain one or more bitstreams and related metadata, and decode the obtained bitstreams using a V-PCC or G-PCC method to restore 3D point cloud data (S1540).
- a renderer for example, the sink described in FIG. 6 or an XR rendering block included in the sink
- a device or processor may perform a feedback process of transferring various feedback information acquired in a rendering/display process to a transmitting device or a decoding process (S1560).
- Feedback information may include head orientation information, viewport information indicating a region currently viewed by a user, and the like. Since the interaction between the user and the service (or content) provider takes place through the feedback process, the device according to the embodiments can provide various services with greater user convenience. In addition, using the above-described V-PCC or G-PCC method has the technical effect of providing faster data processing or enabling clearer video composition.
- FIG. 9 shows an example of a point cloud data processing device according to embodiments.
- FIG. 9 shows a device that performs a point cloud data processing operation according to the G-PCC scheme.
- the point cloud data processing apparatus shown in FIG. 9 is the UE described in FIGS. 1 to 7 (for example, the processor 911 or processor 921 described in FIG. 2, the processor for processing higher layer data, or the processor described in FIG. 6).
- A point cloud data processing apparatus includes a point cloud acquisition unit (Point Cloud Acquisition), a point cloud encoding unit (Point Cloud Encoding), a file/segment encapsulation unit (File/Segment Encapsulation), and/or a delivery unit (Delivery).
- Each component of the processing device may be a module/unit/component/hardware/software/processor or the like.
- the geometry, attribute, auxiliary data, and mesh data of the point cloud can be configured as separate streams or stored in different tracks in a file. Furthermore, it may be included in a separate segment.
- the Point Cloud Acquisition unit acquires a point cloud.
- point cloud data may be acquired through a process of capturing, synthesizing, or generating a point cloud through one or more cameras.
- Point cloud data including the 3D position of each point (which can be expressed as x, y, z position values, etc., hereinafter referred to as geometry) and the properties of each point (color, reflectance, transparency, etc.) may be obtained, and may be generated as, for example, a PLY (Polygon File format or the Stanford Triangle format) file containing the data.
- PLY Polygon File format or the Stanford Triangle format
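To make the acquired data concrete, the sketch below serializes a few points with position and color into an ASCII PLY document. It is a minimal illustration that covers only x, y, z and an 8-bit RGB color; other properties mentioned in the text, such as reflectance or transparency, are omitted, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float, int, int, int]  # x, y, z, red, green, blue


def to_ascii_ply(points: Iterable[Point]) -> str:
    """Serialize points (geometry plus color attribute) as an ASCII PLY document."""
    points = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    # One line per point: geometry first, then the color attribute.
    body = [f"{x} {y} {z} {r} {g} {b}" for x, y, z, r, g, b in points]
    return "\n".join(header + body) + "\n"
```

Each vertex line keeps the geometry (x, y, z) and the attribute (color) of one point together, which is the input that a later encoding stage separates again.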
- point cloud-related metadata for example, metadata related to capture, etc.
- The Point Cloud Encoding unit performs a Geometry-based Point Cloud Compression (G-PCC) procedure, which consists of a series of steps such as prediction, transformation, quantization, and entropy coding, and may output the encoded data (encoded video/image information) in the form of a bitstream.
- G-PCC Geometry-based Point Cloud Compression
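Of the steps listed above, quantization is the easiest to show in isolation. The following sketch snaps point positions to an integer grid and merges points that fall into the same cell. It is a simplified illustration under the assumption of a uniform step size, not the G-PCC quantizer itself, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Position = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def quantize_positions(points: Iterable[Position], step: float) -> List[Voxel]:
    """Map each position to an integer grid cell and drop duplicate cells."""
    if step <= 0:
        raise ValueError("step must be positive")
    voxels = {tuple(int(round(c / step)) for c in p) for p in points}
    return sorted(voxels)


def dequantize(voxels: Iterable[Voxel], step: float) -> List[Position]:
    """Approximate reconstruction: scale grid indices back to coordinates."""
    return [tuple(v * step for v in voxel) for voxel in voxels]
```

A larger step yields fewer distinct cells and therefore less data to code, at the price of a larger position error after reconstruction.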
- the point cloud encoding unit may divide point cloud data into geometry (or geometry information) and attribute (attribute information) and encode them respectively. Also, the encoded geometry information and attribute information may be output as bitstreams, respectively. Output bitstreams may be multiplexed into one bitstream.
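The split into a geometry bitstream and an attribute bitstream, followed by multiplexing into one bitstream, can be sketched as follows. The byte layout here (little-endian floats, 8-bit colors, 4-byte length prefixes) is an assumption made for illustration and is unrelated to the actual G-PCC bitstream syntax; all function names are hypothetical.

```python
import struct
from typing import List, Tuple

Point = Tuple[float, float, float, int, int, int]  # x, y, z, red, green, blue


def encode_streams(points: List[Point]) -> Tuple[bytes, bytes]:
    """Separate points into a geometry stream and an attribute (color) stream."""
    geometry = b"".join(struct.pack("<3f", x, y, z) for x, y, z, *_ in points)
    attributes = b"".join(struct.pack("<3B", r, g, b) for *_, r, g, b in points)
    return geometry, attributes


def multiplex(geometry: bytes, attributes: bytes) -> bytes:
    """Join both streams into one bitstream using 4-byte length prefixes."""
    return (struct.pack("<I", len(geometry)) + geometry
            + struct.pack("<I", len(attributes)) + attributes)


def demultiplex(bitstream: bytes) -> Tuple[bytes, bytes]:
    """Inverse of multiplex: recover the geometry and attribute streams."""
    (glen,) = struct.unpack_from("<I", bitstream, 0)
    geometry = bitstream[4:4 + glen]
    (alen,) = struct.unpack_from("<I", bitstream, 4 + glen)
    attributes = bitstream[8 + glen:8 + glen + alen]
    return geometry, attributes
```

Keeping the two streams separable lets a receiver decode geometry first and then interpret the attributes against the restored positions.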
- the point cloud encoding unit may receive metadata. Metadata represents metadata related to content for Point Cloud. For example, there may be initial viewing orientation metadata.
- Metadata indicates whether the point cloud data indicates forward or backward data, and the like.
- The point cloud encoding unit may receive orientation information and/or viewport information, and may perform encoding based on the metadata, orientation information, and/or viewport information.
- A bitstream output from the point cloud encoding unit according to embodiments may include point cloud related metadata.
- the Point Cloud Encoding unit according to embodiments performs geometry compression, attribute compression, auxiliary data compression, and mesh data compression.
- Geometry compression encodes the geometry information of point cloud data. Geometry (or geometry information) represents points (or location information of each point) on a 3D space.
- Attribute compression encodes attributes of point cloud data. Attributes (or attribute information) represent attributes of each point (for example, attributes such as color and reflectance).
- Attribute compression can process one or more attributes of one or more points.
- Auxiliary data compression encodes auxiliary data associated with a point cloud.
- Auxiliary data represents metadata about a Point Cloud.
- Mesh data compression encodes Mesh data.
- Mesh data represents connection information between point clouds.
- Mesh data according to embodiments may include mesh data representing a triangular shape.
- the Point Cloud encoding unit encodes the geometry, attributes, auxiliary data, and mesh data related to the point, which are information necessary for rendering the point.
- the point cloud encoding unit can encode geometry, attributes, auxiliary data, and mesh data and transmit them as a single bitstream.
- The point cloud encoding unit may encode geometry, attribute, auxiliary data, and mesh data, respectively, and output one or more bitstreams or encoded data for transmitting the encoded data (for example, a geometry bitstream, an attribute bitstream, etc.).
- Each operation of the point cloud encoding unit may be performed in parallel.
- the file/segment encapsulation unit performs media track encapsulation and/or metadata track encapsulation.
- the file/segment encapsulation unit creates tracks to deliver encoded geometry (geometry information), encoded attributes, encoded auxiliary data, and encoded mesh data in a file format.
- a bitstream including encoded geometry (geometry information), a bitstream including encoded attributes, a bitstream including encoded Auxiliary data, and a bitstream including encoded Mesh data are assigned to one or more tracks.
- the file/segment encapsulation unit encapsulates geometry (geometry information), attributes, auxiliary data, and mesh data into one or more media tracks.
- the file/segment encapsulation unit includes metadata in a media track or encapsulates metadata in a separate metadata track.
- the file/segment encapsulation unit encapsulates the point cloud stream(s) in the form of files and/or segments. When the point cloud stream(s) is encapsulated in the form of segment(s) and delivered, it is delivered in DASH format.
- the file/segment encapsulation unit delivers the file when encapsulating the point cloud stream(s) in the form of a file.
- the delivery unit may deliver a point cloud bitstream or a file/segment including the corresponding bitstream to a receiving unit of a receiving device through a digital storage medium or a network. For transmission, processing according to any transmission protocol may be performed. Data that has been processed for transmission can be delivered through a broadcasting network and/or broadband. These data may be delivered to the receiving side in an on-demand manner. Digital storage media may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.
- the delivery unit may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcasting/communication network. The delivery unit receives orientation information and/or viewport information from the receiver.
- the delivery unit may deliver the acquired orientation information and/or viewport information (or information selected by the user) to the file/segment encapsulation unit and/or the point cloud encoding unit.
- the point cloud encoding unit may encode all point cloud data or encode point cloud data indicated by the orientation information and/or viewport information.
- the file/segment encapsulation unit may encapsulate all point cloud data or encapsulate point cloud data indicated by the orientation information and/or viewport information.
- the delivery unit may deliver all point cloud data or point cloud data indicated by the orientation information and/or the viewport information.
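Selecting the point cloud data indicated by the orientation and viewport information might look like the sketch below. It approximates the viewport as a cone around the viewing direction, which is an assumption; the text does not define the viewport geometry. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def points_in_viewport(points: Sequence[Position], eye: Position,
                       view_dir: Position, fov_deg: float) -> List[Position]:
    """Keep points lying within a cone of half-angle fov_deg / 2 around view_dir."""
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0:
        raise ValueError("view_dir must be non-zero")
    direction = [c / norm for c in view_dir]
    cos_limit = math.cos(math.radians(fov_deg) / 2)
    visible = []
    for p in points:
        offset = [pc - ec for pc, ec in zip(p, eye)]
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0:
            continue  # a point exactly at the eye position has no direction
        cos_angle = sum(a * b for a, b in zip(offset, direction)) / dist
        if cos_angle >= cos_limit:
            visible.append(p)
    return visible
```

The same filter could be applied at the encoding, encapsulation, or delivery stage, depending on where the orientation and viewport feedback is consumed.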
- FIG. 10 shows an example of a point cloud data processing device according to embodiments.
- FIG. 10 shows an example of a device performing an operation of receiving and processing point cloud data processed according to the G-PCC method.
- the device of FIG. 10 may process data in a manner corresponding to the manner described in FIG. 9 .
- The point cloud data processing device shown in FIG. 10 corresponds to or may be included in the UE described in FIGS. 1 to 10 (for example, the processor 911 or processor 921 described in FIG. 2, or the Sink described in FIG. 8 or the XR Media Processing block included in the Sink, etc.).
- The point cloud data processing device includes a delivery client, a sensing/tracking unit, a file/segment decapsulation unit, a point cloud decoding unit, and/or a point cloud rendering unit, and a display.
- Each configuration of the receiving device may be a module/unit/component/hardware/software/processor or the like.
- a delivery client may receive point cloud data, a point cloud bitstream, or a file/segment including the corresponding bitstream transmitted by the point cloud data processing device described in FIG. 9 .
- the device of FIG. 10 may receive point cloud data through a broadcasting network or point cloud data through a broadband.
- point cloud video data may be received through a digital storage medium.
- the device of FIG. 10 may perform a process of decoding the received data and rendering it according to the user's viewport.
- the apparatus of FIG. 10 may include a reception processing unit (eg, the processor 911 of FIG. 2 ) for processing the received point cloud data according to a transmission protocol.
- the reception processing unit may perform the reverse process of the transmission processing unit so as to correspond to processing for transmission performed on the transmission side.
- the reception processing unit can deliver the acquired point cloud data to the decapsulation processing unit, and the acquired point cloud related metadata to the metadata parser.
- a sensing/tracking unit obtains orientation information and/or viewport information.
- the sensing/tracking unit may transmit the obtained orientation information and/or viewport information to a delivery client, a file/segment decapsulation unit, and a point cloud decoding unit.
- Based on the orientation information and/or viewport information, the delivery client may receive all point cloud data or only the point cloud data indicated by the orientation information and/or viewport information.
- the file/segment decapsulation unit may decapsulate all point cloud data or point cloud data indicated by the orientation information and/or viewport information based on the orientation information and/or the viewport information.
- the point cloud decoding unit may decode all point cloud data or decode point cloud data indicated by the orientation information and/or viewport information, based on the orientation information and/or viewport information.
- the file/segment decapsulation unit performs media track decapsulation and/or metadata track decapsulation.
- the decapsulation processing unit (file/segment decapsulation) may decapsulate point cloud data in the form of a file received from the reception processing unit.
- the decapsulation processor may decapsulate files or segments according to ISOBMFF and the like to obtain a point cloud bitstream or point cloud related metadata (or a separate metadata bitstream).
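Decapsulation according to ISOBMFF starts with walking the top-level boxes of the file. The sketch below reads the standard box header (32-bit size followed by a four-character type, with the size values 1 and 0 handled as 64-bit size and "to end of data") and yields each box payload. The helper `make_box` and the toy payloads used with it are hypothetical and exist only to exercise the parser.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed box")
        yield box_type.decode("ascii"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload in a box with a 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload
```

A decapsulation unit would dispatch on the box type, handing media data to the point cloud decoder and metadata boxes to the metadata processing unit.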
- the acquired point cloud bitstream can be delivered to the point cloud decoder, and the acquired metadata (or metadata bitstream) related to the point cloud can be delivered to the metadata processing unit.
- a point cloud bitstream may contain metadata (metadata bitstream).
- the metadata processing unit may be included in the point cloud video decoder or configured as a separate component/module.
- Point cloud-related metadata acquired by the decapsulation processing unit may be in the form of a box or track in a file format.
- the decapsulation processing unit may receive metadata necessary for decapsulation from the metadata processing unit, if necessary. Metadata related to the point cloud may be transmitted to the point cloud decoder and used for the point cloud decoding procedure, or may be transmitted to the renderer and used for the point cloud rendering procedure.
- the Point Cloud Decoding unit performs geometry decompression, attribute decompression, auxiliary data decompression, and/or mesh data decompression.
- the Point Cloud decoder can decode data by receiving a bitstream and performing an operation corresponding to the operation of the Point Cloud encoder. In this case, the point cloud decoder can decode the point cloud data by dividing it into geometry and attributes as will be described later.
- A point cloud decoder can restore (decode) geometry from a geometry bitstream contained in the input bitstream, and can restore (decode) attribute values based on an attribute bitstream contained in the input bitstream and the restored geometry.
- the mesh may be restored (decoded) based on the mesh bitstream included in the input bitstream and the restored geometry.
- The point cloud can be restored by recovering the 3D position of each point and the attribute information of each point, based on the position information given by the restored geometry and the (color) texture attribute given by the decoded attribute values.
- Each operation of the point cloud decoding unit may be performed in parallel.
- Geometry decompression decodes the geometry data from the point cloud stream(s). Attribute decompression decodes attribute data from the point cloud stream(s). Auxiliary data decompression decodes Auxiliary data from point cloud stream(s). Mesh data decompression decodes the mesh data from the point cloud stream(s).
- The point cloud rendering unit restores the position of each point in the point cloud and the attributes of each point based on the decoded geometry, attribute, auxiliary data, and mesh data, and renders the corresponding point cloud data.
- the point cloud rendering unit generates and renders mesh (connection) data between point clouds based on the restored geometry, the restored attributes, the restored auxiliary data, and/or the restored mesh data.
- The point cloud rendering unit receives metadata from the file/segment decapsulation unit and/or the point cloud decoding unit.
- the point cloud rendering unit may render point cloud data based on metadata according to an orientation or a viewport.
- the device of FIG. 10 may include a display.
- a display according to embodiments may display a rendered result.
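The viewport-dependent behavior described above (receiving, decapsulating, or decoding only the point cloud data indicated by the orientation and/or viewport information) can be illustrated with a minimal sketch. The function name `select_points_in_viewport`, the cone-shaped viewport, and the `fov_degrees` parameter are assumptions made for illustration; the description does not specify how the indicated data is determined.

```python
import math


def select_points_in_viewport(points, view_origin, view_direction, fov_degrees):
    """Return the points that fall inside a viewing cone.

    Hypothetical helper: the viewport is modeled as a cone around
    view_direction with a full opening angle of fov_degrees.
    """
    norm = math.sqrt(sum(c * c for c in view_direction))
    if norm == 0:
        raise ValueError("view_direction must be non-zero")
    direction = tuple(c / norm for c in view_direction)
    cos_half_fov = math.cos(math.radians(fov_degrees) / 2.0)

    selected = []
    for point in points:
        offset = tuple(p - o for p, o in zip(point, view_origin))
        distance = math.sqrt(sum(c * c for c in offset))
        if distance == 0:
            selected.append(point)  # a point at the viewer position is kept
            continue
        # Cosine of the angle between the viewing direction and the point.
        cos_angle = sum(a * b for a, b in zip(offset, direction)) / distance
        if cos_angle >= cos_half_fov:
            selected.append(point)
    return selected


cloud = [(0, 0, 5), (0, 0, -5), (5, 0, 5), (1, 0, 10)]
visible = select_points_in_viewport(cloud, (0, 0, 0), (0, 0, 1), 60)
```

A receiver that processes all point cloud data simply skips this filtering step; a viewport-dependent receiver would apply a selection of this kind before delivery, decapsulation, or decoding.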
- FIG. 11 shows an example of a point cloud data processing device according to embodiments.
- The point cloud data processing apparatus shown in FIG. 11 may correspond to, or be included in, the UE described in FIGS. 1 to 8 (for example, the processor 911 or processor 921 described in FIG. 2, or the sink described in FIG. 6 or the XR Media Processing block included in the sink), or may correspond to a BS.
- The point cloud data processing apparatus includes a point cloud acquisition unit, a patch generation unit, a geometry image generation unit, an attribute image generation unit, an occupancy map generation unit, an auxiliary data generation unit, a mesh data generation unit, a video encoding unit, an image encoding unit, a file/segment encapsulation unit, and a delivery unit.
- Patch generation, geometry image generation, attribute image generation, occupancy map generation, auxiliary data generation, and mesh data generation may be referred to collectively as point cloud pre-processing, a pre-processor, or a controller.
- The video encoding unit includes geometry video compression, attribute video compression, occupancy map compression, auxiliary data compression, and mesh data compression.
- The image encoding unit includes geometry image compression, attribute image compression, occupancy map compression, auxiliary data compression, and mesh data compression.
- the file/segment encapsulation unit includes video track encapsulation, metadata track encapsulation, and image encapsulation.
- Each configuration of the transmission device may be a module/unit/component/hardware/software/processor or the like.
- The geometry, attribute, auxiliary data, and mesh data of the point cloud can each be configured as a separate stream or stored in different tracks in a file. Furthermore, they may be included in separate segments.
- the Point Cloud Acquisition unit acquires a point cloud.
- point cloud data may be acquired through a process of capturing, synthesizing, or generating a point cloud through one or more cameras.
- Point cloud data including the 3D position of each point (which can be expressed as x, y, z position values and is hereinafter referred to as geometry) and the properties of each point (color, reflectance, transparency, etc.) can be obtained and can be generated as, for example, a PLY (Polygon File Format, also known as the Stanford Triangle Format) file.
- PLY: Polygon File Format, also known as the Stanford Triangle Format.
- Point cloud-related metadata (for example, metadata related to capture) may be generated during acquisition.
- The patch generation unit (patch generator) generates patches from point cloud data.
- The patch generator converts point cloud data or point cloud video into one or more pictures/frames.
- a picture/frame may generally mean a unit representing one image in a specific time period.
- The points constituting the point cloud video may be divided into one or more patches and mapped onto a 2D plane. A patch is a set of points constituting the point cloud: points belonging to the same patch are adjacent to each other in 3D space and, in the process of mapping to a 2D image, are mapped in the same direction among the planes of the 6-sided bounding box. In this case, an occupancy map picture/frame can be generated, which is a binary map that indicates with a value of 0 or 1 whether data exists at the corresponding location on the 2D plane.
- A geometry picture/frame can be generated, which is a picture/frame in the form of a depth map that expresses the location information (geometry) of each point constituting the point cloud video in patch units.
- A texture picture/frame can be generated, which is a picture/frame that expresses the color information of each point constituting the point cloud video in patch units.
- metadata needed to reconstruct a point cloud from individual patches can be generated, which can include patch information such as the location and size of each patch in 2D/3D space.
- the patch can be used for 2D image mapping.
- point cloud data can be projected onto each face of a cube.
- A geometry image, one or more attribute images, an occupancy map, auxiliary data, and/or mesh data may be generated based on the generated patch.
- Geometry image generation, attribute image generation, occupancy map generation, auxiliary data generation, and/or mesh data generation are performed by the pre-processor or controller.
- Geometry Image Generation generates a geometry image based on a result of patch generation.
- a geometry represents a point in a three-dimensional space.
- A geometry image is generated using an occupancy map including information related to the 2D image packing of the patch, auxiliary data (patch data), and/or mesh data.
- the geometry image is related to information such as depth (e.g., near, far) of the patch generated after patch generation.
- Attribute Image Generation creates an attribute image.
- an attribute may indicate a texture.
- the texture may be a color value matched to each point.
- A plurality of (N) attribute images (for attributes such as color and reflectance), including textures, may be generated.
- a plurality of attributes may include material (information about a material), reflectance, and the like.
- an attribute may additionally include information that can change color by time and light, even in the same texture.
- Occupancy Map Generation generates an Occupancy Map from patches.
- The occupancy map includes information indicating whether data exists in a pixel of the corresponding geometry or attribute image.
- Auxiliary data generation generates auxiliary data including patch information. That is, auxiliary data represents metadata about the patches of a point cloud object. For example, information such as a normal vector for a patch may be indicated. Specifically, according to embodiments, auxiliary data may include information necessary for reconstructing a point cloud from patches (e.g., information about the position and size of a patch in 2D/3D space, projection (normal) identification information, patch mapping information, etc.).
- Mesh Data Generation generates mesh data from patches.
- Mesh represents connection information between adjacent points.
- For example, the mesh may represent triangular data.
- mesh data according to embodiments means connectivity information between points.
- The point cloud pre-processor or control unit generates metadata related to patch generation, geometry image generation, attribute image generation, occupancy map generation, auxiliary data generation, and mesh data generation.
- the point cloud transmission device performs video encoding and/or image encoding in response to the result generated by the pre-processor.
- the point cloud transmission device may generate point cloud image data as well as point cloud video data.
- Point cloud data may include only video data, only image data, or both video data and image data.
- The video encoding unit performs geometry video compression, attribute video compression, occupancy map compression, auxiliary data compression, and/or mesh data compression.
- the video encoding unit generates video stream(s) containing each encoded video data.
- geometry video compression encodes point cloud geometry video data.
- Attribute video compression encodes attribute video data in a point cloud.
- Auxiliary data compression encodes auxiliary data associated with point cloud video data.
- Mesh data compression encodes the mesh data of point cloud video data. Each operation of the point cloud video encoding unit may be performed in parallel.
- The image encoding unit performs geometry image compression, attribute image compression, occupancy map compression, auxiliary data compression, and/or mesh data compression.
- the image encoding unit generates image(s) including each encoded image data.
- geometry image compression encodes point cloud geometry image data.
- Attribute image compression encodes the attribute image data of a point cloud.
- Auxiliary data compression encodes auxiliary data associated with point cloud image data.
- Mesh data compression encodes mesh data associated with point cloud image data. Each operation of the point cloud image encoding unit may be performed in parallel.
- the video encoding unit and/or the image encoding unit may receive metadata from the pre-processor.
- the video encoding unit and/or the image encoding unit may perform each encoding process based on metadata.
- a file/segment encapsulation unit encapsulates video stream(s) and/or image(s) in the form of a file and/or segment.
- the file/segment encapsulation unit performs video track encapsulation, metadata track encapsulation, and/or image encapsulation.
- Video track encapsulation may encapsulate one or more video streams in one or more tracks.
- Metadata track encapsulation may encapsulate metadata related to a video stream and/or image into one or more tracks. Metadata includes data related to the content of point cloud data. For example, initial viewing orientation metadata may be included. According to embodiments, metadata may be encapsulated in a metadata track, or may be encapsulated together in a video track or an image track.
- Image encapsulation may encapsulate one or more images in one or more tracks or items.
- For example, four video streams and two images may be encapsulated in one file.
- the file/segment encapsulation unit may receive metadata from the pre-processor.
- the file/segment encapsulation unit may perform encapsulation based on metadata.
- the files and/or segments generated by the file/segment encapsulation are transmitted by the point cloud transmission device or transmission unit.
- segment(s) may be delivered based on a DASH-based protocol.
- the delivery unit may deliver a point cloud bitstream or a file/segment including the corresponding bitstream to a receiving unit of a receiving device through a digital storage medium or a network. For transmission, processing according to any transmission protocol may be performed. Data that has been processed for transmission can be delivered through a broadcasting network and/or broadband. These data may be delivered to the receiving side in an on-demand manner. Digital storage media may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD.
- the delivery unit may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcasting/communication network. The delivery unit receives orientation information and/or viewport information from the receiver.
- the delivery unit may deliver the acquired orientation information and/or viewport information (or information selected by the user) to a pre-processor, a video encoding unit, an image encoding unit, a file/segment encapsulation unit, and/or a point cloud encoding unit.
- the point cloud encoding unit may encode all point cloud data or encode point cloud data indicated by the orientation information and/or viewport information.
- the file/segment encapsulation unit may encapsulate all point cloud data or encapsulate point cloud data indicated by the orientation information and/or viewport information.
- the delivery unit may deliver all point cloud data or point cloud data indicated by the orientation information and/or the viewport information.
- the pre-processor may perform the above-described operation for all point cloud data or for point cloud data indicated by orientation information and/or viewport information.
- the video encoding unit and/or the image encoding unit may perform the above-described operation for all point cloud data or may perform the above-described operation for point cloud data indicated by orientation information and/or viewport information.
- the file/segment encapsulation unit may perform the above-described operation for all point cloud data or for point cloud data indicated by orientation information and/or viewport information.
- the transmitter may perform the above-described operation for all point cloud data or for point cloud data indicated by orientation information and/or viewport information.
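The relationship between a patch, its geometry image (depth map), and its occupancy map described for FIG. 11 can be sketched as follows. This is a simplified illustration under stated assumptions: the hypothetical function `project_patch` projects along the z axis only, expects integer coordinates already expressed in patch-local space, and keeps only the nearest depth per pixel. Patch segmentation, packing, and the actual codec are outside its scope.

```python
def project_patch(points, width, height):
    """Project integer (x, y, z) points along z onto a width x height image.

    Returns (occupancy, depth):
      occupancy[y][x] is 1 where at least one point maps to the pixel, else 0.
      depth[y][x] holds the smallest z among the points mapped to the pixel
      (assumption: only the near layer is kept).
    """
    occupancy = [[0] * width for _ in range(height)]
    depth = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if not (0 <= x < width and 0 <= y < height):
            continue  # the point falls outside this patch image
        if occupancy[y][x] == 0 or z < depth[y][x]:
            depth[y][x] = z
        occupancy[y][x] = 1
    return occupancy, depth


patch_points = [(0, 0, 3), (1, 0, 7), (1, 0, 2), (2, 1, 5)]
occupancy_map, geometry_image = project_patch(patch_points, width=3, height=2)
```

In this sketch the occupancy map is what lets a receiver tell a real depth of 0 apart from an empty pixel, which is why it accompanies the geometry and attribute images.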
- FIG. 12 shows an example of a point cloud data processing device according to embodiments.
- the point cloud data processing apparatus shown in FIG. 12 may process data in a method corresponding to the method described in FIG. 11 .
- The point cloud data processing apparatus shown in FIG. 12 may correspond to, or be included in, the UE described in FIGS. 1 to 8 (for example, the processor 911 or processor 921 described in FIG. 2, a processor that processes higher layer data, or the Sink described in FIG. 6 or the XR Media Processing block included in the Sink).
- The point cloud data processing apparatus includes a delivery client, a sensing/tracking unit, a file/segment decapsulation unit, a video decoding unit, an image decoding unit, a point cloud processing unit and/or a point cloud rendering unit, and a display.
- the video decoding unit performs geometry video decompression, attribute video decompression, occupancy map decompression, auxiliary data decompression, and/or mesh data decompression.
- the image decoding unit performs geometry image decompression, attribute image decompression, occupancy map decompression, auxiliary data decompression, and/or mesh data decompression.
- Point cloud processing includes Geometry Reconstruction and Attribute Reconstruction.
- a delivery client may receive point cloud data, a point cloud bitstream, or a file/segment including the corresponding bitstream transmitted by the point cloud data processing apparatus according to the embodiments of FIG. 13 .
- the device of FIG. 14 may receive point cloud data through a broadcasting network or point cloud data through a broadband.
- point cloud video data may be received through a digital storage medium.
- the device of FIG. 14 may perform a process of decoding the received data and rendering it according to the user's viewport.
- the device of FIG. 14 may include a reception processing unit (eg, the processor 911 of FIG. 2 ) although not shown in the drawing.
- the reception processing unit according to embodiments may perform processing according to a transport protocol on the received point cloud data.
- the receiving processing unit may perform the reverse process of the above-described transmission processing unit so as to correspond to processing for transmission performed on the transmission side.
- the reception processing unit can deliver the acquired point cloud data to the decapsulation processing unit, and the acquired point cloud related metadata to the metadata parser.
- a sensing/tracking unit obtains orientation information and/or viewport information.
- the sensing/tracking unit may transmit the obtained orientation information and/or viewport information to a delivery client, a file/segment decapsulation unit, and a point cloud decoding unit.
- Based on the orientation information and/or viewport information, the delivery client may receive either all point cloud data or only the point cloud data indicated by that information.
- Based on the orientation information and/or viewport information, the file/segment decapsulation unit may decapsulate either all point cloud data or only the point cloud data indicated by that information.
- Based on the orientation information and/or viewport information, the point cloud decoding unit (video decoding unit and/or image decoding unit) may decode either all point cloud data or only the point cloud data indicated by that information.
- the point cloud processing unit may process all point cloud data or process point cloud data indicated by orientation information and/or viewport information.
- the file/segment decapsulation unit performs video track decapsulation, metadata track decapsulation, and/or image decapsulation.
- the decapsulation processing unit may decapsulate point cloud data in the form of a file received from the reception processing unit.
- the decapsulation processor may decapsulate files or segments according to ISOBMFF and the like to obtain a point cloud bitstream or point cloud related metadata (or a separate metadata bitstream).
- the acquired point cloud bitstream can be delivered to the point cloud decoder, and the acquired metadata (or metadata bitstream) related to the point cloud can be delivered to the metadata processing unit.
- a point cloud bitstream may contain metadata (metadata bitstream).
- the metadata processing unit may be included in the point cloud video decoder or configured as a separate component/module.
- Point cloud-related metadata acquired by the decapsulation processing unit may be in the form of a box or track in a file format.
- the decapsulation processing unit may receive metadata necessary for decapsulation from the metadata processing unit, if necessary. Metadata related to the point cloud may be transmitted to the point cloud decoder and used for the point cloud decoding procedure, or may be transmitted to the renderer and used for the point cloud rendering procedure.
- the file/segment decapsulation unit may generate metadata related to point cloud data.
- Video track decapsulation decapsulates video tracks contained in files and/or segments. It decapsulates video stream(s) including geometry video, attribute video, occupancy map, auxiliary data, and/or mesh data.
- Metadata track decapsulation decapsulates a bitstream including metadata and/or additional data related to point cloud data.
- Image decapsulation decapsulates image(s) including geometry image, attribute image, occupancy map, auxiliary data, and/or mesh data.
- The video decoding unit performs geometry video decompression, attribute video decompression, occupancy map decompression, auxiliary data decompression, and/or mesh data decompression.
- the video decoding unit decodes geometry video, attribute video, auxiliary data, and/or mesh data in response to a process performed by the video encoding unit of the point cloud transmission device according to embodiments.
- The image decoding unit performs geometry image decompression, attribute image decompression, occupancy map decompression, auxiliary data decompression, and/or mesh data decompression.
- the image decoding unit decodes the geometry image, the attribute image, the auxiliary data and/or the mesh data in response to the process performed by the image encoding unit of the point cloud transmission device according to the embodiments.
- the video decoding unit and/or the image decoding unit may generate metadata related to video data and/or image data.
- the point cloud processing unit performs geometry reconstruction and/or attribute reconstruction.
- Geometry reconstruction reconstructs a geometry video and/or a geometry image from decoded video data and/or decoded image data based on an occupancy map, auxiliary data, and/or mesh data.
- Attribute reconstruction reconstructs an attribute video and/or an attribute image from a decoded attribute video and/or a decoded attribute image based on an occupancy map, auxiliary data, and/or mesh data.
- the attribute may be a texture.
- an attribute may mean a plurality of attribute information.
- the point cloud processing unit may receive metadata from the video decoding unit, the image decoding unit and/or the file/segment decapsulation unit, and process the point cloud based on the metadata.
- the point cloud rendering unit renders the reconstructed point cloud.
- the point cloud rendering unit may receive metadata from the video decoding unit, the image decoding unit and/or the file/segment decapsulation unit, and render the point cloud based on the metadata.
- the device of FIG. 12 may include a display.
- a display according to embodiments may display a rendered result.
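The geometry and attribute reconstruction steps described for FIG. 12 can be illustrated with a minimal sketch that rebuilds points from a decoded geometry image, a decoded attribute (texture) image, and an occupancy map. The function `reconstruct_patch`, the z-axis projection, and the `patch_offset` argument (standing in for patch position carried in auxiliary data) are assumptions for illustration only.

```python
def reconstruct_patch(occupancy, depth, texture, patch_offset=(0, 0, 0)):
    """Rebuild (position, color) pairs for one patch.

    occupancy, depth and texture are 2D lists of equal size. A pixel
    contributes a point only where occupancy is 1. patch_offset is a
    hypothetical 3D translation of the patch.
    """
    ox, oy, oz = patch_offset
    points = []
    for v, row in enumerate(occupancy):
        for u, occupied in enumerate(row):
            if occupied:
                position = (u + ox, v + oy, depth[v][u] + oz)
                points.append((position, texture[v][u]))
    return points


occupancy_map = [[1, 1, 0], [0, 0, 1]]
geometry_image = [[3, 2, 9], [0, 0, 5]]
texture_image = [["red", "green", "x"], ["x", "x", "blue"]]
restored = reconstruct_patch(occupancy_map, geometry_image, texture_image)
```

Pixels whose occupancy value is 0 are ignored even if the decoded images contain values there, which matches the role of the occupancy map as the indicator of where data exists.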
- FIG. 13 shows an example of a point cloud data processing device according to embodiments.
- An apparatus for processing point cloud data may include a data input unit 12000, a quantization processing unit 12001, a voxelization processing unit 12002, an octree occupancy code generation unit 12003, a surface model processing unit 12004, an intra/inter coding processing unit 12005, an arithmetic coder 12006, a metadata processing unit 12007, a color conversion processing unit 12008, an attribute conversion processing unit 12009, a prediction/lifting/RAHT conversion processing unit 12010, an arithmetic coder 12011, and/or a transmission processing unit 12012.
- the data input unit 12000 receives or acquires point cloud data.
- the data input unit 12000 may correspond to the point cloud acquisition unit 10001 of FIG. 1 according to embodiments.
- the quantization processing unit 12001 quantizes the geometry of point cloud data, for example, position value information of points.
- The voxelization processing unit 12002 according to embodiments voxelizes the position value information of the quantized points.
- the octree occupancy code generation unit 12003 may represent position value information of voxelized points as an octree based on an octree occupancy code.
- The surface model processing unit 12004 may express and process an octree for the position value information of the points of a point cloud based on a surface model method.
- the intra/inter coding processing unit 12005 may intra/inter code the point cloud data.
- the arithmetic coder 12006 may encode point cloud data based on an arithmetic coding scheme.
- the metadata processing unit 12007 processes metadata related to point cloud data, eg, setting values, and provides them to a necessary process such as a geometry encoding process and/or an attribute encoding process. Also, the metadata processing unit 12007 according to embodiments may generate and/or process signaling information related to geometry encoding and/or attribute encoding. Signaling information according to embodiments may be encoded separately from geometry encoding and/or attribute encoding. Also, signaling information according to embodiments may be interleaved.
- the color conversion processing unit 12008 may convert colors of point cloud data based on attributes of the point cloud data, for example, attribute value information and/or reconstructed position values of points.
- the attribute conversion processing unit 12009 may convert attribute values of point cloud data.
- the prediction/lifting/RAHT conversion processing unit 12010 may attribute-code point cloud data based on a combination of a prediction method, a lifting method, and/or a RAHT method.
- the arithmetic coder 12011 may encode point cloud data based on an arithmetic coding scheme.
- The transmission processing unit 12012 may transmit each bitstream including the encoded geometry information and/or encoded attribute information and the metadata information separately, or may configure the encoded geometry information and/or encoded attribute information and the metadata information as one bitstream and transmit it.
- When the encoded geometry information and/or encoded attribute information and the metadata information according to embodiments are configured as one bitstream, the bitstream may include one or more sub-bitstreams.
- A bitstream may include signaling information and slice data. The signaling information includes a Sequence Parameter Set (SPS) for signaling at the sequence level, a Geometry Parameter Set (GPS) for signaling of geometry information coding, an Attribute Parameter Set (APS) for signaling of attribute information coding, and a Tile Parameter Set (TPS) for signaling at the tile level.
- Slice data may include information about one or more slices.
- One slice according to embodiments may include one geometry bitstream Geom00 and one or more attribute bitstreams Attr00 and Attr10.
- a TPS according to embodiments may include information about each tile (for example, coordinate value information and height/size information of a bounding box) for one or more tiles.
- a geometry bitstream may include a header and a payload.
- The header of the geometry bitstream may include identification information (geom_geom_parameter_set_id) of a parameter set included in the GPS, a tile identifier (geom_tile_id), a slice identifier (geom_slice_id), and information about the data included in the payload.
- the metadata processing unit 12007 may generate and/or process signaling information and transmit it to the transmission processing unit 12012.
- The process for the position values of points and the process for the attribute values of points may each be performed while sharing data/information with each other.
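The geometry path of FIG. 13 (quantization, voxelization, and octree occupancy code generation) can be sketched as below. This is an illustrative sketch, not the G-PCC reference procedure: the function names, the `scale` parameter, the breadth-first ordering, and the child-index bit layout (x in bit 2, y in bit 1, z in bit 0) are all assumptions, and the arithmetic coding that would follow is omitted.

```python
def quantize_and_voxelize(points, scale):
    """Quantize float positions to integer voxel coordinates and merge duplicates."""
    mins = [min(p[i] for p in points) for i in range(3)]
    voxels = {
        tuple(int(round((p[i] - mins[i]) * scale)) for i in range(3))
        for p in points
    }
    return sorted(voxels)


def octree_occupancy_codes(voxels, depth):
    """Return breadth-first 8-bit occupancy codes for a cube of side 2**depth."""
    side = 1 << depth
    for voxel in voxels:
        if any(not 0 <= c < side for c in voxel):
            raise ValueError("voxel outside the octree bounding cube")
    codes = []
    nodes = [((0, 0, 0), list(voxels))]  # (node origin, voxels inside the node)
    for level in range(depth):
        half = 1 << (depth - level - 1)  # side length of each child node
        next_nodes = []
        for (ox, oy, oz), members in nodes:
            children = [[] for _ in range(8)]
            for x, y, z in members:
                index = (
                    ((x >= ox + half) << 2)
                    | ((y >= oy + half) << 1)
                    | (z >= oz + half)
                )
                children[index].append((x, y, z))
            code = 0
            for index, child in enumerate(children):
                if child:
                    code |= 1 << index  # mark this child as occupied
                    origin = (
                        ox + half * ((index >> 2) & 1),
                        oy + half * ((index >> 1) & 1),
                        oz + half * (index & 1),
                    )
                    next_nodes.append((origin, child))
            codes.append(code)
        nodes = next_nodes
    return codes


raw = [(0.0, 0.0, 0.0), (0.26, 0.0, 0.0), (0.24, 0.0, 0.0), (1.0, 1.0, 1.0)]
voxels = quantize_and_voxelize(raw, scale=4)
codes = octree_occupancy_codes(voxels, depth=3)
```

Each code describes which of a node's eight children contain at least one voxel, so empty regions of space cost nothing beyond a zero bit in the parent's code.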
- FIG. 14 shows an example of a point cloud data processing device according to embodiments.
- FIG. 14 shows an example of a device that performs a point cloud data processing operation according to the G-PCC scheme described in FIG. 10 .
- the point cloud data processing apparatus shown in FIG. 14 may perform the reverse process of the point cloud data processing apparatus described in FIG. 13 .
- An apparatus for processing point cloud data includes a receiving unit 13000, a reception processing unit 13001, an arithmetic decoder 13002, an occupancy-code-based octree reconstruction processing unit 13003, a surface model processing unit (triangle reconstruction, up-sampling, voxelization) 13004, an inverse quantization processing unit 13005, a metadata parser 13006, an arithmetic decoder 13007, an inverse quantization processing unit 13008, a prediction/lifting/RAHT inverse transformation processing unit 13009, a color inverse transformation processing unit 13010, and/or a renderer 13011.
- the receiving unit 13000 receives point cloud data.
- the reception processing unit 13001 may obtain a geometry bitstream and/or an attribute bitstream included in the received point cloud data, metadata including signaling information, and the like.
- the arithmetic decoder 13002 may decode a geometry bitstream based on an arithmetic scheme.
- the octree reconstruction processing unit 13003 based on the occupancy code may reconstruct the decoded geometry into an octree based on the occupancy code.
- The surface model processing unit 13004 may perform triangle reconstruction, up-sampling, voxelization, and/or a combination thereof on point cloud data based on a surface model method.
- the inverse quantization processing unit 13005 may inverse quantize point cloud data.
- the metadata parser 13006 may parse metadata included in the received point cloud data, for example, setting values.
- the metadata parser 13006 can pass metadata to each of the geometry decoding process and/or attribute decoding process. Each process according to embodiments may be performed based on necessary metadata.
- the arithmetic decoder 13007 may decode an attribute bitstream of point cloud data in an arithmetic manner based on a reconstructed position value.
- the inverse quantization processing unit 13008 may inverse quantize point cloud data.
- the prediction/lifting/RAHT inverse transform processing unit 13009 may process point cloud data based on a prediction/lifting/RAHT method and/or a method according to a combination thereof.
- the inverse color transformation processing unit 13010 may inversely transform color values of point cloud data.
- the renderer 13011 may render point cloud data.
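The reverse geometry path of FIG. 14 (octree reconstruction from occupancy codes followed by inverse quantization) can be sketched as follows. The sketch assumes the occupancy codes have already been arithmetic-decoded, are listed breadth-first, and use a child-index layout with x in bit 2, y in bit 1, and z in bit 0; these conventions and the function names are assumptions for illustration, not taken from the description.

```python
def decode_octree(codes, depth):
    """Rebuild voxel coordinates from breadth-first 8-bit occupancy codes."""
    code_iter = iter(codes)
    nodes = [(0, 0, 0)]  # origins of the nodes at the current level
    for level in range(depth):
        half = 1 << (depth - level - 1)  # side length of each child node
        next_nodes = []
        for ox, oy, oz in nodes:
            code = next(code_iter)
            for index in range(8):
                if code & (1 << index):
                    next_nodes.append((
                        ox + half * ((index >> 2) & 1),
                        oy + half * ((index >> 1) & 1),
                        oz + half * (index & 1),
                    ))
        nodes = next_nodes
    return nodes


def inverse_quantize(voxels, scale, offset=(0.0, 0.0, 0.0)):
    """Map integer voxel coordinates back to approximate float positions."""
    return [tuple(v[i] / scale + offset[i] for i in range(3)) for v in voxels]


decoded_voxels = decode_octree([129, 1, 128], depth=2)
positions = inverse_quantize(decoded_voxels, scale=4)
```

The recovered positions are only approximations of the original coordinates, because quantization on the transmitting side discards precision that inverse quantization cannot restore.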
- FIG. 15 shows a transmission structure for a UE on a visited network according to embodiments.
- 3GPP: The 3rd Generation Partnership Project.
- the Multimedia Division establishes and distributes standards for transmitting and receiving media by defining protocols related to media codecs.
- the definition of media and transmission scenarios covers a wide range. This includes cases in which mobile/fixed reception is performed by a personal computer or portable receiver along with Radio Access and Internet-based technology.
- This wide-ranging standardization in 3GPP has enabled ubiquitous multimedia services that cover various users and use cases, and allows users to quickly experience high-quality media anytime, anywhere.
- media services are classified according to their unique characteristics and are divided into Conversational, Streaming, and other services according to the target application. Conversational Service is extended from Session Initiation Protocol (SIP) based telephone service network.
- SIP: Session Initiation Protocol.
- the Multimedia Telephony Service for the IP Multimedia Subsystem aims at a low-latency real-time conversation service.
- Streaming service delivers real-time or re-acquired content in unicast based on the Packet-switched Streaming Service (PSS).
- PSS: Packet-switched Streaming Service.
- Broadcasting services within the PSS system can provide mobile TV through Multimedia Broadcast/Multicast Service (MBMS).
- MBMS: Multimedia Broadcast/Multicast Service.
- 3GPP provides Messaging or reality services.
- the three basic services described above are continuously being revised or updated in order to satisfy the highest possible user experience, and provide scalability so that they can be mutually compatible with available network resources or existing standards.
- Media includes video codec, voice, audio, image, graphic, and text corresponding to each service.
- The IP Multimedia Subsystem (IMS) uses SIP, a protocol standardized by the Internet Engineering Task Force (IETF), as its basic protocol and manages multimedia sessions efficiently through it.
- The Multimedia Telephony Service for IMS (MTSI) includes not only Signaling, Transport, Jitter Buffer Management, Packet-Loss Handling, and Adaptation, but also Adding/Dropping Media During Call, so that predictable media can be created, transmitted, and received.
- MTSI uses the 3GPP network: NR, LTE, and HSPA are connected to IMS, and Wi-Fi and Bluetooth are also connected by extension.
- MTSI exchanges data negotiation messages over the existing IMS network, and data is transmitted between users once that exchange is complete. The IMS network can therefore be used as is, and MTSI additionally defines only Audio Encoder/Decoder, Video Encoder/Decoder, Text, Session Setup and Control, and Data Channel.
- Data Channel Capable MTSI provides an enabling channel to support media transmission and uses the Stream Control Transmission Protocol (SCTP) over Datagram Transport Layer Security (DTLS) and Web Real-Time Communication (WebRTC).
- SCTP is used to provide security services between the Network Layer and the Transport Layer of TCP. Because it extends the existing platform, it defines Media Control Data as well as Media Control and Media Codec for media management, and general control is handled by Media Streaming Setup through SIP/SDP. Since Setup/Control is passed between clients, Adding/Dropping of media is also included. MTSI also includes IMS Messaging, a non-conversational service. Media is carried over 3GPP Layer 2 using the Packet Data Convergence Protocol (PDCP). PDCP delivers IP packets from the client to the base station and generally handles user plane data, control plane data, header compression, and ciphering/protection.
- FIG. 15 is a transmission structure in which a call session can be transferred between two UEs existing in an arbitrary visited network, where User Equipment (UE) A and B exist.
- UE A/B may exist in operator A or B or the same network, and it is assumed that four other networks exist to describe the entire system of MTSI.
- UE A and B perform session establishment to transmit media within the IMS system. After the session is established, UE A and B transmit media through the IP network.
- The main function of IMS is the Call State Control Function (CSCF), which manages multimedia sessions using SIP.
- Each CSCF plays the role of server or proxy and performs different types of functions according to each purpose.
- Proxy CSCF acts as a SIP proxy server.
- The P-CSCF receives all SIP messages, analyzes them internally, and forwards them toward the UE that is their destination.
- P-CSCF can perform resource management and is closely connected to the gateway of the network.
- The gateway is associated with the IP access bearer, General Packet Radio Service (GPRS).
- Although GPRS is a second-generation wireless system, it is linked with the basic functions that support packet-switched (PS) services.
- P-CSCF and GPRS must be in the same network.
- UE A exists in an arbitrary Visited Network, and both UE A and the P-CSCF exist within that network.
- S-CSCF Serving CSCF
- HSS Home Subscriber Server
- the S-CSCF can receive the message and connect to another CSCF in the vicinity or connect to the Application Server (AS) and forward the SIP message to another AS.
- Interrogating CSCF (I-CSCF) performs the same proxy server function as P-CSCF, but is connected to an external network.
- the process of encrypting SIP messages can be performed by observing network availability and network configuration.
- HSS is a central data server that contains user-related information.
- the Subscriber Location Function (SLF) represents an information map linking a user's address to a corresponding HSS.
- the Multimedia Resource Function (MRF) includes multimedia resources in the home network. MRF consists of Multimedia Resource Function Controller (MRFC) and Multimedia Resource Function Processor (MRFP).
- The MRFC is the control plane of the MRF and manages stream resources within the MRFP.
- The Breakout Gateway Control Function (BGCF) is a SIP server connected to the Public-Switched Telephone Network (PSTN) or a Communication Server (CS), and represents a gateway that transmits SIP messages.
- MGCF Media Gateway Control Function
- MGW Media Gateway
- FIG. 16 illustrates a call connection between UEs according to embodiments.
- An IMS-based network requires an environment in which IP connection is possible, and the IP connection is performed in the Home Network or the Visited Network.
- When an IP connection is established, an interactive environment, which is a detailed element of XR, is formed. The transmitted data is virtual reality data such as 360 Video, G-PCC (Geometry-based Point Cloud Compression), or V-PCC (Video-based Point Cloud Compression) data; either information in which the data is compressed is exchanged, or the data itself is transmitted. XR data can be subdivided into two areas and delivered.
- When transmission is based on the MTSI standard, the AS transfers the Call/Hold/Resume method through Route Control Plane signaling using the CSCF mechanism and performs a third-party call connection.
- When a call connection is performed, media is simply transmitted between UE A and UE B; when two UEs exist, MTSI operates within the IMS network as shown in FIG. 16.
- FIG. 17 shows an apparatus for transmitting and receiving point cloud data according to embodiments.
- the video encoder and audio encoder may correspond to the XR device 100c, the encoding in FIG. 8 (S1520), and the point cloud encoder in FIG. 9, FIG. 11, and FIG. 13.
- the video decoder and the audio decoder may correspond to the XR device 100c, the decoding in FIG. 8 (S1540), the point cloud decoder in FIG. 10, FIG. 12, and FIG. 14.
- MTSI limits the relevant elements and connection points of Client Terminals within the IMS network. Therefore, the scope of the configuration is defined as shown in FIG. 17.
- the determination of the physical interaction of synchronization related to the speaker, display, user interface, microphone, camera, and keyboard is not discussed in MTSI.
- the area within box 170 determines the scope of how to control media or related media.
- Transmitting SIP corresponds to IMS, so MTSI does not include a part that controls specific SIP. The scope of MTSI and IMS can therefore be determined according to the data structure, delivery method, and service definition. If a service is defined like MTSI, it can be defined as a standard within the following scope.
- RFC 4566-based SDP and SDP Capability Negotiation must be used and related Streaming Setup must be used.
- the transmission medium that transmits media must comply with not only Coded Media (to which Transport Protocol is applied) but also Packet-based Network Interface.
- the method of transmitting data uses RTP Stream of RFC 3550, and SCTP (RFC 4960) or WebRTC Data Channel can be used for Data Channel.
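- As an illustration of the requirements above, the following minimal sketch builds an RFC 4566-style SDP offer containing one RTP media stream and one SCTP-based data channel. The function name `build_sdp_offer`, the addresses and ports, and the use of H.265 as a video payload that could carry video-based point cloud components are assumptions made for this example; the embodiments do not specify a payload format. DTLS attributes such as fingerprints are omitted for brevity.

```python
def build_sdp_offer(ip: str, video_port: int, bandwidth_kbps: int) -> str:
    """Build a minimal SDP offer (illustrative only, not a complete MTSI offer)."""
    lines = [
        "v=0",                               # protocol version
        f"o=- 0 0 IN IP4 {ip}",              # origin: username, session id, version, address
        "s=XR conversational session",       # session name (hypothetical)
        f"c=IN IP4 {ip}",                    # connection address for all media
        "t=0 0",                             # unbounded session time
        f"m=video {video_port} RTP/AVP 96",  # RTP stream (RFC 3550) with dynamic payload 96
        f"b=AS:{bandwidth_kbps}",            # application-specific bandwidth in kbit/s
        "a=rtpmap:96 H265/90000",            # assumed codec for the video component
        "a=sendrecv",                        # two-way conversational media
        "m=application 9 UDP/DTLS/SCTP webrtc-datachannel",  # data channel over SCTP/DTLS
        "a=sctp-port:5000",                  # SCTP port used by the data channel
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_sdp_offer("192.0.2.1", 49170, 5000))
```

- In an Offer/Answer exchange, the answering terminal would return a similar description that keeps only the media lines it can support.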
- Devices for transmitting and receiving point cloud data may include any device such as a mobile phone, desktop, or AR glasses. Assuming a mobile phone, the device has a speaker, display, user interface, microphone, camera, and keyboard, and the input signals can be transmitted to the encoding/decoding block.
- Methods/operations according to embodiments may be processed by the video encoder of FIG. 17 and can be linked with software.
- the G-PCC structure call flow may be included in a session setup & control part.
- Each component of FIG. 17 may correspond to hardware, software, processor, and/or a combination thereof.
- An apparatus for transmitting and receiving point cloud data may support IP connection.
- The XR range exists in a RAN (Radio Access Network) such as UMTS (Universal Mobile Telecommunications System) and in Visited Networks such as the SGSN (Serving GPRS Support Node) and GGSN (Gateway GPRS Support Node), so roaming service and IP connectivity scenarios should be considered. If IP connectivity is to be considered, IP service must be provided even where the IMS network does not exist, and GPRS (General Packet Radio Service) roaming must also be connected to the home network. If an IMS-based network is provided, End-to-End QoS (Quality of Service) must be provided to maintain IP connectivity.
- QoS requirements are generally signaled using SIP (Session Initiation Protocol) to define, change, or terminate a session, and the following information can be delivered: type of media, direction of traffic (upward or downward), bitrate, packet size, packet transport frequency, RTP payload, and bandwidth adaptation.
- An apparatus for transmitting and receiving point cloud data may perform IP policy control/secure communication.
- the Policy Control Element can activate a bearer suitable for media traffic through a SIP message, and prevents the operator from using bearer resources incorrectly.
- the IP address and bandwidth of transmission and reception can also be adjusted equally at the bearer level.
- a start or stop point of media traffic can be set using a policy control element, and problems related to synchronization can be solved.
- An acknowledgment message can be transmitted through the IP network using the Policy Control Element, and the Bearer service can be modified, stopped, or terminated.
- Privacy can be requested for the security of the UE.
- An apparatus for transmitting and receiving point cloud data may be associated with other networks.
- For any type of terminal, the IMS network should be able to connect as many users and networks as possible. These can include PSTN or ISDN users as well as mobile and Internet users.
- the entity visiting the Visited Network provides service and control information for the user and performs Registration/Session Establishment within the Internet network. In this way, if it exists in the Visited Network, service control restrictions occur, and considerations arise according to multiple roaming model scenarios.
- the quality may deteriorate due to the service speed of the Visited Network.
- If a role such as security or charging is added, the area of service control and the execution method for the Home Network/Visited Network should be considered.
- the 3GPP standard defines the architecture layered in the IMS network. Therefore, Transport/Bearer are defined separately.
- the application plane generally covers the scope of the application server, the control plane can be divided into HSS, CSCF, BGCF, MRFC, MRFP, SGW, SEG, and the user plane can be divided into SGSN, GGSN, IM-MGW, etc.
- FIG. 18 shows an architecture for XR communication on a 5G network according to embodiments.
- An apparatus for transmitting and receiving point cloud data may efficiently perform XR communication based on a communication network, as shown in FIG. 18.
- Real-time two-way point cloud conversation over 5G networks can be achieved using three methods: 1) point cloud data exchange using the IMS telephone network; 2) point cloud data streaming using the 5GMS media network; and 3) a web-based media transmission method using WebRTC. Therefore, it is necessary to define an XR interactive service scenario to transmit data. Scenarios can be delivered in various forms, and can be divided into the process of acquiring data, the process of all end-to-end services using the 5G network, and the composition of scenarios.
- To proceed with an XR teleconference, the application must be downloaded in advance.
- A built-in or downloadable application program is required. This program can transmit data by selecting 1) the telephone network, 2) the media network, or 3) the web network as the transmission type for data sent over 5G.
- When the program is installed, it checks the general device access permissions and the account personal-information permissions to verify the basic environment for sending and receiving data.
- The required equipment includes point cloud capture equipment with a receiving device and a transmitting device for exchanging data with the other party, a converter capable of converting two-dimensional data into 3D, or any video input capable of transmitting or converting data in 3D over 360 degrees.
- For voice data, a built-in microphone or speaker is required, and the hardware capability to process point cloud data at a minimum level is also checked.
- Hardware includes GPU/CPU functions that can perform Pre-Rendering or Post-Rendering, and may include hardware capacity and memory size performed during processing.
- Personal information includes items that can additionally deliver real-time information about users, such as account information for accessing applications, IP addresses, and cookies, and consent for its use is obtained before transmission.
- FIG. 19 shows a structure for XR communication according to embodiments.
- An identifier for user authentication and for distinguishing users is created.
- Users are distinguished by e-mail or by ID and password, and the authenticated user's tag is generated automatically.
- A guide mode may be provided so that a first-time user can effectively exchange point cloud data or use the system.
- How the field of view is accessed can be determined. If the device can directly capture or receive a point cloud, data can be transmitted and received as is. If a point cloud is received using an HMD, scaling or transformation suitable for a 360-degree environment must be performed first.
- If the receiving display is a 2D display based on a commonly used mobile phone or monitor rather than a device that receives 3D data, it should be able to faithfully express 3D within the 2D screen.
- For example, by rotating or enlarging the screen with a finger, a 3D image can be rendered or inspected on a 2D display.
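- The rotation and enlargement described above can be sketched as follows. This is a minimal illustration that assumes an orthographic projection, a rotation about the vertical (Y) axis only, and a uniform zoom factor; the function name `project_points` and its parameters are hypothetical and are not defined by the embodiments.

```python
import math


def project_points(points, yaw_deg=0.0, zoom=1.0):
    """Rotate 3D points about the Y axis, scale them, and drop depth to get 2D screen points."""
    yaw = math.radians(yaw_deg)
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points:
        # Rotation about the vertical axis: a horizontal finger drag changes yaw_deg.
        x_rot = cos_yaw * x + sin_yaw * z
        # Orthographic projection: depth is discarded; a pinch gesture changes zoom.
        projected.append((zoom * x_rot, zoom * y))
    return projected


if __name__ == "__main__":
    cloud = [(1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)]
    print(project_points(cloud, yaw_deg=30.0, zoom=1.5))
```

- A perspective projection or additional rotation axes could be substituted without changing the overall idea.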
- For a user to express himself or herself in a 3D space, an avatar must be created.
- The avatar can be virtual graphic data, a 3D-transformed form of a person or object directly acquired as a point cloud, or audio only without any visual data.
- Avatar expressed in 3D can be modified by the user's definition or selection. For example, a person can change the shape of their face or wear clothes, hats, accessories, etc. that can represent their individuality, and can transform into various forms to express their individuality.
- Emotions can be expressed through conversations between people, and they can also be conveyed through text or through changes in the face shape of the graphics.
- the created avatar participates in a virtual space. If it is 1:1 interactive, each data is transmitted to the other party, but the space where the other party receives it also needs to be formed simply. When there are multiple participants, spaces that can be shared by multiple participants must be created, and these spaces can be spaces composed of arbitrary graphics or data spaces directly obtained as point clouds. Depending on the size and situation of the data to be shared, the data can be stored in each device and processed quickly, and if the size of the data is large, it can be stored in the cloud or central server and shared. As the user's avatar, an avatar created in advance using a library or the like may be used. A basic common avatar therefore does not need to be newly created from the user's point of view or to capture and transmit data.
- various objects used in the space may be added according to a user request, and the data may be graphics or data acquired as a point cloud.
- objects can be objects that are easily accessible or familiar in the conference room, such as documents, cups, and laser pointers.
- users composed of respective avatars can participate in the space, and the user can participate in the meeting place by moving his or her avatar to the created space.
- The space is determined by the host in charge of the meeting, and the host can change the space by selecting it. Acquiring a well-known conference hall in advance can give the effect of attending a company meeting room from home, and acquiring an overseas travel destination or famous historical site can give the effect of meeting at that site from home.
- the space created with virtual random graphics rather than point clouds may vary depending on the idea or implementation method of the space organizer who creates the user's space.
- When a user participates in a space, he or she can enter by creating a user profile.
- the user's profile is used to classify the list of conference hall or space participants, and if multiple users participate, it is possible to check whether a conversation is possible and whether the user's receiving status is working correctly.
- the user's name or nickname should be displayed, and whether the user is currently busy or muted should be displayed.
- Space limitations may vary depending on the application constituting the host or server. In an environment where free movement in space is restricted, the user must be able to move to a desired location.
- FIG. 20 illustrates a protocol stack for XR interactive service on 3GPP 5G according to embodiments.
- 5G XR media can be transmitted in a variety of ways: 1) point cloud data exchange using the IMS telephone network; 2) point cloud data streaming using the 5GMS media network; and 3) a web-based media transmission method using WebRTC. The WebRTC method shares two data at the application level.
- each transmission protocol is defined, and transmission and reception must be performed in compliance with the specifications.
- XR Conversational Service needs to add dimension information and data parameters to observe QoS.
- fast data processing and low-latency conversation service are possible because the data is transmitted using the real-time telephone network.
- However, because there is no protocol for recovering from transmission errors, there is the disadvantage that a conversation has to proceed by relying on continuous feedback information.
- Point cloud-based real-time two-way video conversation can be divided into 1:1 conversation transmission, like a single phone call, and participation in a multi-party video conference.
- both scenarios require a processor that handles media rather than directly delivering data, and must be provided in an environment where virtual meetings can be held.
- FIG. 21 illustrates point-to-point XR videoconferencing according to embodiments.
- The basic call request of the conversation is processed by the network function, and when the MTSI network is used, the transmitted and received media use an MRF (Multimedia Resource Function) or MCU (Media Control Unit).
- the MRF/MCU receives point cloud compressed data, and when the sender wants to transmit additional information (view screen, camera information, view direction, etc.) in addition to the compressed data, the data is also transmitted to the MRF/MCU.
- a video is created through an internal process, and one video includes a main video and several thumbnails.
- the processed video is delivered to each receiver again, and processing such as transcoding and resize may occur. If MRF requires a process such as transcoding, it may have an effect of increasing the maximum delay time by the processing time.
- a pre-processing process may be performed by transmitting thumbnail data to each sender and receiver in advance.
- MRF performs audio and media analysis, application server, billing server linkage, and resource management functions.
- the AS (Application Server) connected to the MRF includes the HSS linkage function for checking the subscriber's status in the telephone network and provides MRF connection and additional functions. Additional functions include a password call service, lettering service, ringback tone service, incoming and outgoing call blocking service, etc. on a real phone.
- each user must have a 3D point cloud capture camera.
- the camera must include the user's color information, location information, and depth information. If depth cannot be expressed, a converter capable of expressing a 2D image in 3D can be used.
- The captured information used includes geometry-based point cloud compression (G-PCC) or video-based point cloud compression (V-PCC) data.
- the transmitter must have equipment capable of receiving the other party's data.
- Receiving equipment generally refers to any equipment that can represent the acquired point cloud data. Therefore, it can be a 2D-based display and can include all equipment capable of visually expressing point cloud graphics such as HMD and holographic.
- The receiving equipment must be capable of receiving and processing the data transmitted from the MRF/MCU, where the sender and receiver data are processed.
- The captured point cloud data is transmitted to the MRF/MCU, which processes the received data internally and then generates and transmits data to each user. It transmits the basic information necessary for a conversation, the virtual space in which the conversation takes place, or view information for the point of view desired by the other party, or it transmits compressed data.
- the virtual space is simply used as a space to simplify by projecting a point cloud, and if the projection space is not used, all data captured by the camera is simply transmitted to the other party.
- B and C require an application to operate a video conference.
- the application checks the following basic service operations.
- Transmitter check: AR Glass, 360 Camera, Fisheye Camera, Phone Camera, Mic, Kinect, LiDAR, etc.
- B and C acquire point data to transmit to the other party using a point cloud capture camera before participating in a conversation.
- the point data is generally data obtained by acquiring faces or shapes of B and C, and data acquired through each unique equipment can be output.
- the above scenario can be implemented based on a simple telephone network in an environment that does not know any media.
- Prior data must be received through MRF/MCU before creating a telephone network, and MRF/MCU receives all data transmitted from B and C.
- the video conversation scenario of two people in a point cloud is divided into two types as follows.
- In scenario (a), all data is transmitted in a one-to-one conversation. All point cloud information of B is delivered directly to C, and C can process all of B's data or process part of it based on the additional information delivered from B. Similarly, B needs to receive all the point cloud data transmitted by C and can process part of it based on the additional information transmitted by C.
- In scenario (b), an MRF/MCU exists within the telephone network, and B and C deliver point cloud data to the MRF/MCU between them. The MRF/MCU processes the received data and delivers the corresponding data to B and C according to the specific conditions requested by B and C. Therefore, B and C may not receive all of the point cloud data they send to each other.
- The multi-person video conference function can also be expanded, and an additional virtual space A can be included and delivered to B or C.
- Rather than B and C directly receiving each other's point clouds, B and C can be placed in a virtual meeting space, and the entire virtual space can be transmitted to B and C in a third-person or first-person view.
- When David (D) also participates, B, C, and D can freely talk to each other in the space of A.
- FIG. 22 shows an XR videoconferencing extension according to embodiments.
- The MRF/MCU can receive each participant's data and process it into one piece of data; a schematic diagram is shown in FIG. 22.
- B, C, and D deliver the acquired point cloud data to MRF/MCU.
- The received data are transcoded into one unit frame, and a scene that composes the aggregated point data is created.
- Authority over the composition of the scene is given to the participant who requested hosting among B, C, and D, and in general a point space can be created by forming various scenes.
- Depending on the user's location or the location of observation, not all data need to be delivered: the MRF/MCU can transmit all or part of the point cloud data based on the received data information and on the camera Viewpoint and Viewport requested by B, C, and D.
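- Partial delivery based on a requested viewpoint and viewport can be illustrated with a simple visibility test. The sketch below assumes the viewport is approximated by a cone around the viewing direction; the function name `select_visible_points` and the cone model are assumptions made for this example, not the processing defined for the MRF/MCU.

```python
import math


def select_visible_points(points, viewpoint, view_dir, half_fov_deg):
    """Keep only points that lie inside a viewing cone around view_dir."""
    norm = math.sqrt(sum(d * d for d in view_dir))
    direction = tuple(d / norm for d in view_dir)
    cos_limit = math.cos(math.radians(half_fov_deg))
    visible = []
    for point in points:
        offset = tuple(p - v for p, v in zip(point, viewpoint))
        distance = math.sqrt(sum(o * o for o in offset))
        if distance == 0.0:
            continue  # a point exactly at the viewpoint has no defined direction
        cos_angle = sum(o * d for o, d in zip(offset, direction)) / distance
        if cos_angle >= cos_limit:
            visible.append(point)
    return visible


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (5.0, 0.0, 5.0)]
    print(select_visible_points(cloud, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 30.0))
```

- A narrower cone sends fewer points to the requesting participant, which is the trade-off between bandwidth and completeness that the text describes.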
- FIG. 23 shows XR videoconferencing extensions according to embodiments.
- B, who has host authority, can share his or her data or screen with conference participants.
- Data that can be shared includes media that can be delivered to a third party, such as overlay form, independent screen, data, etc., other than image dialogue.
- B transmits the data to be shared to the MRF/MCU, and C and D can receive the shared data by their own request.
- SDP can be used to determine the number of Overlays or Laying, and during the Offer/Answer process it must be checked whether each party is capable of receiving all of the data to be delivered. This can be determined at the beginning of participation in a multi-party conference; when a data sharing function must be provided by default, the data processing capability of each user can be confirmed when the telephone network is created.
- Sharing data is generally created to share some or all of the screens of applications running in the host during a conversation, such as presentation files, excel files, and desktop screens. The generated data is compressed or the resolution is converted and delivered to the user who wants to receive it.
- FIG. 24 shows an example of a Point Cloud Encoder according to embodiments.
- FIG. 24 shows the GPCC encoder of FIG. 9 in detail.
- The point cloud encoder converts point cloud data (e.g., positions of points and/or attributes) and performs encoding operations.
- the point cloud content providing system may not be able to stream the corresponding content in real time. Therefore, the point cloud content providing system may reconstruct the point cloud content based on the maximum target bitrate in order to provide it according to the network environment.
- the point cloud encoder may perform geometry encoding and attribute encoding. Geometry encoding is performed before attribute encoding.
- A point cloud encoder includes a coordinate system conversion unit (Transform Coordinates, 240000), a quantization unit (Quantize and Remove Points (Voxelize), 240001), an octree analysis unit (Analyze Octree, 240002), a surface approximation analysis unit (Analyze Surface Approximation, 240003), an arithmetic encoder (Arithmetic Encode, 240004), a geometry reconstruction unit (Reconstruct Geometry, 240005), a color conversion unit (Transform Colors, 240006), an attribute conversion unit (Transfer Attributes, 240007), a RAHT conversion unit (240008), an LOD generation unit (Generate LOD, 240009), a lifting conversion unit (Lifting, 240010), a coefficient quantization unit (Quantize Coefficients, 240011), and/or an arithmetic encoder (Arithmetic Encode, 240012).
- Geometry encoding may include octree geometry coding, direct coding, trisoup geometry encoding, and entropy encoding. Direct coding and trisoup geometry encoding are applied selectively or in combination. Also, geometry encoding is not limited to the above example.
- The coordinate system conversion unit 240000 receives positions and converts them into a coordinate system.
- The positions may be converted into positional information in a 3D space (e.g., a 3D space expressed in XYZ coordinates).
- Location information in a 3D space may be referred to as geometry information.
- The quantization unit 240001 quantizes geometry.
- The quantization unit 240001 may quantize points based on the minimum position values of all points (for example, the minimum value on each of the X, Y, and Z axes).
- The quantization unit 240001 multiplies the difference between the minimum position value and the position value of each point by a preset quantization scale value, and then performs a quantization operation that finds the nearest integer value by rounding.
- one or more points may have the same quantized position (or position value).
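- The quantization step described above can be written as q = round((p - p_min) * scale) on each axis. The following minimal sketch assumes round-half-up rounding and a single scale value shared by all axes; the function name `quantize_positions` is hypothetical.

```python
import math


def quantize_positions(points, scale):
    """Quantize 3D positions relative to the per-axis minimum of all points."""
    mins = tuple(min(p[axis] for p in points) for axis in range(3))
    quantized = []
    for p in points:
        # Subtract the minimum, apply the scale, then round to the nearest integer.
        q = tuple(int(math.floor((p[axis] - mins[axis]) * scale + 0.5)) for axis in range(3))
        quantized.append(q)
    return quantized


if __name__ == "__main__":
    cloud = [(1.0, 2.0, 3.0), (1.4, 2.0, 3.0), (2.6, 4.0, 3.0)]
    print(quantize_positions(cloud, scale=1.0))
```

- In this example the first two points fall on the same quantized position, which is the situation the text describes when several points share one position value.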
- The quantization unit 240001 performs voxelization based on quantized positions to reconstruct quantized points.
- Points of point cloud content may be included in one or more voxels.
- The quantization unit 240001 may match groups of points in the 3D space to voxels.
- one voxel may include only one point.
- one voxel may include one or more points.
- the position of the center of a corresponding voxel may be set based on the positions of one or more points included in one voxel.
- attributes of all positions included in one voxel may be combined and assigned to the corresponding voxel.
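- Voxelization with combined attributes can be sketched as grouping points by their quantized position and merging the attributes of each group. The example below assumes that attributes are RGB colors and that they are combined by a plain average; both choices, and the function name `voxelize`, are assumptions for illustration.

```python
from collections import defaultdict


def voxelize(quantized_positions, colors):
    """Map each occupied voxel to the average color of the points it contains."""
    groups = defaultdict(list)
    for position, color in zip(quantized_positions, colors):
        groups[position].append(color)
    voxels = {}
    for position, members in groups.items():
        count = len(members)
        # Combine the attributes of all points in the voxel (here: per-channel mean).
        voxels[position] = tuple(sum(c[i] for c in members) / count for i in range(3))
    return voxels


if __name__ == "__main__":
    positions = [(0, 0, 0), (0, 0, 0), (2, 2, 0)]
    colors = [(10, 20, 30), (30, 40, 50), (0, 0, 0)]
    print(voxelize(positions, colors))
```

- Each key of the returned dictionary is one occupied voxel, so duplicate points are removed while their attributes are preserved in merged form.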
- the octree analyzer 240002 performs octree geometry coding (or octree coding) to represent voxels in an octree structure.
- the octree structure represents points matched to voxels based on an octal tree structure.
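- An octree can be summarized as one 8-bit occupancy code per node, where each bit marks whether the corresponding child cube contains at least one voxel. The sketch below produces these codes in breadth-first order. The child indexing (x as the most significant bit, then y, then z) and the function name `octree_occupancy` are assumptions for this example and are not necessarily the ordering used by a G-PCC codec.

```python
def octree_occupancy(voxels, depth):
    """Return breadth-first occupancy codes for voxels inside a cube of side 2**depth."""
    codes = []
    level = [((0, 0, 0), list(set(voxels)))]  # (node origin, voxels inside the node)
    for d in range(depth):
        half = 1 << (depth - d - 1)  # side length of the child cubes at this level
        next_level = []
        for origin, members in level:
            children = {}
            for p in members:
                index = 0
                for axis in range(3):
                    bit = 1 if p[axis] >= origin[axis] + half else 0
                    index = (index << 1) | bit
                children.setdefault(index, []).append(p)
            code = 0
            for index in children:
                code |= 1 << index  # set the bit of every occupied child
            codes.append(code)
            for index in sorted(children):
                child_origin = tuple(
                    origin[axis] + half * ((index >> (2 - axis)) & 1) for axis in range(3)
                )
                next_level.append((child_origin, children[index]))
        level = next_level
    return codes


if __name__ == "__main__":
    print(octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2))
```

- Only occupied children are subdivided further, which is why sparse point clouds yield short code sequences that an entropy coder can then compress.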
- the surface approximation analysis unit 240003 may analyze and approximate an octree.
- Octree analysis and approximation according to embodiments is a process of analyzing to voxelize a region including a plurality of points in order to efficiently provide octree and voxelization.
- Arithmetic encoder 240004 entropy encodes the octree and/or the approximated octree.
- the encoding method includes an Arithmetic encoding method.
- a geometry bitstream is created.
- The color conversion unit 240006, attribute conversion unit 240007, RAHT conversion unit 240008, LOD generation unit 240009, lifting conversion unit 240010, coefficient quantization unit 240011, and/or arithmetic encoder 240012 perform attribute encoding.
- one point may have one or more attributes. Attribute encoding according to embodiments is equally applied to attributes of one point. However, when one attribute (for example, color) includes one or more elements, independent attribute encoding is applied to each element.
- Attribute encoding may include color transform coding, attribute transform coding, region adaptive hierarchical transform (RAHT) coding, interpolation-based hierarchical nearest-neighbour prediction (Prediction Transform) coding, and interpolation-based hierarchical nearest-neighbour prediction with an update/lifting step (Lifting Transform) coding.
- the above-described RAHT coding, predictive transform coding, and lifting transform coding may be selectively used, or a combination of one or more codings may be used.
- attribute encoding according to embodiments is not limited to the above-described example.
- the color conversion unit 240006 performs color conversion coding to convert color values (or textures) included in attributes.
- the color conversion unit 240006 may convert the format of color information (for example, convert RGB to YCbCr).
- An operation of the color conversion unit 240006 according to embodiments may be optionally applied according to color values included in attributes.
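- The RGB to YCbCr conversion mentioned above can be illustrated as follows. This sketch assumes the full-range ITU-R BT.601 coefficients with 8-bit input; which conversion matrix an encoder according to embodiments uses is not stated, so the coefficients and the function name `rgb_to_ycbcr` are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB color to YCbCr (full-range BT.601, unclamped floats)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 255, 255))
    print(rgb_to_ycbcr(200, 30, 60))
```

- Separating luma from chroma in this way lets later stages treat the components differently, for example by quantizing chroma more coarsely.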
- the geometry reconstructor 240005 reconstructs (decompresses) an octree and/or an approximated octree.
- the geometry reconstructor 240005 reconstructs an octree/voxel based on a result of analyzing the distribution of points.
- The reconstructed octree/voxel may be referred to as reconstructed geometry (or restored geometry).
- the attribute transformation unit 240007 performs attribute transformation for transforming attributes based on positions for which geometry encoding has not been performed and/or reconstructed geometry. As described above, since attributes depend on geometry, the attribute conversion unit 240007 can transform attributes based on reconstructed geometry information. For example, the attribute conversion unit 240007 may convert an attribute of a point at a position based on a position value of a point included in a voxel. As described above, when the position of the central point of one voxel is set based on the positions of one or more points included in one voxel, the attribute conversion unit 240007 transforms attributes of one or more points. When tri-soup geometry encoding is performed, the attribute conversion unit 240007 may transform attributes based on tri-soup geometry encoding.
- The attribute conversion unit 240007 can perform attribute conversion by calculating the average value of the attributes or attribute values (for example, the color or reflectance of each point) of neighboring points within a specific position/radius from the position (or position value) of the center point of each voxel.
- the attribute conversion unit 240007 may apply a weight according to the distance from the central point to each point when calculating the average value. Therefore, each voxel has a position and a calculated attribute (or attribute value).
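- A minimal sketch of such a distance-weighted average is given below. The weight function `1 / (1 + d)` and the name `weighted_average_attribute` are assumptions chosen for illustration; the text only states that a weight depending on the distance from the central point may be applied.

```python
import math


def weighted_average_attribute(center, points, attributes, radius):
    """Average scalar attributes of the points within `radius` of `center`.

    Assumption: each neighbor is weighted by 1 / (1 + distance), so that
    closer points contribute more. Returns None when no point is in range.
    """
    weight_sum = 0.0
    value_sum = 0.0
    for position, attribute in zip(points, attributes):
        d = math.dist(center, position)
        if d <= radius:
            w = 1.0 / (1.0 + d)
            weight_sum += w
            value_sum += w * attribute
    if weight_sum == 0.0:
        return None
    return value_sum / weight_sum
```

- For a multi-component attribute such as color, the same function can be applied to each component independently.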
- the attribute conversion unit 240007 may search for neighboring points existing within a specific location/radius from the position of the center point of each voxel based on the K-D tree or the Morton code.
- the K-D tree is a binary search tree and supports a data structure that can manage points based on location so that a quick Nearest Neighbor Search (NNS) is possible.
- the Morton code is generated by expressing coordinate values (for example, (x, y, z)) representing the three-dimensional positions of all points as bit values and mixing the bits. For example, if the coordinate value indicating the position of the point is (5, 9, 1), the bit value of the coordinate value is (0101, 1001, 0001).
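- The bit mixing can be illustrated with a short sketch. The interleaving order (z, then y, then x, starting from the most significant bit) and the name `morton_code` are assumptions; the text above only says that the bits are mixed.

```python
def morton_code(x, y, z, bits=4):
    """Interleave the bits of x, y, and z into a single Morton code.

    Assumption: for every bit index, from most to least significant,
    the bits are emitted in the order z, y, x.
    """
    code = 0
    for i in range(bits - 1, -1, -1):
        code = (code << 3) | (((z >> i) & 1) << 2) | (((y >> i) & 1) << 1) | ((x >> i) & 1)
    return code


# The point (5, 9, 1) from the text, i.e. bit values (0101, 1001, 0001).
example = morton_code(5, 9, 1)
print(format(example, "012b"), example)  # 010001000111 1095
```

- Sorting points by this value places points that are close in 3D space near each other in the sorted order, which is what makes the code useful for neighbor searches.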
- The attribute conversion unit 240007 may sort points based on Morton code values and perform a nearest neighbor search (NNS) through a depth-first traversal process. After the attribute transformation operation, if a nearest neighbor search (NNS) is required in another transformation process for attribute coding, a K-D tree or Morton code is used.
- The converted attributes are input to the RAHT conversion unit 240008 and/or the LOD generation unit 240009.
- the RAHT conversion unit 240008 performs RAHT coding for predicting attribute information based on reconstructed geometry information. For example, the RAHT converter 240008 may predict attribute information of a node at a higher level of the octree based on attribute information associated with a node at a lower level of the octree.
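- How a higher-level node value can be derived from two lower-level nodes may be sketched with the weighted two-point transform commonly used to describe RAHT. This is an illustrative assumption rather than the transform defined by the embodiments; `raht_butterfly` and its parameters are hypothetical names.

```python
import math


def raht_butterfly(a1, w1, a2, w2):
    """Merge two sibling attribute values into one parent value.

    a1, a2: attribute values of the two lower-level nodes.
    w1, w2: their weights (for example, the number of points they cover).
    Returns (low, high, weight): `low` is passed up to the parent node,
    `high` is the detail coefficient that would be quantized and coded.
    """
    norm = math.sqrt(w1 + w2)
    low = (math.sqrt(w1) * a1 + math.sqrt(w2) * a2) / norm
    high = (-math.sqrt(w2) * a1 + math.sqrt(w1) * a2) / norm
    return low, high, w1 + w2
```

- When both children carry the same attribute value and weight, the detail coefficient is zero, so smooth regions cost little to encode.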
- An LOD generator 240009 generates a level of detail (LOD) to perform predictive transform coding.
- LOD according to embodiments is a degree representing detail of point cloud content. A smaller LOD value indicates lower detail of point cloud content, and a larger LOD value indicates higher detail of point cloud content. Points can be classified according to LOD.
- the lifting transform unit 240010 performs lifting transform coding for transforming attributes of a point cloud based on weights. As described above, lifting transform coding may be selectively applied.
- the coefficient quantization unit 240011 quantizes attribute-coded attributes based on coefficients.
- the Arithmetic encoder 240012 encodes the quantized attributes based on Arithmetic coding.
- Although not shown in the figure, the elements of the point cloud encoder may be implemented in hardware including one or more processors or integrated circuits configured to communicate with one or more memories included in the point cloud providing device, in software, in firmware, or in a combination thereof.
- One or more processors may perform at least one or more of operations and/or functions of elements of the point cloud encoder.
- one or more processors may operate or execute a set of software programs and/or instructions to perform operations and/or functions of elements of the point cloud encoder.
- One or more memories may include high-speed random access memory and may include non-volatile memory (e.g., one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).
- FIG. 25 shows an example of a point cloud decoder according to embodiments.
- the point cloud decoder may perform a decoding operation that is a reverse process of the encoding operation of the point cloud encoder.
- a point cloud decoder can perform geometry decoding and attribute decoding. Geometry decoding is performed before attribute decoding.
- A point cloud decoder includes an arithmetic decoder (25000), an octree synthesizer (synthesize octree, 25001), a surface approximation synthesizer (synthesize surface approximation, 25002), a geometry reconstructor (reconstruct geometry, 25003), a coordinate system inverse transformer (25004), an arithmetic decoder (25005), an inverse quantizer (25006), a RAHT transformer (25007), an LOD generator (25008), an inverse lifting unit (25009), and/or a color inverse transformer (inverse transform colors, 25010).
- The arithmetic decoder 25000, the octree synthesis unit 25001, the surface approximation synthesis unit 25002, the geometry reconstruction unit 25003, and the coordinate system inverse transformation unit 25004 may perform geometry decoding.
- Geometry decoding according to embodiments may include direct coding and trisoup geometry decoding. Direct coding and trisoup geometry decoding are selectively applied. Also, geometry decoding is not limited to the above example, and is performed as the reverse process of geometry encoding.
- The arithmetic decoder 25000 decodes the received geometry bitstream based on arithmetic coding.
- the operation of the Arithmetic Decoder 25000 corresponds to the reverse process of the Arithmetic Encoder 240004.
- the octree synthesizer 25001 may generate an octree by obtaining an occupancy code from a decoded geometry bitstream (or information on geometry secured as a result of decoding). A detailed description of the occupancy code is as described with reference to FIG. 24 .
- The surface approximation synthesis unit 25002 may synthesize a surface based on the decoded geometry and/or the generated octree.
- The geometry reconstructor 25003 may regenerate geometry based on surfaces and/or decoded geometry. As described with reference to FIG. 24, direct coding and trisoup geometry encoding are selectively applied. Accordingly, the geometry reconstructor 25003 directly imports and adds position information of points to which direct coding is applied. In addition, when trisoup geometry encoding is applied, the geometry reconstructor 25003 may restore the geometry by performing the reconstruction operations of the geometry reconstructor 240005, for example, triangle reconstruction, up-sampling, and voxelization.
- the reconstructed geometry may include a point cloud picture or frame that does not include attributes.
- the coordinate system inverse transformation unit 25004 may obtain positions of points by transforming the coordinate system based on the restored geometry.
- The arithmetic decoder 25005, inverse quantization unit 25006, RAHT transformation unit 25007, LOD generator 25008, inverse lifting unit 25009, and/or color inverse transformation unit 25010 may perform attribute decoding.
- Attribute decoding according to embodiments includes region adaptive hierarchical transform (RAHT) decoding, interpolation-based hierarchical nearest-neighbor prediction (Prediction Transform) decoding, and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (Lifting Transform) decoding. The above three decodings may be selectively used, or a combination of one or more decodings may be used.
- The arithmetic decoder 25005 decodes the attribute bitstream by arithmetic coding.
- the inverse quantization unit 25006 inverse quantizes a decoded attribute bitstream or information about an attribute obtained as a result of decoding, and outputs inverse quantized attributes (or attribute values). Inverse quantization may be selectively applied based on attribute encoding of the point cloud encoder.
- the RAHT conversion unit 25007, the LOD generation unit 25008, and/or the inverse lifting unit 25009 may process the reconstructed geometry and inverse quantized attributes.
- The RAHT converter 25007, the LOD generator 25008, and/or the inverse lifter 25009 may selectively perform a decoding operation corresponding to the encoding of the point cloud encoder.
- The color inverse transform unit 25010 performs inverse transform coding for inverse transform of color values (or textures) included in decoded attributes.
- An operation of the color inverse transform unit 25010 may be selectively performed based on an operation of the color transform unit 240006 of the point cloud encoder.
- Although not shown in the drawing, the elements of the point cloud decoder of FIG. 25 may be implemented in hardware including one or more processors or integrated circuits configured to communicate with one or more memories included in the point cloud providing device, in software, in firmware, or in a combination thereof.
- One or more processors may perform at least one or more of the operations and/or functions of the elements of the point cloud decoder of FIG. 25 described above.
- one or more processors may operate or execute a set of software programs and/or instructions to perform operations and/or functions of elements of the point cloud decoder of FIG. 25 .
- FIG. 26 shows an example of an operational flowchart of a transmitting device according to embodiments.
- Each component of the transmitting device may correspond to software, hardware, processor and/or a combination thereof.
- An operation process of a transmitter for compressing and transmitting point cloud data using V-PCC may be as shown in the drawing.
- a point cloud data transmission device may be referred to as a transmission device or the like.
- a patch for mapping a 2D image of a point cloud is created.
- additional patch information is generated, and the corresponding information can be used in geometry image generation, texture image generation, and geometry restoration processes for smoothing.
- the generated patches undergo a patch packing process of mapping into a 2D image.
- an occupancy map can be generated, and the occupancy map can be used in a geometry image generation, texture image generation, and geometry restoration process for smoothing.
- the geometry image generation unit 26002 generates a geometry image using the additional patch information and the occupancy map, and the generated geometry image is encoded into a single bitstream through video encoding.
- the encoding pre-processing 26003 may include an image padding procedure.
- the generated geometry image or the geometry image regenerated by decoding the encoded geometry bitstream may be used for 3D geometry reconstruction and may then undergo a smoothing process.
- the texture image generation unit 26004 may generate a texture image using the (smoothed) 3D geometry, a point cloud, additional patch information, and an occupancy map.
- the generated texture image may be coded into one video bitstream.
- the metadata encoding unit 26005 may encode additional patch information into one metadata bitstream.
- the video encoder 26006 may encode the occupancy map into one video bitstream.
- The multiplexer 26007 multiplexes the video bitstreams of the generated geometry image, texture image, and occupancy map and the additional patch information metadata bitstream into one bitstream.
- The transmitter 26008 may transmit the bitstream to the receiver.
- Alternatively, the video bitstreams of the generated geometry image, texture image, and occupancy map and the additional patch information metadata bitstream may be generated as a file with one or more tracks of data, or encapsulated into segments, and transmitted to a receiver through the transmitter.
- FIG. 27 shows an example of an operation flowchart of a receiving device according to embodiments.
- Each component of the receiving device may correspond to software, hardware, processor, and/or a combination thereof.
- An operation process of a receiving end for receiving and restoring point cloud data using V-PCC may be as shown in the drawing.
- the operation of the V-PCC receiver may follow the reverse process of the operation of the V-PCC transmitter of FIG. 26 .
- a device for receiving point cloud data may be referred to as a receiving device or the like.
- After file/segment decapsulation, the bitstream of the received point cloud is demultiplexed by the demultiplexer 27000 into the video bitstreams of the compressed geometry image, texture image, and occupancy map and the additional patch information metadata bitstream.
- a video decoder 27001 and a metadata decoder 27002 decode the demultiplexed video bitstreams and metadata bitstreams.
- the 3D geometry is restored using the geometry image decoded by the geometry restoration unit 27003, the occupancy map, and additional patch information, and then a smoothing process is performed by the smoother 27004.
- a color point cloud image/picture may be reconstructed by the texture restoration unit 27005 by assigning a color value to the smoothed 3D geometry using a texture image.
- A color smoothing process can be additionally performed to improve objective/subjective visual quality, and the modified point cloud image/picture derived through this process is displayed to the user through a rendering process (for example, by a point cloud renderer). Meanwhile, the color smoothing process may be omitted in some cases.
- FIG. 28 shows an example of interactive point cloud data processed by the method/device according to embodiments corresponding to the XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), and the device for transmitting/receiving point cloud data (FIGS. 17 and 24 to 27).
- the method/apparatus for transmitting and receiving point cloud data according to embodiments may compress and restore interactive point cloud data as shown in FIG. 28 .
- a method/device for transmitting/receiving point cloud data according to embodiments may be referred to as a method/device according to embodiments.
- a method/device may include and perform a method of generating a body-based rotation axis and parameters for a real-time virtual conversation and conferencing system (Method of shoulder-neck reference axis for XR conversational systems).
- Embodiments include an efficient human recognition method of a realistic virtual conversation and conferencing system capable of acquiring a user's face in 3D in real time and interactively and having a conversation in a virtual environment.
- In such a system, a camera field that can recognize multiple people, a point camera that can physically acquire the user's shape or face, a color camera and a camera that can express depth, and a method of recognizing and classifying people or objects are considered important.
- Most 3D technology methods use a sensor recognition method using LiDAR and a method of recognizing point cloud data acquired in real time as objects such as animals, people, and cars.
- the embodiments may include an object recognition method capable of quickly acquiring points and obtaining features in an environment in which object recognition and motion are determined based on a fixed screen.
- Embodiments relate to the Virtual Reality (VR) of MTSI in 3GPP TS 26.114 and the Extended Reality (XR) of TR 26.928, and include the 3GPP TS 26.223 standard, which discusses IMS-based telepresence.
- A mobile or separate receiver can participate in an immersive conference by attending a virtual conference.
- Examples where interactive data can be delivered in a media format include 5G Media Architecture 3GPP TS26.501, TS26.512, TS26.511. Additionally, related standards may include TS26.238, TS26.939, TS24.229, TS26.295, TS26.929, and TS26.247 for the specification of services.
- The existing method for recognizing objects in real time through an encoder/decoder obtains a point cloud in a two-dimensional form, acquires it in a bird's-eye-view or front-view form, and forms a depth map for each acquisition method.
- The RGB camera uses the formed points and depth to form a 3D object and performs an algorithm for determining the object based on the acquired information.
- a bounding box is created and an object is recognized within the box. Since points existing in the point cloud do not exchange information with each other, a method of tracking the shape of an object using an object recognized in a box and an existing database is widely used.
- the pre-processing process and the process of learning and recognizing objects are less required than in the real-time automation system.
- the recognition form of an object is limited to a person or an object, and a relatively fixed form of point cloud is formed rather than a changing object form.
- a person's perception creates a criterion of points based on a person's full body shape.
- The axis of rotation of the person is created from the spine and pelvis, based on the waist, stomach, and hands, where points are relatively dense.
- A skeleton is formed by averaging the density of points at the center of the pelvis and at arbitrary points on the head, shoulders, hands, legs, and feet. The skeleton value is then applied to the avatar of the 3D graphic to express the movement of the graphic image, or is used for recognizing the movement of objects in games.
- The method/apparatus according to the embodiments is a fast recognition method that does not require a pre-processing function. It exploits the characteristics of an interactive camera: the main subject is a person, and the objects captured by the camera change relatively little. Instead of a person's waist and spine, the devised method uses an initial recognition method based on the angles of the head, shoulders, and neck, which proceeds interactively. The angle can be easily obtained from the angle of the vector expressed on the 2D screen of the points, and the reference point can be easily created in the 3D conversion system, which can simplify processing in a real-time 3D interactive system.
- the method/device recognizes a person based on the neck and shoulders of a 3D object recognized in a 2D plane for low computation and low latency processing of interactive point cloud data.
- actual point data requires data information to recognize whether the 2D screen of the input point cloud is a person or an object. Since information about objects other than people is not transmitted in 3D real-time conversation, the location of the central object can be roughly identified with a set of points centered on the camera.
- the figure below represents various cases in which a person can be recognized on a 2D screen when a real 3D recognition camera is applied.
- the recognition structure of an object shown in front of a camera can be expressed differently according to the shape of the shoulder and face of a real person, and the screen is transmitted based on the upper body, unlike the whole body point.
- information of a 2D space and 3D depth information using an IR camera may be obtained in advance.
- 3D depth information is used to express the dimension of an object and has depth information of points on a 2D plane. Therefore, noise of surrounding objects can be removed in advance through the intersection of the outermost depth data of the objects in the point cluster, and the primary filtered information in the 2D picture can be obtained.
- the filtering process proceeds in two steps.
- FIG. 29 shows an example of filtering according to embodiments.
- The picture determined by the three filtering operations can be schematized as shown in the figure.
- FIG. 30 shows a vector configuration according to embodiments.
- FIG. 30 shows that the data of FIG. 29 is generated in various types of vector configurations.
- A method/apparatus according to embodiments may create vectors of the point cloud data as shown in FIG. 30, divide the data into 16 pieces within the point cloud box range, and detect and express a dense area based on the distribution map of points, as shown in FIG. 31.
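- The division into 16 pieces and the detection of a dense area can be sketched as a simple histogram. The 4 x 4 layout on the 2D screen and the names `cell_counts` and `densest_cell` are assumptions; the text only states that the box range is divided into 16 pieces.

```python
def cell_counts(points, box_min, box_max, grid=4):
    """Count 2D points per cell of a grid x grid partition of the box.

    Assumption: the 16 pieces are a 4 x 4 grid over the 2D bounding box.
    Points outside the box are ignored.
    """
    (x0, y0), (x1, y1) = box_min, box_max
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        col = min(int((x - x0) / (x1 - x0) * grid), grid - 1)
        row = min(int((y - y0) / (y1 - y0) * grid), grid - 1)
        counts[row][col] += 1
    return counts


def densest_cell(counts):
    """Return (row, col) of the cell that holds the most points."""
    best = max((c, r, k) for r, row in enumerate(counts) for k, c in enumerate(row))
    return best[1], best[2]
```

- Vectors for the axis estimation would then be taken from the densest cells only, which keeps the computation small compared with processing every point.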
- The method/apparatus according to the embodiments may generate a central axis for an object as shown in FIG. 31 and may generate a head-spine angle and an arbitrary shoulder angle.
- FIG. 32 shows an example of generating an axis for an object of interactive point cloud data according to embodiments.
- the method/device may obtain a central axis and a center point of a person on an interactive 2D screen as shown in FIG. 32 . If the Shoulder Axis is not found, it can be recognized as an object rather than a person, but only the Rotation Head Spine Axis is defined based on the center point of the point. When two axes are generated, they can be recognized as Head Spine and Shoulder Axis, and form a central coordinate and a shoulder reference. If only one reference is recognized in two axes, the existing data processing method or data pre-processing is omitted, and the recognized data is saved in an interactive format and transmitted. In addition, when all data is not recognized, changes in action parameters that can recognize human actions may not be applied.
- a crystallization-based centroid coordinate and transformation system is used to find the angle of the Shoulder Axis.
- the angle of the head and shoulders is used for each transformation that corrects the front of the user's screen, or is used as a correction value for 2D tilt auto transformation and 3D graphic mapping.
- The method of finding the optimal angle by stochastic distribution expresses the angle of the vector that meets the reference head/shoulder angle. Based on a non-regression method, all vector angles can be extracted from the places where 50% or more of the points are concentrated within the range of the formed point cloud box.
- FIG. 33 illustrates axis selection, estimation, transformation, angle generation, and rotation matrix generation according to an embodiment.
- Theta_k represents the angle between the two vectors k_1 and k_2 in the k-th box, and the vectors (x_k1, y_k1, z_k1) and (x_k2, y_k2, z_k2) are composed of x-, y-, and z-axis position values.
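- The angle between the two vectors of a box can be computed from their dot product. The sketch below is a generic illustration; the exact expression used in FIG. 33 is not reproduced in the text, and `vector_angle` is a hypothetical name.

```python
import math


def vector_angle(k1, k2):
    """Return the angle in radians between two 3D vectors.

    k1 and k2 are (x, y, z) tuples, corresponding to
    (x_k1, y_k1, z_k1) and (x_k2, y_k2, z_k2) in the text.
    """
    dot = sum(a * b for a, b in zip(k1, k2))
    norm1 = math.sqrt(sum(a * a for a in k1))
    norm2 = math.sqrt(sum(b * b for b in k2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("the angle is undefined for a zero-length vector")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.acos(cosine)
```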
- the T matrix is a transformation matrix and generally consists of a diagonal matrix, but when a transformation bias is generated, a transformation constant may be considered after zero padding. When bias is generated, conversion matrix T may be the same as in step 3302.
- a_1, a_2, and a_3 represent weight values corresponding to the first, second, and third input values.
- t_1, t_2, and t_3 represent addition values corresponding to the first, second, and third input values.
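- One way to read this description is as a diagonal scaling by a_1, a_2, a_3 that is extended to homogeneous form when a bias t_1, t_2, t_3 is present. The sketch below follows that reading; the actual matrix of step 3302 appears only in the drawing, so the layout and the names `make_transform` and `apply_transform` are assumptions.

```python
def make_transform(a, t=None):
    """Build the transformation matrix T from weights and an optional bias.

    Without a bias, T is the 3 x 3 diagonal matrix diag(a_1, a_2, a_3).
    With a bias, T is zero-padded to 4 x 4 and the addition values
    t_1, t_2, t_3 are placed in the last column (assumed layout).
    """
    a1, a2, a3 = a
    if t is None:
        return [[a1, 0.0, 0.0], [0.0, a2, 0.0], [0.0, 0.0, a3]]
    t1, t2, t3 = t
    return [
        [a1, 0.0, 0.0, t1],
        [0.0, a2, 0.0, t2],
        [0.0, 0.0, a3, t3],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_transform(T, v):
    """Apply T to a 3-component input and return 3 components."""
    vec = list(v) + [1.0] if len(T) == 4 else list(v)
    return [sum(T[r][c] * vec[c] for c in range(len(vec))) for r in range(3)]
```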
- Among the angles in Diag[...], a variable is processed only when it is the same as the initial input Theta value, and the Head/Shoulder Angle Alpha_i^HS value of the i-th 2D frame is calculated as in step 3303.
- the rotation angle is used as an initial visual field correction value when tracking the front of a person.
- the rotation conversion can be converted into (x, y) axis, (y, z) axis, or (z, x) axis according to the depth of view and the position of the box of the point.
- The Head/Shoulder Angle Alpha_i^HS value can be used to convert x, y, z points by rotation, and the rotation matrix is defined as in step 3304.
- In step 3304, psi represents the rotation about the z axis, theta represents the rotation about the y axis, and phi represents the rotation about the x axis, and each value can be applied by substituting Alpha_i^HS.
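- A rotation built from these three angles can be sketched as follows. The composition order R = Rz(psi) * Ry(theta) * Rx(phi) is a common convention and an assumption here, since the matrix of step 3304 appears only in the drawing; `rotation_matrix` and `rotate` are hypothetical names.

```python
import math


def _matmul(a, b):
    """Multiply two 3 x 3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(psi, theta, phi):
    """Return Rz(psi) * Ry(theta) * Rx(phi) for angles in radians."""
    cz, sz = math.cos(psi), math.sin(psi)
    cy, sy = math.cos(theta), math.sin(theta)
    cx, sx = math.cos(phi), math.sin(phi)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    return _matmul(_matmul(rz, ry), rx)


def rotate(R, point):
    """Rotate an (x, y, z) point by the 3 x 3 matrix R."""
    return [sum(R[i][j] * point[j] for j in range(3)) for i in range(3)]
```

- To correct an in-plane tilt, for example, the head/shoulder angle could be substituted for psi with the other two angles set to zero, which rotates the points in the (x, y) plane only.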
- the transmitting device and the receiving device according to the embodiments may provide the following effects.
- the initial data recognition method is based on the point distribution rather than the database method, so the human shape can be recognized quickly, efficiently and accurately.
- The formed central axis is simpler and faster to obtain than with learning-based recognition, and it readily yields a correction angle for rotation correction, which reduces the pre-processing computation and gives a more accurate initial rotation value.
- Objects on the entire screen can be divided and used as fixed reference information for calculation criteria when switching to global coordinates.
- the generated simplified object may be used as a reference value for tracking an elaborated object that is tracked based on an object as needed.
- FIG. 34 shows a method of converting point cloud data according to embodiments.
- FIG. 34 shows how the method/apparatus according to embodiments, corresponding to the XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), and the device for transmitting and receiving point cloud data (FIGS. 17 and 24 to 27), converts point cloud data for conversation convenience.
- a method/device may include and perform a method of tracking of eyes and facial direction for real-time XR conversational systems for a real-time virtual conversation and conferencing system.
- Embodiments further include a method for efficiently tracking and recognizing human eyes in an immersive virtual conversation and conferencing system capable of 3D acquiring a user's face in real time and interactively and having a conversation in a virtual environment.
- In such a system, a camera field that can recognize multiple people, a point camera that can physically acquire the user's shape or face, a color camera and a camera that can express depth, and a method of recognizing and classifying people or objects are considered important.
- Most 3D technology methods use a sensor recognition method using LiDAR and a method of recognizing point cloud data acquired in real time as objects such as animals, people, and cars.
- a person's eyes and mouth are important elements that can recognize a person's psychology or mood.
- a user can feel the realism of a conversation just by looking at a certain place, and it can be an important clue to behavioral elements that cannot be expressed verbally.
- Eye-tracking technology is primarily designed to prevent accidents in self-driving cars by looking into the driver's eyes and recognizing eye patterns.
- many technologies are designed to track the direction of the eyes.
- Embodiments may further include a method of expressing a conversation with a person in a realistic manner by comprehensively extracting features of the eye, which is a standard of conversation, the direction of the face, and the direction of the nose, for real-time conversation with a person.
- the feature method is a method of recognizing a person's face by extracting facial features, but when the face is rotated more than 45 degrees, the feature may not be recognized.
- the appearance method is a method that utilizes the position of the head or the face of an individual by using prior data information. It is easy to recognize a person, but errors in recognition occur when light or strong shadows occur. In addition, the appearance method requires a lot of prior information to learn data.
- the 3D recognition method uses a method of making a part of a patch into a database and tracking it, a method of averaging multiple people's faces and storing them in a graphic card for comparison, and a method of abbreviating simplified information and using it for an avatar.
- high-resolution images are required to track eyes, and elliptical images of eyes looking straight ahead can be tracked depending on the performance of the camera, but high-resolution three-dimensional information is required for imaging.
- This patent relates to a method for minimizing prior information in relation to the above and recognizing a human face in real time and determining a direction.
- Embodiments provide a method of tracking the main characteristics of a person obtained in 2D and 3D and determining the direction in which the person is looking from a broad directional perspective.
- With this method, it is possible to quickly determine the reference axis and the angle of the direction in real-time conversation, and it helps to efficiently create a reference axis in human-to-human conversation.
- Embodiments allow 3D point cloud data to be processed within a fixed range for low computation and low latency processing of objects recognized as 3D.
- the drawing below shows the recognition on the screen when a real 3D recognition camera is applied.
- the camera acquires surface area information of an object using a 2D or infrared camera, and an additional device such as a distance measurement capable of measuring depth may be used with the infrared camera.
- For simplicity, the initial recognition method forms a box at a fixed position in a six-sided 3D space within the field of view, and the unit of the object is based on a fixed form regardless of the location of the object or point or the shape of the image.
- T is a transformation matrix
- RT is a three-dimensional rotation matrix
- t is a translation transformation vector
- 0 represents a zero vector.
- u_n and v_n are median values obtained on a two-dimensional plane and are values used to predict the median value.
- c_x, c_y, f_x, and f_y are the intrinsic camera parameters of a pinhole camera model: c_x and c_y give the principal point, and f_x and f_y give the focal length. A schematic diagram of these values is shown in the following figure.
- FIG. 35 illustrates a camera point, an image point, and an image plane according to embodiments.
- o_x, o_y, and o_z are offset movement values within the center reference and are determined as arbitrary values by the conversion system or movement value.
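- The roles of the rotation matrix, the translation vector t, and the intrinsic parameters can be illustrated with the standard pinhole projection. This is a generic sketch under the usual pinhole assumptions, not the formula of the embodiments (which appears only in the drawings); the offsets o_x, o_y, o_z are left out, and `project_point` is a hypothetical name.

```python
def project_point(point, R, t, fx, fy, cx, cy):
    """Project a 3D point onto the image plane of a pinhole camera.

    R: 3 x 3 rotation matrix (nested lists), t: translation (3 values).
    The point is first moved into camera coordinates (R * p + t) and
    then projected with the focal length (fx, fy) and the principal
    point (cx, cy). Returns the image coordinates (u, v).
    """
    xc, yc, zc = (
        sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)
    )
    if zc <= 0:
        raise ValueError("the point is behind the camera or on its plane")
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v
```

- With an identity rotation and zero translation, a point on the optical axis projects exactly onto the principal point (c_x, c_y).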
- Points that are determined without a database are generally created by the center pinhole of the camera as shown in the figure above.
- A picture of the standardized data can be expressed as shown in the following figure.
- FIG. 36 shows a criterion of point cloud data according to embodiments.
- Line 3600 is a schematic diagram of the reflection of a wave obtained as line 3601 after being fired by a laser. Assuming that a 3D solid rectangle is obtained as a 2D plane among 3D points, the arrival and reflection of the point is virtually created.
- the initial point is created based on the currently widely used center principal point. If an axis is formed, only the depth map information of the set bounding box where points excluding the shoulder axis exist based on the head spine axis is filtered (box 3602).
- FIG. 37 shows a relationship between a point, a camera, and a laser projector according to embodiments.
- the d (distance) value of the z-axis within the projector can be measured with the above-mentioned value, and the formula for obtaining d is defined as the 3800 method of FIG. 38.
- u^d_n and v^d_n represent the u and v vectors corresponding to d^n.
- d^n represents the distance constant according to the reflection distance between the object plane and the reference plane.
- d^n represents the optimal value in the set D that exists within the bounding box 3602.
- the constant z represents the distance between the camera for measuring the distance and the actual captured screen
- z' represents the distance of the screen where the loss of the actually measured distance occurs.
- Instead of taking the image of an existing point as the median value of the camera, the proposed method determines it by a simple formula, and the actual measurement reference value on the z-axis is determined as in method 3801 of FIG. 38.
- z_0 is a fixed constant
- f_z is a focal length
- d (distance) represents a projection distance difference between a reference plane and an object plane e with respect to the camera.
- the method according to the embodiments can quickly acquire a reference point and an optimal direction for a person, but since it is not based on human face data, the following errors may occur. 1) If a person wears glasses or makeup, the starting position may fluctuate compared to the median value. 2) If the angle of looking at the object is not the front, an error may occur in initial data acquisition. 3) If you are in front of the camera with a pointed object such as a finger or a straw in front of your face, the starting point can be based on the pointed object.
- Errors that occur as above can be corrected, or easily compensated for by general methods, using the existing [box formation at a fixed position in the six-sided three-dimensional space within the field of view / calculation of the unit of an object from a fixed shape regardless of the location of the object or point or the shape of the image] and [omitting the 3D video pre-processor, generating the human spine and shoulder axes from the data as it is input, automatically extracting the conversion rotation angle (the 2D vector angle of the point-dense area, used as the rotation angle for frontal tracking), and using it as the initial value of the calculation reference when converting global coordinates through a two- or three-dimensional converter for frontal data acquisition].
- the direction value will have a value similar to that of the center point.
- the above error may occur in the process of acquiring the initial data and converging in the optimal direction, but the speed of convergence does not significantly differ from the median value.
- the resolution of a set of stored point clouds should be high.
- the resolution of the point cluster is determined by the performance of the camera to be transmitted or an error occurs due to the rendering method. Therefore, the general gaze of a person is determined by the face and the vector R.
- the eye tracking method starts by tracking the state of the eye, and can form blind areas corresponding to the left eye and the right eye on the human face in the same way as the object tracking method.
- The XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), the point cloud data transmitting and receiving devices (FIGS. 17 and 24 to 27), and the method/device according to corresponding embodiments may track an eye, as shown in FIG. 39.
- a mask sampling filter is used to track the 3D sampling eye in the plane of the squared eye.
- Because the mask sampling filter requires data based on the shape of the eye, samples close to a circular shape are acquired within a two-dimensional plane.
- Acquisition can be performed with the least squares method, as shown in FIG. 40.
- FIG. 40 shows a method of obtaining a sampling eye according to embodiments.
- (a_l, b_l) and (a_r, b_r) represent the center points of the left eye and the right eye, and R is the radius of a fitting circle that can stand in for the dark part (iris and pupil) of the eye.
- x_i and y_i represent the points in the point set E of each eye. Assuming that the direction vector of a person's nose is v_n and the direction vectors of the two eyes are v_l and v_r, the direction recognized by the person is finally calculated as in method 4001.
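The least-squares circle fit for one eye can be sketched as follows. This is a minimal illustration using a common algebraic circle fit on centered coordinates; it is an assumption that this matches the formulation of FIG. 40, and the function name `fit_circle` and the sample points are hypothetical. It returns the center (a, b) and radius R for the points (x_i, y_i) of one eye.

```python
import math


def fit_circle(points):
    """Algebraic least-squares circle fit; returns (a, b, R)."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    suu = suv = svv = suuu = svvv = suvv = svuu = 0.0
    for x, y in points:
        u, v = x - mx, y - my  # coordinates relative to the centroid
        suu += u * u
        suv += u * v
        svv += v * v
        suuu += u * u * u
        svvv += v * v * v
        suvv += u * v * v
        svuu += v * u * u
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("points are collinear; no circle can be fitted")
    rhs1 = 0.5 * (suuu + suvv)
    rhs2 = 0.5 * (svvv + svuu)
    # Solve the 2x2 normal equations with Cramer's rule.
    uc = (rhs1 * svv - rhs2 * suv) / det
    vc = (suu * rhs2 - suv * rhs1) / det
    radius = math.sqrt(uc * uc + vc * vc + (suu + svv) / n)
    return mx + uc, my + vc, radius


# Hypothetical samples lying on a circle centered at (2, 3) with radius 1.5.
samples = [(2 + 1.5 * math.cos(t), 3 + 1.5 * math.sin(t))
           for t in (0.1, 0.9, 2.0, 3.3, 4.1, 5.5)]
a, b, r = fit_circle(samples)
```

Running the same fit once per eye would give (a_l, b_l) and (a_r, b_r); how the resulting centers are combined with the nose direction is defined by method 4001 and is not reproduced here.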
- the transmitting device and the receiving device according to the embodiments may provide the following effects.
- operation is not limited to the type of point cloud or the capture device.
- the proposed method can be easily linked with geometry information used in general geometry-based point cloud or video-based point cloud compression methods.
- The method/apparatus may include and perform a method of merging different point cloud views over a 5G network in a real-time immersive virtual conversation.
- Embodiments relate to a method of synthesizing 3D data received from multiple places when a 5G network is used in an immersive virtual conversation and conference system that can acquire a user's face in 3D in real time and bidirectionally and hold a conversation in a virtual environment.
- a camera field that can recognize multiple people
- a point camera that can physically acquire the user's shape or face
- a color camera and a camera that can express depth
- a method of recognizing and classifying objects of people or objects is considered important.
- Most 3D technology methods use a sensor recognition method using LiDAR and a method of recognizing point cloud data acquired in real time as objects such as animals, people, and cars.
- To achieve an interactive real-time point cloud service, a service network is a prerequisite.
- The service using the 5G network connects to the Internet or a wireless network to deliver bidirectional user information and acquire initial data. The acquired data includes overall information about what kind of user the user is and what kind of service the user wants.
- To obtain the service, the real-time point cloud can be served over a network arranged in advance. Depending on how the service is planned, it may be transmitted as media data or over a telephone network.
- When point data is allocated from two or more cameras or two or more resources, the input data must be combined into one structure and delivered to the user.
- the combining method of the point cloud may be a method of simply combining points, but a modified method rather than a simple combining method may be required in order to recognize objects in real time and transmit higher quality realistic data.
- Embodiments include Virtual Reality (VR) in MTSI of 3GPP TS 26.114 and Extended Reality (XR) of TR 26.928, and include the 3GPP TS 26.223 standard, which discusses IMS-based telepresence. Through these standards, a mobile or separate receiver can take part in an immersive conference by attending a virtual conference. If interactive data can be delivered in a media format, this patent covers the 5G Media Architecture of 3GPP TS 26.501, TS 26.512, and TS 26.511. Related standards for the specification of services may additionally include TS 26.238, TS 26.939, TS 24.229, TS 26.295, TS 26.929, and TS 26.247. Technologies related to data processing include ISO/IEC JTC 1/SC 29/WG3 NBMP.
- Embodiments propose a point combining method suitable for an XR Conversational service requiring real-time conversation in a threshold environment in which the above related points are combined.
- the proposed method acquires point data in real time and uses a method of combining the acquired data in real time, and the combined data enables two-way data to be exchanged through the 5G network.
- Embodiments are extended and used for two techniques: partial face neck orient calibration and eye nose direction.
- the following information can be obtained in real time.
- The method/device can obtain information such as: 1. the index of a point existing in a bounding box; 2. point depth information; 3. the reference axes of the head and shoulders and the automatic rotation angle; and, in relation to eye_nose_direction_calibration (see FIGS. 34 to 40), 4. the direction in which the person faces; and 5. the direction of the person's eyes.
- the five parameters are information that can be obtained by using a point capture camera to process interactive virtual reality points in real time. Unlike conventional methods, the above information is designed to be suitable for an interactive virtual environment in a way that quickly acquires data and quickly grasps human components.
- point data may be obtained by taking pictures from multiple angles with a plurality of cameras in a large-scale conference hall instead of one direction of the user's third person.
- two or more point generating input sets may occur, and there is a need for combining point sets rather than 1:1 delivery by a specific request.
- the method/device according to the embodiments basically assumes that the user's face is photographed from the front or the back. If the change in the user's face is significant (e.g., one camera shoots the user's front, but another camera is directed from the soles of the feet to the user's face, or from the top of the user's head), the initial user's face
- a difficulty in combining user data in a data combining method that occurs in an environment in which there is a large difference between different cameras.
- a conventionally widely used method is used rather than a fast data processing method of a point cloud, and a method of combining a given point set by referring to multiple models or adjusting the location of the combined points by iterative correction is adopted.
- the metadata index value that can obtain the value must be shared in advance. If such data does not exist, the camera recognizes surrounding objects, analyzes the data combination method, and connects two point data by matching them.
- Possible reference objects that can exist in the center of a person can be chairs, desks, computers, potted plants, and windows of buildings. After performing the object recognition process by using the existing data information about the object, the data of the point of the human head and the front point can be equally transformed and synthesized/combined with the reference point of the recognized object as the center.
- From one point p, a set of neighboring points existing within a radius r, close to a specific sphere or plane, can be configured.
- a point has information about each normal value, and if the normal value of the point p does not exist, the normal vector can be predicted by analyzing the normal value of the neighboring node.
- The normal vector of the 2x2 matrix A composed of the neighboring points is predicted and calculated as shown in FIG. 41.
- FIG. 41 shows normal vectors of a matrix for neighboring points according to embodiments.
- The XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), the point cloud data transmitting and receiving devices (FIGS. 17 and 24 to 27), and the method/apparatus according to corresponding embodiments may generate a normal vector of a matrix for combining points having similar characteristics, as shown in FIG. 41.
- V represents an Eigen Vector and Sigma represents an Eigen Value.
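Normal prediction from neighboring points can be sketched with an eigen-decomposition of a 2x2 matrix. This is a minimal illustration, assuming A is the covariance matrix of the neighbors in a two-dimensional plane and that the normal is the eigenvector belonging to the smallest eigenvalue; the exact construction of A in FIG. 41 may differ, and `estimate_normal_2d` is a hypothetical name.

```python
import math


def estimate_normal_2d(neighbors):
    """Return a unit normal for 2D neighbor points from their 2x2 covariance."""
    n = len(neighbors)
    if n < 2:
        raise ValueError("need at least two neighboring points")
    mx = sum(x for x, _ in neighbors) / n
    my = sum(y for _, y in neighbors) / n
    # Entries of the symmetric matrix A = [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in neighbors) / n
    b = sum((x - mx) * (y - my) for x, y in neighbors) / n
    c = sum((y - my) ** 2 for _, y in neighbors) / n
    if abs(b) < 1e-12:
        # A is already diagonal: the normal is the axis with less spread.
        return (1.0, 0.0) if a <= c else (0.0, 1.0)
    # Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_min = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)
    # (b, lam_min - a) satisfies (A - lam_min * I) v = 0.
    vx, vy = b, lam_min - a
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


# Neighbors lying on the line y = x; the normal should be perpendicular to it.
nx, ny = estimate_normal_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
```

The sign of the returned vector is arbitrary; a real pipeline would orient it consistently, for example toward the camera.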
- The vector V generated at the neighboring node is used as the normal value. Once the normal values and directions of all nodes are determined, the main reference feature is formed from basic data such as point position and normal information, through the processes of [face neck orientation: see FIGS. 28 to 33] and [eye nose direction calibration: see FIGS. 34 to 40]. For XR Conversational, a person-centered feature can be obtained. First, the vector result values of the user's shoulder and spine are made into reference axes in a two-dimensional plane, an example of which is shown in FIG. 42.
- FIG. 42 shows an example of generating a planar reference axis from vectors related to the user's shoulder and axial vertebrae according to embodiments.
- The XR device 100c may generate axes for the shoulder and spine of a human object participating in a conversation based on normal information, the five parameters according to embodiments, and a reference feature.
- An example of an axis for a person is a figure in which the person's silhouette is formed from the external information of a point cloud obtained from a random user, and the axes of the person's shoulder and head are determined from the shaped two-dimensional data [face neck orientation: see FIGS. 28 to 33].
- The two axes in FIG. 42, formed from the human axes, serve as basis axes like the x and y axes of a two-dimensional plane; v_s represents the shoulder reference vector and w_h represents the head-spine reference vector.
- the angles of the two basis vectors may or may not be orthogonal.
- FIG. 43 shows a face point source and an eye point source according to embodiments.
- the XR device 100c may generate source points related to a face such as the eyes and nose of a human object participating in a conversation.
- (u_n, v_n) is the optimal point position value representing (or predicting) the human nose, obtained in the process of calculating the distance constant according to the reflection distance between the object plane and the reference plane.
- (a_l, b_l) and (a_r, b_r) represent the predicted center points of the person's left eye and right eye.
- the circles in FIG. 43 may be sources for determining a person's eyes and nose.
- FIG. 44 shows a vector for a source point according to embodiments.
- n_n = (u_n - x, v_n - y)
- n_l = (a_l - x, b_l - y)
- n_r = (a_r - x, b_r - y)
- The orientation-based axis configuration has two advantages. First, the axes are easy to rotate, so data can easily be combined in various ways based on the axes of the person. Second, since an axis serves as the reference, errors caused by small movements can be reduced, whereas an existing method based on a set of points must account for every error due to combinations or to errors of individual points. Based on a total of 3 reference vectors and 2 main reference axes, 3 point reference features can be created based on the head and shoulders.
- The feature reference forms a total of six feature bins as the combination of three vector directions with two main axes, and the feature bins indicate the degree of repetition or overlap of the points formed. They are used to compare and combine two or more sets of points.
- HSP Head-Spine Feature Point
- The XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), the point cloud data transmitting and receiving devices (FIGS. 17 and 24 to 27), and the method/apparatus according to the corresponding embodiments may generate a Head-Spine Feature Point (HSP) based on the method of FIGS. 43 to 44, as shown in 4500 of FIG. 45.
- '.' represents the inner product of two vectors.
- HSP can be calculated as a rotation angle, and the value is calculated as in method 4501 of FIG. 45.
- The Shoulder Feature Point (SP) is calculated as in method 4502 of FIG. 45.
- '.' represents the inner product of two vectors.
- SP can be calculated as a rotation angle, and the value is calculated as in method 4503 of FIG. 45.
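How inner products and rotation angles yield 12 feature values can be sketched as below. This is an interpretation rather than the formulas of FIG. 45, which are not reproduced in the text: it assumes each of the two reference axes (w_h, v_s) is paired with each of the three source-point vectors (n_n, n_l, n_r), and that the rotation angle is the arccosine of the normalized inner product. All function names and sample vectors are hypothetical.

```python
import math


def inner(a, b):
    """Inner product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def rotation_angle(a, b):
    """Angle in radians between two non-zero 2D vectors."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        raise ValueError("zero-length vector has no direction")
    cos_theta = max(-1.0, min(1.0, inner(a, b) / (na * nb)))
    return math.acos(cos_theta)


def feature_values(w_h, v_s, n_n, n_l, n_r):
    """2 axes x 3 source vectors x (inner product, angle) = 12 values."""
    features = []
    for axis in (w_h, v_s):
        for source in (n_n, n_l, n_r):
            features.append(inner(axis, source))
            features.append(rotation_angle(axis, source))
    return features


# Hypothetical vectors: an upright head-spine axis and a level shoulder axis.
features = feature_values(w_h=(0.0, 1.0), v_s=(1.0, 0.0),
                          n_n=(0.0, 2.0), n_l=(-1.0, 3.0), n_r=(1.0, 3.0))
```

Clamping the cosine to [-1, 1] guards against rounding errors that would otherwise make `math.acos` raise for nearly parallel vectors.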
- The 12 feature values created can be used in widely known ways. 1) A feature map can be formed by histogram-based data accumulation, so that feature information composed of step functions can be stored and categorized. 2) Parameter restrictions can be applied, such as using the sphere radius value to set the interval that serves as the criterion between adjacent neighbors of a point, and the feature distribution makes mathematical conversion straightforward. 3) Statistical values based on the mean and variance of all points can be extracted, and unique features can be found using bias values. 4) The generated values can be compared with an existing histogram or reference value using the Kullback-Leibler distance (divergence) model to facilitate consistency checks or analysis of the data set.
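The histogram comparison in item 4) can be sketched with the standard Kullback-Leibler divergence. This is a minimal illustration, assuming both histograms use the same bins; the small smoothing constant that avoids division by zero is an added assumption, and the function names are hypothetical.

```python
import math


def to_distribution(counts, eps=1e-12):
    """Turn histogram counts into probabilities, smoothing empty bins."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p_counts, q_counts):
    """D_KL(P || Q) in nats for two histograms over identical bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    p = to_distribution(p_counts)
    q = to_distribution(q_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical feature histograms from two captures.
same = kl_divergence([4, 4], [4, 4])
different = kl_divergence([4, 4], [6, 2])
```

The divergence is zero for identical histograms and grows as they differ. It is not symmetric, so a fixed convention for which histogram is the reference is needed.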
- Data is verified through 9 feature references between point sets p and q of data acquired with two or more cameras, and combining can be performed if the two features are similar.
- The distribution and comparison of all points use the points classified in the feature set. The error minimization method is the generally known Iterative Closest Point (ICP) method, and the transformation formula is as in method 4504.
- For every point index i among the n points, the point p_i is combined so as to minimize the error between p_i, after the rotation R and the translation T are applied, and the corresponding point q_i.
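One alignment step of this kind can be sketched in two dimensions. This is a minimal illustration, assuming the correspondences between p_i and q_i are already known and that the objective is the usual sum of squared distances between R p_i + T and q_i; a full ICP would repeat this step after re-matching nearest neighbors. The function names are hypothetical, and the exact notation of method 4504 may differ.

```python
import math


def align_2d(p_points, q_points):
    """Least-squares rotation angle and translation mapping p_i onto q_i."""
    n = len(p_points)
    if n == 0 or n != len(q_points):
        raise ValueError("point sets must be non-empty and of equal size")
    pcx = sum(x for x, _ in p_points) / n
    pcy = sum(y for _, y in p_points) / n
    qcx = sum(x for x, _ in q_points) / n
    qcy = sum(y for _, y in q_points) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(p_points, q_points):
        ax, ay = px - pcx, py - pcy  # centered source point
        bx, by = qx - qcx, qy - qcy  # centered target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated source centroid onto the target centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, (tx, ty)


def transform(theta, t, point):
    """Apply the rotation by theta followed by the translation t."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + t[0], s * x + c * y + t[1])


# Hypothetical check: recover a known rotation of 30 degrees and shift (1, 2).
p_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
q_set = [transform(math.pi / 6, (1.0, 2.0), p) for p in p_set]
theta, t = align_2d(p_set, q_set)
```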
- The method according to the embodiments assumes that the same object is acquired by a plurality of cameras. If points are combined when two people are present, two or more people may be merged into one, because the points to be combined cannot be separated by a person's detailed characteristics (personal distinction, etc.). To prevent this, errors can be minimized by adding one or more widely known features (Shape, Sphere, Plane, Edge, Blank, etc.) to the basic features. There are also cases in which the combining of a person does not occur. For data in which the basic human features (Head-Spine Axis, Shoulder Axis) are not formed, the values may be difficult to obtain because of noise or interference.
- In that case, a person's basic axes can be created using a re-acquisition method. If two or more point values are not obtained in the bounding box, a data error can be signaled by creating a flag such as No Detected. If the flag is not signaled, the axes are formed and feature values can be extracted. Thirdly, there is the case in which the basic feature values of a person are obtained but the nose and eye values are not. In such an environment, the principal point of the image is taken to be a large set of points such as the person's face rather than the nose, and data can be transmitted without acquiring detailed eye values. In this case, the metadata requires a classification that distinguishes a person from a non-human object or animal. Since XR Conversational needs to acquire human information and process data in real time, the error metadata must also be kept to a minimum, and the possible recognition information is shown in FIG. 46.
- The XR device 100c, the wireless communication system including an encoder/decoder (FIG. 2), the point cloud data processing system connected to a communication network (FIGS. 8 to 14), the point cloud data transmitting and receiving devices (FIGS. 17 and 24 to 27), and the method/device according to corresponding embodiments may generate and transmit metadata as shown in FIG. 46. If the Coarse Indicator field is 0, it indicates No Detected (Human), that is, no human is detected; if the field is 1, it indicates Coarse Detected (Human), that is, a human is detected; and if the field is 2, it indicates Others.
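The three Coarse Indicator values can be represented as a small enumeration on the receiving side. This is a minimal sketch; the class and function names are hypothetical, and only the numeric values 0, 1, and 2 come from the description.

```python
from enum import IntEnum


class CoarseIndicator(IntEnum):
    """Values of the Coarse Indicator field described for FIG. 46."""
    NO_DETECTED = 0      # no human is detected
    COARSE_DETECTED = 1  # a human is detected
    OTHERS = 2           # anything else


def parse_coarse_indicator(value):
    """Map a received field value to the enumeration, rejecting unknown values."""
    try:
        return CoarseIndicator(value)
    except ValueError:
        raise ValueError(f"unsupported Coarse Indicator value: {value}") from None


def human_detected(value):
    """True only when the field signals Coarse Detected (Human)."""
    return parse_coarse_indicator(value) is CoarseIndicator.COARSE_DETECTED
```

A receiver could use `human_detected` to decide whether axis and feature extraction should be attempted at all.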
- Metadata related to FIG. 46 may be created and transmitted in the form shown in FIG. 47 .
- simple data can be exchanged by generating dictionary information of attributes using a telephone network.
- Parameters to be connected are transmitted by constructing a data-ized reference template, and elements transmitted according to the form of point cloud data may be transmitted together with data as shown in FIGS. 48-50.
- Metadata may include media parameters 4800 and feature parameters 4801 as shown in FIG. 48 , and an encoder may transmit a bitstream including point cloud data, media parameters, and feature parameters to a decoder.
- the media parameter 4800 may include the following elements.
- Codec Indicates a codec type such as h.264/avc or h.265/hevc, and may indicate an image compression type such as PNG or JPG.
- Chroma Indicates a chroma subsampling type such as yuv420, yuv422, or yuv444.
- Frame rate Indicates the number of frames per second, such as 30 fps or 60 fps.
- Resolution Indicates a resolution such as 3840x2160, 7680 x 4320, etc.
- Feature parameter 4801 may include the following elements.
- Feature extraction method Indicates a feature extraction method such as SIFT, SURF, KAZE, AKAZE, ORB, BRISK, BRIEF, or LoG.
- Feature point number Indicates the number of feature points.
- Feature point positions Indicates feature point positions identified by X, Y coordinates.
- Feature correspondence Represents a corresponding point for each feature point.
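The media and feature parameters listed above can be grouped into simple records before they are serialized into the bitstream. This is a minimal sketch: the class names, field names, and types are hypothetical, and the actual syntax of FIG. 48 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MediaParameters:
    codec: str                   # e.g. "h.265/hevc"
    chroma: str                  # e.g. "yuv420"
    frame_rate: int              # frames per second
    resolution: Tuple[int, int]  # (width, height)


@dataclass
class FeatureParameters:
    extraction_method: str  # e.g. "ORB"
    positions: List[Tuple[float, float]] = field(default_factory=list)
    correspondence: List[Tuple[float, float]] = field(default_factory=list)

    @property
    def point_number(self):
        """Number of feature points, derived from the positions."""
        return len(self.positions)

    def validate(self):
        """Each feature point needs exactly one corresponding point."""
        if len(self.correspondence) != len(self.positions):
            raise ValueError("one corresponding point is required per feature point")


media = MediaParameters("h.265/hevc", "yuv420", 30, (3840, 2160))
feats = FeatureParameters("ORB",
                          positions=[(10.0, 20.0), (30.5, 40.5)],
                          correspondence=[(12.0, 21.0), (33.0, 41.0)])
feats.validate()
```

Deriving the feature point number from the positions avoids the two values drifting apart when points are added or removed.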
- FIG. 49 shows metadata according to embodiments.
- Metadata may include camera parameters as shown in FIG. 49 , and an encoder may transmit a bitstream including point cloud data and camera parameters to a decoder.
- a camera parameter may include the following elements.
- Camera shutter type May indicate rolling or global.
- Camera sync skew (Camera_sync_skew): 0 if synchronized, otherwise -1 indicates sync out in milliseconds.
- Capture_settings Indicates the scene type, such as indoor or outdoor, ambient light, exposure, etc.
- Camera extrinsics object Represents camera transformation parameters (translation and rotation for global to camera transformation) used to align the image in 3D space.
- Camera Intrinsics object Represents the camera-specific parameters (focal length, principal points, and distortion coefficients) used to align images in 3D space.
- FIG. 50 shows metadata according to embodiments.
- Metadata may include stitching parameters as shown in FIG. 50, and an encoder may transmit a bitstream including point cloud data and stitching parameters to a decoder.
- Seam positions Represents the interpolation region that affects the final stitching quality.
- a region structure can be represented by a series of pixel points (start, intersection, end).
- the interpolated region locations can be represented by a mask image with only 1 or 0 values for a more sophisticated stitching process.
- a mask image can also be placed in a URL or URI.
- Stitching_method Can indicate a specific stitching algorithm for partial or full stitching approaches.
- Seam_extent_of_freedom may indicate a degree of freedom for moving the seam area, for example, a degree of freedom such as horizontal.
- Convergence selection Indicates the convergence selection criteria. It can indicate the level of semantics of decisions dealing with ROI-related inclusion/exclusion/weighting criteria.
- Camera weighting Indicates the weight in the stitching process. The higher the weight value, the more important the camera is. Alternatively, it may be the alignment number of the camera array. This value can be dynamic, for example depending on the user's viewing preferences.
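One way the mask image and the camera weights could interact is sketched below. This is an assumption made for illustration, not a procedure stated in the description: inside the interpolation region (mask value 1) the two cameras are blended by their normalized weights, and outside it (mask value 0) the first camera is kept. The function name and the sample rows are hypothetical.

```python
def blend_with_mask(img_a, img_b, mask, weight_a, weight_b):
    """Blend two equally sized grayscale images inside a 0/1 mask region."""
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("camera weights must sum to a positive value")
    wa, wb = weight_a / total, weight_b / total
    blended = []
    for row_a, row_b, row_m in zip(img_a, img_b, mask):
        blended.append([
            wa * a + wb * b if m == 1 else a  # blend only where mask is 1
            for a, b, m in zip(row_a, row_b, row_m)
        ])
    return blended


# Hypothetical one-row images; camera A is weighted three times camera B.
result = blend_with_mask([[10, 10, 10]], [[20, 20, 20]], [[0, 1, 1]], 3, 1)
```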
- the transmitting device and the receiving device according to the embodiments may provide the following effects.
- Data of a person's point cloud obtained by two or more cameras can be efficiently combined.
- the extracted features are composed of axes of points, so that the combination algorithm can be used efficiently and the convergence rate is reduced.
- FIG. 51 shows a point cloud data transmission method according to embodiments.
- S5100: A method for transmitting point cloud data according to embodiments may include encoding point cloud data.
- Encoding operations according to the embodiments may correspond to or include the XR device 100c of FIG. 1, the terminal of FIG. 2, the acquisition/encoding of FIG. 8, the encoders of FIGS. 9 to 14, the video/audio encoder 1700 of FIG. 17, the encoder of FIG. 24, the transmission device of FIG. 26, and the encoding of point cloud data according to FIGS. 28 to 45.
- the method for transmitting point cloud data may further include transmitting a bitstream including point cloud data.
- Transmission operations according to embodiments may correspond to or include the transmission of FIG. 8, the transmission of FIGS. 9 and 11, the transmission of FIG. 13, the transmission and reception of FIG. 17, the bitstream transmission of FIGS. 24 and 26, the transmission of a bitstream including the metadata of FIGS. 46 to 50, and the like.
- FIG. 52 shows a method for receiving point cloud data according to embodiments.
- a method for receiving point cloud data may include receiving a bitstream including point cloud data.
- The reception operation according to the embodiments may correspond to or include the reception of FIG. 8, the reception of FIGS. 10 and 12, the reception of FIG. 14, the transmission and reception of FIG. 17, the reception of a bitstream in FIGS. 25 and 27, the reception of a bitstream including the metadata of FIGS. 46 to 50, and the like.
- the method for receiving point cloud data may further include decoding the point cloud data.
- Decoding operations according to the embodiments may correspond to or include the XR device 100c of FIG. 1, the terminal of FIG. 2, the decoding of FIG. 8, the decoders of FIGS. 9 to 14, the video/audio encoder 1700 of FIG. 17, the decoder of FIG. 25, the receiving device of FIG. 27, the decoding of point cloud data according to FIGS. 28 to 45, the metadata-based decoding of FIGS. 46 to 50, and the like.
- A transmission method may include encoding point cloud data, and transmitting a bitstream including the point cloud data.
- encoding the point cloud data includes filtering the point cloud data, and the filtering step is performed on points of the point cloud data.
- An object of point cloud data may be a person/human attending a meeting. Since the object includes the upper body region including the face and neck, an object recognized as 3D can be efficiently processed using a 2D image. Filtering according to embodiments may filter points on a 2D image to detect information on the outline of an object and an area in which important points are concentrated. A bounding box corresponding to a region including points may be generated, and regions of the 2D image may be divided into a plurality of bounding boxes. Based on the divided regions, information about a head-vertebral axis, a head-vertebral angle, a shoulder axis, and a shoulder angle may be obtained.
- The encoding of point cloud data may include segmenting the 2D image using boxes for the point cloud data, based on information about the shape of the object and the 2D image; expressing an area where points are dense, based on the distribution of points included in the 2D image; and obtaining a center point of the object and two axes.
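Finding the dense area and a center point on the 2D image can be sketched with a uniform grid of boxes. This is a minimal illustration, assuming square boxes of a fixed size and a centroid as the center point; the embodiments may divide the image differently, and the function names and sample points are hypothetical.

```python
from collections import Counter


def densest_cell(points, cell_size):
    """Return ((col, row), count) for the grid box holding the most points."""
    if not points:
        raise ValueError("no points to analyze")
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return max(counts.items(), key=lambda item: item[1])


def centroid(points):
    """Mean position of a non-empty list of 2D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Hypothetical projected points: three close together and one outlier.
pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.0, 5.0)]
cell, count = densest_cell(pts, cell_size=1.0)
dense_pts = [p for p in pts if (int(p[0] // 1.0), int(p[1] // 1.0)) == cell]
center = centroid(dense_pts)
```

Restricting the centroid to the densest box keeps a single outlier from pulling the center point away from the face region.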
- the two axes may mean the head spine axis and the shoulder axis.
- the head-vertebral axis and the shoulder axis can be used as important information on the shape of a person. Human behavior can be recognized through two axes.
- the two axes may be referred to as a first axis, a second axis, and the like.
- To obtain the axes based on the vector to the points, we can generate the angle of the vector (see Fig. 33). Based on the vector, a matrix about the coordinates of a point can be created, and based on the matrix, an angle value about an axis can be created (see Fig. 33).
- Encoding the point cloud data may include filtering depth information of a bounding box in which a point exists, based on a first axis among the two axes, and generating a constant for the reflection distance with respect to the plane of the bounding box, based on the focal length of the coordinate axis.
- data on the direction and gaze of the person can be acquired through a matrix (see FIG. 34).
- a point serving as a principal point may exist on an image plane, and the image plane may exist on a focal length and a coordinate axis.
- An image point and a camera point may be located on the same line, and two vector information about a principal point or an image center may be used (see FIG. 35).
- the head/spine axis may have more influence on a person's line of sight.
- an error in the direction of a person's gaze can be corrected.
- The step of encoding point cloud data may include generating gaze information: the gaze direction of an object may be generated based on the center points of the left eye and right eye of the object, a direction vector of the nose of the object, and direction vectors of the left eye and right eye.
- a sampling filter can be used to track them (see Fig. 39).
- Vector values for both eyes and pupils may be generated, and direction information recognized by the object may be obtained based on the vectors.
- In relation to the shoulder/head reference vectors, the step of encoding the point cloud data may generate reference vectors for the object based on the two axes, generate point sources for the object based on the reference vectors, and generate vectors for three points based on the point sources.
- main points existing on or near the axis may be the left eye, the right eye, the nose, etc. related to a person's line of sight. Based on the vector information of the main points, feature points related to a person's line of sight may be extracted.
- The step of encoding point cloud data may generate a point reference feature based on the vectors for the three points and the reference vectors, and the point reference feature may include a head-spine feature point and a shoulder feature point.
- The bitstream may include signaling information indicating an error related to detection of an object, and may further include a media parameter, a feature parameter, a camera parameter, and a stitching parameter.
- A camera field capable of recognizing multiple people, a point camera capable of physically obtaining a user's shape or face, a color camera, and a camera capable of expressing depth can be used.
- objects of people or objects may be recognized and classified.
- Point cloud data acquired in real time can be recognized as objects such as animals, people, and cars.
- When point data is allocated from two or more cameras or two or more resources, the structures for processing the input data can be combined into one structure.
- the point cloud combining method goes further than simply combining points, and can recognize objects in real time and transmit realistic data of a little higher quality. Therefore, it is possible to perform network-based communication by recognizing and classifying people through feature points and additional feature information, and synthesizing point cloud data about a plurality of people.
- A method for transmitting point cloud data according to embodiments is performed by a transmission device, and the transmission device according to embodiments may include an encoder that encodes point cloud data, and a transmitter that transmits a bitstream including the point cloud data.
- the receiving method corresponding to the transmitting method may include a method corresponding to the transmitting method and/or a reverse process.
- A receiving method may include receiving a bitstream including point cloud data, and decoding the point cloud data.
- The step of decoding the point cloud data includes a step of filtering the point cloud data, and the filtering step may include generating a two-dimensional image of the points of the point cloud data based on the depth of the attribute data of the points and the location information of the points, excluding points based on a vector for the 2D image, and generating information about the shape of an object of the point cloud data.
- the decoding of the point cloud data may include dividing the 2D image using a box for the point cloud data based on the information about the shape of the object and the 2D image, and dividing the 2D image based on the distribution of points included in the 2D image.
- A method for receiving point cloud data is performed by a receiving device. The receiving device may include a receiving unit that receives a bitstream including point cloud data and a decoder that decodes the point cloud data.
- The decoder for decoding the point cloud data performs an operation of filtering the point cloud data. The filtering operation may include generating a two-dimensional image of points of the point cloud data based on the depth of the attribute data of the points and the location information of the points, excluding points based on a vector for the 2D image, and generating information about the shape of an object of the point cloud data.
- The decoder for decoding the point cloud data may divide the 2D image using a box for the point cloud data, based on the information about the shape of the object and the 2D image; represent an area where points are dense, based on the distribution of points included in the 2D image; and obtain a center point and two axes of the object.
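One plausible way to obtain a center point and two axes from the distribution of points in a 2D image is a principal-axis analysis of the pixel coordinates. The sketch below takes that approach as an assumption; the source does not state which technique is used, and the function name `center_and_axes` is hypothetical. The center is the mean of the coordinates, and the two axes are the orthogonal directions of largest and smallest spread, computed in closed form from the 2x2 covariance matrix.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def center_and_axes(pixels: List[Vec2]) -> Tuple[Vec2, Vec2, Vec2]:
    """Return (center, major_axis, minor_axis) of a non-empty 2D point set."""
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    # Entries of the 2x2 covariance matrix of the centered coordinates.
    sxx = sum((p[0] - cx) ** 2 for p in pixels) / n
    syy = sum((p[1] - cy) ** 2 for p in pixels) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pixels) / n
    # Orientation of the direction of largest variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))  # perpendicular to major
    return (cx, cy), major, minor


if __name__ == "__main__":
    # Points spread along the diagonal: the major axis should follow it.
    center, major, minor = center_and_axes([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
    print(center, major, minor)
```

Both axes are unit vectors, so they can be scaled by the spread along each direction if an oriented box around the dense region is needed.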
- Each drawing has been described separately, but a new embodiment may be designed by merging the embodiments described in the respective drawings. Designing a computer-readable recording medium on which programs for executing the previously described embodiments are recorded, according to the needs of those skilled in the art, also falls within the scope of the embodiments.
- The device and method according to the embodiments are not limited to the configurations and methods of the embodiments described above; all or part of each embodiment may be selectively combined so that various modifications can be made.
- Various components of the device of the embodiments may be implemented by hardware, software, firmware or a combination thereof.
- Various components of the embodiments may be implemented as one chip, for example, as one hardware circuit.
- components according to the embodiments may be implemented as separate chips.
- At least one of the components of the device according to the embodiments may be composed of one or more processors capable of executing one or more programs. The one or more programs may perform, or may include instructions for performing, any one or more of the operations/methods according to the embodiments.
- Executable instructions for performing methods/operations of an apparatus may be stored in non-transitory CRM or other computer program products configured for execution by one or more processors, or may be stored in transitory CRM or other computer program products configured for execution by one or more processors.
- The memory according to the embodiments may be used as a concept including not only volatile memory (e.g., RAM) but also non-volatile memory, flash memory, PROM, and the like. Implementations in the form of a carrier wave, such as transmission through the Internet, may also be included.
- the processor-readable recording medium is distributed in computer systems connected through a network, so that the processor-readable code can be stored and executed in a distributed manner.
- Terms such as first and second may be used to describe various components of the embodiments. However, interpretation of the various components according to the embodiments should not be limited by these terms. They are used only to distinguish one component from another. For example, a first user input signal may be referred to as a second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. Use of these terms should be construed as not departing from the scope of the various embodiments. Although both the first user input signal and the second user input signal are user input signals, they do not mean the same user input signal unless the context clearly indicates otherwise.
- operations according to embodiments described in this document may be performed by a transceiver including a memory and/or a processor according to embodiments.
- the memory may store programs for processing/controlling operations according to embodiments, and the processor may control various operations described in this document.
- a processor may be referred to as a controller or the like.
- Operations in embodiments may be performed by firmware, software, and/or a combination thereof, and the firmware, software, and/or combination thereof may be stored in a processor or stored in a memory.
- the transmitting/receiving device may include a transmitting/receiving unit for transmitting/receiving media data, a memory for storing instructions (program codes, algorithms, flowcharts and/or data) for processes according to embodiments, and a processor for controlling operations of the transmitting/receiving device.
- a processor may be referred to as a controller or the like, and may correspond to, for example, hardware, software, and/or combinations thereof. Operations according to the above-described embodiments may be performed by a processor. Also, the processor may be implemented as an encoder/decoder for the operations of the above-described embodiments.
- the embodiments may be applied in whole or in part to an apparatus and system for transmitting and receiving point cloud data.
- Embodiments may include changes/variations, which do not depart from the scope of the claims and their equivalents.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Computer Graphics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Geometry (AREA)
- Mobile Radio Communication Systems (AREA)
Claims (15)
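- A method for transmitting point cloud data, the method comprising: encoding point cloud data; and transmitting a bitstream including the point cloud data.
- The method of claim 1, wherein the encoding of the point cloud data includes filtering the point cloud data, and the filtering includes: generating a two-dimensional image of points of the point cloud data based on a depth of attribute data of the points and position information of the points; excluding points based on a vector for the two-dimensional image; and generating information about a shape of an object of the point cloud data.
- The method of claim 2, wherein the encoding of the point cloud data includes: dividing the two-dimensional image using a box for the point cloud data, based on the information about the shape of the object and the two-dimensional image; representing a region where points are dense, based on a distribution of points included in the two-dimensional image; and obtaining a center point and two axes of the object.
- The method of claim 3, wherein the encoding of the point cloud data includes: filtering depth information of a bounding box in which points are present, based on a first axis of the two axes; and generating a constant for a reflection distance with respect to a plane of the bounding box, based on a focal length of a coordinate axis.
- The method of claim 4, wherein the encoding of the point cloud data includes: generating gaze information about center points of a left eye and a right eye of the object; and generating a gaze direction of the object based on a direction vector of a nose of the object and direction vectors of the left eye and the right eye.
- The method of claim 3, wherein the encoding of the point cloud data includes: generating reference vectors for the object based on the two axes; generating point sources for the object based on the reference vectors; and generating vectors for three points based on the point sources.
- The method of claim 6, wherein the encoding of the point cloud data includes generating a point reference feature reference based on the vectors for the three points and the reference vectors, and the point reference feature reference includes a head spine feature point and a shoulder feature point.
- The method of claim 7, wherein the bitstream includes signaling information indicating an error related to detection of the object, and the bitstream further includes a media parameter, a feature parameter, a camera parameter, and a stitching parameter.
- A device for transmitting point cloud data, the device comprising: an encoder that encodes point cloud data; and a transmitter that transmits a bitstream including the point cloud data.
- A method for receiving point cloud data, the method comprising: receiving a bitstream including point cloud data; and decoding the point cloud data.
- The method of claim 10, wherein the decoding of the point cloud data includes filtering the point cloud data, and the filtering includes: generating a two-dimensional image of points of the point cloud data based on a depth of attribute data of the points and position information of the points; excluding points based on a vector for the two-dimensional image; and generating information about a shape of an object of the point cloud data.
- The method of claim 11, wherein the decoding of the point cloud data includes: dividing the two-dimensional image using a box for the point cloud data, based on the information about the shape of the object and the two-dimensional image; representing a region where points are dense, based on a distribution of points included in the two-dimensional image; and obtaining a center point and two axes of the object.
- A device for receiving point cloud data, the device comprising: a receiving unit that receives a bitstream including point cloud data; and a decoder that decodes the point cloud data.
- The device of claim 13, wherein the decoder that decodes the point cloud data performs an operation of filtering the point cloud data, and the filtering operation includes: generating a two-dimensional image of points of the point cloud data based on a depth of attribute data of the points and position information of the points; excluding points based on a vector for the two-dimensional image; and generating information about a shape of an object of the point cloud data.
- The device of claim 14, wherein the decoder that decodes the point cloud data: divides the two-dimensional image using a box for the point cloud data, based on the information about the shape of the object and the two-dimensional image; represents a region where points are dense, based on a distribution of points included in the two-dimensional image; and obtains a center point and two axes of the object.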
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/290,676 US20250086842A1 (en) | 2021-07-20 | 2022-07-20 | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| EP22846219.8A EP4375947A4 (en) | 2021-07-20 | 2022-07-20 | POINT CLOUD DATA TRANSMISSION DEVICE, POINT CLOUD DATA TRANSMISSION METHOD, POINT CLOUD DATA RECEIVING DEVICE, AND POINT CLOUD DATA RECEIVING METHOD |
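| CN202280057293.XA CN117836815A (zh) | 2021-07-20 | 2022-07-20 | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |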
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR20210094548 | 2021-07-20 | ||
| KR10-2021-0094544 | 2021-07-20 | ||
| KR20210094544 | 2021-07-20 | ||
| KR10-2021-0094548 | 2021-07-20 | ||
| KR20210139835 | 2021-10-20 | ||
| KR10-2021-0139835 | 2021-10-20 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023003349A1 (ko) | 2023-01-26 |
Family
ID=84979304
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2022/010606 Ceased WO2023003349A1 (ko) | 2021-07-20 | 2022-07-20 | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20250086842A1 (ko) |
| EP (1) | EP4375947A4 (ko) |
| WO (1) | WO2023003349A1 (ko) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250095212A1 (en) * | 2023-09-15 | 2025-03-20 | Electronics And Telecommunications Research Institute | Method for transmitting and obtaining dynamic 3-dimensional avatar data and apparatus for the same |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20240117101A (ko) * | 2022-01-13 | 2024-07-31 | LG Electronics Inc. | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| US11900525B2 (en) * | 2022-02-14 | 2024-02-13 | Google Llc | Learned volumetric attribute compression using coordinate-based networks |
| US12614315B2 (en) * | 2022-10-21 | 2026-04-28 | Qualcomm Incorporated | Attribute coding for point cloud compression |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
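| KR101666937B1 (ko) * | 2016-05-13 | 2016-10-17 | Korea Institute of Civil Engineering and Building Technology | Apparatus and method for processing large-volume data of a 3D point cloud |
| KR20170028605A (ko) * | 2015-09-04 | 2017-03-14 | Electronics and Telecommunications Research Institute | Apparatus and method for extracting a person region based on RGB-D images |
| KR20200038534A (ko) * | 2017-09-18 | 2020-04-13 | Apple Inc. | Point cloud compression |
| WO2020189943A1 (ko) * | 2019-03-15 | 2020-09-24 | LG Electronics Inc. | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |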
| WO2020193038A1 (en) * | 2019-03-22 | 2020-10-01 | Interdigital Vc Holdings France | Processing a point cloud |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW413795B (en) * | 1999-02-26 | 2000-12-01 | Cyberlink Corp | An image processing method of 3-D head motion with three face feature points |
| ATE454849T1 (de) * | 2002-10-15 | 2010-01-15 | Volvo Technology Corp | Verfahren für die auswertung der kopf- und augenaktivität einer person |
| US7397473B2 (en) * | 2004-07-16 | 2008-07-08 | Geometric Ltd. | Geometry based search method for 3D CAx/PDM repositories |
| US9547838B2 (en) * | 2013-11-06 | 2017-01-17 | Oracle International Corporation | Automated generation of a three-dimensional space representation and planogram verification |
| US10223810B2 (en) * | 2016-05-28 | 2019-03-05 | Microsoft Technology Licensing, Llc | Region-adaptive hierarchical transform and entropy coding for point cloud compression, and corresponding decompression |
| US10861196B2 (en) * | 2017-09-14 | 2020-12-08 | Apple Inc. | Point cloud compression |
| US10327697B1 (en) * | 2018-12-20 | 2019-06-25 | Spiral Physical Therapy, Inc. | Digital platform to identify health conditions and therapeutic interventions using an automatic and distributed artificial intelligence system |
| CN111523398A (zh) * | 2020-03-30 | 2020-08-11 | Xi'an Jiaotong University | Method and device combining 2D face detection and 3D face recognition |
2022
- 2022-07-20 US US18/290,676 patent/US20250086842A1/en active Pending
- 2022-07-20 WO PCT/KR2022/010606 patent/WO2023003349A1/ko not_active Ceased
- 2022-07-20 EP EP22846219.8A patent/EP4375947A4/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4375947A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4375947A1 (en) | 2024-05-29 |
| EP4375947A4 (en) | 2025-08-06 |
| US20250086842A1 (en) | 2025-03-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2023003349A1 (ko) | | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| US11812049B2 (en) | | Image encoder and related non-transitory computer readable medium for image decoding |
| US12160600B2 (en) | | System and method for video coding |
| US20250106423A1 (en) | | Image encoder, image decoder, image encoding method, and image decoding method |
| CN111295884B (zh) | | Image processing device and image processing method |
| WO2019151798A1 (ko) | | Method and device for transmitting and receiving metadata for an image in a wireless communication system |
| WO2022211476A1 (en) | | Method and apparatus for supporting teleconferencing and telepresence containing multiple 360 degree videos |
| US12149723B2 (en) | | Decoder and decoding method |
| WO2021206333A1 (ko) | | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| WO2023106721A1 (en) | | Method and device for providing split computing based on device capability |
| EP3117604B1 (en) | | Elementary video bitstream analysis |
| US12526475B2 (en) | | Reproduction apparatus, transmission apparatus, reproduction method, and transmission method |
| US20180352248A1 (en) | | Image decoding method, image encoding method, image decoding device, image encoding device, and image encoding/decoding device |
| CA3086574A1 (en) | | Encoder, encoding method, decoder, and decoding method |
| WO2021025168A1 (en) | | System and method for video coding |
| WO2022005116A1 (ko) | | Method and device for controlling transmission and reception of data in a wireless communication system |
| WO2023003354A1 (ko) | | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| WO2023014085A1 (ko) | | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| Adeyemi-Ejeye et al. | | Impact of packet loss on 4K UHD video for portable devices |
| US11297329B2 (en) | | Image encoding method, transmission method, and image encoder |
| EP3837853A1 (en) | | A method, apparatus and computer program for providing edited video content |
| WO2023101510A1 (ko) | | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
| US12075042B2 (en) | | Encoder, decoder, encoding method, and decoding method |
| WO2020036389A1 (en) | | Method for transmitting video, apparatus for transmitting video, method for receiving video, and apparatus for receiving video |
| Benedetto et al. | | QoS assessment of 3G video-phone calls by tracing watermarking exploiting the new colour space ‘YST’ |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22846219; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 18290676; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022846219; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280057293.X; Country of ref document: CN |
| | ENP | Entry into the national phase | Ref document number: 2022846219; Country of ref document: EP; Effective date: 20240220 |
| | WWP | Wipo information: published in national office | Ref document number: 18290676; Country of ref document: US |