WO2025211547A1 - Method and system for reconstructing a multi-dimensional extended reality scene
- Publication number
- WO2025211547A1 (PCT/KR2025/000786)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scene
- user
- capture
- dimensional
- capturing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating three-dimensional [3D] models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional [3D], e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/50—Lighting effects
- G06T15/506—Illumination models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating three-dimensional [3D] models or images for computer graphics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating three-dimensional [3D] models or images for computer graphics
- G06T19/20—Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Definitions
- Embodiments disclosed herein relate to Extended Reality (XR) systems, and more particularly to methods and systems for providing guidance to a user to capture a scene for reconstruction using an XR device.
- XR Extended Reality
- Extended Reality is rapidly expanding, enabling users in different locations to share an immersive experience of the same physical space through scene reconstruction.
- the 3D XR experience involves spatial, acoustic, illumination, and appearance information components, which require advanced techniques to help users capture rich data for high-quality reconstruction.
- Modelling scene illumination is crucial for enhancing the realism and immersiveness of an XR experience. It includes various elements (such as, but not limited to, shadows, depth perception, virtual objects, and mood lighting) which contribute to a more convincing and engaging environment. For instance, in a Virtual Reality (VR) party, dynamic lighting that changes based on the music being played can significantly enhance the atmosphere and make the experience more enjoyable and immersive for participants. Properly modelled lighting helps in creating believable and lifelike virtual scenes that closely mimic real-world interactions with light, thereby improving the overall XR experience.
- VR Virtual Reality
- modelling scene acoustics is vital for achieving spatial audio, which allows users to identify the direction and distance of sounds within the XR environment, thereby enhancing realism.
- Spatial audio adds a layer of depth to the experience, making the experience feel more authentic and immersive.
- accurately modelled acoustics enable users to pinpoint where a sound is coming from, (for example, a conversation in a virtual meeting or the direction of footsteps in a VR game). This auditory information complements the visual cues, creating a more cohesive and lifelike experience.
- VST Visual See Through
- the embodiments herein provide a method and system for reconstructing a multi-dimensional extended reality (XR) scene, the method comprising: obtaining, by an XR device, information corresponding to at least one of: semantics of a scene, geometry of the scene, and acoustics of the scene using sensor data, while a user moves within the scene during the capturing of the scene; determining, by the XR device, a scene type and at least one capture threshold for capturing a multi-dimensional scene using a trained model based on the obtained information, wherein the multi-dimensional scene comprises at least one of: texture characteristics of the scene, spatial characteristics of the scene, acoustic characteristics of the scene, and illumination characteristics of the scene; generating, by the XR device, a user guidance for an assisted scene capturing using the scene type and the at least one capture threshold, wherein the user guidance includes at least one of: a path to be followed by the user and at least one action to be performed by the user along the path; and capturing, by the XR
- the embodiments herein provide a method that may perform at least one of: indicating a gaze fixation point on which the user is to focus an imaging device, indicating a walking speed depending on at least one detail in a part of the scene, and augmenting captured data for at least one of optimized texture, optimized acoustics, optimized illumination, and optimized material reconstruction in the scene.
- the embodiments herein provide a method that may perform at least one of: listening experience at a given position in the multi-dimensional XR scene.
- the embodiments herein provide a method that may capture the multi-dimensional scene representing at least one of the spatial characteristics of the scene, the illumination characteristics of the scene and the acoustic characteristics of the scene during the assisted scene capture using the generated user guidance comprises: processing a tag associated with an image, a depth of the image, a microphone response signal, a material classification associated with the scene, a sound generation time, lighting information, and acoustic information; and capturing, by the XR device, the multi-dimensional scene representing at least one of the spatial characteristics of the scene, the illumination characteristics of the scene and the acoustic characteristics of the scene based on the processing.
- the embodiments herein provide a method and system to determine the multi-dimensional scene by generating a map upon initiating a traverse through an environment, marking in the map an area in which the user intends to travel in the environment and its usage, estimating a ground plane, a user height, and a pose region in the environment, and estimating an occlusion blind spot by ray propagation from the pose region in the environment.
- the embodiments herein provide a method and system wherein the user guidance comprises modifying the lighting in the scene by at least one of: turning a light ON, turning a light OFF, adjusting a brightness of the light, and changing a colour of the light, in order to capture and model the influence of different light sources on the scene lighting.
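The blind-spot estimation by ray propagation mentioned above can be illustrated with a minimal sketch. It assumes a 2D occupancy grid in place of the real 3D map, and the function names `visible_cells` and `blind_spots` are hypothetical; the publication does not specify how the rays are propagated.

```python
import math

def visible_cells(grid, pose, n_rays=360, step=0.25):
    """Return the free cells reached by straight rays cast from `pose`.

    grid: list of rows; 1 marks an occluder, 0 marks free space.
    pose: (row, col) cell the rays start from (assumed to be free).
    """
    rows, cols = len(grid), len(grid[0])
    seen = {pose}
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        dr, dc = math.sin(angle), math.cos(angle)
        r, c = pose[0] + 0.5, pose[1] + 0.5  # start at the cell centre
        while True:
            r += dr * step
            c += dc * step
            cell = (math.floor(r), math.floor(c))
            if not (0 <= cell[0] < rows and 0 <= cell[1] < cols):
                break  # the ray left the map
            if grid[cell[0]][cell[1]] == 1:
                break  # the ray hit an occluder
            seen.add(cell)
    return seen

def blind_spots(grid, pose_region, **kwargs):
    """Free cells that cannot be seen from any pose in `pose_region`."""
    free = {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == 0
    }
    seen = set()
    for pose in pose_region:
        seen |= visible_cells(grid, pose, **kwargs)
    return free - seen

# A wall (column 1) separates the left and right columns of the room.
room = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
hidden = blind_spots(room, [(1, 0)])
```

In this toy room, every cell behind the wall is reported as a blind spot when the pose region lies only on the left side; adding a pose on the right side removes them. A guidance planner could use such a result to extend the suggested path.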
- FIG. 1 depicts a scene reconstruction method, according to the prior art;
- FIGS. 2A-2E depict flow diagrams for providing live guidance to a user for capturing data for performing joint spatial-acoustic-illumination for XR scene reconstruction, according to embodiments as disclosed herein;
- FIGS. 3A-3G depict an example user journey, wherein the user uses joint spatial-acoustic-illumination for XR scene reconstruction, according to embodiments as disclosed herein;
- FIG. 4 depicts hardware components of the XR device, according to embodiments as disclosed herein;
- FIG. 5 depicts an example user scenario of a virtual experience, according to embodiments as disclosed herein;
- FIG. 7 depicts an example user scenario of productivity, according to embodiments as disclosed herein.
- FIG. 8 depicts an example user scenario of architecture and design, according to embodiments as disclosed herein;
- FIGS. 9A-9B depict the difference between an example spatial video and a video captured using joint spatial-acoustic-illumination XR scene reconstruction, according to embodiments as disclosed herein;
- FIG. 10 depicts an example of real-time guidance and feedback for capturing media using the joint spatial-acoustic-illumination XR scene reconstruction, according to embodiments as disclosed herein;
- FIGS. 11A-11B depict example illumination and acoustics of a reconstructed scene that can be modified using the joint spatial-acoustic-illumination XR scene reconstruction, according to embodiments as disclosed herein;
- FIGS. 12A-12B depict an example flowchart with live user assistance, according to embodiments as disclosed herein;
- FIGS. 13A-13B depict an example flowchart with XR scene reconstruction, according to embodiments as disclosed herein;
- FIGS. 14A and 14B depict an example surface with higher texture richness alongside a surface with lower texture richness, according to embodiments as disclosed herein;
- FIG. 15 depicts an example of recreation and high-fidelity spatial reconstruction with a spatial and pose coverage, according to embodiments as disclosed herein;
- FIG. 16 depicts an example view of direction of a light source for modelling and recreating illumination experience(s), according to embodiments as disclosed herein;
- FIG. 17 is a flowchart depicting a method for providing user guidance for reconstructing a multi-dimensional XR scene, according to embodiments as disclosed herein;
- FIG. 18 is a flowchart depicting a method for reconstructing a multi-dimensional XR scene, according to embodiments as disclosed herein.
- Embodiments herein may be described and illustrated in terms of blocks which carry out a described function or functions. These blocks, which may be referred to herein as managers, units, modules, hardware components or the like, are physically implemented by analog and/or digital circuits such as logic gates, integrated circuits, microprocessors, microcontrollers, memory circuits, passive electronic components, active electronic components, optical components, hardwired circuits and the like, and may optionally be driven by a firmware.
- the circuits may, for example, be embodied in one or more semiconductor chips, or on substrate supports such as printed circuit boards and the like.
- circuits constituting a block may be implemented by dedicated hardware, or by a processor (e.g., one or more programmed microprocessors and associated circuitry), or by a combination of dedicated hardware to perform some functions of the block and a processor to perform other functions of the block.
- Each block of the embodiments may be physically separated into two or more interacting and discrete blocks without departing from the scope of the disclosure.
- the blocks of the embodiments may be physically combined into more complex blocks without departing from the scope of the disclosure.
- An embodiment according to the present disclosure may disclose methods and systems for guiding a user to capture a scene (e.g., multi-dimensional extended reality (XR) scene or the like) for reconstruction using an extended reality (XR) device.
- an embodiment according to the present disclosure may provide real-time feedback and active guidance to a user (when the user is capturing data) to ensure accurate data collection for capturing multi-dimensional scene information, while simultaneously delivering high-quality scene reconstruction in less time.
- the gathered data including pose-tagged HDR images, depth images, microphone responses, and material classifications, are sent to the high-dimensional joint scene reconstruction module 240.
- the high-dimensional joint scene reconstruction module 240 combines spatial, acoustic, illumination, and material properties to build a comprehensive 3D representation of the scene, either on the device or in the cloud.
- the user guided capture stage 242 may comprise a user action planner, a user gaze fixation planner, and/or a path and speed planner.
- the user action planner may recommend an action plan to the user 232.
- the user gaze fixation planner may recommend one or more fixation points to the user 232.
- the path and speed planner may provide an optimal path and speed for capture to the user 232.
- the user guided capture stage 242 may provide guidance comprising at least one of the recommended action plan, the recommended fixation points, or the optimal path and speed for capture to the user 232.
- the user guided capture stage 242 may suggest or recommend one or more actions to the user 232.
- the one or more actions may include, but are not limited to, placing audio source(s) and/or audio receiver(s) (e.g., a phone, buds, and/or a speaker) at indicated position(s) along the guided path for richer acoustic reconstruction.
- the one or more actions may include, but are not limited to, modifying the lighting in the scene by turning one or more light sources on or off, modifying the brightness and/or color of a light source for better capture, and/or modeling how each light source influences the lighting of the scene.
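One way to see why toggling individual light sources helps is the linearity of light transport: in linear (HDR) radiance, a capture with one light on equals the ambient capture plus that light's contribution. The sketch below is an assumption about how such captures could be used and is not the method of the publication; `light_contributions` and `relight` are hypothetical names, and images are flat lists of linear radiance values.

```python
def light_contributions(ambient, captures):
    """Estimate each light's contribution as (capture with light on) - (ambient).

    ambient:  flat list of linear radiance values with all controllable lights off.
    captures: dict mapping a light name to a capture with only that light on.
    """
    return {
        name: [max(on - off, 0.0) for on, off in zip(image, ambient)]
        for name, image in captures.items()
    }

def relight(ambient, contributions, gains):
    """Re-synthesise the scene with a per-light gain (0 = off, 1 = as captured)."""
    out = list(ambient)
    for name, contribution in contributions.items():
        gain = gains.get(name, 0.0)
        out = [o + gain * c for o, c in zip(out, contribution)]
    return out

# Two-pixel toy images; values are chosen to be exact in binary floating point.
ambient = [0.25, 0.5]
captures = {"lamp": [0.75, 0.5], "ceiling": [0.5, 1.5]}
parts = light_contributions(ambient, captures)
dimmed = relight(ambient, parts, {"lamp": 1.0, "ceiling": 0.5})
```

Here the lamp is kept as captured and the ceiling light is dimmed to half. The subtraction only holds for linear radiance values, which is one reason HDR captures are useful for this step.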
- FIG. 2E depicts the process of performing multi-dimensional Joint Scene Reconstruction.
- a method according to the present disclosure involves verifying the capture region marking using XR inputs from various sources such as cameras 206, IMUs 201, ToF sensors 202, mics 203, and eye-tracking cameras 205 illustrated in FIG. 2A.
- a scene understanding module e.g., the scene understanding module 238 of FIG. 2A
- the coarse mesh depiction is a 3D representation of a scene that provides a good understanding of its surfaces.
- the mesh density (the number of points in the mesh) is lower, giving a large-scale understanding. Although the coarse mesh may miss minute details, it can be generated rapidly and with less sensor data because of its lower resolution, which suits dynamic tasks such as obstacle avoidance and path planning.
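The trade-off between density and speed can be illustrated with voxel downsampling, a common technique for coarsening 3D data. This is a stand-in chosen for illustration, operating on a point cloud rather than a mesh; the publication does not state how its coarse mesh is produced, and `voxel_downsample` is a hypothetical helper.

```python
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    coarse = []
    for key in sorted(buckets):  # sorted only to make the output order stable
        members = buckets[key]
        n = len(members)
        coarse.append(tuple(sum(p[i] for p in members) / n for i in range(3)))
    return coarse

dense = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (2.0, 2.0, 2.0)]
coarse = voxel_downsample(dense, voxel_size=1.0)
```

A larger `voxel_size` yields fewer points and a faster, coarser result, which mirrors the large-scale understanding described above.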
- a method includes determining one or more capture settings, such as, but not limited to, texture, acoustics, illumination accuracy, and scene type, which are selected by the user and sent to the user guided capture stage 242.
- a typical listening experience i.e., acoustics
- the user guided capture stage 242 can send one or more parameters to a high-dimensional joint scene reconstruction module 240, which performs spatial, acoustic, illumination, and material property reconstructions either on-device or in the cloud.
- Examples of the parameters can be, but not limited to, pose-tagged HDR images, depth images, microphone responses, material classification, sound generation time, and so on.
- the pose-tagged HDR images capture both high-dynamic-range visuals and the camera's position, aiding 3D scene reconstruction, whereas the depth images map object distances, which is essential for building a 3D model of the scene.
- the audio data captured by the microphones (for example, the microphone responses) includes the sound characteristics needed to recreate the acoustic environment.
- Material classification involves identifying and categorizing the materials present in a scene, helping in creating a more accurate reconstruction of the scene.
- the specific time at which a sound is produced within a scene helps in synchronizing the sound with the visual and spatial data, ensuring that the reconstructed scene accurately reflects the original environment.
- the various actions mentioned above may be performed in the order presented, in a different order, or simultaneously. Additionally or alternatively, the method provides real-time guidance during the capture process. This allows users to make informed adjustments on the fly, thereby enhancing the overall quality and precision of the XR experience.
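As a small worked example of why the sound generation time matters: if the emission time and the arrival time at a microphone are known on a common clock, the propagation delay gives the direct-path distance. The sketch assumes synchronised clocks and a speed of sound of about 343 m/s in air; the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at about 20 degrees C

def propagation_delay(generation_time_s, arrival_time_s):
    """Time the sound needed to travel from the source to the microphone."""
    delay = arrival_time_s - generation_time_s
    if delay < 0:
        raise ValueError("arrival precedes generation; clocks are not synchronised")
    return delay

def direct_path_distance(generation_time_s, arrival_time_s):
    """Distance in metres implied by the propagation delay."""
    return SPEED_OF_SOUND_M_S * propagation_delay(generation_time_s, arrival_time_s)

# A signature sound emitted at t = 1.00 s and first heard at t = 1.01 s.
distance_m = direct_path_distance(1.00, 1.01)
```

With a 10 ms delay the source is roughly 3.43 m away. Later arrivals of the same sound would correspond to reflections, which carry information about the room's acoustics.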
- Embodiments herein can assess the quality of input data (prior to reconstruction), thereby enhancing the accuracy and relevance of captured data.
- embodiments herein provide dynamic, real-time guidance tailored to the user's context during the capture process.
- Embodiments herein use a personalized feedback mechanism to ensure that the input data is optimized for the specific needs of various user personas, for example, users involved in social media, architecture, music, and so on.
- Embodiments herein can employ default and custom profiles to address the unique requirements of different users, enhancing the versatility and applicability of the method across diverse fields and applications.
- Embodiments herein can improve the user experience by delivering context-aware, actionable insights at the moment of data capture. Further, in some embodiments, some actions listed in FIGS. 2A-2E may be omitted.
- Users can enhance their audio experience by strategically placing audio sources and receivers (such as, but not limited to, phones, earbuds, and speakers), at one or more specified locations along a suggested path to achieve a richer acoustic reconstruction. Additionally, the users can optimize the lighting within the scene by adjusting existing light sources (such as, by turning them on or off, or by altering their brightness and color settings) to accurately capture and model the influence of each light source on the overall scene lighting. These adjustments will significantly improve both the auditory and visual quality of the scene, creating a more immersive and detailed environment.
- Embodiments herein enhance the comprehensiveness and quality of the data collected, thereby facilitating superior XR reconstruction of various scene attributes (such as, but not limited to, geometry, textures, acoustics, and illumination).
- FIGs. 3A-3G depict an example user journey using the proposed method of joint spatial-acoustic-illumination XR scene reconstruction.
- a user 30 uses an XR device 300.
- although the XR device 300 is illustrated as a pair of handheld controllers, the configuration of the XR device 300 is not limited thereto.
- the XR device 300 may include a display device (e.g., a head-mounted display apparatus, an augmented reality (AR)/XR helmet, or AR/VR glasses) which provides a user interface to the user 30.
- the XR device 300 may utilize one or more algorithms for tracking one or more gestures of the user 30 without any handheld controller.
- FIG. 3A illustrates an example image indicating that a user 30 of an XR device 300 has accessed an application for XR scene reconstruction. While the user 30 uses the XR device 300, the XR device 300 may provide a user interface 302 comprising one or more icons representing corresponding functionality.
- the user interface 302 may comprise at least one of: an icon representing current time, an icon representing a face (or an icon) of the user 30, an icon representing one or more wireless connections of the XR device 300, an icon corresponding to a functionality representing a list of applications installed on the XR device 300, an icon corresponding to a functionality for representing a list of contacts stored on the XR device 300, an icon corresponding to a functionality for representing a list of notifications that occurred on the XR device 300, an icon corresponding to a functionality for sharing one or more XR scenes generated by the XR device 300, or an icon corresponding to a functionality for modifying settings of the XR device 300.
- the XR device 300 may provide an application list user interface 304.
- the application list user interface 304 may comprise one or more icons respectively corresponding to one or more applications of the XR device 300.
- the XR device may further display a line user interface 306 which indicates the tracked intention of the user 30.
- the line user interface 306 may be displayed based on a gesture of the user 30 tracked by the XR device 300.
- the application for joint spatial-acoustic-illumination XR scene reconstruction may be opened.
- the user 30 can be provided with an interface that prompts them to begin the process.
- FIG. 3B illustrates an example image, wherein the application for XR scene reconstruction provides instructions via a guidance user interface 310.
- the application may provide the guidance user interface 310 which guides the user 30 to walk around the scene and mark the capture region. For example, via the guidance user interface 310, the application instructs the user 30 to walk around the environment and mark the boundaries of the capture region.
- FIG. 3C illustrates an example image indicating that the user 30 wearing the XR device 300 confirms that the capture region is marked correctly on the ground plane.
- the XR device 300 may comprise a head-mounted wearable device 300c which has an imaging device (for example, a camera).
- FIG. 3D illustrates an example image indicating an estimated path 312 for optimal capture that has been highlighted for the user 30 to follow.
- the XR device may display a user interface 314 for asking the user 30 to confirm the estimated path 312 (e.g., the capture region).
- the application may re-estimate the capture region.
- the gaze fixation point is indicated for the user to focus an imaging device (e.g., a camera of the XR device 300) on. Additionally or alternatively, the walking speed is indicated depending on the details in different parts of the scene. Other actions to augment the richness of the captured data for better textures, acoustics, illumination, material reconstruction are suggested.
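The dependence of walking speed on scene detail can be sketched as a simple mapping: the more detail a region shows, the slower the suggested pace. The detail measure (mean absolute difference between neighbouring pixels) and the speed limits of 0.2 and 1.0 m/s are assumptions made for illustration and do not come from the publication.

```python
def detail_score(patch):
    """Mean absolute difference between horizontally adjacent pixels.

    patch: list of rows with grey values in [0, 1]; the result is also in [0, 1].
    """
    diffs = [abs(row[i + 1] - row[i]) for row in patch for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def recommended_speed(detail, v_min=0.2, v_max=1.0):
    """Map a detail score in [0, 1] to a walking speed in m/s (more detail, slower)."""
    d = min(max(detail, 0.0), 1.0)  # clamp out-of-range scores
    return v_max - d * (v_max - v_min)

flat_wall = [[0.5, 0.5], [0.5, 0.5]]
checkerboard = [[0.0, 1.0], [1.0, 0.0]]
speed_flat = recommended_speed(detail_score(flat_wall))
speed_busy = recommended_speed(detail_score(checkerboard))
```

A uniform wall yields the maximum speed, while a highly textured surface yields the minimum. A real planner would likely smooth the speed along the path rather than change it abruptly.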
- FIGs. 3E and 3F illustrate example images indicating that the user 30 is guided to walk along a highlighted path 316, and to perform an instructed action for optimization of image capturing.
- the XR device 300c may guide the user 30 to gaze at a certain fixed point (e.g., a point 320) in the environment while walking along the highlighted path 316.
- the application continuously captures spatial data, while simultaneously collecting information on the scene's acoustic properties and illumination conditions.
- the application may reconstruct the XR scene based on the collected information, and indicate to the user 30, via a user interface 322 (as depicted in the example of FIG. 3G), that data associated with the scene has been successfully captured. As the user 30 walks along the highlighted path 316, the user 30 is guided to perform specific action(s) to ensure optimal data capture, including focusing the camera on designated gaze fixation points and adjusting walking speed based on scene details.
- the application can suggest one or more additional actions to enhance data richness, such as, but not limited to, fixating on points to capture detailed lighting information.
- the user 30 can be prompted to focus on a specific point (e.g., the point 320 illustrated in FIG. 3F) to gather more light source details.
- the application can indicate that the user 30 has correctly marked the capture region on the ground plane.
- the application analyzes the gathered data, and extracts detailed scene semantics and geometric information from the gathered data. Thereafter, the scene semantics and geometry information of the marked region is collected, and scene type and capture accuracy thresholds are determined (for example, one or more thresholds for texture, acoustics, and/or illumination accuracy).
- the application determines the scene type and sets specific capture accuracy thresholds for texture, acoustics, and illumination.
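A minimal sketch of how per-scene capture thresholds might drive the remaining guidance is given below. The publication determines the scene type and thresholds with a trained model; here a hypothetical lookup table stands in for that model, and the scene names, threshold values, and the `unmet_dimensions` helper are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaptureThresholds:
    texture: float
    acoustics: float
    illumination: float

# Hypothetical values; a trained model would supply these in the described method.
THRESHOLDS_BY_SCENE = {
    "meeting_room": CaptureThresholds(texture=0.6, acoustics=0.8, illumination=0.6),
    "museum": CaptureThresholds(texture=0.9, acoustics=0.5, illumination=0.8),
}

def unmet_dimensions(scene_type, achieved):
    """Return the capture dimensions whose achieved accuracy is below its threshold.

    achieved: dict mapping "texture", "acoustics", "illumination" to a score in [0, 1];
    a missing entry counts as 0.
    """
    t = THRESHOLDS_BY_SCENE[scene_type]
    required = {
        "texture": t.texture,
        "acoustics": t.acoustics,
        "illumination": t.illumination,
    }
    return sorted(
        name for name, need in required.items() if achieved.get(name, 0.0) < need
    )

todo = unmet_dimensions(
    "museum", {"texture": 0.95, "acoustics": 0.4, "illumination": 0.8}
)
```

In this example only the acoustics still fall short, so further guidance could focus on acoustic actions such as placing an audio source along the path.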
- the captured data, including pose-tagged HDR images, depth images, microphone response signals, material classifications, signature sound generation times, and other lighting and acoustic information, can then be processed to create a reconstruction. This processing can be performed either offline (e.g., by the XR device 300) or in the cloud, ensuring comprehensive and high-quality data for accurate scene reconstruction.
- This comprehensive dataset ensures that the reconstructed XR scene accurately reflects the real-world environment, providing a highly immersive and realistic experience.
- FIG. 4 depicts hardware components of the XR device 230, which comprises a processor 230a, a scene reconstruction controller 230b, and a memory 230c.
- the XR device 230 may exclude at least one of these components or may further include at least one other component.
- the processor 230a includes one or more processing devices or processing circuitry, such as one or more microprocessors, microcontrollers, digital signal processors (DSPs), application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).
- the processor 230a includes one or more of a central processing unit (CPU), an application processor (AP), a communication processor (CP), or a graphics processing unit (GPU).
- the processor 230a is able to perform control on at least one of the other components of the XR device 230.
- the scene reconstruction controller (230b) is coupled with the processor (230a) and the memory (230c).
- the scene reconstruction controller (230b) is configured to obtain information corresponding to at least one of: the semantics of the scene, the geometry of the scene, or the acoustics of the scene using the sensor data, while a user moves within the scene during capturing the scene.
- the scene reconstruction controller 230b is configured to activate user marking of the capture region if the capture region is verified, determine a scene type and at least one capture threshold for capturing a multi-dimensional scene using a trained model based on the obtained information, generate a user guidance for an assisted scene capturing using the scene type and the at least one capture threshold, wherein the user guidance includes at least one of: a path to be followed by the user or at least one action to be performed by the user along the path, and capture the multi-dimensional scene representing at least one of the spatial characteristics of the scene, the illumination characteristics of the scene, or the acoustic characteristics of the scene during the assisted scene capture using the generated user guidance.
- the memory 230c may store commands or data related to at least one other component of the XR device 230.
- the memory 230c can include volatile memory (e.g., a random-access memory (RAM)) and/or non-volatile memory (e.g., a flash memory or a solid-state drive (SSD)).
- the memory 230c may comprise one or more storage media which store one or more instructions. The one or more instructions may cause, when executed by the processor 230a and/or the scene reconstruction controller 230b individually or collectively, the XR device 230 to perform any combination of the operations described herein.
- FIG. 5 depicts an example virtual experience.
- Embodiments herein can provide the user with an immersive virtual concert experience that closely simulates the sensations of attending a live event in person.
- Embodiments herein can replicate the intricate details of concert lighting and sound systems, thereby ensuring that the user can experience the atmosphere and ambience of a live performance with remarkable fidelity.
- Embodiments herein can capture the visual and auditory nuances of a concert, and can also extend the experience beyond geographical limitations, thereby allowing friends and loved ones to join in and enjoy the event together, regardless of their location. This enhanced social interaction amplifies the enjoyment and creates a shared, memorable experience, bridging the gap between physical presence and virtual participation.
- FIG. 6 depicts an example virtual experience.
- the user utters the following: "I want to share my experience of this museum exhibit with my friends in an immersive way?".
- Embodiments herein can capture a high-fidelity reconstruction of scene in 3D. The user can share this with friends who can view the exhibit from any viewpoint in an immersive way.
- FIG. 7 depicts an example virtual experience.
- the user utters the following: "I want to have a meeting, but my team works in a hybrid mode across multiple offices, and it is hard to communicate effectively with just video meetings".
- Embodiments herein offer the potential to create a high-dimensional reconstruction of a meeting room, meticulously modeling not just the physical layout of the meeting room, but also the acoustics and lighting in the meeting room, thereby ensuring a highly immersive virtual environment.
- Embodiments herein can capture and replicate in-person interactions accurately, including subtle aspects such as tone of voice, gestures, and body language of the participants in the meeting, all presented at a 1:1 scale.
- embodiments herein effectively bridge the communication gap, allowing for interactions that closely mimic face-to-face meetings. This level of detail enhances the realism of virtual engagements, making them more effective and engaging compared to traditional virtual communication platforms.
- embodiments herein can significantly enhance the architectural design process by enabling precise scaled 3D reconstructions, which are crucial for accurate design, acoustic analysis, and illumination modelling.
- Embodiments herein allow architects to experiment with various materials and design styles in a virtual environment, offering a comprehensive perspective from any vantage point within the reconstructed scene. Additionally, embodiments herein facilitate real-time collaboration by allowing on-site workers to update and share the current progress of a project with the architect remotely, ensuring continuous and effective communication even when the architect is off-site. This integration not only improves design accuracy and efficiency, but also enhances the overall management and execution of architectural projects.
- FIG. 10 depicts an example scenario, wherein real-time guidance and feedback are used for capturing the scene.
- the XR device may start capturing data from surroundings.
- the XR device may suggest following a path and performing one or more actions (e.g., capturing one or more points nearby with the XR device) to the user. While the user follows the suggestion, the XR device may collect data from the surroundings, and reconstruct the scene based on the collected data.
- FIGs. 11A-11B depict the illumination and acoustics of the reconstructed scene that can be modified using the proposed joint spatial-acoustic-illumination XR scene reconstruction.
- a joint XR experience may be reconstructed and provided to a user, as illustrated in FIG. 11B.
- the joint XR experience may include one or more spatial characteristics (e.g., one or more objects included in the environment), one or more illumination characteristics (e.g., one or more lightings 1102), and one or more acoustic characteristics (e.g., one or more sound elements 1104).
- One or more characteristics included in the joint XR experience may be customized.
- the one or more lightings 1102 and/or the one or more sound elements 1104 may be customized (or adjusted) by the user.
- spatial video viewers may be limited to the perspective from which the video was initially recorded, typically bound to the positions of the original camera(s). Accordingly, users may be confined to a fixed viewpoint, limiting their ability to explore the scene dynamically.
- Spatial video technology offers the potential for users to explore a scene from various angles and distances, providing a more immersive and flexible experience. In an embodiment, this dynamic exploration is facilitated by a calibrated stereo camera setup, which records the scene from multiple viewpoints.
- a method according to an embodiment of the present disclosure revolutionizes this approach by eliminating the need for such complex equipment. It allows for capturing spatial video with just a single camera, thus simplifying the recording process while still enabling users to interact with and view the scene from different perspectives. This advancement opens up new possibilities for more accessible and versatile spatial video applications.
- a method according to an embodiment of the present disclosure delivers real-time feedback and active guidance to users, enhancing their ability to capture high-quality XR experiences.
- By integrating high-dimensional capture within a single framework, the method enables users to record spatial richness, including pose, acoustic, and illumination elements.
- Accurate audio recreation is fundamental to creating a realistic and immersive XR experience, as it enhances spatial awareness, facilitates social interaction, fosters emotional connections, and provides a nuanced understanding of the environment.
- precise scene lighting modelling is vital for lifelike rendering, affecting the perception of depth, shape, and texture, thereby ensuring a believable experience. This approach allows users dynamic control over the scene, making applications like XR design and digital twins more practical and effective.
- FIGs. 12A & 12B depict a flow chart of the live user assistance flow.
- FIGs. 12A & 12B depict a system 200 for providing user guidance for reconstructing a multidimensional XR scene using a server connected to an XR Device 230.
- the XR device 230 may comprise one or more modules illustrated in FIGs. 12A and 12B.
- the XR Device 230 is configured to receive sensor data from a plurality of sensors 201-205 of FIG. 2A (when a user initiates the capture of a scene), derive information on the scene's semantics and geometry using the received data, and activate a user marking out capture region module 236 of FIG. 2A (once the region has been successfully verified).
- the user marking out capture region module 236 creates a coarse semantic scene mesh based on the derived information, while optionally allowing the user to mark a capture region.
- the XR Device 230 determines the scene type and capture accuracy thresholds using a pre-trained model, wherein the pre-trained model uses texture, spatial, acoustics, and/or illumination characteristics from a metric semantic localization & mapping module 210, a material type classifier module 212, and a shadow region classifier & reconstruction module 214.
- the XR Device 230 using the pre-trained model generates user guidance for a second round of scene capturing, which includes a path and actions for the user. During this second round, the user captures the multidimensional scene data following the provided guidance.
- FIGs. 13A & 13B depict a detailed flow chart of a system 400 with an XR scene reconstruction flow.
- the XR device 230 may comprise one or more modules illustrated in FIGs. 13A and 13B.
- the XR Device 230 is configured to create a coarse semantic scene mesh using the derived information, determine the scene type and capture thresholds for multidimensional data, determine if a capture region is marked on a ground plane, and highlight an optimal capture path.
- the capture threshold defines the threshold value for the reconstruction-quality metrics of the scene. If the metrics or reconstruction quality scores produced by the scene reconstruction algorithms rise above the threshold, the scene is deemed to have been reconstructed with the accuracy the user expects, as expressed through the threshold.
- the thresholds are set at the beginning, based on user history, environment, user inputs, and user preference.
- the reconstruction quality in each of these modalities or dimensions depends on these threshold values.
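- The per-modality threshold check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `CaptureThresholds`, the function names, the 0-to-1 score scale, and the example values are all assumptions introduced here, since the source does not define a concrete metric or data structure.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CaptureThresholds:
    """Hypothetical per-modality quality thresholds (scale assumed to be 0..1)."""
    spatial: float
    acoustic: float
    illumination: float
    texture: float


def modalities_below_threshold(scores, thresholds):
    """Return, in alphabetical order, the modalities whose score has not risen above its threshold."""
    limits = asdict(thresholds)
    # A missing score is treated as 0.0, i.e. as not yet captured.
    return sorted(name for name, limit in limits.items()
                  if scores.get(name, 0.0) <= limit)


def capture_complete(scores, thresholds):
    """The capture is complete once every modality score exceeds its threshold."""
    return not modalities_below_threshold(scores, thresholds)


if __name__ == "__main__":
    thresholds = CaptureThresholds(spatial=0.8, acoustic=0.7,
                                   illumination=0.6, texture=0.75)
    scores = {"spatial": 0.9, "acoustic": 0.5,
              "illumination": 0.65, "texture": 0.75}
    print(modalities_below_threshold(scores, thresholds))
    print(capture_complete(scores, thresholds))
```

- A guidance loop could use the returned list to decide which modality the next suggested action should target; that coupling is likewise an assumption of this sketch.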
- the user is guided along the capture path, receives real-time visual feedback for reconstructing the scene (wherein the scene encompasses spatial, acoustic, illumination, and material appearance characteristics), and is offered assistance for creating multi-dimensional XR scenes (which integrate texture, spatial, audio, visual, and illumination experiences).
- the XR Device 230 is configured to evaluate data quality, provide guidance based on user profiles and scenes, and/or support both default and custom user personas tailored to the user and/or application.
- a scene segmentation module 222 creates a coarse scene mesh by estimating the semantics and geometry of a scene, while the user is marking a capture region.
- the pre-trained model determines the scene type and sets capture thresholds for multi-dimensional scene data, including texture, spatial, acoustic, and illumination characteristics.
- a ground plane confirmation module verifies if the capture region is correctly marked on the ground plane.
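- One way to picture the ground plane confirmation is a point-to-plane distance test on the corners of the marked region. The sketch below assumes the ground plane is already available as a point and a normal vector, and that a 5 cm tolerance is acceptable; the function name, the plane representation, and the tolerance are illustrative assumptions, not details from the source.

```python
import math


def region_on_ground_plane(corners, plane_point, plane_normal, tolerance_m=0.05):
    """Return True if every marked corner lies within tolerance_m of the plane.

    corners: iterable of (x, y, z) points marking the capture region.
    plane_point, plane_normal: assumed representation of the detected ground plane.
    """
    norm = math.sqrt(sum(c * c for c in plane_normal))
    if norm == 0.0:
        raise ValueError("plane_normal must be non-zero")
    unit = [c / norm for c in plane_normal]
    for corner in corners:
        offset = [a - b for a, b in zip(corner, plane_point)]
        # Unsigned distance from the corner to the plane.
        distance = abs(sum(o * u for o, u in zip(offset, unit)))
        if distance > tolerance_m:
            return False
    return True


if __name__ == "__main__":
    marked = [(0.0, 0.0, 0.01), (2.0, 0.0, 0.0), (2.0, 3.0, -0.02), (0.0, 3.0, 0.0)]
    print(region_on_ground_plane(marked, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```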
- a path highlighting module (e.g., a path planner module 218 of FIG. 12B) highlights an optimal capture path for the user.
- a user guidance module 220 of FIG. 12B instructs the user to follow this path and perform necessary actions.
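- The source does not state which algorithm the path planner uses. As a purely illustrative stand-in, the sketch below generates a back-and-forth ("lawnmower") sweep over a rectangular capture region; the function name, the rectangular region, and the row spacing parameter are assumptions made for this example.

```python
def lawnmower_path(width_m, depth_m, spacing_m):
    """Return (x, y) waypoints that sweep a width x depth rectangle in alternating rows.

    This is a generic coverage pattern, not the planner described in the source.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    waypoints = []
    row = 0
    y = 0.0
    while y <= depth_m + 1e-9:
        # Alternate the sweep direction on every row so the path is continuous.
        xs = (0.0, width_m) if row % 2 == 0 else (width_m, 0.0)
        waypoints.extend((x, y) for x in xs)
        row += 1
        y = row * spacing_m
    return waypoints


if __name__ == "__main__":
    for waypoint in lawnmower_path(4.0, 2.0, 1.0):
        print(waypoint)
```

- A real planner would presumably also account for obstacles, occluded regions, and the per-modality thresholds; none of that is modelled here.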
- a data processing module processes the captured multi-dimensional scene data to reconstruct an XR scene, while a real-time feedback module provides visual feedback for scene reconstruction.
- a live user assistance module offers real-time support for creating the multi-dimensional XR scene.
- An evaluation and guidance module benchmarks reconstruction input data quality and provides live guidance based on user profiles and scenes.
- a user persona support module 211 of FIG. 12A accommodates default and custom user personas 213 of FIG. 12A based on different user needs.
- the XR Device 230 can locate the audio sources within a room using Direction of Arrival (DOA) algorithms combined with visual cues extracted from recorded video or images.
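- The source does not name a specific DOA algorithm. As a minimal sketch of the general idea, the example below estimates the time delay between two microphones by brute-force cross-correlation and converts it to a far-field arrival angle. The two-microphone geometry, the speed of sound constant, and the function names are assumptions; fusing the result with visual cues is not shown.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) by which `right` trails `left`.

    Uses plain time-domain cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += sample * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle from broadside, positive when sound reaches the left microphone first."""
    ratio = SPEED_OF_SOUND_M_S * delay_samples / (sample_rate_hz * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noisy delays
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    left = [0.0] * 20
    right = [0.0] * 20
    left[5] = 1.0   # impulse arrives at the left microphone first
    right[8] = 1.0  # and three samples later at the right microphone
    delay = estimate_delay_samples(left, right, max_lag=6)
    print(delay, doa_angle_deg(delay, sample_rate_hz=16000, mic_spacing_m=0.1))
```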
- the XR Device 230 can further estimate one or more location related parameters (such as, but not limited to, dimensions and absorption values) from recorded video or images.
- the XR Device 230 can perform object segmentation for identifying material types and their properties, which can be used as initial conditions. Subsequently, the XR Device 230 can fine-tune these parameters through analysis of recorded audio response spectra.
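- To illustrate how room dimensions and absorption values could serve as initial conditions, the sketch below applies Sabine's classical formula for the 60 dB reverberation time, RT60 ≈ 0.161 · V / A, with V the room volume in cubic metres and A the total absorption in square metres. The source does not say that Sabine's formula is used; the box-shaped room, the function name, and the example absorption coefficients are assumptions.

```python
def sabine_rt60(length_m, width_m, height_m, absorption):
    """Estimate the 60 dB reverberation time of a box-shaped room with Sabine's formula.

    absorption: dict with coefficients (0..1) for 'floor', 'ceiling', and 'walls',
    for example as looked up from a hypothetical material classifier.
    """
    volume = length_m * width_m * height_m
    floor_area = length_m * width_m
    wall_area = 2.0 * height_m * (length_m + width_m)
    total_absorption = (floor_area * (absorption["floor"] + absorption["ceiling"])
                        + wall_area * absorption["walls"])
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume / total_absorption


if __name__ == "__main__":
    # Illustrative coefficients only, not measured values.
    estimate = sabine_rt60(5.0, 4.0, 3.0,
                           {"floor": 0.3, "ceiling": 0.1, "walls": 0.05})
    print(round(estimate, 3))
```

- Such an estimate is only a starting point; per the text above, the parameters would then be refined against the recorded audio response.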
- Modelling and recreating the listening experience using the proposed method includes estimating the Room Impulse Response (RIR), and user guidance for recording the RIR.
- the RIR is the transfer function between the sound source and the microphone.
- T60 and the dimensions of the space can be used for estimating the RIR, wherein T60 is the time taken for the sound to decay by 60 dB. T60 can differ between locations within the space.
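- The relationship between T60 and an RIR can be sketched in both directions. The example below first synthesizes a crude RIR as noise under an exponential envelope that falls 60 dB in T60 seconds, then recovers T60 from that RIR with Schroeder backward integration. Both steps are generic textbook techniques chosen for illustration; the source does not specify its estimation method, and the function names and parameters are assumptions.

```python
import math
import random


def synthetic_rir(t60_s, sample_rate_hz, duration_s, seed=0):
    """Gaussian noise shaped by an envelope whose amplitude drops 60 dB after t60_s."""
    rng = random.Random(seed)
    # exp(-decay * T60) = 10**-3, i.e. a 60 dB amplitude drop.
    decay = 3.0 * math.log(10.0) / t60_s
    count = int(duration_s * sample_rate_hz)
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate_hz)
            for i in range(count)]


def estimate_t60(rir, sample_rate_hz):
    """Estimate T60 from the slope of the Schroeder energy decay curve.

    The slope is taken between the -5 dB and -25 dB points and extrapolated to 60 dB.
    """
    energy = 0.0
    tail = []
    for sample in reversed(rir):
        energy += sample * sample  # energy remaining from this sample to the end
        tail.append(energy)
    tail.reverse()
    curve_db = [10.0 * math.log10(e / tail[0]) for e in tail if e > 0.0]
    start = next(i for i, level in enumerate(curve_db) if level <= -5.0)
    end = next(i for i, level in enumerate(curve_db) if level <= -25.0)
    slope_db_per_s = (curve_db[end] - curve_db[start]) * sample_rate_hz / (end - start)
    return -60.0 / slope_db_per_s


if __name__ == "__main__":
    rir = synthetic_rir(t60_s=0.5, sample_rate_hz=8000, duration_s=1.0)
    print(estimate_t60(rir, sample_rate_hz=8000))
```

- Because T60 can vary across the space, such an estimate would be repeated for each recording position.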
- the XR Device 230 can provide guidance to the user to record RIRs at probable locations in the space where the listener is located (for example, in front of the TV, chairs, etc.).
- the XR Device 230 confirms whether the capture region is on a ground plane, and in step 1808, the XR Device 230 highlights an estimated path for the user to follow for effective scene capture. In step 1810, as the user navigates this path, the XR Device 230 provides guidance to ensure accurate scene data collection. In step 1810, the XR Device 230 processes the captured data to reconstruct the XR scene, which incorporates spatial, acoustic, illumination, and material appearance characteristics into a single framework. In step 1812, the XR Device 230 offers real-time visual feedback to achieve high-quality, rich XR scene capture with customizable and realistic lighting, acoustics, and virtual object manipulation.
- the XR Device 230 provides user assistance for creating multi-dimensional XR scenes by integrating texture, spatial, audio, visual, and illumination experiences within a unified framework. It also serves as a benchmark to evaluate the quality of reconstruction input data, offering user guidance tailored to individual profiles and scenes while supporting both default and custom user personas to cater to diverse user needs.
- the various actions in method 1800 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some actions listed in FIG. 18 may be omitted.
- Embodiments disclosed herein can enhance the collected data to improve the XR reconstruction of a scene by refining its geometry, textures, acoustics, and lighting.
- the embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements.
- the elements include blocks which can be at least one of a hardware device, or a combination of hardware device and software module.
- the embodiments disclosed herein describe a method for user guidance for reconstructing a multi-dimensional extended reality (XR) scene using an XR Device and providing real-time feedback and active guidance to the user when capturing data. Therefore, it is understood that the scope of the protection is extended to such a program and in addition to a computer readable means having a message therein, such computer readable storage means (e.g., computer-readable storage medium) contain (or store) program code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device.
- the method is implemented in at least one embodiment through or together with a software program written in, e.g., Very high speed integrated circuit Hardware Description Language (VHDL) or another programming language, or implemented by one or more VHDL modules or several software modules being executed on at least one hardware device.
- the hardware device can be any kind of portable device that can be programmed.
- the device may also include means which could be, e.g., hardware means such as an ASIC, or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein.
- the method embodiments described herein could be implemented partly in hardware and partly in software.
- the invention may be implemented on different hardware devices, e.g., using a plurality of CPUs.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Architecture (AREA)
- Computational Linguistics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the present invention disclose methods and systems for reconstructing a multidimensional extended reality (XR) scene. A method, performed by an XR device, may obtain information corresponding to at least one of the semantics of a scene, the geometry of the scene, and the acoustics of the scene using sensor data, determine a scene type and at least one capture threshold for capturing a multidimensional scene using a trained model, based on the obtained information, and provide real-time feedback and active guidance to a user. The method generates user guidance for assisted scene capture using the scene type and the at least one capture threshold, in order to capture the multidimensional scene using the generated user guidance.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202441028031 | 2024-04-04 | ||
| IN202441028031 | 2024-12-30 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025211547A1 (fr) | 2025-10-09 |
Family
ID=97269159
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2025/000786 (pending; published as WO2025211547A1, fr) | Method and system for reconstructing a multidimensional extended reality scene | 2024-04-04 | 2025-01-14 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025211547A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160234432A1 (en) * | 2013-10-28 | 2016-08-11 | Olympus Corporation | Image processing apparatus and image processing method |
| KR101867051B1 (ko) * | 2011-12-16 | 2018-06-14 | Samsung Electronics Co., Ltd. | Imaging apparatus, method for providing an imaging composition, and computer-readable recording medium |
| US20190349562A1 (en) * | 2017-02-14 | 2019-11-14 | Samsung Electronics Co., Ltd. | Method for providing interface for acquiring image of subject, and electronic device |
| CN111754569A (zh) * | 2020-06-28 | 2020-10-09 | Bank of China Limited | Method, apparatus, and system for an indoor occupancy alarm, electronic device, and storage medium |
| KR20240012449A (ko) * | 2021-07-13 | 2024-01-29 | LG Electronics Inc. | Route guidance device and route guidance system based on augmented reality and mixed reality |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 25782853; Country of ref document: EP; Kind code of ref document: A1 |