WO2024191234A1 - Method and apparatus for processing an image - Google Patents
- Publication number
- WO2024191234A1 (PCT/KR2024/095121)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- depth
- incomplete
- masked
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/60—Intended control result
- G05D1/617—Safety or protection, e.g. defining protection zones around obstacles or avoiding hazards
- G05D1/622—Obstacle avoidance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
- G06T15/40—Hidden part removal
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three-dimensional [3D] modelling for computer graphics
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating three-dimensional [3D] models or images for computer graphics
- G06T19/20—Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2101/00—Details of software or hardware architectures used for the control of position
- G05D2101/10—Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques
- G05D2101/15—Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques using machine learning, e.g. neural networks
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2105/00—Specific applications of the controlled vehicles
- G05D2105/10—Specific applications of the controlled vehicles for cleaning, vacuuming or polishing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/56—Particle system, point based geometry or rendering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/004—Annotating, labelling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2012—Colour editing, changing, or manipulating; Use of colour codes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/20—Indexing scheme for editing of 3D models
- G06T2219/2016—Rotation, translation, scaling
Definitions
- the disclosure relates to a method for processing an image, and an apparatus for the same, and more particularly to a method for performing masking and inpainting for generalizable scene completion, and an apparatus for the same.
- 3D structures of scenes may be important for many applications, for example robot navigation, planning, manipulation, and interaction. Improvements in 3D perception capabilities have accompanied the increasing availability of depth sensors on smartphones and robots. However, a complete and coherent reconstruction is challenging when only partial observation of the scene is available.
- Scene completion is an important task which may allow for better robot action planning such as grasp planning, path planning, and long-horizon task planning. Scene completion may also be useful in contexts such as autonomous navigation and image generation for augmented reality (AR) and virtual reality (VR) devices.
- a single view of the environment may capture only limited information of the scene, which presents a major challenge for scene completion.
- Example embodiments address at least the above problems and/or disadvantages and other disadvantages not described above. Also, the example embodiments are not required to overcome the disadvantages described above, and may not overcome any of the problems described above.
- a method for processing image data for scene completion may include obtaining an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the method may include receiving an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the method may include obtaining a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3-dimensional (3D) information generated from 2-dimensional (2D) information which is obtained from the original image.
- the method may include determining an area within the first image for generating a second surface of the object based on depth information about a depth between the object and the background of the original image.
- the method may include obtaining a second image by inputting the first image and the determined area to an artificial intelligence (AI) inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- an electronic device for processing image data for scene completion may include at least one memory configured to store instructions.
- the electronic device for processing image data for scene completion may include at least one processor configured to execute the instructions to receive an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the electronic device for processing image data for scene completion may include at least one processor configured to execute the instructions to obtain an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the electronic device for processing image data for scene completion may include at least one processor configured to execute the instructions to obtain a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3-dimensional (3D) information generated based on 2-dimensional information which is obtained from the original image.
- the electronic device for processing image data for scene completion may include at least one processor configured to execute the instructions to determine an area within the first image for generating a second surface of the object based on depth information about a depth between the object and the background of the original image.
- the electronic device for processing image data for scene completion may include at least one processor configured to execute the instructions to obtain a second image by inputting the first image and the determined area to an artificial intelligence (AI) inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- FIG. 1 is a diagram showing a viewpoint module, according to an embodiment of the present disclosure
- FIG. 3B is a diagram illustrating a process for generating a merged point cloud, according to an embodiment of the present disclosure
- FIG. 5 is a diagram showing an example of generating a mask without using surface-aware masking, according to an embodiment of the present disclosure
- FIGS. 6A to 6C illustrate results of performing scene completion based on a mask generated according to FIG. 5, according to an embodiment of the present disclosure
- FIGS. 7A-7D are diagrams showing an example of generating a mask using surface-aware masking, according to an embodiment of the present disclosure
- FIGS. 8A to 8C illustrate results of performing scene completion based on a mask generated according to FIGS. 7A-7D, according to an embodiment of the present disclosure
- FIG. 9 is a flowchart illustrating a method of performing surface-aware masking for scene completion, according to an embodiment of the present disclosure
- FIGS. 10A to 10C show further examples of a surface-aware masking process, according to an embodiment of the present disclosure
- FIGS. 11A to 11C show further examples of a surface-aware masking process, according to an embodiment of the present disclosure
- FIGS. 12A and 12B are flowcharts illustrating use applications of scene completion methods, according to an embodiment of the present disclosure
- FIGS. 13A and 13B are flowcharts illustrating a method of processing an image to perform scene completion, according to an embodiment of the present disclosure
- FIG. 14 is a diagram of electronic devices for performing scene completion according to an embodiment of the present disclosure.
- FIG. 15 is a diagram of components of one or more electronic devices of FIG. 14 according to an embodiment of the present disclosure.
- the expression, "at least one of a, b, and c,” should be understood as including only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or any variations of the aforementioned examples.
- the term "module" is intended to be broadly construed as hardware, software, firmware, or any combination thereof.
- Embodiments may relate to methods, systems, and apparatuses for performing scene completion.
- Embodiments may provide a method, system, or apparatus which may obtain an input image of a scene, for example an RGB-D image, and may generate a completed 3D representation of the scene, for example a completed scene point cloud, which may include regions which are unobservable or occluded in the input image.
- a point cloud may be a multidimensional set of points which represent at least one of an object and a space.
- each point may represent geometric coordinates of a single point on a surface of an object, and may further represent information such as texture information and color information corresponding to the single point.
- the scene may include one or more objects and a background
- the completed scene point cloud may include both depth information and texture information about the scene, including the one or more objects and the background in the scene.
- embodiments are not limited thereto.
- embodiments may relate to any multidimensional representations of objects and spaces, for example mesh representations, voxel grid representations, implicit surface representations, distance field representations, and any other type of representation.
- the reconstruction of the completed scene point cloud may be performed in two general steps, for example a step of scene view completion, and a step of lifting the scene from a two-dimensional representation to a three-dimensional representation.
- an embodiment may apply the generalization capability of large language models to inpaint the missing areas of color images rendered from different viewpoints. Then, these inpainted images may be converted from two-dimensional (2D) images to three-dimensional (3D) representations, for example point clouds, by predicting per-pixel depth values using a combination of a trained network and depth information in the input image.
- this lifting process may be referred to as deprojection.
- an entire completed scene point cloud for a scene may be reconstructed based on a single image of the scene, for example a single RGB-D image. For example, based on the single image, the entire scene layout may be reconstructed in a globally-consistent fashion.
- Some related-art methods may be confined to task-specific models which often do not generalize appropriately to distributions beyond the training data, which may limit their applicability.
- an embodiment of the present disclosure may provide generalization to unseen scenes, objects, and categories by leveraging inpainted features.
- An embodiment may utilize the generalizable aspects of machine learning (ML) and artificial intelligence (AI) models, for example visual language models (VLMs) for completing novel views and depth maps.
- the present disclosure is not limited in this regard, and an embodiment may utilize other types of ML and AI models.
- the integrated pipeline provided by an embodiment may be used for scene completion of unseen objects with occlusion and clutter.
- the generalization capabilities of large VLMs with respect to 2D images may be leveraged to lift the information contained in the 2D images into 3D space for practical robotics applications. Accordingly, an embodiment may provide consistent scene completion in new environments, and with unseen objects.
- FIG. 1 is an example of a diagram showing a viewpoint module for performing scene completion, according to an embodiment of the present disclosure.
- a viewpoint module 100 may include an image rotation module 102, a surface-aware masking (SAM) module 104, an inpainting model 106, one or more depth estimation models 108, for example normal estimation model 108A and boundary estimation model 108B, a depth completion module 110, and a deprojection module 112.
- the viewpoint module 100 may obtain (e.g. receive, capture, download) an original image, for example an RGB-D image of a scene, as input, and may output one or more estimated point clouds , where N is the number of predicted points in the scene, and H and W denote dimensions of the RGB-D image.
- the RGB-D image may include an input color image and an input depth image .
- a color image may be referred to as an RGB image or a texture image, and the like.
- the image rotation module 102, the SAM module 104, and the inpainting model 106 may be referred to as an inpainting pipeline, which may obtain the RGB-D image from an original viewpoint , and may output an incomplete depth image and an inpainted color image from a new viewpoint .
- the original viewpoint may correspond to a view of the scene from a first direction
- the new viewpoint may correspond to a view of the scene from a second direction which is different from the first direction.
- the one or more depth estimation models 108 and the depth completion module 110 may be referred to as a depth completion pipeline, which may obtain the incomplete depth image and the inpainted color image , and may output an estimated depth image .
- the deprojection module 112 may generate 3D information about the scene based on 2D information which is obtained from the RGB-D image .
- the 2D information may include at least one from among boundary information, texture information, color information, and depth information included in the RGB-D image .
- the 3D information may include a 3D representation of the scene, for example a point cloud as discussed above.
- the deprojection module 112 may obtain the inpainted color image and the estimated depth image , and may obtain an estimated point cloud corresponding to the viewpoint .
- a process of generating a 2D image from a 3D representation may be referred to as projecting the 2D image from the point cloud.
- a process of generating a 3D representation such as a point cloud from a 2D image may be referred to as deprojecting the point cloud from the 2D image.
- given a depth image, which is a 2D image that has a depth value at every pixel, and given camera information used to capture the 2D image (for example focal length, etc.), it may be possible to deproject each pixel using the camera information and the depth information at that 2D pixel location.
- this may be similar to drawing a line or ray from the camera through the 2D pixel location, and placing a point along the line at a distance corresponding to the depth information for the pixel. If the depth image is available, then the deprojection may be performed without an algorithm or model. However, if no depth image is available, or only a partial depth image is available, an AI model such as the one or more depth estimation models 108 may be used to predict the depth image.
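The ray-based deprojection described above can be sketched with a pinhole camera model. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function name `deproject`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the convention that a depth of 0 marks a missing pixel are all hypothetical, and depth is assumed to be measured along the optical axis.

```python
import numpy as np

def deproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth image to an (N, 3) point cloud.

    Assumes a pinhole camera; pixels with depth <= 0 are treated as missing.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]        # row (v) and column (u) index of every pixel
    valid = depth > 0                # skip pixels without depth information
    z = depth[valid]
    x = (u[valid] - cx) * z / fx     # follow the ray through the pixel out to depth z
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Example: a 2x2 depth image with one missing pixel yields three points.
depth = np.full((2, 2), 2.0)
depth[0, 0] = 0.0
points = deproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

When only a partial depth image is available, the same function could be applied after the missing depth values have been predicted.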
- FIG. 2 is a flowchart illustrating a method 200 of processing an image to perform scene completion, according to an embodiment of the present disclosure.
- one or more operations of the method 200 of FIG. 2 may be performed by or using the viewpoint module 100 and any of the elements included therein, and any other element described herein.
- the incomplete color image and an incomplete depth image may be referred to as "incomplete” because they may be missing information about one or more areas of the scene which are obscured or occluded by an object in the deprojected point cloud .
- for example, when the point cloud is rotated, some points in the rotated point cloud may correspond to occluded areas of the scene which are obscured by a surface of an object which is present in the RGB-D image .
- the occluded areas of the scene may be regions which include at least one of a portion of a background of the original image (from the new viewpoint ), and a portion of a surface of an object (from the new viewpoint ).
- this portion of the surface of the object may be referred to as an "object area”. Therefore, when the rotated point cloud is used to generate a 2D image, this 2D image may also be missing information, and therefore may be referred to as an incomplete image. Because the rotation of the point cloud may correspond to changing the viewpoint, the incomplete color image and the incomplete depth image may correspond to a new viewpoint .
- the incomplete color image and the incomplete depth image may be missing color information and depth information corresponding to areas of the scene which are occluded or otherwise not visible in the original RGB-D image .
- the incomplete color image and the incomplete depth image may be referred to as, or included in, an incomplete RGB-D image .
- a process for generating the incomplete RGB-D image from the new viewpoint based on information in the original image may be referred to as "rotating" the original image .
- the process of deprojecting the image into the point cloud , rotating the deprojected point cloud , and reprojecting to render the incomplete color image and the incomplete depth image described above with respect to operations S202 and S203 may be referred to as "rotating" the original image .
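The rotate-and-reproject portion of this "rotating" process can be sketched as follows. This is an illustrative sketch only: the helper names `rotate_y` and `project`, the choice of rotating about the camera's vertical axis, the pinhole intrinsics, and the use of 0 to mark pixels with no information are assumptions made for the example.

```python
import numpy as np

def rotate_y(points, angle_rad):
    """Rotate an (N, 3) point cloud about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return points @ rot.T

def project(points, colors, fx, fy, cx, cy, h, w):
    """Render points into color and depth images; pixels hit by no point stay 0."""
    color_img = np.zeros((h, w, 3))
    depth_img = np.zeros((h, w))
    front = points[:, 2] > 0                      # keep points in front of the camera
    pts, cols = points[front], colors[front]
    u = np.round(pts[:, 0] * fx / pts[:, 2] + cx).astype(int)
    v = np.round(pts[:, 1] * fy / pts[:, 2] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pts, cols, u, v = pts[inside], cols[inside], u[inside], v[inside]
    order = np.argsort(-pts[:, 2])                # draw far points first
    for i in order:                               # nearer points overwrite farther ones
        color_img[v[i], u[i]] = cols[i]
        depth_img[v[i], u[i]] = pts[i, 2]
    return color_img, depth_img

# Two points on the same ray: the nearer (green) point hides the farther (red) one.
pts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 1.0]])
cols = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
color_img, depth_img = project(rotate_y(pts, 0.0), cols, 1.0, 1.0, 1.0, 1.0, 3, 3)
```

Pixels left at 0 in the rendered images are the ones that make the color image and depth image "incomplete" from the new viewpoint.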
- the new viewpoint may be selected based on a context ratio , which may be determined based on Equation 1 below:
- Equation 1 above may denote a number of context pixels in an image, and may denote a number of all pixels in an image.
- the context ratio may provide an indication about how accurately an inpainting model such as the inpainting model 106 may be able to fill in missing areas in an image. For example, a low value of the context ratio may indicate that many areas are unknown, and that an inpainting model may struggle to fill in missing areas, and a high value of the context ratio may indicate that an inpainting model may more easily fill in missing areas, but may only fill in limited information.
- the image rotation module 102 may start from the original viewpoint , and may rotate the deprojected point cloud in various directions to various new viewpoints.
- an image may be projected based on the rotated point cloud, and a context ratio of the projected image may be calculated.
- when the context ratio of the projected image satisfies predetermined criteria, the corresponding viewpoint may be selected as the new viewpoint .
- the predetermined criteria may be satisfied when the context ratio of a projected image is closest to a context threshold from among the context ratios of a plurality of projected images corresponding to a plurality of new viewpoints. This process may be repeated to obtain a plurality of evenly spaced new viewpoints, but embodiments are not limited thereto.
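The viewpoint selection described above can be sketched as below. The sketch assumes that the context ratio is the number of context pixels divided by the number of all pixels, and that a context pixel is one with known (non-zero) depth after reprojection; the function names, the dictionary of candidate views, and the threshold value are hypothetical.

```python
import numpy as np

def context_ratio(depth_img):
    """Fraction of pixels that carry known (non-zero) information."""
    return np.count_nonzero(depth_img) / depth_img.size

def pick_viewpoint(candidates, threshold):
    """Return the label of the candidate whose context ratio is closest to the threshold.

    candidates maps a viewpoint label to the depth image projected from that viewpoint.
    """
    return min(candidates, key=lambda k: abs(context_ratio(candidates[k]) - threshold))

candidates = {
    "view_a": np.ones((2, 2)),                        # ratio 1.0, little to inpaint
    "view_b": np.array([[1.0, 0.0], [0.0, 0.0]]),     # ratio 0.25, mostly unknown
    "view_c": np.array([[1.0, 1.0], [0.0, 0.0]]),     # ratio 0.5
}
chosen = pick_viewpoint(candidates, threshold=0.6)
```

A threshold between the two extremes reflects the trade-off noted above: too little context makes inpainting unreliable, while too much context leaves little new information to fill in.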
- preprocessing steps may be applied to increase the quality of the inpainting results.
- the incomplete color image may be preprocessed to fill in relatively small holes which are produced as a result of the reprojecting described above.
- a naive inpainting filter that works with relatively small areas of missing values may be applied.
- the naive inpainting filter may be a general inpainting filter or inpainting model which is trained using a general image dataset that is not specific to the particular scene. Starting at boundaries of missing pixels, a weighted average of the nearest ground truth pixels may be determined. The naive inpainting filter may then work inward to fill larger holes.
- the naive inpainting filter may be used to fill relatively small holes of missing information in order to produce a denser image that gives more context for the inpainting model 106.
- the naive inpainting filter may produce unrealistic results for relatively large missing areas.
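A boundary-inward fill of this kind can be sketched as follows. This is a simplified stand-in rather than the filter of the disclosure: it uses an equal-weight average of the known 4-neighbors instead of a weighted average, operates on a single-channel image, and the function name `naive_fill` and the `max_iters` limit are assumptions.

```python
import numpy as np

def naive_fill(img, known, max_iters=10):
    """Fill missing pixels of a 2D image from known neighbors, working inward.

    img: (H, W) array; known: (H, W) boolean array, True where img is valid.
    """
    img = img.astype(float).copy()
    known = known.copy()
    for _ in range(max_iters):
        if known.all():
            break
        vals = np.pad(np.where(known, img, 0.0), 1)   # zero out unknown pixels
        mask = np.pad(known.astype(float), 1)
        # Sum of known neighbor values and count of known neighbors (up, down, left, right).
        total = vals[:-2, 1:-1] + vals[2:, 1:-1] + vals[1:-1, :-2] + vals[1:-1, 2:]
        count = mask[:-2, 1:-1] + mask[2:, 1:-1] + mask[1:-1, :-2] + mask[1:-1, 2:]
        fill = ~known & (count > 0)                   # missing pixels on the hole boundary
        img[fill] = total[fill] / count[fill]
        known |= fill                                 # filled pixels become context
    return img

row = np.array([[2.0, 0.0, 6.0]])
filled = naive_fill(row, np.array([[True, False, True]]))
```

Because each pass only extends one pixel into a hole, a small `max_iters` limits the filter to relatively small holes and leaves large missing areas for the mask and the inpainting model.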
- the SAM module 104 may generate a mask which indicates the large missing areas.
- the missing areas may include areas in which no pixel information is available when the original image is rotated. In an embodiment, even if there is pixel information available when the original image is rotated (some of which may correspond to the background) the SAM module 104 may determine that an area predicted as the surface area of the object should be masked. An example of a method for generating the mask is provided below with respect to FIGS. 4 to 9C.
- the SAM module 104 may mask the incomplete color image to obtain a masked color image, and may mask the incomplete depth image to obtain a masked depth image.
- the viewpoint module 100 may provide the masked color image, or for example the mask and the incomplete color image , to the inpainting model 106 to obtain an inpainted color image .
- the inpainting model 106 may generate predicted image information corresponding to portions of the incomplete color image which are masked by the mask , and the inpainted color image may be generated by applying the predicted image information to the incomplete color image .
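Applying the predicted image information to the incomplete color image can be sketched as a masked composite. The function name `composite` and the convention that the mask is True where pixels are to be replaced are assumptions for this example; the predicted image stands in for the output of any inpainting model.

```python
import numpy as np

def composite(incomplete, predicted, mask):
    """Keep observed pixels and take predicted pixels where mask is True.

    incomplete, predicted: (H, W, 3) color images; mask: (H, W) boolean array.
    """
    return np.where(mask[..., None], predicted, incomplete)

incomplete = np.zeros((2, 2, 3))
predicted = np.ones((2, 2, 3))
mask = np.array([[True, False], [False, False]])
inpainted = composite(incomplete, predicted, mask)
```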
- the inpainted color image may be referred to as a predicted image.
- the inpainting model 106 may be or may include an AI or ML model, for example at least one of a diffusion model and a VLM such as DALL-E 2.
- the inpainting model 106 may obtain the masked color image and an input prompt P that describes the context of the original RGB-D image in words or text.
- the prompt may include "household objects on a table”.
- the prompt may include "room with carpet and furniture”.
- the prompt may include any additional known information about the scene, such as "a baseball and glove on a table” if these objects are known to be on the table, or "top-down view of household objects on a table” if the viewpoint is known to be from a top-down perspective.
- the additional known information may be at least one of information that was previously provided or confirmed by a user, information that is associated with the image such as information included in tags or metadata, and information obtained using image analysis or view analysis, for example using an image analysis algorithm or model.
- the prompt may include any other information.
- the original RGB-D image may be provided to an automatic captioning model, and the output of the automatic captioning model may be used as the prompt .
- the output of the automatic captioning model may be a proposed prompt such as "household objects on a table". This output may be provided to the user, and the user may then revise or modify this proposed prompt to obtain a revised prompt.
- the revised prompt may be "household objects such as a dish, cloth, cutlery, and a pot on a table", or "household objects such as drinking glasses and dinner plates on a white marble dining table” (in which text in italics indicates modifications to the proposed prompt which are input by the user).
- the user may input an original prompt , and then based on the output of the inpainting model 106, may modify the original prompt to obtain a revised prompt, and may request a new inpainted image to be generated based on the revised prompt.
- the user may originally input "a baseball and glove on a table” as the original prompt .
- the user may input a revised prompt such as "a baseball and a leather baseball glove on a wooden table" (in which text in italics indicates revisions to the original prompt which are input by the user).
- a user may input any prompt as desired, for example to change the style of the original RGB-D image to another style.
- the appearance or visual style of the original RGB-D image may be modified using a neural style transfer (NST) model, for example by modifying style features of the original RGB-D image while maintaining content features of the original RGB-D image .
- the inpainting model 106 may output the inpainted color image , which may contain estimated areas corresponding to areas of the incomplete color image which are masked by the mask .
- the term "prompt" may refer to text used to initiate interaction with a generative model that generates images for electronic devices.
- a prompt may include one or more words, phrases, and/or sentences.
- the inpainting model 106 may be, may include, or may be similar to such a generative model.
- a prompt may contain natural language text that carries various information that the generative model can use to generate images, such as context, intent, task, constraints, and more.
- Electronic devices may process natural language text using natural language processing (NLP) models.
- prompts and revised prompts can be received from users.
- electronic devices may receive text input from users, or they can receive voice input and perform automatic speech recognition (ASR) to convert the user's voice input into text.
- the present disclosure is not limited in this regard, and electronic devices may receive other types of input from users.
- prompts may be generated by electronic devices using various techniques, such as image captioning.
- electronic devices can receive image input from users and extract text descriptions from the images.
- prompts may be replaced with a similar expression that represents the same concept.
- prompts can be replaced with terms like “input,” “user input,” “input phrase,” “user command,” “directive,” “starting sentence,” “task query,” “trigger sentence,” “message,” and others, not limited to the examples mentioned.
- the inpainting model 106 may be used to generate multiple candidate inpainted color images based on the same masked color image. Then, these candidate inpainted color images may be compared against the input prompt P by encoding them to an embedded space, and the candidate inpainted color image having the highest similarity may be chosen as the inpainted color image .
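The candidate selection described above can be sketched as a similarity ranking in a shared embedding space. The sketch assumes that each candidate image and the prompt have already been encoded into vectors of the same dimension by some image-text encoder, which is not specified here, and that cosine similarity is the similarity measure; the function name `best_candidate` is hypothetical.

```python
import numpy as np

def best_candidate(image_embeddings, prompt_embedding):
    """Return the index of the candidate most similar to the prompt (cosine similarity).

    image_embeddings: (K, D) array, one row per candidate inpainted image.
    prompt_embedding: (D,) array for the input prompt.
    """
    imgs = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    prompt = prompt_embedding / np.linalg.norm(prompt_embedding)
    return int(np.argmax(imgs @ prompt))

image_embeddings = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
prompt_embedding = np.array([0.0, 2.0])
index = best_candidate(image_embeddings, prompt_embedding)
```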
- the inpainted color image may be provided to one or more depth estimation models 108.
- the one or more depth estimation models 108 may be ML or AI models.
- the inpainted color image may be provided to the normal estimation model 108A, which may be trained to estimate normals
- the inpainted color image may be provided to the boundary estimation model 108B, which may be trained to estimate occlusion boundaries.
- the one or more depth estimation models 108 may be trained or optimized for a specific category of scenes, for example a scene including objects on a tabletop, or a scene including a room to be vacuumed by a robotic vacuum cleaner.
- an estimated normal(s) may be, for example, geometric normal(s).
- the term “normal” or “geometric normal” may refer to a vector associated with a point on a surface of a 3D object in computer graphics and 3D computer modeling, and may represent a direction in which a surface is facing at each point on the surface (e.g., the direction that is perpendicular to a tangent plane of the surface at that point).
- the depth completion module 110 may generate an estimated depth image based on the masked depth image and the output of the one or more depth estimation models. For example, depth information for areas with missing depths in the masked depth image may be computed by tracing along the estimated normal(s) from areas of known depth, and the estimated occlusion boundaries may act as barriers which the estimated normal(s) should not be traced across. As an example, in an embodiment a system of equations may be solved to minimize an error E , where E is defined according to Equation 2 below:
- Equation 2 above may include a term denoting the distance between the ground truth and estimated depth, a term denoting the influence of nearby pixels to have similar depths, and a term denoting the consistency of estimated depth and estimated normal values.
- Equation 2 may also include constants or weight values corresponding to each of these three terms, respectively.
- Equation 2 may further include a weight value which is applied to the estimated normal values based on the probability that a boundary is present.
- the value of this weight may be obtained based on the estimated occlusion boundaries discussed above.
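- As an illustrative sketch only, an error of the kind described above could be written as a weighted sum of a depth term, a smoothness term, and a normal-consistency term. All symbol names below ($E_D$, $E_S$, $E_N$, $\lambda_D$, $\lambda_S$, $\lambda_N$, $B$) are assumptions introduced here for illustration and are not taken from the disclosure. The boundary weight $B$ is written as a single factor for brevity, although it may instead be applied per pixel inside the normal-consistency term.

```latex
E = \lambda_D E_D + \lambda_S E_S + \lambda_N B \, E_N
```

Under these assumed names, $E_D$ would penalize deviation from known depth values, $E_S$ would penalize depth differences between neighboring pixels, and $E_N$ would penalize disagreement between the estimated depth and the estimated normals, with $B$ reducing that penalty where an occlusion boundary is likely.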
- the deprojection module 112 may generate an estimated point cloud corresponding to the viewpoint by deprojecting the inpainted color image and the estimated depth image.
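- The deprojection described above may be sketched as follows, assuming a pinhole camera model. The intrinsic parameters `fx`, `fy`, `cx`, and `cy`, the function name `deproject`, and the convention that a depth of zero marks a missing value are assumptions made for illustration.

```python
import numpy as np

def deproject(color, depth, fx, fy, cx, cy):
    """Convert an RGB-D image into (N, 3) points and their (N, 3) colors,
    assuming a pinhole camera with the given intrinsics."""
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    valid = depth > 0  # a depth of zero is treated as missing (assumption)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    colors = color[valid]
    return points, colors

# A 2x2 depth image with one missing pixel.
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
color = np.zeros((2, 2, 3), dtype=np.uint8)
points, colors = deproject(color, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each valid pixel yields one 3D point, so the resulting point cloud has one fewer point than the image has pixels in this example.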
- the estimated point cloud may be a completed scene point cloud, and may be used to perform other tasks such as robot action planning, autonomous navigation, and image generation for AR devices and VR devices.
- the method 200 may be performed multiple times based on multiple new viewpoints, and the resulting estimated point clouds may be merged to obtain the completed scene point cloud. An example of a merging process is described below with reference to FIGS. 3A-3B.
- FIG. 3A is a diagram showing a scene completion system.
- FIG. 3B is a diagram illustrating a process for generating a merged point cloud, according to an embodiment of the present disclosure.
- a scene completion system 300 may include the viewpoint module 100 discussed above, and a merging module 302. The scene completion system 300 may obtain the RGB-D image as input, and may output a completed scene point cloud which is obtained based on multiple estimated point clouds.
- the method 200 discussed above may be performed on the original RGB-D image by rotating the point cloud by a first angle to obtain a first estimated point cloud corresponding to a first viewpoint.
- the method 200 may be performed again, this time rotating the point cloud by a second angle to obtain a second estimated point cloud corresponding to a second viewpoint.
- the method 200 may then be performed two more times by rotating the point cloud by a third angle and a fourth angle to obtain a third estimated point cloud corresponding to a third viewpoint, and a fourth estimated point cloud corresponding to a fourth viewpoint. Accordingly, as shown in FIG. 3A, four novel views of the scene are obtained, complete with RGB and depth information.
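- The rotation of the point cloud to several new viewpoints may be sketched as below. The sketch assumes rotation about the vertical (y) axis around a chosen pivot point; the axis, the pivot, the four angle values, and the function name `rotate_about_y` are hypothetical and are not specified by the disclosure.

```python
import numpy as np

def rotate_about_y(points, angle_deg, pivot=(0.0, 0.0, 0.0)):
    """Rotate (N, 3) points by angle_deg about a y-parallel axis through pivot."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    pivot = np.asarray(pivot, dtype=float)
    # Shift so the pivot is the origin, rotate, then shift back.
    return (np.asarray(points, dtype=float) - pivot) @ rotation.T + pivot

cloud = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0]])
angles = [-30.0, -15.0, 15.0, 30.0]  # hypothetical angle values
views = [rotate_about_y(cloud, a, pivot=(0.0, 0.0, 1.0)) for a in angles]
```

Each entry of `views` is the same point cloud expressed for one new viewpoint, which could then be projected, masked, inpainted, and depth-completed as in the method 200.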
- the merging module 302 may combine the four estimated point clouds while enforcing consistency across them. For example, when inpainting real objects, completion of objects may be inconsistent, and hallucinated objects that are not in the original scene may be created by the inpainting model 106 and included in the inpainted color image.
- the merging module 302 may compare the original point cloud and at least one of the estimated point clouds, may determine points which intersect among multiple point clouds, and may add the intersecting points to the merged point cloud, while discarding points which are present in only one point cloud.
- the merged point cloud may only include points which are present in more than two point clouds, or points which are present in all of the point clouds.
- the merging module 302 may discard points which do not directly intersect, or may only discard points which are not within a certain threshold distance from points in other point clouds.
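- One way the threshold-based merging described above might look is sketched below. A point is kept when it lies within a threshold distance of some point in at least `min_support` other point clouds. The function name `merge_point_clouds`, the parameter names, the default values, and the brute-force distance computation are assumptions for illustration; a practical implementation would likely use a spatial index.

```python
import numpy as np

def merge_point_clouds(clouds, threshold=0.01, min_support=1):
    """Keep points lying within `threshold` of a point in at least
    `min_support` other clouds; drop points without such support."""
    kept = []
    for i, cloud in enumerate(clouds):
        support = np.zeros(len(cloud), dtype=int)
        for j, other in enumerate(clouds):
            if i == j or len(other) == 0:
                continue
            # Brute-force pairwise distances; adequate only for small clouds.
            dists = np.linalg.norm(cloud[:, None, :] - other[None, :, :], axis=2)
            support += (dists.min(axis=1) <= threshold).astype(int)
        kept.append(cloud[support >= min_support])
    return np.concatenate(kept, axis=0)

a = np.array([[0.0, 0.0, 1.0], [5.0, 5.0, 5.0]])  # second point has no support
b = np.array([[0.0, 0.0, 1.005], [1.0, 0.0, 2.0]])
c = np.array([[1.0, 0.0, 2.0]])
merged = merge_point_clouds([a, b, c], threshold=0.01)
strict = merge_point_clouds([a, b, c], threshold=0.01, min_support=2)
```

Raising `min_support` corresponds to requiring that a point be present in more point clouds before it is kept. Note that this sketch does not remove near-duplicate points from the merged result.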
- the merged point cloud may be a completed scene point cloud, and may be used to perform other tasks such as robot action planning, autonomous navigation, and image generation for AR devices and VR devices.
- the method 200 may be performed multiple times based on multiple RGB-D images, and the resulting estimated point clouds may be merged to generate the completed scene point cloud.
- the point cloud may be determined by deprojecting multiple RGB-D images, and the other steps of the method 200 may be performed based on the point cloud.
- the method 200 may be performed based on the one or more additional or updated RGB-D images, and the resulting estimated point clouds may be merged with the previously-completed scene point cloud to obtain an updated point cloud.
- FIG. 4 is a diagram showing an example configuration of the SAM module 104, according to an embodiment of the present disclosure.
- the SAM module 104 may include a mask generation module 402, and an image masking module 404.
- the mask generation module 402 may generate the mask, which may indicate areas to be inpainted by the inpainting model 106.
- the inpainting model 106 may inadvertently use background pixels when performing inpainting on an occluded surface of an object.
- an original RGB-D image may show a surface 502 of a foreground object, and background surfaces 504 and 506.
- some inappropriate background pixels 508 from the background surfaces 504 and 506, which would not actually be visible from the new viewpoint, may be inadvertently included in the incomplete color image, and may therefore be mistakenly used by the inpainting model 106 to perform inpainting corresponding to the object.
- an image of a surface 510 in the inpainted color image may be generated based on the inappropriate background pixels.
- FIG. 6A shows an example of an incomplete color image that shows background pixels which are inappropriately included in areas which would be covered by objects.
- FIG. 6B shows a mask generated based on the incomplete color image of FIG. 6A.
- FIG. 6C shows an example inpainted color image in which the inappropriate background pixels were used for inpainting.
- the SAM module 104 may perform surface-aware masking.
- the mask generation module 402 may generate a 3D mesh, which may for example have a shape of a frustum, based on the input color image and an input depth image, and may use this 3D mesh to generate the mask.
- FIGS. 7A-7D show example operations which may be included in a surface-aware masking process, according to an embodiment of the present disclosure.
- a ray may be cast from the viewpoint through each point in the deprojected point cloud.
- the mask generation module 402 may perform this process for every ray to obtain an occlusion point cloud 702 which shows the space that could potentially be filled in the completed scene point cloud by objects corresponding to the surfaces.
- the mask generation module 402 may convert this occlusion point cloud to the mesh 700, and when the point cloud is rotated to the new viewpoint, the mesh 700 may be rotated as well, as shown in FIG. 7D. Then, when the SAM module 104 projects the incomplete color image and the incomplete depth image from the rotated point cloud, the SAM module 104 may discard points which are occluded by the mesh 700. Accordingly, the incomplete color image may be prevented from including inappropriate pixels, as shown by the dashed boxes in FIG. 7D. For example, as can be seen in FIG. 7D, the incomplete color image does not include the inappropriate background pixels 508 shown in FIG. 5. After these pixels are discarded, the blank pixels in the incomplete color image and the incomplete depth image may be used as the mask. Based on the mask, an inpainted color image may be generated to include, for example, an image of a surface 704 in which the inappropriate background pixels are not included.
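- The projection of a rotated point cloud into an incomplete depth image may be sketched as below. The sketch covers only the reprojection step, assuming a pinhole camera: each pixel keeps the nearest point that lands on it, and pixels on which no point lands stay at zero, which is treated here as blank. The function name `project_depth` and the intrinsic parameters are hypothetical, and the discarding of points occluded by the mesh is not included.

```python
import numpy as np

def project_depth(points, fx, fy, cx, cy, height, width):
    """Project (N, 3) points into a depth image, keeping the nearest point
    per pixel. Pixels that receive no point remain 0 (blank)."""
    depth = np.zeros((height, width))
    for x, y, z in points:
        if z <= 0:
            continue  # points behind the camera cannot be seen
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Simple z-buffer: overwrite only if empty or strictly nearer.
            if depth[v, u] == 0 or z < depth[v, u]:
                depth[v, u] = z
    return depth

pts = np.array([[0.0, 0.0, 2.0],    # lands on the center pixel
                [0.0, 0.0, 1.0],    # same pixel, but nearer
                [10.0, 0.0, 1.0]])  # falls outside the image
depth_img = project_depth(pts, fx=1.0, fy=1.0, cx=1.0, cy=1.0, height=3, width=3)
```

In this example only the center pixel receives a depth value, and the remaining blank pixels are the ones that would need to be filled by inpainting and depth completion.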
- FIG. 8A shows an example of an incomplete color image in which surface-aware masking is performed according to the process described above with respect to FIGS. 7A to 7D.
- the mesh 700 may prevent inappropriate pixels from being included in the incomplete color image.
- FIG. 8B shows a mask generated based on the incomplete color image of FIG. 8A.
- FIG. 8C shows an example inpainted color image in which the inappropriate background pixels are not included.
- FIG. 9 is a flowchart illustrating a method 900 of performing surface-aware masking, according to an embodiment of the present disclosure.
- one or more operations of the method 900 may correspond to the surface-aware masking process discussed above with respect to FIGS. 7A-7D.
- the mask generation module 402 may generate a plurality of points which extend beyond a surface included in the original RGB-D image.
- the mask generation module 402 may subsample pixels from a uniform grid in the input RGB-D image to obtain a set of points.
- the mask generation module 402 may initialize an empty point set and, for every point in the subsampled set, may deproject the point to a point in the point cloud and generate additional points, which are then added to the point set.
- the mask generation module 402 may add a predetermined number of additional points for each point, and the additional points may be equally spaced. In an embodiment, the number of additional points and the spacing therebetween may vary based on the scene.
- the mask generation module 402 may use fewer points which are more closely spaced than would be used for a scene including a room to be vacuumed by a robot vacuum cleaner.
- embodiments are not limited thereto, and the number of additional points and the spacing therebetween may be determined in any manner.
- the point set may correspond to the points shown in FIG. 7B.
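- The generation of additional points behind each surface point may be sketched as follows. For each deprojected point, equally spaced points are added farther along the ray from the viewpoint through that point. The function name `extend_behind_surface`, the parameter names and values, and the choice not to include the surface point itself in the output set are assumptions for illustration. The sketch also assumes that no surface point coincides with the viewpoint.

```python
import numpy as np

def extend_behind_surface(surface_points, viewpoint=(0.0, 0.0, 0.0),
                          num_extra=5, spacing=0.1):
    """For each surface point, add `num_extra` equally spaced points farther
    along the ray from `viewpoint` through that point."""
    viewpoint = np.asarray(viewpoint, dtype=float)
    occlusion_points = []  # the initially empty point set
    for p in np.asarray(surface_points, dtype=float):
        direction = p - viewpoint
        direction = direction / np.linalg.norm(direction)
        for k in range(1, num_extra + 1):
            occlusion_points.append(p + k * spacing * direction)
    return np.array(occlusion_points).reshape(-1, 3)

surface = np.array([[0.0, 0.0, 1.0],
                    [0.0, 3.0, 4.0]])
occluded = extend_behind_surface(surface, num_extra=3, spacing=0.5)
```

A smaller `num_extra` with a smaller `spacing` would correspond to a compact scene, while larger values would extend the occluded volume deeper for a larger scene.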
- the mask generation module 402 may generate a mesh based on the plurality of points.
- the mesh may be generated by performing surface triangulation on the points in the point set.
- this mesh may correspond to the mesh 700 discussed above.
- the mask generation module 402 may set the pixel to one ("1") if the estimated depth for the pixel in the incomplete depth image is equal to zero ("0") or is otherwise not present, or if the estimated depth for the pixel in the incomplete depth image is greater than the depth indicated for the pixel by the depth map representing the mesh.
- the pixels which are set to one ("1") may correspond to the masked areas and/or the points which are discarded when generating the masked color image and the masked depth image.
- if the incomplete depth image includes an estimated depth for a particular pixel that is greater than the depth indicated for that same pixel by the depth map, this may indicate that the pixel corresponds to an area of the scene that was occluded or obscured in the original RGB-D image by a surface corresponding to the mesh. Accordingly, information corresponding to that pixel in the incomplete depth image and in the incomplete color image may be determined to be unreliable, and the pixel may therefore be masked and/or discarded when the masked color image and the masked depth image are generated.
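- The per-pixel masking rule described above may be sketched as follows. A pixel is set to one when its depth in the incomplete depth image is zero or not present, or when that depth is greater than the depth of the mesh at the same pixel. The function name `surface_aware_mask`, the use of NaN to represent a depth that is not present, and the convention that a mesh depth of zero means the mesh does not cover the pixel are assumptions for illustration.

```python
import numpy as np

def surface_aware_mask(incomplete_depth, mesh_depth):
    """Return a uint8 mask: 1 where the pixel should be masked, else 0."""
    incomplete_depth = np.asarray(incomplete_depth, dtype=float)
    mesh_depth = np.asarray(mesh_depth, dtype=float)
    # Missing depth: zero, or not present (represented here as NaN/inf).
    missing = ~np.isfinite(incomplete_depth) | (incomplete_depth == 0)
    with np.errstate(invalid="ignore"):
        # Behind the mesh: only checked where the mesh covers the pixel
        # (mesh_depth > 0 is an assumed convention).
        behind_mesh = (mesh_depth > 0) & (incomplete_depth > mesh_depth)
    return (missing | behind_mesh).astype(np.uint8)

incomplete = np.array([[0.0, 2.0],
                       [3.0, np.nan]])
mesh = np.array([[1.0, 2.5],
                 [2.0, 0.0]])
mask = surface_aware_mask(incomplete, mesh)
```

In this example, the top-right pixel is kept because its depth lies in front of the mesh, while the other three pixels are masked because their depth is missing or lies behind the mesh.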
- FIGS. 10A-10C and FIGS. 11A-11C show further examples of a surface-aware masking process, according to an embodiment of the present disclosure.
- an original RGB-D image may show a surface 1011 of a first foreground object and a surface 1012, and background surfaces 1013, 1014 and 1015.
- some inappropriate background pixels from the background surfaces 1013, 1014 and 1015, which would not actually be visible from the new viewpoints, may be inadvertently included in the incomplete color images, and may therefore be mistakenly used by the inpainting model 106 to perform inpainting corresponding to the object.
- a ray may be cast from the viewpoint through each point in the deprojected point cloud.
- once the ray has passed through its respective point, for example by passing through one of the surfaces 1011, 1012, 1013, 1014, and 1015, it may be used to generate a list of points along the ray from that depth onward.
- the mask generation module 402 may perform this process for every ray to obtain an occlusion point cloud 1100 which shows the space that could potentially be filled in the completed scene point cloud by objects corresponding to the surfaces.
- the mask generation module 402 may convert this occlusion point cloud to the mesh 1101, and when the point cloud is rotated to the new viewpoints, the mesh 1101 may be rotated as well. Then, when the SAM module 104 projects the incomplete color image and the incomplete depth image from the rotated point cloud, the SAM module 104 may discard points which are occluded by the mesh 1101. Accordingly, the incomplete color image may be prevented from including inappropriate pixels.
- the mesh 700 and the mask corresponding to the new viewpoint may be generated based on depth information included in the original image, and then the incomplete color image and the incomplete depth image may be generated, for example by rotating and reprojecting the deprojected point cloud.
- an embodiment described above may be used by at least one of an AR device and a VR device to perform scene completion of an environment surrounding a user in order to generate appropriate AR and VR images in anticipation of movements by the user.
- an embodiment described above may be used to perform scene completion to reconstruct areas which are not immediately visible to the user, but which the user may wish to see later.
- the completed scene point cloud may then be used to construct a plurality of potential AR/VR images to be displayed to the user, which may help to reduce latency in images provided to the user. Accordingly, images displayed by the AR device or the VR device may seamlessly transition according to a user's head movements.
- FIG. 12A is a flowchart illustrating a method 1200A of performing scene completion in at least one of an AR device and a VR device, according to an embodiment of the present disclosure.
- one or more operations of the method 1200A may be performed by or using at least one of the viewpoint module 100, the scene completion system 300, and any of the elements included therein, and any other element described herein.
- the method 1200A may include obtaining an image corresponding to a current viewpoint of a user.
- the image may correspond to the original RGB-D depth image described above.
- the method 1200A may include performing scene completion to obtain a completed 3D representation of the environment of the user, for example a completed scene point cloud of a scene included in the environment.
- the scene completion may correspond to any of the scene completion methods described above.
- the method 1200A may include obtaining a plurality of potential AR/VR images corresponding to a plurality of potential viewpoints based on the completed point cloud.
- the completed point cloud may correspond to at least one of the estimated point cloud and the merged point cloud described above.
- the plurality of potential AR/VR images may be AR images or VR images which are generated based on the at least one of the estimated point cloud and the merged point cloud.
- the plurality of potential AR/VR images may be or may include a potential AR image which presents information corresponding to objects in the environment of the user from the perspective of a viewpoint which the user has not yet viewed, or in an area which is hidden from the field of view of the user.
- the plurality of potential AR/VR images may be or may include a potential VR image which corresponds to a portion of the environment from the perspective of a viewpoint which the user has not yet viewed, or in an area which is hidden from the field of view of the user.
- the potential VR image may include a VR object, obstacle, or boundary which corresponds to a real object in a portion of the environment, from the perspective of a viewpoint which the user has not yet viewed.
- the method 1200A may include, based on the user moving from a position corresponding to the current viewpoint to a position corresponding to a potential viewpoint, displaying a transition between a current AR/VR image and a potential AR/VR image to the user.
- the current AR/VR image may be an AR or VR image corresponding to the current viewpoint of the user, and the potential AR/VR image may be selected from among the plurality of potential AR/VR images obtained in operation S1213. Accordingly, a seamless transition from the current AR/VR image may be provided by the plurality of AR/VR images.
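- As a minimal sketch of how a potential AR/VR image might be selected when the user moves, the following picks the precomputed potential viewpoint that is closest to the user's new position. Selecting by nearest position, the function name `nearest_viewpoint`, and the example coordinates are assumptions for illustration; the disclosure does not specify the selection rule.

```python
import numpy as np

def nearest_viewpoint(current_position, potential_positions):
    """Index of the precomputed viewpoint closest to the current position."""
    diffs = (np.asarray(potential_positions, dtype=float)
             - np.asarray(current_position, dtype=float))
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

# Hypothetical positions for which images were rendered in advance.
potential = [[1.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 2.0, 0.0]]
chosen = nearest_viewpoint([0.0, 0.0, 0.0], potential)
```

Because the images for these viewpoints are generated in advance from the completed point cloud, the lookup itself adds little latency at display time.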
- an embodiment described above may be used to manipulate or generate images in a device such as at least one of an AR device, a VR device, a mobile device, a camera, and a computer such as a personal computer, a laptop computer, and a tablet computer.
- an embodiment described above may be used to generate a completed 3D representation of a scene based on a 2D image captured by a camera or an application or other computer program, for example a camera application. Based on the completed 3D representation, a user may generate one or more 2D images from different viewpoints or directions.
- the original image used to generate the completed 3D representation may correspond to only a portion of the 2D image.
- one or more objects may be extracted from the 2D image, and an embodiment described above may be used to generate 3D representations of the one or more objects, and new 2D images of the one or more objects may be generated based on input received from a user.
- the input from the user may be used to select new directions or viewpoints used to generate the 3D representation and the new 2D images.
- the user input may correspond to a manipulation of the 3D representation, and the new 2D images may be generated based on the manipulation being stopped.
- the user may provide an input such as a dragging gesture which may be used to rotate the 3D representation, and based on the dragging gesture being stopped, one or more new 2D images may be generated based on the rotated 3D representation.
- one or more new directions or viewpoints may be predicted in advance, and corresponding new 2D images may be created in advance, and each time the user provides an input such as a dragging gesture, a corresponding 2D image may be displayed to the user.
- an embodiment described above may be used to perform scene completion in order to assist with tasks performed by a robot.
- an embodiment described above may be used to plan actions such as grasping for a robotic arm, or to plan movements by a robotic vacuum cleaner.
- FIG. 12B is a flowchart illustrating a method 1200B of performing scene completion for a robot, according to an embodiment of the present disclosure.
- one or more operations of the method 1200B may be performed by or using at least one of the viewpoint module 100, the scene completion system 300, and any of the elements included therein, and any other element described herein.
- the method 1200B may include obtaining an image of an environment of the robot.
- this current image may correspond to the original RGB-D depth image described above.
- the robot may include a robotic vacuum cleaner, and the environment may include a room which is to be vacuumed by the robotic vacuum cleaner.
- the robot may include a drone device such as a flying drone, and the environment may include a scene including an object which is to be observed or picked up by the drone, or an area in which the drone is to place an object.
- the robot may include a robotic arm, and the environment may include a tabletop scene which includes an object to be grasped by the robotic arm.
- the present disclosure is not limited in this regard.
- the method 1200B may include performing scene completion to obtain a completed 3D representation of the environment of the robot, for example a completed scene point cloud of a scene included in the environment.
- the scene completion may correspond to any of the scene completion methods described above.
- the completed 3D representation may include predicted areas which are hidden from view in the original RGB-D depth image.
- the original RGB-D depth image may be captured from the perspective of a robotic vacuum cleaner with a limited vertical field of view, and these predicted areas may be an upper portion of the scene which is not visible to the robotic vacuum cleaner.
- the original RGB-D depth image may be captured from the perspective of a drone device with a limited vertical field of view, and these predicted areas may be a lower portion of the scene which is not visible to the drone device.
- the original RGB-D depth image may be captured from the perspective of a robotic arm with a limited horizontal field of view, and these predicted areas may be a left and/or right portion of the scene which is not visible to the robotic arm.
- these are provided only as examples, and embodiments are not limited thereto.
- the method 1200B may include planning a movement of the robot.
- planning the movement may include planning a route to be taken by the robotic vacuum cleaner in order to vacuum the room.
- planning the movement may include planning a movement to position the robotic arm to grasp the object.
- the robot may determine a new viewpoint or a portion of the new viewpoint based on a desired rotation direction for the robot, and an embodiment described above may be used to generate a 2D image of the new viewpoint.
- the robot may determine a portion of a viewpoint that it expects to see based on anticipating another aspect of the recognized object based on the desired rotation direction, and an embodiment described above may be used to generate an image of that portion.
- planning the movement may include planning a movement based on the completed 3D representation.
- FIG. 13A is a flowchart illustrating a method 1300A of performing scene completion, according to an embodiment of the present disclosure.
- one or more operations of the method 1300A may be performed by or using at least one of the viewpoint module 100, the scene completion system 300, and any of the elements included therein, and any other element described herein.
- the method 1300A may include obtaining an original image from an original viewpoint corresponding to a first direction, wherein the scene includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the original image may correspond to the RGB-D image discussed above.
- the first surface may correspond to the surface 502 in Fig. 5 and the surface 1011 in Fig. 10A.
- the method 1300A may include obtaining a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3D information generated from 2D information which is obtained from the original image.
- the 3D information may correspond to the deprojected point cloud discussed above.
- the first image may correspond to the incomplete color image and the incomplete depth image, and the new viewpoint may correspond to the new viewpoint discussed above.
- the method 1300A may include determining an area within the first image for generating a second surface of the object based on depth information about a depth between the object and the background of the original image.
- the area may correspond to the inappropriate background pixels 508 in Fig. 5 as discussed above.
- the area within the first image for generating the second surface of the object may correspond to at least one of the mesh 700 and the mesh 1101 discussed above, and may correspond to the area which is masked by the SAM module 104 that performs surface-aware masking.
- the area within the first image may correspond to at least a portion of the background of the original image.
- the area may be inadvertently considered to be included in a surface of the object, and the AI inpainting model may therefore inadvertently use inappropriate background pixels when performing inpainting on the area within the first image as discussed above with respect to FIGS. 6A to 6C.
- this area may be one or more areas as discussed above with respect to FIGS. 10A-10C and FIGS. 11A-11C.
- the area may be masked, and the masked area may be provided to the AI inpainting model. Accordingly, the masked area within the first image may be considered to be the surface of the object and may not be considered to be a background area. Therefore, the masked area may be inpainted as a surface of the object.
- the method 1300A may include generating a second image by inputting the first image and the determined area to an AI inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- the second image may correspond to the inpainted image discussed above.
- the second surface of the object may correspond to the surface 704 in Fig. 7D.
- the area in the second image may correspond to the second surface of the object.
- the second image and the second surface of the object may correspond to a second direction different from the first direction.
- FIG. 13B is a flowchart illustrating a method 1300B of performing scene completion, according to an embodiment of the present disclosure.
- one or more operations of the method 1300B may be performed by or using at least one of the viewpoint module 100, the scene completion system 300, and any of the elements included therein, and any other element described herein.
- the method 1300B may include obtaining an original image from an original viewpoint corresponding to a first direction, wherein the scene includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction.
- the original image may correspond to the RGB-D image discussed above.
- the first surface may correspond to the surface 502 in Fig. 5 and the surface 1011 in Fig. 10A as discussed above.
- the method 1300B may include determining an area for generating a second surface of the object based on depth information about a depth between the object and the background of the original image.
- the area may correspond to the inappropriate background pixels 508 in Fig. 5 as discussed above.
- the area within the first image for generating the second surface of the object may correspond to at least one of the mesh 700 and the mesh 1101, and may correspond to the area which is masked by the SAM module 104 that performs surface-aware masking.
- the area may correspond to at least a portion of the background of the original image.
- the area may be inadvertently considered to be included in a surface of the object, and the AI inpainting model may therefore inadvertently use inappropriate background pixels when performing inpainting on the area as discussed above with respect to FIGS. 6A to 6C.
- this area may be one or more areas as discussed above with respect to FIGS. 10A-10C and FIGS. 11A-11C.
- the area may be masked, and the masked area may be provided to the AI inpainting model. Accordingly, the masked area may be considered to be the surface of the object and may not be considered to be a background area. Therefore, the masked area may be inpainted as a surface of the object.
- the method 1300B may include obtaining a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3D information generated from 2D information which is obtained from the original image.
- the 3D information may correspond to the deprojected point cloud discussed above.
- the first image may correspond to the incomplete color image and the incomplete depth image, and the new viewpoint may correspond to the new viewpoint discussed above.
- the method 1300B may include generating a second image by inputting the first image and the determined area to an AI inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- the second image may correspond to the inpainted image discussed above.
- the second surface of the object may correspond to the surface 704 in Fig. 7D.
- the area in the second image may correspond to the second surface of the object.
- the second image and the second surface of the object may correspond to a second direction different from the first direction.
- FIG. 14 is a diagram of devices for performing a scene completion task according to an embodiment.
- FIG. 14 includes a user device 1410, a server 1420, and a communication network 1430.
- the user device 1410 and the server 1420 may interconnect via wired connections, wireless connections, or a combination of wired and wireless connections.
- the user device 1410 may include one or more devices (e.g., a processor 1411 and a data storage 1412) configured to retrieve an image corresponding to a search query.
- the user device 1410 may include a computing device (e.g., a desktop computer, a laptop computer, a tablet computer, a handheld computer, a smart speaker, a server, etc.), a mobile phone (e.g., a smart phone, a radiotelephone, etc.), a camera device, a wearable device (e.g., a pair of smart glasses, a smart watch, etc.), a home appliance (e.g., a robot vacuum cleaner, a smart refrigerator, etc.), or a similar device.
- the data storage 1412 of the user device 1410 may include one or more of the viewpoint module 100 and the scene completion system 300, or any of the elements included therein.
- the user device 1410 may store one or more of the viewpoint module 100 and the scene completion system 300, or any of the elements included therein, or vice versa.
- the server 1420 may include one or more devices (e.g., a processor 1421 and a data storage 1422) configured to implement one or more of the viewpoint module 100 and the scene completion system 300, or any of the elements included therein.
- the data storage 1422 of the server 1420 may include one or more of the viewpoint module 100 and the scene completion system 300, or any of the elements included therein.
- the user device 1410 may store one or more of the viewpoint module 100 and the scene completion system 300, or any of the elements included therein.
- the communication network 1430 may include one or more wired and/or wireless networks.
- the communication network 1430 may include a cellular network, a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the Public Switched Telephone Network (PSTN)), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, or the like, and/or a combination of these or other types of networks.
- the number and arrangement of devices and networks shown in FIG. 14 are provided as an example. In practice, there may be additional devices and/or networks, fewer devices and/or networks, different devices and/or networks, or differently arranged devices and/or networks than those shown in FIG. 14. Furthermore, two or more devices shown in FIG. 14 may be implemented within a single device, or a single device shown in FIG. 14 may be implemented as multiple, distributed devices. Additionally, or alternatively, a set of devices (e.g., one or more devices) may perform one or more functions described as being performed by another set of devices.
- FIG. 15 is a diagram of components of one or more electronic devices of FIG. 14 according to an embodiment.
- An electronic device 1500 in FIG. 15 may correspond to the user device 1410 and/or the server 1420.
- FIG. 15 is for illustration only, and other embodiments of the electronic device 1500 could be used without departing from the scope of this disclosure.
- the electronic device 1500 may correspond to a client device or a server.
- the electronic device 1500 includes a bus 1510, a processor 1520, a memory 1530, an interface 1540, and a display 1550.
- the bus 1510 includes a circuit for connecting the components 1520 to 1550 with one another.
- the bus 1510 functions as a communication system for transferring data between the components 1520 to 1550 or between electronic devices.
- the bus 1510 may be a communication bus, a cross-over bar, a network, or the like.
- Although the bus 1510 is depicted as a single line in FIG. 15, the bus 1510 may be implemented using multiple (e.g., two or more) connections between the set of components of the electronic device 1500. The present disclosure is not limited in this regard.
- The processor 1520 includes one or more of a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a many integrated core (MIC), a field-programmable gate array (FPGA), or a digital signal processor (DSP).
- the processor 1520 is able to perform control of any one or any combination of the other components of the electronic device 1500, and/or perform an operation or data processing relating to communication. For example, the processor 1520 may perform the methods discussed above.
- the processor 1520 executes one or more programs stored in the memory 1530.
- the memory 1530 may include a volatile and/or non-volatile memory.
- the memory 1530 may include volatile memory such as, but not limited to, random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), and the like.
- the memory 1530 may include non-volatile memory such as, but not limited to, read only memory (ROM), electrically erasable programmable ROM (EEPROM), NAND flash memory, phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), ferroelectric RAM (FRAM), magnetic memory, optical memory, and the like.
- the present disclosure is not limited in this regard, and the memory 1530 may include other types of dynamic and/or static memory storage.
- the memory 1530 may store information and/or instructions for use (e.g., execution) by the processor 1520.
- the memory 1530 stores information, such as one or more of commands, data, programs (one or more instructions), application(s) 1534, etc., which are related to at least one other component of the electronic device 1500 and for driving and controlling the electronic device 1500.
- commands and/or data may formulate an operating system (OS) 1532.
- Information stored in the memory 1530 may be executed by the processor 1520.
- the application(s) 1534 may include the above-discussed embodiments. These functions can be performed by a single application or by multiple applications that each carry out one or more of these functions.
- the application(s) 1534 may include an artificial intelligence (AI) model for performing the methods discussed above.
- the display 1550 may include, for example, a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a quantum-dot light emitting diode (QLED) display, a microelectromechanical systems (MEMS) display, or an electronic paper display.
- the display 1550 can also be a depth-aware display, such as a multi-focal display.
- the display 1550 is able to present, for example, various contents, such as text, images, videos, icons, and symbols.
- the communication interface 1544 may enable communication between the electronic device 1500 and other external devices, via a wired connection, a wireless connection, or a combination of wired and wireless connections.
- the communication interface 1544 may permit the electronic device 1500 to obtain information from another device and/or provide information to another device.
- the communication interface 1544 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
- the communication interface 1544 may obtain videos and/or video frames from an external device, such as a server.
- the sensor(s) 1546 of the interface 1540 can meter a physical quantity or detect an activation state of the electronic device 1500 and convert metered or detected information into an electrical signal.
- the sensor(s) 1546 can include one or more cameras or other imaging sensors for capturing images of scenes.
- the sensor(s) 1546 can also include any one or any combination of a microphone, a keyboard, a mouse, and one or more buttons for touch input.
- the sensor(s) 1546 can further include an inertial measurement unit.
- the sensor(s) 1546 can include a control circuit for controlling at least one of the sensors included herein. Any of these sensor(s) 1546 can be located within or coupled to the electronic device 1500.
- the sensor(s) 1546 may obtain a text and/or a voice signal that contains one or more queries.
- the scene completion processes and methods described above may be written as computer-executable programs or instructions that may be stored in a medium.
- the medium may continuously store the computer-executable programs or instructions, or temporarily store the computer-executable programs or instructions for execution or downloading.
- the medium may be any one of various recording media or storage media in which a single piece or plurality of pieces of hardware are combined, and the medium is not limited to a medium directly connected to electronic device 1500, but may be distributed on a network.
- Examples of the medium include magnetic media, such as a hard disk, a floppy disk, and a magnetic tape, optical recording media, such as CD-ROM and DVD, magneto-optical media such as a floptical disk, and ROM, RAM, and a flash memory, which are configured to store program instructions.
- Other examples of the medium include recording media and storage media managed by application stores distributing applications or by websites, servers, and the like supplying or distributing other various types of software.
- a computer program product may include a product (for example, a downloadable application) in a form of a software program electronically distributed through a manufacturer or an electronic market. For electronic distribution, at least a part of the software program may be stored in a storage medium or may be temporarily generated.
- a model related to the neural networks described above may be implemented via a software module.
- When the model is implemented via a software module (for example, a program module including instructions), the model may be stored in a computer-readable recording medium.
- The model may be a part of the electronic device 1500 described above by being integrated in a form of a hardware chip.
- the model may be manufactured in a form of a dedicated hardware chip for artificial intelligence, or may be manufactured as a part of an existing general-purpose processor (for example, a CPU or application processor) or a graphic-dedicated processor (for example a GPU).
- the model may be provided in a form of downloadable software.
- a computer program product may include a product (for example, a downloadable application) in a form of a software program electronically distributed through a manufacturer or an electronic market.
- the software program may be stored in a storage medium or may be temporarily generated.
- the storage medium may be a server of the manufacturer or electronic market, or a storage medium of a relay server.
- a method may include rendering an incomplete color image and an incomplete depth image corresponding to the new viewpoint based on the 3D information.
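The rendering step above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 3D information is a colored point cloud and that the new viewpoint is given by pinhole intrinsics `K` and a world-to-camera pose `R`, `t`; the function name and the convention that a depth of 0 means "no data" are hypothetical and are not taken from the disclosure. Pixels that no point reaches stay empty, which is what makes the rendered color and depth images incomplete.

```python
import numpy as np

def render_point_cloud(points, colors, K, R, t, height, width):
    """Project a colored point cloud into a new view, keeping the nearest point per pixel.

    points: (N, 3) world coordinates, colors: (N, 3) color values.
    K: (3, 3) pinhole intrinsics; R, t: world-to-camera rotation and translation.
    Returns (color, depth). Pixels hit by no point keep depth 0 (assumed "no data" value).
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                 # drop points behind the camera
    cam, colors = cam[in_front], colors[in_front]

    proj = cam @ K.T                            # perspective projection
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], cam[inside, 2], colors[inside]

    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)             # z-buffer: nearest depth per pixel
    color = np.zeros((height, width, 3), dtype=float)
    winner = z == depth[v, u]                   # points that own their pixel
    color[v[winner], u[winner]] = colors[winner]

    depth[~np.isfinite(depth)] = 0.0            # mark holes as "no data"
    return color, depth
```

A real renderer would typically splat points over several pixels or rasterize a mesh; this sketch only shows why holes appear when the viewpoint changes.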
- the method may include masking a portion of the incomplete color image based on the 3D information and the incomplete depth image to obtain a masked color image, wherein the masked portion of the incomplete color image corresponds to the determined area and indicates that the masked portion of the incomplete color image is obscured by the object when the scene is viewed from the new viewpoint.
- the method may include inpainting the masked color image to obtain the second image.
- The obtaining of the second image may include inpainting the masked color image based on the AI inpainting model to obtain the second image.
- the method may include obtaining an image caption by providing the second image to an AI caption model.
- the method may include determining whether to re-inpaint the second image by comparing an embedding of the image caption and an embedding of the prompt.
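The caption check above compares two embeddings but does not fix a metric here. One plausible reading, sketched below, uses cosine similarity with a cutoff. The function names and the `threshold` value of 0.7 are assumptions for illustration only, and the embeddings are assumed to come from the same text encoder.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two 1-D embedding vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def should_reinpaint(caption_embedding, prompt_embedding, threshold=0.7):
    """Return True when the caption drifts too far from the prompt.

    threshold is a hypothetical tuning parameter, not a value from the source.
    """
    return cosine_similarity(caption_embedding, prompt_embedding) < threshold
```

With this rule, an inpainted image whose caption embedding points in nearly the same direction as the prompt embedding is accepted, and anything below the cutoff is sent back for another inpainting pass.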
- the method may include masking a portion of the incomplete depth image based on the 3D information and the incomplete depth image to obtain a masked depth image, wherein the masked portion of the incomplete depth image corresponds to the determined area.
- the method may include providing the second image to an AI depth estimation model.
- the method may include generating an estimated depth image based on the masked depth image and an output of the AI depth estimation model.
- the method may include generating a completed 3D representation based on the second image and the estimated depth image.
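As a rough illustration of how a color image and a depth image can be turned into a 3D representation, the sketch below back-projects every pixel with a valid depth through a pinhole camera model. The intrinsics matrix `K`, the function name, and the use of 0 as the "no depth" value are assumptions, not details from the disclosure.

```python
import numpy as np

def unproject_to_point_cloud(color, depth, K):
    """Back-project pixels with valid depth into camera-space 3D points.

    color: (H, W, 3), depth: (H, W) with 0 meaning "no depth" (assumed convention).
    K: (3, 3) pinhole intrinsics. Returns (points (N, 3), colors (N, 3)).
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(depth > 0)          # row (v) and column (u) of valid pixels
    z = depth[v, u]
    x = (u - cx) * z / fx                 # invert u = fx * x / z + cx
    y = (v - cy) * z / fy                 # invert v = fy * y / z + cy
    points = np.stack([x, y, z], axis=1)
    return points, color[v, u]
```

Points produced this way are in the camera frame of the new viewpoint; applying the inverse of that camera's pose would place them in a shared world frame.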
- The generating of the estimated depth image may include obtaining at least one estimated normal and at least one estimated occlusion boundary by providing the second image to the AI depth estimation model.
- The generating of the estimated depth image may include obtaining the estimated depth image based on the incomplete depth image, the at least one estimated normal, and the at least one estimated occlusion boundary.
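The text describes combining known depth with the depth model's output via estimated normals and occlusion boundaries. The sketch below deliberately substitutes a much simpler technique, a least-squares scale-and-shift alignment, only to show the general idea of anchoring a predicted depth map to trusted depth values. It is not the method described above, and the function name and the `known` mask argument are hypothetical.

```python
import numpy as np

def fuse_depth(masked_depth, predicted_depth, known):
    """Fill unknown depth pixels with a prediction aligned to the known ones.

    masked_depth: (H, W) depth, trusted only where known is True.
    predicted_depth: (H, W) model output, possibly in a different scale.
    known: (H, W) boolean array marking trusted pixels (needs at least 2 distinct values).
    """
    # Solve min over (scale, shift) of || scale * pred + shift - masked ||^2 on known pixels.
    A = np.stack([predicted_depth[known], np.ones(int(known.sum()))], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, masked_depth[known], rcond=None)
    aligned = scale * predicted_depth + shift
    # Keep trusted measurements and use the aligned prediction elsewhere.
    return np.where(known, masked_depth, aligned)
```

A normal- and boundary-guided completion would replace the global affine fit with a per-pixel optimization, but the input/output contract would look similar.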
- the method may include rendering a plurality of incomplete color images and a plurality of incomplete depth images from a plurality of new viewpoints based on the 3D information.
- the method may include masking the plurality of incomplete color images to obtain a plurality of masked color images, and masking the plurality of incomplete depth images to obtain a plurality of masked depth images.
- the method may include obtaining a plurality of second images by providing the plurality of masked color images to the AI inpainting model.
- the method may include providing the plurality of second images to the AI depth estimation model.
- the method may include obtaining a plurality of estimated depth images based on the plurality of masked depth images and a plurality of outputs of the AI depth estimation model, wherein the completed 3D representation is generated based on the plurality of second images and the plurality of estimated depth images.
- the generating of the completed 3D representation may include generating a plurality of estimated point clouds based on the second image, the estimated depth image, the plurality of second images, and the plurality of estimated depth images.
- the method may include merging the plurality of estimated point clouds by discarding points which are not included in at least two estimated point clouds from among the plurality of estimated point clouds to obtain a completed scene point cloud representing the scene.
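The merging rule above keeps only points that appear in at least two estimated point clouds. Since independently estimated points rarely coincide exactly, the sketch below interprets "included in" as "falls in a voxel that is occupied in at least two clouds". The voxel grid, the `voxel_size` default, and the function name are assumptions made for illustration.

```python
import numpy as np

def merge_point_clouds(clouds, voxel_size=0.05, min_views=2):
    """Merge point clouds, discarding points supported by fewer than min_views clouds.

    clouds: non-empty list of (N_i, 3) arrays in a shared coordinate frame.
    voxel_size: hypothetical tolerance deciding when two points count as the same.
    """
    # Quantize every point to an integer voxel index.
    keys = [np.floor(c / voxel_size).astype(np.int64) for c in clouds]
    # Count each voxel once per cloud, then count how many clouds contain it.
    per_cloud = [np.unique(k, axis=0) for k in keys]
    voxels, counts = np.unique(np.concatenate(per_cloud), axis=0, return_counts=True)
    supported = {tuple(v) for v in voxels[counts >= min_views]}
    # Keep the original points whose voxel has enough support.
    kept = []
    for cloud, ks in zip(clouds, keys):
        keep = np.array([tuple(k) in supported for k in ks], dtype=bool)
        kept.append(cloud[keep])
    return np.concatenate(kept)
```

The effect is a consistency filter: geometry hallucinated in a single inpainted view is dropped unless another view agrees with it.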
- the masking may include generating a plurality of points which extend beyond a surface included in the original image.
- the masking may include generating a mesh based on the plurality of points.
- the masking may include rendering a depth map representing the mesh from the new viewpoint.
- the masking may include generating a mask based on a comparison between the incomplete depth image and the depth map.
- the masking may include applying the mask to the incomplete color image.
- The mask may indicate a plurality of pixels which are not used for generating the second image.
- The plurality of pixels may include a first plurality of pixels for which a depth is not indicated by the incomplete depth image, and a second plurality of pixels for which a depth indicated by the incomplete depth image is greater than a depth indicated by the depth map.
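The two pixel sets described above translate almost directly into array operations. The sketch below assumes the incomplete depth image and the mesh depth map are aligned arrays of the same size and that 0 encodes "no depth"; both conventions, and the function names, are illustrative assumptions.

```python
import numpy as np

def build_inpainting_mask(incomplete_depth, mesh_depth):
    """Return a boolean mask, True where a pixel is excluded and left to inpainting.

    First set: pixels with no depth in the incomplete depth image.
    Second set: pixels whose rendered depth lies behind the mesh depth map.
    """
    missing = incomplete_depth <= 0                  # assumed "no depth" encoding
    mesh_hit = mesh_depth > 0                        # mesh covers this pixel
    behind = mesh_hit & ~missing & (incomplete_depth > mesh_depth)
    return missing | behind

def apply_mask(color, mask, fill=0.0):
    """Blank out masked pixels of an (H, W, 3) color image without modifying the input."""
    out = color.copy()
    out[mask] = fill
    return out
```

For example, a pixel with rendered depth 2 where the mesh sits at depth 1 is masked, because the surface seen there would be hidden behind the extended object from the new viewpoint.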
- The original image may be captured by at least one of an augmented reality (AR) device and a virtual reality (VR) device.
- The original viewpoint may comprise a current viewpoint of a user, and the original image may correspond to a current AR/VR image displayed to the user.
- the method may include obtaining a completed 3D representation of the scene based on the second image.
- the method may include obtaining a potential AR/VR image based on the completed 3D representation, wherein the potential AR/VR image corresponds to a potential viewpoint of the user.
- The method may include, based on the user moving from a position corresponding to the current viewpoint to a position corresponding to the potential viewpoint, displaying a transition between the current AR/VR image and the potential AR/VR image to the user.
- The original image may be captured by a robot.
- the method may include planning a movement path for the robot based on the second image.
- an electronic device may include at least one processor configured to execute the instructions to render an incomplete color image and an incomplete depth image corresponding to the new viewpoint based on the 3D information.
- the electronic device may include at least one processor configured to execute the instructions to mask a portion of the incomplete color image based on the 3D information and the incomplete depth image to obtain a masked color image, wherein the masked portion of the incomplete color image corresponds to the determined area and indicates that the masked portion of the incomplete color image is obscured by the object when the scene is viewed from the new viewpoint.
- the electronic device may include at least one processor configured to execute the instructions to inpaint the masked color image to obtain the second image.
- To inpaint the masked color image, the at least one processor may be configured to execute the instructions to inpaint the masked color image based on the AI inpainting model to obtain the second image.
- the electronic device may include at least one processor configured to execute the instructions to obtain an image caption by providing the second image to an AI caption model.
- the electronic device may include at least one processor configured to execute the instructions to determine whether to re-inpaint the second image by comparing an embedding of the image caption and an embedding of the prompt.
- the electronic device may include at least one processor configured to execute the instructions to mask a portion of the incomplete depth image based on the 3D information and the incomplete depth image to obtain a masked depth image, wherein the masked portion of the incomplete depth image corresponds to the determined area.
- the electronic device may include at least one processor configured to execute the instructions to provide the second image to an AI depth estimation model.
- the electronic device may include at least one processor configured to execute the instructions to generate an estimated depth image based on the masked depth image and an output of the AI depth estimation model.
- the electronic device may include at least one processor configured to execute the instructions to generate a completed 3D representation based on the second image and the estimated depth image.
- the electronic device may include at least one processor configured to execute the instructions to obtain at least one estimated normal and at least one estimated occlusion boundary by providing the second image to the AI depth estimation model.
- To generate the estimated depth image, the at least one processor may be configured to execute the instructions to obtain the estimated depth image based on the incomplete depth image, the at least one estimated normal, and the at least one estimated occlusion boundary.
- the electronic device may include at least one processor configured to execute the instructions to render a plurality of incomplete color images and a plurality of incomplete depth images from a plurality of new viewpoints based on the 3D information.
- The electronic device may include at least one processor configured to execute the instructions to mask the plurality of incomplete color images to obtain a plurality of masked color images, and to mask the plurality of incomplete depth images to obtain a plurality of masked depth images.
- the electronic device may include at least one processor configured to execute the instructions to obtain a plurality of second images by providing the plurality of masked color images to the AI inpainting model.
- the electronic device may include at least one processor configured to execute the instructions to provide the plurality of second images to the AI depth estimation model.
- the electronic device may include at least one processor configured to execute the instructions to obtain a plurality of estimated depth images based on the plurality of masked depth images and a plurality of outputs of the AI depth estimation model.
- The completed 3D representation may be generated based on the plurality of second images and the plurality of estimated depth images.
- the electronic device may include at least one processor configured to execute the instructions to generate a plurality of estimated point clouds based on the second image, the estimated depth image, the plurality of second images, and the plurality of estimated depth images.
- To generate the completed 3D representation, the at least one processor may be configured to execute the instructions to merge the plurality of estimated point clouds by discarding points which are not included in at least two estimated point clouds from among the plurality of estimated point clouds.
- To mask the incomplete color image, the at least one processor may be configured to execute the instructions to generate a plurality of points which extend beyond a surface included in the original image.
- To mask the incomplete color image, the at least one processor may be configured to execute the instructions to generate a mesh based on the plurality of points.
- To mask the incomplete color image, the at least one processor may be configured to execute the instructions to render a depth map representing the mesh from the new viewpoint.
- To mask the incomplete color image, the at least one processor may be configured to execute the instructions to generate a mask based on a comparison between the incomplete depth image and the depth map.
- To mask the incomplete color image, the at least one processor may be configured to execute the instructions to apply the mask to the incomplete color image.
- The mask may indicate a plurality of pixels which are not used for generating the second image.
- The plurality of pixels may include a first plurality of pixels for which a depth is not indicated by the incomplete depth image, and a second plurality of pixels for which a depth indicated by the incomplete depth image is greater than a depth indicated by the depth map.
- The original image may be captured by at least one of an augmented reality (AR) device and a virtual reality (VR) device.
- The original viewpoint may comprise a current viewpoint of a user, and the original image may correspond to a current AR/VR image displayed to the user.
- the electronic device may include at least one processor which is configured to execute the instructions to obtain a completed 3D representation of the scene based on the second image.
- the electronic device may include at least one processor which is configured to execute the instructions to obtain a potential AR/VR image based on the completed 3D representation, wherein the potential AR/VR image corresponds to a potential viewpoint of the user.
- the electronic device may include at least one processor which is configured to execute the instructions, based on the user moving from a position corresponding to the current viewpoint to a position corresponding to the potential viewpoint, to display a transition between the current AR/VR image and the potential AR/VR image to the user.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Architecture (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Remote Sensing (AREA)
- Automation & Control Theory (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Radar, Positioning & Navigation (AREA)
- Aviation & Aerospace Engineering (AREA)
- Image Generation (AREA)
- Processing Or Creating Images (AREA)
Claims (15)
- A method for processing image data for scene completion, comprising: obtaining an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction; obtaining a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3-dimensional (3D) information generated from 2-dimensional (2D) information which is obtained from the original image; determining an area within the first image for generating a second surface of the object based on depth information about a depth between the object and the background of the original image; and obtaining a second image by inputting the first image and the determined area to an artificial intelligence (AI) inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- The method of claim 1, further comprising: rendering an incomplete color image and an incomplete depth image corresponding to the new viewpoint based on the 3D information; masking a portion of the incomplete color image based on the 3D information and the incomplete depth image to obtain a masked color image, wherein the masked portion of the incomplete color image corresponds to the determined area and indicates that the masked portion of the incomplete color image is obscured by the object when the scene is viewed from the new viewpoint; and inpainting the masked color image to obtain the second image.
- The method of any one of claims 1 to 2, wherein the obtaining the second image comprises: inpainting the masked color image based on the AI inpainting model to obtain the second image.
- The method of any one of claims 1 to 3, further comprising: obtaining an image caption by providing the second image to an AI caption model; and determining whether to re-inpaint the second image by comparing an embedding of the image caption and an embedding of the prompt.
- The method of any one of claims 1 to 4, further comprising: masking a portion of the incomplete depth image based on the 3D information and the incomplete depth image to obtain a masked depth image, wherein the masked portion of the incomplete depth image corresponds to the determined area; providing the second image to an AI depth estimation model; generating an estimated depth image based on the masked depth image and an output of the AI depth estimation model; and generating a completed 3D representation based on the second image and the estimated depth image.
- The method of any one of claims 1 to 5, wherein the masking comprises: generating a plurality of points which extend beyond a surface included in the original image; generating a mesh based on the plurality of points; rendering a depth map representing the mesh from the new viewpoint; generating a mask based on a comparison between the incomplete depth image and the depth map; and applying the mask to the incomplete color image.
- The method of any one of claims 1 to 6, wherein the mask indicates a plurality of pixels which are not used for generating the second image, and wherein the plurality of pixels includes a first plurality of pixels for which a depth is not indicated by the incomplete depth image, and a second plurality of pixels for which a depth indicated by the incomplete depth image is greater than a depth indicated by the depth map.
- An electronic device for processing image data for scene completion, the electronic device comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: obtain an original image from an original viewpoint corresponding to a first direction, wherein the original image includes an object and a background, wherein a first surface of the object is an image of the object corresponding to the first direction, obtain a first image from a new viewpoint corresponding to a second direction different from the first direction by rotating the original image based on 3-dimensional (3D) information generated based on 2-dimensional (2D) information which is obtained from the original image, determine an area within the first image for generating a second surface of the object based on depth information about a depth between the object and the background of the original image; and obtain a second image by inputting the first image and the determined area to an artificial intelligence (AI) inpainting model, wherein the AI inpainting model generates the second surface of the object which occupies a portion of the determined area in the second image.
- The electronic device of claim 8, wherein the at least one processor is further configured to execute the instructions to: render an incomplete color image and an incomplete depth image corresponding to the new viewpoint based on the 3D information, mask a portion of the incomplete color image based on the 3D information and the incomplete depth image to obtain a masked color image, wherein the masked portion of the incomplete color image corresponds to the determined area and indicates that the masked portion of the incomplete color image is obscured by the object when the scene is viewed from the new viewpoint, and inpaint the masked color image to obtain the second image.
- The electronic device of any one of claims 8 to 9, wherein to inpaint the masked color image, the at least one processor is further configured to execute the instructions to: inpaint the masked color image based on the AI inpainting model to obtain the second image.
- The electronic device of any one of claims 8 to 10, wherein the at least one processor is further configured to execute the instructions to: obtain an image caption by providing the second image to an AI caption model; and determine whether to re-inpaint the second image by comparing an embedding of the image caption and an embedding of the prompt.
- The electronic device of any one of claims 8 to 11, wherein the at least one processor is further configured to execute the instructions to: mask a portion of the incomplete depth image based on the 3D information and the incomplete depth image to obtain a masked depth image, wherein the masked portion of the incomplete depth image corresponds to the determined area; provide the second image to an AI depth estimation model; generate an estimated depth image based on the masked depth image and an output of the AI depth estimation model; and generate a completed 3D representation based on the second image and the estimated depth image.
- The electronic device of any one of claims 8 to 12, wherein to mask the incomplete color image, the at least one processor is further configured to execute the instructions to: generate a plurality of points which extend beyond a surface included in the original image; generate a mesh based on the plurality of points; render a depth map representing the mesh from the new viewpoint; generate a mask based on a comparison between the incomplete depth image and the depth map; and apply the mask to the incomplete color image.
- The electronic device of any one of claims 8 to 13, wherein the mask indicates a plurality of pixels which are not used for generating the second image, and wherein the plurality of pixels includes a first plurality of pixels for which a depth is not indicated by the incomplete depth image, and a second plurality of pixels for which a depth indicated by the incomplete depth image is greater than a depth indicated by the depth map.
- A computer-readable medium configured to store instructions which, when executed by at least one processor of a device, cause the at least one processor to perform the method of any one of claims 1 to 7.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP24771233.4A EP4599405A4 (en) | 2023-03-14 | 2024-02-14 | METHOD AND DEVICE FOR PROCESSING AN IMAGE |
| CN202480007064.6A CN120677505A (en) | 2023-03-14 | 2024-02-14 | Method and apparatus for processing image |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363452059P | 2023-03-14 | 2023-03-14 | |
| US63/452,059 | 2023-03-14 | | |
| US18/400,889 US20240312166A1 (en) | 2023-03-14 | 2023-12-29 | Rotation, inpainting and completion for generalizable scene completion |
| US18/400,889 | 2023-12-29 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024191234A1 (en) | 2024-09-19 |
Family
ID=92714507
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2024/095121 Ceased WO2024191234A1 (en) | 2023-03-14 | 2024-02-14 | Method and apparatus for processing an image |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240312166A1 (en) |
| EP (1) | EP4599405A4 (en) |
| CN (1) | CN120677505A (en) |
| WO (1) | WO2024191234A1 (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12608879B2 (en) * | 2022-12-12 | 2026-04-21 | Adobe Inc. | Generation of a 360-degree object view by leveraging available images on an online platform |
| US12592030B2 (en) | 2023-08-17 | 2026-03-31 | Adobe Inc. | Interactive three-dimension aware text-to-image generation |
| US20250117995A1 (en) * | 2023-10-05 | 2025-04-10 | Adobe Inc. | Image and depth map generation using a conditional machine learning |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20140021766A (en) * | 2012-08-10 | 2014-02-20 | 광운대학교 산학협력단 | A boundary noise removal and hole filling method for virtual viewpoint image generation |
| KR20200063367A (en) * | 2018-11-23 | 2020-06-05 | 네이버웹툰 주식회사 | Method and apparatus of converting 3d video image from video image using deep learning |
| US20200410746A1 (en) * | 2019-06-27 | 2020-12-31 | Electronics And Telecommunications Research Institute | Method and apparatus for generating 3d virtual viewpoint image |
| WO2021042134A1 (en) * | 2019-08-28 | 2021-03-04 | Snap Inc. | Generating 3d data in a messaging system |
| KR20220140402A (en) * | 2021-04-08 | 2022-10-18 | 구글 엘엘씨 | Neural blending for new view synthesis |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10726560B2 (en) * | 2014-10-31 | 2020-07-28 | Fyusion, Inc. | Real-time mobile device capture and generation of art-styled AR/VR content |
| WO2023014368A1 (en) * | 2021-08-05 | 2023-02-09 | Google Llc | Single image 3d photography with soft-layering and depth-aware inpainting |
2023
- 2023-12-29: US application US18/400,889, publication US20240312166A1 (en), active, status Pending

2024
- 2024-02-14: WO application PCT/KR2024/095121, publication WO2024191234A1 (en), not active, status Ceased
- 2024-02-14: EP application EP24771233.4A, publication EP4599405A4 (en), active, status Pending
- 2024-02-14: CN application CN202480007064.6A, publication CN120677505A (en), active, status Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4599405A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120677505A (en) | 2025-09-19 |
| US20240312166A1 (en) | 2024-09-19 |
| EP4599405A1 (en) | 2025-08-13 |
| EP4599405A4 (en) | 2026-01-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| WO2024191234A1 (en) | Method and apparatus for processing an image | |
| US10732725B2 (en) | Method and apparatus of interactive display based on gesture recognition | |
| CN110310175A (en) | System and method for mobile augmented reality | |
| WO2023051289A1 (en) | Navigation method and apparatus for unmanned device, medium, and unmanned device | |
| US20120124509A1 (en) | Information processor, processing method and program | |
| WO2019231130A1 (en) | Electronic device and control method therefor | |
| WO2019059505A1 (en) | Method and apparatus for recognizing object | |
| WO2024090989A1 (en) | Multi-view segmentation and perceptual inpainting with neural radiance fields | |
| US12354385B2 (en) | Image processing method and apparatus, electronic device, and computer-readable storage medium for identifying two-dimensional shapes using a depth image | |
| WO2020138602A1 (en) | Method for identifying user's real hand and wearable device therefor | |
| WO2017099555A1 (en) | Handwritten signature authentication system and method based on time division segment block | |
| WO2015199502A1 (en) | Apparatus and method for providing augmented reality interaction service | |
| KR102275682B1 (en) | SLAM-based mobile scan backpack system for rapid real-time building scanning | |
| WO2025028912A1 (en) | Electronic device for generating virtual object and operation method thereof | |
| CN121043130A (en) | A method, apparatus, equipment, medium, and product for controlling a robotic arm. | |
| WO2020204355A1 (en) | Electronic device and control method therefor | |
| WO2023239035A1 (en) | Electronic device for obtaining image data related to hand gesture and operation method therefor | |
| WO2024002065A1 (en) | Video encoding method and apparatus, electronic device, and medium | |
| WO2019245320A1 (en) | Mobile robot device for correcting position by fusing image sensor and plurality of geomagnetic sensors, and control method | |
| WO2019207875A1 (en) | Information processing device, information processing method, and program | |
| WO2023090808A1 (en) | Representing 3d shapes with probabilistic directed distance fields | |
| WO2023224326A1 (en) | Augmented reality device for acquiring depth information, and operating method therefor | |
| WO2017171142A1 (en) | System and method for detecting facial feature point | |
| WO2023063570A1 (en) | Electronic device for obtaining image data relating to hand motion and method for operating same | |
| WO2022270683A1 (en) | Depth map image generation method and computing device for same |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24771233; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024771233; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2024771233; Country of ref document: EP; Effective date: 20250509 |
| | WWE | WIPO information: entry into national phase | Ref document number: 202480007064.6; Country of ref document: CN |
| | WWP | WIPO information: published in national office | Ref document number: 2024771233; Country of ref document: EP |
| | WWP | WIPO information: published in national office | Ref document number: 202480007064.6; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |