WO2023090038A1 - Information processing device, video processing method, and program - Google Patents
Information processing device, video processing method, and program
- Publication number
- WO2023090038A1 (application PCT/JP2022/038981)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- terminal device
- display device
- video
- background
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/2224—Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—Three-dimensional [3D] image rendering
- G06T15/10—Geometric effects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating three-dimensional [3D] models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/2621—Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
- H04N2005/2726—Means for inserting a foreground image in a background image, i.e. inlay, outlay for simulating a person's appearance, e.g. hair style, glasses, clothes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/2224—Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
- H04N5/2226—Determination of depth image, e.g. for foreground/background separation
Definitions
- This technology relates to an information processing device, a video processing method, and a video processing technology implemented as a program.
- this disclosure proposes a technology that allows virtual production to be executed more easily.
- The information processing device includes a video processing unit that, in a state in which a display device and a terminal device having a shooting function are associated and the terminal device captures an object together with the video displayed on the display device, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.
- “Association” between a display device and a terminal device means that they are paired at least as targets for relative position detection.
- the information processing device performs at least a process of rendering a 3D model based on relative position information between the display device and the terminal device.
- Such an information processing apparatus of the present disclosure can be considered as a processor provided within a terminal device, or as a terminal device itself including such a processor.
- the information processing device of the present disclosure can be considered as a processor provided within the display device, or as the display device itself including such a processor.
- Alternatively, the information processing device of the present disclosure can be considered as a processor provided in a device separate from the display device and the terminal device (for example, a cloud server), or as that device itself including such a processor.
- FIG. 1 is an explanatory diagram of a virtual production shooting system.
- FIG. 2 is an explanatory diagram of a background image according to camera positions in virtual production.
- FIG. 3 is an explanatory diagram of a background image according to camera positions in virtual production.
- FIG. 4 is an explanatory diagram of a video content production process.
- FIG. 5 is a block diagram of a virtual production shooting system.
- FIG. 6 is a flowchart of background image generation of the shooting system.
- FIG. 7 is a block diagram of a shooting system using multiple cameras in virtual production.
- FIG. 8 is a block diagram of an information processing device according to an embodiment.
- Explanatory diagram of virtual production according to the embodiment.
- Explanatory diagram of relative position detection according to the embodiment.
- Explanatory diagram of display of a captured image on the terminal device according to the embodiment.
- Block diagram of the system configuration of the first embodiment.
- Block diagram of the system configuration of the second embodiment.
- Block diagram of the system configuration of the third embodiment.
- Block diagram of the system configuration of the fourth embodiment.
- Block diagram of the system configuration of the fifth embodiment.
- Block diagram of the system configuration of the sixth embodiment.
- Flowchart of overall processing of the first to sixth embodiments.
- Block diagram of the functional configuration of the first embodiment.
- Flowchart of a processing example of the first embodiment.
- Block diagram of the functional configuration of the second embodiment.
- Flowchart of a processing example of the second embodiment.
- Block diagram of the functional configuration of the third embodiment.
- Flowchart of a processing example of the third embodiment.
- Block diagram of the functional configuration of the fourth embodiment.
- Flowchart of a processing example of the fourth embodiment.
- Block diagram of the functional configuration of the fifth embodiment.
- Flowchart of a processing example of the fifth embodiment.
- Block diagram of the functional configuration of the sixth embodiment.
- Flowchart of a processing example of the sixth embodiment.
- Explanatory diagram of the area
- Explanatory diagram of a layer configuration according to the seventh embodiment.
- Explanatory diagram of an additional virtual image according to the seventh embodiment.
- Explanatory diagram of an additional virtual image according to the seventh embodiment.
- Explanatory diagram of an additional virtual image according to the seventh embodiment.
- Flowchart of overall processing of the seventh embodiment.
- Block diagram of the functional configuration of the seventh embodiment.
- Flowchart of processing according to the seventh embodiment.
- Explanatory diagram of another embodiment.
- In the present disclosure, "video" or "image" includes both still images and moving images.
- An "image" refers not only to what is shown on a display, but also to image data that is not being displayed.
- FIG. 1 schematically shows an imaging system 500.
- This photographing system 500 is a system for photographing as a virtual production, and the drawing shows part of the equipment arranged in the photographing studio.
- a performance area 501 is provided in which performers 510 perform acting and other performances.
- a large display device is arranged at least on the back surface of the performance area 501, and further on the left and right sides and on the top surface.
- Although the device type of the display device is not limited, the drawing shows an LED wall 505 as an example of a large display device.
- a single LED wall 505 forms a large panel by connecting and arranging a plurality of LED panels 506 vertically and horizontally.
- the size of the LED wall 505 referred to here is not particularly limited, but may be any size necessary or sufficient for displaying the background when the performer 510 is photographed.
- a required number of lights 580 are arranged at required positions such as above or to the side of the performance area 501 to illuminate the performance area 501 .
- a camera 502 is arranged for filming, for example, movies and other video content.
- the camera 502 can be moved by a cameraman 512, and can be operated to change the shooting direction, angle of view, and the like.
- In some cases, the movement of the camera 502, the angle-of-view operation, and the like are performed by remote operation.
- the camera 502 may move or change the angle of view automatically or autonomously. For this reason, the camera 502 may be mounted on a camera platform or a moving object.
- the performer 510 in the performance area 501 and the video displayed on the LED wall 505 are captured together. For example, by displaying the scenery as the background image vB on the LED wall 505, it is possible to shoot the same image as when the performer 510 is actually acting in the place of the scenery.
- An output monitor 503 is arranged near the performance area 501 .
- the image captured by the camera 502 is displayed on the output monitor 503 in real time as a monitor image vM. This allows the director and staff who produce the video content to check the video being shot.
- The photographing system 500, which shoots the performance of the performer 510 against the LED wall 505 in the studio, has various advantages over green screen shooting.
- Post-production after shooting is more efficient than with green screen shooting. This is because chroma key compositing, color correction, and reflection compositing can in some cases be made unnecessary. Even when chroma key compositing is still required at the time of shooting, not having to add a background screen also contributes to efficiency.
- When shooting against the background image vB, no green tint is added to the subject, so correcting for it is unnecessary.
- By displaying the background image vB, reflections on real objects such as glass are captured naturally, so there is no need to composite reflection images.
- the background video vB will be explained with reference to FIGS. 2 and 3.
- Even if the background image vB is displayed on the LED wall 505 and shot together with the performer 510, simply displaying it as is makes the background of the captured video look unnatural. This is because a background that is actually three-dimensional and has depth is being shown as a flat two-dimensional image.
- the camera 502 can shoot the performer 510 in the performance area 501 from various directions, and can also perform a zoom operation.
- the performer 510 does not stop at one place either.
- the actual appearance of the background of the performer 510 should change according to the position, shooting direction, angle of view, etc. of the camera 502, but such a change cannot be obtained with the background image vB as a plane image. Therefore, the background image vB is changed so that the background, including the parallax, looks the same as it actually does.
- FIG. 2 shows camera 502 photographing actor 510 from a position on the left side of the figure
- FIG. 3 shows camera 502 photographing actor 510 from a position on the right side of the figure.
- the shooting area image vBC is shown within the background image vB.
- a portion of the background image vB excluding the shooting area image vBC is called an "outer frustum”
- the shooting area image vBC is called an "inner frustum”.
- the background image vB described here refers to the entire image displayed as the background including the shooting area image vBC (inner frustum).
- The range of this shooting area image vBC corresponds to the range actually shot by the camera 502 within the display surface of the LED wall 505. The shooting area image vBC is transformed according to the position of the camera 502, the shooting direction, the angle of view, and so on, so that it represents the scene that would actually be seen with the position of the camera 502 as the viewpoint.
- Specifically, 3D background data, which is a 3D (three-dimensional) model of the background, is prepared, and the 3D background data is rendered sequentially in real time based on the viewpoint position of the camera 502.
- The range of the shooting area image vBC is actually set slightly wider than the range shot by the camera 502 at that time. This prevents the outer frustum image from appearing in the shot because of rendering delay when the shot range changes slightly due to panning, tilting, or zooming of the camera 502, and avoids the influence of diffracted light from the outer frustum image.
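The relationship between the camera state and the widened shooting area can be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes a flat wall, treats the horizontal and vertical axes independently (so the area is approximated by a rectangle rather than the true frustum cross-section), and the names `CameraPose`, `shooting_area`, and `margin` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical camera state relative to a flat LED wall."""
    x: float         # horizontal position along the wall (m)
    y: float         # height of the camera (m)
    distance: float  # perpendicular distance from the wall (m)
    pan_deg: float   # 0 means facing the wall squarely
    tilt_deg: float  # 0 means level
    hfov_deg: float  # horizontal angle of view
    vfov_deg: float  # vertical angle of view


def shooting_area(pose, margin=0.1):
    """Return (left, right, bottom, top) of the area on the wall that the
    camera sees, widened by `margin` (a fraction of the width/height)."""

    def span(center, angle_deg, fov_deg):
        # The edge rays must still hit the wall in front of the camera.
        if abs(angle_deg) + fov_deg / 2 >= 90:
            raise ValueError("view edge does not intersect the wall")
        lo = center + pose.distance * math.tan(math.radians(angle_deg - fov_deg / 2))
        hi = center + pose.distance * math.tan(math.radians(angle_deg + fov_deg / 2))
        pad = (hi - lo) * margin / 2  # half of the extra width on each side
        return lo - pad, hi + pad

    left, right = span(pose.x, pose.pan_deg, pose.hfov_deg)
    bottom, top = span(pose.y, pose.tilt_deg, pose.vfov_deg)
    return left, right, bottom, top
```

With a camera 3 m from the wall, facing it squarely with a 90-degree horizontal angle of view, the unwidened area spans about 6 m; a `margin` of 0.1 adds roughly 0.3 m on each side.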
- the image of the shooting area image vBC rendered in real time in this way is combined with the image of the outer frustum.
- the image of the outer frustum used in the background image vB is rendered in advance based on the 3D background data, and a part of the image of the outer frustum incorporates the image as the shooting area image vBC rendered in real time.
- the entire background image vB is generated.
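The combination of the pre-rendered outer frustum with the real-time shooting area image can be sketched as a simple paste operation. This is an illustrative sketch only: images are assumed to be lists of pixel rows, and the function name `composite_background` and its `top`/`left` placement parameters are hypothetical.

```python
def composite_background(outer, inner, top, left):
    """Paste `inner` (the shooting area image) into a copy of `outer`
    (the outer frustum image) with its top-left corner at (top, left).
    Pixels of `inner` that fall outside `outer` are clipped."""
    frame = [row[:] for row in outer]  # leave the pre-rendered image untouched
    for r, inner_row in enumerate(inner):
        y = top + r
        if not 0 <= y < len(frame):
            continue
        for c, pixel in enumerate(inner_row):
            x = left + c
            if 0 <= x < len(frame[y]):
                frame[y][x] = pixel
    return frame
```

Copying the outer frustum first reflects the fact that it is rendered in advance and reused for every frame, while only the inner region changes.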
- the output monitor 503 displays the monitor image vM including the performer 510 and the background, which is the captured image.
- the background of this monitor image vM is the shooting area image vBC.
- the background included in the captured image is a real-time rendered image.
- In this way, the background image vB is not merely displayed as a flat two-dimensional image; the background image vB, including the shooting area image vBC, is changed in real time so that video equivalent to actually shooting on location can be captured.
- the production process of video content as a virtual production that shoots with the shooting system 500 will be explained.
- the video content production process is roughly divided into three stages. They are asset creation ST1, production ST2, and post-production ST3.
- Asset creation ST1 is the process of creating 3D background data for displaying the background video vB.
- the background image vB is generated by performing real-time rendering using the 3D background data at the time of shooting. Therefore, 3D background data as a 3D model is created in advance.
- 3D background data production methods include full CG (Full Computer Graphics), point cloud data (Point Cloud) scanning, and photogrammetry.
- Full CG is a method of creating 3D models with computer graphics. Although this method requires the most man-hours and time among the three methods, it is suitable for use when an unrealistic image or an image that is actually difficult to shoot is desired to be used as the background image vB.
- Point cloud data scanning is a method in which, for example, distance is measured with LiDAR, a 360-degree image is taken from the same position with a camera, and the captured data is loaded onto the measured points to generate a 3D model from point cloud data. Compared to full CG, 3D models can be produced in a short time. In addition, it is easier to create a high-definition 3D model than with photogrammetry.
- Photogrammetry is a technique that analyzes parallax information in two-dimensional images obtained by photographing an object from multiple viewpoints to obtain its dimensions and shape. 3D model production can be done in a short time. Note that point cloud information acquired by LiDAR may be used in generating 3D data by photogrammetry.
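As a worked illustration of how parallax yields dimensions, the sketch below uses the standard rectified two-view relation, depth = focal length x baseline / disparity. The source does not say which photogrammetry algorithm is used, so this is an assumption chosen for clarity; the function name and parameters are hypothetical.

```python
def depth_from_parallax(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen from two viewpoints `baseline_m` apart.

    `disparity_px` is the horizontal shift of the point between the two
    rectified images; a larger shift means the point is closer."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# A point that shifts 20 px between views taken 0.1 m apart,
# with a focal length of 1000 px, lies about 5 m away.
example_depth_m = depth_from_parallax(1000.0, 0.1, 20.0)
```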
- these methods are used to create a 3D model that becomes 3D background data.
- the above methods may be used in combination.
- For example, a part of a 3D model produced by point cloud data scanning or photogrammetry may be produced by CG and combined with it.
- Production ST2 is the process of shooting in a shooting studio as shown in FIG. 1. Elemental technologies in this case include real-time rendering, background display, camera tracking, and lighting control.
- Real-time rendering is the rendering process for obtaining the shooting area image vBC at each point in time (each frame of the background image vB), as described with reference to FIGS. 2 and 3. It renders the 3D background data produced in asset creation ST1 from a viewpoint corresponding to the position of the camera 502 at each point in time.
- Camera tracking is performed to obtain shooting information from the camera 502, and tracks the position information, shooting direction, angle of view, etc. of the camera 502 at each point in time.
- Real-time rendering according to the viewpoint position of the camera 502 and the like can be executed by providing the rendering engine with shooting information including these in association with each frame.
- the shooting information is information linked or associated with video as metadata.
- the shooting information is assumed to include position information of the camera 502 at each frame timing, orientation of the camera, angle of view, focal length, F number (aperture value), shutter speed, lens information, and the like.
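The per-frame shooting information listed above could be represented as a simple record. This is a sketch under assumptions: the field names, units, and the helper functions `index_by_frame` and `to_metadata` are hypothetical and are not defined in the source.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ShootingInfo:
    """Hypothetical per-frame shooting information for one camera."""
    frame: int                 # frame timing this record belongs to
    position: tuple            # camera position (x, y, z)
    direction: tuple           # camera orientation (pan, tilt, roll) in degrees
    angle_of_view_deg: float
    focal_length_mm: float
    f_number: float            # aperture value
    shutter_speed_s: float
    lens: str                  # free-form lens information


def index_by_frame(infos):
    """Associate each record with its frame so a renderer can look it up."""
    return {info.frame: info for info in infos}


def to_metadata(info):
    """Flatten a record into a plain dict that can accompany the video."""
    return asdict(info)
```

Making the record immutable (`frozen=True`) keeps the information tied to a frame from being altered after the frame has been rendered.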
- Lighting control refers to controlling the state of lighting in the imaging system 500, specifically controlling the light intensity, emission color, lighting direction, etc. of the light 580. For example, lighting control is performed according to the time setting and location setting of the scene to be shot.
- Post-production ST3 indicates various processes performed after shooting. For example, video correction, video adjustment, clip editing, video effects, and the like are performed.
- Image correction may include color gamut conversion, color matching between cameras and materials, and the like. Color adjustment, brightness adjustment, contrast adjustment, etc. may be performed as image adjustment. As clip editing, clip cutting, order adjustment, time length adjustment, etc. may be performed. As a video effect, synthesis of CG video and special effect video may be performed.
- FIG. 5 is a block diagram showing the configuration of the photographing system 500 outlined in FIGS. 1, 2, and 3.
- The imaging system 500 shown in FIG. 5 includes the LED wall 505 formed of the plurality of LED panels 506, the camera 502, the output monitor 503, and the light 580 described above.
- The imaging system 500 further comprises a rendering engine 520, an asset server 530, a sync generator 540, an operation monitor 550, a camera tracker 560, an LED processor 570, a lighting controller 581, and a display controller 590.
- the LED processor 570 is provided corresponding to each LED panel 506 and drives the corresponding LED panel 506 for video display.
- the sync generator 540 generates a synchronization signal for synchronizing the frame timing of the image displayed by the LED panel 506 and the frame timing of the imaging by the camera 502 and supplies it to each LED processor 570 and the camera 502 . However, this does not prevent the output from the sync generator 540 from being supplied to the rendering engine 520 .
- the camera tracker 560 generates shooting information by the camera 502 at each frame timing and supplies it to the rendering engine 520 .
- The camera tracker 560 detects, as one piece of shooting information, the position of the camera 502 relative to the LED wall 505 or to a predetermined reference position, together with the shooting direction of the camera 502, and supplies these to the rendering engine 520.
- As a specific detection method used by the camera tracker 560, there is a method of randomly arranging reflectors on the ceiling and detecting the position from the reflection of infrared light emitted from the camera 502 side.
- As another detection method, there is also a method of estimating the self-position of the camera 502 based on information from a gyro mounted on the platform or the body of the camera 502, or on image recognition of the image captured by the camera 502.
- the angle of view, focal length, F number, shutter speed, lens information, etc. may be supplied from the camera 502 to the rendering engine 520 as shooting information.
- The asset server 530 is a server that stores the 3D model produced in asset creation ST1, that is, the 3D background data, in a recording medium, and can read out the 3D model as needed. That is, it functions as a DB (database) for 3D background data.
- The rendering engine 520 performs processing for generating the background image vB to be displayed on the LED wall 505. To do so, the rendering engine 520 reads the necessary 3D background data from the asset server 530. The rendering engine 520 then renders the 3D background data as viewed from spatial coordinates designated in advance, generating the image of the outer frustum used in the background image vB. As processing for each frame, the rendering engine 520 uses the shooting information supplied from the camera tracker 560 and the camera 502 to specify the viewpoint position and the like with respect to the 3D background data, and renders the shooting area video vBC (inner frustum).
- the rendering engine 520 combines the shooting area video vBC rendered for each frame with the pre-generated outer frustum to generate the background video vB as video data for one frame.
- the rendering engine 520 then transmits the generated video data of one frame to the display controller 590 .
- the display controller 590 generates a divided video signal nD by dividing one frame of video data into video portions to be displayed on each LED panel 506 and transmits the divided video signal nD to each LED panel 506 .
- the display controller 590 may perform calibration according to individual differences/manufacturing errors such as color development between display units. Note that these processes may be performed by the rendering engine 520 without providing the display controller 590 . That is, the rendering engine 520 may generate the divided video signal nD, perform calibration, and transmit the divided video signal nD to each LED panel 506 .
- Each LED processor 570 drives the LED panel 506 based on the received divided video signal nD to display the entire background video vB on the LED wall 505 .
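The division of one frame into per-panel portions can be sketched as follows. This is a minimal illustration that assumes equally sized panels arranged in a regular grid and a frame stored as a list of pixel rows; the function name `split_into_panels` is hypothetical.

```python
def split_into_panels(frame, rows, cols):
    """Split one frame into rows x cols equally sized tiles.

    Returns a dict keyed by (panel_row, panel_col), where each value is
    the portion of the frame that the corresponding panel would display."""
    height, width = len(frame), len(frame[0])
    if height % rows or width % cols:
        raise ValueError("frame size must be a multiple of the panel grid")
    tile_h, tile_w = height // rows, width // cols
    return {
        (pr, pc): [
            line[pc * tile_w:(pc + 1) * tile_w]
            for line in frame[pr * tile_h:(pr + 1) * tile_h]
        ]
        for pr in range(rows)
        for pc in range(cols)
    }
```

Each tile could then be sent to the processor that drives the panel at that grid position; any per-panel calibration would be applied to the tile before transmission.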
- the background image vB includes the shooting area image vBC rendered according to the position of the camera 502 at that time.
- the camera 502 can capture the performance of the performer 510 including the background image vB displayed on the LED wall 505 in this way.
- the image captured by the camera 502 is recorded on a recording medium inside the camera 502 or by an external recording device (not shown), and is also supplied to the output monitor 503 in real time and displayed as a monitor image vM.
- An operation image vOP for controlling the rendering engine 520 is displayed on the operation monitor 550 .
- the engineer 511 can perform necessary settings and operations regarding rendering of the background video vB while viewing the operation image vOP.
- a lighting controller 581 controls the emission intensity, emission color, irradiation direction, and the like of the light 580 .
- the lighting controller 581 may, for example, control the lights 580 asynchronously with the rendering engine 520, or may control them in synchronization with the shooting information and rendering processing. Therefore, the lighting controller 581 may perform light emission control according to instructions from the rendering engine 520 or a master controller (not shown).
- FIG. 6 shows a processing example of the rendering engine 520 in the photographing system 500 having such a configuration.
- the rendering engine 520 reads the 3D background data to be used this time from the asset server 530 in step S10, and develops it in an internal work area. Then, an image to be used as an outer frustum is generated.
- the rendering engine 520 repeats the processing from step S30 to step S60 at each frame timing of the background video vB until it determines in step S20 that the display of the background video vB based on the read 3D background data has ended.
- In step S30, the rendering engine 520 acquires shooting information from the camera tracker 560 and the camera 502. This confirms the position and state of the camera 502 to be reflected in the current frame.
- In step S40, the rendering engine 520 performs rendering based on the shooting information. That is, rendering is performed by specifying the viewpoint position for the 3D background data based on the position of the camera 502 to be reflected in the current frame, the shooting direction, the angle of view, and the like. At this time, image processing reflecting focal length, F number, shutter speed, lens information, etc., can also be performed. By this rendering, video data as the shooting area video vBC can be obtained.
- In step S50, the rendering engine 520 composites the outer frustum, which is the overall background image, with the image reflecting the viewpoint position of the camera 502, that is, the shooting area image vBC. For example, this is a process of compositing an image generated by reflecting the viewpoint of the camera 502 with an image of the entire background rendered from a specific reference viewpoint. As a result, the background image vB of one frame displayed on the LED wall 505, that is, the background image vB including the shooting area image vBC, is generated.
- The processing of step S60 is performed by the rendering engine 520 or the display controller 590.
- In step S60, the rendering engine 520 or the display controller 590 generates the divided video signals nD by dividing the one-frame background video vB into the videos displayed on the individual LED panels 506.
- Calibration may also be performed.
- each divided video signal nD is transmitted to each LED processor 570 .
- the background image vB including the shooting area image vBC captured by the camera 502 is displayed on the LED wall 505 at each frame timing.
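The per-frame processing described above can be summarized as a loop. This is only a structural sketch: every callable passed in (`load_background`, `render_outer`, `render_inner`, `composite`, `output`) is a hypothetical placeholder for the corresponding stage, and the end-of-display check is modeled simply as the shooting information source running out.

```python
def run_rendering_loop(load_background, render_outer, shooting_info_per_frame,
                       render_inner, composite, output):
    """Drive one pass of background generation and return the frame count."""
    model = load_background()    # read the 3D background data
    outer = render_outer(model)  # image used as the outer frustum
    frames = 0
    # The loop ends when no more shooting information arrives, which stands
    # in for the "display has ended" decision.
    for info in shooting_info_per_frame:       # shooting info for this frame
        inner = render_inner(model, info)      # shooting area image
        frame = composite(outer, inner, info)  # one-frame background image
        output(frame)                          # divide and send to the panels
        frames += 1
    return frames
```

Passing the stages in as callables keeps the sketch independent of any particular renderer; the outer frustum is produced once, while the shooting area image is regenerated for every frame.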
- FIG. 7 shows a configuration example when a plurality of cameras 502a and 502b are used.
- the cameras 502a and 502b are configured to be able to take pictures in the performance area 501 independently.
- Each camera 502a, 502b and each LED processor 570 are also kept synchronized by a sync generator 540.
- Output monitors 503a and 503b are provided corresponding to the cameras 502a and 502b, and are configured to display images captured by the corresponding cameras 502a and 502b as monitor images vMa and vMb.
- camera trackers 560a and 560b are provided corresponding to the cameras 502a and 502b, and detect the positions and shooting directions of the corresponding cameras 502a and 502b, respectively.
- The shooting information from the camera 502a and the camera tracker 560a and the shooting information from the camera 502b and the camera tracker 560b are sent to the rendering engine 520.
- the rendering engine 520 can perform rendering to obtain the background video vB of each frame using the shooting information on either the camera 502a side or the camera 502b side.
- Although FIG. 7 shows an example using two cameras 502a and 502b, it is also possible to shoot using three or more cameras 502.
- However, when a plurality of cameras 502 are used, the shooting area images vBC corresponding to the respective cameras 502 can interfere with each other.
- For example, in addition to the shooting area image vBC corresponding to the camera 502a, a shooting area image vBC corresponding to the camera 502b will also be needed. If the shooting area images vBC corresponding to the cameras 502a and 502b are simply both displayed, they interfere with each other. Therefore, some way of displaying the shooting area images vBC needs to be devised.
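Whether two shooting areas interfere can be checked with a rectangle overlap test. The sketch below only detects the interference and does not resolve it; representing each area as a `(left, right, bottom, top)` rectangle on the wall and the function name `areas_overlap` are assumptions made for illustration.

```python
def areas_overlap(a, b):
    """Return True if two shooting areas share any region of the wall.

    Each area is a (left, right, bottom, top) rectangle; areas that only
    touch along an edge are not counted as overlapping."""
    a_left, a_right, a_bottom, a_top = a
    b_left, b_right, b_bottom, b_top = b
    return (a_left < b_right and b_left < a_right
            and a_bottom < b_top and b_bottom < a_top)
```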
- the information processing device 70 is a device such as a computer device capable of information processing, particularly video processing.
- the information processing device 70 is assumed to be a PC (personal computer), a workstation, a mobile terminal device such as a smart phone or a tablet, a video editing device, or the like.
- the information processing device 70 may be a computer device configured as a server device or an arithmetic device in cloud computing.
- the information processing device 70 can function as a 3D model creation device that creates a 3D model in the asset creation ST1.
- the information processing device 70 can also function as a rendering engine 520 that configures the shooting system 500 used in the production ST2.
- the information processing device 70 can also function as an asset server 530 .
- the information processing device 70 can also function as a video editing device that performs various video processing in the post-production ST3.
- FIG. 8 can also be said to be the hardware configuration of the terminal device 1 , the display device 2 , and the cloud server 4 .
- the RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
- the video processing unit 85 is configured as a processor that performs various video processing. For example, it is a processor that can perform some processing or a plurality of processing related to video, such as 3D model generation processing, rendering, DB processing, video editing processing, and image recognition processing based on image analysis.
- the video processing unit 85 can be implemented by, for example, a CPU separate from the CPU 71, a GPU (Graphics Processing Unit), a GPGPU (General-purpose computing on graphics processing units), an AI (artificial intelligence) processor, or the like. Note that the video processing unit 85 may be provided as a function within the CPU 71 .
- the CPU 71 , ROM 72 , RAM 73 , nonvolatile memory section 74 and video processing section 85 are interconnected via a bus 83 .
- An input/output interface 75 is also connected to this bus 83 .
- the input/output interface 75 is connected to an input section 76 including operators and operating devices.
- various operators and operating devices such as a keyboard, mouse, key, dial, touch panel, touch pad, remote controller, etc. are assumed.
- a user's operation is detected by the input unit 76 , and a signal corresponding to the input operation is interpreted by the CPU 71 .
- A microphone is also envisioned as the input section 76.
- a voice uttered by the user can also be input as operation information.
- the input/output interface 75 is connected integrally or separately with a display unit 77 such as an LCD (Liquid Crystal Display) or an organic EL (electro-luminescence) panel, and an audio output unit 78 such as a speaker.
- the display unit 77 is a display unit that performs various displays, and is configured by, for example, a display device provided in the housing of the information processing device 70, a separate display device connected to the information processing device 70, or the like.
- the display unit 77 displays various images, operation menus, icons, messages, and the like on the display screen based on instructions from the CPU 71, that is, as a GUI (Graphical User Interface).
- The input/output interface 75 may be connected to a storage section 79, which is composed of an HDD (Hard Disk Drive), a solid-state memory, or the like, and to a communication section 80.
- the storage unit 79 can store various data and programs.
- a DB can also be configured in the storage unit 79 .
- the storage unit 79 can be used to construct a DB that stores 3D background data groups.
- the communication unit 80 performs communication processing via a transmission line such as the Internet, and communication with various devices such as an external DB, editing device, and information processing device through wired/wireless communication, bus communication, and the like.
- the communication unit 80 can access the DB as the asset server 530 and receive shooting information from the camera 502 and camera tracker 560 .
- the information processing device 70 used in the post-production ST3 it is possible to access the DB as the asset server 530 through the communication section 80.
- a drive 81 is also connected to the input/output interface 75 as required, and a removable recording medium 82 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately loaded.
- Video data and various computer programs can be read from the removable recording medium 82 by the drive 81 .
- the read data is stored in the storage unit 79 , and video and audio contained in the data are output from the display unit 77 and the audio output unit 78 .
- Computer programs and the like read from the removable recording medium 82 are installed in the storage unit 79 as required.
- the information processing device 70 may include various sensors as the sensor section 86 as necessary.
- a sensor section 86 comprehensively indicates various sensors.
- the CPU 71 and the image processing section 85 can perform corresponding processing based on the information from the sensor section 86 .
- Specific sensors in the sensor unit 86 include, for example, a range sensor such as a ToF (Time of Flight) sensor, a range/direction sensor such as a LiDAR, a position information sensor, an illuminance sensor, an infrared sensor, and a touch sensor.
- An IMU (inertial measurement unit) is also assumed as a sensor in the sensor unit 86.
- The angular velocity may be detected by, for example, a three-axis (pitch, yaw, and roll) angular velocity (gyro) sensor.
- the information processing device 70 may include a camera section 87 .
- this information processing device 70 may be implemented as a terminal device 1 having a photographing function, which will be described later.
- The camera unit 87 includes an image sensor, a processing circuit for the signals photoelectrically converted by the image sensor, and the like.
- the camera unit 87 shoots images as moving images and still images.
- the captured image is processed by the image processing unit 85 and the CPU 71, stored in the storage unit 79, displayed on the display unit 77, and transmitted to another device by the communication unit 80.
- the distance information obtained by the distance measuring sensor of the sensor unit 86 is depth information to the subject.
- The CPU 71 and the video processing unit 85 can generate a depth map corresponding to each frame of the captured video based on the detection values of the distance measuring sensor, or can detect depth information of a specific subject found in the image by object detection processing.
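As an illustration of reading the depth of a specific subject out of a per-frame depth map, the following is a minimal sketch, not the method of the embodiment. The depth map layout (rows of distances in metres, with 0 meaning "no measurement"), the bounding box from an object detector, and the function name `subject_depth` are all assumptions made for this example.

```python
from statistics import median


def subject_depth(depth_map, box):
    """Return the median depth (metres) inside a bounding box.

    depth_map: list of rows, each a list of per-pixel distances in metres
               (0 or None marks a pixel without a valid measurement).
    box:       (x0, y0, x1, y1) pixel rectangle, x1/y1 exclusive; assumed to
               come from some object detection step that is not shown here.
    """
    x0, y0, x1, y1 = box
    samples = [
        d
        for row in depth_map[y0:y1]
        for d in row[x0:x1]
        if d  # skip missing measurements
    ]
    if not samples:
        raise ValueError("no valid depth samples inside the box")
    return median(samples)


# Tiny hypothetical depth map: a near subject in front of a far wall.
depth_map = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.2, 1.3, 3.0],
    [3.0, 1.2, 0.0, 3.0],
]
print(subject_depth(depth_map, (1, 1, 3, 3)))
```

The median is used here only because it tolerates a few stray background pixels inside the box; other statistics would work equally well.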
- the software for the processing of this embodiment can be installed via network communication by the communication unit 80 or via the removable recording medium 82.
- the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
- <Embodiment> A virtual production according to the embodiment will be described below. Virtual production using the shooting system 500 described above, which is a large-scale studio set, is not something a general user can easily carry out. Therefore, the present embodiment proposes a technology that enables video production using virtual production technology to be performed easily even at home or the like.
- FIG. 9 shows an example of a terminal device 1, a display device 2, and an object 10 (person, animal, article, etc.) that actually exists as an object to be photographed.
- the terminal device 1 is, for example, a smartphone, a tablet terminal, a notebook PC, etc., and is equipped with a function of capturing images.
- the terminal device 1 is desirably a small device that can be carried by the user, but it may be a device that is not suitable for carrying around, such as a desktop PC.
- the display device 2 has at least a function of displaying images, and is assumed to be, for example, a home television receiver or a video monitor device.
- the user uses his/her smart phone as the terminal device 1 and the television receiver at home as the display device 2 to perform photography as a virtual production.
- the display device 2 is recognized by the terminal device 1, for example.
- the display device 2 is recognized on the image captured by the terminal device 1 .
- the terminal device 1 recognizes the display device 2 as the target of relative position detection.
- a television receiver having an AR marker 3 may be recognized as the display device 2 .
- the terminal device 1 and the display device 2 may be paired by short-range wireless communication or the like.
- the terminal device 1 may be recognized as the partner for relative position detection from the display device 2 side. In any case, at least one of the terminal device 1 and the display device 2 recognizes a pair to be subjected to relative position detection.
- the background image vB generated by rendering using the 3D background model is displayed on the display device 2 .
- the user shoots the background image vB displayed on the display device 2 and the real object 10 in front of it with the terminal device 1 .
- the background image vB is rendered based on the relative position of the terminal device 1 with respect to the display device 2
- The background image vB can thus be generated with the parallax that corresponds to the direction and positional relationship seen from the position of the terminal device 1 as the viewpoint. That is, the background image vB equivalent to the inner frustum described above can be displayed on the display device 2. Therefore, it is possible to realize photography equivalent to that of the photography system 500 of FIG. 1 described above with the terminal device 1 such as a smartphone and the display device 2 such as a television receiver.
- In the photographing system 500 described above with reference to FIGS. 1 to 7, these functions are implemented by different devices, and photographing as a virtual production is realized by linking them.
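One common way to render an image whose parallax matches a viewpoint in front of a flat screen is an off-axis (asymmetric) perspective frustum. The sketch below is an illustration under assumptions, not the rendering method of the embodiment: the coordinate system (screen centred at the origin in the plane z = 0, x to the right, y up, viewer at z > 0), the function name `off_axis_frustum`, and the parameter names are all chosen for this example.

```python
def off_axis_frustum(eye, screen_w, screen_h, near=0.1):
    """Return (left, right, bottom, top) of the view frustum at the near plane.

    eye:       (x, y, z) viewpoint relative to the screen centre, in metres,
               with z > 0 meaning "in front of the screen".
    screen_w,
    screen_h:  physical size of the display surface in metres.
    near:      near-plane distance in metres.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("viewpoint must be in front of the screen")
    scale = near / ez  # similar triangles: screen edge -> near plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top


# Viewer 2 m away, first centred and then shifted 0.3 m to the right.
print(off_axis_frustum((0.0, 0.0, 2.0), 1.2, 0.7))
print(off_axis_frustum((0.3, 0.0, 2.0), 1.2, 0.7))
```

The four values have the same meaning as the left/right/bottom/top arguments of a classic frustum projection matrix. When the viewer moves sideways the frustum becomes asymmetric, which is what produces the parallax on the screen. Both the display size and the relative position are needed to compute it.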
- either the terminal device 1 or the display device 2 implements these functions.
- A cloud server 4, which will be described later, may also be used. This allows the user to shoot a virtual production at home or the like and easily create an attractive moving image, for example an attractive introductory video of an item created as a hobby, a pet, an action of a subject person, or the like.
- an example of installing an AR marker 3 on the display device 2 as shown in FIG. 10A can be considered.
- the terminal device 1 shoots an image using the shooting function
- the relative position with respect to the display device 2 can be detected by recognizing the AR marker 3 in the image.
- The AR marker 3 can be used only while the terminal device 1 is shooting a range that includes the AR marker 3. Therefore, it is desirable to also use techniques such as SLAM (Simultaneous Localization and Mapping) to cover the case where the AR marker 3 goes out of the frame. For example, the surrounding environment is sensed by LiDAR or the like, and the self-position is estimated by SLAM based on the environmental information. Further, the terminal device 1 can also estimate its own position by itself based on the captured video and IMU detection data. Relative position detection with respect to the display device 2 can be performed based on these self-position estimations.
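For the marker-based case, the relative position can be obtained by inverting the pose of the marker as seen from the camera. The following is a minimal sketch under assumptions: the rotation matrix `R` and translation `t` of the marker in the camera frame are taken to come from some marker detector that is not shown, and the function name is hypothetical.

```python
def camera_position_in_marker_frame(R, t):
    """Invert a marker pose to get the camera position relative to the marker.

    R, t describe the marker in the camera frame, i.e. a point p expressed in
    marker coordinates appears at R * p + t in camera coordinates.
    The camera origin in marker coordinates is therefore -R^T * t.
    """
    return tuple(
        -sum(R[row][col] * t[row] for row in range(3))
        for col in range(3)
    )


# Marker straight ahead, 2 m away, no rotation.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(camera_position_in_marker_frame(identity, (0.0, 0.0, 2.0)))

# Same distance, marker frame rotated 90 degrees about the vertical axis.
rot_y_90 = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]
print(camera_position_in_marker_frame(rot_y_90, (0.0, 0.0, 2.0)))
```

If the AR marker 3 is fixed to the display device 2 at a known offset, the result can be shifted by that offset to express the position relative to the screen itself.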
- The above describes the case where the terminal device 1 detects the relative position; when the display device 2 detects it, relative position detection can be similarly performed.
- Since the display device 2 is actually assumed to be a home-use television receiver or the like, the size of the display device 2, or more precisely the size of its display surface, is also detected. Display size information is obtained by this display size detection.
- the user activates an application program on the terminal device 1 and manually inputs an actual numerical value. For example, the user actually measures and inputs the vertical and horizontal lengths of the display device 2 .
- the user inputs the product name, model number, etc. of the display device 2, such as a television receiver, and the application program accesses a DB (database) to automatically search for the size.
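The model-number lookup can be pictured as follows. This is a minimal sketch under assumptions: the table `DISPLAY_DB`, the model number in it, and the function names are hypothetical, and the screen is assumed to be specified by its diagonal in inches with a 16:9 aspect ratio.

```python
import math

# Hypothetical database: model number -> screen diagonal in inches.
DISPLAY_DB = {"EXAMPLE-TV-55": 55.0}


def screen_size_from_diagonal(diagonal_inch, aspect=(16, 9)):
    """Convert a diagonal in inches to (width, height) in metres."""
    aw, ah = aspect
    diagonal_m = diagonal_inch * 0.0254
    unit = diagonal_m / math.hypot(aw, ah)
    return aw * unit, ah * unit


def lookup_screen_size(model_number):
    """Return (width, height) in metres, or None if the model is unknown."""
    diagonal = DISPLAY_DB.get(model_number)
    if diagonal is None:
        return None  # fall back to manual input or automatic detection
    return screen_size_from_diagonal(diagonal)


print(lookup_screen_size("EXAMPLE-TV-55"))
print(lookup_screen_size("UNKNOWN-MODEL"))
```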
- the size of the display device 2 can also be automatically detected by the terminal device 1 .
- the range of the display device 2 can be specified from the depth map based on the distance information obtained by the distance measuring sensor for the captured image. Thereby, the size of the display device 2 can be detected. Also, more accurately, it is desired to detect the size of the screen 2a shaded in FIG. 10B instead of the housing size of the display device 2. Therefore, when detecting the size, it is conceivable to transmit a specific color image from the terminal device 1 to the display device 2 for display, detect the range of the color in the captured image, and calculate the actual size.
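The colour-based size detection can be sketched as below. This is an illustration under assumptions, not the procedure of the embodiment: the image is a plain list of rows of RGB tuples, the screen is assumed to face the camera roughly head-on, the distance is assumed to come from the distance measuring sensor, and a simple pinhole model with the focal length given in pixels is used. All names are hypothetical.

```python
def color_bounding_box(image, color):
    """Return (x0, y0, x1, y1), inclusive, of all pixels equal to `color`."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel == color:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def screen_size_from_image(image, color, distance_m, focal_length_px):
    """Estimate the physical (width, height) of the coloured screen area.

    Pinhole model: physical extent = pixel extent * distance / focal length.
    """
    box = color_bounding_box(image, color)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    width_px = x1 - x0 + 1
    height_px = y1 - y0 + 1
    return (
        width_px * distance_m / focal_length_px,
        height_px * distance_m / focal_length_px,
    )


# Toy 6x4 image: the screen shows solid magenta on a black surround.
M, K = (255, 0, 255), (0, 0, 0)
image = [
    [K, K, K, K, K, K],
    [K, M, M, M, M, K],
    [K, M, M, M, M, K],
    [K, K, K, K, K, K],
]
print(screen_size_from_image(image, M, distance_m=2.0, focal_length_px=4.0))
```

A real capture would need a colour tolerance rather than exact equality, and a perspective correction when the screen is viewed at an angle.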
- the size of the display device 2 and the relative positions of the terminal device 1 and the display device 2 are detected, and the background image vB rendered based on the relative position information is displayed on the display device 2. Then, the terminal device 1 takes an image of the object 10 including the displayed background image vB.
- the terminal device 1 displays the captured image vC on the screen 1a as shown in FIG. 11A, for example.
- the captured image vC is an image including the background image vB and the object 10 .
- a user can view an image on the screen 1 a of the terminal device 1 as a monitor for the shooting while taking a picture with the terminal device 1 .
- the image quality may be changed by applying filter processing to the entire image.
- For example, an anime-like filter or the like can be applied as the filter processing.
- Shooting may be performed with an angle of view wider than the one assumed to be finally used, so that the angle of view can be slightly changed at the time of later editing. By doing so, a wide range is photographed, which may be advantageous for recognizing the AR marker 3 of the display device 2 and for environment recognition for SLAM.
- the background image vB corresponding to that range may be drawn at a lower resolution.
- the invalid area frame 16 is displayed in a display mode such as hatching or graying to present the user with the range that will be used as the video content being produced. By doing so, the user can produce video content in which the subject from the desired distance, direction, and angle is appropriately framed in the shooting by the terminal device 1 .
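The range presented to the user can be computed as a rectangle inside the captured frame. The sketch below assumes, purely for illustration, that the range used as video content is a centred crop of the captured frame with the same aspect ratio; the function name `valid_area` and the parameter `used_fraction` are hypothetical.

```python
def valid_area(frame_w, frame_h, used_fraction):
    """Return (x0, y0, width, height) of the centred area kept as content.

    used_fraction: linear fraction (0 < f <= 1) of the captured frame that is
                   expected to remain in the final video.
    Everything outside the returned rectangle would be drawn hatched or grey.
    """
    if not 0 < used_fraction <= 1:
        raise ValueError("used_fraction must be in (0, 1]")
    width = round(frame_w * used_fraction)
    height = round(frame_h * used_fraction)
    x0 = (frame_w - width) // 2
    y0 = (frame_h - height) // 2
    return x0, y0, width, height


print(valid_area(1920, 1080, 0.8))
```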
- each configuration example is referred to as a first embodiment to a sixth embodiment.
- Each figure does not show the detection and transmission/reception of the display size information of the display device 2. Since the display size information does not change between the start and end of shooting, it is assumed to be detected once at the beginning by some method and acquired by the device that performs rendering.
- the terminal device 1 and the display device 2 are associated as targets for relative position detection.
- the relative position information RP is the relative position between the terminal device 1 and the display device 2 in all examples.
- The communication between the terminal device 1 and the display device 2, the communication between the terminal device 1 and the cloud server 4, and the communication between the display device 2 and the cloud server 4 may be either wired or wireless. Also, direct communication between devices or network communication may be used.
- FIG. 12 shows a configuration example of the first embodiment. It is composed of a terminal device 1 and a display device 2 .
- the terminal device 1 has a 3D background model 5 .
- the terminal device 1 performs relative position detection, and renders the background image vB from the 3D background model 5 based on the relative position information.
- the terminal device 1 transmits the background image vB to the display device 2 .
- the display device 2 displays the background image vB.
- FIG. 13 shows a configuration example of the second embodiment. It is composed of a terminal device 1 and a display device 2 .
- the display device 2 comprises a 3D background model 5 .
- the terminal device 1 performs relative position detection and obtains relative position information RP.
- the terminal device 1 transmits relative position information RP to the display device 2 .
- the display device 2 renders the background image vB from the 3D background model 5 and displays the background image vB based on the relative position information.
- FIG. 14 shows a configuration example of the third embodiment. It is composed of a terminal device 1 , a display device 2 and a cloud server 4 .
- a cloud server 4 comprises a 3D background model 5 .
- the terminal device 1 performs relative position detection and transmits relative position information RP to the cloud server 4 .
- the cloud server 4 renders the background image vB from the 3D background model 5 based on the relative position information RP.
- the cloud server 4 transmits the background video vB to the terminal device 1 .
- the terminal device 1 transmits the background image vB received from the cloud server 4 to the display device 2 .
- the display device 2 displays the background image vB.
- FIG. 15 shows a configuration example of the fourth embodiment. It is composed of a terminal device 1 , a display device 2 and a cloud server 4 .
- a cloud server 4 comprises a 3D background model 5 .
- the terminal device 1 performs relative position detection and transmits relative position information RP to the cloud server 4 .
- the cloud server 4 renders the background image vB from the 3D background model 5 based on the relative position information RP.
- the cloud server 4 transmits the background image vB to the display device 2 .
- the display device 2 displays the background image vB.
- FIG. 16 shows a configuration example of the fifth embodiment. It is composed of a terminal device 1 , a display device 2 and a cloud server 4 .
- a cloud server 4 comprises a 3D background model 5 .
- the display device 2 performs relative position detection and transmits relative position information RP to the cloud server 4 .
- the cloud server 4 renders the background image vB from the 3D background model 5 based on the relative position information RP.
- the cloud server 4 transmits the background image vB to the display device 2 .
- the display device 2 displays the background image vB.
- FIG. 17 shows a configuration example of the sixth embodiment. It is composed of a terminal device 1 , a display device 2 and a cloud server 4 .
- a cloud server 4 comprises a 3D background model 5 .
- the terminal device 1 performs relative position detection and transmits relative position information RP to the display device 2 .
- the display device 2 transmits the relative position information RP received from the terminal device 1 to the cloud server 4 .
- the cloud server 4 renders the background image vB from the 3D background model 5 based on the relative position information RP.
- the cloud server 4 transmits the background image vB to the display device 2 .
- the display device 2 displays the background image vB.
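The six configuration examples differ only in which device detects the relative position information RP, which device holds the 3D background model 5 and renders, and how RP and the background image vB travel between devices. The following table-like sketch summarizes them. The class name, field names, and device labels are chosen for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    figure: str        # figure showing the configuration
    detector: str      # device that detects relative position information RP
    renderer: str      # device that holds 3D background model 5 and renders vB
    rp_route: tuple    # devices RP passes through, detector first
    vb_route: tuple    # devices vB passes through, renderer first


CONFIGURATIONS = {
    1: Configuration("FIG. 12", "terminal", "terminal",
                     ("terminal",), ("terminal", "display")),
    2: Configuration("FIG. 13", "terminal", "display",
                     ("terminal", "display"), ("display",)),
    3: Configuration("FIG. 14", "terminal", "cloud",
                     ("terminal", "cloud"), ("cloud", "terminal", "display")),
    4: Configuration("FIG. 15", "terminal", "cloud",
                     ("terminal", "cloud"), ("cloud", "display")),
    5: Configuration("FIG. 16", "display", "cloud",
                     ("display", "cloud"), ("cloud", "display")),
    6: Configuration("FIG. 17", "terminal", "cloud",
                     ("terminal", "display", "cloud"), ("cloud", "display")),
}

for number, cfg in CONFIGURATIONS.items():
    print(number, cfg.figure, "RP:", " -> ".join(cfg.rp_route),
          "| vB:", " -> ".join(cfg.vb_route))
```

In every row, RP starts at the detecting device and ends at the rendering device, and vB starts at the rendering device and ends at the display device 2.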
- FIG. 18 shows the processing executed by the devices in the system in the configurations of the first to sixth embodiments. Here, it is described as a processing procedure for the entire system.
- the display size is detected by one of the devices in the system in step ST51. That is, information about the size of the screen 2a of the display device 2 is detected.
- the display size information is acquired by the device that renders the background image vB.
- In step ST52, it is determined whether the photographing is finished. For example, it is determined that the photographing is finished when the user performs an operation to end the photographing on the terminal device 1.
- When the photographing is finished, each device ends the processing of FIG. 18.
- the processing from step ST53 to step ST56 is repeated for each frame timing of the background video vB and the shot video vC until it is determined that the shooting has ended. Note that the frame timing of the background image vB and the frame timing of the shot image vC are kept in a synchronous relationship.
- In step ST53, relative position detection is performed by one of the devices (terminal device 1 or display device 2) in the system.
- the detected relative position information RP is acquired by the rendering device.
- In step ST54, a process of rendering the background image vB from the 3D background model 5 based on the relative position information RP is performed by one of the devices in the system.
- In step ST55, the background image vB obtained by rendering is displayed on the display device 2.
- In step ST56, the terminal device 1 captures the background video vB on the display device 2 and the object 10 while displaying the captured video vC on the screen 1a.
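The system-level procedure of FIG. 18 can be sketched as a single loop. This is a minimal illustration, not the implementation of any device: each step is passed in as a hypothetical callable so that the same loop applies regardless of which device performs it, and the rendering step between ST53 and ST55 is taken here to be ST54.

```python
def run_virtual_production(detect_display_size, shooting_finished,
                           detect_relative_position, render_background,
                           show_on_display, capture_frame):
    """Run the per-frame loop and return the captured frames."""
    display_size = detect_display_size()              # ST51: once, up front
    captured = []
    while not shooting_finished():                    # ST52
        rp = detect_relative_position()               # ST53
        vb = render_background(display_size, rp)      # ST54 (assumed label)
        show_on_display(vb)                           # ST55
        captured.append(capture_frame())              # ST56
    return captured


# Stand-in callables for a three-frame run.
shown = []
positions = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (0.2, 0.0, 2.0)]
pending = list(positions)

frames = run_virtual_production(
    detect_display_size=lambda: (1.2, 0.7),
    shooting_finished=lambda: not pending,
    detect_relative_position=lambda: pending.pop(0),
    render_background=lambda size, rp: {"size": size, "rp": rp},
    show_on_display=shown.append,
    capture_frame=lambda: {"background": shown[-1]},
)
print(len(frames), "frames captured")
```

Running every step once per iteration mirrors the statement that the frame timings of the background video vB and the shot video vC are kept in a synchronous relationship.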
- FIG. 19 shows functional configurations of the terminal device 1 and the display device 2 in the first embodiment shown in FIG. 12. It should be noted that the functional configuration described in each embodiment is realized by the hardware configuration as the information processing device 70 in FIG. 8.
- The terminal device 1 has a display size detection unit 31, a relative position detection unit 32, a 3D model management unit 33, a background layer rendering unit 34, a communication control unit 35, an imaging unit 38, and a display control unit 39.
- the display size detection unit 31 is a function that performs display size detection processing for the display device 2 associated as a target for relative position detection. As described above, there are methods for detecting the display size according to user input and automatic detection methods. Therefore, display size detection can be executed by the CPU 71 and the video processing unit 85 via a user interface using the input unit 76 and the display unit 77 in the information processing device 70 . Information received by the communication unit 80 or information read from the DB stored in the storage unit 79 may be used as the size information searched based on the model number. Further, the CPU 71 and the image processing section 85 can automatically detect the display size using information from the camera section 87 and the sensor section 86 .
- The relative position detection unit 32 is a function that performs processing for detecting relative position information RP between the associated terminal device 1 and display device 2. It is realized by the CPU 71 or the video processing unit 85 using information from the camera unit 87, the sensor unit 86, and the communication unit 80 in the information processing device 70.
- the 3D model management unit 33 is a function that manages the 3D background model 5 for generating the background video vB.
- the 3D background model 5 created in the asset creation process is stored in the storage unit 79 or the like, managed, and read out at the time of rendering.
- the 3D model management unit 33 is realized by processing of the image processing unit 85 in the information processing device 70, for example.
- the background layer rendering unit 34 has a function of rendering the background image vB, and is realized by the processing of the image processing unit 85 and the CPU 71 in the information processing device 70 .
- the communication control unit 35 is a function of transmitting and receiving information with other devices in the terminal device 1 . It is a control function of communication via the communication section 80 in the information processing device 70 as the terminal device 1 , and is realized by the image processing section 85 and the CPU 71 .
- the imaging unit 38 has a function of capturing images as moving images and still images, and is realized by the camera unit 87 in the information processing device 70 .
- The display control unit 39 has a function of controlling display of an image on the screen 1a of the terminal device 1, and is realized as a control function of the display unit 77 by the image processing unit 85 and the CPU 71 in the information processing device 70 as the terminal device 1.
- the display device 2 includes a communication control section 36 and a display control section 37.
- the communication control unit 36 is a function of transmitting and receiving information with other devices in the display device 2 . It is a control function of communication via the communication section 80 in the information processing device 70 as the display device 2 , and is realized by the image processing section 85 and the CPU 71 .
- The display control unit 37 has a function of controlling display of an image on the screen 2a of the display device 2, and is implemented as a control function of the display unit 77 by the image processing unit 85 and the CPU 71 in the information processing device 70 as the display device 2.
- the process shown in FIG. 20 is performed by the terminal device 1 and the display device 2, so that the processing operation shown in FIG. 18 is performed as the entire system.
- Although the shooting by the terminal device 1 is not shown in the flowchart, basically, when the user performs an operation to enter a recording standby state in a shooting mode for virtual production, capturing of a moving image (acquisition of image data by the image sensor) is started, and the display of the photographed image vC on the screen 1a as a through image is started. Then, in response to a recording start operation, the captured image vC is recorded on the recording medium as the video content.
- In response to a recording stop operation, the recording of the video content to the recording medium is stopped, and the recording standby state is entered again. Then, the shooting as a virtual production is terminated by a predetermined end operation, and the display of the captured image vC on the screen 1a is also terminated.
- the flowchart of each embodiment shows processing for each frame timing from the start to the end of shooting as a virtual production.
- the terminal device 1 detects the display size of the display device 2 by the display size detection unit 31 in step S101.
- The terminal device 1 determines in step S102 whether the virtual production shooting has ended, and while it has not ended, steps S103 to S106 are repeated for each frame timing of the shot video vC.
- In step S103, the terminal device 1 performs relative position detection using the relative position detection unit 32.
- In step S104, the terminal device 1 causes the background layer rendering section 34 to render the 3D background model 5 read from the 3D model management section 33 into the off-screen buffer based on the display size information and the relative position information RP. That is, the background image vB is generated.
- the off-screen buffer is a non-display screen, and is a temporary buffer area for rendering video prepared in the RAM 73 or the like.
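The role of the off-screen buffer can be pictured with a minimal sketch: a block of memory that is drawn into but never shown directly, whose contents are then handed over for transmission. The class below is an illustration only. The RGB byte layout and all names are assumptions, and the "rendering" is replaced by a plain colour fill.

```python
class OffscreenBuffer:
    """A non-displayed RGB frame buffer held in ordinary memory."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.pixels = bytearray(width * height * 3)  # 3 bytes per pixel

    def fill(self, rgb):
        """Stand-in for rendering: paint every pixel with one colour."""
        self.pixels[:] = bytes(rgb) * (self.width * self.height)

    def to_frame(self):
        """Return an immutable copy suitable for handing to a sender."""
        return bytes(self.pixels)


buffer = OffscreenBuffer(4, 2)
buffer.fill((10, 20, 30))
frame = buffer.to_frame()
print(len(frame), "bytes ready to send")
```

Copying the buffer before sending lets the next frame be rendered into the same memory while the previous one is still in transit.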
- In step S105, the terminal device 1 uses the communication control unit 35 to transmit the background image vB in the off-screen buffer to the display device 2.
- On the display device 2 side, the processes of steps S202 and S203 are repeated for each frame, while the end is judged in step S201, until the end is reached.
- the end determination on the display device 2 side can be made, for example, by stopping reception of frames of the background video vB for a predetermined time or longer.
- the terminal device 1 may transmit a termination instruction signal at the time of termination, and the display device 2 may determine termination by receiving the signal.
- the display device 2 ends the display processing of the background image vB of the virtual production by the end determination.
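The two ways of determining the end on the display device 2 side, a reception timeout and an explicit termination signal, can be sketched together. This is a minimal illustration under assumptions: frames are taken to arrive through a `queue.Queue`, the sentinel `END` stands in for the termination instruction signal, and the function name and timeout value are hypothetical.

```python
import queue

END = object()  # hypothetical stand-in for the termination instruction signal


def display_loop(incoming, show, timeout_s=1.0):
    """Show received frames until an end signal arrives or frames stop."""
    while True:
        try:
            item = incoming.get(timeout=timeout_s)   # S202: receive a frame
        except queue.Empty:
            return "timeout"          # S201: nothing received for timeout_s
        if item is END:
            return "end-signal"       # S201: explicit termination
        show(item)                    # S203: display the frame


# Two frames followed by an explicit end signal.
incoming = queue.Queue()
for item in ("vB frame 0", "vB frame 1", END):
    incoming.put(item)

displayed = []
print(display_loop(incoming, displayed.append), displayed)
```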
- the display device 2 receives the background image vB from the terminal device 1 by means of the communication control unit 36 in step S202.
- In step S203, the display device 2 causes the display control unit 37 to perform processing for displaying the received background image vB on the screen 2a.
- the background video vB generated by the terminal device 1 is transmitted to and displayed on the display device 2 frame by frame.
- On the terminal device 1 side, the display device 2 and the object 10 are captured by the imaging unit 38.
- In step S106, the display control unit 39 performs processing for displaying the captured image vC of each frame obtained by the capturing on the screen 1a.
- FIG. 21 shows functional configurations of the terminal device 1 and the display device 2 in the second embodiment shown in FIG. 13. Note that, in each of the following embodiments, the same reference numerals are given to the functional configurations already described, and redundant detailed descriptions thereof will be omitted. See the description of FIG. 11 above.
- the terminal device 1 has a relative position detection section 32 , a communication control section 35 , an imaging section 38 and a display control section 39 .
- the display device 2 has a display size detection unit 31 , a 3D model management unit 33 , a background layer rendering unit 34 , a communication control unit 36 and a display control unit 37 .
- the process shown in FIG. 22 is performed by the terminal device 1 and the display device 2, so that the processing operation shown in FIG. 18 is performed as the entire system.
- the same step numbers are attached to the processes that have already been explained.
- The terminal device 1 judges the end of the virtual production shooting in step S102. While the shooting has not ended, the processing of steps S103, S110, and S106 is repeated.
- In step S103, the terminal device 1 performs relative position detection using the relative position detection unit 32.
- In step S110, the terminal device 1 performs processing for transmitting the relative position information RP to the display device 2 using the communication control unit 35.
- In step S106, the terminal device 1 causes the display control unit 39 to perform processing for displaying the photographed video vC of each frame obtained by photographing by the imaging unit 38 on the screen 1a.
- the display size of the display device 2 is detected by the display size detection unit 31 in step S210 when virtual production shooting is started.
- Since the size in question is that of the display device's own screen 2a, the display size detection unit 31 may be formed as a storage unit that stores the size information of the screen 2a.
- step S210 may be a process for the CPU 71 in the display device 2 to read out the stored display size.
- the display device 2 repeats the processing of steps S211, S212, and S203 for each frame while judging the end in step S201 until the end is reached.
- In step S211, the display device 2 receives the relative position information RP from the terminal device 1 through the communication control unit 36.
- In step S212, the background layer rendering unit 34 of the display device 2 renders the 3D background model 5 read from the 3D model management unit 33 based on the display size information and the received relative position information RP to generate the background image vB.
- In step S203, the display device 2 causes the display control unit 37 to perform processing for displaying the generated background image vB on the screen 2a.
- the background image vB rendered by the display device 2 based on the relative position information RP detected by the terminal device 1 is displayed.
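When only the relative position information RP is sent between devices, each frame's payload is small. The sketch below shows one possible encoding, offered as an assumption and not as the format used by the embodiment: the fields of `RelativePosition` (a frame number, a position in metres, and a rotation in degrees) and the JSON encoding are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RelativePosition:
    frame: int        # frame number the measurement belongs to
    position: tuple   # (x, y, z) of the terminal relative to the display, metres
    rotation: tuple   # (pitch, yaw, roll) of the terminal, degrees


def encode_rp(rp):
    """Serialize RP for transmission (sender side)."""
    return json.dumps(asdict(rp)).encode("utf-8")


def decode_rp(payload):
    """Rebuild RP from a received payload (receiver side)."""
    data = json.loads(payload.decode("utf-8"))
    return RelativePosition(
        frame=data["frame"],
        position=tuple(data["position"]),
        rotation=tuple(data["rotation"]),
    )


sent = RelativePosition(frame=1, position=(0.1, 0.0, 2.0), rotation=(0.0, 15.0, 0.0))
payload = encode_rp(sent)
print(len(payload), "bytes:", decode_rp(payload))
```

Carrying a frame number with each measurement is one way to help the renderer keep the background image vB aligned with the frame timing of the shot video vC.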
- FIG. 23 shows functional configurations of the terminal device 1, the display device 2, and the cloud server 4 in the third embodiment shown in FIG. 14.
- the terminal device 1 has a display size detection section 31 , a relative position detection section 32 , a communication control section 35 , an imaging section 38 and a display control section 39 .
- the display device 2 has a communication control section 36 and a display control section 37 .
- the cloud server 4 has a 3D model management section 33 , a background layer rendering section 34 and a communication control section 40 .
- the communication control unit 40 is a function that transmits and receives information to and from other devices in the cloud server 4. It is a control function of communication via the communication unit 80 in the information processing device 70 as the cloud server 4 and is realized by the video processing unit 85 and the CPU 71 .
- The processing shown in FIG. 24 is performed by the terminal device 1, the cloud server 4, and the display device 2, so that the processing operation shown in FIG. 18 is executed as the entire system.
- When virtual production shooting is started by user operation or automatic start control, the terminal device 1 detects the display size of the display device 2 by the display size detection unit 31 in step S120. The terminal device 1 then transmits the display size information to the cloud server 4.
- the cloud server 4 receives the display size information in step S301 and stores it for subsequent rendering.
- the terminal device 1 determines the end of the virtual production shooting in step S102, and repeats the processing of steps S121, S122, S105, and S106 for each frame timing of the shot video vC during the period when the virtual production shooting is not finished.
- In step S121, the terminal device 1 performs relative position detection by the relative position detection unit 32, and performs processing for transmitting the detected relative position information RP to the cloud server 4 by the communication control unit 35.
- After receiving the display size information in step S301, the cloud server 4 repeats the processing of steps S303, S304, and S305 while determining the end in step S302.
- The end determination on the cloud server 4 side can be made, for example, when reception of the relative position information RP from the terminal device 1 has stopped for a predetermined time or longer, or when network communication with the terminal device 1 has been disconnected.
- Alternatively, the terminal device 1 may transmit a termination instruction signal when it ends, and the cloud server 4 may determine the end by receiving that signal. The cloud server 4 ends its processing upon determining the end.
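The three end conditions described above (no relative position information RP for a predetermined time, a dropped connection, or an explicit termination signal) can be sketched as a single check on the server side. This is only a minimal illustration: the `SessionState` type, its field names, and the 2-second default timeout are assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session state kept by the cloud server."""
    last_rp_time: float       # time (seconds) at which RP was last received
    connected: bool           # whether the link to the terminal device is alive
    end_signal: bool = False  # whether a termination instruction signal arrived


def should_terminate(state: SessionState, now: float, timeout_s: float = 2.0) -> bool:
    """Return True if any of the end conditions described in the text holds."""
    if state.end_signal:      # explicit termination instruction signal
        return True
    if not state.connected:   # network communication was disconnected
        return True
    # RP reception has stopped for the predetermined time or longer
    return (now - state.last_rp_time) >= timeout_s
```

A server loop would call `should_terminate` once per iteration of step S302 before receiving the next RP in step S303.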
- Until the end is determined, the cloud server 4 receives the relative position information RP via the communication control unit 40 in step S303.
- In step S304, the cloud server 4 uses the background layer rendering unit 34 to render the 3D background model 5 read from the 3D model management unit 33, based on the display size information and the received relative position information RP, and thereby generates the background image vB.
- In step S305, the cloud server 4 transmits the background image vB to the terminal device 1 via the communication control unit 40.
- After receiving the background image vB in step S122, the terminal device 1 transmits, in step S105, the received background image vB to the display device 2 via the communication control unit 35. Further, in step S106, the terminal device 1 causes the display control unit 39 to display the photographed image vC of each frame, obtained by the imaging unit 38, on the screen 1a.
- the display device 2 performs steps S201, S202, and S203 in the same manner as in the first embodiment (FIG. 20). As a result, the display device 2 performs an operation of displaying the received background image vB.
- FIG. 25 shows functional configurations of the terminal device 1, the display device 2, and the cloud server 4 in the fourth embodiment shown in FIG.
- the functions provided by the terminal device 1, the cloud server 4, and the display device 2 are the same as in FIG. However, the communication control unit 40 of the cloud server 4 maintains the communication connection with both the terminal device 1 and the display device 2 during execution of virtual production shooting.
- the terminal device 1, the cloud server 4, and the display device 2 perform the processing shown in FIG. executed as a whole.
- the processes of steps S120, S102, S121, and S106 are performed in the same manner as in FIG.
- the terminal device 1 in this case does not need the process of receiving the background image vB from the cloud server 4 and transferring it to the display device 2, as described with reference to FIG.
- The cloud server 4 performs steps S301, S302, S303, S304, and S305 as shown in FIG. The processing is similar to that of FIG. 24, except that the background image vB is transmitted to the display device 2 in step S305.
- the display device 2 performs steps S201, S202, and S203 as shown in FIG. As a result, the display device 2 performs an operation of displaying the background image vB received from the cloud server 4 .
- FIG. 27 shows functional configurations of the terminal device 1, the display device 2, and the cloud server 4 in the fifth embodiment shown in FIG.
- The terminal device 1 has an imaging unit 38 and a display control unit 39.
- The display device 2 has a display size detection unit 31, a relative position detection unit 32, a communication control unit 36, and a display control unit 37.
- The cloud server 4 has a 3D model management unit 33, a background layer rendering unit 34, and a communication control unit 40.
- The terminal device 1, the cloud server 4, and the display device 2 perform the processing shown in FIG. 28, so that the processing operation of FIG. is executed.
- The terminal device 1 performs display processing of the captured image vC in step S106 for each frame until the end is determined in step S102.
- The display device 2 reads its own display size information with the display size detection unit 31 in step S220, and transmits the display size information to the cloud server 4.
- the cloud server 4 receives the display size information in step S301 and stores it for subsequent rendering.
- The display device 2 determines in step S201 whether the virtual production shooting has ended and, while it has not ended, repeats the processing of steps S221, S202, and S203 at each frame timing of the background video vB.
- In step S221, the display device 2 performs relative position detection with the relative position detection unit 32, and transmits the detected relative position information RP to the cloud server 4 via the communication control unit 36.
- After receiving the display size information in step S301, the cloud server 4 repeats the processing of steps S303, S304, and S305 while determining the end in step S302, until the end is determined.
- In step S303, the cloud server 4 receives the relative position information RP from the display device 2 via the communication control unit 40.
- In step S304, the cloud server 4 uses the background layer rendering unit 34 to render the 3D background model 5 read from the 3D model management unit 33, based on the display size information and the received relative position information RP, and thereby generates the background image vB.
- In step S305, the cloud server 4 transmits the background image vB to the display device 2 via the communication control unit 40.
- When the display device 2 receives the background image vB in step S202, it performs display processing of the background image vB in step S203. As a result, the display device 2 displays the received background image vB.
- FIG. 29 shows functional configurations of the terminal device 1, the display device 2, and the cloud server 4 in the sixth embodiment shown in FIG.
- The terminal device 1 has a relative position detection unit 32, a communication control unit 35, an imaging unit 38, and a display control unit 39.
- The display device 2 has a display size detection unit 31, a communication control unit 36, and a display control unit 37.
- The cloud server 4 has a 3D model management unit 33, a background layer rendering unit 34, and a communication control unit 40.
- the terminal device 1, the cloud server 4, and the display device 2 perform the processing shown in FIG. executed.
- The terminal device 1 performs relative position detection, shooting, and display of the shot video vC. Accordingly, at each frame timing until the end is determined in step S102, it performs relative position detection and transmits the relative position information RP to the display device 2 in step S130, and performs display processing of the captured image vC in step S106.
- The display device 2 reads its own display size information with the display size detection unit 31 and transmits it to the cloud server 4 in step S220.
- The cloud server 4 accordingly receives the display size information in step S301 and stores it for subsequent rendering.
- The display device 2 determines in step S201 whether the virtual production shooting has ended and, while it has not ended, repeats the processing of steps S231, S232, S202, and S203 at each frame timing of the background video vB.
- In step S231, the display device 2 receives, via the communication control unit 36, the relative position information RP transmitted from the terminal device 1, and in step S232 transmits the relative position information RP to the cloud server 4.
- While determining the end in step S302, the cloud server 4 repeats the processing of receiving the relative position information RP in step S303, rendering the background image vB in step S304, and transmitting the background image vB to the display device 2 in step S305, in the same manner as in FIG.
- When the display device 2 receives the background image vB in step S202, it performs display processing of the background image vB in step S203. As a result, the display device 2 displays the received background image vB.
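The third to sixth embodiments differ mainly in which device detects the display size and the relative position, and in the route that the relative position information RP and the background image vB take. The table below restates that as data, as a reading aid only. The dictionary layout, key names, and the `total_hops_per_frame` helper are illustrative assumptions, not part of the source.

```python
# Roles and per-frame data routes for the third to sixth embodiments,
# transcribed from the functional configurations and step descriptions above.
# Rendering is performed by the cloud server in all four cases.
EMBODIMENTS = {
    3: {"display_size": "terminal", "relative_position": "terminal",
        "rp_route": ["terminal", "cloud"],
        "vb_route": ["cloud", "terminal", "display"]},
    4: {"display_size": "terminal", "relative_position": "terminal",
        "rp_route": ["terminal", "cloud"],
        "vb_route": ["cloud", "display"]},
    5: {"display_size": "display", "relative_position": "display",
        "rp_route": ["display", "cloud"],
        "vb_route": ["cloud", "display"]},
    6: {"display_size": "display", "relative_position": "terminal",
        "rp_route": ["terminal", "display", "cloud"],
        "vb_route": ["cloud", "display"]},
}


def hops(route):
    """Number of device-to-device transfers along a route."""
    return len(route) - 1


def total_hops_per_frame(embodiment):
    """Transfers needed per frame for RP upload plus vB delivery."""
    cfg = EMBODIMENTS[embodiment]
    return hops(cfg["rp_route"]) + hops(cfg["vb_route"])
```

Counting transfers this way shows, for instance, that the fourth and fifth embodiments need one fewer transfer per frame than the third and sixth.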
- When shooting using the background image vB as a virtual image, the display device 2 is located behind the object 10, which is the foreground. Therefore, it is not possible to place the display device 2 in front of the object 10 to display a virtual image or to apply an effect to the image in front of the object 10. In other words, the virtual image is confined to the background side of the object 10.
- the “front” of the object 10 refers to the side of the terminal device 1 when viewed from the object 10 , that is, the side of the device that performs photography.
- FIG. 31 shows the positional relationship between the terminal device 1 and the display device 2, together with the object position 60.
- The object position 60 is the position where the object 10 actually exists. The terminal device 1 shoots the object 10 and the background image vB displayed on the display device 2.
- The front area 61 is the area in front of the object 10 in the image vC captured by the terminal device 1.
- The rear area 62 is the area behind the object 10.
- The other areas 63 and 64 are areas that are neither in front of nor behind the object 10.
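One way to picture the front area 61, the rear area 62, and the other areas 63 and 64 is a top-down view with the terminal device at the origin looking along the depth axis. The sketch below classifies a point under that simplified geometry; the coordinate convention, the function name, and the use of a single horizontal half-width for the object are all assumptions made for illustration, not the detection method of the source.

```python
def classify_area(px, pz, obj_x, obj_z, obj_half_width):
    """Classify a point (px, pz) as 'front', 'rear', or 'other'.

    Assumed top-down geometry: the camera is at the origin, z is depth
    (z > 0 in front of the camera), and the object spans
    [obj_x - obj_half_width, obj_x + obj_half_width] at depth obj_z.
    """
    if pz <= 0 or obj_z <= 0:
        raise ValueError("depths must be positive (in front of the camera)")
    # A point overlaps the object in the captured image when its viewing
    # direction x/z lies within the object's angular extent.
    lo = (obj_x - obj_half_width) / obj_z
    hi = (obj_x + obj_half_width) / obj_z
    direction = px / pz
    if lo <= direction <= hi:
        # Same line of sight: nearer than the object is "front",
        # otherwise it is treated as "rear".
        return "front" if pz < obj_z else "rear"
    return "other"
```

For example, with the object centered at depth 2 and half-width 0.5, a point straight ahead at depth 1 falls in the front area and a point at depth 3 falls in the rear area.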
- FIG. 32 shows an example of adding a ring-shaped additional virtual image 11 to the object 10.
- This image is a ring surrounding the object 10.
- Consider a background layer 50, a foreground 51, and an overlay layer 52 when adding such an additional virtual image 11.
- The foreground 51 is the image of the object 10 itself.
- The background layer 50 is the layer of the background image vB displayed on the display device 2.
- The photographed image vC includes the images of the background layer 50 and the foreground 51.
- Here, an overlay layer 52 is set in front of the foreground, and the ring-shaped additional virtual image 11 is drawn on this overlay layer 52.
- As a result, the captured image vC becomes an image to which the ring-shaped additional virtual image 11 is added.
- In virtual production shooting, a virtual image is normally included only as the background image vB behind the object 10, but by introducing the overlay layer 52, a virtual image can also be added in front of the object 10.
- The ring-shaped additional virtual image 11 could simply be drawn in its entirety on the overlay layer 52, but in this example the virtual video addition processing is performed as follows.
- the ring-shaped additional virtual image 11 is an image positioned over a front area 61 , a rear area 62 , and other areas 63 and 64 .
- the portion belonging to the front area 61 in the additional virtual image 11 is rendered on the overlay layer 52 .
- a portion belonging to the rear area 62 in the additional virtual image 11 is added to the background image vB.
- Portions belonging to the other areas 63 and 64 in the additional virtual image 11 may be drawn on the overlay layer 52, but are preferably added to the background image vB.
- the portion positioned in the front area 61 in the additional virtual image 11 needs to appear in front of the object 10 in the captured image vC, so the overlay layer 52 is used.
- That is, this portion of the additional virtual image 11 is rendered as the image of the overlay layer 52, and the rendered overlay layer 52 is synthesized with the captured image vC.
- a portion located in the rear area 62 in the additional virtual image 11 is actually hidden behind the object 10 in the captured image vC. In that sense, it is conceivable that the portion belonging to the rear region 62 in the additional virtual image 11 is not rendered. However, it is preferable that this part is added to the background layer 50 in consideration of the reflection on the object 10 . For example, it is for realization of natural reflection on a glossy surface of the object 10 .
- Therefore, the image of that portion of the additional virtual image 11 is also added when the background image vB is rendered using the 3D background model.
- The portions of the additional virtual image 11 located in the other areas 63 and 64 do not overlap the object 10 in the captured image vC. Therefore, they may be drawn on the overlay layer 52 in the same way as the front area 61. However, considering the effect of natural reflection on the object 10, it is preferable to add them to the background layer 50 when rendering the background image vB.
- In this way, the additional virtual image 11 can be added both in front of and behind the object 10 serving as the foreground 51, using the background layer 50 and the overlay layer 52.
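The assignment rule described above (front area to the overlay layer, rear area to the background layer, other areas preferably to the background layer) can be written down compactly. The function and parameter names below are hypothetical; only the rule itself comes from the text.

```python
def layer_for(area: str, prefer_reflection: bool = True) -> str:
    """Choose the layer on which a portion of the additional virtual image is drawn.

    area: 'front', 'rear', or 'other' relative to the object.
    prefer_reflection: if True, portions in the other areas go to the
    background layer so that they can reflect naturally on the object.
    """
    if area == "front":
        return "overlay"      # must appear in front of the object
    if area == "rear":
        return "background"   # hidden by the object, but kept for reflections
    if area == "other":
        return "background" if prefer_reflection else "overlay"
    raise ValueError(f"unknown area: {area!r}")


def split_effect(portions):
    """Group (name, area) portions of an effect by target layer."""
    layers = {"overlay": [], "background": []}
    for name, area in portions:
        layers[layer_for(area)].append(name)
    return layers
```

For the ring example, `split_effect([("ring_front", "front"), ("ring_back", "rear"), ("ring_side", "other")])` places only the front arc on the overlay layer.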
- FIG. 33 shows an example in which an additional virtual image 11a of characters is added to the background image vB and the photographed image vC of the object 10 .
- This is an example in which an additional virtual image 11a of characters is drawn on the overlay layer 52 and synthesized.
- FIG. 34 shows an example in which heart-shaped and star-shaped additional virtual images 11a and 11b are added to the background image vB and the photographed image vC of the object 10.
- In this example, as seen from the position of the person who is the object 10, the additional virtual image 11a in the front area 61 is drawn on the overlay layer 52, while the additional virtual image 11b in the rear area 62 and the other areas 63 and 64 is drawn on the background layer 50, that is, included in the background video vB.
- In this way, the additional virtual image 11 may be generated as an effect and applied to the background layer 50 and also to the overlay layer 52.
- The position of the additional virtual image 11 (11a, 11b) is set according to the position of the body of the person (object 10) in the image. In FIG. 33, the additional virtual image 11a overlaps the body of the object 10.
- FIG. 34 shows an example in which the additional virtual image 11a is positioned on the face (cheek).
- FIG. 35 shows an example in which the additional virtual image 11b is added to the background layer 50 by the user touching the screen 1a of the terminal device 1 while shooting.
- a lightning-like additional virtual image 11b is added to the background image vB of the background layer 50 from the position specified by the touch at the timing of the touch of the user's finger 65 .
- Each step in FIG. 36 is processing executed by one of the devices in a system configuration including the terminal device 1, the display device 2, or the cloud server 4, as described in the first to sixth embodiments. In other words, as in FIG. 18, it is described as a processing procedure for the entire system.
- In step ST11, the display size is detected by one of the devices in the system. That is, the size information of the screen 2a of the display device 2 is detected. The size information is acquired by the device that performs rendering.
- In step ST12, it is determined whether the shooting has ended. For example, the end is determined when the user performs an operation on the terminal device 1 to end the shooting. When the end is determined, each device ends the processing of FIG. Until the end is determined, the processing from step ST13 to step ST23 is repeated at each frame timing of the background video vB and the shot video vC.
- In step ST13, relative position detection is performed by one of the devices in the system (the terminal device 1 or the display device 2).
- The detected relative position information RP is acquired by the device that performs rendering.
- In step ST14, for example, the terminal device 1 performs area detection. This is processing that detects the front area 61, the rear area 62, and the other areas 63 and 64 in the captured image vC of the current frame according to the position of the object 10 within the captured image vC.
- In step ST15, one of the devices in the system determines whether the frame of the background video vB to be rendered this time is a frame to which the additional virtual video 11 is applied.
- The effect start timing for applying the additional virtual image 11 is designated by, for example, a user's touch operation.
- The effect start timing for applying the additional virtual image 11 may also be designated by a predetermined user operation other than a touch operation.
- Alternatively, automatic processing may be performed such that the effect of applying the additional virtual image 11 is activated when a specific subject is detected by image recognition processing. For example, when a smile is detected, a predetermined additional virtual image 11 is added.
- Processing may also be performed such that the effect of the additional virtual video 11 is activated at a preset time.
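The four ways of starting the effect listed above (touch operation, another user operation, detection of a specific subject such as a smile, and a preset time) can be combined into one per-frame check. The sketch below is a minimal illustration; the function name, the argument names, and the convention that the preset-time trigger fires once the current time reaches the preset time are assumptions.

```python
from typing import FrozenSet, Optional, Set


def effect_should_start(
    touch: bool,
    other_operation: bool,
    detected_subjects: Set[str],
    now_s: float,
    preset_time_s: Optional[float] = None,
    trigger_subjects: FrozenSet[str] = frozenset({"smile"}),
) -> bool:
    """Return True if any effect start condition holds for the current frame."""
    if touch or other_operation:              # user-designated timing
        return True
    if detected_subjects & trigger_subjects:  # image recognition trigger
        return True
    # Preset-time trigger (assumed to fire at or after the preset time).
    return preset_time_s is not None and now_s >= preset_time_s
```

The result of such a check corresponds to the branch taken in step ST15.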
- That is, in step ST15, it is determined whether the current timing is a timing for adding the additional virtual image 11 to one or both of the background layer 50 and the overlay layer 52.
- If it is not such a timing, the process proceeds from step ST15 to step ST17.
- In this case, in step ST17, the 3D background model is used to render the background image vB, for example, in the same manner as in the first embodiment. That is, using the 3D background model, rendering is performed based on the display size information and the relative position information RP to generate the background image vB.
- If it is such a timing, in step ST16, the application setting of the additional virtual video 11 is made in one of the devices in the system. Specifically, one or both of the application setting of the additional virtual image 11 to the background layer 50 and the application setting of the additional virtual image 11 to the overlay layer 52 are made.
- When the background layer 50 is used, the setting for applying the additional virtual image 11 to the background layer 50 is made in step ST16. That is, the setting is made so that, when the background image vB is rendered using the 3D background model, the additional virtual image 11 is added to generate the background image vB.
- The on-screen position of the additional virtual image 11 within the background image vB is also set.
- For example, the on-screen position of the additional virtual image 11 is specified according to the touch position, or according to object detection or recognition results such as finger bone recognition and body bone recognition.
- When the overlay layer 52 is used, the settings made in step ST16 are the content of the image serving as the additional virtual image 11 to be given to the overlay layer 52, the range of the additional virtual image 11 to be drawn according to the result of the area detection in step ST14, and the position in the screen specified according to the operation, image recognition, or the like.
- After the application setting in step ST16, one of the devices in the system renders the background video vB using the 3D background model in step ST17.
- In this case, the additional virtual image 11 is added to the background image vB. That is, using the 3D background model, rendering is performed based on the display size information and the relative position information RP, and the background image vB to which the additional virtual image 11 is added according to the settings in step ST16 is generated.
- In step ST18, the display device 2 performs display processing of the background image vB obtained by rendering.
- The terminal device 1 shoots the background video vB on the display device 2 and the object 10 while displaying the captured video vC on the screen 1a, and the additional virtual image 11 using the overlay layer 52 may be added at this time.
- In step ST19, the terminal device 1 determines whether the current frame of the shot video vC is a frame in which the additional virtual video 11 is drawn using the overlay layer 52. If the application setting of the additional virtual video 11 to the overlay layer 52 was not made in step ST16 for the current frame, the terminal device 1 proceeds from step ST19 to step ST23 and displays the current frame of the captured video vC on the screen 1a as it is. This is the case when the current frame falls in a period during which the effect of the additional virtual video 11 is not activated, or when, even during the activation period, all of the additional virtual video 11 is added to the background layer 50 and the overlay layer 52 is not used.
- When the setting made in step ST16 specifies that the additional virtual video 11 is drawn using the overlay layer 52, the terminal device 1 proceeds from step ST19 to step ST20 and renders the overlay layer 52. That is, rendering is performed using the display size information, the relative position information RP, and the 3D model or character image applied as the additional virtual image 11 to generate the image of the overlay layer 52.
- In step ST21, the terminal device 1 synthesizes the overlay layer 52 with the shot video vC.
- In step ST22, the terminal device 1 can apply a filter to the entire captured image vC after synthesis. For example, by applying a painterly filter, an anime-like filter, or the like, filter processing can be performed as a kind of image effect.
- In step ST23, the terminal device 1 displays the captured image vC on the screen 1a.
- The user can thus view, in real time during shooting, the shot image vC to which the effect of the additional virtual image 11 has been added.
- Note that the filter processing in step ST22 may also be performed when the overlay layer 52 is not synthesized.
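Steps ST19 to ST23 amount to an optional overlay synthesis followed by an optional whole-frame filter before display. The source does not say how the overlay layer is synthesized with the captured image, so the sketch below assumes ordinary per-pixel alpha blending and uses a grayscale conversion as a stand-in for a painterly or anime-like filter. All names and the flat pixel-list representation are illustrative.

```python
from typing import Callable, List, Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, float]  # alpha in the range 0.0 to 1.0


def composite_over(captured: List[RGB], overlay: List[RGBA]) -> List[RGB]:
    """Alpha-blend overlay pixels onto captured pixels (assumed method)."""
    out = []
    for (cr, cg, cb), (r, g, b, a) in zip(captured, overlay):
        out.append((
            round(r * a + cr * (1 - a)),
            round(g * a + cg * (1 - a)),
            round(b * a + cb * (1 - a)),
        ))
    return out


def grayscale(frame: List[RGB]) -> List[RGB]:
    """Stand-in whole-frame filter: average the three channels."""
    return [(v, v, v) for v in (round((r + g + b) / 3) for r, g, b in frame)]


def process_frame(
    captured: List[RGB],
    overlay: Optional[List[RGBA]] = None,
    frame_filter: Optional[Callable[[List[RGB]], List[RGB]]] = None,
) -> List[RGB]:
    """Return the frame to display on the terminal screen."""
    frame = list(captured)
    if overlay is not None:        # ST19 yes -> ST20 render, ST21 synthesize
        frame = composite_over(frame, overlay)
    if frame_filter is not None:   # ST22, also allowed without an overlay
        frame = frame_filter(frame)
    return frame                   # ST23 display
```

Passing neither an overlay nor a filter returns the captured frame unchanged, which corresponds to going directly from step ST19 to step ST23.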
- Virtual production shooting proceeds as the system as a whole performs the processing of the above flow, and virtual image addition processing, such as adding an additional virtual image 11 in front of and behind the object 10, can also be performed.
- A specific functional configuration for executing such processing and a processing example of each device will be described.
- FIG. 37 is an example in which the terminal device 1 and the display device 2 are configured as in the first embodiment of FIG.
- the terminal device 1 has a display size detection unit 31, a relative position detection unit 32, a 3D model management unit 33, a background layer rendering unit 34, a communication control unit 35, an imaging unit 38, and a display control unit 39. These are the same as those in the example of FIG. 49.
- the region detection unit 44 detects the front region 61, the rear region 62, and the other regions 63 and 64 described with reference to FIG. 31 for each frame of the captured image vC.
- the area detection unit 44 follows the image as the object 10 by image recognition to determine the object position 60, and detects the front area 61, the rear area 62, and the other areas 63 and 64 based on it.
- If the object 10 is a non-moving object, the front region 61, the rear region 62, and the other regions 63 and 64 of each frame can also be detected based on the relative position information RP.
- the area detection unit 44 can be realized as a processing function of the CPU 71 and the image processing unit 85 using information from the camera unit 87, the sensor unit 86, and the communication unit 80 in the information processing device 70, for example.
- the input operation reception unit 45 has a function of detecting a user operation related to virtual image addition processing, such as a user's touch operation.
- a touch panel is provided on the screen 1a, and the input operation reception unit 45 detects a touch operation by this touch panel.
- the input operation reception unit 45 notifies the 3D model management unit 33 of operation information.
- the input operation reception unit 45 can be realized by the CPU 71 and the image processing unit 85 that detect an input by the input unit 76 in the information processing device 70 .
- the image recognition processing unit 46 performs recognition processing of the subject image in the captured video vC.
- the recognition processing result is notified to the 3D model management unit 33 .
- the image recognition processing unit 46 can be realized by the image processing unit 85 that analyzes the image captured by the camera unit 87 in the information processing device 70 .
- The 3D model management unit 33 can thereby determine the activation timing of the effect of adding the additional virtual image 11 and its position in the screen, and can set the content of the additional virtual video 11. That is, the 3D model management unit 33 can make the application setting of the effect of adding the additional virtual image 11 described for step ST16 of FIG.
- the overlay layer rendering unit 47 has a function of rendering the overlay layer 52, and is realized by processing of the video processing unit 85 and the CPU 71 in the information processing device 70.
- The overlay layer rendering unit 47 and the background layer rendering unit 34 are supplied with the display size information from the display size detection unit 31, the relative position information RP from the relative position detection unit 32, the 3D model from the 3D model management unit 33, and the detection information of the front area 61, the rear area 62, and the other areas 63 and 64 from the area detection unit 44. As a result, the background layer rendering unit 34 can perform the rendering in step ST17 of FIG. 36, and the overlay layer rendering unit 47 can perform the rendering in step ST20.
- The image synthesizing unit 48 synthesizes the image vC captured by the imaging unit 38 with the image of the overlay layer 52 rendered by the overlay layer rendering unit 47, and thereby adds the additional virtual image 11 to the front area of the object 10.
- the filter processing unit 49 performs filter processing as an effect on the video synthesized by the image synthesizing unit 48 .
- the image synthesizing unit 48 and the filter processing unit 49 are functions realized by the video processing unit 85 in the information processing device 70, for example.
- When shooting as a virtual production is started by user operation or automatic start control, the terminal device 1 detects the display size of the display device 2 with the display size detection unit 31 in step S101. In step S102, the terminal device 1 determines whether the virtual production shooting has ended and, while it has not ended, repeats steps S103 to S106 at each frame timing of the shot image vC.
- In step S103, the terminal device 1 performs relative position detection with the relative position detection unit 32.
- In step S150, the terminal device 1 detects the front area 61, the rear area 62, and the other areas 63 and 64 in the current frame of the captured image vC with the area detection unit 44.
- In step S151, the terminal device 1 determines whether it is the timing of a frame to which the additional virtual video 11 is applied. If so, in step S152, the additional virtual image 11 is set to be applied to one or both of the background layer 50 and the overlay layer 52.
- In step S153, the terminal device 1 causes the background layer rendering unit 34 to render the 3D background model 5 read from the 3D model management unit 33 into the off-screen buffer.
- If application to the background layer 50 was set in step S152, the background video vB is generated with the additional virtual video 11 added to the video of the 3D background model 5.
- In step S105, the terminal device 1 uses the communication control unit 35 to transmit the background image vB in the off-screen buffer to the display device 2.
- On the display device 2 side, steps S202 and S203 are repeated for each frame while the end is determined in step S201, until the end is determined.
- In step S202, the display device 2 receives the background image vB from the terminal device 1 via the communication control unit 36.
- In step S203, the display device 2 causes the display control unit 37 to display the received background image vB on the screen 2a. Therefore, when the background image vB includes the additional virtual image 11, the display device 2 displays the additional virtual image 11 as part of the background layer 50.
- The terminal device 1 shoots the display device 2 and the object 10 with the imaging unit 38, and in step S154 determines whether the current frame of the captured image vC is set to have the additional virtual image 11 added by the overlay layer 52. If the additional virtual image 11 is not to be added by the overlay layer 52, the terminal device 1 proceeds to step S106, and the display control unit 39 displays the photographed image vC of each frame obtained by shooting on the screen 1a.
- When the additional virtual image 11 is to be added by the overlay layer 52, the terminal device 1 renders the overlay layer 52 with the overlay layer rendering unit 47 in step S155. Then, in step S156, the terminal device 1 synthesizes the rendered overlay layer 52 with the current frame of the shot video vC in the image synthesizing unit 48. Further, the terminal device 1 executes filter processing with the filter processing unit 49 in step S157, according to the setting. The terminal device 1 then proceeds to step S106, and the display control unit 39 displays the captured image vC that has undergone the synthesis processing on the screen 1a. The user can therefore view the captured image vC with the additional virtual image 11 added on the screen 1a in real time.
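The per-frame branching on the terminal device 1 side in the seventh embodiment can be traced as an ordered list of step labels. The sketch below only reproduces the order and the branch conditions described above; the function name and the three boolean flags are assumptions introduced for illustration.

```python
from typing import List


def terminal_frame_steps(effect_frame: bool, uses_overlay: bool,
                         filter_enabled: bool) -> List[str]:
    """Return the step labels executed by the terminal device for one frame.

    effect_frame:   the frame is one to which the additional virtual image applies
    uses_overlay:   the setting in S152 includes the overlay layer
    filter_enabled: filter processing is enabled by setting
    """
    steps = ["S103", "S150", "S151"]  # relative position, area detection, timing check
    if effect_frame:
        steps.append("S152")          # application setting for the layers
    steps += ["S153", "S105", "S154"] # render vB, send to display, overlay check
    if effect_frame and uses_overlay:
        steps += ["S155", "S156"]     # render overlay layer, synthesize
        if filter_enabled:
            steps.append("S157")      # optional filter processing
    steps.append("S106")              # display on the terminal screen
    return steps
```

For a frame with no effect, the trace is S103, S150, S151, S153, S105, S154, S106.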
- According to the seventh embodiment described above, shooting in which the background layer 50 and the overlay layer 52 are linked can be performed in virtual production shooting.
- The user can also specify the position and timing of effect activation by touch panel operation or the like.
- Although FIGS. 37 and 38 have been described according to the system configuration of the first embodiment of FIG. 12, the processing of FIG. 36 can also be applied to the system configuration examples of FIGS. 13 to 17.
- The background layer rendering unit 34 that adds the additional virtual image 11 to the background layer 50 may be provided not only on the terminal device 1 side but also on the display device 2 side (see FIG. 21) or on the cloud server 4 side (see FIGS. 23, 25, 27, and 29).
- The overlay layer rendering unit 47 and the image synthesizing unit 48 may be provided in the cloud server 4 as well as in the terminal device 1, so that the terminal device 1 and the cloud server 4 cooperate in processing the overlay layer 52 for the shot video vC.
- The information processing device 70 of the embodiment includes a video processing unit 85 having the function of the background layer rendering unit 34.
- When the terminal device 1 shoots the object 10 and the video displayed on the display device 2, the video processing unit 85 renders the 3D model based on the relative position information RP between the display device 2 and the terminal device 1 to generate the background image vB displayed on the display device 2.
- For example, the user shoots the background image vB displayed on the display device 2 and the object 10 using a display device 2 such as a television receiver at home and a terminal device 1 such as a smartphone.
- In this case, the terminal device 1 and the display device 2 are associated with each other as the targets of relative position detection, and relative position detection is performed, so that a background image vB corresponding to the viewpoint direction from the terminal device 1 toward the display device 2 can be generated and displayed on the display device 2. Therefore, shooting using the virtual production technique can be performed easily at home or elsewhere, outside a dedicated studio.
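The source states only that rendering uses the display size information and the relative position information RP to produce a background matching the viewpoint. One common way to achieve this kind of viewpoint-dependent rendering is an off-axis (asymmetric) perspective frustum whose window is the physical screen. The sketch below computes such frustum bounds under that assumption; it is not taken from the source, and the coordinate convention (screen centered at the origin, camera at a positive distance `eye_z` in front of it) and all names are hypothetical.

```python
def off_axis_frustum(eye_x, eye_y, eye_z, screen_w, screen_h, near):
    """Return (left, right, bottom, top) of the view frustum at the near plane.

    eye_x, eye_y: camera offset from the screen center, parallel to the screen
    eye_z:        camera distance from the screen plane (must be positive)
    screen_w/h:   physical display size, in the same units as the eye position
    near:         near clipping distance
    """
    if eye_z <= 0:
        raise ValueError("the camera must be in front of the screen")
    scale = near / eye_z  # similar triangles: screen plane -> near plane
    left = (-screen_w / 2 - eye_x) * scale
    right = (screen_w / 2 - eye_x) * scale
    bottom = (-screen_h / 2 - eye_y) * scale
    top = (screen_h / 2 - eye_y) * scale
    return left, right, bottom, top
```

With the camera centered in front of the screen the frustum is symmetric, and it becomes asymmetric as the camera moves sideways, which is what makes the displayed background appear to shift with the viewpoint.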
- Such an information processing device 70 can be considered as a processor provided within the terminal device 1 or as the terminal device 1 itself including such a processor.
- the information processing device 70 can be considered as a processor provided within the display device 2 or as the display device 2 itself including such a processor.
- Alternatively, the information processing device 70 can be considered as a device separate from the display device 2 and the terminal device 1, for example a processor provided in the cloud server 4, or a device such as the cloud server 4 itself having such a processor.
- The video processing unit 85 (background layer rendering unit 34, communication control unit 35) of the terminal device 1 is configured to transmit the background video vB obtained by rendering the 3D background model 5 to the display device 2 (see FIGS. 12, 19, and 20).
- the terminal device 1 renders the 3D background model 5 in accordance with the relative position and transmits the rendered data to the display device 2 for display. Then, the terminal device 1 takes a picture.
- A general television device, monitor device, or the like capable of receiving video input can be used as the display device 2, and the functions of the terminal device 1 make it easy to realize virtual production in any environment where a display device 2 is present, such as at home or away from home.
- The video processing unit 85 (background layer rendering unit 34, communication control unit 36) of the display device 2 renders the 3D background model 5 based on the relative position information RP received from the terminal device 1 (see FIGS. 13, 21, and 22).
- By receiving the relative position information RP from the terminal device 1, the display device 2 can render the 3D background model 5 to generate and display the background image vB.
- virtual production can be realized at home or the like by introducing the display device 2 having the video processing unit 85 that performs rendering. Since the terminal device 1 side only needs to have the function of detecting the relative position information RP and transmitting it to the display device 2, the processing load is small and high processing power is not required.
- The video processing unit 85 (background layer rendering unit 34) is provided in an external device that is separate from the terminal device 1 and the display device 2. The video processing unit 85 renders the 3D background model 5 based on the received relative position information RP, generates a background video vB to be displayed on the display device 2, and transmits the background video vB (Figs. 30). For example, an external device capable of communicating with one or both of the terminal device 1 and the display device 2 renders the background image vB. As a result, virtual production using the terminal device 1 and the display device 2 can be realized by using an external device with abundant resources such as computing capability and storage capacity.
- the cloud server 4 is used as an external device.
- The external device may be a home server in a home network, a dedicated personal computer, a workstation, a smartphone, tablet, or PC other than the terminal device 1, or a so-called household electric appliance such as a video recorder. Any device may be used as long as it can function as the information processing device 70 including at least the video processing unit 85 of the present technology.
- the cloud server 4 is used as the external device.
- The background image vB is rendered in the cloud server 4, with which the terminal device 1 or the display device 2 can communicate.
- virtual production using the terminal device 1 and the display device 2 can be realized by using the processing function of the cloud server 4 .
- An advantage is obtained that the processing load on the terminal device 1 and the display device 2 is small.
- by providing services to users as the cloud server 4 it is possible to widely provide video production opportunities through virtual production.
- The video processing unit 85 (background layer rendering unit 34) in the external device renders the 3D background model 5 based on the relative position information RP received from the terminal device 1 and generates the video to be displayed.
- the cloud server 4 may receive the relative position information RP from the terminal device 1 .
- an external device other than the terminal device 1 and the display device 2 can perform rendering based on the relative position information RP at each point in time, and the background image vB corresponding to the viewpoint position of the terminal device 1 at each point in time can be generated.
- the relative position information RP detected by the display device 2 may be transferred to the terminal device 1 and transmitted from the terminal device 1 to the cloud server 4 .
- The video processing unit 85 (background layer rendering unit 34, communication control unit 40) in the external device renders the 3D background model 5 based on the relative position information RP received from the display device 2 and generates the video to be displayed.
- the cloud server 4 may receive the relative position information RP from the display device 2, for example.
- The relative position information RP may be transferred to the display device 2 side and transmitted from the display device 2 to the cloud server 4.
- an external device other than the terminal device 1 and the display device 2 can perform rendering based on the relative position information RP at each point in time, and can generate a background image vB corresponding to the viewpoint position of the terminal device 1 at each point in time.
- The video processing unit 85 (background layer rendering unit 34, communication control unit 40) in the external device performs processing to transmit the background video vB generated by rendering the 3D background model 5 to the terminal device 1.
- the background image vB generated by the cloud server 4 is transmitted to the terminal device 1 and transmitted from the terminal device 1 to the display device 2 .
- the background image vB generated by the external device communicating with the terminal device 1 can be displayed on the display device 2 in real time.
- A configuration is also conceivable in which the cloud server 4 transmits the background image vB rendered based on the relative position information received from the display device 2 to the terminal device 1, and the terminal device 1 transmits it to the display device 2.
- The video processing unit 85 (background layer rendering unit 34, communication control unit 40) in the external device performs processing to transmit the video generated by rendering the 3D background model 5 to the display device 2. The background video vB generated by the cloud server 4 is transmitted to the display device 2. The background image vB generated by the external device communicating with the terminal device 1 can thereby be displayed on the display device 2 in real time. Further, transmitting the background image vB to the display device 2 without going through the terminal device 1 reduces the amount of communication required and improves communication speed and efficiency.
- The video processing unit 85 (the background layer rendering unit 34 and the overlay layer rendering unit 47) performs virtual image addition processing so that the captured image vC, obtained by capturing the background image vB displayed on the display device 2 and the object 10 with the terminal device 1, includes the additional virtual image 11 together with the background image vB based on the 3D background model 5 and the image of the object 10.
- the captured image vC is added with an additional virtual image 11 in addition to the background image vB based on the 3D background model 5 and the image of the object 10 .
- the addition processing of the additional virtual image 11 can be performed in real time during shooting, or can be performed after shooting as post-production.
- The additional virtual image 11 refers to any additional virtual image such as specific images, patterns, changes in color and luminance, and characters.
- the image added by image processing or the image effect corresponds to the additional virtual video 11 .
- processing for including such additional virtual video 11 is referred to as virtual video addition processing.
- general image quality adjustments such as brightness adjustment, color adjustment, gradation adjustment, white balance adjustment, gamma processing, sharpness processing, etc. do not fall under the virtual video addition processing referred to in the present disclosure.
- In the processing of each frame of the video captured by the terminal device 1, the video processing unit 85 performs virtual video addition processing so that the additional virtual image 11 is included in the captured image vC. In other words, the additional virtual image 11 is added in real time during shooting. This makes it easy to provide a video effect that the user can readily confirm.
- The video processing unit 85 starts virtual video addition processing in response to a predetermined operation on the terminal device 1. For example, during shooting, the effect of the additional virtual image 11 is activated in response to a user's touch operation or the like. As a result, it is possible to provide a shooting environment in which the user can activate video effects at a desired timing.
- the video processing unit 85 (3D model management unit 33) sets the virtual video addition processing based on the image recognition processing of the captured video (step ST16 in FIG. 36, Step S152 in FIG. 38, see FIGS. 33, 34 and 35). For example, parameters such as the type of the additional virtual image 11 and the position within the image are determined according to the object type, position, size within the captured image vC, and the like. An additional virtual image 11 can be added at an appropriate place in the image by face recognition, bone recognition, or the like of a person as the object 10 .
- the virtual video addition processing of the seventh embodiment is processing to add the additional virtual video 11 to the overlay layer 52 that overlays the video of the object 10 in the captured video vC.
- The overlay layer 52 includes the additional virtual video 11. As a result, a virtual image can be added to the front area 61 of the object 10 that actually exists.
- Such virtual video addition processing can be realized by the functions of the overlay rendering section 47 and the image synthesizing section 48 in the video processing section 85 .
- the virtual image adding process of the seventh embodiment is a process of adding an additional virtual image to the background image vB generated by rendering the 3D background model 5 .
- The background image vB includes the additional virtual image 11. As a result, a virtual image can be added to the area behind the object 10 (the other areas 63 and 64 and the rear area 62).
- Such virtual image addition processing can be realized by rendering by the background layer rendering unit 34 .
- the additional virtual image 11 is also reflected in the real object 10 . Therefore, it is possible to easily realize a more realistic image representation such that the virtual additional virtual image 11 is reflected in the real object 10 . This also means that difficult work such as adding reflections in post-production is not required.
- Such virtual video addition processing can be realized by the function of the background layer rendering section 34 in the video processing section 85 .
- The video processing unit 85 determines the area around the object in the captured image vC and performs the virtual video addition processing based on the determination. By determining the front area 61, the rear area 62, and the other areas 63 and 64 as peripheral areas of the object 10 for each frame, the additional virtual image 11 can be added in consideration of the positional relationship with the object 10.
- This processing involves the area detection unit 44, the background layer rendering unit 34, and the overlay layer rendering unit 47.
- an example has been given in which virtual image addition processing such as effects is started in response to a touch operation on the screen of the terminal device 1 .
- the user can activate an effect by touching an arbitrary position on the screen.
- It is possible to provide the user with a shooting environment in which a video effect can be activated at any position on the screen and at any timing.
- The captured image vC, obtained by the terminal device 1 capturing the background image vB displayed on the display device 2 and the object 10, is displayed on the screen 1a of the terminal device 1.
- The captured image vC of the background image vB and the object 10, captured using a terminal device 1 such as a smartphone, is displayed on the terminal device 1, so that the user can shoot while viewing the captured image vC.
- a simple virtual production using the terminal device 1 can be realized.
- the additional virtual video 11 can also be confirmed while being captured by the terminal device 1 .
- a smartphone is mainly assumed as the terminal device 1, but any device having a shooting function can be used as the terminal device 1.
- a camera such as a single-lens camera or a compact digital camera can be implemented as the information processing device 70 of the present disclosure by providing the functions described in the embodiment.
- the functions of the present disclosure may be implemented in devices such as camera-equipped glasses, AR (Augmented Reality) glasses, and the like. In this case, it becomes easier to shoot an image with camerawork that seems to be from the first-person viewpoint. Also, this function may be implemented in a watch device with a camera. This allows you to shoot without having to hold the equipment in your hands.
- Various display devices can be used as the display device 2 as well.
- A projector, a large tablet, a smartphone, or the like may be used as the display device.
- the background image vB displayed as the background layer 50 is a partially transparent image vBT.
- If the display device 2 that displays the background image vB is a transmissive panel, what is behind the transparent image vBT can be seen through it.
- the object 12 is also arranged behind.
- the captured image vC can include the object 12 behind the background.
- The light source may be estimated by moving the terminal device 1 used for photographing around the object while photographing it.
- the brightness of the background image vB is changed according to the ambient light of the shooting site.
- In the photographing according to the embodiment, there may be a case where the area of the screen 2a of the display device 2 is small and the photographing range is limited. It is therefore conceivable to move the display device 2 in front of the terminal device 1 by using a drone, a cart, or the like so that the shooting range (angle of view) does not protrude from the screen 2a of the display device 2.
- The terminal device 1 may notify the user by vibrating or displaying an alert on the screen 2a. Furthermore, when the captured image vC protrudes outside the background image vB, the protruding area may be drawn with the background on the overlay layer 52 so that the protrusion is not visible in the video.
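How the protrusion of the shooting range beyond the screen 2a might be detected is not specified. The following is a minimal sketch under simplifying assumptions: the terminal faces the screen perpendicularly, its position is given relative to the screen center, and its horizontal and vertical angles of view are known. The function name and parameters are hypothetical.

```python
import math


def view_exceeds_screen(eye_x, eye_y, eye_z, hfov_deg, vfov_deg,
                        screen_w, screen_h):
    """Return True if the camera footprint on the screen plane leaves the screen.

    Assumes the optical axis is perpendicular to the screen, the screen is
    centered at the origin, and eye_z is the distance to the screen plane.
    """
    # Half extents of the area seen by the camera on the screen plane.
    half_w = eye_z * math.tan(math.radians(hfov_deg) / 2)
    half_h = eye_z * math.tan(math.radians(vfov_deg) / 2)
    return (
        eye_x - half_w < -screen_w / 2
        or eye_x + half_w > screen_w / 2
        or eye_y - half_h < -screen_h / 2
        or eye_y + half_h > screen_h / 2
    )
```

A result of `True` could trigger the vibration or alert described above, or the fallback of drawing the protruding area on the overlay layer 52.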
- ambient light may be reflected on the screen 2a of the display device 2 .
- If the screen 2a of the display device 2 is rotated so that the illumination is not reflected, its orientation relative to the terminal device 1 changes. This can be dealt with by distorting the background image vB.
- the user can hold the light source or perform any operation with the other hand. It is also suitable for shooting by one person.
- presenting information other than the captured image vC on the terminal device 1 such as a smartphone may be disturbing, but in such a case, the information can be presented in cooperation with other devices such as a watch device or a tablet.
- audio guide information can be output using audio output devices such as earphones.
- information such as the remaining shooting time and take number is output by voice.
- the description is mainly based on the assumption of moving image shooting, but the technique of the present disclosure can also be applied to still image shooting.
- The processing of the embodiments can be applied to the display of the background image vB on the display device 2 and the display of the through image on the terminal device 1 while waiting for the shutter operation in the still image shooting mode.
- the program of the embodiment is a program that causes a processor such as a CPU or a DSP, or a device including these, to execute the processing of the video processing unit 85 described above. That is, the program of the embodiment can be applied to the case where the display device 2 and the terminal device 1 having a shooting function are associated with each other, and the object and the image displayed on the display device are captured by the terminal device.
- Such a program can be recorded in advance in a HDD as a recording medium built in equipment such as a computer device, or in a ROM or the like in a microcomputer having a CPU.
- Alternatively, such a program can be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disc, CD-ROM (Compact Disc Read Only Memory), MO (Magneto Optical) disc, DVD (Digital Versatile Disc), Blu-ray Disc (registered trademark), magnetic disk, semiconductor memory, or memory card.
- Such removable recording media can be provided as so-called package software.
- it can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
- Such a program is suitable for widely providing the information processing device 70 of the embodiment.
- For example, by downloading the program to a personal computer, a communication device, a mobile terminal device such as a smartphone or tablet, a mobile phone, a game device, a video device, a PDA (Personal Digital Assistant), or the like, these devices can be made to function as the information processing device 70 of the present disclosure.
- the information processing apparatus of the present disclosure is configured to have a video processing unit, and in the embodiment, as a specific example, an information processing apparatus 70 having a video processing unit 85 as shown in FIG. 8 is used.
- the processing of the video processing unit referred to in the present disclosure may be processing performed by the video processing unit 85 in the configuration of FIG. 8, or may be performed by the video processing unit 85 and the CPU 71 in cooperation. Alternatively, the CPU 71 may perform the processing.
- the present technology can also adopt the following configuration.
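- An information processing apparatus comprising: a video processing unit that, when an object and an image displayed on a display device are captured by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate an image to be displayed on the display device.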
- The information processing apparatus according to (1) above, wherein the video processing unit is provided in the terminal device, and an image obtained by rendering a 3D model in the video processing unit is transmitted to the display device.
- The information processing apparatus according to (1) above, wherein the video processing unit is provided in the display device, and the video processing unit renders a 3D model based on the relative position information received from the terminal device and generates an image to be displayed.
- the video processing unit is provided in an external device that is separate from both the terminal device and the display device, The image processing unit renders a 3D model based on the received relative position information, generates an image to be displayed on the display device, and transmits the generated image.
- The information processing apparatus according to any one of (1) to (9) above, wherein the video processing unit performs virtual image addition processing so that the captured video, obtained by capturing the image displayed on the display device and the object with the terminal device, includes an additional virtual image together with the image based on the 3D model and the image of the object.
- The information processing apparatus according to (10) above, wherein the video processing unit performs the virtual video addition processing so that the additional virtual video is included in the captured video in the processing of each frame of the video captured by the terminal device.
- The information processing apparatus according to (10) or (11) above, wherein the video processing unit starts the virtual video addition processing in response to a predetermined operation on the terminal device.
- The information processing apparatus according to any one of (10) to (12) above, wherein the video processing unit sets the virtual video addition processing based on image recognition processing of the captured video.
- the virtual image adding process is a process of adding an additional virtual image to a layer overlaid on the image of the object in the captured image.
- The virtual image adding process is a process of adding an additional virtual image to an image displayed on the display device generated by rendering a 3D model.
- The information processing apparatus according to any one of (10) to (15) above, wherein the video processing unit determines an object peripheral region in the captured image and performs the virtual image addition processing based on the determination.
- the image displayed on the display device and the image captured by the terminal device capturing the object are displayed and output on the display unit of the terminal device,
- the display unit has a screen as an input unit,
- the information processing device according to any one of (10) to (16) wherein the terminal device starts the virtual video addition process in response to a touch operation on the input unit.
- the information processing apparatus according to any one of (1) to (16) above, wherein an image displayed on the display device and an image captured by the terminal device capturing the object are displayed on the terminal device.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Graphics (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
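In recent years, as an alternative to green screen shooting, a shooting system has also been developed in which a background video is displayed on a large display device installed in a studio and a performer acts in front of it, so that the performer and the background can be shot together. This is known as virtual production, in-camera VFX, or LED wall virtual production.
Patent Document 1 below discloses a technique for a system that shoots a performer acting in front of a background video.
However, such a shooting system requires a dedicated studio set, which makes it difficult for general users to use virtual production techniques casually. For example, virtual production using only equipment available at home has not been realized.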
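The "association" between the display device and the terminal device means that they are at least paired as targets of relative position detection. The information processing device at least performs processing to render a 3D model based on relative position information between the display device and the terminal device.
Such an information processing device of the present disclosure can be considered as a processor provided in the terminal device, or as the terminal device itself including such a processor. Alternatively, it can be considered as a processor provided in the display device, or as the display device itself including such a processor. Furthermore, it can be considered as a processor provided in a device separate from the display device and the terminal device (for example, a cloud server), or as that device itself including such a processor.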
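<1. Shooting system and video content production>
<2. Configuration of the information processing device>
<3. Virtual production of the embodiments>
<4. First embodiment: example using a terminal device and a display device>
<5. Second embodiment: example using a terminal device and a display device>
<6. Third embodiment: example using a cloud server>
<7. Fourth embodiment: example using a cloud server>
<8. Fifth embodiment: example using a cloud server>
<9. Sixth embodiment: example using a cloud server>
<10. Seventh embodiment: application of virtual video addition technique>
<11. Summary and modifications>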
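First, a shooting system using a studio set and the production of video content will be described as virtual production.
FIG. 1 schematically shows a shooting system 500. The shooting system 500 is a system for shooting as virtual production, and the figure shows part of the equipment arranged in a shooting studio.
The portion of the background video vB excluding the shooting area video vBC is called the "outer frustum", and the shooting area video vBC is called the "inner frustum".
The background video vB described here refers to the entire video displayed as the background, including the shooting area video vBC (inner frustum).
In practice, the range of the shooting area video vBC is set slightly wider than the range shot by the camera 502 at that time. This prevents the outer frustum video from appearing in the shot because of rendering delay when the shot range changes slightly due to panning, tilting, or zooming of the camera 502, and avoids the influence of diffracted light from the outer frustum video.
The shooting area video vBC rendered in real time in this way is composited with the outer frustum video. The outer frustum video used in the background video vB is rendered in advance based on the 3D background data, and the entire background video vB is generated by embedding the shooting area video vBC rendered in real time into part of that outer frustum video.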
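In generating 3D data by photogrammetry, point cloud information acquired by LiDAR may also be used.
The shooting information is assumed to include, at each frame timing, the position information of the camera 502, the camera orientation, angle of view, focal length, F-number (aperture value), shutter speed, lens information, and the like.
Video adjustment may include color adjustment, luminance adjustment, contrast adjustment, and the like.
Clip editing may include cutting clips, adjusting their order, adjusting their duration, and the like.
Video effects may include compositing CG video or special-effect video.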
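FIG. 5 is a block diagram showing the configuration of the shooting system 500 outlined in FIGS. 1, 2, and 3.
As a specific detection method used by the camera tracker 560, reflectors are arranged randomly on the ceiling, and the position is detected from the reflection of infrared light emitted toward them from the camera 502 side. Another detection method estimates the self-position of the camera 502 from gyro information obtained at the camera platform or the body of the camera 502, or from image recognition of the video shot by the camera 502.
As processing for each frame, the rendering engine 520 uses the shooting information supplied from the camera tracker 560 and the camera 502 to specify the viewpoint position and the like with respect to the 3D background data, and renders the shooting area video vBC (inner frustum).
Note that the display controller 590 may be omitted and the rendering engine 520 may perform these processes. That is, the rendering engine 520 may generate the divided video signals nD, perform calibration, and transmit the divided video signals nD to each LED panel 506.
It then generates the video used as the outer frustum.
However, when a plurality of cameras 502 are used, the shooting area videos vBC corresponding to the respective cameras 502 interfere with each other. For example, in the case of FIG. 7 where two cameras 502a and 502b are used, the shooting area video vBC corresponding to the camera 502a is shown, but when the video of the camera 502b is used, the shooting area video vBC corresponding to the camera 502b is also required. If the shooting area videos vBC corresponding to the cameras 502a and 502b are simply both displayed, they interfere with each other. Some contrivance is therefore required in displaying the shooting area video vBC.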
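Next, a configuration example of the information processing device 70, which can be used in the asset creation ST1, production ST2, and post-production ST3 described above and in the embodiments described later, will be described with reference to FIG. 8.
The information processing device 70 is a device capable of information processing, particularly video processing, such as a computer device. Specifically, the information processing device 70 is assumed to be a PC (personal computer), a workstation, a mobile terminal device such as a smartphone or tablet, a video editing device, or the like. The information processing device 70 may also be a computer device configured as a server device or a computing device in cloud computing.
The information processing device 70 can also function as the rendering engine 520 constituting the shooting system 500 used in the production ST2. Furthermore, the information processing device 70 can function as the asset server 530.
The information processing device 70 can also function as a video editing device that performs various kinds of video processing in the post-production ST3.
The video processing unit 85 can be realized by, for example, a CPU separate from the CPU 71, a GPU (Graphics Processing Unit), GPGPU (general-purpose computing on graphics processing units), an AI (artificial intelligence) processor, or the like.
The video processing unit 85 may also be provided as a function within the CPU 71.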
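A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
A microphone is also assumed as the input unit 76. Voice uttered by the user can also be input as operation information.
The display unit 77 performs various kinds of display and is configured by, for example, a display device provided in the housing of the information processing device 70 or a separate display device connected to the information processing device 70.
Based on instructions from the CPU 71, the display unit 77 displays various images, operation menus, icons, messages, and the like on the display screen, that is, it performs display as a GUI (Graphical User Interface).
For example, when the information processing device 70 functions as the asset server 530, a DB storing a group of 3D background data can be constructed using the storage unit 79.
For example, when the information processing device 70 functions as the rendering engine 520, the communication unit 80 can access the DB serving as the asset server 530 and receive shooting information from the camera 502 and the camera tracker 560.
In the case of the information processing device 70 used for the post-production ST3 as well, the communication unit 80 can access the DB serving as the asset server 530.
The drive 81 can read video data, various computer programs, and the like from the removable recording medium 82. The read data is stored in the storage unit 79, and video and audio included in the data are output by the display unit 77 and the audio output unit 78. Computer programs and the like read from the removable recording medium 82 are installed in the storage unit 79 as necessary.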
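The CPU 71 and the video processing unit 85 can perform corresponding processing based on information from the sensor unit 86.
Specific sensors in the sensor unit 86 include, for example, a ranging sensor such as a ToF (Time of Flight) sensor, a ranging/direction sensor such as LiDAR, a position information sensor, an illuminance sensor, an infrared sensor, and a touch sensor.
An IMU (inertial measurement unit) may also be mounted as the sensor unit 86 so that angular velocity can be detected by, for example, a three-axis angular velocity (gyro) sensor for pitch, yaw, and roll.
The camera unit 87 includes an image sensor, a processing circuit for the signal photoelectrically converted by the image sensor, and the like. The camera unit 87 shoots video as moving images or still images.
The shot video is processed by the video processing unit 85 or the CPU 71, stored in the storage unit 79, displayed on the display unit 77, or transmitted to another device by the communication unit 80.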
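The virtual production of the embodiments will be described below.
Virtual production using the shooting system 500 as the large-scale studio set described above cannot be said to be something general users can carry out casually. The embodiments therefore propose a technique that enables video production using virtual production techniques to be performed easily even at home or the like.
For example, the user performs shooting as virtual production using his or her own smartphone as the terminal device 1 and a television receiver at home as the display device 2.
In this case, for example, the terminal device 1 is made to recognize the display device 2. For example, the terminal device 1 recognizes the display device 2 in the video it has shot. The terminal device 1 thereby recognizes the display device 2 as its counterpart for relative position detection. Specifically, for example, a television receiver or the like provided with an AR marker 3 may be recognized as the display device 2. Alternatively, the terminal device 1 and the display device 2 may be paired by short-range wireless communication or the like. Furthermore, the display device 2 side may recognize the terminal device 1 as its counterpart for relative position detection. In any case, at least one of the terminal device 1 and the display device 2 recognizes the pair that is the target of relative position detection.
If the background video vB is then rendered based on the relative position of the terminal device 1 with respect to the display device 2, a background video vB can be generated that reflects the direction from the viewpoint at the position of the terminal device 1 and the parallax due to the positional relationship. In other words, a background video vB equivalent to the inner frustum described above can be displayed on the display device 2.
Accordingly, shooting equivalent to that of the shooting system 500 of FIG. 1 described above can be realized with a terminal device 1 such as a smartphone and a display device 2 such as a television receiver.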
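・Detection of the relative position of the terminal device 1 and the display device 2
・Rendering of the background video vB based on the relative position
・Shooting that includes the background video vB and the object 10 using the shooting function of the terminal device 1
In the present embodiments, these functions are realized by either the terminal device 1 or the display device 2. Alternatively, the cloud server 4 described later may be used.
This enables the user to shoot virtual production at home or the like and to create attractive videos easily. For example, it becomes possible to create attractive introduction videos of items made as a hobby, pets, the actions of a person as the subject, and so on.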
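To display a background video vB that reflects the shooting direction and parallax, the relative position of the terminal device 1 and the display device 2 needs to be detected at every frame timing while the terminal device 1 is shooting. For simplicity, detection may be performed at the timing of intermittent frames, but for a background video vB that reflects parallax more precisely, it is desirable that relative position detection be performed at the timing of every frame of the captured video vC of the terminal device 1.
The terminal device 1 can also estimate its own position by itself from the video it has shot and the detection data of the IMU.
Relative position detection with respect to the display device 2 can be performed based on such self-position estimation.
Alternatively, the user may enter the product name, model number, or the like of the television receiver or other device serving as the display device 2, and an application program may access a DB (database) and automatically look up the size.
More precisely, what should be detected is not the housing size of the display device 2 but the size of the screen 2a, shown hatched in FIG. 10B. It is therefore conceivable that, at the time of size detection, the terminal device 1 transmits a video of a specific color to the display device 2 for display, detects the range of that color in the shot video, and calculates the actual size.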
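The calculation for deriving the actual screen size from the detected color range is not detailed in the text. The following is a minimal sketch assuming a pinhole camera model, a screen viewed head-on, and a known distance to the screen (for example from a ranging sensor). The function names, the tolerance value, and the availability of the focal length in pixels are assumptions for illustration.

```python
def color_extent(image, target, tol=10):
    """Width and height in pixels of the bounding box of pixels close to target.

    image is a list of rows of (r, g, b) tuples; returns None if no pixel matches.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if all(abs(c - t) <= tol for c, t in zip(pixel, target)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def screen_size_from_pixels(px_width, px_height, distance, focal_length_px):
    """Pinhole model: real size = pixel extent * distance / focal length (pixels)."""
    if distance <= 0 or focal_length_px <= 0:
        raise ValueError("distance and focal length must be positive")
    scale = distance / focal_length_px
    return px_width * scale, px_height * scale
```

For instance, a color range 800 pixels wide seen from 1.5 m with a focal length of 1000 pixels corresponds to a screen roughly 1.2 m wide under these assumptions.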
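For example, an anime-style filter can also be applied.
Doing so enables wide-range shooting, which may be advantageous for recognizing the AR marker 3 of the display device 2 and for environment recognition for SLAM.
The figures do not show the detection or the transmission and reception of the display size information of the display device 2. Since the display size information does not change between the start and end of shooting, it is sufficient that it is detected once at the beginning by some method and acquired by the device that performs rendering.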
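The terminal device 1 performs relative position detection and renders the background video vB from the 3D background model 5 based on the relative position information.
The terminal device 1 transmits the background video vB to the display device 2.
The display device 2 displays the background video vB.
The terminal device 1 performs relative position detection and acquires the relative position information RP.
The terminal device 1 transmits the relative position information RP to the display device 2.
The display device 2 renders the background video vB from the 3D background model 5 based on the relative position information and displays the background video vB.
The terminal device 1 performs relative position detection and transmits the relative position information RP to the cloud server 4.
The cloud server 4 renders the background video vB from the 3D background model 5 based on the relative position information RP.
The cloud server 4 transmits the background video vB to the terminal device 1.
The terminal device 1 transmits the background video vB received from the cloud server 4 to the display device 2.
The display device 2 displays the background video vB.
The terminal device 1 performs relative position detection and transmits the relative position information RP to the cloud server 4.
The cloud server 4 renders the background video vB from the 3D background model 5 based on the relative position information RP.
The cloud server 4 transmits the background video vB to the display device 2.
The display device 2 displays the background video vB.
The display device 2 performs relative position detection and transmits the relative position information RP to the cloud server 4.
The cloud server 4 renders the background video vB from the 3D background model 5 based on the relative position information RP.
The cloud server 4 transmits the background video vB to the display device 2.
The display device 2 displays the background video vB.
The terminal device 1 performs relative position detection and transmits the relative position information RP to the display device 2.
The display device 2 transmits the relative position information RP received from the terminal device 1 to the cloud server 4.
The cloud server 4 renders the background video vB from the 3D background model 5 based on the relative position information RP.
The cloud server 4 transmits the background video vB to the display device 2.
The display device 2 displays the background video vB.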
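The six flows above can be summarized as data. The following is a minimal sketch that reads the flows, in order, as the first to sixth embodiments; the class name, field names, and the hop-counting helper are hypothetical and serve only to compare the configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Embodiment:
    detector: str       # device that detects the relative position information RP
    rp_route: tuple     # devices RP passes through, ending at the renderer
    renderer: str       # device that renders the background video vB
    video_route: tuple  # devices vB passes through, ending at the display


T, D, C = "terminal", "display", "cloud"

EMBODIMENTS = {
    1: Embodiment(T, (T,), T, (T, D)),
    2: Embodiment(T, (T, D), D, (D,)),
    3: Embodiment(T, (T, C), C, (C, T, D)),
    4: Embodiment(T, (T, C), C, (C, D)),
    5: Embodiment(D, (D, C), C, (C, D)),
    6: Embodiment(T, (T, D, C), C, (C, D)),
}


def network_hops(e):
    """Number of device-to-device transfers per frame (RP plus vB)."""
    return (len(e.rp_route) - 1) + (len(e.video_route) - 1)
```

Counted this way, the third flow needs three transfers per frame while the fourth needs two, which matches the remark that sending vB to the display device 2 without passing through the terminal device 1 reduces communication.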
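The flow of processing when shooting as virtual production is performed with a configuration such as those above will be described with reference to FIG. 18. Each step in FIG. 18 is processing executed by one of the devices in the system according to the configurations of the first to sixth embodiments. Here it is described as the processing procedure of the system as a whole.
Until it is determined that shooting has ended, the processing from step ST53 to step ST56 is repeated at every frame timing of the background video vB and the captured video vC.
The frame timing of the background video vB and the frame timing of the captured video vC are kept in a synchronized relationship.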
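The contents of the individual steps are not reproduced here, so the following is only a minimal sketch of a per-frame loop, assuming that each iteration consists of relative position detection, rendering, display, and capture. The function name and the callables passed in are hypothetical; in an actual system each of them would run on whichever device the embodiment assigns it to.

```python
def run_shoot(detect_rp, render, display, capture, is_finished):
    """Repeat detect -> render -> display -> capture once per frame timing."""
    frames = []
    while not is_finished():
        rp = detect_rp()           # relative position information RP
        vb = render(rp)            # background video vB for this frame
        display(vb)                # shown on the display device
        frames.append(capture())   # captured video vC frame
    return frames
```

Running all four operations inside one iteration is one simple way of keeping the background frame and the captured frame in step with each other.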
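FIG. 19 shows the functional configuration of the terminal device 1 and the display device 2 in the first embodiment shown in FIG. 12.
The functional configurations described in the embodiments are realized in the terminal device 1, the display device 2, or the cloud server 4 by, for example, the hardware configuration of the information processing device 70 of FIG. 8, mainly by the video processing unit 85.
Shooting by the terminal device 1 is not shown in the flowcharts. Basically, when a user operation places the device in a recording standby state in the shooting mode for virtual production, moving image capture (acquisition of image data by the image sensor) starts, and display of the captured video vC on the screen 1a as a through image starts. In response to a recording start operation, the captured video vC is recorded on a recording medium as video content. In response to a recording stop operation, recording on the recording medium as video content stops and the device returns to the recording standby state. A predetermined end operation then ends the shooting as virtual production, and the display of the captured video vC on the screen 1a also ends.
The flowchart of each embodiment shows the processing at each frame timing from the start to the end of shooting as virtual production.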
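In step S104, the terminal device 1 uses the background layer rendering unit 34 to render the 3D background model 5 read from the 3D model management unit 33 into an off-screen buffer, based on the display size information and the relative position information RP. That is, it generates the background video vB. The off-screen buffer is a non-displayed screen, a temporary buffer area for rendered video prepared in the RAM 73 or the like.
The end determination on the display device 2 side can be made, for example, when reception of frames of the background video vB has been interrupted for a predetermined time or longer. Alternatively, the terminal device 1 may transmit an end instruction signal at the end, and the display device 2 may determine the end by receiving it. Upon the end determination, the display device 2 ends the processing of displaying the background video vB of the virtual production.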
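The two end conditions described above can be sketched as follows. This is a minimal illustration only: the class name, the method names, and the 2-second default for the "predetermined time" are assumptions, and the injectable clock exists purely to make the behavior testable.

```python
import time


class EndDetector:
    """End determination on the display side: frame timeout or explicit signal."""

    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self._timeout = timeout_s      # assumed value for the predetermined time
        self._clock = clock
        self._last_frame = clock()
        self._end_signal = False

    def on_frame(self):
        """Call whenever a frame of the background video vB is received."""
        self._last_frame = self._clock()

    def on_end_signal(self):
        """Call when an end instruction is received from the terminal device."""
        self._end_signal = True

    def finished(self):
        elapsed = self._clock() - self._last_frame
        return self._end_signal or elapsed >= self._timeout
```

The display loop would call `on_frame()` for every received frame and stop displaying the background once `finished()` returns `True`.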
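In step S203, the display device 2 uses the display control unit 37 to display the received background video vB on the screen 2a.
The terminal device 1 shoots the display device 2 and the object 10 with the imaging unit 38, and in step S106 it uses the display control unit 39 to display each frame of the captured video vC obtained by the shooting on the screen 1a.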
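FIG. 21 shows the functional configuration of the terminal device 1 and the display device 2 in the second embodiment shown in FIG. 13. In the following embodiments, functional components already described are given the same reference numerals and detailed duplicate description is omitted. Refer to the earlier description of FIG. 11.
The display device 2 has a display size detection unit 31, a 3D model management unit 33, a background layer rendering unit 34, a communication control unit 36, and a display control unit 37.
Processing that has already been described is given the same step numbers.
In step S110, the terminal device 1 uses the communication control unit 35 to transmit the relative position information RP to the display device 2.
In step S106, the terminal device 1 uses the display control unit 39 to display each frame of the captured video vC obtained by shooting with the imaging unit 38 on the screen 1a.
In step S212, the display device 2 uses the background layer rendering unit 34 to render the 3D background model 5 read from the 3D model management unit 33, based on the display size information and the received relative position information RP, and generates the background video vB.
In step S203, the display device 2 uses the display control unit 37 to display the generated background video vB on the screen 2a.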
図14に示した第3の実施の形態における端末装置1、表示装置2、クラウドサーバ4の機能構成を図23に示す。
表示装置2は、通信制御部36、表示制御部37を有する。
クラウドサーバ4は、3Dモデル管理部33、背景レイアレンダリング部34、通信制御部40を有する。
ステップS304でクラウドサーバ4、背景レイアレンダリング部34により、ディスプレイサイズの情報と受信した相対位置情報RPに基づき、3Dモデル管理部33から読み出された3D背景モデル5をレンダリングして背景映像vBを生成する。
そしてクラウドサーバ4はステップS305で、背景映像vBを通信制御部40により端末装置1に送信する処理を行う。
またステップS106で端末装置1は、表示制御部39により、撮像部38の撮影により得られる各フレームの撮影映像vCを画面1aに表示する処理を行う。
図15に示した第4の実施の形態における端末装置1、表示装置2、クラウドサーバ4の機能構成を図25に示す。
図16に示した第5の実施の形態における端末装置1、表示装置2、クラウドサーバ4の機能構成を図27に示す。
表示装置2は、ディスプレイサイズ検出部31、相対位置検出部32、通信制御部36、表示制御部37を有する。
クラウドサーバ4は、3Dモデル管理部33、背景レイアレンダリング部34、通信制御部40を有する。
ステップS303でクラウドサーバ4は、通信制御部40により表示装置2から相対位置情報RPを受信する。
ステップS304でクラウドサーバ4、背景レイアレンダリング部34により、ディスプレイサイズの情報と受信した相対位置情報RPに基づき、3Dモデル管理部33から読み出された3D背景モデル5をレンダリングして背景映像vBを生成する。
そしてクラウドサーバ4はステップS305で、背景映像vBを通信制御部40により表示装置2に送信する処理を行う。
図17に示した第6の実施の形態における端末装置1、表示装置2、クラウドサーバ4の機能構成を図29に示す。
表示装置2は、ディスプレイサイズ検出部31、通信制御部36、表示制御部37を有する。
クラウドサーバ4は、3Dモデル管理部33、背景レイアレンダリング部34、通信制御部40を有する。
クラウドサーバ4はこれに応じてステップS301でディスプレイサイズ情報を受信し、その後のレンダリングのために記憶する。
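Next, as a seventh embodiment, an example applying a virtual video addition technique is described. In particular, it is an example of shooting in which the background layer and the overlay layer are coordinated.
Note that "in front of" the object 10 means the side of the terminal device 1 as seen from the object 10, that is, the side of the device performing the shooting.
If one tried to implement such a function in, for example, the shooting system 500 of FIG. 1, the devices in the shooting system 500 would need to cooperate, and realizing it would require major changes in order to synchronize the devices and pass drawing data between them. However, if shooting and drawing are both performed within, for example, the terminal device 1, equivalent processing can be performed without spanning devices, which makes it easy to realize.
The front region 61 is the region in front of the object 10 in the captured video vC from the terminal device 1. The rear region 62 is the region behind the object 10. The other regions 63 and 64 are regions that are neither in front of nor behind the object 10.
The foreground 51 is the video of the object 10 itself. The background layer 50 is the layer of the background video vB displayed on the display device 2. With shooting as in FIG. 31, the captured video vC contains the video of the background layer 50 and of the foreground 51.
As described above, in virtual production shooting the background video vB behind the object 10 contains virtual video. By introducing the overlay layer 52, virtual video can also be added in front of the object 10.
In this case, the part of the additional virtual video 11 that belongs to the front region 61 is drawn on the overlay layer 52. The part that belongs to the rear region 62 is added to the background video vB. The part that belongs to the other regions 63 and 64 may be drawn on the overlay layer 52, but is preferably added to the background video vB.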
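The layer assignment rule just described (front region to the overlay layer, rear region to the background video, other regions preferably to the background video) is compact enough to express as a small function. This is a sketch only; the enum names, the function `target_layer`, and the `prefer_background_for_other` flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class Region(Enum):
    FRONT = "front region 61"
    REAR = "rear region 62"
    OTHER = "other regions 63, 64"


class Layer(Enum):
    OVERLAY = "overlay layer 52"
    BACKGROUND = "background layer 50"


def target_layer(region, prefer_background_for_other=True):
    """Choose the layer on which a part of the additional virtual video is drawn."""
    if region is Region.FRONT:
        return Layer.OVERLAY        # must appear in front of the real object
    if region is Region.REAR:
        return Layer.BACKGROUND     # shown on the display behind the object
    # Other regions may go to either layer; the background is the preferred one.
    return Layer.BACKGROUND if prefer_background_for_other else Layer.OVERLAY
```

Keeping the preference as a flag reflects that both choices are permitted for the other regions, with the background layer as the default.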
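When the overlay layer 52 is used, the additional virtual video 11 is rendered as video of the overlay layer 52, and the rendered overlay layer 52 is composited onto the captured video vC.
When the additional virtual video 11 is added to the background layer 50, the video of the additional virtual video 11 is also added when the background video vB is rendered using the 3D background model.
FIG. 34 is an example in which virtual heart-shaped and star-shaped additional virtual videos 11a and 11b are added to the captured video vC in which the background video vB and the object 10 are shot. In this example, based on the position of the person who is the object 10, the additional virtual video 11a in the front region 61 is drawn on the overlay layer 52, and the additional virtual video 11b in the rear region 62 and the other regions 63 and 64 is drawn on the background layer 50, that is, included in the background video vB.
For example, the position of the additional virtual video 11 (11a, 11b) is set according to the position, within the image, of the body of the person (object 10). In FIG. 33, the additional virtual video 11a overlaps the body of the object 10. FIG. 34 is an example in which the additional virtual video 11a is positioned on the face (cheek).
This is an example in which, at the timing of a touch by the user's finger 65, a lightning-like additional virtual video 11b is added to the background video vB of the background layer 50, starting from the position designated by the touch.
Of course, starting from a position designated by a touch or the like, it is also possible to add an additional virtual video 11 using the overlay layer 52, or one that spans both the background layer 50 and the overlay layer 52.
Until it is determined that shooting has ended, the processing of steps ST13 to ST23 is repeated at each one-frame timing of the background video vB and the captured video vC.
The effect start timing at which the additional virtual video 11 is applied is designated, for example, by a user's touch operation. It may also be designated by a predetermined user operation other than a touch operation.
Alternatively, the processing may be automatic, so that the effect applying the additional virtual video 11 is triggered when a specific subject is detected by image recognition processing. For example, a predetermined additional virtual video 11 is added when a smile is detected.
Processing may also be performed so that the effect of the additional virtual video 11 is triggered when a preset time is reached in terms of the time stamp of the video content.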
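The three kinds of effect start trigger described above (user operation, image recognition, time stamp) can be checked per frame as in the following sketch. The names `FrameEvents` and `effect_trigger`, the returned labels, and the priority order among the triggers are assumptions for illustration; the source does not say how competing triggers are resolved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FrameEvents:
    """Inputs observed for one frame (all fields are hypothetical)."""
    touch_pos: Optional[Tuple[int, int]] = None  # touch position on the screen
    smile_detected: bool = False                 # result of image recognition
    timestamp_s: float = 0.0                     # time stamp of the content


def effect_trigger(events, scheduled_s=None):
    """Return which trigger fires for this frame, or None if none does.

    Priority (touch, then recognition, then time stamp) is an assumption.
    """
    if events.touch_pos is not None:
        return "touch"
    if events.smile_detected:
        return "recognition"
    if scheduled_s is not None and events.timestamp_s >= scheduled_s:
        return "timestamp"
    return None
```

Because the time stamp condition stays true on every later frame, a caller would normally latch the first non-`None` result as the effect start and stop polling afterwards.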
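In this case, in step ST17 the background video vB is rendered using the 3D background model, for example in the same way as in the first embodiment.
That is, rendering is performed using the 3D background model based on the display size information and the relative position information RP, and the background video vB is generated.
In step ST16, one of the devices in the system performs the application setting of the additional virtual video 11. Specifically, one or both of the application setting for the background layer 50 and the application setting for the overlay layer 52 are performed.
The position of the additional virtual video 11 within the screen of the background video vB is also set. For example, it is set according to a touch position or according to object detection results such as recognition of the subject or bone recognition of fingers or the body.
Specifically, the settings include the image content of the additional virtual video 11 to be given to the background layer 50, the range of the additional virtual video 11 to be drawn according to the result of the region detection in step ST14, and the position within the screen designated according to an operation, image recognition, or the like.
That is, rendering is performed using the 3D background model based on the display size information and the relative position information RP, and the background video vB with the additional virtual video 11 added according to the settings of step ST16 is generated.
If the application setting of the additional virtual video 11 for the overlay layer 52 has not been made in step ST16 for the current frame, the terminal device 1 proceeds from step ST19 to step ST23 and displays the current frame of the captured video vC on the screen 1a as is.
This is the case when the current frame of the captured video vC belongs to a period in which the effect of the additional virtual video 11 is not triggered, or when, even during the triggered period, all of the additional virtual video 11 is added to the background layer 50 and the overlay layer 52 is not used.
In step ST22, the terminal device 1 can apply a filter to the whole of the composited captured video vC. For example, applying a painting-style filter or an anime-style filter performs filter processing as a kind of image effect.
Then, in step ST23, the terminal device 1 displays the captured video vC on the screen 1a.
Note that the filter processing of step ST22 may also be performed when the overlay layer 52 is not composited.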
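The per-frame path on the terminal side (composite the overlay layer if one was set, optionally filter, then display) can be sketched as below. This is a simplified illustration, not the source's implementation: frames are plain nested lists of RGB tuples, the overlay carries an 8-bit alpha channel, and `grayscale` merely stands in for a stylizing filter such as the painting-style or anime-style filters mentioned above. All function names are hypothetical.

```python
def composite(frame, overlay):
    """Alpha-blend an RGBA overlay onto an RGB frame of the same size."""
    out = []
    for f_row, o_row in zip(frame, overlay):
        row = []
        for (fr, fg, fb), (r, g, b, a) in zip(f_row, o_row):
            inv = 255 - a  # weight of the captured pixel
            row.append((
                (r * a + fr * inv + 127) // 255,  # +127 rounds to nearest
                (g * a + fg * inv + 127) // 255,
                (b * a + fb * inv + 127) // 255,
            ))
        out.append(row)
    return out


def grayscale(frame):
    """Stand-in for a stylizing filter: average the three channels."""
    return [[((r + g + b) // 3,) * 3 for r, g, b in row] for row in frame]


def process_frame(frame, overlay=None, filter_fn=None):
    """Composite the overlay if present, then filter if requested."""
    if overlay is not None:
        frame = composite(frame, overlay)
    if filter_fn is not None:
        frame = filter_fn(frame)  # may run even without an overlay
    return frame
```

Calling `process_frame(frame)` with no overlay and no filter returns the captured frame unchanged, which corresponds to displaying the frame as is.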
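A specific functional configuration for executing such processing, and a processing example for each device, are described next.
The region detection unit 44 can be realized as a processing function of the CPU 71 or the video processing unit 85, using information from, for example, the camera unit 87, the sensor unit 86, and the communication unit 80 of the information processing device 70.
When the effect of the additional virtual video 11 is triggered by a user's touch operation, the input operation reception unit 45 notifies the 3D model management unit 33 of the operation information. The input operation reception unit 45 can be realized by the CPU 71 or the video processing unit 85 detecting input from the input unit 76 of the information processing device 70.
The filter processing unit 49 performs filter processing, as an effect, on the video composited by the image composition unit 48.
The image composition unit 48 and the filter processing unit 49 are functions realized by, for example, the video processing unit 85 of the information processing device 70.
Then, in step S102, the terminal device 1 determines whether virtual production shooting has ended, and while it has not ended, repeats steps S103 to S106 at each frame timing of the captured video vC.
In step S150, the terminal device 1 uses the region detection unit 44 to detect the front region 61, the rear region 62, and the other regions 63 and 64 in the current frame of the captured video vC.
In step S203, the display device 2 uses the display control unit 37 to display the received background video vB on the screen 2a.
Accordingly, when the background video vB contains the additional virtual video 11, the display device 2 displays the background layer 50 with the additional virtual video 11 added.
When no additional virtual video 11 is added through the overlay layer 52, the terminal device 1 proceeds to step S106 and uses the display control unit 39 to display each frame of the captured video vC obtained by shooting on the screen 1a.
Then, in step S156, the terminal device 1 uses the image composition unit 48 to composite the rendered overlay layer 52 onto the current frame of the captured video vC.
Further, in step S157, the terminal device 1 executes filter processing by the filter processing unit 49 according to the settings.
The terminal device 1 then proceeds to step S106 and uses the display control unit 39 to display the composited captured video vC on the screen 1a.
The user can therefore view, in real time on the screen 1a, the captured video vC with the additional virtual video 11 added.
The user can also designate the position and timing at which the effect is triggered, for example by a touch panel operation.
For example, the background layer rendering unit 34 that adds the additional virtual video 11 to the background layer 50 may be provided on the display device 2 side instead of the terminal device 1 side (see FIG. 21), or on the cloud server 4 side (see FIGS. 23, 25, 27, and 29).
The overlay layer rendering unit 47 and the image composition unit 48 may be provided in the terminal device 1, or may be provided in the cloud server 4 so that the terminal device 1 and the cloud server 4 cooperate in processing the overlay layer 52 for the captured video vC.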
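The embodiments described above provide the following effects.
The information processing device 70 of the embodiments includes the video processing unit 85 having the function of the background layer rendering unit 34. When the object 10 and the background video vB displayed on the display device 2 are shot by the terminal device 1 in a state where the display device 2 and the terminal device 1 having a shooting function are associated with each other, the video processing unit 85 renders a 3D model based on the relative position information RP between the display device 2 and the terminal device 1 to generate the background video vB to be displayed on the display device 2.
For example, a user uses a display device 2 such as a television set at home and a terminal device 1 such as a smartphone to shoot the object 10 together with the background video vB displayed on the display device 2. Because the terminal device 1 and the display device 2 are associated as targets of relative position detection and relative position detection is performed, a background video vB corresponding to the viewpoint direction from the terminal device 1 toward the display device 2 can be generated and displayed on the display device 2. Shooting that applies virtual production techniques can therefore be performed easily outside a dedicated studio, for example at home.
Such an information processing device 70 can be regarded as a processor provided in the terminal device 1, or as the terminal device 1 itself equipped with such a processor. Alternatively, the information processing device 70 can be regarded as a processor provided in the display device 2, or as the display device 2 itself equipped with such a processor. Furthermore, the information processing device 70 can be regarded as a processor provided in a device separate from the display device 2 and the terminal device 1, for example the cloud server 4, or as that device itself, such as the cloud server 4, equipped with such a processor.
The terminal device 1 renders the 3D background model 5 according to the relative position, transmits the result to the display device 2, and has it displayed. The terminal device 1 then performs the shooting. As a result, a general television device, monitor device, or the like capable of receiving video can be used as the display device 2, and virtual production can be realized easily with the functions of the terminal device 1 wherever a display device 2 is available, such as at home or away from home.
When relative position detection is performed by the terminal device 1, the display device 2 can be configured to receive the relative position information RP from the terminal device 1, render the 3D background model, generate the background video vB, and display it. In this case, virtual production can be realized at home or elsewhere by introducing a display device 2 that includes the video processing unit 85 for rendering. The terminal device 1 only needs the function of detecting the relative position information RP and transmitting it to the display device 2, so its processing load is small and high processing capability is not required.
For example, the background video vB is rendered in an external device capable of communicating with one or both of the terminal device 1 and the display device 2. This makes it possible to realize virtual production using the terminal device 1 and the display device 2 while drawing on an external device rich in resources such as computing capability and storage capacity. The advantage is that the processing burden on the terminal device 1 and the display device 2 is small.
Although the cloud server 4 is given as the external device in the embodiments, the external device may also be, for example, a home server in a home network, a dedicated personal computer, a workstation, a smartphone, tablet, or PC other than the terminal device 1, or a so-called household electrical appliance such as video equipment. Any device will do as long as it can function as the information processing device 70 including at least the video processing unit 85 of the present technology.
For example, the background video vB is rendered in the cloud server 4 that the terminal device 1 or the display device 2 can access over a network. This makes it possible to realize virtual production using the terminal device 1 and the display device 2 by using the processing functions of the cloud server 4. The advantage is that the processing burden on the terminal device 1 and the display device 2 is small. Using the processing capability of the cloud server 4, it is also possible to render a high-definition background video vB from, for example, a 3D background model 5 with a large data size. In addition, offering this to users as a service of the cloud server 4 can make video production by virtual production widely available.
When relative position detection is performed in the terminal device 1, it suffices, for example, for the cloud server 4 to be able to receive the relative position information RP from the terminal device 1. This allows an external device other than the terminal device 1 and the display device 2 to render based on the relative position information RP at each point in time, and to generate a background video vB corresponding to the viewpoint position of the terminal device 1 at each point in time.
Although not given in the embodiments, the relative position information RP detected on the display device 2 side may, for example, be transferred to the terminal device 1 and transmitted from the terminal device 1 to the cloud server 4. This is one method of transmitting the relative position information RP in an environment where the terminal device 1 accesses the cloud server 4.
For example, when relative position detection is performed in the display device 2 as in the fifth embodiment, it suffices for the cloud server 4 to be able to receive the relative position information RP from the display device 2. Also, even when relative position detection is performed in the terminal device 1 as in the sixth embodiment, the relative position information RP may be transferred to the display device 2 side and transmitted from the display device 2 to the cloud server 4. Either way, an external device other than the terminal device 1 and the display device 2 can render based on the relative position information RP at each point in time, and can generate a background video vB corresponding to the viewpoint position of the terminal device 1 at each point in time.
The background video vB generated by the cloud server 4 is transmitted to the terminal device 1 and then from the terminal device 1 to the display device 2. This allows a background video vB generated by an external device communicating with the terminal device 1 to be displayed on the display device 2 in real time.
Although not given in the embodiments, a configuration is also conceivable in which, for example, the cloud server 4 renders the background video vB based on relative position information received from the display device 2, transmits it to the terminal device 1, and the terminal device 1 transmits it to the display device 2.
The background video vB generated by the cloud server 4 is transmitted to the display device 2. This allows a background video vB generated by an external device communicating with the terminal device 1 to be displayed on the display device 2 in real time. In addition, transmitting the background video vB to the display device 2 without passing through the terminal device 1 reduces the required communication volume, which helps reduce the communication load and improve communication speed and efficiency.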
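The captured video vC includes the additional virtual video 11 in addition to the background video vB based on the 3D background model 5 and the video of the object 10. This enables more varied video expression using the additional virtual video 11 in simple virtual production.
The additional virtual video 11 can be added in real time during shooting, or after shooting as post-production.
That is, the additional virtual video 11 is added in real time during shooting. This provides a video effect that is simple and easy for the user to check.
For example, during shooting, the effect of the additional virtual video 11 is triggered in response to a user's touch operation or the like. This provides a shooting environment in which the user can trigger a video effect at the desired timing.
For example, parameters such as the type of the additional virtual video 11 and its position within the video are determined according to the type of the object, its position, its size within the captured video vC, and so on. Face recognition, bone recognition, or the like of a person as the object 10 allows the additional virtual video 11 to be added at an appropriate place in the video.
For example, the additional virtual video 11 is included in the overlay layer 52, like the additional virtual video 11a in FIGS. 33 and 34. This allows virtual video to be added in the front region 61 of the real object 10.
Such virtual video addition processing can be realized by the functions of the overlay layer rendering unit 47 and the image composition unit 48 in the video processing unit 85.
For example, the additional virtual video 11 is included in the background video vB, like the additional virtual video 11b in FIGS. 34 and 35. This allows virtual video to be added in the regions on the rear side of the object 10 (the other regions 63 and 64 and the rear region 62).
Such virtual video addition processing can be realized by rendering in the background layer rendering unit 34. In particular, because the additional virtual video 11 is added to the background video vB, it is also reflected on the real object 10. More realistic video expression, in which the additional virtual video 11 is reflected on the real object 10, can therefore be realized easily. This also means that difficult work such as adding reflections in post-production is not required.
Such virtual video addition processing can be realized by the function of the background layer rendering unit 34 in the video processing unit 85.
By determining, for each frame, the front region 61, the rear region 62, and the other regions 63 and 64 as regions around the object 10, an additional virtual video 11 that takes the positional relationship with the object 10 into account can be added.
For example, during shooting, the effect is triggered when the user touches an arbitrary position on the screen. This provides the user with a shooting environment in which a video effect can be triggered at an arbitrary position on the screen and at an arbitrary timing.
The captured video vC, in which the background video vB and the object 10 are shot using a terminal device 1 such as a smartphone, is displayed on that terminal device 1, so the user can shoot while viewing the captured video vC. In other words, simple virtual production using the terminal device 1 can be realized.
Also, when the additional virtual video 11 is added to the captured video vC in real time, the additional virtual video 11 can likewise be checked on the terminal device 1 while shooting.
For example, a camera such as an interchangeable-lens camera or a compact digital camera can be realized as the information processing device 70 of the present disclosure by having the functions described in the embodiments. In particular, using a camera with high-resolution processing and high lens precision makes it possible to shoot higher-quality virtual production video.
This function may also be implemented in a watch device with a camera. This makes it possible to shoot without holding equipment in the hand.
This allows the captured video vC to include, in addition to the object 10 and the background video vB, an object 12 behind the background.
First, the terminal device 1 used for shooting is panned around the surroundings to shoot them, and the light source is estimated. The brightness of the background video vB is then changed to match the ambient light at the shooting site. After shooting, the overall brightness is changed to roughly adjust it to the intended brightness. In this way, the video content being produced can be brought to the intended brightness while accommodating the lighting conditions at the shooting site.
It is therefore conceivable to move the display device 2 in front of the terminal device 1 with a drone, a cart, or the like so that the shooting range (angle of view) does not extend beyond the screen 2a of the display device 2.
Furthermore, when the captured video vC extends outside the background video vB, the background can be drawn on the overlay layer 52 in the region that extends outside, so that the overflow beyond the background video vB is not noticeable in the video.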
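Filling the overflow region from the overlay layer can be pictured with a mask: pixels of the captured frame that lie outside the screen area are replaced with rendered background. The sketch below makes strong simplifying assumptions that are not in the source: the screen appears in the captured frame as an axis-aligned rectangle given in pixel coordinates, the backdrop has already been rendered at frame resolution, and the object does not overlap the overflow region (otherwise a subject mask would also be needed). The function names are hypothetical.

```python
def outside_display_mask(frame_w, frame_h, display_rect):
    """Rows of booleans: True where a pixel lies outside the screen area.

    display_rect is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = display_rect
    return [
        [not (x0 <= x < x1 and y0 <= y < y1) for x in range(frame_w)]
        for y in range(frame_h)
    ]


def fill_overflow(frame, backdrop, mask):
    """Take backdrop pixels wherever the mask is True, frame pixels elsewhere."""
    return [
        [b if m else f for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(frame, backdrop, mask)
    ]
```

In practice the screen would appear as a general quadrilateral in the captured frame, so the rectangle test would be replaced by a point-in-quadrilateral test derived from the relative position.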
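Ambient light may also be reflected on the screen 2a of the display device 2.
In this case, rotating the screen 2a of the display device 2 so that the lighting is not reflected changes its orientation relative to the terminal device 1, but this can be handled by distorting the background video vB displayed on the display device 2 to match the rotation of the screen 2a.
That is, the program of the embodiments is a program applicable when an object and video displayed on the display device are shot by the terminal device in a state where the display device 2 and the terminal device 1 having a shooting function are associated with each other, and it causes the information processing device 70 to execute video processing that renders a 3D model based on the relative position information RP between the display device 2 and the terminal device 1 to generate the video (background video vB) to be displayed on the display device 2.
With such a program, the information processing device 70 serving as the terminal device 1, the display device 2, or the cloud server 4 described above can be realized by various computer devices.
Such a program can be installed on a personal computer or the like from a removable recording medium, or downloaded from a download site over a network such as a LAN (Local Area Network) or the Internet.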
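(1)
An information processing device including a video processing unit that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.
(2)
The information processing device according to (1) above, in which
the video processing unit is provided in the terminal device, and
the video obtained by rendering the 3D model in the video processing unit is transmitted to the display device.
(3)
The information processing device according to (1) above, in which
the video processing unit is provided in the display device, and
the video processing unit renders the 3D model based on the relative position information received from the terminal device to generate the video to be displayed.
(4)
The information processing device according to (1) above, in which
the video processing unit is provided in an external device that is separate from both the terminal device and the display device, and
the video processing unit renders the 3D model based on the received relative position information to generate the video to be displayed on the display device, and transmits the generated video.
(5)
The information processing device according to (4) above, in which
the external device is a cloud server.
(6)
The information processing device according to (4) or (5) above, in which
the video processing unit renders the 3D model based on the relative position information received from the terminal device to generate the video to be displayed.
(7)
The information processing device according to (4) or (5) above, in which
the video processing unit renders the 3D model based on the relative position information received from the display device to generate the video to be displayed.
(8)
The information processing device according to any one of (4) to (7) above, in which
the video processing unit performs processing of transmitting the video generated by rendering the 3D model to the terminal device.
(9)
The information processing device according to any one of (4) to (7) above, in which
the video processing unit performs processing of transmitting the video generated by rendering the 3D model to the display device.
(10)
The information processing device according to any one of (1) to (9) above, in which
the video processing unit performs virtual video addition processing so that captured video, obtained by shooting the video displayed on the display device and the object with the terminal device, includes additional virtual video together with the video based on the 3D model and the video of the object.
(11)
The information processing device according to (10) above, in which
the video processing unit performs the virtual video addition processing so that the captured video includes the additional virtual video, in processing of each frame of the video during shooting by the terminal device.
(12)
The information processing device according to (10) or (11) above, in which
the video processing unit starts the virtual video addition processing in response to a predetermined operation on the terminal device.
(13)
The information processing device according to any one of (10) to (12) above, in which
the video processing unit makes settings for the virtual video addition processing based on image recognition processing of the captured video.
(14)
The information processing device according to any one of (10) to (13) above, in which
the virtual video addition processing is processing of adding the additional virtual video to a layer that overlays the video of the object in the captured video.
(15)
The information processing device according to any one of (10) to (14) above, in which
the virtual video addition processing is processing of adding the additional virtual video to the video that is generated by rendering the 3D model and displayed on the display device.
(16)
The information processing device according to any one of (10) to (15) above, in which
the video processing unit determines regions around the object in the captured video and performs the virtual video addition processing based on the determination.
(17)
The information processing device according to any one of (10) to (16) above, in which
the captured video, obtained by the terminal device shooting the video displayed on the display device and the object, is displayed and output on a display unit of the terminal device,
the screen of the display unit serves as an input unit, and
the terminal device starts the virtual video addition processing in response to a touch operation on the input unit.
(18)
The information processing device according to any one of (1) to (16) above, in which
the captured video, obtained by the terminal device shooting the video displayed on the display device and the object, is displayed and output on the terminal device.
(19)
A video processing method in which an information processing device performs video processing that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.
(20)
A program that causes an information processing device to execute video processing that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.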
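2 Display device
3 AR marker
4 Cloud server
5 3D background model
10, 12 Object
11, 11a, 11b Additional virtual video
16 Invalid region frame
31 Display size detection unit
32 Relative position detection unit
33 3D model management unit
34 Background layer rendering unit
35, 36 Communication control unit
37 Display control unit
38 Imaging unit
39 Display control unit
40 Communication control unit
44 Region detection unit
45 Input operation reception unit
46 Image recognition processing unit
47 Overlay layer rendering unit
48 Image composition unit
49 Filter processing unit
70 Information processing device
71 CPU
85 Video processing unit
vB Background video
vC Captured video
RP Relative position information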
Claims (20)
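- An information processing device comprising a video processing unit that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.
- The information processing device according to claim 1, wherein the video processing unit is provided in the terminal device, and the video obtained by rendering the 3D model in the video processing unit is transmitted to the display device.
- The information processing device according to claim 1, wherein the video processing unit is provided in the display device, and the video processing unit renders the 3D model based on the relative position information received from the terminal device to generate the video to be displayed.
- The information processing device according to claim 1, wherein the video processing unit is provided in an external device that is separate from both the terminal device and the display device, and the video processing unit renders the 3D model based on the received relative position information to generate the video to be displayed on the display device, and transmits the generated video.
- The information processing device according to claim 4, wherein the external device is a cloud server.
- The information processing device according to claim 4, wherein the video processing unit renders the 3D model based on the relative position information received from the terminal device to generate the video to be displayed.
- The information processing device according to claim 4, wherein the video processing unit renders the 3D model based on the relative position information received from the display device to generate the video to be displayed.
- The information processing device according to claim 4, wherein the video processing unit performs processing of transmitting the video generated by rendering the 3D model to the terminal device.
- The information processing device according to claim 4, wherein the video processing unit performs processing of transmitting the video generated by rendering the 3D model to the display device.
- The information processing device according to claim 1, wherein the video processing unit performs virtual video addition processing so that captured video, obtained by shooting the video displayed on the display device and the object with the terminal device, includes additional virtual video together with the video based on the 3D model and the video of the object.
- The information processing device according to claim 10, wherein the video processing unit performs the virtual video addition processing so that the captured video includes the additional virtual video, in processing of each frame of the video during shooting by the terminal device.
- The information processing device according to claim 10, wherein the video processing unit starts the virtual video addition processing in response to a predetermined operation on the terminal device.
- The information processing device according to claim 10, wherein the video processing unit makes settings for the virtual video addition processing based on image recognition processing of the captured video.
- The information processing device according to claim 10, wherein the virtual video addition processing is processing of adding the additional virtual video to a layer that overlays the video of the object in the captured video.
- The information processing device according to claim 10, wherein the virtual video addition processing is processing of adding the additional virtual video to the video that is generated by rendering the 3D model and displayed on the display device.
- The information processing device according to claim 10, wherein the video processing unit determines regions around the object in the captured video and performs the virtual video addition processing based on the determination.
- The information processing device according to claim 10, wherein the captured video, obtained by the terminal device shooting the video displayed on the display device and the object, is displayed and output on a display unit of the terminal device, the screen of the display unit serves as an input unit, and the terminal device starts the virtual video addition processing in response to a touch operation on the input unit.
- The information processing device according to claim 1, wherein the captured video, obtained by the terminal device shooting the video displayed on the display device and the object, is displayed and output on the terminal device.
- A video processing method in which an information processing device performs video processing that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.
- A program that causes an information processing device to execute video processing that, when an object and video displayed on a display device are shot by a terminal device having a shooting function in a state where the display device and the terminal device are associated with each other, renders a 3D model based on relative position information between the display device and the terminal device to generate the video to be displayed on the display device.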
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22895324.6A EP4436159A4 (en) | 2021-11-17 | 2022-10-19 | Information processing apparatus, image processing method, and program |
| US18/697,017 US12621404B2 (en) | 2021-11-17 | 2022-10-19 | Information processing device, video processing method, and program |
| CN202280074736.6A CN118216136A (zh) | 2021-11-17 | 2022-10-19 | 信息处理装置、图像处理方法和程序 |
| JP2023561472A JPWO2023090038A1 (ja) | 2021-11-17 | 2022-10-19 | |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021-186952 | 2021-11-17 | ||
| JP2021186952 | 2021-11-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023090038A1 (ja) | 2023-05-25 |
Family
ID=86396668
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2022/038981 Ceased WO2023090038A1 (ja) | Information processing device, video processing method, and program | 2021-11-17 | 2022-10-19 |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4436159A4 (ja) |
| JP (1) | JPWO2023090038A1 (ja) |
| CN (1) | CN118216136A (ja) |
| WO (1) | WO2023090038A1 (ja) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2025177810A1 (ja) * | 2024-02-19 | 2025-08-28 | ソニーグループ株式会社 | Information processing device, information processing method, and program |
| WO2025205012A1 (ja) * | 2024-03-29 | 2025-10-02 | ソニーグループ株式会社 | Information processing device, information processing method, and program |
| EP4686190A1 (en) * | 2024-07-22 | 2026-01-28 | Canon Kabushiki Kaisha | Image processing apparatus, image processing method, and computer program |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2019082680A (ja) * | 2017-10-27 | 2019-05-30 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Calibration method, apparatus, and operating method for three-dimensional display device |
| US20200145644A1 (en) | 2018-11-06 | 2020-05-07 | Lucasfilm Entertainment Company Ltd. LLC | Immersive content production system with multiple targets |
| US20210342971A1 (en) * | 2020-04-29 | 2021-11-04 | Lucasfilm Entertainment Company Ltd. | Photogrammetric alignment for immersive content production |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101690955B1 (ko) * | 2010-10-04 | 2016-12-29 | 삼성전자주식회사 | 증강 현실을 이용한 영상 데이터 생성 방법 및 재생 방법, 그리고 이를 이용한 촬영 장치 |
| JP5145444B2 (ja) * | 2011-06-27 | 2013-02-20 | 株式会社コナミデジタルエンタテインメント | 画像処理装置、画像処理装置の制御方法、及びプログラム |
| US9710972B2 (en) * | 2014-05-30 | 2017-07-18 | Lucasfilm Entertainment Company Ltd. | Immersion photography with dynamic matte screen |
| BR102019006465A2 (pt) * | 2019-03-29 | 2020-10-13 | Globo Comunicação E Participações S.a. | Sistema e método de captação e projeção de imagem, e, uso do sistema. |
-
2022
- 2022-10-19 WO PCT/JP2022/038981 patent/WO2023090038A1/ja not_active Ceased
- 2022-10-19 JP JP2023561472A patent/JPWO2023090038A1/ja active Pending
- 2022-10-19 CN CN202280074736.6A patent/CN118216136A/zh active Pending
- 2022-10-19 EP EP22895324.6A patent/EP4436159A4/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4436159A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2023090038A1 (ja) | 2023-05-25 |
| EP4436159A1 (en) | 2024-09-25 |
| CN118216136A (zh) | 2024-06-18 |
| EP4436159A4 (en) | 2025-02-26 |
| US20240406338A1 (en) | 2024-12-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22895324; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2023561472; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280074736.6; Country of ref document: CN |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022895324; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022895324; Country of ref document: EP; Effective date: 20240617 |