WO2018121524A1 - Data processing method and device, acquisition device, and storage medium - Google Patents
Data processing method and device, acquisition device, and storage medium
- Publication number
- WO2018121524A1 PCT/CN2017/118600
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- orientation
- audio data
- information
- devices
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more two-dimensional [2D] image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/698—Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; ELECTRIC HEARING AIDS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers
- H04R3/12—Circuits for transducers for distributing signals to two or more loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
- H04S5/005—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo five- or more-channel type, e.g. virtual surround
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/301—Automatic calibration of stereophonic sound system, e.g. with test microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/307—Frequency adjustment, e.g. tone control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- the present invention relates to virtual reality (VR) technology, and in particular, to a data processing method and device, an acquisition device, and a storage medium.
- VR virtual reality
- VR technology is an important direction of simulation technology, and it is a challenging cross-cutting discipline and research field.
- VR technology mainly refers to a highly realistic computer simulation environment in terms of sight, hearing, touch, smell and taste.
- VR technology uses computer technology to simulate a three-dimensional virtual world, allowing users to perceive things in the virtual space in real time and without restriction.
- VR technology is a comprehensive technology of multiple disciplines, including: computer graphics technology, multimedia technology, sensor technology, human-computer interaction technology, network technology, stereo imaging technology and computer simulation technology.
- the embodiment of the invention provides a data processing method and device, an acquisition device and a storage medium.
- the embodiment of the invention provides a data processing method, including:
- the collection space corresponding to the collection device forms a geometry; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; and N audio collection devices are set at the set orientation of each video capture device; N is a positive integer;
- for the N audio collection devices corresponding to the set orientation of each video capture device, the audio data collected by the N audio collection devices is encoded according to the spatial information of the audio collection devices to form M-channel audio data; the M-channel audio data carries spatial information of the audio; M is a positive integer.
- the spatial information of the audio carried in the M-channel audio data is at least one of the following:
- physical or sound spatial information of the N audio capture devices;
- spatial information of the center point of the captured video.
- the spatial information of the audio carried in the M-channel audio data is expressed in at least one of the following:
- the video data and the audio data collected by the collecting device satisfy at least one of the following:
- the video data collected by all video capture devices can be restored to a sphere
- the audio data collected by all audio capture devices can be restored to a sphere.
- the method further includes:
- the embodiment of the invention further provides a data processing method, including:
- Q is a positive integer
- M channel audio data is rendered using the determined Q speaker devices.
- the method further includes:
- the audio data rendered by each speaker device is adjusted according to the orientation information of the Q speaker devices and the spatial information of the audio device.
- the M channel audio data is rendered by using the determined Q speaker devices, including at least one of the following:
- where the orientation of a speaker is consistent with the vector of an audio collection device, the speaker renders the corresponding audio data;
- the speakers corresponding to the orientation corresponding to the spatial information of the audio are rendered one by one, and the speakers that do not correspond to the orientation corresponding to the spatial information of the audio are not rendered;
- the at least two speakers that satisfy the preset condition render the same audio data; the preset condition represents that the distance between the orientation of the speaker and the orientation corresponding to the spatial information of the audio is less than the preset distance.
- the embodiment of the present invention further provides a collection device, where the collection space corresponding to the collection device forms a geometry, and the collection device includes: a video collection device and an audio collection device;
- the spatial orientation of the video capture device deployed in the capture device covers the entire geometry; the set orientation of each video capture device is correspondingly set with N audio capture devices; N is a positive integer.
- the collecting device further includes:
- a mobile device configured to receive a control command and, in response to the control command, either move the collection device so that it collects data while moving, or keep the collection device stationary so that it collects data in a stationary state.
- the video capture device and the audio capture device are set at least one of the following positions:
- the video data collected by all video capture devices can be restored to a sphere
- the audio data collected by all audio capture devices can be restored to a sphere.
- At least one video capture device is disposed on each side of the geometry.
- the collecting device further includes:
- a processor configured to, for the N audio collection devices corresponding to the set orientation of each video capture device, acquire the spatial information of the audio collection devices of the collection device and encode the audio data collected by the N audio collection devices according to that spatial information to form M-channel audio data; the M-channel audio data carries spatial information of the audio; M is a positive integer.
- the embodiment of the invention further provides a data processing device, including:
- an acquiring unit configured to acquire spatial information of the audio collection devices of the collection device; the collection space corresponding to the collection device forms a geometry; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; and N audio collection devices are set at the set orientation of each video capture device; N is a positive integer;
- a coding unit configured to, for the N audio collection devices corresponding to the set orientation of each video capture device, encode the audio data collected by the N audio collection devices according to the spatial information of the audio collection devices to form M-channel audio data;
- the M channel audio data carries spatial information of the audio; M is a positive integer.
- the device further includes:
- a processing unit configured to store or issue M-channel audio data.
- the embodiment of the invention further provides a data processing device, including:
- a receiving unit configured to receive the encoded M channel audio data
- a decoding unit configured to decode the encoded M-channel audio data to obtain spatial information of the corresponding audio;
- M is a positive integer;
- the first determining unit is configured to determine, according to the obtained spatial information of the audio and the orientation information of the speaker device, the Q speaker devices corresponding to the M channel audio data; Q is a positive integer;
- a rendering unit configured to render M-channel audio data using the determined Q speaker devices.
- the device further includes:
- a second determining unit configured to obtain orientation information of the viewpoint and/or the region of interest on the projection map according to the motion posture of the user
- the rendering unit is further configured to adjust audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio device.
- Embodiments of the present invention also provide a storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of any of the above methods.
- the collection space corresponding to the collection device forms a geometry; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; and N audio collection devices are set at the set orientation of each video capture device; N is a positive integer. Because the spatial orientations of the video capture devices and the audio collection devices cover the entire geometry, true omnidirectional audio acquisition can be realized.
- for the N audio collection devices corresponding to the set orientation of each video capture device, the spatial information of the audio collection devices of the collection device is acquired, and the audio data collected by the N audio collection devices is encoded according to that spatial information to form M-channel audio data; the M-channel audio data carries spatial information of the audio; M is a positive integer. After the encoded M-channel audio data is received, it is decoded to obtain the spatial information of the corresponding audio; the Q speaker devices corresponding to the M-channel audio data are determined according to the obtained spatial information of the audio and the orientation information of the speaker devices; Q is a positive integer; and the M-channel audio data is rendered using the determined Q speaker devices.
- because N audio collection devices are set at the set orientation of each video capture device and the audio data carries corresponding spatial information, the audio data can be presented immersively, and the spatial orientations of the audio and the video can be rendered in a matched and synchronized manner.
- FIG. 1 is a schematic flowchart of a method for data processing according to an embodiment of the present invention
- FIG. 2 is a schematic flowchart of another method for data processing according to Embodiment 1 of the present invention.
- FIG. 3 is a schematic structural diagram of a collection device according to Embodiment 2 of the present invention.
- FIGS. 4A-4B are schematic structural diagrams of a collection device according to Embodiment 3 of the present invention.
- FIGS. 5A-5B are schematic structural diagrams of another collection device according to Embodiment 3 of the present invention.
- FIGS. 6A-6B are schematic structural diagrams of another collection device according to Embodiment 3 of the present invention.
- FIG. 7 is a schematic diagram of spatial information in a spherical coordinate system according to Embodiment 4 of the present invention.
- FIG. 8 is a schematic diagram of spatial information of the surface information type for a cube according to Embodiment 4 of the present invention.
- FIG. 9 is a schematic diagram of spatial information after spatial information expansion shown in FIG. 8 according to Embodiment 4 of the present invention.
- FIG. 10 is a schematic diagram of audio rendering when M is larger than the number of speakers according to Embodiment 5 of the present invention.
- FIG. 11 is a schematic diagram of audio rendering when M is equal to the number of speakers according to Embodiment 5 of the present invention.
- FIGS. 12A-12B are schematic diagrams of audio rendering when M is smaller than the number of speakers according to Embodiment 5 of the present invention.
- FIGS. 13A-13B are schematic diagrams of audio rendering based on an interest region or a viewpoint according to Embodiment 6 of the present invention.
- FIGS. 14A-14B are further schematic diagrams of audio rendering based on an area of interest or a viewpoint.
- FIG. 15 is a schematic diagram of an audio rendering process based on an interest area or a viewpoint according to Embodiment 6 of the present invention.
- FIG. 16 is a schematic structural diagram of a data processing apparatus according to Embodiment 7 of the present invention.
- FIG. 17 is a schematic structural diagram of another data processing apparatus according to Embodiment 7 of the present invention.
- VR audio capture devices usually only have 1 to 2 microphones, so true omnidirectional audio capture is not achieved.
- the transmission and storage process currently also focuses on video, including tiled storage and transmission of video based on video orientation, viewpoint (VP), and region of interest (ROI).
- because audio collection does not carry spatial information related to the video of the corresponding area, and the transmission and storage processes do not consider carrying spatial information (including three-dimensional space coordinates or surface information), the matched rendering of audio and video orientation for the VP and ROI has received little consideration.
- even where the audio and video of a virtual reality acquisition are rendered together, the rendering does not achieve orientation synchronization.
- the spatial information of the audio collection device of the collection device is acquired; the collection space corresponding to the collection device forms a geometry; and the spatial orientation of the video capture device deployed in the collection device covers the entire Geometry; each video capture device is set with N audio capture devices; N is a positive integer; according to the spatial information of the audio capture device, the audio data collected by the N audio capture devices is encoded to form M-channel audio data.
- the M channel audio data carries spatial information of the audio; M is a positive integer.
- the embodiment of the invention provides a data processing method, as shown in FIG. 1 , the method includes:
- Step 101 Acquire spatial information of an audio collection device of the collection device.
- the collection space corresponding to the collection device forms a geometry; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; and N audio collection devices are set at the set orientation of each video capture device; N is a positive integer.
- the spatial orientation of the video capture device deployed in the capture device covers the entire geometry.
- the at least two video capture devices of the capture device are disposed at spatial orientations corresponding to the geometry, and the spatial orientations at which they are deployed cover the entire surface of the geometry.
- all of the video capture devices and audio capture devices set up on the capture device cover the entire geometry surface in a spatial orientation.
- the collection device may be a VR collection device.
- the audio collection device may be a microphone; the video capture device may be a camera.
- the collected audio data sent by the collection device can carry the spatial information of the audio collection device, and the spatial information of each audio collection device can be obtained.
- the spatial information of the audio collection device can be one of the following:
- At least one of the video data and the audio data collected by the collection device meets at least one of the following:
- the video data collected by all video capture devices can be directly restored to a sphere
- the audio data collected by all audio capture devices can be directly restored to a sphere.
- Step 102 Encode, according to the spatial information of the audio collection devices, the audio data collected by the N audio collection devices corresponding to the set orientation of each video capture device, to form M-channel audio data.
- the M channel audio data carries spatial information of the audio; M is a positive integer.
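
A minimal Python sketch of how encoded channels might carry the spatial information of their microphones. The names `Microphone`, `SpatialChannel`, and `encode_with_spatial_info` are hypothetical, the spatial information is assumed to be an azimuth/elevation pair, and M is assumed equal to N; the source does not fix any of these choices, and no real audio codec is applied here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Microphone:
    # Hypothetical description of one audio collection device.
    azimuth_deg: float    # horizontal angle of the microphone on the geometry
    elevation_deg: float  # vertical angle of the microphone on the geometry
    samples: Sequence[float]


@dataclass
class SpatialChannel:
    # One channel of the M-channel audio data plus the spatial information it carries.
    azimuth_deg: float
    elevation_deg: float
    samples: List[float]


def encode_with_spatial_info(mics: Sequence[Microphone]) -> List[SpatialChannel]:
    """Attach each microphone's spatial information to its audio (assumes M == N)."""
    return [
        SpatialChannel(m.azimuth_deg, m.elevation_deg, list(m.samples))
        for m in mics
    ]


mics = [
    Microphone(0.0, 0.0, [0.1, 0.2]),
    Microphone(90.0, 0.0, [0.3, 0.4]),
    Microphone(0.0, 90.0, [0.5, 0.6]),
]
channels = encode_with_spatial_info(mics)
```

A receiver that decodes such channels can read the orientation of each channel directly, which is what the rendering steps rely on.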
- the format of the encoded audio data may be MPEG audio, AAC, MP3, G.711, or the like.
- the spatial information of the audio carried in the M-channel audio data is at least one of the following:
- physical or sound spatial information of the N audio capture devices;
- spatial information of the center point of the captured video.
- the spatial information of the audio carried in the M-channel audio data may be expressed in at least one of the following:
- the encoded audio data can be saved or transmitted to achieve virtual reality interactivity.
- the method may further include:
- an embodiment of the present invention further provides a data processing method, which can be considered as a method for implementing virtual reality interactivity. As shown in Figure 2, the method includes:
- Step 201 Receive encoded M-channel audio data.
- the received encoded M-channel audio data is encoded data obtained by the method shown in FIG. 1.
- Step 202 Decode the encoded M-channel audio data to obtain spatial information of the corresponding audio.
- M is a positive integer.
- Step 203 Determine, according to the obtained spatial information of the audio and the orientation information of the speaker device, the Q speaker devices corresponding to the M channel audio data;
- Q is a positive integer.
- the total number of speaker devices is L;
- L is a positive integer;
- taking the orientation of each speaker device as the center, at least one channel of audio data whose spatial information lies within a preset radius corresponds to that speaker device;
- each channel of audio data corresponds to one speaker device;
- Q speaker devices are selected from the L speaker devices; or, taking the spatial orientation of each channel of audio data as the center, at least one speaker device within the preset radius is used as the speaker device for that channel of audio data.
- the preset radius can be set as needed.
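
The speaker-centered selection can be sketched in Python as follows. This is an illustration under stated assumptions: orientations are (azimuth, elevation) pairs in degrees, the "preset radius" is interpreted as an angular distance, and the helper names `angular_distance_deg` and `select_speakers` are hypothetical rather than taken from the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Direction = Tuple[float, float]  # (azimuth_deg, elevation_deg), an assumed representation


def to_unit_vector(direction: Direction) -> Tuple[float, float, float]:
    """Convert an (azimuth, elevation) pair in degrees to a 3-D unit vector."""
    az, el = (math.radians(a) for a in direction)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_distance_deg(a: Direction, b: Direction) -> float:
    """Angle in degrees between two directions."""
    ua, ub = to_unit_vector(a), to_unit_vector(b)
    dot = sum(x * y for x, y in zip(ua, ub))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_speakers(
    audio_dirs: Sequence[Direction],
    speaker_dirs: Sequence[Direction],
    preset_radius_deg: float,
) -> Dict[int, List[int]]:
    """Map each selected speaker index to the audio channels within the preset radius."""
    mapping: Dict[int, List[int]] = {}
    for s, s_dir in enumerate(speaker_dirs):
        near = [
            c for c, a_dir in enumerate(audio_dirs)
            if angular_distance_deg(a_dir, s_dir) <= preset_radius_deg
        ]
        if near:  # speakers with no nearby audio are not selected
            mapping[s] = near
    return mapping


audio_dirs = [(0.0, 0.0), (10.0, 0.0), (180.0, 0.0)]      # M = 3 channels
speaker_dirs = [(5.0, 0.0), (90.0, 0.0), (175.0, 0.0)]    # L = 3 speakers
mapping = select_speakers(audio_dirs, speaker_dirs, preset_radius_deg=15.0)
q = len(mapping)  # number of selected speakers
```

In this example two of the three speakers are selected, and the first one receives two channels, which is the situation in which mixing becomes necessary.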
- Step 204 Render M channel audio data by using the determined Q speaker devices.
- in some cases the audio data to be rendered by a speaker device is not one channel but at least two channels; in this case, the at least two channels of audio data need to be mixed, that is, mixing processing is performed.
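
As a rough illustration of the mixing step, the sketch below averages the channels assigned to one speaker sample by sample. Equal-weight averaging is an assumption chosen for simplicity; the source does not specify the mixing rule, and `mix_channels` is a hypothetical name.

```python
from typing import List, Sequence


def mix_channels(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels sample by sample into one output signal."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(length)]


# Two channels assigned to the same speaker are reduced to one signal.
mixed = mix_channels([[0.5, 0.25, -0.5], [0.25, 0.75, 0.0]])
```

Averaging keeps the result within the range of the inputs, which avoids clipping at the cost of lowering the overall level.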
- using the determined Q speaker devices to render the M channel audio data may include at least one of the following:
- where the orientation of a speaker is consistent with the vector of an audio collection device, the speaker renders the corresponding audio data;
- the speakers corresponding to the orientation corresponding to the spatial information of the audio are rendered one by one, and the speakers that do not correspond to the orientation corresponding to the spatial information of the audio are not rendered;
- the at least two speakers that satisfy the preset condition render the same audio data; the preset condition represents that the distance between the orientation of the speaker and the orientation corresponding to the spatial information of the audio is less than the preset distance.
- the preset radius and the preset distance may be set as needed.
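
The rendering rules above can be sketched as a per-speaker plan. For brevity this Python example assumes that orientations are azimuths only and that the "preset distance" is an angular gap in degrees; `assign_audio_to_speakers` is a hypothetical helper, not a function defined by the source.

```python
from typing import List, Optional, Sequence


def azimuth_gap_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    gap = abs(a - b) % 360.0
    return min(gap, 360.0 - gap)


def assign_audio_to_speakers(
    audio_az: Sequence[float],
    speaker_az: Sequence[float],
    preset_distance_deg: float,
) -> List[Optional[int]]:
    """For each speaker, return the nearest audio channel index, or None if not rendered."""
    plan: List[Optional[int]] = []
    for s_az in speaker_az:
        gaps = [azimuth_gap_deg(s_az, a_az) for a_az in audio_az]
        if not gaps:
            plan.append(None)
            continue
        best = min(range(len(gaps)), key=gaps.__getitem__)
        # Render only when the gap is less than the preset distance.
        plan.append(best if gaps[best] < preset_distance_deg else None)
    return plan


# Two audio orientations, four speakers: the speakers at 350 and 10 degrees
# both satisfy the preset condition for channel 0 and render the same audio.
plan = assign_audio_to_speakers([0.0, 180.0], [350.0, 10.0, 90.0, 180.0], 20.0)
```

The speaker at 90 degrees matches no audio orientation and is therefore left unrendered in this plan.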
- audio rendering can also be performed in conjunction with the region of interest or viewpoint of the video.
- the method may further include:
- the audio data rendered by each speaker device is adjusted according to the orientation information of the Q speaker devices and the spatial information of the audio device.
- the orientation information of the speaker device may be the orientation information corresponding to the projection mapping body, or may not be the corresponding orientation information on the projection mapping body.
- the orientation information is calculated to determine how to adjust the audio data rendered by the speaker device.
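
One possible way to turn orientation information into an adjustment is a gain that depends on the angular gap between an audio orientation and the viewpoint. The source does not state the adjustment formula, so the raised-cosine weighting, the `floor` parameter, and the function names below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def viewpoint_gain(audio_az_deg: float, viewpoint_az_deg: float, floor: float = 0.2) -> float:
    """Gain of 1.0 at the viewpoint, falling to `floor` directly behind it."""
    gap = abs(audio_az_deg - viewpoint_az_deg) % 360.0
    gap = min(gap, 360.0 - gap)
    # Raised cosine: weight is 1.0 for a gap of 0 degrees and 0.0 for 180 degrees.
    weight = 0.5 * (1.0 + math.cos(math.radians(gap)))
    return floor + (1.0 - floor) * weight


def adjust_samples(samples: Sequence[float], gain: float) -> List[float]:
    """Scale one speaker's audio data by the computed gain."""
    return [s * gain for s in samples]


gain_front = viewpoint_gain(30.0, 30.0)   # audio exactly at the viewpoint
gain_back = viewpoint_gain(210.0, 30.0)   # audio directly behind the viewpoint
adjusted = adjust_samples([0.5, -0.5], gain_back)
```

Keeping a nonzero floor means audio outside the region of interest is attenuated rather than silenced, which is one plausible design choice among several.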
- the encoded M-channel audio data formed performs the above-mentioned rendering operation.
- the data processing method provided by the embodiment of the present invention acquires the spatial information of the audio collection devices of the collection device; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; N audio collection devices are set at the set orientation of each video capture device; N is a positive integer.
- for the N audio collection devices corresponding to the set orientation of each video capture device, the audio data collected by the N audio collection devices is encoded according to the spatial information of the audio collection devices to form M-channel audio data; the M-channel audio data carries spatial information of the audio; M is a positive integer. After the encoded M-channel audio data is received, it is decoded to obtain the spatial information of the corresponding audio; the Q speaker devices corresponding to the M-channel audio data are determined according to the obtained spatial information of the audio and the orientation information of the speaker devices; Q is a positive integer; and the M-channel audio data is rendered using the determined Q speaker devices.
- N audio collection devices are set at the set orientation of each video capture device, and the audio data carries corresponding spatial information.
- the collection device provided by the embodiment of the present invention can collect audio data in an omnidirectional manner.
- the embodiment provides a collection device, and the collection space corresponding to the collection device forms a geometry.
- the collection device includes: a video collection device 31 and an audio collection device 32.
- the spatial orientation deployed by the video capture device 31 in the collection device covers the entire geometry
- each video capture device 31 is correspondingly set with N audio capture devices 32; N is a positive integer.
- the spatial orientation of the video capture device 31 in the collection device covers the entire geometry.
- the at least two video capture devices 31 of the capture device are disposed at spatial orientations corresponding to the geometry, and the spatial orientations at which they are deployed cover the entire surface of the geometry.
- all of the video capture device 31 and audio capture device 32 disposed on the capture device cover the entire geometry surface in a spatial orientation.
- At least one video capturing device 31 may be disposed on each side of the geometric body to cover the entire geometric surface in a spatial orientation.
- when a plurality of audio capture devices 32 are set at the set orientation of each video capture device 31, the at least two audio capture devices 32 set at that orientation are disposed around the video capture device 31.
- the collection device may be a VR collection device.
- the audio collection device 32 may be a microphone; the video capture device 31 may be a camera.
- the setting positions of the video capture device 31 and the audio capture device 32 satisfy at least one of the following:
- the video data collected by all video capture devices 31 can be directly restored to a sphere
- the audio data collected by all of the audio capture devices 32 can be directly restored to a sphere.
- all video capture devices 31 can restore video data at 360 degrees latitude and longitude.
- All audio capture devices 32 can restore audio data at 360 degrees latitude and longitude.
- the acquisition device is capable of performing functions of moving or stationary shooting and pickup.
- the collecting device may further include:
- a mobile device configured to receive a control command and, in response to the control command, either move the collection device so that it collects data while moving, or keep the collection device stationary so that it collects data in a stationary state.
- the specific function of the mobile device may be similar to that of an aircraft, and its specific composition may include blades, a power drive device, and the like.
- the collecting device may further include:
- a processor configured to, for the N audio collection devices corresponding to the set orientation of each video capture device, acquire the spatial information of the audio collection devices of the collection device and encode the audio data collected by the N audio collection devices according to that spatial information to form M-channel audio data; the M-channel audio data carries spatial information of the audio; M is a positive integer.
- the spatial information of the audio carried in the M channel audio data may be at least one of the following:
- physical or sound spatial information of the N audio capture devices;
- spatial information of the center point of the captured video.
- the spatial information of the audio carried in the M-channel audio data may be expressed in at least one of the following:
- the collection space corresponding to the collection device forms a geometry; the spatial orientations of the video capture devices deployed in the collection device cover the entire geometry; and N audio collection devices are set at the set orientation of each video capture device; N is a positive integer. Because the spatial orientations of the video capture devices and the audio collection devices cover the entire geometry, true omnidirectional audio acquisition can be realized.
- this embodiment describes in detail how the components of the collection device are arranged.
- the video collection device is a camera
- the audio collection device is a microphone
- the VR acquisition device is a combination of microphones and cameras and has a corresponding geometry.
- the microphones of the VR acquisition device can realize omnidirectional audio collection, so these microphones can be called geometry virtual reality omnidirectional microphones.
- the geometry includes basic geometry and combined geometry.
- the basic geometry includes a rotating body (sphere, cylinder, etc.), a polyhedron (Platonic solid and non-Platonic solid).
- the combined geometry is a geometric body composed of two or more basic geometries, either of the same kind or of several different kinds.
- the geometry virtual reality omnidirectional microphone has the function of moving or stationary pickup, that is, audio acquisition can be realized during the movement of the VR acquisition device or in a stationary state.
- the camera can achieve the shooting function during the movement of the VR acquisition device or in a stationary state.
- the spatial orientation of the microphone and camera deployment covers the entire geometry surface.
- the audio collected by the geometry virtual reality omnidirectional microphone can be restored to a sphere, and the video captured by all the cameras can be restored to a sphere; that is, the collected audio can be restored over 360 degrees in both longitude and latitude.
- the number of microphones on each face of the geometry is N, and N is a positive integer.
- when the geometry is a sphere (FIGS. 4A and 4B), each camera orientation corresponds to one or more microphones.
- in FIG. 4A, the number of microphones corresponding to each camera orientation is one.
- in FIG. 4B, each camera orientation corresponds to a plurality of microphones, which are deployed around the camera in one or more concentric rings centered on the camera.
- when the geometry is a cube (FIGS. 5A and 5B), each camera orientation corresponds to one or more microphones.
- in FIG. 5A, the number of microphones corresponding to each camera orientation is one.
- in FIG. 5B, each camera orientation corresponds to a plurality of microphones, which are deployed around the camera in one or more concentric rings centered on the camera.
- when the geometry is a regular icosahedron (FIGS. 6A and 6B), each camera orientation corresponds to one or more microphones.
- in FIG. 6A, the number of microphones corresponding to each camera orientation is one.
- in FIG. 6B, each camera orientation corresponds to a plurality of microphones, which are distributed around the camera.
- the VR collection device described in Embodiments 2 and 3 is used to collect audio data.
- this embodiment describes in detail the encoding process of the collected audio data.
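- the audio data collected by the N microphones can carry one of the following: the physical or sound-pickup spatial information of the N microphones; or the physical center-point spatial information of the corresponding camera or the center-point spatial information of the video it captures.
- the audio data collected by the N microphones is encoded into M-channel audio, and M is a positive integer.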
- the encoded M-channel audio carries spatial information of the audio.
- encoding is performed according to the encoding format (such as mpega, aac, mp3, or G.711).
- after encoding, the encoded audio data carrying the spatial information of the audio is stored; on storage, the spatial information of the audio is encapsulated according to the container format (such as an ISOBMFF file or a TS file).
- if the audio file needs to be transmitted, the encoded audio data carrying the spatial information of the audio is encapsulated as specified by the transport protocol (for example DASH, Hypertext Transfer Protocol (HTTP), HTTP Live Streaming (HLS), or Real-time Transport Protocol (RTP)).
- the carried spatial information may take at least one of the following forms: corresponding orientation information on a latitude-longitude map; three-dimensional Cartesian coordinates in a three-dimensional coordinate system; spherical coordinates in a spherical coordinate system (as shown in FIG. 7); relative face orientation information.
- the orientation information may be the face information of the face of the geometry on which the camera is located, as shown in FIG. 8 or 9.
- when M = N, the spatial information carried by the M-channel audio corresponds to the physical or sound-pickup spatial information of the N microphones, or to the physical center-point spatial information of the corresponding camera or of the video it captures, for example in the case where each camera corresponds to one microphone.
- when M = 1, the spatial information carried by the M-channel audio is the center-point spatial information of the physical or sound-pickup space of the N microphones, or the physical center-point spatial information of the corresponding camera or of the video it captures, for example in the case where each camera orientation corresponds to N microphones.
- the audio spatial information parameter needs to be extended to describe the audio carrying the spatial information.
- the VR collection device described in Embodiments 2 and 3 is used to collect audio data.
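- the present embodiment describes in detail the process of rendering with the collected audio data.
- for basic rendering of the M channels of encoded audio data, the number and orientation of the speakers need to be considered.
- when M is greater than the number of speakers, the audio near each speaker orientation is mixed and rendered in a converged manner: the decoded M-channel audio is matched, according to the spatial information of each channel, to the orientations of the current speakers, and the audio in the vicinity of each speaker orientation is mixed and then rendered immersively.
- as shown in FIG. 10, the multi-channel audio corresponding to the orientations of the upper semicircle is converged (mixed) and then rendered by the corresponding four speakers; the other orientations are handled in the same way.
- when M is equal to the number of speakers and the orientations of the speakers are consistent with the microphone vectors, the channels are rendered one to one, as shown in FIG. 11.
- when M is less than the number of speakers, either partial corresponding rendering is performed (speakers whose orientations correspond render one to one and the remaining speakers do not render, as shown in FIG. 12A), or diffuse rendering is performed, that is, speakers at nearby positions render the same audio, as shown in FIG. 12B.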
- audio can be rendered in conjunction with the region of interest or viewpoint of the video.
- This embodiment describes a rendering process based on an area of interest and/or a viewpoint.
- FIGS. 13A and 13B are schematic diagrams showing the deployment of microphones and the audio rendering when the audio carries relative orientation information.
- the inscribed circle in the figures is a virtual reality rendering device on which four microphones are deployed; F is the orientation facing the face, L and R are the left-ear and right-ear orientations, and B is the orientation of the center of the back of the head.
- the cube is the restored projection mapping body, corresponding to the collection orientations shown in FIG. 8 or FIG. 9, and the audio collected by the microphone on each face carries face_id relative orientation information.
- as shown in FIG. 13A, assuming that the current viewpoint is "1 front", the audio of the "1 front" orientation is rendered by the F speaker, the audio of the "2 right" orientation by the R speaker, the audio of the "3 rear" orientation by the B speaker, and the audio of the "4 left" orientation by the L speaker.
- when the viewpoint changes from "1 front" to "4 left", the orientation of the speakers on the inscribed circle changes relative to the cube.
- in this case, as shown in FIG. 13B, the audio collected at "4 left" is rendered by the F speaker, the audio collected at "1 front" by the R speaker, the audio collected at "2 right" by the B speaker, and the audio collected at "3 rear" by the L speaker.
- as an example of the actual sound effect, assume that with the viewpoint at "1 front" the sound of water is heard coming from "4 left" (rendered by the L speaker); the head then turns so that the viewpoint becomes "4 left", and the sound of water now comes from straight ahead (rendered by the F speaker).
- FIGS. 14A and 14B are schematic diagrams showing the deployment of microphones and the audio rendering when the audio carries spherical-coordinate orientation information.
- the inner concentric circle in the figures is a virtual reality rendering device on which four microphones are deployed; F is the orientation facing the face, L and R are the left-ear and right-ear orientations, and B is the orientation of the center of the back of the head.
- the outer concentric circle is the projection mapping body. As shown in FIG. 14A, assuming that the current viewpoint is (yaw1, pitch1), the audio of that orientation is rendered by the F speaker, the audio of the (yaw2, pitch2) orientation by the R speaker, the audio of the (yaw3, pitch3) orientation by the B speaker, and the audio of the (yaw4, pitch4) orientation by the L speaker.
- when the viewpoint changes from (yaw1, pitch1) to (yaw3, pitch3), the orientation of the speakers on the inner concentric circle changes relative to the projection mapping body.
- in this case, as shown in FIG. 14B, the (yaw3, pitch3) audio is rendered by the F speaker, the (yaw4, pitch4) audio by the R speaker, the (yaw1, pitch1) audio by the B speaker, and the (yaw2, pitch2) audio by the L speaker.
- as shown in FIG. 15, rendering based on the viewpoint and/or the region of interest proceeds as follows: the spatial orientation information of the sound collected by the microphones is first obtained (step 1501); the spatial orientation information of the collected sound is taken into account when the audio is encoded, that is, the encoded audio data carries audio spatial orientation information (step 1502); the orientation information of the viewpoint and/or the region of interest on the projection mapping body is obtained according to head or eye motion (step 1503); the orientation information of the speakers on the projection mapping body is obtained (step 1504); and the audio rendering orientation is dynamically adjusted according to the orientation information (spherical coordinates, face id, etc.) carried by the multi-channel audio, in combination with the basic rendering rules (step 1505).
- the embodiment further provides a data processing device.
- the device includes:
- the acquiring unit 161 is configured to acquire spatial information of the audio collecting device of the collecting device; the collecting space corresponding to the collecting device forms a geometric body; the spatial orientation of the video capturing device deployed in the collecting device covers the entire geometric body; each video capturing device The set orientation is correspondingly set with N audio collection devices; N is a positive integer;
- the encoding unit 162 is configured to encode the audio data collected by the N audio collection devices according to the spatial information of the audio collection device to form the M channel audio data.
- the M channel audio data carries spatial information of the audio; M is a positive integer.
- that the spatial orientations of the video capture devices deployed in the collection device cover the entire geometric body means that at least two video capture devices of the collection device are disposed at spatial orientations corresponding to the geometric body, and the spatial orientations at which the at least two video capture devices are deployed cover the entire surface of the geometric body.
- in other words, all the video capture devices and audio collection devices provided on the collection device cover the entire surface of the geometric body in spatial orientation.
- the collection device may be a VR collection device.
- the audio collection device may be a microphone; the video capture device may be a camera.
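- the collected audio data sent by the collection device can carry the spatial information of the audio collection devices, from which the spatial information of each audio collection device can be obtained.
- the spatial information of the audio collection devices can be one of the following: the physical or sound-pickup spatial information of the N microphones; or the physical center-point spatial information of the corresponding camera or the center-point spatial information of the video it captures.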
- the video data and/or audio data collected by the collection device meets at least one of the following:
- the video data collected by all video capture devices can be directly restored to a sphere
- the audio data collected by all audio capture devices can be directly restored to a sphere.
- the format of the encoded audio data may be: mpega, aac, mp3, G.711, and the like.
- the spatial information of the audio carried in the M-channel audio data is at least one of the following:
- physical or sound-pickup spatial information of the N audio collection devices
- center-point spatial information of the spatial positions of the N audio collection devices
- center-point spatial information of the captured video.
- when M is equal to N, the spatial information of the audio carried in the M-channel audio data may be one of the following:
- physical or sound-pickup spatial information of the N audio collection devices
- center-point spatial information of the spatial positions of the N audio collection devices
- center-point spatial information of the captured video.
- when M is less than N, the spatial information of the audio carried in the M-channel audio data may be one of the following:
- center-point spatial information of the spatial positions of the N audio collection devices
- center-point spatial information of the captured video.
- the spatial information of the audio carried in the M-channel audio data may be expressed in at least one of the following forms: corresponding orientation information on a latitude-longitude map; three-dimensional Cartesian coordinates in a three-dimensional coordinate system; spherical coordinates in a spherical coordinate system; relative face orientation information.
- the encoded audio data can be saved or transmitted to achieve virtual reality interactivity.
- the device may further include:
- a processing unit configured to store or issue M-channel audio data.
- the obtaining unit 161, the encoding unit 162, and the processing unit may be implemented by a processor in the data processing apparatus.
- the embodiment further provides a data processing device.
- the device includes:
- the receiving unit 171 is configured to receive the encoded M channel audio data
- the decoding unit 172 is configured to decode the encoded M-channel audio data to obtain spatial information of the corresponding audio; M is a positive integer;
- the first determining unit 173 is configured to determine, according to the obtained spatial information of the audio and the orientation information of the speaker device, the Q speaker devices corresponding to the M channel audio data; Q is a positive integer;
- the rendering unit 174 is configured to render the M channel audio data by using the determined Q speaker devices.
- the encoded M-channel audio data received by the receiving unit 171 is encoded data obtained by the method shown in FIG. 1.
- assume that the total number of speaker devices is L, where L is a positive integer; when the speaker devices are determined:
- when M is greater than L, the first determining unit 173 makes at least one channel of audio data whose spatial information falls within a preset radius centered on the orientation of a speaker device correspond to that speaker device;
- when M is equal to L and the orientation of each speaker device is consistent with the spatial information of each channel of audio data, the first determining unit 173 makes each channel of audio data correspond to one speaker device.
- when M is less than L, the first determining unit 173 selects Q speaker devices from the L speaker devices according to the obtained spatial information of the audio and the orientation information of the speaker devices; or the first determining unit 173 uses at least one speaker device whose orientation falls within a preset radius centered on the spatial orientation of a channel of audio data as the speaker device for that channel.
- the preset radius can be set as needed.
- in practice, a speaker device may have to render not one channel of audio data but at least two; in this case, the at least two channels of audio data need to be mixed, that is, audio mixing is performed.
- using the determined Q speaker devices to render the M channel audio data may include at least one of the following:
- at least two channels of audio data within a preset radius centered on the speaker orientation are mixed and then rendered;
- the orientation of a speaker is consistent with the vector of an audio collection device, and the corresponding audio is rendered by that speaker;
- speakers whose orientations correspond to the orientations indicated by the spatial information of the audio render one to one, and speakers whose orientations do not correspond do not render;
- at least two speakers whose positions satisfy a preset condition render the same audio data; the preset condition is that the distance between the speaker orientation and the orientation corresponding to the spatial information of the audio is less than a preset distance.
- the preset radius and the preset distance may be set as needed.
- audio can be rendered in conjunction with the region of interest or viewpoint of the video.
- the device may further include:
- a second determining unit configured to obtain orientation information of the viewpoint and/or the region of interest on the projection map according to the motion posture of the user, and obtain orientation information of the Q speaker devices;
- the rendering unit 174 is further configured to adjust the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
- the orientation information of a speaker device may or may not be orientation information on the projection mapping body; in the latter case, how to adjust the audio data rendered by the speaker device can be determined by calculation from the orientation information of the speaker device.
- for the encoded M-channel audio data formed for the N audio collection devices provided at the set orientation of each video capture device, each unit in the data processing device performs the above functions.
- the receiving unit 171, the decoding unit 172, the first determining unit 173, the rendering unit 174, and the second determining unit may be implemented by a processor in the data processing apparatus.
- embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
- these computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- these computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- an embodiment of the present invention further provides a storage medium, specifically a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by the processor, a data processing method in the foregoing embodiment is implemented.
- the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer. Because the spatial orientations of the deployed video capture devices and audio collection devices cover the entire geometric body, true omnidirectional audio collection can be realized.
- for the N audio collection devices provided at the set orientation of each video capture device, the spatial information of the audio collection devices of the collection device is acquired, and the audio data collected by the N audio collection devices is encoded according to that spatial information. Because the audio data has corresponding spatial information, it can be presented immersively, so that the spatial orientations of the audio and the video are matched and synchronized.
Abstract
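The present invention discloses a data processing method, including: acquiring spatial information of the audio collection devices of a collection device, wherein the collection space corresponding to the collection device forms a geometric body, the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body, N audio collection devices are provided at the set orientation of each video capture device, and N is a positive integer; and, for the N audio collection devices provided at the set orientation of each video capture device, encoding the audio data collected by the N audio collection devices according to the spatial information of the audio collection devices to form M channels of audio data, wherein the M channels of audio data carry spatial information of the audio. Embodiments of the present invention further provide a collection device, a data processing apparatus, and a storage medium.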
Description
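Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 201611265760.0, filed on December 30, 2016, the entire contents of which are incorporated herein by reference.
The present invention relates to virtual reality (VR) technology, and in particular to a data processing method and apparatus, a collection device, and a storage medium.
VR technology is an important direction of simulation technology and a challenging frontier discipline and research field spanning several technologies. VR technology mainly refers to a computer-simulated environment that is highly realistic in sight, hearing, touch, smell, taste, and so on. Specifically, VR technology uses computer technology to simulate a three-dimensional virtual world and lets the user perceive things in the virtual space in real time and without restriction. VR technology integrates multiple disciplines, including computer graphics, multimedia, sensor technology, human-computer interaction, networking, stereoscopic imaging, and computer simulation.
In current virtual reality audio collection devices, only a few microphones are arranged, so true omnidirectional audio collection is not achieved; that is, the collection range does not cover 360 degrees in both longitude and latitude. In addition, no spatial information of the audio is present during storage and transmission, so rendering focuses mainly on the video data. Simple manual post-production is therefore required to render the spatial orientations of the currently collected virtual reality audio and video in coordination, and such coordinated rendering cannot achieve true synchronization.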
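Summary of the Invention
Embodiments of the present invention provide a data processing method and apparatus, a collection device, and a storage medium.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides a data processing method, including:
acquiring spatial information of the audio collection devices of a collection device, wherein the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; and N is a positive integer; and
for the N audio collection devices provided at the set orientation of each video capture device, encoding the audio data collected by the N audio collection devices according to the spatial information of the audio collection devices to form M channels of audio data, wherein the M channels of audio data carry spatial information of the audio, and M is a positive integer.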
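In the above solution, the spatial information of the audio carried in the M channels of audio data is at least one of the following:
physical or sound-pickup spatial information of the N audio collection devices;
center-point spatial information of the spatial positions of the N audio collection devices;
center-point spatial information of the captured video.
In the above solution, the spatial information of the audio carried in the M channels of audio data is expressed in at least one of the following forms:
corresponding orientation information on a latitude-longitude map;
three-dimensional Cartesian coordinates in a three-dimensional coordinate system;
spherical coordinates in a spherical coordinate system;
relative face orientation information.
In the above solution, the video data and audio data collected by the collection device satisfy at least one of the following:
the video data collected by all the video capture devices can be restored to a sphere;
the audio data collected by all the audio collection devices can be restored to a sphere.
In the above solution, the method further includes:
storing or sending the M channels of audio data.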
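An embodiment of the present invention further provides a data processing method, including:
receiving encoded M channels of audio data;
decoding the encoded M channels of audio data to obtain the spatial information of the corresponding audio, where M is a positive integer;
determining, according to the obtained spatial information of the audio and the orientation information of the speaker devices, Q speaker devices corresponding to the M channels of audio data, where Q is a positive integer;
rendering the M channels of audio data with the determined Q speaker devices.
In the above solution, the method further includes:
obtaining, according to the motion posture of the user, the orientation information of at least one of the viewpoint and the region of interest on the projection mapping body;
adjusting the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
In the above solution, rendering the M channels of audio data with the determined Q speaker devices includes at least one of the following:
mixing at least two channels of audio data within a preset radius centered on the speaker orientation, and then rendering them;
the orientation of a speaker is consistent with the vector of an audio collection device, and the corresponding audio is rendered by that speaker;
speakers whose orientations correspond to the orientations indicated by the spatial information of the audio render one to one, and speakers whose orientations do not correspond do not render;
at least two speakers whose positions satisfy a preset condition render the same audio data; the preset condition is that the distance between the speaker orientation and the orientation corresponding to the spatial information of the audio is less than a preset distance.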
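An embodiment of the present invention further provides a collection device. The collection space corresponding to the collection device forms a geometric body, and the collection device includes video capture devices and audio collection devices, wherein
the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer.
In the above solution, the collection device further includes:
a moving apparatus configured to receive a control command and, in response to the control command, either move the collection device so that the collection device collects data while moving, or keep the collection device stationary so that it collects data in a stationary state.
In the above solution, the positions at which the video capture devices and audio collection devices are provided satisfy at least one of the following:
the video data collected by all the video capture devices can be restored to a sphere;
the audio data collected by all the audio collection devices can be restored to a sphere.
In the above solution, at least one video capture device is provided on each face of the geometric body.
In the above solution, the collection device further includes:
a processor configured to, for the N audio collection devices provided at the set orientation of each video capture device, acquire the spatial information of the audio collection devices of the collection device, and encode the audio data collected by the N audio collection devices according to that spatial information to form M channels of audio data; the M channels of audio data carry spatial information of the audio; M is a positive integer.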
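An embodiment of the present invention further provides a data processing apparatus, including:
an acquiring unit configured to acquire the spatial information of the audio collection devices of a collection device; the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer;
an encoding unit configured to, for the N audio collection devices provided at the set orientation of each video capture device, encode the audio data collected by the N audio collection devices according to the spatial information of the audio collection devices to form M channels of audio data; the M channels of audio data carry spatial information of the audio; M is a positive integer.
In the above solution, the apparatus further includes:
a processing unit configured to store or send the M channels of audio data.
An embodiment of the present invention further provides a data processing apparatus, including:
a receiving unit configured to receive encoded M channels of audio data;
a decoding unit configured to decode the encoded M channels of audio data to obtain the spatial information of the corresponding audio; M is a positive integer;
a first determining unit configured to determine, according to the obtained spatial information of the audio and the orientation information of the speaker devices, Q speaker devices corresponding to the M channels of audio data; Q is a positive integer;
a rendering unit configured to render the M channels of audio data with the determined Q speaker devices.
In the above solution, the apparatus further includes:
a second determining unit configured to obtain, according to the motion posture of the user, the orientation information of the viewpoint and/or the region of interest on the projection mapping body;
the rendering unit is further configured to adjust the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
An embodiment of the present invention further provides a storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above methods.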
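In the data processing method and apparatus, collection device, and storage medium provided by the embodiments of the present invention, the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer. Because the spatial orientations of the deployed video capture devices and audio collection devices cover the entire geometric body, true omnidirectional audio collection can be realized. In addition, for the N audio collection devices provided at the set orientation of each video capture device, the spatial information of the audio collection devices is acquired, and the audio data collected by the N audio collection devices is encoded according to that spatial information to form M channels of audio data, which carry spatial information of the audio; M is a positive integer. After the encoded M channels of audio data are received, they are decoded to obtain the spatial information of the corresponding audio; Q speaker devices corresponding to the M channels of audio data are determined according to the obtained spatial information of the audio and the orientation information of the speaker devices, where Q is a positive integer; and the M channels of audio data are rendered with the determined Q speaker devices. Because N audio collection devices are provided at the set orientation of each video capture device and the audio data has corresponding spatial information, the audio data can be presented immersively, so that the spatial orientations of the audio and the video are matched and synchronized.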
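In the drawings, which are not necessarily drawn to scale, like reference numerals may describe like components in different views. The drawings generally illustrate, by way of example and not limitation, the embodiments discussed herein.
FIG. 1 is a schematic flowchart of a data processing method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of another data processing method according to Embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of a collection device according to Embodiment 2 of the present invention;
FIGS. 4A-4B are schematic structural diagrams of a collection device according to Embodiment 3 of the present invention;
FIGS. 5A-5B are schematic structural diagrams of another collection device according to Embodiment 3 of the present invention;
FIGS. 6A-6B are schematic structural diagrams of yet another collection device according to Embodiment 3 of the present invention;
FIG. 7 is a schematic diagram of spatial information in spherical-coordinate form according to Embodiment 4 of the present invention;
FIG. 8 is a schematic diagram of spatial information in face-information form for a cube according to Embodiment 4 of the present invention;
FIG. 9 is a schematic diagram of the spatial information shown in FIG. 8 after unfolding, according to Embodiment 4 of the present invention;
FIG. 10 is a schematic diagram of audio rendering when M is greater than the number of speakers, according to Embodiment 5 of the present invention;
FIG. 11 is a schematic diagram of audio rendering when M is equal to the number of speakers, according to Embodiment 5 of the present invention;
FIGS. 12A-12B are schematic diagrams of audio rendering when M is less than the number of speakers, according to Embodiment 5 of the present invention;
FIGS. 13A-13B are schematic diagrams of audio rendering based on a region of interest or viewpoint according to Embodiment 6 of the present invention;
FIGS. 14A-14B are schematic diagrams of another audio rendering based on a region of interest or viewpoint according to Embodiment 6 of the present invention;
FIG. 15 is a schematic flowchart of audio rendering based on a region of interest or viewpoint according to Embodiment 6 of the present invention;
FIG. 16 is a schematic structural diagram of a data processing apparatus according to Embodiment 7 of the present invention;
FIG. 17 is a schematic structural diagram of another data processing apparatus according to Embodiment 7 of the present invention.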
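The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Before the embodiments of the present invention are described, the related VR technology is first introduced.
In current virtual reality audio collection devices (also called VR audio capture devices), usually only one or two microphones are provided, so true omnidirectional audio capture is not achieved.
Moreover, transmission and storage currently also focus on video, including the orientation of the video and the tiled storage and transmission of video based on the viewpoint (VP) and the region of interest (ROI). At present, however, the audio carries no spatial information related to the video of the corresponding region when it is collected, and carrying spatial information (including three-dimensional spatial coordinates or face information) is not considered during transmission and storage either, so little consideration has been given to rendering the audio and video orientations of the VP and ROI in coordination. Although manual post-production can make the spatial orientations of the audio and video collected in virtual reality render in coordination, such coordinated rendering does not achieve orientation synchronization.
Based on this, in the various embodiments of the present invention: the spatial information of the audio collection devices of a collection device is acquired; the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer; and the audio data collected by the N audio collection devices is encoded according to the spatial information of the audio collection devices to form M channels of audio data, which carry spatial information of the audio; M is a positive integer.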
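Embodiment 1
An embodiment of the present invention provides a data processing method. As shown in FIG. 1, the method includes:
Step 101: acquire the spatial information of the audio collection devices of the collection device.
Here, the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer.
That the spatial orientations at which the video capture devices are deployed cover the entire geometric body means that at least two video capture devices of the collection device are provided at spatial orientations corresponding to the geometric body, and the spatial orientations at which the at least two video capture devices are deployed cover the entire surface of the geometric body.
In other words, all the video capture devices and audio collection devices provided on the collection device cover the entire surface of the geometric body in spatial orientation.
The collection device may be a VR collection device.
In practice, an audio collection device may be a microphone, and a video capture device may be a camera.
In practice, the collected audio data sent by the collection device may carry the spatial information of the audio collection devices, from which the spatial information of each audio collection device can be obtained.
The spatial information of the audio collection devices may be one of the following:
the physical or sound-pickup spatial information of the N microphones;
the physical center-point spatial information of the corresponding camera, or the center-point spatial information of the video it captures.
At least one of the video data and the audio data collected by the collection device satisfies at least one of the following:
the video data collected by all the video capture devices can be directly restored to a sphere;
the audio data collected by all the audio collection devices can be directly restored to a sphere.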
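Step 102: for the N audio collection devices provided at the set orientation of each video capture device, encode the audio data collected by the N audio collection devices according to the spatial information of the audio collection devices to form M channels of audio data.
The M channels of audio data carry spatial information of the audio; M is a positive integer.
Here, in practice, depending on the encoding application scenario, the format of the encoded audio data may be mpega, aac, mp3, G.711, or the like.
The spatial information of the audio carried in the M channels of audio data is at least one of the following:
physical or sound-pickup spatial information of the N audio collection devices;
center-point spatial information of the spatial positions of the N audio collection devices;
center-point spatial information of the captured video.
When M is equal to N, the spatial information of the audio carried in the M channels of audio data may be one of the following:
physical or sound-pickup spatial information of the N audio collection devices;
center-point spatial information of the spatial positions of the N audio collection devices;
center-point spatial information of the captured video.
When M is less than N, the spatial information of the audio carried in the M channels of audio data may be one of the following:
center-point spatial information of the spatial positions of the N audio collection devices;
center-point spatial information of the captured video.
In practice, the spatial information of the audio carried in the M channels of audio data may be expressed in at least one of the following forms:
corresponding orientation information on a latitude-longitude map;
three-dimensional Cartesian coordinates in a three-dimensional coordinate system;
spherical coordinates in a spherical coordinate system;
relative face orientation information.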
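Two of the representation forms listed above, spherical coordinates and three-dimensional Cartesian coordinates, describe the same orientation and can be converted into each other. The following is a minimal sketch of that conversion, not part of the patent text. It assumes a convention the document does not specify: yaw is measured in the horizontal plane from the +x axis toward +y, and pitch is the elevation above that plane, both in degrees. The function names are hypothetical.

```python
import math

def spherical_to_cartesian(yaw_deg, pitch_deg, radius=1.0):
    """Convert (yaw, pitch, radius) to (x, y, z) under the assumed convention."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = radius * math.cos(pitch) * math.cos(yaw)
    y = radius * math.cos(pitch) * math.sin(yaw)
    z = radius * math.sin(pitch)
    return (x, y, z)

def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) back to (yaw_deg, pitch_deg, radius)."""
    radius = math.sqrt(x * x + y * y + z * z)
    if radius == 0.0:
        raise ValueError("the origin has no orientation")
    yaw = math.degrees(math.atan2(y, x))
    # Clamp to guard against rounding slightly outside [-1, 1].
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, z / radius))))
    return (yaw, pitch, radius)
```

Either form can then be attached to a channel as its spatial information; which one is used in practice is a choice the document leaves open.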
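The encoded audio data may be stored or sent, so as to realize virtual reality interactivity.
Based on this, in an embodiment, the method may further include:
storing or sending the M channels of audio data.
Correspondingly, an embodiment of the present invention further provides a data processing method, which can be regarded as a method for realizing virtual reality interactivity. As shown in FIG. 2, the method includes:
Step 201: receive encoded M channels of audio data.
Here, it should be noted that the received encoded M channels of audio data are encoded data obtained by the method shown in FIG. 1.
Step 202: decode the encoded M channels of audio data to obtain the spatial information of the corresponding audio.
Here, M is a positive integer.
Step 203: determine, according to the obtained spatial information of the audio and the orientation information of the speaker devices, Q speaker devices corresponding to the M channels of audio data.
Here, Q is a positive integer.
Specifically, assume that the total number of speaker devices is L, where L is a positive integer. When the speaker devices are determined:
when M is greater than L, at least one channel of audio data whose spatial information falls within a preset radius centered on the orientation of a speaker device is made to correspond to that speaker device;
when M is equal to L and the orientation of each speaker device is consistent with the spatial information of each channel of audio data, each channel of audio data is made to correspond to one speaker device;
when M is less than L, Q speaker devices are selected from the L speaker devices according to the obtained spatial information of the audio and the orientation information of the speaker devices; or at least one speaker device whose orientation falls within a preset radius centered on the spatial orientation of a channel of audio data is used as the speaker device for that channel.
Here, in practice, the preset radius can be set as required.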
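The speaker determination in step 203 can be sketched as a threshold test on the angle between a channel's orientation and a speaker's orientation. This is an illustrative sketch only: the patent does not define how the "preset radius" is measured, so the code assumes it is a great-circle angle in degrees, and it assumes orientations are given as (yaw, pitch) pairs. All names are hypothetical.

```python
import math

def unit_vector(yaw_deg, pitch_deg):
    """Unit vector for an orientation given as (yaw, pitch) in degrees."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def angle_between(a, b):
    """Great-circle angle in degrees between two (yaw, pitch) orientations."""
    ua, ub = unit_vector(*a), unit_vector(*b)
    dot = sum(p * q for p, q in zip(ua, ub))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def determine_speakers(channel_dirs, speaker_dirs, preset_radius_deg):
    """Map each speaker index to the channel indices within the preset radius.

    With M > L a speaker may receive several channels (to be mixed), with
    M == L and matching orientations the mapping is one to one, and with
    M < L a channel may be shared by several speakers or a speaker may
    receive nothing.
    """
    mapping = {}
    for s, speaker_dir in enumerate(speaker_dirs):
        mapping[s] = [c for c, channel_dir in enumerate(channel_dirs)
                      if angle_between(channel_dir, speaker_dir) <= preset_radius_deg]
    return mapping

def selected_speakers(mapping):
    """The Q speakers that actually have at least one channel to render."""
    return [s for s, channels in mapping.items() if channels]
```

Because "channel within the radius of a speaker" and "speaker within the radius of a channel" are the same relation, one function covers the three cases; a real implementation might instead pick only the nearest speaker per channel.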
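Step 204: render the M channels of audio data with the determined Q speaker devices.
Here, in practice, a speaker device may have to render not one channel of audio data but at least two; in this case, the at least two channels of audio data need to be mixed, that is, audio mixing is performed.
As can be seen from the above description, rendering the M channels of audio data with the determined Q speaker devices may include at least one of the following:
mixing at least two channels of audio data within a preset radius centered on the speaker orientation, and then rendering them;
the orientation of a speaker is consistent with the vector of an audio collection device, and the corresponding audio is rendered by that speaker;
speakers whose orientations correspond to the orientations indicated by the spatial information of the audio render one to one, and speakers whose orientations do not correspond do not render;
at least two speakers whose positions satisfy a preset condition render the same audio data; the preset condition is that the distance between the speaker orientation and the orientation corresponding to the spatial information of the audio is less than a preset distance.
The preset radius and the preset distance can be set as required.
In practice, in addition to the basic audio rendering operation described in step 204, the audio may also be rendered in coordination with the region of interest or viewpoint of the video.
Based on this, in an embodiment, the method may further include:
obtaining, according to the motion posture of the user, the orientation information of the viewpoint and/or the region of interest on the projection mapping body, and obtaining the orientation information corresponding to the Q speaker devices;
adjusting the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
Here, in practice, the orientation information of a speaker device may or may not be orientation information on the projection mapping body; in the latter case, how to adjust the audio data rendered by the speaker device can be determined by calculation from the orientation information of the speaker device.
It should be noted that, in practice, the above rendering operation is performed on the encoded M channels of audio data formed for the N audio collection devices provided at the set orientation of each video capture device.
In the data processing method provided by this embodiment of the present invention, the spatial information of the audio collection devices of the collection device is acquired; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer; for the N audio collection devices provided at the set orientation of each video capture device, the audio data collected by the N audio collection devices is encoded according to the spatial information of the audio collection devices to form M channels of audio data, which carry spatial information of the audio; M is a positive integer. After the encoded M channels of audio data are received, they are decoded to obtain the spatial information of the corresponding audio; Q speaker devices corresponding to the M channels of audio data are determined according to the obtained spatial information of the audio and the orientation information of the speaker devices, where Q is a positive integer; and the M channels of audio data are rendered with the determined Q speaker devices. Because N audio collection devices are provided at the set orientation of each video capture device and the audio data has corresponding spatial information, the audio data can be presented immersively, so that the spatial orientations of the audio and the video are matched and synchronized.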
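Embodiment 2
As can be seen from Embodiment 1, the collection device provided by the embodiments of the present invention can collect audio data omnidirectionally. Based on this, this embodiment provides a collection device whose corresponding collection space forms a geometric body. As shown in FIG. 3, the collection device includes video capture devices 31 and audio collection devices 32, wherein
the spatial orientations at which the video capture devices 31 of the collection device are deployed cover the entire geometric body;
N audio collection devices 32 are provided at the set orientation of each video capture device 31; N is a positive integer.
That the spatial orientations at which the video capture devices 31 are deployed cover the entire geometric body means that at least two video capture devices 31 of the collection device are provided at spatial orientations corresponding to the geometric body, and the spatial orientations at which the at least two video capture devices 31 are deployed cover the entire surface of the geometric body.
In other words, all the video capture devices 31 and audio collection devices 32 provided on the collection device cover the entire surface of the geometric body in spatial orientation.
In practice, at least one video capture device 31 may be provided on each face of the geometric body, so that the entire surface of the geometric body is covered in spatial orientation.
In addition, when a plurality of audio collection devices 32 are provided at the set orientation of each video capture device 31, the at least two audio collection devices 32 provided at the set orientation of each video capture device 31 are arranged around the video capture device.
The collection device may be a VR collection device.
In practice, an audio collection device 32 may be a microphone, and a video capture device 31 may be a camera.
In an embodiment, the positions at which the video capture devices 31 and the audio collection devices 32 are provided satisfy at least one of the following:
the video data collected by all the video capture devices 31 can be directly restored to a sphere;
the audio data collected by all the audio collection devices 32 can be directly restored to a sphere.
That is, all the video capture devices 31 can restore the video data over 360 degrees in both longitude and latitude, and all the audio collection devices 32 can restore the audio data over 360 degrees in both longitude and latitude.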
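In an embodiment, the collection device can shoot and pick up sound while moving or while stationary.
Based on this, in an embodiment, the collection device may further include:
a moving apparatus configured to receive a control command and, in response to the control command, either move the collection device so that the collection device collects data while moving, or keep the collection device stationary so that it collects data in a stationary state.
Here, in practice, the specific function of the moving apparatus may be similar to that of an aerial photography aircraft, and its specific composition may include blades, a power drive unit, and the like.
In an embodiment, the collection device may further include:
a processor configured to, for the N audio collection devices provided at the set orientation of each video capture device, acquire the spatial information of the audio collection devices of the collection device, and encode the audio data collected by the N audio collection devices according to that spatial information to form M channels of audio data; the M channels of audio data carry spatial information of the audio; M is a positive integer.
The spatial information of the audio carried in the M channels of audio data may be at least one of the following:
physical or sound-pickup spatial information of the N audio collection devices;
center-point spatial information of the spatial positions of the N audio collection devices;
center-point spatial information of the captured video.
The spatial information of the audio carried in the M channels of audio data may be expressed in at least one of the following forms:
corresponding orientation information on a latitude-longitude map;
three-dimensional Cartesian coordinates in a three-dimensional coordinate system;
spherical coordinates in a spherical coordinate system;
relative face orientation information.
In the collection device provided by this embodiment of the present invention, the collection space corresponding to the collection device forms a geometric body; the spatial orientations at which the video capture devices of the collection device are deployed cover the entire geometric body; N audio collection devices are provided at the set orientation of each video capture device; N is a positive integer. Because the spatial orientations of the deployed video capture devices and audio collection devices cover the entire geometric body, true omnidirectional audio collection can be realized.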
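Embodiment 3
On the basis of Embodiment 2, this embodiment describes in detail how the components of the collection device are arranged.
In this embodiment, the video capture device is a camera and the audio collection device is a microphone.
As can be seen from Embodiment 2, the VR collection device is a combination of microphones and cameras and has a corresponding geometric body, and the microphones of the VR collection device can realize omnidirectional audio collection, so these microphones can be called a geometric-body virtual reality omnidirectional microphone.
Geometric bodies include basic geometric bodies and combined geometric bodies. Basic geometric bodies include solids of revolution (sphere, cylinder, etc.) and polyhedra (Platonic and non-Platonic solids). A combined geometric body is composed of two or more basic geometric bodies, either of the same kind or of several different kinds.
The geometric-body virtual reality omnidirectional microphone can pick up sound while moving or while stationary; that is, audio collection can be performed while the VR collection device is moving or stationary. Likewise, the cameras can shoot while the VR collection device is moving or stationary.
The spatial orientations at which the microphones and cameras are deployed cover the entire surface of the geometric body. The audio collected by the geometric-body virtual reality omnidirectional microphone can be restored to a sphere, and the video collected by all the cameras can be restored to a sphere; that is, the collected audio can be restored over 360 degrees in both longitude and latitude.
The deployment positions of the microphones are described in detail below.
The number of microphones on each face of the geometric body is N, where N is a positive integer.
As shown in FIGS. 4A and 4B, when the geometric body is a sphere, that is, when the omnidirectional microphone is a spherical virtual reality omnidirectional microphone, each camera orientation corresponds to one or more microphones. Specifically, in FIG. 4A each camera orientation corresponds to one microphone. In FIG. 4B each camera orientation corresponds to a plurality of microphones, which are deployed around the camera in one or more concentric rings centered on the camera.
As shown in FIGS. 5A and 5B, when the geometric body is a cube, that is, when the omnidirectional microphone is a cubic virtual reality omnidirectional microphone, each camera orientation corresponds to one or more microphones. Specifically, in FIG. 5A each camera orientation corresponds to one microphone. In FIG. 5B each camera orientation corresponds to a plurality of microphones, which are deployed around the camera in one or more concentric rings centered on the camera.
As shown in FIGS. 6A and 6B, when the geometric body is a regular icosahedron, that is, when the omnidirectional microphone is a regular-icosahedron virtual reality omnidirectional microphone, each camera orientation corresponds to one or more microphones. Specifically, in FIG. 6A each camera orientation corresponds to one microphone. In FIG. 6B each camera orientation corresponds to a plurality of microphones, which are distributed around the camera.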
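The ring deployment of FIGS. 4B and 5B, in which several microphones surround a camera in one or more concentric rings, can be illustrated with a small geometric sketch. The patent gives no dimensions or spacing, so the following assumes face-local 2D coordinates with the camera at the origin, evenly spaced microphones, and ring radii chosen by the caller. The function name and parameters are hypothetical.

```python
import math

def ring_positions(mics_per_ring, ring_radii):
    """Return (x, y) microphone positions around a camera at (0, 0).

    Each radius in ring_radii produces one ring of evenly spaced microphones.
    """
    positions = []
    for radius in ring_radii:
        for k in range(mics_per_ring):
            angle = 2.0 * math.pi * k / mics_per_ring
            positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions
```

For example, `ring_positions(4, [1.0, 2.0])` yields eight positions in two rings, corresponding to N = 8 microphones for that camera orientation.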
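Embodiment 4
In this embodiment, the VR collection device described in Embodiments 2 and 3 is used to collect audio data.
On the basis of Embodiment 1, this embodiment describes in detail the process of encoding the collected audio data.
The audio data collected by the N microphones may carry one of the following:
the physical or sound-pickup spatial information of the N microphones;
the physical center-point spatial information of the corresponding camera, or the center-point spatial information of the video it captures.
The audio data collected by the N microphones is encoded into M channels of audio, where M is a positive integer.
The encoded M channels of audio carry the spatial information of the audio.
Encoding is performed according to the encoding format (such as mpega, aac, mp3, or G.711).
After encoding, the encoded audio data carrying the spatial information of the audio is stored; on storage, the spatial information of the audio is encapsulated according to the container format (such as an ISOBMFF file or a TS file).
If the audio file needs to be transmitted, the encoded audio data carrying the spatial information of the audio is encapsulated as specified by the transport protocol (for example DASH, Hypertext Transfer Protocol (HTTP), HTTP Live Streaming (HLS), or Real-time Transport Protocol (RTP)).
Here, the carried spatial information may take at least one of the following forms:
corresponding orientation information on a latitude-longitude map;
three-dimensional Cartesian coordinates in a three-dimensional coordinate system;
spherical coordinates in a spherical coordinate system (as shown in FIG. 7);
relative face orientation information.
Here, the orientation information may be the face information of the face of the geometric body on which the camera is located, as shown in FIG. 8 or 9.
When M = N, the spatial information carried by the M channels of audio corresponds to the physical or sound-pickup spatial information of the N microphones, or to the physical center-point spatial information of the corresponding camera or of the video it captures, for example in the case where each camera corresponds to one microphone.
When M = 1, the spatial information carried by the M channels of audio is the center-point spatial information of the physical or sound-pickup space of the N microphones, or the physical center-point spatial information of the corresponding camera or of the video it captures, for example in the case where each camera orientation corresponds to N microphones.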
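For the M = 1 case, the single encoded channel carries one center-point orientation for the N microphones. The patent does not say how that center point or the single channel is derived, so the sketch below makes two assumptions: the center orientation is the direction of the mean of the microphones' unit vectors, and the N signals are combined by a plain sample-wise average before encoding. Both functions are hypothetical illustrations.

```python
import math

def center_direction(directions):
    """Center (yaw, pitch) in degrees of several (yaw, pitch) orientations.

    Assumption: the center is the direction of the mean unit vector.
    """
    sx = sy = sz = 0.0
    for yaw_deg, pitch_deg in directions:
        yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
        sx += math.cos(pitch) * math.cos(yaw)
        sy += math.cos(pitch) * math.sin(yaw)
        sz += math.sin(pitch)
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm < 1e-12:
        raise ValueError("orientations cancel out; no center direction")
    yaw = math.degrees(math.atan2(sy, sx))
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, sz / norm))))
    return (yaw, pitch)

def downmix_to_one(channels):
    """Average N equally long sample lists into a single channel (M = 1)."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]
```

When M = N, no such reduction is needed: each channel simply keeps the orientation of its own microphone.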
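In practice, when the spatial information of the audio is carried in the encoded audio data, the Audio spatial information parameter needs to be extended to describe the audio carrying the spatial information.
Embodiment 5
In this embodiment, the VR collection device described in Embodiments 2 and 3 is used to collect audio data.
On the basis of Embodiment 1, this embodiment describes in detail the process of rendering with the collected audio data.
For basic rendering of the M channels of encoded audio data, the number and orientation of the speakers need to be considered. Specifically:
When M is greater than the number of speakers, the audio near each speaker orientation is mixed and rendered in a converged manner. Specifically, the decoded M channels of audio are matched, according to the spatial information of each channel, to the orientations of the current speakers, and the audio in the vicinity of each speaker orientation is mixed and then rendered immersively. As shown in FIG. 10, the multi-channel audio corresponding to the orientations of the upper semicircle is converged (mixed) and then rendered by the corresponding four speakers; the other orientations are handled in the same way.
When M is equal to the number of speakers and the orientations of the speakers are consistent with the microphone vectors, the channels are rendered one to one, as shown in FIG. 11.
When M is less than the number of speakers, either partial corresponding rendering is performed (speakers whose orientations correspond render one to one and the remaining speakers do not render, as shown in FIG. 12A), or diffuse rendering is performed, that is, speakers at nearby positions render the same audio, as shown in FIG. 12B.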
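The three basic rendering cases can be expressed with a single mixing step once it is known which channels each speaker should play. The sketch below is illustrative only: it takes that assignment as an input (a dict from a hypothetical speaker name to a list of channel indices), and it assumes equal-gain averaging as the mixing rule, which the patent does not prescribe.

```python
def render_feeds(assignment, channels, num_samples):
    """Build one sample list per speaker from its assigned channels.

    - several channels on one speaker: converged rendering (mixed by averaging)
    - one channel on one speaker: one-to-one rendering
    - the same channel on several speakers: diffuse rendering
    - no channel: the speaker stays silent (partial rendering)
    """
    feeds = {}
    for speaker, channel_ids in assignment.items():
        if not channel_ids:
            feeds[speaker] = [0.0] * num_samples
            continue
        gain = 1.0 / len(channel_ids)
        feeds[speaker] = [gain * sum(channels[c][i] for c in channel_ids)
                          for i in range(num_samples)]
    return feeds
```

A usage example under these assumptions: with `channels = [[1.0, 1.0], [3.0, 5.0]]` and `assignment = {"F": [0, 1], "L": [1], "B": []}`, speaker F plays the average of both channels, L plays channel 1 unchanged, and B plays silence.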
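Embodiment 6
In addition to the basic audio rendering described in Embodiment 5, the audio may also be rendered in coordination with the region of interest or viewpoint of the video. This embodiment describes the rendering process based on the region of interest and/or the viewpoint.
FIGS. 13A and 13B are schematic diagrams of the deployment of microphones and the audio rendering when the audio carries relative orientation information. As can be seen from FIGS. 13A and 13B, the inscribed circle in the figures is a virtual reality rendering device on which four microphones are deployed; F is the orientation facing the face, L and R are the left-ear and right-ear orientations, and B is the orientation of the center of the back of the head. The cube is the restored projection mapping body, corresponding to the collection orientations shown in FIG. 8 or FIG. 9, and the audio collected by the microphone on each face carries face_id relative orientation information. As shown in FIG. 13A, assuming that the current viewpoint is "1 front", the audio of the "1 front" orientation is rendered by the F speaker, the audio of the "2 right" orientation by the R speaker, the audio of the "3 rear" orientation by the B speaker, and the audio of the "4 left" orientation by the L speaker. When the viewpoint changes from "1 front" to "4 left", the orientation of the speakers on the inscribed circle changes relative to the cube. In this case, as shown in FIG. 13B, the audio collected at "4 left" is rendered by the F speaker, the audio collected at "1 front" by the R speaker, the audio collected at "2 right" by the B speaker, and the audio collected at "3 rear" by the L speaker. As an example of the actual sound effect, assume that with the viewpoint at "1 front" the sound of water is heard coming from "4 left" (rendered by the L speaker); the head then turns so that the viewpoint becomes "4 left", and the sound of water now comes from straight ahead (rendered by the F speaker).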
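The face-based remapping of FIGS. 13A and 13B amounts to rotating a ring of four side faces against a ring of four speakers. The sketch below reproduces exactly the two mappings stated in the text; the string labels and function name are hypothetical, and the top and bottom faces of the cube are left out because the text does not describe how they are rendered.

```python
# Side faces of the cube in clockwise order seen from above (faces 1 to 4).
FACE_RING = ["front", "right", "back", "left"]
# Speakers in the same clockwise order: face, right ear, back of head, left ear.
SPEAKERS = ["F", "R", "B", "L"]

def faces_for_speakers(viewpoint_face):
    """Return {speaker: face whose audio it renders} for the given viewpoint."""
    start = FACE_RING.index(viewpoint_face)
    return {speaker: FACE_RING[(start + i) % len(FACE_RING)]
            for i, speaker in enumerate(SPEAKERS)}
```

With the viewpoint at "front" the L speaker renders the "left" face; after the head turns to "left", that same face is rendered by the F speaker, which matches the water-sound example.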
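FIGS. 14A and 14B are schematic diagrams of the deployment of microphones and the audio rendering when the audio carries spherical-coordinate orientation information. As can be seen from FIGS. 14A and 14B, the inner concentric circle in the figures is a virtual reality rendering device on which four microphones are deployed; F is the orientation facing the face, L and R are the left-ear and right-ear orientations, and B is the orientation of the center of the back of the head. The outer concentric circle is the projection mapping body. As shown in FIG. 14A, assuming that the current viewpoint is (yaw1, pitch1), the audio of that orientation is rendered by the F speaker, the audio of the (yaw2, pitch2) orientation by the R speaker, the audio of the (yaw3, pitch3) orientation by the B speaker, and the audio of the (yaw4, pitch4) orientation by the L speaker. When the viewpoint changes from (yaw1, pitch1) to (yaw3, pitch3), the orientation of the speakers on the inner concentric circle changes relative to the projection mapping body. In this case, as shown in FIG. 14B, the (yaw3, pitch3) audio is rendered by the F speaker, the (yaw4, pitch4) audio by the R speaker, the (yaw1, pitch1) audio by the B speaker, and the (yaw2, pitch2) audio by the L speaker.
As can be seen from the above description, rendering based on the viewpoint and/or the region of interest proceeds as shown in FIG. 15: the spatial orientation information of the sound collected by the microphones is first obtained (step 1501); the spatial orientation information of the collected sound is taken into account when the audio is encoded, that is, the encoded audio data carries audio spatial orientation information (step 1502); the orientation information of the viewpoint and/or the region of interest on the projection mapping body is obtained according to head or eye motion (step 1503); the orientation information of the speakers on the projection mapping body is obtained (step 1504); and the audio rendering orientation is dynamically adjusted according to the orientation information (spherical coordinates, face id, etc.) carried by the multi-channel audio, in combination with the basic rendering rules (step 1505).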
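Steps 1503 to 1505 can be sketched for the spherical-coordinate case as follows. This is a simplified illustration, not the patent's procedure: it works in the horizontal plane only (pitch is ignored), it assumes yaw increases toward the listener's right, and it assumes the four speakers sit at fixed yaw offsets from the facing direction. Each speaker then renders the channel whose yaw is closest to the speaker's orientation on the projection mapping body. All names and offsets are hypothetical.

```python
# Assumed yaw offsets of the speakers relative to the facing direction, in degrees.
SPEAKER_OFFSETS = {"F": 0.0, "R": 90.0, "B": 180.0, "L": 270.0}

def wrap_degrees(angle):
    """Wrap an angle difference into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def route_channels(channel_yaws, viewpoint_yaw):
    """Return {speaker: index of the channel it renders} for a viewpoint yaw."""
    routing = {}
    for speaker, offset in SPEAKER_OFFSETS.items():
        # Step 1504: speaker orientation on the projection mapping body.
        speaker_yaw = viewpoint_yaw + offset
        # Step 1505: pick the channel whose carried yaw is nearest.
        routing[speaker] = min(
            range(len(channel_yaws)),
            key=lambda c: abs(wrap_degrees(channel_yaws[c] - speaker_yaw)))
    return routing
```

With four channels at yaws 0, 90, 180, and 270 degrees standing in for yaw1 to yaw4, moving the viewpoint from yaw1 to yaw3 sends the yaw3 channel to F, yaw4 to R, yaw1 to B, and yaw2 to L, as in FIG. 14B.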
实施例七
为实现本发明实施例一的方法,本实施例还提供了一种数据处理装置,如图16所示,该装置包括:
获取单元161,配置为获取采集设备的音频采集设备的空间信息;所述采集设备对应的采集空间形成几何体;所述采集设备中的视频采集设备部署的空间方位覆盖整个几何体;每个视频采集设备的设置方位对应设置有N 个音频采集设备;N为正整数;
编码单元162,配置为针对每个视频采集设备的设置方位对应设置的N个音频采集设备,依据音频采集设备的空间信息,将N个音频采集设备采集的音频数据进行编码,形成M路音频数据;所述M路音频数据中携带音频的空间信息;M为正整数。
其中,所述采集设备中的视频采集设备部署的空间方位覆盖整个几何体是指:所述采集设备的至少两个视频采集设备设置在所述几何体对应的空间方位,且至少两个视频采集设备部署的空间方位覆盖整个几何体表面。
对于所述采集设备上部署的视频采集设备及音频采集设备,换句话说,采集设备上设置的所有视频采集设备及音频采集设备在空间方位上覆盖整个几何体表面。
所述采集设备可以为VR采集设备。
实际应用时,音频采集设备可以是麦克风;视频采集设备可以是摄像头。
实际应用时,采集设备发送的采集的音频数据里可以携带音频采集设备的空间信息,据此可以获知每个音频采集设备的空间信息。
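The spatial information of the audio capture devices may be one of the following:
physical or sound-pickup spatial information of the N microphones;
center-point spatial information of the corresponding camera, either of its physical position or of the video it captures.
The video data and/or audio data captured by the capture device satisfies at least one of the following:
the video data captured by all the video capture devices can be directly restored to a sphere;
the audio data captured by all the audio capture devices can be directly restored to a sphere.
Here, in practice, depending on the encoding scenario, the format of the encoded audio data may be mpega, aac, mp3, G.711, and so on.
The spatial information of the audio carried in the M channels of audio data is at least one of the following:
physical or sound-pickup spatial information of the N audio capture devices;
center-point spatial information of the spatial positions of the N audio capture devices;
center-point spatial information of the captured video.
When M equals N, the spatial information of the audio carried in the M channels of audio data may be one of the following:
physical or sound-pickup spatial information of the N audio capture devices;
center-point spatial information of the spatial positions of the N audio capture devices;
center-point spatial information of the captured video.
When M is less than N, the spatial information of the audio carried in the M channels of audio data may be one of the following:
center-point spatial information of the spatial positions of the N audio capture devices;
center-point spatial information of the captured video.
In practice, the spatial information of the audio carried in the M channels of audio data may be represented in at least one of the following forms:
corresponding orientation information on a longitude-latitude map;
three-dimensional Cartesian coordinates in a three-dimensional coordinate system;
spherical coordinates in a spherical coordinate system;
relative face orientation information.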
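The first three representations listed above are interconvertible. As a hedged illustration, the Python sketch below converts a spherical orientation to Cartesian coordinates and to a position on a longitude-latitude (equirectangular) image. The axis convention (x forward, z up), the yaw range of [-180, 180) degrees and the function names are assumptions made for this sketch; the application does not fix them.

```python
import math

def spherical_to_cartesian(yaw_deg, pitch_deg, radius=1.0):
    """Convert (yaw, pitch) in degrees to (x, y, z), with x forward and z up."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        radius * math.cos(pitch) * math.cos(yaw),
        radius * math.cos(pitch) * math.sin(yaw),
        radius * math.sin(pitch),
    )

def spherical_to_equirect(yaw_deg, pitch_deg, width, height):
    """Map yaw in [-180, 180) and pitch in [-90, 90] to a position (u, v).

    u grows with yaw from the left edge; v grows downwards from pitch = +90.
    """
    u = ((yaw_deg + 180.0) % 360.0) / 360.0 * width
    v = (90.0 - pitch_deg) / 180.0 * height
    return u, v
```

Under these conventions an orientation of (0, 0) maps to the center of the longitude-latitude image and to the unit vector along the x axis.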
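The encoded audio data can be either stored or sent, so as to enable the interactivity of virtual reality.
Accordingly, in an embodiment, the apparatus may further include:
a processing unit, configured to store or send the M channels of audio data.
In practice, the obtaining unit 161, the encoding unit 162, and the processing unit may be implemented by a processor in the data processing apparatus.
Correspondingly, to implement the method of the embodiments of the present invention, this embodiment further provides a data processing apparatus. As shown in FIG. 17, the apparatus includes:
a receiving unit 171, configured to receive the encoded M channels of audio data;
a decoding unit 172, configured to decode the encoded M channels of audio data to obtain the corresponding spatial information of the audio; M is a positive integer;
a first determining unit 173, configured to determine the Q speaker devices corresponding to the M channels of audio data according to the obtained spatial information of the audio and the orientation information of the speaker devices; Q is a positive integer;
a rendering unit 174, configured to render the M channels of audio data using the determined Q speaker devices.
It should be noted here that the encoded M channels of audio data received by the receiving unit 171 are encoded data obtained by the method shown in FIG. 1.
Assume that the total number of speaker devices is L, where L is a positive integer. When determining the speaker devices:
when M is greater than L, the first determining unit 173 maps to each speaker device at least one channel of audio data whose spatial information lies within a preset radius centered on that speaker device's orientation;
when M equals L and the orientation of each speaker device coincides with the spatial information of one channel of audio data, the first determining unit 173 maps each channel of audio data to one speaker device;
when M is less than L, the first determining unit 173 selects Q speaker devices from the L speaker devices according to the obtained spatial information of the audio and the orientation information of the speaker devices; alternatively, the first determining unit 173 takes, as the speaker devices for each channel of audio data, at least one speaker device whose orientation lies within a preset radius centered on the spatial orientation of that channel.
Here, in practice, the preset radius can be set as required.
Here, in practice, a speaker device may have to render not one but at least two channels of audio data. In that case, the at least two channels of audio data need to be mixed, that is, audio mixing is performed.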
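The application does not specify how the mixing is performed. One common and simple choice, shown here purely as an assumption, is to average the decoded samples of the channels assigned to a speaker and clamp the result to the valid range. The function name `mix` and the representation of samples as floats in [-1, 1] are illustrative.

```python
def mix(channels):
    """Mix equal-length channels of float samples by averaging.

    Each channel is a list of samples in [-1.0, 1.0]; the result is clamped
    to the same range.
    """
    if not channels:
        return []
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    count = len(channels)
    return [
        max(-1.0, min(1.0, sum(samples) / count))
        for samples in zip(*channels)
    ]
```

Averaging avoids clipping at the cost of lowering the level of each source; a weighted sum based on angular distance to the speaker would be an equally plausible design.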
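As the description above shows, rendering the M channels of audio data using the determined Q speaker devices may include at least one of the following:
mixing and then rendering at least two channels of audio data that lie within a preset radius centered on a speaker's orientation;
rendering correspondingly when the orientation of a speaker coincides with the vector of an audio capture device;
rendering one-to-one on the speakers whose orientations correspond to the orientations indicated by the audio spatial information, and rendering nothing on the speakers whose orientations do not;
rendering the same audio data on at least two speakers whose positions satisfy a preset condition; the preset condition indicates that the distance between the speaker's orientation and the orientation indicated by the audio spatial information is less than a preset distance.
The preset radius and the preset distance can be set as required.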
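The last rule in the list (several speakers rendering the same audio data) can be sketched as a threshold test. The Python sketch below assumes orientations are yaw angles in degrees and measures "distance" as angular distance; `speakers_within` is a hypothetical helper name, and the preset distance is whatever value the implementer chooses.

```python
def speakers_within(audio_yaw, speaker_yaws, preset_distance):
    """Return the indices of speakers closer than preset_distance to the audio.

    All returned speakers render the same channel (diffuse rendering).
    """
    selected = []
    for index, yaw in enumerate(speaker_yaws):
        d = abs(audio_yaw - yaw) % 360.0
        d = min(d, 360.0 - d)  # wrap around the circle
        if d < preset_distance:
            selected.append(index)
    return selected
```

With eight speakers spaced 45 degrees apart, a channel at 40 degrees and a preset distance of 50 degrees, the speakers at 0 and 45 degrees would both render that channel.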
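In practice, in addition to the basic audio rendering operations described above, audio can also be rendered in coordination with the video's region of interest or viewpoint.
Accordingly, in an embodiment, the apparatus may further include:
a second determining unit, configured to obtain, according to the user's motion posture, the orientation information of the viewpoint and/or region of interest on the projection mapping body, and optionally the orientation information of the Q speaker devices;
the rendering unit 174 is further configured to adjust the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
Here, in practice, the orientation information of a speaker device may or may not be its corresponding orientation on the projection mapping body; in this case, how to adjust the audio data rendered by the speaker device can be determined by calculation from the speaker device's orientation information.
It should be noted that, in practice, each unit of the data processing apparatus performs the functions described above for the encoded M channels of audio data formed from the N audio capture devices provided for the mounting orientation of each video capture device.
In practice, the receiving unit 171, the decoding unit 172, the first determining unit 173, the rendering unit 174, and the second determining unit may be implemented by a processor in the data processing apparatus.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Accordingly, an embodiment of the present invention further provides a storage medium, specifically a computer-readable storage medium, storing a computer program that, when executed by a processor, implements the steps of one data processing method of the above embodiments, or the steps of the other data processing method of the above embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.
In the solution provided by the embodiments of the present invention, the capture space corresponding to the capture device forms a geometric body; the spatial orientations at which the video capture devices of the capture device are deployed cover the entire geometric body; N audio capture devices are provided for the mounting orientation of each video capture device, where N is a positive integer. Because the spatial orientations at which the video capture devices and audio capture devices are deployed cover the entire geometric body, truly omnidirectional audio capture can be achieved. In addition, for the N audio capture devices provided for the mounting orientation of each video capture device, the spatial information of the audio capture devices is obtained, and the audio data captured by the N audio capture devices is encoded into M channels of audio data according to that spatial information; the M channels of audio data carry spatial information of the audio, and M is a positive integer. After the encoded M channels of audio data are received, they are decoded to obtain the corresponding spatial information of the audio; the Q speaker devices corresponding to the M channels of audio data are determined according to the obtained spatial information of the audio and the orientation information of the speaker devices, where Q is a positive integer; and the M channels of audio data are rendered using the determined Q speaker devices. Because N audio capture devices are provided for the mounting orientation of each video capture device and the audio data has corresponding spatial information, the audio data can be presented immersively, so that the spatial orientations of the audio and the video are coordinated and synchronized.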
Claims (18)
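- A data processing method, comprising: obtaining spatial information of audio capture devices of a capture device, wherein the capture space corresponding to the capture device forms a geometric body, the spatial orientations at which the video capture devices of the capture device are deployed cover the entire geometric body, N audio capture devices are provided for the mounting orientation of each video capture device, and N is a positive integer; and, for the N audio capture devices provided for the mounting orientation of each video capture device, encoding the audio data captured by the N audio capture devices into M channels of audio data according to the spatial information of the audio capture devices, wherein the M channels of audio data carry spatial information of the audio, and M is a positive integer.
- The method according to claim 1, wherein the spatial information of the audio carried in the M channels of audio data is at least one of the following: physical or sound-pickup spatial information of the N audio capture devices; center-point spatial information of the spatial positions of the N audio capture devices; center-point spatial information of the captured video.
- The method according to claim 1, wherein the spatial information of the audio carried in the M channels of audio data is represented in at least one of the following forms: corresponding orientation information on a longitude-latitude map; three-dimensional Cartesian coordinates in a three-dimensional coordinate system; spherical coordinates in a spherical coordinate system; relative face orientation information.
- The method according to claim 1, wherein the video data and audio data captured by the capture device satisfy at least one of the following: the video data captured by all the video capture devices can be restored to a sphere; the audio data captured by all the audio capture devices can be restored to a sphere.
- The method according to claim 1, wherein the method further comprises: storing or sending the M channels of audio data.
- A data processing method, comprising: receiving encoded M channels of audio data; decoding the encoded M channels of audio data to obtain the corresponding spatial information of the audio, wherein M is a positive integer; determining Q speaker devices corresponding to the M channels of audio data according to the obtained spatial information of the audio and the orientation information of the speaker devices, wherein Q is a positive integer; and rendering the M channels of audio data using the determined Q speaker devices.
- The method according to claim 6, wherein the method further comprises: obtaining, according to the user's motion posture, the orientation information of the viewpoint and/or region of interest on the projection mapping body; and adjusting the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
- The method according to claim 6, wherein rendering the M channels of audio data using the determined Q speaker devices comprises at least one of the following: mixing and then rendering at least two channels of audio data that lie within a preset radius centered on a speaker's orientation; rendering correspondingly when the orientation of a speaker coincides with the vector of an audio capture device; rendering one-to-one on the speakers whose orientations correspond to the orientations indicated by the audio spatial information, and rendering nothing on the speakers whose orientations do not; rendering the same audio data on at least two speakers whose positions satisfy a preset condition, wherein the preset condition indicates that the distance between the speaker's orientation and the orientation indicated by the audio spatial information is less than a preset distance.
- A capture device, wherein the capture space corresponding to the capture device forms a geometric body, the capture device comprising: video capture devices and audio capture devices; wherein the spatial orientations at which the video capture devices of the capture device are deployed cover the entire geometric body; N audio capture devices are provided for the mounting orientation of each video capture device; and N is a positive integer.
- The capture device according to claim 9, wherein the capture device further comprises: a moving apparatus, configured to receive a control instruction and, in response to the control instruction, either move the capture device so that the capture device captures data while moving, or keep the capture device stationary so that the capture device captures data in a stationary state.
- The capture device according to claim 9, wherein the positions at which the video capture devices and the audio capture devices are arranged satisfy at least one of the following: the video data captured by all the video capture devices can be restored to a sphere; the audio data captured by all the audio capture devices can be restored to a sphere.
- The capture device according to claim 9, wherein at least one video capture device is provided on each face of the geometric body.
- The capture device according to claim 9, wherein the capture device further comprises: a processor, configured to, for the N audio capture devices provided for the mounting orientation of each video capture device, obtain the spatial information of the audio capture devices of the capture device, and encode the audio data captured by the N audio capture devices into M channels of audio data according to the spatial information of the audio capture devices; wherein the M channels of audio data carry spatial information of the audio, and M is a positive integer.
- A data processing apparatus, comprising: an obtaining unit, configured to obtain spatial information of audio capture devices of a capture device, wherein the capture space corresponding to the capture device forms a geometric body, the spatial orientations at which the video capture devices of the capture device are deployed cover the entire geometric body, N audio capture devices are provided for the mounting orientation of each video capture device, and N is a positive integer; and an encoding unit, configured to, for the N audio capture devices provided for the mounting orientation of each video capture device, encode the audio data captured by the N audio capture devices into M channels of audio data according to the spatial information of the audio capture devices, wherein the M channels of audio data carry spatial information of the audio, and M is a positive integer.
- The apparatus according to claim 14, wherein the apparatus further comprises: a processing unit, configured to store or send the M channels of audio data.
- A data processing apparatus, comprising: a receiving unit, configured to receive encoded M channels of audio data; a decoding unit, configured to decode the encoded M channels of audio data to obtain the corresponding spatial information of the audio, wherein M is a positive integer; a first determining unit, configured to determine Q speaker devices corresponding to the M channels of audio data according to the obtained spatial information of the audio and the orientation information of the speaker devices, wherein Q is a positive integer; and a rendering unit, configured to render the M channels of audio data using the determined Q speaker devices.
- The apparatus according to claim 16, wherein the apparatus further comprises: a second determining unit, configured to obtain, according to the user's motion posture, the orientation information of the viewpoint and/or region of interest on the projection mapping body; and the rendering unit is further configured to adjust the audio data rendered by each speaker device according to the orientation information of the Q speaker devices and the spatial information of the audio.
- A storage medium storing a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5, or the steps of the method according to any one of claims 6 to 8.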
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP17888003.5A EP3564785A4 (en) | 2016-12-30 | 2017-12-26 | PROCESS AND DEVICE FOR DATA PROCESSING, ACQUISITION DEVICE AND STORAGE MEDIA |
| US16/474,133 US10911884B2 (en) | 2016-12-30 | 2017-12-26 | Data processing method and apparatus, acquisition device, and storage medium |
| US17/131,031 US11223923B2 (en) | 2016-12-30 | 2020-12-22 | Data processing method and apparatus, acquisition device, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201611265760.0 | 2016-12-30 | ||
| CN201611265760.0A CN106774930A (zh) | 2016-12-30 | 2016-12-30 | 一种数据处理方法、装置及采集设备 |
Related Child Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/474,133 A-371-Of-International US10911884B2 (en) | 2016-12-30 | 2017-12-26 | Data processing method and apparatus, acquisition device, and storage medium |
| US17/131,031 Division US11223923B2 (en) | 2016-12-30 | 2020-12-22 | Data processing method and apparatus, acquisition device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018121524A1 true WO2018121524A1 (zh) | 2018-07-05 |
Family
ID=58951619
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/118600 Ceased WO2018121524A1 (zh) | 2016-12-30 | 2017-12-26 | 一种数据处理方法及装置、采集设备及存储介质 |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US10911884B2 (zh) |
| EP (1) | EP3564785A4 (zh) |
| CN (1) | CN106774930A (zh) |
| WO (1) | WO2018121524A1 (zh) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106774930A (zh) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | 一种数据处理方法、装置及采集设备 |
| FR3079706B1 (fr) * | 2018-03-29 | 2021-06-04 | Inst Mines Telecom | Procede et systeme de diffusion d'un flux audio multicanal a des terminaux de spectateurs assistant a un evenement sportif |
| CN109900354B (zh) * | 2019-02-22 | 2021-08-03 | 世邦通信股份有限公司 | 一种鸣笛声音检测设备、鸣笛声音识别定位方法和系统 |
| CN113365013B (zh) * | 2020-03-06 | 2025-06-17 | 华为技术有限公司 | 一种音频处理方法及设备 |
| CN114067810B (zh) * | 2020-07-31 | 2025-12-12 | 华为技术有限公司 | 音频信号渲染方法和装置 |
| WO2022259768A1 (ja) * | 2021-06-11 | 2022-12-15 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 音響アクティブセンサ装置 |
| CN113674751A (zh) * | 2021-07-09 | 2021-11-19 | 北京字跳网络技术有限公司 | 音频处理方法、装置、电子设备和存储介质 |
| CN113660063B (zh) * | 2021-08-18 | 2023-12-08 | 杭州网易智企科技有限公司 | 空间音频数据处理方法、装置、存储介质及电子设备 |
| CN114598984B (zh) * | 2022-01-11 | 2023-06-02 | 华为技术有限公司 | 立体声合成方法和系统 |
| CN116309566B (zh) * | 2023-05-17 | 2023-09-12 | 深圳大学 | 基于点云的粘连人造杆状物单体化提取方法及相关设备 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101350931A (zh) * | 2008-08-27 | 2009-01-21 | 深圳华为通信技术有限公司 | 音频信号的生成、播放方法及装置、处理系统 |
| US20150149943A1 (en) * | 2010-11-09 | 2015-05-28 | Sony Corporation | Virtual room form maker |
| CN205039929U (zh) * | 2015-09-24 | 2016-02-17 | 北京工业大学 | 全景摄像机的音视频数据采集并行多路无线传输及处理装置 |
| CN105761721A (zh) * | 2016-03-16 | 2016-07-13 | 广东佳禾声学科技有限公司 | 一种携带位置信息的语音编码方法 |
| CN106774930A (zh) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | 一种数据处理方法、装置及采集设备 |
Family Cites Families (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5495576A (en) | 1993-01-11 | 1996-02-27 | Ritchey; Kurtis J. | Panoramic image based virtual reality/telepresence audio-visual system and method |
| US5714997A (en) * | 1995-01-06 | 1998-02-03 | Anderson; David P. | Virtual reality television system |
| US7355623B2 (en) * | 2004-04-30 | 2008-04-08 | Microsoft Corporation | System and process for adding high frame-rate current speaker data to a low frame-rate video using audio watermarking techniques |
| KR100682904B1 (ko) * | 2004-12-01 | 2007-02-15 | 삼성전자주식회사 | 공간 정보를 이용한 다채널 오디오 신호 처리 장치 및 방법 |
| US20090094375A1 (en) * | 2007-10-05 | 2009-04-09 | Lection David B | Method And System For Presenting An Event Using An Electronic Device |
| US20100098258A1 (en) | 2008-10-22 | 2010-04-22 | Karl Ola Thorn | System and method for generating multichannel audio with a portable electronic device |
| EP2194527A3 (en) * | 2008-12-02 | 2013-09-25 | Electronics and Telecommunications Research Institute | Apparatus for generating and playing object based audio contents |
| CN102480671B (zh) * | 2010-11-26 | 2014-10-08 | 华为终端有限公司 | 视频通信中的音频处理方法和装置 |
| US9930225B2 (en) * | 2011-02-10 | 2018-03-27 | Villmer Llc | Omni-directional camera and related viewing software |
| TWI615834B (zh) * | 2013-05-31 | 2018-02-21 | Sony Corp | 編碼裝置及方法、解碼裝置及方法、以及程式 |
| US9451162B2 (en) * | 2013-08-21 | 2016-09-20 | Jaunt Inc. | Camera array including camera modules |
| JP6612753B2 (ja) * | 2013-11-27 | 2019-11-27 | ディーティーエス・インコーポレイテッド | 高チャンネル数マルチチャンネルオーディオのためのマルチプレットベースのマトリックスミキシング |
| US20170086005A1 (en) * | 2014-03-25 | 2017-03-23 | Intellectual Discovery Co., Ltd. | System and method for processing audio signal |
| JP6149818B2 (ja) * | 2014-07-18 | 2017-06-21 | 沖電気工業株式会社 | 収音再生システム、収音再生装置、収音再生方法、収音再生プログラム、収音システム及び再生システム |
| WO2016084592A1 (ja) * | 2014-11-28 | 2016-06-02 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
| US9794721B2 (en) * | 2015-01-30 | 2017-10-17 | Dts, Inc. | System and method for capturing, encoding, distributing, and decoding immersive audio |
| CN106020441A (zh) * | 2016-05-06 | 2016-10-12 | 四川大学 | 一种360°全息实时交互装置 |
| CN109313904B (zh) * | 2016-05-30 | 2023-12-08 | 索尼公司 | 视频音频处理设备和方法以及存储介质 |
| CN106162206A (zh) * | 2016-08-03 | 2016-11-23 | 北京疯景科技有限公司 | 全景录制、播放方法及装置 |
| US10531220B2 (en) * | 2016-12-05 | 2020-01-07 | Magic Leap, Inc. | Distributed audio capturing techniques for virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems |
| US10499181B1 (en) * | 2018-07-27 | 2019-12-03 | Sony Corporation | Object audio reproduction using minimalistic moving speakers |
2016
- 2016-12-30 CN CN201611265760.0A patent/CN106774930A/zh active Pending
2017
- 2017-12-26 WO PCT/CN2017/118600 patent/WO2018121524A1/zh not_active Ceased
- 2017-12-26 US US16/474,133 patent/US10911884B2/en active Active
- 2017-12-26 EP EP17888003.5A patent/EP3564785A4/en not_active Withdrawn
2020
- 2020-12-22 US US17/131,031 patent/US11223923B2/en not_active Expired - Fee Related
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3564785A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3564785A1 (en) | 2019-11-06 |
| CN106774930A (zh) | 2017-05-31 |
| US11223923B2 (en) | 2022-01-11 |
| EP3564785A4 (en) | 2020-08-12 |
| US20210112363A1 (en) | 2021-04-15 |
| US20190387347A1 (en) | 2019-12-19 |
| US10911884B2 (en) | 2021-02-02 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17888003; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2017888003; Country of ref document: EP; Effective date: 20190730 |
