WO2021218558A1 - Bit allocation method and apparatus for audio signals - Google Patents
Bit allocation method and apparatus for audio signals
- Publication number
- WO2021218558A1 · PCT/CN2021/084578 (CN2021084578W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- parameter
- classification
- sound field
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
Definitions
- This application relates to audio processing technology, and in particular to a method and device for bit allocation of audio signals.
- Immersive audio technology aims to provide users with a better three-dimensional sound experience by expanding audio into a high-dimensional spatial representation.
- Three-dimensional audio technology no longer simply uses a multi-channel representation at the playback end; instead, it reconstructs the audio signal in three-dimensional space and represents audio in three-dimensional space through rendering techniques.
- In the prior art, the number of bits allocated to each audio signal for coding and decoding reflects neither the differences in the spatial characteristics of the audio signals at the playback end nor the characteristics of the audio signals themselves, which reduces audio coding and decoding efficiency.
- The present application provides a method and device for bit allocation of audio signals that adapt to the characteristics of the audio signals and match different numbers of coding bits to different audio signals, thereby improving the coding and decoding efficiency of audio signals.
- In a first aspect, the present application provides a bit allocation method for audio signals, including: obtaining T audio signals of a current frame, where T is a positive integer; determining a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determining the priorities of the M audio signals in the first audio signal set; and performing bit allocation on the M audio signals according to their priorities.
- The present application determines the priorities of the multiple audio signals included in the current frame according to the characteristics of those signals and the related information in the metadata, and determines the number of bits to allocate to each audio signal according to its priority. This not only adapts to the characteristics of the audio signals but also matches different numbers of coding bits to different audio signals, improving coding and decoding efficiency.
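The flow of the first aspect can be sketched as follows. This is a minimal illustration, not the claimed implementation: the proportional split in step 3 is only one possible rule (the claims only require that a higher-priority signal receives more bits), and the function names and callbacks are hypothetical.

```python
def bit_allocation(frame_signals, total_bits, select, priority_of):
    """Hypothetical sketch of the claimed bit allocation flow."""
    # Step 1: determine the first audio signal set (M of the T signals).
    selected = [s for s in frame_signals if select(s)]
    # Step 2: determine the priority of each selected signal.
    priorities = [priority_of(s) for s in selected]
    # Step 3: allocate bits so that higher priority gets more bits;
    # a priority-proportional split is one possible realisation.
    total_priority = sum(priorities)
    return {s: total_bits * p // total_priority
            for s, p in zip(selected, priorities)}
```

For example, with T = 3 signals of which M = 2 are selected, a signal with priority 3 receives three times the bits of a signal with priority 1.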
- Determining the priorities of the M audio signals in the first audio signal set includes: obtaining a sound field classification parameter of each of the M audio signals; and determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals.
- Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining one or more of a motion classification parameter, a volume classification parameter, a propagation classification parameter, a diffusion classification parameter, a state classification parameter, a ranking classification parameter, and a signal classification parameter of a first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- In this way, a priority of the audio signal that takes multiple dimensions of information into account can be obtained.
- While acquiring the T audio signals of the current frame, the method further includes: acquiring S groups of metadata of the current frame, where S is a positive integer, T ≥ S, and the S groups of metadata correspond to the T audio signals.
- The metadata describes the state of the corresponding audio signal in the spatial sound field, and thus provides a reliable and effective basis for subsequently obtaining the sound field classification parameters of the audio signal.
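As an illustration only, a per-signal metadata record of the kind the application relies on might look like the following. Every field name here is an assumption: the application only requires that metadata describe the state of the corresponding signal in the spatial sound field, and does not fix a layout.

```python
from dataclasses import dataclass

@dataclass
class ObjectMetadata:
    """Illustrative metadata record; all field names are assumptions."""
    position: tuple     # (x, y, z) location in the spatial sound field
    speed: float        # movement per unit time -> motion parameter
    gain: float         # playback volume -> volume parameter
    spread: float       # propagation range -> propagation parameter
    diffuseness: float  # diffusion range -> diffusion parameter
    importance: float   # importance parameter used when selecting signals
```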
- Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- Obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters includes: taking a weighted average of several of the acquired parameters to obtain the sound field classification parameter; or averaging the acquired parameters to obtain the sound field classification parameter; or using one of the acquired parameters directly as the sound field classification parameter.
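The three claimed combination options can be sketched as follows; the function name and the convention that passing `weights` selects the weighted-average option are illustrative assumptions.

```python
def sound_field_parameter(params, weights=None):
    """Combine acquired classification parameters into one sound field
    classification parameter, using one of the three claimed options."""
    if len(params) == 1:
        return params[0]              # option 3: use one parameter directly
    if weights is not None:           # option 1: weighted average
        return sum(p * w for p, w in zip(params, weights)) / sum(weights)
    return sum(params) / len(params)  # option 2: plain average
```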
- Determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals includes: determining, according to a set first correspondence, the priority corresponding to the sound field classification parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or using the sound field classification parameter of the first audio signal as its priority; or determining, according to multiple set range thresholds, the range into which the sound field classification parameter of the first audio signal falls, and determining the priority corresponding to that range as the priority of the first audio signal.
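The range-threshold option can be sketched as follows. The convention that a larger sound field classification parameter maps to a higher priority is an assumption; the claims only require that each range correspond to a priority.

```python
def priority_from_ranges(sfp, thresholds):
    """Count how many of the set range thresholds the sound field
    classification parameter reaches; that count serves as the priority
    (a larger parameter falls into a higher range -> higher priority)."""
    return sum(1 for t in sorted(thresholds) if sfp >= t)
```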
- Performing bit allocation on the M audio signals according to their priorities includes: performing bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
- Performing bit allocation according to the number of currently available bits and the priorities of the M audio signals includes: determining a bit-number proportion of the first audio signal according to its priority, where the first audio signal is any one of the M audio signals; and obtaining the number of bits of the first audio signal as the product of the number of currently available bits and the bit-number proportion of the first audio signal.
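One way to realise the product rule, under the assumption that each signal's proportion is its priority divided by the sum of all priorities (the claims do not fix how the proportion is derived from the priority):

```python
def allocate_by_proportion(available_bits, priorities):
    """Bits for each signal = available bits x that signal's proportion,
    computed in integer arithmetic to avoid float rounding surprises."""
    total = sum(priorities)
    return [available_bits * p // total for p in priorities]
```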
- Performing bit allocation according to the number of currently available bits and the priorities of the M audio signals includes: determining the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple numbers of bits, one or more priorities correspond to one number of bits, and the first audio signal is any one of the M audio signals.
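The second correspondence can be implemented as a simple lookup table. The priority values and bit counts below are purely illustrative; the claims fix only the shape of the mapping, not its entries.

```python
# Illustrative second correspondence: one or more priorities may map to
# the same number of bits, exactly as the claim allows.
SECOND_CORRESPONDENCE = {
    0: 64,
    1: 64,   # priorities 0 and 1 share one bit count
    2: 128,
    3: 256,
}

def bits_from_priority(priority):
    """Look up the number of bits for a signal from its priority."""
    return SECOND_CORRESPONDENCE[priority]
```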
- Determining the first audio signal set according to the T audio signals includes: adding pre-designated audio signals among the T audio signals to the first audio signal set.
- Determining the first audio signal set according to the T audio signals includes: adding the audio signals corresponding to the S groups of metadata among the T audio signals to the first audio signal set; or adding to the first audio signal set the audio signals whose importance parameter is greater than or equal to a set participation threshold, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
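The metadata-driven selection can be sketched as below; the dictionary shape and the field name `importance` are assumptions, since the application does not fix a metadata layout.

```python
def first_audio_signal_set(signal_ids, metadata_by_id, participation_threshold):
    """Keep the signals that have metadata and whose importance parameter
    is greater than or equal to the set participation threshold."""
    return [sid for sid in signal_ids
            if sid in metadata_by_id
            and metadata_by_id[sid]["importance"] >= participation_threshold]
```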
- Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtaining a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; obtaining one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtaining a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtaining the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal when played back in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal when played back in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtaining the first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtaining the second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtaining the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter.
- This application uses multiple methods to obtain multiple sound field classification parameters for an audio signal according to its different characteristics, and then determines the priority of the audio signal according to those parameters, so that the obtained priority reflects multiple features of the audio signal and is compatible with implementations corresponding to different features.
- Determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals includes: obtaining a first priority of the first audio signal according to the first sound field classification parameter; obtaining a second priority of the first audio signal according to the second sound field classification parameter; and obtaining the priority of the first audio signal according to the first priority and the second priority.
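The claims do not fix how the two priorities are combined; a weighted sum with an assumed weight `alpha` is one plausible realisation.

```python
def combined_priority(first_priority, second_priority, alpha=0.5):
    """Blend the two priorities derived from the first and second sound
    field classification parameters; alpha is an assumed weight."""
    return alpha * first_priority + (1 - alpha) * second_priority
```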
- This application uses multiple methods to obtain multiple priorities for an audio signal according to its different characteristics, and then combines those priorities to obtain the final priority of the audio signal, so that the obtained priority reflects multiple features of the audio signal and is compatible with implementations corresponding to different features.
- In a second aspect, the present application provides an audio signal encoding method. After the bit allocation method of any one of the above first aspect is performed, the method further includes: encoding the M audio signals according to the bits allocated to them to obtain an encoded bitstream.
- The encoded bitstream includes the numbers of bits of the M audio signals.
- In a third aspect, the present application provides an audio signal decoding method, including: receiving an encoded bitstream; performing the bit allocation method of any one of the above first aspect to obtain the respective numbers of bits of the M audio signals; and reconstructing the M audio signals according to their respective numbers of bits and the bitstream.
- The present application further provides a bit allocation device for audio signals, including a processing module configured to: obtain T audio signals of the current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine the priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to their priorities.
- The processing module is specifically configured to: obtain the sound field classification parameter of each of the M audio signals; and determine the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals.
- The processing module is specifically configured to: obtain one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtain the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- The processing module is specifically configured to obtain S groups of metadata of the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata describes the state of the corresponding audio signal in the spatial sound field.
- The processing module is specifically configured to: obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtain the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- The processing module is specifically configured to: take a weighted average of several of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters to obtain the sound field classification parameter; or average the acquired parameters to obtain the sound field classification parameter; or use one of the acquired parameters directly as the sound field classification parameter.
- The processing module is specifically configured to: determine, according to the set first correspondence, the priority corresponding to the sound field classification parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or use the sound field classification parameter of the first audio signal as its priority; or determine, according to multiple set range thresholds, the range into which the sound field classification parameter of the first audio signal falls, and determine the priority corresponding to that range as the priority of the first audio signal.
- The processing module is specifically configured to perform bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
- The processing module is specifically configured to: determine the bit-number proportion of the first audio signal according to its priority, where the first audio signal is any one of the M audio signals; and obtain the number of bits of the first audio signal as the product of the number of currently available bits and the bit-number proportion of the first audio signal.
- The processing module is specifically configured to determine the number of bits of the first audio signal from the set second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple numbers of bits, one or more priorities correspond to one number of bits, and the first audio signal is any one of the M audio signals.
- The processing module is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
- The processing module is specifically configured to: add the audio signals corresponding to the S groups of metadata among the T audio signals to the first audio signal set; or add to the first audio signal set the audio signals whose importance parameter is greater than or equal to the set participation threshold, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
- The processing module is specifically configured to: obtain one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtain the first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; obtain one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtain the second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal when played back in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal when played back in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the degree of sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy of the first audio signal in the encoding process.
- The processing module is specifically configured to: obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtain the first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtain the second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter.
- the processing module is specifically configured to obtain the first priority of the first audio signal according to the first sound field classification parameter; obtain the second priority of the first audio signal according to the second sound field classification parameter; and acquire the priority of the first audio signal according to the first priority and the second priority.
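How the two priorities are merged is left open by this summary; a minimal sketch, assuming a simple weighted combination (the weight and the averaging rule are illustrative assumptions, not part of the application):

```python
def combine_priorities(first_priority, second_priority, weight=0.5):
    """Merge the priority obtained from the first sound field classification
    parameter with the one obtained from the second.  The weighted average
    below is an assumption for illustration only."""
    return weight * first_priority + (1.0 - weight) * second_priority
```

Any monotonic combination (maximum, product, and so on) would serve the same role of ranking the M audio signals by importance.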
- the processing module is further configured to encode the M audio signals according to the number of bits allocated by the M audio signals to obtain an encoded bitstream.
- the encoded bitstream includes the number of bits of the M audio signals.
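The summary does not fix the allocation rule itself; one plausible sketch, assuming the bit budget is split proportionally to priority (the proportional split and the remainder handling are illustrative assumptions):

```python
def allocate_bits(priorities, total_bits):
    """Split total_bits across the M audio signals in proportion to their
    priorities; the rounding remainder goes to the highest-priority signal."""
    total_p = sum(priorities)
    bits = [int(total_bits * p / total_p) for p in priorities]
    top = max(range(len(priorities)), key=lambda i: priorities[i])
    bits[top] += total_bits - sum(bits)  # keep the total budget exact
    return bits
```

Each signal is then encoded with its allocated number of bits, and the numbers of bits are carried in the encoded bitstream so the decoding side can reconstruct the M audio signals.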
- it further includes: a transceiver module, configured to receive an encoded bitstream; the processing module is further configured to obtain the respective number of bits of each of the M audio signals, and to reconstruct the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
- the present application provides a device including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of the first to third aspects.
- the present application provides a computer-readable storage medium, which is characterized by comprising a computer program that, when executed on a computer, causes the computer to execute the method described in any one of the first to third aspects above.
- the present application provides a computer-readable storage medium, including an encoded bitstream obtained according to the method described in the above second aspect.
- the present application provides an encoding device, including a processor and a communication interface, where the processor reads, through the communication interface, a computer program stored in a memory, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in any one of the first to third aspects above.
- the present application provides an encoding device, which is characterized by comprising a processor and a memory, where the processor is configured to execute the method described in the second aspect, and the memory is configured to store the encoded code stream.
- FIG. 1A exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 applied in this application;
- FIG. 1B is an explanatory diagram of an example of an audio decoding system 40 according to an exemplary embodiment;
- FIG. 2 is a schematic diagram of the structure of an audio decoding device 200 provided by the present application;
- FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment;
- FIG. 4 is a schematic flowchart of a method for allocating bits of audio signals according to the present application;
- FIG. 5 is an exemplary schematic diagram of the position of the audio signal in the spatial sound field;
- FIG. 6 is an exemplary schematic diagram of the priority of the audio signal in the spatial sound field;
- FIG. 7 is a schematic structural diagram of an embodiment of a device of this application;
- FIG. 8 is a schematic structural diagram of an embodiment of a device of this application.
- "At least one (item)" refers to one or more, and "multiple" refers to two or more.
- "And/or" is used to describe an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural.
- the character “/” generally indicates that the associated objects before and after are in an “or” relationship.
- "at least one of the following items (pieces)" or similar expressions refer to any combination of these items, including any combination of a single item (piece) or plural items (pieces).
- "At least one of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
- Audio frame: audio data is streamed; for ease of processing, the amount of audio data within a period of time is usually taken as one frame of audio. This period is called the "sampling time", and its value is determined by the requirements of the codec and the specific application; for example, the frame duration is 2.5 ms to 60 ms, where ms is milliseconds.
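For instance, the number of sampling points per frame follows directly from the sampling rate and the frame duration (the 48 kHz rate below is an assumed example, not mandated by this application):

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    # One frame covers frame_ms milliseconds of the audio stream.
    return int(sample_rate_hz * frame_ms / 1000)

# A common configuration: 20 ms frames at 48 kHz -> 960 samples per frame.
```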
- Audio signal: an audio signal is an information carrier of regular sound waves, carrying voice, music, and sound effects that change in frequency and amplitude. Audio is a continuously changing analog signal that can be represented by a continuous curve, called a sound wave. Digital audio is obtained from the audio signal through analog-to-digital conversion, or is a digital signal generated by a computer. A sound wave has three important parameters, frequency, amplitude, and phase, which determine the characteristics of the audio signal.
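The three parameters can be made concrete with a sampled sine wave; the frequency, amplitude, and phase values below are arbitrary illustrations:

```python
import math

def sine_samples(freq_hz, amplitude, phase_rad, sample_rate_hz, n):
    """Generate n samples of A * sin(2*pi*f*t + phase): frequency,
    amplitude, and phase fully determine this elementary signal."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate_hz + phase_rad)
            for k in range(n)]
```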
- Metadata: also known as intermediary data or relay data, metadata is data describing data (data about data), mainly used to describe data properties and to support functions such as indicating storage location, historical data, resource search, and file recording. Metadata is information about the organization of data, data domains, and their relationships; in short, metadata is data about data. The metadata in this application is used to describe the state of the corresponding audio signal in the spatial sound field.
- Three-dimensional audio (3D audio): audio technology that uses different audio signals to represent different sounds and reconstructs each sound in three-dimensional space.
- FIG. 1A exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 applied in this application.
- the audio encoding and decoding system 10 may include a source device 12 and a destination device 14.
- the source device 12 generates encoded audio data. Therefore, the source device 12 may be referred to as an audio encoding device.
- the destination device 14 can decode the encoded audio data generated by the source device 12, and therefore, the destination device 14 can be referred to as an audio decoding device.
- Various implementations of source device 12, destination device 14, or both may include one or more processors and memory coupled to the one or more processors.
- the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer.
- the source device 12 and the destination device 14 may include various devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, audio game consoles, in-vehicle computers, wireless communication devices, or the like.
- Although FIG. 1A shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality.
- the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
- the source device 12 and the destination device 14 can communicate with each other via a link 13, and the destination device 14 can receive encoded audio data from the source device 12 via the link 13.
- the link 13 may include one or more media or devices capable of moving the encoded audio data from the source device 12 to the destination device 14.
- the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded audio data directly to the destination device 14 in real time.
- the source device 12 may modulate the encoded audio data according to a communication standard (for example, a wireless communication protocol), and may transmit the modulated audio data to the destination device 14.
- the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
- the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
- the source device 12 includes an encoder 20, and optionally, the source device 12 may also include an audio source 16, an audio preprocessor 18, and a communication interface 22.
- the encoder 20, the audio source 16, the audio preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:
- the audio source 16 may include or be any type of audio capture device, for example, for capturing real-world sounds, and/or any type of audio generating device, for example, a computer audio processor, or any type of device for acquiring and/or providing real-world audio, computer-generated audio (for example, screen content, or audio in virtual reality (VR)), and/or any combination thereof (for example, audio in augmented reality (AR)).
- the audio source 16 may be a microphone for capturing audio or a memory for storing audio.
- the audio source 16 may also include any type (internal or external) interface for storing previously captured or generated audio and/or acquiring or receiving audio.
- when the audio source 16 is an audio capture device, it can be, for example, a local capture device or one integrated in the source device; when the audio source 16 is a memory, it can be, for example, a local memory or one integrated in the source device.
- the interface may be, for example, an external interface that receives audio from an external audio source.
- the external audio source is, for example, an external audio capture device, such as a microphone, an external memory, or an external audio generating device.
- the device is, for example, an external computer audio processor, computer, or server.
- the interface can be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface, and an optical interface.
- audio can be regarded as a one-dimensional vector of elements; the elements in the vector can also be called sampling points, and the number of sampling points in the vector (or in the audio) defines the size of the audio.
- the audio transmitted from the audio source 16 to the audio processor may also be referred to as original audio data 17.
- the audio pre-processor 18 is configured to receive the original audio data 17 and perform pre-processing on the original audio data 17 to obtain pre-processed audio 19 or pre-processed audio data 19.
- the pre-processing performed by the audio pre-processor 18 may include trimming, toning, or denoising.
- the encoder 20 (or audio encoder 20) is used to receive the pre-processed audio data 19, and process the pre-processed audio data 19, so as to provide the encoded audio data 21.
- the encoder 20 may be used to implement the various embodiments described below to implement the application of the audio signal bit allocation method described in this application on the encoding side.
- the communication interface 22 can be used to receive the encoded audio data 21 and transmit it through the link 13 to the destination device 14 or any other device (such as a memory) for storage or direct reconstruction; the other device may be any device used for decoding or storage.
- the communication interface 22 may be used, for example, to encapsulate the encoded audio data 21 into a suitable format, such as a data packet, for transmission on the link 13.
- the destination device 14 includes a decoder 30, and optionally, the destination device 14 may also include a communication interface 28, an audio post-processor 32, and a playback device 34. They are described as follows:
- the communication interface 28 may be used to receive the encoded audio data 21 from the source device 12 or any other source, for example, a storage device, and the storage device is, for example, an encoded audio data storage device.
- the communication interface 28 can be used to transmit or receive the encoded audio data 21 through the link 13 between the source device 12 and the destination device 14 or through any type of network.
- the link 13 is, for example, a direct wired or wireless connection.
- the type of network is, for example, a wired or wireless network or any combination thereof, or any type of private network and public network, or any combination thereof.
- the communication interface 28 may be used, for example, to decapsulate the data packet transmitted by the communication interface 22 to obtain the encoded audio data 21.
- Both the communication interface 28 and the communication interface 22 can be configured as one-way or two-way communication interfaces, and can be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to the data transfer, such as the transfer of the encoded audio data.
- the decoder 30 (or audio decoder 30) is used to receive the encoded audio data 21 and provide the decoded audio data 31 or the decoded audio 31.
- the decoder 30 may be used to implement the various embodiments described below to implement the application of the audio signal bit allocation method described in this application on the decoding side.
- the audio post-processor 32 is configured to perform post-processing on the decoded audio data 31 (also referred to as reconstructed audio data) to obtain the post-processed audio data 33.
- the post-processing performed by the audio post-processor 32 may include: trimming or resampling, or any other processing, and may also be used to transmit the post-processed audio data 33 to the playback device 34.
- the playback device 34 is used to receive the post-processed audio data 33 to play audio to, for example, users or listeners.
- the playback device 34 may be or may include any type of player for presenting the reconstructed audio, for example, an integrated or external speaker.
- Although FIG. 1A shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality.
- the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
- the source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, mobile phone, smartphone, tablet or tablet computer, camera, desktop computer, set-top box, television, in-vehicle device, playback device, digital media player, game console, media streaming device (such as a content service server or content distribution server), broadcast receiver device, or broadcast transmitter device, and may use any type of operating system or none at all.
- Both the encoder 20 and the decoder 30 can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
- the device can store the software instructions in a suitable non-transitory computer-readable storage medium, and can use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be regarded as one or more processors.
- the audio encoding and decoding system 10 shown in FIG. 1A is only an example, and the technology of the present application can be applied to audio coding settings (for example, audio encoding or audio decoding) that do not necessarily include any data communication between the encoding and decoding devices.
- the data can be retrieved from local storage, streamed on the network, etc.
- the audio encoding device can encode data and store the data to the memory, and/or the audio decoding device can retrieve the data from the memory and decode the data.
- encoding and decoding are performed by devices that do not communicate with each other but only encode data to and/or retrieve data from the memory and decode the data.
- FIG. 1B is an explanatory diagram of an example of an audio decoding system 40 according to an exemplary embodiment.
- the audio decoding system 40 can implement a combination of various technologies of the present application.
- the audio decoding system 40 may include a microphone 41, an encoder 20, a decoder 30 (and/or an audio encoder/decoder implemented by the logic circuit 47 of the processing unit 46), an antenna 42, One or more processors 43, one or more memories 44, and/or playback devices 45.
- the microphone 41, the antenna 42, the processing unit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the playback device 45 can communicate with each other.
- Although the encoder 20 and the decoder 30 are used to illustrate the audio decoding system 40, in different examples the audio decoding system 40 may include only the encoder 20 or only the decoder 30.
- the antenna 42 may be used to transmit or receive an encoded stream of audio data.
- the playback device 45 may be used to play audio data.
- the logic circuit 47 may be implemented by the processing unit 46.
- the processing unit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
- the audio decoding system 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, general-purpose processors, and the like.
- the logic circuit 47 may be implemented by hardware, such as dedicated audio coding hardware, and the processor 43 may be implemented by general software, an operating system, and the like.
- the memory 44 may be any type of memory, such as volatile memory (for example, static random access memory (SRAM) or dynamic random access memory (DRAM)) or non-volatile memory (for example, flash memory).
- the memory 44 may be implemented by cache memory.
- the logic circuit 47 can access the memory 44.
- the logic circuit 47 and/or the processing unit 46 may include a memory (for example, a cache, etc.) for implementing buffers and the like.
- the encoder 20 implemented by logic circuits may include a buffer (e.g., implemented by the processing unit 46 or memory 44) and an audio processing unit (e.g., implemented by the processing unit 46).
- the audio processing unit may be communicatively coupled to the buffer.
- the audio processing unit may include an encoder 20 implemented by a logic circuit 47 to implement various modules discussed in any other encoder system or subsystem described herein. Logic circuits can be used to perform the various operations discussed herein.
- decoder 30 may be implemented by logic circuit 47 in a similar manner to implement the various modules discussed in any other decoder system or subsystem described herein.
- the decoder 30 implemented by the logic circuit may include a buffer (implemented, for example, by the processing unit 46 or the memory 44) and an audio processing unit (implemented, for example, by the processing unit 46).
- the audio processing unit may be communicatively coupled to the buffer.
- the audio processing unit may include a decoder 30 implemented by a logic circuit 47 to implement various modules discussed in any other decoder system or subsystem described herein.
- the antenna 42 may be used to receive an encoded bitstream of audio data.
- the encoded bitstream may include audio signal data, metadata, etc., related to audio frames discussed herein.
- the audio coding system 40 may also include a decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream.
- the playback device 45 is used to play audio frames.
- the decoder 30 may be used to perform the reverse process.
- the decoder 30 can be used to receive and parse such metadata, and decode related audio data accordingly.
- the encoder 20 may entropy encode the metadata into an encoded audio code stream. In such instances, decoder 30 may parse such metadata and decode related audio data accordingly.
- Fig. 2 is a schematic structural diagram of an audio decoding device 200 (for example, an audio encoding device or an audio decoding device) provided by the present application.
- the audio decoding device 200 is suitable for implementing the embodiments described in this application.
- the audio decoding device 200 may be an audio decoder (for example, the decoder 30 of FIG. 1A) or an audio encoder (for example, the encoder 20 of FIG. 1A).
- the audio decoding device 200 may be one or more components of the decoder 30 in FIG. 1A or the encoder 20 in FIG. 1A described above.
- the audio decoding device 200 includes: an ingress port 210 and a receiver unit (Rx) 220 for receiving data, a processor, logic unit, or central processing unit (CPU) 230 for processing data, a transmitter unit (Tx) 240 and an egress port 250 for transmitting data, and a memory 260 for storing data.
- the audio decoding device 200 may further include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 210, the receiver unit 220, the transmitter unit 240, and the egress port 250 for the egress or ingress of optical or electrical signals.
- the processor 230 is implemented by hardware and software.
- the processor 230 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
- the processor 230 communicates with the ingress port 210, the receiver unit 220, the transmitter unit 240, the egress port 250, and the memory 260.
- the processor 230 includes a decoding module 270 (for example, an encoding module 270 or a decoding module 270).
- the encoding/decoding module 270 implements the embodiments disclosed in this document to implement the audio signal bit allocation method provided in this application. For example, the encoding/decoding module 270 implements, processes, or provides various encoding operations.
- the encoding/decoding module 270 provides a substantial improvement to the function of the audio decoding device 200, and affects the conversion of the audio decoding device 200 to different states.
- the encoding/decoding module 270 is implemented by instructions stored in the memory 260 and executed by the processor 230.
- the memory 260 includes one or more magnetic disks, tape drives, and solid-state hard drives, which can be used as an overflow data storage device for storing programs when these programs are selectively executed, and storing instructions and data read during program execution.
- the memory 260 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
- FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment.
- the device 300 can implement the technology of the present application.
- FIG. 3 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as a decoding device 300 for short) of this application.
- the apparatus 300 may include a processor 310, a memory 330, and a bus system 350.
- the processor and the memory are connected by a bus system, the memory is used to store instructions, and the processor is used to execute instructions stored in the memory.
- the memory of the decoding device stores program code, and the processor can call the program code stored in the memory to execute the method described in this application. To avoid repetition, it will not be described in detail here.
- the processor 310 may be a central processing unit ("CPU" for short), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the memory 330 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 330.
- the memory 330 may include code and data 331 accessed by the processor 310 using the bus 350.
- the memory 330 may further include an operating system 333 and application programs 335.
- the bus system 350 may also include a power bus, a control bus, a status signal bus, and the like. However, for clear description, various buses are marked as the bus system 350 in the figure.
- the decoding device 300 may further include one or more output devices, such as a speaker 370.
- the speaker 370 may be a headset or an external speaker.
- the speaker 370 may be connected to the processor 310 via the bus 350.
- FIG. 4 is a schematic flowchart of a method for allocating bits of audio signals according to the present application.
- the process 400 may be executed by the source device 12 or the destination device 14.
- the process 400 is described as a series of steps or operations. It should be understood that the process 400 may be executed in various orders and/or occur simultaneously, and is not limited to the execution order shown in FIG. 4.
- the method includes:
- Step 401 Acquire T audio signals in the current frame.
- the current frame is the audio frame acquired at the current moment during the execution of the method of the present application.
- 3D audio technology no longer simply uses a multi-channel representation, but uses different audio signals to represent different sounds.
- For example, if the environment includes a human voice, a music sound, and a car sound, three audio signals are used to represent the human voice, the music sound, and the car sound, and each sound is then reconstructed in three-dimensional space based on these three audio signals, realizing the representation of a variety of sounds in three-dimensional space. That is, an audio frame may contain multiple audio signals, and one audio signal represents one kind of voice, music, or sound effect in reality. It should be noted that any technology for extracting audio signals from audio frames can be used in this application, and there is no specific limitation on this.
- S groups of metadata in the current frame are acquired, and the S groups of metadata correspond to the above T audio signals.
- the encoding end generates the audio data and the metadata separately in the process of pre-processing the original speech, music, or sound effects; accordingly, the encoding end can take the metadata within the time range corresponding to the start time (sampling point) and the end time (sampling point) of the current frame as the metadata of the current frame.
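A minimal sketch of that per-frame selection, assuming each metadata record carries a timestamp in sampling points (the record layout is a hypothetical illustration):

```python
def metadata_for_frame(records, frame_start, frame_end):
    """Keep the metadata records whose timestamp (in sampling points) falls
    within the current frame's range [frame_start, frame_end)."""
    return [r for r in records if frame_start <= r["time"] < frame_end]
```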
- the metadata of the current frame can be obtained by parsing the received code stream.
- the metadata includes parameters such as object index (object_index), azimuth angle (position_azimuth), elevation angle (position_elevation), position radius (position_radius), gain factor (gain_factor), uniform spread (spread_uniform), spread width (spread_width), spread height (spread_height), spread depth (spread_depth), diffuseness (diffuseness), importance (priority), divergence (divergence), and speed (speed). The value range and the number of bits of each of the above parameters are recorded in the metadata. It should be noted that the metadata may also include other parameters and other parameter recording forms, which are not specifically limited in this application.
| Metadata | Value range (precision) | Number of bits |
| --- | --- | --- |
| object_index | 1–128 (1) | 7 |
| position_azimuth | -180–180 (2) | 8 |
| position_elevation | -90–90 (5) | 6 |
| position_radius | 0.5–16 (non-linear) | 4 |
| gain_factor | 0.004–5.957 (non-linear) | 7 |
| spread_uniform | 0–180 | 7 |
| spread_width | 0–180 | 7 |
| spread_height | 0–90 | 5 |
| spread_depth | 0–15.5 | 4 |
| diffuseness | 0–1 | 7 |
| priority | 0–7 | 3 |
| divergence | 0–1 | 8 |
| speed | 0–1 | 4 |
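The fixed bit widths in the table suggest uniform scalar quantization for the linear-range fields (position_radius and gain_factor are marked non-linear and would need their own mappings); a hedged sketch, not the codec's actual rule:

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an unsigned code of the given width
    (uniform quantization; an illustration, not the actual metadata coding)."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)

def dequantize(code, lo, hi, bits):
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

# position_azimuth: range [-180, 180] in 8 bits per the table above.
```

Under this assumption, one 8-bit code step for position_azimuth corresponds to roughly 1.41 degrees (360/255).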
- Step 402 Determine a first audio signal set according to the T audio signals.
- the first audio signal set includes M audio signals, where M is a positive integer, the T audio signals include the M audio signals, and T ≥ M.
- audio signals with corresponding metadata among the T audio signals may be added to the first audio signal set. That is, if all of the above T audio signals correspond to metadata, all T audio signals can be added to the first audio signal set; if only part of the above T audio signals corresponds to metadata, only that part of the audio signals needs to be added to the first audio signal set.
- This application may also add pre-designated audio signals among the T audio signals to the first audio signal set. Through high-level signaling or a user-specified manner, part or all of the audio signals in the above T audio signals can be added to the first audio signal set.
- the higher layer signaling directly configures the index of the audio signal to be added to the first audio signal set.
- the user specifies voice, music or sound effects, and adds the audio signal of the specified object to the first audio signal set.
- This application may also refer to the importance parameter of the audio signal recorded in the metadata; the importance parameter is used to indicate the importance of the corresponding audio signal in the three-dimensional audio, and an audio signal is added to the first audio signal set according to its importance parameter.
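Putting the admission routes described above together (the signal has corresponding metadata, is pre-designated via high-level signaling or by the user, or qualifies by its importance parameter), a sketch; the record fields and the threshold semantics are assumptions for illustration:

```python
def first_audio_signal_set(signals, metadata, designated=(), importance_threshold=None):
    """Return the indices of the T signals admitted to the first set.
    `metadata` maps signal index -> metadata dict; signals without
    metadata are skipped, pre-designated indices are always admitted."""
    selected = set(designated)
    for i in range(len(signals)):
        meta = metadata.get(i)
        if meta is None:
            continue  # only signals with corresponding metadata qualify
        if importance_threshold is None or meta.get("priority", 0) >= importance_threshold:
            selected.add(i)
    return sorted(selected)
```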
- Step 403 Determine the priority of the M audio signals in the first audio signal set.
- This application may first obtain the sound field classification parameters of each audio signal in the M audio signals, and then determine the priority of the M audio signals according to the sound field classification parameters of each audio signal in the M audio signals.
- the sound field classification parameter can be an indicator of the importance of the audio signal, obtained according to related parameters of the audio signal.
- the related parameters may include one or more of a motion classification parameter, a volume classification parameter, a propagation classification parameter, a diffusion classification parameter, a state classification parameter, a ranking classification parameter, and a signal classification parameter. These parameters can be obtained according to the signal characteristics of the audio signal itself, or according to the metadata of the audio signal.
- the motion classification parameter is used to describe how fast the first audio signal moves per unit time in the spatial sound field
- the volume classification parameter is used to describe the volume of the first audio signal when it is played back in the spatial sound field
- the propagation classification parameter is used to describe the size of the propagation range of the first audio signal when it is played back in the spatial sound field
- the diffusion classification parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field, and the state classification parameter is used to describe the degree of sound source division of the first audio signal in the spatial sound field
- the ranking classification parameter is used to describe the priority of the first audio signal in the spatial sound field, and the signal classification parameter is used to describe the energy of the first audio signal in the encoding process.
- the following takes the i-th audio signal as an example to describe the method for obtaining the above-mentioned parameters.
- the i-th audio signal is any one of the above-mentioned M audio signals. It should be noted that the following parameters are exemplary descriptions, and other parameters or characteristics of the audio signal may also be used to calculate the sound field grading parameters, which are not specifically limited in this application.
- the motion classification parameter can be calculated by the following formula:
- speedRatio_i = f(d_i) / Σ_{j=1}^{M} f(d_j)
- where speedRatio_i represents the motion classification parameter of the i-th audio signal; f(d_i) represents the mapping relationship between the motion state of the i-th audio signal in the spatial sound field and the metadata; d_i represents the movement of the i-th audio signal, determined from its position before and after moving; and Σ_{j=1}^{M} f(d_j) represents the sum of the mapping relationships between the motion states of the above M audio signals in the spatial sound field and the metadata.
- θ_i represents the azimuth angle of the i-th audio signal relative to the rendering center point after moving, φ_i represents the pitch angle of the i-th audio signal relative to the rendering center point after moving, and r_i represents the distance of the i-th audio signal from the rendering center point after moving.
- θ_0 represents the azimuth angle of the i-th audio signal relative to the rendering center point before moving, φ_0 represents the pitch angle of the i-th audio signal relative to the rendering center point before moving, and r_0 represents the distance of the i-th audio signal from the rendering center point before moving.
- in this spherical coordinate system, the center of the sphere is the rendering center point, and the radius is the distance between the position of the i-th audio signal in the spatial sound field and the center of the sphere.
- the angle between the position of the i-th audio signal in the spatial sound field and the horizontal plane is the pitch angle of the i-th audio signal, and the angle between the projection of that position on the horizontal plane and the direction directly in front of the rendering center point is the azimuth angle of the i-th audio signal.
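- As an illustration of the motion classification parameter, the sketch below assumes that the mapping f(d_i) is simply the Euclidean displacement between a signal's positions before and after moving, computed from its (azimuth, pitch, radius) metadata; the application leaves f unspecified, so this is only one possible choice, and all names are illustrative.

```python
import math

def to_cartesian(azimuth_deg, pitch_deg, radius):
    """Convert (azimuth, pitch, radius) relative to the rendering center
    point into Cartesian coordinates."""
    az, el = math.radians(azimuth_deg), math.radians(pitch_deg)
    return (radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el))

def displacement(before, after):
    """Assumed f(d_i): Euclidean distance between the two positions."""
    return math.dist(to_cartesian(*before), to_cartesian(*after))

def speed_ratios(moves):
    """moves: one (before, after) position pair per audio signal.
    Returns speedRatio_i = f(d_i) / sum_j f(d_j)."""
    d = [displacement(b, a) for b, a in moves]
    total = sum(d) or 1.0  # guard against division by zero if nothing moves
    return [di / total for di in d]

moves = [((0, 0, 1), (90, 0, 1)),   # object sweeping 90 degrees of azimuth
         ((0, 0, 1), (0, 0, 1))]    # static object
print(speed_ratios(moves))
```

The fast-moving object receives the whole share here because the other signal did not move at all.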
- the motion classification parameter can also be calculated by other formulas or methods, which are not specifically limited in this application.
- the volume classification parameter can be calculated by the following formula:
- loudRatio_i = f(A_i, gain_i, r_i) / Σ_{j=1}^{M} f(A_j, gain_j, r_j)
- where loudRatio_i represents the volume classification parameter of the i-th audio signal, and f(A_i, gain_i, r_i) represents the mapping relationship between the playback volume of the i-th audio signal in the spatial sound field and its signal characteristics and metadata.
- A_i represents the sum or average value of the amplitudes of the sampling points of the i-th audio signal in the current frame, which can be obtained through the metadata of the i-th audio signal; gain_i represents the gain value of the audio signal in the current frame, which can be obtained through the metadata of the i-th audio signal; r_i represents the distance of the i-th audio signal from the rendering center point in the current frame, which can be obtained through the metadata of the i-th audio signal; and Σ_{j=1}^{M} f(A_j, gain_j, r_j) represents the sum of the mapping relationships between the playback volumes of the above M audio signals in the spatial sound field and their signal characteristics and metadata.
- the volume classification parameter can also be calculated by the following formula:
- loudRatio_i = mean(A_i) / Σ_{j=1}^{M} mean(A_j)
- where mean(A_i) represents the sum or average value of the amplitudes of the sampling points of the i-th audio signal in the current frame, which can be obtained through the metadata of the i-th audio signal, and Σ_{j=1}^{M} mean(A_j) represents the sum of these amplitude sums or average values over the M audio signals in the current frame.
- the volume classification parameter can also be calculated by the following formula:
- loudRatio_i = (1 / r_i) / Σ_{j=1}^{M} (1 / r_j)
- where r_i represents the distance between the i-th audio signal and the rendering center point, which can be obtained through the metadata of the i-th audio signal, and Σ_{j=1}^{M} (1 / r_j) represents the sum of the reciprocals of the distances between the above M audio signals and the rendering center point.
- the volume classification parameter can also be calculated by the following formula:
- loudRatio_i = gain_i / Σ_{j=1}^{M} gain_j
- where gain_i represents the gain of the i-th audio signal in rendering, which can be user-defined for the i-th audio signal or generated by the decoder according to a set rule, and Σ_{j=1}^{M} gain_j represents the sum of the gains of the above M audio signals in rendering.
- the volume classification parameter can also be calculated by other methods, which are not specifically limited in this application.
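- The alternative volume formulas above all reduce to normalized shares, which can be sketched as follows; the function names are illustrative, and each function implements one variant (mean-amplitude share, inverse-distance share, gain share).

```python
# Sketches of the alternative loudRatio_i formulas: each returns, for
# every signal, its share of the total over all M signals.

def loud_ratio_by_amplitude(mean_amps):
    """loudRatio_i = mean(A_i) / sum_j mean(A_j)."""
    total = sum(mean_amps)
    return [a / total for a in mean_amps]

def loud_ratio_by_distance(distances):
    """loudRatio_i = (1/r_i) / sum_j (1/r_j): nearer signals weigh more."""
    inv = [1.0 / r for r in distances]
    total = sum(inv)
    return [v / total for v in inv]

def loud_ratio_by_gain(gains):
    """loudRatio_i = gain_i / sum_j gain_j."""
    total = sum(gains)
    return [g / total for g in gains]

print(loud_ratio_by_distance([1.0, 4.0]))  # the nearer signal dominates
```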
- the propagation grading parameter describes the propagation degree of the i-th audio signal in the current frame, and can be obtained through the spread-related metadata of the i-th audio signal. It should be noted that the propagation classification parameters can also be calculated by other methods, which are not specifically limited in this application.
- the diffusion grading parameter describes the diffusion degree of the i-th audio signal in the current frame, and can be obtained through the diffuseness-related metadata of the i-th audio signal. It should be noted that the diffusion classification parameters can also be calculated by other methods, which are not specifically limited in this application.
- the state grading parameter describes the division degree of the i-th audio signal in the current frame, and can be obtained through the divergence-related metadata of the i-th audio signal. It should be noted that the state grading parameters can also be calculated by other methods, which are not specifically limited in this application.
- the ranking parameter describes the priority of the i-th audio signal in the current frame, and can be obtained through the priority-related metadata of the i-th audio signal. It should be noted that the sorting and grading parameters can also be calculated by other methods, which are not specifically limited in this application.
- the signal classification parameter describes the energy of the i-th audio signal in the encoding process of the current frame, and can be obtained from the original energy of the i-th audio signal, or from the signal energy of the i-th audio signal after preprocessing. It should be noted that the signal classification parameter can also be calculated by other methods, which are not specifically limited in this application.
- the sound field classification parameter sceneRatio_i of the i-th audio signal may be a function of one or more of the above parameters, which can be expressed as:
- sceneRatio_i = f(speedRatio_i, loudRatio_i, …)
- the function can be linear or non-linear, which is not specifically limited in this application.
- a weighted average may be performed on several of the above parameters of the i-th audio signal, for example, several of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter, to obtain the sound field classification parameter of the i-th audio signal, for example:
- sceneRatio_i = α_1·speedRatio_i + α_2·loudRatio_i + α_3·param_3,i + α_4·param_4,i
- where param_3,i and param_4,i stand for the other selected classification parameters, and α_1 to α_4 are the weighting factors of the corresponding parameters; each weighting factor can take any value from 0 to 1, and the weighting factors sum to 1.
- the larger the value of a weighting factor, the greater the importance and weight of the corresponding parameter in the calculation of the sound field classification parameter. If a weighting factor is 0, the corresponding parameter does not participate in the calculation, that is, the characteristic of the audio signal described by that parameter is not considered when calculating the sound field classification parameter; if a weighting factor is 1, only the corresponding parameter participates in the calculation, that is, the characteristic of the audio signal described by that parameter is the only basis for calculating the sound field classification parameter.
- the value of the weighting factor may be obtained through preset settings, or may be obtained through adaptive calculation during the execution of the method of this application, which is not specifically limited in this application.
- alternatively, one of the acquired parameters may be used directly as the sound field classification parameter of the i-th audio signal.
- alternatively, several of the above parameters of the i-th audio signal, for example, several of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter, may be averaged to obtain the sound field classification parameter of the i-th audio signal.
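- The weighted-average combination can be sketched as follows; which classification parameters are combined and the weight values are illustrative assumptions, not values fixed by this application.

```python
# Sketch of combining classification parameters into sceneRatio_i by a
# weighted average with weights alpha_1..alpha_n that sum to 1.

def scene_ratio(params, weights):
    """params, weights: equal-length sequences; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weighting factors must sum to 1"
    return sum(p * w for p, w in zip(params, weights))

# Assumed values for one signal: speedRatio, loudRatio, and two other
# selected classification parameters.
params = [0.4, 0.3, 0.2, 0.1]
weights = [0.5, 0.3, 0.1, 0.1]   # alpha_1..alpha_4
print(scene_ratio(params, weights))
```

Setting a weight to 0 removes that parameter from the calculation; setting one weight to 1 (and the rest to 0) makes that parameter the sole basis, mirroring the two extremes described above.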
- the present application may adopt the following method to obtain the priority of the i-th audio signal.
- for example, in a spatial sound field with the rendering center point as the center of the sphere, an audio signal closer to the center of the sphere has a higher priority, and an audio signal farther from the center of the sphere has a lower priority.
- the priority corresponding to the sound field classification parameter of the i-th audio signal may be determined as the priority of the i-th audio signal according to a set first correspondence, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, and one or more sound field classification parameters correspond to one priority.
- the priority level of the audio signal and the corresponding relationship between the sound field grading parameters and each priority level can be preset.
- Table 2 shows an exemplary first correspondence between the sound field classification parameters and the priority.
- Table 2 when the sound field classification parameter of the i-th audio signal is 0.4, the corresponding priority is 6, then the priority of the i-th audio signal is 6. When the sound field classification parameter of the i-th audio signal is 0.1, the corresponding priority is 9, then the priority of the i-th audio signal is 9 at this time. It should be noted that Table 2 is an example of the corresponding relationship between the sound field grading parameters and the priority, which does not constitute a limitation on the corresponding relationship involved in this application.
- the sound field classification parameter of the i-th audio signal may be used as the priority of the i-th audio signal.
- the priority may not be classified, and the sound field classification parameter of the i-th audio signal may be directly regarded as the priority.
- alternatively, the interval to which the sound field classification parameter of the i-th audio signal belongs may be determined according to set interval thresholds, and the priority corresponding to that interval may be determined as the priority of the i-th audio signal.
- the priority levels of the audio signal and the correspondence between each interval of the sound field classification parameter and each priority can be preset.
- Table 3 shows another exemplary first correspondence between the sound field classification parameters and the priority.
- Table 3 when the sound field classification parameter of the i-th audio signal is 0.6, the interval to which it belongs is [0.6, 0.7), and the corresponding priority is 4, then the priority of the i-th audio signal is 4 at this time.
- the sound field classification parameter of the i-th audio signal is 0.15, the interval to which it belongs is [0.1, 0.2), and the corresponding priority is 9, then the priority of the i-th audio signal is 9 at this time.
- Table 3 is an example of the corresponding relationship between the sound field grading parameters and the priority, which does not constitute a limitation on the corresponding relationship involved in this application.
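- The interval-based lookup can be sketched with a table consistent with the examples given for Tables 2 and 3 (0.4 maps to priority 6, 0.1 to 9, [0.6, 0.7) to 4, [0.1, 0.2) to 9); the full set of interval edges below is an assumption filled in around those four data points.

```python
import bisect

# Assumed interval edges and priorities: parameter in [0.1, 0.2) -> 9,
# [0.6, 0.7) -> 4, and so on; a larger sound field classification
# parameter yields a smaller (higher) priority number.
EDGES      = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
PRIORITIES = [10,  9,   8,   7,   6,   5,   4,   3,   2,   1]

def priority_of(scene_ratio):
    """Find the interval containing scene_ratio and return its priority."""
    return PRIORITIES[bisect.bisect_right(EDGES, scene_ratio)]

print(priority_of(0.6), priority_of(0.15), priority_of(0.4))
```

`bisect_right` locates the half-open interval [edge_k, edge_{k+1}) containing the parameter, matching the [0.6, 0.7)-style intervals in Table 3.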
- Step 404 Perform bit allocation on the M audio signals according to the priority of the M audio signals.
- This application can perform bit allocation according to the number of currently available bits and the priority of M audio signals. The higher the priority, the more the number of bits allocated for the audio signal.
- the current number of available bits refers to the total number of bits that the codec in the current frame can use for bit allocation to M audio signals in the first audio signal set before bit allocation.
- the proportion of the number of bits of the first audio signal can be determined according to the priority of the first audio signal.
- the first audio signal is any one of the M audio signals.
- the number of bits of the first audio signal is then obtained according to the product of the currently available number of bits and the bit-number proportion of the first audio signal.
- One priority can correspond to one proportion of the number of bits, or multiple priorities can correspond to one proportion of the number of bits. Based on the proportion of the number of bits and the number of bits currently available, the number of bits that can be allocated for the corresponding audio signal can be calculated.
- for example, if M is 3, the priority of the first audio signal is 1, the priority of the second audio signal is 2, and the priority of the third audio signal is 3, and assuming that the proportion corresponding to priority 1 is 50%, the proportion corresponding to priority 2 is 30%, the proportion corresponding to priority 3 is 20%, and the currently available number of bits is 100, then 50 bits are allocated to the first audio signal, 30 bits to the second audio signal, and 20 bits to the third audio signal. It should be noted that in different audio frames, the proportion corresponding to each priority can be adjusted adaptively, which is not specifically limited.
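- The proportional allocation in the example above can be sketched as follows; the priority-to-share table mirrors the example values (priority 1 gets 50%, 2 gets 30%, 3 gets 20%) and is otherwise an assumption.

```python
# Sketch of proportional bit allocation: each priority maps to a share
# of the currently available bits.

SHARE_BY_PRIORITY = {1: 0.5, 2: 0.3, 3: 0.2}  # assumed example shares

def allocate_bits(priorities, bits_available):
    """Return the number of bits for each signal, by priority share."""
    return [round(SHARE_BY_PRIORITY[p] * bits_available) for p in priorities]

print(allocate_bits([1, 2, 3], 100))
```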
- alternatively, the number of bits corresponding to the priority of the first audio signal may be determined as the number of bits of the first audio signal according to a set second correspondence, where the second correspondence includes correspondences between multiple priorities and multiple numbers of bits, and one or more priorities correspond to one number of bits. There is a pre-established correspondence between the priority of the audio signal and the number of bits: one priority can correspond to one number of bits, or multiple priorities can correspond to one number of bits. Based on this correspondence, once the priority of an audio signal is acquired, the number of bits corresponding to it can be obtained. For example, if M is 3, the priority of the first audio signal is 1, the priority of the second audio signal is 2, and the priority of the third audio signal is 3, and assuming that the number of bits corresponding to priority 1 is set to 50, the number of bits corresponding to priority 2 is 30, and the number of bits corresponding to priority 3 is 20, then 50, 30, and 20 bits are allocated to the first, second, and third audio signals, respectively.
- alternatively, when the sound field classification parameter of the audio signal contains the signal classification parameter, the bit allocation between the audio signals can be determined according to the absolute energy ratio between the audio signals in the encoding and decoding process; when the sound field classification parameter of the audio signal does not contain the signal classification parameter, and the sound field classification parameter of the audio signal is large, the sound field classification difference between the audio signals is considered to be large, and the bit allocation between the audio signals can then be determined according to the sound field classification parameter of the audio signal; in other cases, the bit allocation of the audio signal can be determined according to the bit allocation factor of the audio signal.
- here sceneRatio_i represents the sound field classification parameter of the i-th audio signal, bits_available represents the currently available number of bits, and bits_object_i represents the number of bits allocated to the i-th audio signal.
- when the sound field classification parameter exceeds its upper limit α: bits_object_i = nrgRatio_i × bits_available, where nrgRatio_i represents the absolute energy ratio between the i-th audio signal and the other audio signals.
- when the sound field classification parameter lies between its lower limit β and its upper limit α: bits_object_i = sceneRatio_i × bits_available.
- otherwise: bits_object_i = objRatio_i × bits_available, where objRatio_i represents the bit allocation factor of the i-th audio signal.
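- The three-case allocation can be sketched as below. The exact branch conditions on the upper limit α and lower limit β are an assumption; the text only identifies α and β as the upper and lower limits of the sound field classification parameter, and the threshold values used here are illustrative.

```python
# Sketch of the three-case bit allocation for one signal.

def bits_for_signal(scene_ratio, nrg_ratio, obj_ratio, bits_available,
                    alpha=0.8, beta=0.2):
    if scene_ratio > alpha:        # allocate by absolute energy ratio
        share = nrg_ratio
    elif scene_ratio >= beta:      # allocate by the sound field parameter
        share = scene_ratio
    else:                          # fall back to the bit allocation factor
        share = obj_ratio
    return round(share * bits_available)

print(bits_for_signal(0.9, 0.6, 0.3, 100),   # energy-ratio branch
      bits_for_signal(0.5, 0.6, 0.3, 100),   # sceneRatio branch
      bits_for_signal(0.1, 0.6, 0.3, 100))   # allocation-factor branch
```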
- the present application determines the priorities of the multiple audio signals according to the characteristics of the multiple audio signals included in the current frame and the related information of the audio signals in the metadata, and determines the number of bits to be allocated to each audio signal according to its priority. This not only adapts to the characteristics of the audio signals, but also matches different numbers of coding bits to different audio signals, which improves the coding and decoding efficiency of the audio signals.
- This application determines in step 402, from the T audio signals of the current frame, the M audio signals added to the first audio signal set, and applies the methods of steps 403 and 404 to the M audio signals: the priority of each audio signal is determined first, and then the number of bits allocated to each audio signal is determined according to its priority.
- for the remaining N audio signals, that is, the audio signals in the second audio signal set, a simpler method can be used to determine the number of allocated bits. For example, the total number of bits available for the second audio signal set is divided equally by N; that is, the total number of bits available for the set is equally distributed among the N audio signals in the set.
- the second audio signal set may also use other methods to obtain the number of bits of each audio signal in the set, which is not specifically limited in this application.
- this application also provides a priority fusion method based on multiple priority determination methods; that is, when multiple methods can be used to obtain the priority of the same audio signal, the final priority of that audio signal is determined as follows.
- the following description takes the first audio signal as an example, and the first audio signal is any one of the foregoing M audio signals.
- the first parameter set and the second parameter set of the first audio signal are acquired according to the first audio signal and/or the metadata corresponding to the first audio signal.
- the first parameter set includes one or more of the above-mentioned related parameters of the first audio signal, that is, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter.
- the second parameter set likewise includes one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal.
- the first parameter set and the second parameter set may include the same parameters or different parameters.
- the first sound field classification parameter of the first audio signal is acquired according to the first parameter set, and the second sound field classification parameter is acquired according to the second parameter set; for either, the method of determining the sound field classification parameters of the M audio signals in the first audio signal set in step 403 may be referred to, and other methods may also be used.
- optionally, the method used to calculate the second sound field classification parameter is different from the method used to calculate the first sound field classification parameter.
- the sound field classification parameters obtained by the two methods for the same audio signal can be combined by a weighted average method, a direct average method, or a maximum or minimum value method to determine the final sound field classification parameter of the audio signal, which is not specifically limited. In this way, diversified acquisition of the sound field classification parameter of the audio signal can be realized, and calculation schemes under various strategies can be compatible.
- the first priority of the first audio signal may be obtained according to the first sound field classification parameter, and the second priority of the first audio signal may be obtained according to the second sound field classification parameter.
- each priority can be obtained by using the method of step 403 above, or by other methods.
- optionally, the method used to obtain the second priority is different from the method used to obtain the first priority.
- the priority of the first audio signal is then acquired according to the first priority and the second priority.
- the priorities obtained by the two methods for the same audio signal can be combined by a weighted average method, a direct average method, or a maximum or minimum value method to determine the final priority of the audio signal, which is not specifically limited. In this way, diversified acquisition of the priority of the audio signal can be realized, and calculation schemes under various strategies can be compatible.
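- The fusion of two priorities can be sketched as follows; the weight value and the function name are illustrative.

```python
# Sketch of priority fusion: combine the first and second priorities of
# the same audio signal by weighted average, direct average, maximum,
# or minimum.

def fuse_priority(p1, p2, method="weighted", w=0.7):
    if method == "weighted":
        return w * p1 + (1 - w) * p2   # w is an assumed weight
    if method == "average":
        return (p1 + p2) / 2
    if method == "max":
        return max(p1, p2)
    if method == "min":
        return min(p1, p2)
    raise ValueError(f"unknown fusion method: {method}")

print(fuse_priority(4, 8, "average"), fuse_priority(4, 8, "max"))
```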
- the present application can generate a code stream according to the numbers of bits of the T audio signals.
- the code stream includes T first identifiers, T second identifiers, and T third identifiers.
- the T audio signals correspond to the T first identifiers, T second identifiers, and T third identifiers, respectively.
- the first identifier is used to indicate the audio signal set to which the corresponding audio signal belongs.
- the second identifier is used to indicate the priority of the corresponding audio signal
- the third identifier is used to indicate the number of bits of the corresponding audio signal; the code stream is sent to the decoding device.
- after receiving the code stream, the decoding device executes the above-mentioned audio signal bit allocation method according to the T first identifiers, T second identifiers, and T third identifiers carried in the code stream, and determines the numbers of bits of the T audio signals.
- alternatively, the decoding device can directly determine, based on the T first identifiers, T second identifiers, and T third identifiers carried in the code stream, the audio signal set to which each of the T audio signals belongs, its priority, and its allocated number of bits, and then decode the code stream to obtain the T audio signals.
- the above-mentioned first identifier, second identifier, and third identifier are identification information added on the basis of the method embodiment shown in FIG. 4, so that the audio signal encoding end and decoding end can encode or decode the audio signal based on the same method.
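- A purely hypothetical sketch of packing and recovering the three identifiers per signal is given below; the field widths (1 bit for the set identifier, 4 bits for the priority, 16 bits for the bit count) are invented for illustration and are not specified by this application.

```python
# Hypothetical per-signal identifier packing for the code stream.
# Assumed field widths: 1-bit set flag, 4-bit priority, 16-bit count.

def pack_identifiers(entries):
    """entries: list of (in_first_set, priority, num_bits) per signal."""
    stream = 0
    for in_first_set, priority, num_bits in entries:
        stream = (stream << 1) | (1 if in_first_set else 0)  # first identifier
        stream = (stream << 4) | (priority & 0xF)            # second identifier
        stream = (stream << 16) | (num_bits & 0xFFFF)        # third identifier
    return stream

def unpack_identifiers(stream, count):
    """Recover `count` (in_first_set, priority, num_bits) tuples."""
    out = []
    for _ in range(count):
        num_bits = stream & 0xFFFF; stream >>= 16
        priority = stream & 0xF;    stream >>= 4
        in_first = bool(stream & 1); stream >>= 1
        out.append((in_first, priority, num_bits))
    return list(reversed(out))  # fields were read back-to-front

entries = [(True, 1, 50), (False, 3, 20)]
print(unpack_identifiers(pack_identifiers(entries), 2) == entries)
```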
- FIG. 7 is a schematic structural diagram of an embodiment of an apparatus of this application. As shown in FIG. 7, the apparatus can be applied to the encoding device or the decoding device in the foregoing embodiment.
- the apparatus of this embodiment may include: a processing module 701 and a transceiver module 702.
- the processing module 701 is configured to obtain T audio signals of the current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine the priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to the priorities of the M audio signals.
- the processing module 701 is specifically configured to obtain the sound field classification parameter of each of the M audio signals, and determine the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals.
- the processing module 701 is specifically configured to obtain one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and to obtain the sound field classification parameter of the first audio signal according to one or more of the acquired motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter; wherein the motion classification parameter is used to describe how fast the first audio signal moves per unit time in the spatial sound field, the volume classification parameter is used to describe the volume of the first audio signal in the spatial sound field, the propagation classification parameter is used to describe the size of the propagation range of the first audio signal in the spatial sound field, the diffusion classification parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field, and the state classification parameter is used to describe the degree of sound source division of the first audio signal in the spatial sound field.
- the processing module 701 is specifically configured to obtain S groups of metadata of the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to audio signals among the T audio signals, and the metadata is used to describe the state of the corresponding audio signal in the spatial sound field.
- the processing module 701 is specifically configured to obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and to obtain the sound field classification parameter of the first audio signal according to one or more of the acquired motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter; wherein the motion classification parameter is used to describe how fast the first audio signal moves per unit time in the spatial sound field, the volume classification parameter is used to describe the volume of the first audio signal in the spatial sound field, the propagation classification parameter is used to describe the size of the propagation range of the first audio signal in the spatial sound field, and the diffusion classification parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field.
- the processing module 701 is specifically configured to perform a weighted average on multiple of the acquired motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter to obtain the sound field classification parameter; or to average multiple of the acquired motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter to obtain the sound field classification parameter; or to use one of the acquired motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter as the sound field classification parameter.
- the processing module 701 is specifically configured to determine, according to a set first correspondence, the priority corresponding to the sound field classification parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or to use the sound field classification parameter of the first audio signal as the priority of the first audio signal; or to determine, according to set interval thresholds, the interval to which the sound field classification parameter of the first audio signal belongs, and determine the priority corresponding to that interval as the priority of the first audio signal.
- the processing module 701 is specifically configured to perform bit allocation according to the currently available number of bits and the priorities of the M audio signals; the higher the priority, the more bits are allocated to the audio signal.
- the processing module 701 is specifically configured to determine the proportion of the number of bits of the first audio signal according to the priority of the first audio signal, and the first audio signal is the M Any one of the audio signals; obtaining the number of bits of the first audio signal according to the product of the number of currently available bits and the proportion of the number of bits of the first audio signal.
- the processing module 701 is specifically configured to determine the number of bits of the first audio signal from a preset second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit quantities, one or more priorities correspond to one bit quantity, and the first audio signal is any one of the M audio signals.
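A minimal sketch of the lookup-table variant follows; the table contents are hypothetical, since the text fixes the structure of the second correspondence but not concrete bit quantities:

```python
# Hypothetical second correspondence: priority level -> number of bits.
# Several priorities may map to the same bit quantity, as the text allows.
SECOND_CORRESPONDENCE = {1: 256, 2: 192, 3: 192, 4: 128, 5: 64}

def bits_for_priority(priority):
    """Return the bit quantity associated with a priority level."""
    return SECOND_CORRESPONDENCE[priority]
```

Compared with the proportional variant, the table makes each signal's budget independent of how many other signals share the frame.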
- the processing module 701 is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
- the processing module 701 is specifically configured to add, to the first audio signal set, the audio signals among the T audio signals that correspond to the S groups of metadata; or, add, to the first audio signal set, the audio signals whose importance parameter is greater than or equal to a set participation threshold, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
- the processing module 701 is specifically configured to: acquire one or more of a motion classification parameter, a volume classification parameter, a propagation classification parameter, and a diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtain a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; acquire one or more of a state classification parameter, a ranking classification parameter, and a signal classification parameter of the first audio signal; obtain a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter; where the motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume classification parameter describes the volume of the first audio signal during playback in the spatial sound field, the propagation classification parameter describes the size of the propagation range of the first audio signal during playback in the spatial sound field, the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field, the state classification parameter describes the degree of sound-source division of the first audio signal in the spatial sound field, the ranking classification parameter describes the priority ordering of the first audio signal in the spatial sound field, and the signal classification parameter describes the energy of the first audio signal during encoding.
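The combination step described above (weighted average, plain average, or pass-through of a single parameter) can be sketched as follows; the parameter values and weights are illustrative assumptions, not values from the text:

```python
def sound_field_parameter(params, weights=None):
    """Combine classification parameters (e.g. motion, volume, propagation,
    diffusion) into one sound field classification parameter.

    With weights: weighted average; without: plain average. Passing a
    single-element list simply returns that value, matching the option of
    using one parameter directly.
    """
    if weights is None:
        return sum(params) / len(params)
    assert len(weights) == len(params)
    return sum(w * p for w, p in zip(weights, params)) / sum(weights)

# First sound field parameter from motion/volume/propagation/diffusion,
# second from state/ranking/signal, then a final combination of the two.
first = sound_field_parameter([0.8, 0.6, 0.4, 0.2])         # plain average, ~0.5
second = sound_field_parameter([0.9, 0.3], weights=[2, 1])  # weighted, ~0.7
final = sound_field_parameter([first, second])              # ~0.6
```

The same helper covers both the single-stage scheme (one sound field parameter from all seven classification parameters) and the two-stage scheme shown here.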
- the processing module 701 is specifically configured to: acquire, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion, volume, propagation, and diffusion classification parameters of the first audio signal, where the first audio signal is any one of the M audio signals; obtain the first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; acquire, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the state, ranking, and signal classification parameters of the first audio signal; obtain the second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter, where the classification parameters have the meanings described above.
- the processing module 701 is specifically configured to: obtain a first priority of the first audio signal according to the first sound field classification parameter; obtain a second priority of the first audio signal according to the second sound field classification parameter; and obtain the priority of the first audio signal according to the first priority and the second priority.
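How the two priorities are merged is left open by the text; two plausible policies, shown purely as assumptions, are keeping the stronger (numerically smaller) priority or blending the two:

```python
def combine_priorities(p1, p2, mode="min"):
    """Merge the first and second priorities into a final priority.

    'min' keeps the stronger (smaller-numbered) priority; 'mean' blends
    them. Both policies are illustrative; the text only requires that the
    final priority be derived from the two.
    """
    if mode == "min":
        return min(p1, p2)
    return round((p1 + p2) / 2)

final_min = combine_priorities(2, 5)           # -> 2
final_mean = combine_priorities(2, 5, "mean")  # -> 4 (round half to even)
```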
- the processing module 701 is further configured to encode the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
- the encoded bitstream includes the number of bits of the M audio signals.
- a transceiver module 702 is configured to receive an encoded bitstream; the processing module 701 is further configured to obtain the number of bits of each of the M audio signals, and reconstruct the M audio signals according to the number of bits of each audio signal and the encoded bitstream.
- the device in this embodiment can be used to implement the technical solution of the method embodiment shown in FIG. 4, and its implementation principles and technical effects are similar, and will not be repeated here.
- Fig. 8 is a schematic structural diagram of an embodiment of a device of this application.
- the device may be an encoding device or a decoding device in the foregoing embodiment.
- the device of this embodiment may include a processor 801 and a memory 802, where the memory 802 is configured to store one or more programs; when the one or more programs are executed by the processor 801, the processor 801 can implement the technical solution of the method embodiment shown in FIG. 4.
- the steps of the foregoing method embodiments may be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
- the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the methods disclosed in this application may be performed directly by a hardware encoding processor, or by a combination of hardware and software modules in the encoding processor.
- the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
- the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
- the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
- the volatile memory may be random access memory (RAM), which is used as an external cache.
- many forms of RAM are available, for example:
- static random access memory (static RAM, SRAM)
- dynamic random access memory (dynamic RAM, DRAM)
- synchronous dynamic random access memory (synchronous DRAM, SDRAM)
- double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM)
- enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM)
- synchlink dynamic random access memory (synchlink DRAM, SLDRAM)
- direct rambus random access memory (direct rambus RAM, DR RAM)
- the disclosed system, device, and method can be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division into units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- if the function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
- the technical solutions of the present application, essentially or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Abstract
Description
| Metadata | Value range (precision) | Number of bits |
|---|---|---|
| object_index | 1;128 (1) | 7 |
| position_azimuth | -180;180 (2) | 8 |
| position_elevation | -90;90 (5) | 6 |
| position_radius | 0.5;16 (non-linear) | 4 |
| gain_factor | 0.004;5.957 (non-linear) | 7 |
| spread_uniform | 0;180 | 7 |
| spread_width | 0;180 | 7 |
| spread_height | 0;90 | 5 |
| spread_depth | 0;15.5 | 4 |
| diffuseness | 0;1 | 7 |
| priority | 0;7 | 3 |
| divergence | 0;1 | 8 |
| speed | 0;1 | 4 |
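The bit widths in the table sum to a fixed per-object metadata budget. The short sketch below totals them and uniformly quantizes one field (azimuth); the uniform quantizer itself is an illustrative assumption, since the table only fixes each field's range, precision, and bit count:

```python
# Bit widths per metadata field, taken from the table above.
FIELD_BITS = {
    "object_index": 7, "position_azimuth": 8, "position_elevation": 6,
    "position_radius": 4, "gain_factor": 7, "spread_uniform": 7,
    "spread_width": 7, "spread_height": 5, "spread_depth": 4,
    "diffuseness": 7, "priority": 3, "divergence": 8, "speed": 4,
}

total_bits = sum(FIELD_BITS.values())  # metadata budget per object: 77 bits

def quantize_azimuth(azimuth_deg):
    """Uniformly quantize an azimuth in [-180, 180] with 2-degree steps.

    The 181 resulting levels fit in the table's 8-bit field; the uniform
    step scheme is an assumption for illustration.
    """
    index = round((azimuth_deg + 180) / 2)
    assert 0 <= index < 2 ** FIELD_BITS["position_azimuth"]
    return index
```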
| Sound field classification parameter | Priority |
|---|---|
| 0.9 | 1 |
| 0.8 | 2 |
| 0.7 | 3 |
| 0.6 | 4 |
| 0.5 | 5 |
| 0.4 | 6 |
| 0.3 | 7 |
| 0.2 | 8 |
| 0.1 | 9 |
| 0 | 10 |
| Sound field classification parameter range | Priority |
|---|---|
| [0.9,1) | 1 |
| [0.8,0.9) | 2 |
| [0.7,0.8) | 3 |
| [0.6,0.7) | 4 |
| [0.5,0.6) | 5 |
| [0.4,0.5) | 6 |
| [0.3,0.4) | 7 |
| [0.2,0.3) | 8 |
| [0.1,0.2) | 9 |
| [0,0.1) | 10 |
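The interval-to-priority mapping in the table above can be implemented directly; this sketch mirrors the table's ten 0.1-wide intervals:

```python
def priority_from_parameter(sound_field_param):
    """Map a sound field classification parameter in [0, 1) to a priority
    level 1..10 using the 0.1-wide intervals of the table:
    [0.9, 1) -> 1, [0.8, 0.9) -> 2, ..., [0, 0.1) -> 10.
    """
    if not 0.0 <= sound_field_param < 1.0:
        raise ValueError("parameter must lie in [0, 1)")
    return 10 - int(sound_field_param * 10)

assert priority_from_parameter(0.95) == 1   # [0.9, 1)  -> highest priority
assert priority_from_parameter(0.05) == 10  # [0, 0.1)  -> lowest priority
```

Larger sound field classification parameters thus map to smaller (stronger) priority numbers, matching the table.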
Claims (41)
- A bit allocation method for audio signals, comprising: acquiring T audio signals in a current frame, where T is a positive integer; determining a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determining priorities of the M audio signals in the first audio signal set; and performing bit allocation on the M audio signals according to the priorities of the M audio signals.
- The method according to claim 1, wherein determining the priorities of the M audio signals in the first audio signal set comprises: acquiring a sound field classification parameter of each of the M audio signals; and determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals.
- The method according to claim 2, wherein acquiring the sound field classification parameter of each of the M audio signals comprises: acquiring one or more of a motion classification parameter, a volume classification parameter, a propagation classification parameter, a diffusion classification parameter, a state classification parameter, a ranking classification parameter, and a signal classification parameter of a first audio signal, where the first audio signal is any one of the M audio signals; and acquiring the sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters; wherein the motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume classification parameter describes the volume of the first audio signal in the spatial sound field, the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field, the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field, the state classification parameter describes the degree of sound-source division of the first audio signal in the spatial sound field, the ranking classification parameter describes the priority ordering of the first audio signal in the spatial sound field, and the signal classification parameter describes the energy of the first audio signal during encoding.
- The method according to claim 2, further comprising: acquiring S groups of metadata in the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata describes the state of the corresponding audio signal in the spatial sound field.
- The method according to claim 4, wherein acquiring the sound field classification parameter of each of the M audio signals comprises: acquiring, according to metadata corresponding to a first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion, volume, propagation, diffusion, state, ranking, and signal classification parameters of the first audio signal, where the first audio signal is any one of the M audio signals; and acquiring the sound field classification parameter of the first audio signal according to one or more of the acquired parameters; wherein the classification parameters are as defined in claim 3.
- The method according to claim 3 or 5, wherein acquiring the sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters comprises: taking a weighted average of multiple of the acquired parameters to obtain the sound field classification parameter; or averaging multiple of the acquired parameters to obtain the sound field classification parameter; or using one of the acquired parameters as the sound field classification parameter.
- The method according to any one of claims 2-6, wherein determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals comprises: determining, according to a preset first correspondence, the priority corresponding to the sound field classification parameter of a first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or using the sound field classification parameter of the first audio signal as the priority of the first audio signal; or determining, according to multiple set range thresholds, the range to which the sound field classification parameter of the first audio signal belongs, and determining the priority corresponding to that range as the priority of the first audio signal.
- The method according to any one of claims 1-7, wherein performing bit allocation on the M audio signals according to their priorities comprises: performing bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
- The method according to claim 8, wherein performing bit allocation according to the number of currently available bits and the priorities of the M audio signals comprises: determining a bit-quantity proportion of a first audio signal according to the priority of the first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the number of bits of the first audio signal as the product of the number of currently available bits and the bit-quantity proportion of the first audio signal.
- The method according to claim 8, wherein performing bit allocation according to the number of currently available bits and the priorities of the M audio signals comprises: determining the number of bits of a first audio signal from a preset second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit quantities, one or more priorities correspond to one bit quantity, and the first audio signal is any one of the M audio signals.
- The method according to any one of claims 1-10, wherein determining the first audio signal set according to the T audio signals comprises: adding pre-designated audio signals among the T audio signals to the first audio signal set.
- The method according to claim 4, wherein determining the first audio signal set according to the T audio signals comprises: adding the audio signals among the T audio signals that correspond to the S groups of metadata to the first audio signal set; or adding the audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
- The method according to claim 2, wherein acquiring the sound field classification parameter of each of the M audio signals comprises: acquiring one or more of a motion classification parameter, a volume classification parameter, a propagation classification parameter, and a diffusion classification parameter of a first audio signal, where the first audio signal is any one of the M audio signals; obtaining a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; acquiring one or more of a state classification parameter, a ranking classification parameter, and a signal classification parameter of the first audio signal; obtaining a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtaining the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters; wherein the classification parameters are as defined in claim 3, with the volume classification parameter describing the playback volume and the propagation classification parameter describing the playback propagation range of the first audio signal in the spatial sound field.
- The method according to claim 4, wherein acquiring the sound field classification parameter of each of the M audio signals comprises: acquiring, according to metadata corresponding to a first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion, volume, propagation, and diffusion classification parameters of the first audio signal, where the first audio signal is any one of the M audio signals; obtaining a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; acquiring, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the state, ranking, and signal classification parameters of the first audio signal; obtaining a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtaining the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters; wherein the classification parameters are as defined in claim 13.
- The method according to claim 13 or 14, wherein determining the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals comprises: obtaining a first priority of the first audio signal according to the first sound field classification parameter; obtaining a second priority of the first audio signal according to the second sound field classification parameter; and obtaining the priority of the first audio signal according to the first priority and the second priority.
- An audio signal encoding method, wherein after the bit allocation method according to any one of claims 1-15 is performed, the method further comprises: encoding the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
- The encoding method according to claim 16, wherein the encoded bitstream includes the numbers of bits of the M audio signals.
- An audio signal decoding method, comprising: receiving an encoded bitstream; performing the bit allocation method according to any one of claims 1-15 to obtain the respective numbers of bits of the M audio signals; and reconstructing the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
- A bit allocation apparatus for audio signals, comprising a processing module configured to: acquire T audio signals in a current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to the priorities of the M audio signals.
- The apparatus according to claim 19, wherein the processing module is specifically configured to acquire a sound field classification parameter of each of the M audio signals, and determine the priorities of the M audio signals according to the sound field classification parameter of each of the M audio signals.
- The apparatus according to claim 20, wherein the processing module is specifically configured to acquire one or more of the motion, volume, propagation, diffusion, state, ranking, and signal classification parameters of a first audio signal, where the first audio signal is any one of the M audio signals, and acquire the sound field classification parameter of the first audio signal according to one or more of the acquired parameters; wherein the classification parameters are as defined in claim 3.
- The apparatus according to claim 20, wherein the processing module is specifically configured to acquire S groups of metadata in the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata describes the state of the corresponding audio signal in the spatial sound field.
- The apparatus according to claim 22, wherein the processing module is specifically configured to acquire, according to metadata corresponding to a first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion, volume, propagation, diffusion, state, ranking, and signal classification parameters of the first audio signal, where the first audio signal is any one of the M audio signals, and acquire the sound field classification parameter of the first audio signal according to one or more of the acquired parameters; wherein the classification parameters are as defined in claim 3.
- The apparatus according to claim 21 or 23, wherein the processing module is specifically configured to take a weighted average of multiple of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters to obtain the sound field classification parameter; or average multiple of the acquired parameters to obtain the sound field classification parameter; or use one of the acquired parameters as the sound field classification parameter.
- The apparatus according to any one of claims 20-24, wherein the processing module is specifically configured to determine, according to a preset first correspondence, the priority corresponding to the sound field classification parameter of a first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or use the sound field classification parameter of the first audio signal as the priority of the first audio signal; or determine, according to multiple set range thresholds, the range to which the sound field classification parameter of the first audio signal belongs, and determine the priority corresponding to that range as the priority of the first audio signal.
- The apparatus according to any one of claims 19-25, wherein the processing module is specifically configured to perform bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
- The apparatus according to claim 26, wherein the processing module is specifically configured to determine a bit-quantity proportion of a first audio signal according to the priority of the first audio signal, where the first audio signal is any one of the M audio signals, and obtain the number of bits of the first audio signal as the product of the number of currently available bits and the bit-quantity proportion of the first audio signal.
- The apparatus according to claim 26, wherein the processing module is specifically configured to determine the number of bits of a first audio signal from a preset second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit quantities, one or more priorities correspond to one bit quantity, and the first audio signal is any one of the M audio signals.
- The apparatus according to any one of claims 19-28, wherein the processing module is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
- The apparatus according to claim 22, wherein the processing module is specifically configured to add the audio signals among the T audio signals that correspond to the S groups of metadata to the first audio signal set; or add the audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
- The apparatus according to claim 20, wherein the processing module is specifically configured to: acquire one or more of the motion, volume, propagation, and diffusion classification parameters of a first audio signal, where the first audio signal is any one of the M audio signals; obtain a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; acquire one or more of the state, ranking, and signal classification parameters of the first audio signal; obtain a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters; wherein the classification parameters are as defined in claim 13.
- The apparatus according to claim 22, wherein the processing module is specifically configured to: acquire, according to metadata corresponding to a first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the motion, volume, propagation, and diffusion classification parameters of the first audio signal, where the first audio signal is any one of the M audio signals; acquire, according to the metadata corresponding to the first audio signal, or according to the first audio signal and its corresponding metadata, one or more of the state, ranking, and signal classification parameters of the first audio signal; obtain a first sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, and diffusion classification parameters; obtain a second sound field classification parameter of the first audio signal according to one or more of the acquired state, ranking, and signal classification parameters; and obtain the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters; wherein the classification parameters are as defined in claim 13.
- The apparatus according to claim 31 or 32, wherein the processing module is specifically configured to obtain a first priority of the first audio signal according to the first sound field classification parameter, obtain a second priority of the first audio signal according to the second sound field classification parameter, and obtain the priority of the first audio signal according to the first priority and the second priority.
- The apparatus according to any one of claims 19-33, wherein the processing module is further configured to encode the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
- The apparatus according to claim 34, wherein the encoded bitstream includes the numbers of bits of the M audio signals.
- The apparatus according to claim 34 or 35, further comprising a transceiver module configured to receive an encoded bitstream, wherein the processing module is further configured to obtain the respective numbers of bits of the M audio signals and reconstruct the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
- A device, comprising: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-18.
- A computer-readable storage medium, comprising a computer program which, when executed on a computer, causes the computer to perform the method according to any one of claims 1-18.
- A computer-readable storage medium, comprising an encoded bitstream obtained according to the method of claim 16.
- An encoding apparatus, comprising a processor and a communication interface, wherein the processor reads a stored computer program through the communication interface, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1-18.
- An encoding apparatus, comprising a processor and a memory, wherein the processor is configured to perform the method of claim 16, and the memory is configured to store the encoded bitstream.
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| BR112022021882A BR112022021882A2 (pt) | 2020-04-30 | 2021-03-31 | Método e aparelho de alocação de bits para sinal de áudio, dispositivo, meio de armazenamento legível por computador, aparelho de codificação e aparelho de decodificação |
| KR1020227040823A KR102868387B1 (ko) | 2020-04-30 | 2021-03-31 | 오디오 신호에 대한 비트 할당 방법 및 장치 |
| JP2022565956A JP7550881B2 (ja) | 2020-04-30 | 2021-03-31 | 音声信号に対するビット割り当て方法及び装置 |
| EP21797604.2A EP4131259B1 (en) | 2020-04-30 | 2021-03-31 | Bit allocation method and apparatus for audio signal |
| US17/976,474 US11900950B2 (en) | 2020-04-30 | 2022-10-28 | Bit allocation method and apparatus for audio signal |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010368424.9A CN113593585A (zh) | 2020-04-30 | 2020-04-30 | 音频信号的比特分配方法和装置 |
| CN202010368424.9 | 2020-04-30 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/976,474 Continuation US11900950B2 (en) | 2020-04-30 | 2022-10-28 | Bit allocation method and apparatus for audio signal |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021218558A1 true WO2021218558A1 (zh) | 2021-11-04 |
Family
ID=78237842
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/084578 Ceased WO2021218558A1 (zh) | 2020-04-30 | 2021-03-31 | 音频信号的比特分配方法和装置 |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US11900950B2 (zh) |
| EP (1) | EP4131259B1 (zh) |
| JP (1) | JP7550881B2 (zh) |
| KR (1) | KR102868387B1 (zh) |
| CN (1) | CN113593585A (zh) |
| BR (1) | BR112022021882A2 (zh) |
| TW (1) | TWI773286B (zh) |
| WO (1) | WO2021218558A1 (zh) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112767953B (zh) * | 2020-06-24 | 2024-01-23 | 腾讯科技(深圳)有限公司 | 语音编码方法、装置、计算机设备和存储介质 |
| CN115497485B (zh) * | 2021-06-18 | 2024-10-18 | 华为技术有限公司 | 三维音频信号编码方法、装置、编码器和系统 |
| CN115002613A (zh) * | 2022-04-18 | 2022-09-02 | 北京安声科技有限公司 | 耳机 |
| GB2624890A (en) * | 2022-11-29 | 2024-06-05 | Nokia Technologies Oy | Parametric spatial audio encoding |
| CN120112994B (zh) * | 2023-07-14 | 2026-03-17 | 北京小米移动软件有限公司 | 信号处理方法及其装置 |
| WO2025081393A1 (zh) * | 2023-10-18 | 2025-04-24 | 北京小米移动软件有限公司 | 音频信号的处理方法、装置、音频设备及存储介质 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101217037A (zh) * | 2007-01-05 | 2008-07-09 | 华为技术有限公司 | 对音频信号的编码速率进行源控的方法和系统 |
| CN101950562A (zh) * | 2010-11-03 | 2011-01-19 | 武汉大学 | 基于音频关注度的分级编码方法及系统 |
| US20120314875A1 (en) * | 2011-06-09 | 2012-12-13 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding and decoding 3-dimensional audio signal |
| CN103928030A (zh) * | 2014-04-30 | 2014-07-16 | 武汉大学 | 基于子带空间关注测度的可分级音频编码系统及方法 |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5632005A (en) * | 1991-01-08 | 1997-05-20 | Ray Milton Dolby | Encoder/decoder for multidimensional sound fields |
| EP0520068B1 (en) * | 1991-01-08 | 1996-05-15 | Dolby Laboratories Licensing Corporation | Encoder/decoder for multidimensional sound fields |
| WO2009039897A1 (en) * | 2007-09-26 | 2009-04-02 | Fraunhofer - Gesellschaft Zur Förderung Der Angewandten Forschung E.V. | Apparatus and method for extracting an ambient signal in an apparatus and method for obtaining weighting coefficients for extracting an ambient signal and computer program |
| EP2469741A1 (en) * | 2010-12-21 | 2012-06-27 | Thomson Licensing | Method and apparatus for encoding and decoding successive frames of an ambisonics representation of a 2- or 3-dimensional sound field |
| US9412385B2 (en) * | 2013-05-28 | 2016-08-09 | Qualcomm Incorporated | Performing spatial masking with respect to spherical harmonic coefficients |
| US9495968B2 (en) * | 2013-05-29 | 2016-11-15 | Qualcomm Incorporated | Identifying sources from which higher order ambisonic audio data is generated |
| WO2015056383A1 (ja) | 2013-10-17 | 2015-04-23 | パナソニック株式会社 | オーディオエンコード装置及びオーディオデコード装置 |
| US9564136B2 (en) * | 2014-03-06 | 2017-02-07 | Dts, Inc. | Post-encoding bitrate reduction of multiple object audio |
| US10395664B2 (en) | 2016-01-26 | 2019-08-27 | Dolby Laboratories Licensing Corporation | Adaptive Quantization |
| US20180338212A1 (en) * | 2017-05-18 | 2018-11-22 | Qualcomm Incorporated | Layered intermediate compression for higher order ambisonic audio data |
| US10854209B2 (en) * | 2017-10-03 | 2020-12-01 | Qualcomm Incorporated | Multi-stream audio coding |
| JP2019121037A (ja) | 2017-12-28 | 2019-07-22 | ソニー株式会社 | 情報処理装置、情報処理方法およびプログラム |
2020
- 2020-04-30 CN CN202010368424.9A patent/CN113593585A/zh active Pending

2021
- 2021-03-31 KR KR1020227040823A patent/KR102868387B1/ko active Active
- 2021-03-31 JP JP2022565956A patent/JP7550881B2/ja active Active
- 2021-03-31 EP EP21797604.2A patent/EP4131259B1/en active Active
- 2021-03-31 BR BR112022021882A patent/BR112022021882A2/pt unknown
- 2021-03-31 WO PCT/CN2021/084578 patent/WO2021218558A1/zh not_active Ceased
- 2021-04-29 TW TW110115467A patent/TWI773286B/zh active

2022
- 2022-10-28 US US17/976,474 patent/US11900950B2/en active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101217037A (zh) * | 2007-01-05 | 2008-07-09 | 华为技术有限公司 | 对音频信号的编码速率进行源控的方法和系统 |
| CN101950562A (zh) * | 2010-11-03 | 2011-01-19 | 武汉大学 | 基于音频关注度的分级编码方法及系统 |
| US20120314875A1 (en) * | 2011-06-09 | 2012-12-13 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding and decoding 3-dimensional audio signal |
| CN103928030A (zh) * | 2014-04-30 | 2014-07-16 | 武汉大学 | 基于子带空间关注测度的可分级音频编码系统及方法 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4131259A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20230002968A (ko) | 2023-01-05 |
| US20230133252A1 (en) | 2023-05-04 |
| KR102868387B1 (ko) | 2025-10-13 |
| BR112022021882A2 (pt) | 2023-01-24 |
| EP4131259A4 (en) | 2023-09-20 |
| JP2023523081A (ja) | 2023-06-01 |
| CN113593585A (zh) | 2021-11-02 |
| JP7550881B2 (ja) | 2024-09-13 |
| US11900950B2 (en) | 2024-02-13 |
| EP4131259A1 (en) | 2023-02-08 |
| EP4131259B1 (en) | 2025-06-25 |
| TW202143216A (zh) | 2021-11-16 |
| TWI773286B (zh) | 2022-08-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021218558A1 (zh) | 音频信号的比特分配方法和装置 | |
| TWI819344B (zh) | 音訊訊號渲染方法、裝置、設備及電腦可讀存儲介質 | |
| JP7745100B2 (ja) | 信号の符号化および復号化方法、装置、ユーザイクイップメント、ネットワーク側デバイス並びに記憶媒体 | |
| KR102901181B1 (ko) | 오디오 코딩 방법 및 장치 | |
| US20230368801A1 (en) | Bit allocation method and apparatus for audio object | |
| KR102808817B1 (ko) | 가상 스피커 세트 결정 방법 및 디바이스 | |
| WO2020155976A1 (zh) | 一种音频信号处理方法及装置 | |
| WO2021213128A1 (zh) | 音频信号编码方法和装置 | |
| WO2022012554A1 (zh) | 多声道音频信号编码方法和装置 | |
| EP4356376A1 (en) | Apparatus, methods and computer programs for obtaining spatial metadata | |
| CN116883708A (zh) | 图像分类方法、装置、电子设备及存储介质 | |
| CN115550690B (zh) | 帧率调整方法、装置、设备及存储介质 | |
| CN114283837B (zh) | 一种音频处理方法、装置、设备及存储介质 | |
| CN116156184A (zh) | 视频编解码方法、装置、设备、存储介质及计算机程序 | |
| US12412587B2 (en) | Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program | |
| US20260038521A1 (en) | Scene audio signal encoding method and apparatus | |
| EP4618595A1 (en) | Audio signal rendering method, apparatus, device, and storage medium | |
| CN115038027B (zh) | Hoa系数的获取方法和装置 | |
| CN116980075A (zh) | 数据编码方法、装置、电子设备及存储介质 | |
| GB2594942A (en) | Capturing and enabling rendering of spatial audio signals | |
| WO2024212894A1 (zh) | 场景音频信号的解码方法和装置 | |
| CN117581566A (zh) | 音频处理方法、装置及存储介质 | |
| CN105872018A (zh) | 一种医群通语音系统 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21797604 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2022565956 Country of ref document: JP Kind code of ref document: A |
|
| REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112022021882 Country of ref document: BR |
|
| ENP | Entry into the national phase |
Ref document number: 2021797604 Country of ref document: EP Effective date: 20221031 |
|
| ENP | Entry into the national phase |
Ref document number: 20227040823 Country of ref document: KR Kind code of ref document: A |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 112022021882 Country of ref document: BR Kind code of ref document: A2 Effective date: 20221027 |
|
| WWG | Wipo information: grant in national office |
Ref document number: 202247062884 Country of ref document: IN |
|
| WWG | Wipo information: grant in national office |
Ref document number: 2021797604 Country of ref document: EP |