WO2021218558A1 - Bit allocation method and apparatus for audio signals - Google Patents

Bit allocation method and apparatus for audio signals

Info

Publication number
WO2021218558A1
WO2021218558A1 PCT/CN2021/084578 CN2021084578W
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
parameter
classification
sound field
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/084578
Other languages
English (en)
French (fr)
Inventor
高原
丁建策
王宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to BR112022021882A priority Critical patent/BR112022021882A2/pt
Priority to KR1020227040823A priority patent/KR102868387B1/ko
Priority to JP2022565956A priority patent/JP7550881B2/ja
Priority to EP21797604.2A priority patent/EP4131259B1/en
Publication of WO2021218558A1 publication Critical patent/WO2021218558A1/zh
Priority to US17/976,474 priority patent/US11900950B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/002 Dynamic bit allocation
    • G10L19/02 using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 using orthogonal transformation
    • G10L19/04 using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes

Definitions

  • This application relates to audio processing technology, and in particular to a method and device for bit allocation of audio signals.
  • Immersive audio technology aims to give users a better three-dimensional sound experience by expanding audio into a high-dimensional spatial representation.
  • Three-dimensional audio technology no longer simply uses a multi-channel representation at the playback end; instead, it reconstructs the audio signal in three-dimensional space and represents the audio in that space through rendering technology.
  • The number of bits allocated to each audio signal for coding and decoding cannot reflect the differences in the spatial characteristics of the audio signals at the playback end, nor can it adapt to the characteristics of the audio signal, which reduces audio signal coding and decoding efficiency.
  • The present application provides a method and device for bit allocation of audio signals that adapt to the characteristics of the audio signals while matching different numbers of coding bits to different audio signals, thereby improving the coding and decoding efficiency of audio signals.
  • The present application provides a method for bit allocation of audio signals, including: obtaining T audio signals in a current frame, where T is a positive integer; determining a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determining the priority of the M audio signals in the first audio signal set; and performing bit allocation on the M audio signals according to their priorities.
  • The present application determines the priorities of the multiple audio signals in the current frame according to the characteristics of those signals and the related information in the metadata, and determines the number of bits to allocate to each audio signal according to its priority. This not only adapts to the characteristics of the audio signals, but also matches different numbers of coding bits to different audio signals, improving the coding and decoding efficiency of audio signals.
  • Determining the priority of the M audio signals in the first audio signal set includes: obtaining a sound field classification parameter of each of the M audio signals; and determining the priority of the M audio signals according to those sound field classification parameters.
  • Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of a first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
  • In this way, a priority of the audio signal that involves multiple dimensions of information can be obtained.
  • While acquiring the T audio signals in the current frame, the method further includes acquiring S groups of metadata in the current frame, where S is a positive integer and T ≥ S. The S groups of metadata correspond to the T audio signals, and the metadata is used to describe the state of the corresponding audio signal in the spatial sound field.
  • Metadata, as description information of the state of the corresponding audio signal in the spatial sound field, provides a reliable and effective basis for subsequently obtaining the sound field classification parameters of the audio signal.
  • Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal together with its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
  • Obtaining the sound field classification parameter of the first audio signal according to one or more of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters includes: taking a weighted average of multiple of the acquired parameters to obtain the sound field classification parameter; or averaging the acquired parameters to obtain the sound field classification parameter; or using one of the acquired parameters directly as the sound field classification parameter.
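The three ways of folding the acquired grading parameters into a single sound field classification parameter (weighted average, plain average, or a single parameter used directly) can be sketched as follows. The parameter names, the normalization of values to [0, 1], and the example weights are illustrative assumptions, not values given by the application.

```python
def sound_field_parameter(params, weights=None):
    """Fold per-signal grading parameters (motion, volume, propagation,
    diffusion, ...) into one sound field classification parameter."""
    if not params:
        raise ValueError("at least one grading parameter is required")
    if len(params) == 1:
        # A single acquired parameter is used directly.
        return next(iter(params.values()))
    if weights is not None:
        # Weighted average over the acquired parameters.
        total = sum(weights[name] for name in params)
        return sum(value * weights[name] for name, value in params.items()) / total
    # Plain average of the acquired parameters.
    return sum(params.values()) / len(params)

# Illustrative values in [0, 1] (not taken from the application):
p = {"motion": 0.8, "volume": 0.6, "propagation": 0.4, "diffusion": 0.2}
avg = sound_field_parameter(p)                                   # plain average
wavg = sound_field_parameter(p, weights={"motion": 2, "volume": 1,
                                         "propagation": 1, "diffusion": 1})
print(avg, wavg)
```

Which of the three variants is used could itself depend on which grading parameters were actually acquired for the signal.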
  • Determining the priority of the M audio signals according to the sound field classification parameter of each audio signal includes: determining, according to a set first correspondence, the priority corresponding to the sound field classification parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes a correspondence between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or using the sound field classification parameter of the first audio signal as its priority; or determining, according to multiple set range thresholds, the range in which the sound field classification parameter of the first audio signal falls, and determining the priority corresponding to that range as the priority of the first audio signal.
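The range-threshold variant above (mapping a sound field classification parameter to a priority through set range thresholds) might look like the following sketch; the threshold values and the number of priority levels are assumptions made for illustration.

```python
def priority_from_parameter(sf_param, thresholds=(0.25, 0.5, 0.75)):
    """Map a sound field classification parameter onto a priority level:
    each threshold the parameter reaches raises the priority by one,
    so a larger parameter yields a higher priority."""
    priority = 1  # lowest priority level
    for t in sorted(thresholds):
        if sf_param >= t:
            priority += 1
    return priority

print(priority_from_parameter(0.1))   # low parameter  -> priority 1
print(priority_from_parameter(0.9))   # high parameter -> priority 4
```

The set first correspondence could be realized the same way with a dictionary lookup instead of thresholds.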
  • Performing bit allocation on the M audio signals according to their priorities includes: performing bit allocation according to the currently available number of bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
  • Performing bit allocation according to the currently available number of bits and the priorities of the M audio signals includes: determining a bit-number proportion for the first audio signal according to its priority, where the first audio signal is any one of the M audio signals; and obtaining the number of bits of the first audio signal as the product of the currently available number of bits and that proportion.
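The proportional rule can be sketched as below; treating the priority values themselves as the proportions is an assumption made for illustration, since the application leaves the proportion open.

```python
def allocate_bits(available_bits, priorities):
    """Split the currently available bits among M audio signals so that
    each signal's share is proportional to its priority."""
    total = sum(priorities)
    # Integer share for each signal (rounded down).
    alloc = [available_bits * p // total for p in priorities]
    # Give any rounding remainder to the highest-priority signal.
    alloc[priorities.index(max(priorities))] += available_bits - sum(alloc)
    return alloc

print(allocate_bits(1000, [4, 2, 1, 1]))   # -> [500, 250, 125, 125]
```

The remainder handling guarantees that the allocated bits always sum exactly to the currently available number of bits.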
  • Performing bit allocation according to the currently available number of bits and the priorities of the M audio signals includes: determining the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes a correspondence between multiple priorities and multiple numbers of bits, one or more priorities correspond to one number of bits, and the first audio signal is any one of the M audio signals.
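The second correspondence can be read as a simple lookup table in which several priorities may share one bit count; the table entries below are hypothetical, not values from the application.

```python
# Hypothetical second correspondence (priority -> number of bits);
# priorities 1 and 2 deliberately share one bit count.
BITS_PER_PRIORITY = {1: 64, 2: 64, 3: 128, 4: 256}

def bits_for_signal(priority):
    """Look up the number of bits allocated to a signal from its priority."""
    return BITS_PER_PRIORITY[priority]

print(bits_for_signal(2), bits_for_signal(4))   # -> 64 256
```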
  • the determining a first audio signal set according to the T audio signals includes: adding a pre-designated audio signal among the T audio signals to the first audio signal set.
  • Determining the first audio signal set according to the T audio signals includes: adding the audio signals corresponding to the S groups of metadata among the T audio signals to the first audio signal set; or adding audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
  • Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtaining a first sound field classification parameter of the first audio signal according to one or more of those parameters; obtaining one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtaining a second sound field classification parameter of the first audio signal according to one or more of those parameters; and obtaining the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal when played back in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal during playback in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
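The two-stage scheme above (a first sound field classification parameter from the spatial parameters, a second from the state, ranking, and signal parameters, then a combined value) can be sketched as follows; averaging within each stage and weighting the two stages equally are assumptions, not rules the application specifies.

```python
def two_stage_sound_field_parameter(spatial_params, signal_params):
    """Combine a first sound field classification parameter (from motion,
    volume, propagation, diffusion) with a second one (from state,
    ranking, signal) into the final sound field classification parameter."""
    first = sum(spatial_params.values()) / len(spatial_params)
    second = sum(signal_params.values()) / len(signal_params)
    # Equal weighting of the two stages (illustrative choice).
    return 0.5 * first + 0.5 * second

sf = two_stage_sound_field_parameter(
    {"motion": 0.8, "volume": 0.4},                   # illustrative values
    {"state": 0.5, "ranking": 0.9, "signal": 0.1})
print(round(sf, 3))   # -> 0.55
```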
  • Obtaining the sound field classification parameter of each of the M audio signals includes: obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal together with its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtaining the first sound field classification parameter of the first audio signal according to one or more of those parameters; obtaining, according to the metadata corresponding to the first audio signal, or according to the first audio signal together with its corresponding metadata, one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtaining the second sound field classification parameter of the first audio signal according to one or more of those parameters; and obtaining the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters.
  • This application uses multiple methods to obtain multiple sound field classification parameters of an audio signal according to its different characteristics, and then determines the priority of the audio signal from those parameters, so that the obtained priority can reflect the multiple features of the audio signal and remain compatible with implementation schemes corresponding to different features.
  • Determining the priority of the M audio signals according to the sound field classification parameters of each of the M audio signals includes: obtaining a first priority of the first audio signal according to the first sound field classification parameter; obtaining a second priority of the first audio signal according to the second sound field classification parameter; and obtaining the priority of the first audio signal according to the first priority and the second priority.
  • This application uses multiple methods to obtain multiple priorities of an audio signal according to its different characteristics, and then combines those priorities into the final priority of the audio signal, so that the obtained priority can reflect the multiple features of the audio signal and remain compatible with implementation schemes corresponding to different features.
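One simple way to fold the first and second priorities into a final priority is to keep the higher of the two; the application leaves the combination rule open, so this is only one possible choice.

```python
def combined_priority(first_priority, second_priority):
    """Final priority as the maximum of the two per-feature priorities,
    so a signal that is important under either feature set keeps a
    high priority."""
    return max(first_priority, second_priority)

print(combined_priority(2, 3))   # -> 3
```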
  • The present application provides an audio signal encoding method. After executing the audio signal bit allocation method according to any one of the above first aspects, the method further includes: encoding the M audio signals according to the bits allocated to them to obtain an encoded bitstream.
  • the encoded bitstream includes the number of bits of the M audio signals.
  • The present application provides an audio signal decoding method, including: receiving an encoded bitstream; executing the audio signal bit allocation method according to any one of the above first aspects to obtain the respective numbers of bits of the M audio signals; and reconstructing the M audio signals according to their respective numbers of bits and the bitstream.
  • The present application provides a bit allocation device for audio signals, including a processing module configured to: obtain T audio signals in the current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine the priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to their priorities.
  • the processing module is specifically configured to obtain the sound field classification parameter of each audio signal in the M audio signals; according to the sound field classification parameter of each audio signal in the M audio signals The priority of the M audio signals is determined.
  • The processing module is specifically configured to: obtain one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtain the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
  • The processing module is specifically configured to obtain S groups of metadata in the current frame, where S is a positive integer and T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata is used to describe the state of the corresponding audio signal in the spatial sound field.
  • The processing module is specifically configured to: obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal together with its corresponding metadata, one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, diffusion classification parameter, state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; and obtain the sound field classification parameter of the first audio signal according to one or more of the acquired parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
  • The processing module is specifically configured to: take a weighted average of multiple of the acquired motion, volume, propagation, diffusion, state, ranking, and signal classification parameters to obtain the sound field classification parameter; or average the acquired parameters to obtain the sound field classification parameter; or use one of the acquired parameters directly as the sound field classification parameter.
  • The processing module is specifically configured to: determine, according to a set first correspondence, the priority corresponding to the sound field classification parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes a correspondence between multiple sound field classification parameters and multiple priorities, one or more sound field classification parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or use the sound field classification parameter of the first audio signal as its priority; or determine, according to multiple set range thresholds, the range in which the sound field classification parameter of the first audio signal falls, and determine the priority corresponding to that range as the priority of the first audio signal.
  • The processing module is specifically configured to perform bit allocation according to the currently available number of bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
  • The processing module is specifically configured to: determine a bit-number proportion for the first audio signal according to its priority, where the first audio signal is any one of the M audio signals; and obtain the number of bits of the first audio signal as the product of the currently available number of bits and that proportion.
  • The processing module is specifically configured to determine the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes a correspondence between multiple priorities and multiple numbers of bits, one or more priorities correspond to one number of bits, and the first audio signal is any one of the M audio signals.
  • the processing module is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
  • The processing module is specifically configured to: add the audio signals corresponding to the S groups of metadata among the T audio signals to the first audio signal set; or add audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
  • The processing module is specifically configured to: obtain one or more of the motion classification parameter, volume classification parameter, propagation classification parameter, and diffusion classification parameter of the first audio signal, where the first audio signal is any one of the M audio signals; obtain the first sound field classification parameter of the first audio signal according to one or more of those parameters; obtain one or more of the state classification parameter, ranking classification parameter, and signal classification parameter of the first audio signal; obtain the second sound field classification parameter of the first audio signal according to one or more of those parameters; and obtain the sound field classification parameter of the first audio signal according to the first and second sound field classification parameters. The motion classification parameter describes how fast the first audio signal moves per unit time in the spatial sound field; the volume classification parameter describes the volume of the first audio signal when played back in the spatial sound field; the propagation classification parameter describes the size of the propagation range of the first audio signal during playback in the spatial sound field; the diffusion classification parameter describes the size of the diffusion range of the first audio signal in the spatial sound field; the state classification parameter describes the size of the sound source segmentation of the first audio signal in the spatial sound field; the ranking classification parameter describes the priority ranking of the first audio signal in the spatial sound field; and the signal classification parameter describes the magnitude of energy in the encoding process of the first audio signal.
  • the processing module is specifically configured to obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, one or more of the motion classification parameter, the volume classification parameter, the propagation classification parameter, and the diffusion classification parameter of the first audio signal, the first audio signal being any one of the M audio signals; obtain the first sound field classification parameter of the first audio signal according to the acquired one or more of the motion classification parameter, the volume classification parameter, the propagation classification parameter, and the diffusion classification parameter; obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, one or more of the state classification parameter, the sorting classification parameter, and the signal classification parameter of the first audio signal; obtain the second sound field classification parameter of the first audio signal according to the acquired one or more of the state classification parameter, the sorting classification parameter, and the signal classification parameter; and obtain the sound field classification parameter of the first audio signal according to the first sound field classification parameter and the second sound field classification parameter;
  • the processing module is specifically configured to obtain a first priority of the first audio signal according to the first sound field classification parameter; obtain a second priority of the first audio signal according to the second sound field classification parameter; and obtain the priority of the first audio signal according to the first priority and the second priority.
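As a rough illustration, the two-stage computation just described can be sketched as follows. The normalization of parameters to [0, 1], the averaging, and the weights w1/w2 are all assumptions made for illustration; the application does not fix a specific combination formula here.

```python
def first_classification(motion, volume, propagation, diffusion):
    # Illustrative: average whichever first-stage parameters are available.
    # Each parameter is assumed normalized to [0, 1]; None means "not acquired".
    params = [p for p in (motion, volume, propagation, diffusion) if p is not None]
    return sum(params) / len(params)

def second_classification(state, sorting, signal):
    # Same assumed averaging for the second-stage parameters.
    params = [p for p in (state, sorting, signal) if p is not None]
    return sum(params) / len(params)

def priority(first_param, second_param, w1=0.6, w2=0.4):
    # First priority derives from the first sound field classification
    # parameter, second priority from the second; here each "priority" is
    # the parameter itself, combined by an assumed weighted sum.
    return w1 * first_param + w2 * second_param

p = priority(first_classification(0.8, 0.6, None, None),
             second_classification(0.5, None, None))
```

A signal that moves quickly and plays loudly thus receives a higher combined priority than a static, quiet one under this assumed weighting.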
  • the processing module is further configured to encode the M audio signals according to the numbers of bits allocated to the M audio signals, to obtain an encoded bitstream.
  • the encoded bitstream includes the number of bits of the M audio signals.
  • it further includes: a transceiver module, configured to receive an encoded bitstream; the processing module is further configured to obtain the respective numbers of bits of the M audio signals, and to reconstruct the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
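A minimal sketch of the relationship between the per-signal bit counts and the encoded bitstream described above: the encoder writes each signal's payload with exactly its allocated number of bits, and the decoder splits the bitstream back using the same bit counts. The bit-string representation and the integer payloads are simplifications for illustration only, not the application's actual coding scheme.

```python
def encode(payloads, bit_counts):
    # payloads: non-negative ints; bit_counts: bits allocated per signal.
    # Each payload is written with exactly its allocated number of bits.
    return "".join(format(p, f"0{n}b") for p, n in zip(payloads, bit_counts))

def decode(bitstream, bit_counts):
    # Recover each payload by consuming its allocated number of bits.
    out, pos = [], 0
    for n in bit_counts:
        out.append(int(bitstream[pos:pos + n], 2))
        pos += n
    return out

bits = [8, 5, 3]      # bits allocated to M = 3 audio signals
vals = [200, 17, 5]   # toy "encoded" values, each fitting its bit budget
stream = encode(vals, bits)
assert decode(stream, bits) == vals
```

The decoder can only segment the stream correctly because it knows the same per-signal bit counts the encoder used, which is why those counts must be obtainable at the decoding side.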
  • the present application provides a device including: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the first to third aspects.
  • the present application provides a computer-readable storage medium, which is characterized by comprising a computer program that, when executed on a computer, causes the computer to execute the method described in any one of the first to third aspects above.
  • the present application provides a computer-readable storage medium, including an encoded bitstream obtained according to the method described in the above second aspect.
  • the present application provides an encoding device, including a processor and a communication interface, where the processor reads a stored computer program through the communication interface, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method described in any one of the first to third aspects above.
  • the present application provides an encoding device, which is characterized by comprising a processor and a memory, where the processor is configured to execute the method described in the second aspect, and the memory is configured to store the encoded code stream.
  • FIG. 1A exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 applied in this application;
  • FIG. 1B is an explanatory diagram of an example of an audio decoding system 40 according to an exemplary embodiment
  • FIG. 2 is a schematic diagram of the structure of an audio decoding device 200 provided by the present application.
  • FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment
  • FIG. 4 is a schematic flowchart of a method for allocating audio signals according to the present application.
  • Fig. 5 is an exemplary schematic diagram of the position of the audio signal in the spatial sound field
  • Fig. 6 is an exemplary schematic diagram of the priority of the audio signal in the spatial sound field
  • FIG. 7 is a schematic structural diagram of an embodiment of a device of this application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a device of this application.
  • At least one (item) refers to one or more, and “multiple” refers to two or more.
  • “And/or” is used to describe the association relationship of associated objects, indicating that three types of relationship can exist; for example, “A and/or B” can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • The character “/” generally indicates that the associated objects before and after it are in an “or” relationship.
  • “At least one of the following items” or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, “at least one of a, b, or c” can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, and c can be single or multiple.
  • Audio frame: audio data is streamed. In practical applications, the amount of audio data within a period of time is usually taken as one frame of audio to facilitate encoding, decoding, and processing; this period is called the "sampling time", and its value is determined by the requirements of the codec and the specific application, for example a duration of 2.5 ms to 60 ms, where ms denotes milliseconds.
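For instance, the number of sampling points in one frame follows directly from the sampling rate and the frame duration; the 48 kHz rate below is an assumed example, not a value taken from this application.

```python
def frame_samples(sample_rate_hz, frame_ms):
    # Number of sampling points in one audio frame of the given duration.
    return int(sample_rate_hz * frame_ms / 1000)

# At 48 kHz, a 20 ms frame contains 960 sampling points; the
# 2.5 ms-60 ms range above gives 120-2880 points at that rate.
n = frame_samples(48000, 20)
```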
  • Audio signal: an audio signal is an information carrier of regular sound waves, varying in frequency and amplitude, that carry voice, music, and sound effects. Audio is a continuously changing analog signal that can be represented by a continuous curve called a sound wave. Digital audio is obtained from the analog audio signal through analog-to-digital conversion, or is a digital signal generated by a computer. A sound wave has three important parameters: frequency, amplitude, and phase, which determine the characteristics of the audio signal.
  • Metadata: also known as intermediary data or relay data, metadata is data that describes other data (data about data). It is mainly used to describe data properties and to support functions such as indicating storage location, recording history, resource search, and file recording. Metadata is information about the organization of data, data domains, and their relationships; in short, metadata is data about data. The metadata in this application is used to describe the state of the corresponding audio signal in the spatial sound field.
  • Three-dimensional audio:
  • FIG. 1A exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 applied in this application.
  • the audio encoding and decoding system 10 may include a source device 12 and a destination device 14.
  • the source device 12 generates encoded audio data. Therefore, the source device 12 may be referred to as an audio encoding device.
  • the destination device 14 can decode the encoded audio data generated by the source device 12, and therefore, the destination device 14 can be referred to as an audio decoding device.
  • Various implementations of source device 12, destination device 14, or both may include one or more processors and memory coupled to the one or more processors.
  • the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
  • the source device 12 and the destination device 14 may include various devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, audio game consoles, on-board computers, wireless communication equipment, or the like.
  • Although FIG. 1A shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
  • the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
  • the source device 12 and the destination device 14 can communicate with each other via a link 13, and the destination device 14 can receive encoded audio data from the source device 12 via the link 13.
  • the link 13 may include one or more media or devices capable of moving the encoded audio data from the source device 12 to the destination device 14.
  • the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded audio data directly to the destination device 14 in real time.
  • the source device 12 may modulate the encoded audio data according to a communication standard (for example, a wireless communication protocol), and may transmit the modulated audio data to the destination device 14.
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
  • the source device 12 includes an encoder 20, and optionally, the source device 12 may also include an audio source 16, an audio preprocessor 18, and a communication interface 22.
  • the encoder 20, the audio source 16, the audio preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:
  • the audio source 16 may include or may be any type of audio capture device, for example for capturing real-world sounds, and/or any type of audio generating device, for example a computer audio processor, or any type of device for acquiring and/or providing real-world audio, computer animation audio (for example, screen content, audio in virtual reality (VR)), and/or any combination thereof (for example, audio in augmented reality (AR)).
  • the audio source 16 may be a microphone for capturing audio or a memory for storing audio.
  • the audio source 16 may also include any type (internal or external) interface for storing previously captured or generated audio and/or acquiring or receiving audio.
  • when the audio source 16 is an audio capture device, it can be, for example, a local audio capture device or an audio capture device integrated in the source device; when the audio source 16 is a memory, it can be, for example, a local memory or a memory integrated in the source device.
  • the interface may be, for example, an external interface that receives audio from an external audio source.
  • the external audio source is, for example, an external audio capture device, such as a microphone, an external memory, or an external audio generating device.
  • the device is, for example, an external computer audio processor, computer, or server.
  • the interface can be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface, and an optical interface.
  • audio can be regarded as a one-dimensional vector of elements; the elements in the vector can also be called sampling points. The number of sampling points in the vector, or in the audio, defines the size of the audio.
  • the audio transmitted from the audio source 16 to the audio processor may also be referred to as original audio data 17.
  • the audio pre-processor 18 is configured to receive the original audio data 17 and perform pre-processing on the original audio data 17 to obtain pre-processed audio 19 or pre-processed audio data 19.
  • the pre-processing performed by the audio pre-processor 18 may include trimming, toning, or denoising.
  • the encoder 20 (or audio encoder 20) is used to receive the pre-processed audio data 19, and process the pre-processed audio data 19, so as to provide the encoded audio data 21.
  • the encoder 20 may be used to implement the various embodiments described below, so as to apply the audio signal bit allocation method described in this application on the encoding side.
  • the communication interface 22 can be used to receive the encoded audio data 21, and can transmit the encoded audio data 21 to the destination device 14 or any other device (such as a memory) through the link 13 for storage or direct reconstruction,
  • the other device may be any device used for decoding or storage.
  • the communication interface 22 may be used, for example, to encapsulate the encoded audio data 21 into a suitable format, such as a data packet, for transmission on the link 13.
  • the destination device 14 includes a decoder 30, and optionally, the destination device 14 may also include a communication interface 28, an audio post-processor 32, and a playback device 34. They are described as follows:
  • the communication interface 28 may be used to receive the encoded audio data 21 from the source device 12 or any other source, for example, a storage device, and the storage device is, for example, an encoded audio data storage device.
  • the communication interface 28 can be used to transmit or receive the encoded audio data 21 through the link 13 between the source device 12 and the destination device 14 or through any type of network.
  • the link 13 is, for example, a direct wired or wireless connection.
  • the type of network is, for example, a wired or wireless network or any combination thereof, or any type of private network and public network, or any combination thereof.
  • the communication interface 28 may be used, for example, to decapsulate the data packet transmitted by the communication interface 22 to obtain the encoded audio data 21.
  • Both the communication interface 28 and the communication interface 22 can be configured as a one-way communication interface or a two-way communication interface, and can be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to the data transfer, such as the transfer of encoded audio data.
  • the decoder 30 (or audio decoder 30) is used to receive the encoded audio data 21 and provide the decoded audio data 31 or the decoded audio 31.
  • the decoder 30 may be used to implement the various embodiments described below, so as to apply the audio signal bit allocation method described in this application on the decoding side.
  • the audio post-processor 32 is configured to perform post-processing on the decoded audio data 31 (also referred to as reconstructed audio data) to obtain the post-processed audio data 33.
  • the post-processing performed by the audio post-processor 32 may include: trimming or resampling, or any other processing, and may also be used to transmit the post-processed audio data 33 to the playback device 34.
  • the playback device 34 is used to receive the post-processed audio data 33 to play audio to, for example, users or listeners.
  • the playback device 34 may be or may include any type of player for presenting reconstructed audio, for example an integrated or external speaker or loudspeaker.
  • Although FIG. 1A shows the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
  • the same hardware and/or software may be used, or separate hardware and/or software, or any combination thereof may be used to implement the source device 12 or the corresponding functionality and the destination device 14 or the corresponding functionality.
  • the source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, such as notebook or laptop computers, mobile phones, smart phones, tablets or tablet computers, cameras, desktop computers, set-top boxes, televisions, in-vehicle devices, playback devices, digital media players, game consoles, media streaming devices (such as content service servers or content distribution servers), broadcast receiver devices, broadcast transmitter devices, and the like, and may use no operating system or any type of operating system.
  • Both the encoder 20 and the decoder 30 can be implemented as any of various suitable circuits, for example one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
  • the device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and can use one or more processors to execute the instructions in hardware so as to execute the technology of the present disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be regarded as one or more processors.
  • the audio encoding and decoding system 10 shown in FIG. 1A is only an example, and the technology of the present application can be applied to settings (for example, audio encoding or audio decoding) that do not necessarily include any data communication between the encoding and decoding devices.
  • the data can be retrieved from local storage, streamed on the network, etc.
  • the audio encoding device can encode data and store the data to the memory, and/or the audio decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other, but simply encode data to memory and/or retrieve data from memory and decode it.
  • FIG. 1B is an explanatory diagram of an example of an audio decoding system 40 according to an exemplary embodiment.
  • the audio decoding system 40 can implement a combination of various technologies of the present application.
  • the audio decoding system 40 may include a microphone 41, an encoder 20, a decoder 30 (and/or an audio encoder/decoder implemented by the logic circuit 47 of the processing unit 46), an antenna 42, One or more processors 43, one or more memories 44, and/or playback devices 45.
  • the microphone 41, the antenna 42, the processing unit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the playback device 45 can communicate with each other.
  • Although the encoder 20 and the decoder 30 are used to illustrate the audio coding system 40, in different examples the audio coding system 40 may include only the encoder 20 or only the decoder 30.
  • the antenna 42 may be used to transmit or receive an encoded stream of audio data.
  • the playback device 45 may be used to play audio data.
  • the logic circuit 47 may be implemented by the processing unit 46.
  • the processing unit 46 may include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the audio decoding system 40 may also include an optional processor 43, and the optional processor 43 may similarly include application-specific integrated circuit (ASIC) logic, general-purpose processors, and the like.
  • the logic circuit 47 may be implemented by hardware, such as dedicated audio coding hardware, and the processor 43 may be implemented by general software, an operating system, and the like.
  • the memory 44 may be any type of memory, such as volatile memory (for example, static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (for example, flash memory, etc.).
  • the memory 44 may be implemented by cache memory.
  • the logic circuit 47 can access the memory 44.
  • the logic circuit 47 and/or the processing unit 46 may include a memory (for example, a cache, etc.) for implementing buffers and the like.
  • the encoder 20 implemented by logic circuits may include a buffer (e.g., implemented by the processing unit 46 or memory 44) and an audio processing unit (e.g., implemented by the processing unit 46).
  • the audio processing unit may be communicatively coupled to the buffer.
  • the audio processing unit may include an encoder 20 implemented by a logic circuit 47 to implement various modules discussed in any other encoder system or subsystem described herein. Logic circuits can be used to perform the various operations discussed herein.
  • decoder 30 may be implemented by logic circuit 47 in a similar manner to implement the various modules discussed in any other decoder system or subsystem described herein.
  • the decoder 30 implemented by the logic circuit may include a buffer (e.g., implemented by the processing unit 46 or the memory 44) and an audio processing unit (e.g., implemented by the processing unit 46).
  • the audio processing unit may be communicatively coupled to the buffer.
  • the audio processing unit may include a decoder 30 implemented by a logic circuit 47 to implement various modules discussed in any other decoder system or subsystem described herein.
  • the antenna 42 may be used to receive an encoded bitstream of audio data.
  • the encoded bitstream may include audio signal data, metadata, etc., related to audio frames discussed herein.
  • the audio coding system 40 may also include a decoder 30 coupled to the antenna 42 and used to decode the encoded bitstream.
  • the playback device 45 is used to play audio frames.
  • the decoder 30 may be used to perform the reverse process.
  • the decoder 30 can be used to receive and parse such metadata, and decode related audio data accordingly.
  • the encoder 20 may entropy encode the metadata into an encoded audio code stream. In such instances, decoder 30 may parse such metadata and decode related audio data accordingly.
  • Fig. 2 is a schematic structural diagram of an audio decoding device 200 (for example, an audio encoding device or an audio decoding device) provided by the present application.
  • the audio decoding device 200 is suitable for implementing the embodiments described in this application.
  • the audio decoding device 200 may be an audio decoder (for example, the decoder 30 of FIG. 1A) or an audio encoder (for example, the encoder 20 of FIG. 1A).
  • the audio decoding device 200 may be one or more components of the decoder 30 in FIG. 1A or the encoder 20 in FIG. 1A described above.
  • the audio decoding device 200 includes: an ingress port 210 and a receiver unit (Rx) 220 for receiving data; a processor, logic unit, or central processing unit (CPU) 230 for processing data; a transmitter unit (Tx) 240 and an egress port 250 for transmitting data; and a memory 260 for storing data.
  • the audio decoding device 200 may further include optical-to-electrical conversion components and electro-optical (EO) components coupled with the ingress port 210, the receiver unit 220, the transmitter unit 240, and the egress port 250 for the egress or ingress of optical or electrical signals.
  • the processor 230 is implemented by hardware and software.
  • the processor 230 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 230 communicates with the ingress port 210, the receiver unit 220, the transmitter unit 240, the egress port 250, and the memory 260.
  • the processor 230 includes a decoding module 270 (for example, an encoding module 270 or a decoding module 270).
  • the encoding/decoding module 270 implements the embodiments disclosed in this document to implement the audio signal bit allocation method provided in this application. For example, the encoding/decoding module 270 implements, processes, or provides various encoding operations.
  • the encoding/decoding module 270 provides a substantial improvement to the function of the audio decoding device 200, and affects the conversion of the audio decoding device 200 to different states.
  • the encoding/decoding module 270 is implemented by instructions stored in the memory 260 and executed by the processor 230.
  • the memory 260 includes one or more magnetic disks, tape drives, and solid-state hard drives, which can be used as an overflow data storage device for storing programs when these programs are selectively executed, and storing instructions and data read during program execution.
  • the memory 260 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment.
  • the device 300 can implement the technology of the present application.
  • FIG. 3 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as a decoding device 300 for short) of this application.
  • the apparatus 300 may include a processor 310, a memory 330, and a bus system 350.
  • the processor and the memory are connected by a bus system, the memory is used to store instructions, and the processor is used to execute instructions stored in the memory.
  • the memory of the decoding device stores program code, and the processor can call the program code stored in the memory to execute the method described in this application. To avoid repetition, it will not be described in detail here.
  • the processor 310 may be a central processing unit (Central Processing Unit, "CPU" for short), and the processor 310 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 330 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 330.
  • the memory 330 may include code and data 331 accessed by the processor 310 using the bus 350.
  • the memory 330 may further include an operating system 333 and application programs 335.
  • the bus system 350 may also include a power bus, a control bus, a status signal bus, and the like. However, for clear description, various buses are marked as the bus system 350 in the figure.
  • the decoding device 300 may further include one or more output devices, such as a speaker 370.
  • the speaker 370 may be a headset or an external speaker.
  • the speaker 370 may be connected to the processor 310 via the bus 350.
  • FIG. 4 is a schematic flowchart of a method for allocating audio signals according to the present application.
  • the process 400 may be executed by the source device 12 or the destination device 14.
  • the process 400 is described as a series of steps or operations. It should be understood that the process 400 may be executed in various orders and/or occur simultaneously, and is not limited to the execution order shown in FIG. 4.
  • the method includes:
  • Step 401 Acquire T audio signals in the current frame.
  • the current frame is the audio frame acquired at the current moment during the execution of the method of the present application.
  • 3D audio technology no longer simply uses multi-channel representation, but uses different audio signals to represent different sounds.
  • For example, the environment includes a human voice, a music sound, and a car sound; three audio signals are used to represent the human voice, the music sound, and the car sound respectively, and each sound is then reconstructed in three-dimensional space based on these three audio signals, realizing the representation of a variety of sounds in three-dimensional space. That is, an audio frame may contain multiple audio signals, and one audio signal represents one kind of voice, music, or sound effect in reality. It should be noted that any technology for extracting audio signals from audio frames can be used in this application, and this is not specifically limited.
  • S groups of metadata in the current frame are acquired, and the S groups of metadata correspond to the above T audio signals.
  • the encoding end pre-processes the original speech, music, or sound effects, and the audio data and the metadata are generated separately in this process. The encoding end can, according to the framing principle of the audio frame, take the metadata within the time range corresponding to the start time (sampling point) and end time (sampling point) of the current frame as the metadata of the current frame.
  • the metadata of the current frame can be obtained by parsing the received code stream.
  • the metadata includes parameters such as object index (object_index), azimuth angle (position_azimuth), elevation angle (position_elevation), position radius (position_radius), gain factor (gain_factor), uniform spread (spread_uniform), spread width (spread_width), spread height (spread_height), spread depth (spread_depth), diffuseness (diffuseness), importance (priority), divergence (divergence), and speed (speed). The value range and the number of bits of each of the above parameters are recorded in the metadata. It should be noted that the metadata may also include other parameters and other recording forms, which are not specifically limited in this application.
  | Metadata | Value range (precision) | Number of bits |
  |---|---|---|
  | object_index | 1 to 128 (1) | 7 |
  | position_azimuth | -180 to 180 (2) | 8 |
  | position_elevation | -90 to 90 (5) | 6 |
  | position_radius | 0.5 to 16 (non-linear) | 4 |
  | gain_factor | 0.004 to 5.957 (non-linear) | 7 |
  | spread_uniform | 0 to 180 | 7 |
  | spread_width | 0 to 180 | 7 |
  | spread_height | 0 to 90 | 5 |
  | spread_depth | 0 to 15.5 | 4 |
  | diffuseness | 0 to 1 | 7 |
  | priority | 0 to 7 | 3 |
  | divergence | 0 to 1 | 8 |
  | speed | 0 to 1 | 4 |
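From the table above, one group of metadata occupies a fixed number of bits. The sketch below tallies the field widths and shows an assumed uniform quantizer for a linear-range field such as position_azimuth; the non-linear fields (position_radius, gain_factor) would use their own mappings, which are not specified in this excerpt.

```python
METADATA_BITS = {
    "object_index": 7, "position_azimuth": 8, "position_elevation": 6,
    "position_radius": 4, "gain_factor": 7, "spread_uniform": 7,
    "spread_width": 7, "spread_height": 5, "spread_depth": 4,
    "diffuseness": 7, "priority": 3, "divergence": 8, "speed": 4,
}
total = sum(METADATA_BITS.values())  # total bits per metadata group

def quantize_uniform(value, lo, hi, bits):
    # Map a value in [lo, hi] to an integer code of the given bit width.
    # Assumed uniform quantization; only an illustration for linear fields.
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)

code = quantize_uniform(-180.0, -180.0, 180.0, METADATA_BITS["position_azimuth"])
```

With these widths, one metadata group totals 77 bits before any entropy coding.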
  • Step 402 Determine a first audio signal set according to the T audio signals.
  • the first audio signal set includes M audio signals, where M is a positive integer, the T audio signals include the M audio signals, and T ≥ M.
  • audio signals with corresponding metadata among the T audio signals may be added to the first audio signal set. That is, if all of the above T audio signals correspond to metadata, all T audio signals can be added to the first audio signal set; if only part of the above T audio signals correspond to metadata, only that part of the audio signals needs to be added to the first audio signal set.
  • This application may also add pre-designated audio signals among the T audio signals to the first audio signal set. Through high-level signaling or a user-specified manner, part or all of the audio signals in the above T audio signals can be added to the first audio signal set.
  • the higher layer signaling directly configures the index of the audio signal to be added to the first audio signal set.
  • the user specifies voice, music or sound effects, and adds the audio signal of the specified object to the first audio signal set.
  • This application can also refer to the importance parameter of the audio signal recorded in the metadata.
  • the importance parameter is used to indicate the importance of the corresponding audio signal in three-dimensional audio.
  • the audio signal corresponding to the importance parameter is added to the first audio signal set.
  • Step 403 Determine the priority of the M audio signals in the first audio signal set.
  • This application may first obtain the sound field classification parameters of each audio signal in the M audio signals, and then determine the priority of the M audio signals according to the sound field classification parameters of each audio signal in the M audio signals.
  • the sound field classification parameter can be an index of importance of the audio signal obtained according to the related parameters of the audio signal.
  • the related parameters can include one or more of motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters. These parameters can be obtained according to the signal characteristics of the audio signal itself, or according to the metadata of the audio signal.
  • the motion grading parameter is used to describe how fast the first audio signal moves in a unit time in the spatial sound field
  • the volume grading parameter is used to describe the volume of the first audio signal when it is played back in the spatial sound field
  • the propagation grading parameter is used to describe the size of the propagation range of the first audio signal when it is played back in the spatial sound field, the diffusion grading parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field, and the state grading parameter is used to describe the degree of sound source division of the first audio signal in the spatial sound field.
  • the ranking grading parameter is used to describe the priority of the first audio signal in the spatial sound field, and the signal grading parameter is used to describe the energy of the first audio signal in the encoding process.
  • the following takes the i-th audio signal as an example to describe the method for obtaining the above-mentioned parameters.
  • the i-th audio signal is any one of the above-mentioned M audio signals. It should be noted that the following parameters are exemplary descriptions, and other parameters or characteristics of the audio signal may also be used to calculate the sound field grading parameters, which are not specifically limited in this application.
  • the motion grading parameter can be calculated by the following formula: speedRatio_i = f(d_i) / Σ_{j=1}^{M} f(d_j)
  • speedRatio_i represents the motion grading parameter of the i-th audio signal
  • f(d_i) represents the mapping relationship between the motion state of the i-th audio signal in the spatial sound field and the metadata
  • d_i represents the moving distance of the i-th audio signal
  • θ_i represents the azimuth angle of the i-th audio signal relative to the rendering center point after moving
  • φ_i represents the pitch angle of the i-th audio signal relative to the rendering center point after moving
  • r_i represents the distance of the i-th audio signal from the rendering center point after moving
  • θ_0 represents the azimuth angle of the i-th audio signal relative to the rendering center point before moving
  • φ_0 represents the pitch angle of the i-th audio signal relative to the rendering center point before moving
  • r_0 represents the distance of the i-th audio signal from the rendering center point before moving.
  • the center of the sphere is the rendering center point
  • the radius of the sphere is the distance between the position of the i-th audio signal in the space field and the center of the sphere.
  • the angle between the position of the i-th audio signal in the spatial sound field and the horizontal plane is the pitch angle of the i-th audio signal
  • the angle between the projection of the position of the i-th audio signal onto the horizontal plane and the direction directly in front of the rendering center point is the azimuth angle of the i-th audio signal
  • Σ_{j=1}^{M} f(d_j) represents the sum of the mapping relationships between the motion states of the above M audio signals in the spatial sound field and the metadata.
  • the motion grading parameter can also be calculated by the following formula:
  • the motion grading parameters can also be calculated by other methods, which are not specifically limited in this application.
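  • As a minimal sketch of the motion grading computation described above (assuming, as choices of this sketch, that the mapping f is the identity and that d_i is the straight-line distance between the signal's positions before and after moving):

```python
import math

def to_cartesian(azimuth_deg, pitch_deg, r):
    """Spherical (azimuth, pitch, distance) -> Cartesian coordinates,
    with the rendering center point at the origin."""
    az, el = math.radians(azimuth_deg), math.radians(pitch_deg)
    return (r * math.cos(el) * math.cos(az),
            r * math.cos(el) * math.sin(az),
            r * math.sin(el))

def moved_distance(before, after):
    """d_i: straight-line distance moved within the frame; one plausible
    choice for the mapping f(d_i) is the distance itself."""
    return math.dist(to_cartesian(*before), to_cartesian(*after))

def speed_ratios(positions):
    """positions: list of (before, after) spherical tuples, one per
    audio signal. Returns speedRatio_i = f(d_i) / sum_j f(d_j) with
    f taken as the identity."""
    d = [moved_distance(b, a) for b, a in positions]
    total = sum(d)
    return [x / total if total > 0 else 1.0 / len(d) for x in d]
```

  • For example, a signal that swings 90 degrees around the rendering center point receives a larger speedRatio than one that moves 10 degrees, and a stationary signal receives 0.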
  • the volume grading parameter can be calculated by the following formula: loudRatio_i = f(A_i, gain_i, r_i) / Σ_{j=1}^{M} f(A_j, gain_j, r_j)
  • loudRatio i represents the volume grading parameter of the i-th audio signal
  • f(A i , gain i , r i ) represents the mapping relationship between the playback volume of the i-th audio signal in the spatial sound field and signal characteristics and metadata
  • A_i represents the sum or average value of the amplitudes of the sampling points of the i-th audio signal in the current frame; the amplitude of a sampling point can be obtained through the metadata of the i-th audio signal
  • gain_i represents the gain value of the i-th audio signal in the current frame, and can be obtained through the metadata of the i-th audio signal; r_i represents the distance of the i-th audio signal from the rendering center point in the current frame, and can be obtained through the metadata of the i-th audio signal; Σ_{j=1}^{M} f(A_j, gain_j, r_j) represents the sum of the mapping relationships between the playback volumes of the above M audio signals in the spatial sound field and the signal characteristics and metadata.
  • the volume grading parameter can also be calculated by the following formula: loudRatio_i = mean(A_i) / Σ_{j=1}^{M} mean(A_j)
  • mean(A_i) represents the sum or average value of the amplitudes of the sampling points of the i-th audio signal in the current frame, and the amplitude of a sampling point can be obtained through the metadata of the i-th audio signal; Σ_{j=1}^{M} mean(A_j) represents the sum of those values over the M audio signals in the current frame.
  • the volume grading parameter can also be calculated by the following formula: loudRatio_i = (1/r_i) / Σ_{j=1}^{M} (1/r_j)
  • r_i represents the distance between the i-th audio signal and the rendering center point, which can be obtained through the metadata of the i-th audio signal; Σ_{j=1}^{M} (1/r_j) represents the sum of the reciprocals of the distances between the above M audio signals and the rendering center point.
  • the volume grading parameter can also be calculated by the following formula: loudRatio_i = gain_i / Σ_{j=1}^{M} gain_j
  • gain_i represents the gain of the i-th audio signal in rendering; the gain can be user-defined for the i-th audio signal, or generated by the decoder through a set rule; Σ_{j=1}^{M} gain_j represents the sum of the gains of the above M audio signals in rendering.
  • volume grading parameters can also be calculated by other methods, which are not specifically limited in this application.
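  • The three metadata-driven variants above (amplitude-based, distance-based, and gain-based) can be sketched as follows; each normalizes a per-signal quantity by its sum over the M signals, so the ratios of each variant sum to 1:

```python
# Sketches of the three loudRatio variants described above; inputs are
# per-signal quantities read from the signal or its metadata.

def loud_ratio_amplitude(amplitudes):
    """loudRatio_i from mean sample amplitudes: mean(A_i) / sum_j mean(A_j)."""
    total = sum(amplitudes)
    return [a / total for a in amplitudes]

def loud_ratio_distance(distances):
    """loudRatio_i from rendering distance: (1/r_i) / sum_j (1/r_j);
    closer signals get a larger share."""
    inv = [1.0 / r for r in distances]
    total = sum(inv)
    return [x / total for x in inv]

def loud_ratio_gain(gains):
    """loudRatio_i from rendering gain: gain_i / sum_j gain_j."""
    total = sum(gains)
    return [g / total for g in gains]
```

  • For example, with two signals at distances 1 and 2 from the rendering center point, the distance-based variant assigns shares of 2/3 and 1/3.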
  • the propagation grading parameter describes the propagation degree of the i-th audio signal in the current frame, and can be obtained through the spread-related metadata of the i-th audio signal. It should be noted that the propagation classification parameters can also be calculated by other methods, which are not specifically limited in this application.
  • the diffusion grading parameter describes the diffusion degree of the i-th audio signal in the current frame, and can be obtained through the diffuseness-related metadata of the i-th audio signal. It should be noted that the diffusion classification parameters can also be calculated by other methods, which are not specifically limited in this application.
  • the state grading parameter describes the division degree of the i-th audio signal in the current frame, and can be obtained through the divergence-related metadata of the i-th audio signal. It should be noted that the state grading parameters can also be calculated by other methods, which are not specifically limited in this application.
  • the ranking parameter describes the priority of the i-th audio signal in the current frame, and can be obtained through the priority-related metadata of the i-th audio signal. It should be noted that the sorting and grading parameters can also be calculated by other methods, which are not specifically limited in this application.
  • the signal grading parameter describes the energy of the i-th audio signal in the encoding process of the current frame, and can be obtained from the original energy of the i-th audio signal, or from the signal energy of the i-th audio signal after preprocessing. It should be noted that the signal grading parameters can also be calculated by other methods, which are not specifically limited in this application.
  • the sound field grading parameter sceneRatio_i of the i-th audio signal may be a function of one or more of the above parameters, which can be expressed as:
  • sceneRatio_i = f(speedRatio_i, loudRatio_i, …)
  • the function can be linear or non-linear, which is not specifically limited in this application.
  • a weighted average may be performed on several of the aforementioned parameters of the i-th audio signal, for example several of the motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters, to obtain the sound field grading parameter of the i-th audio signal.
  • for example, with four selected parameters: sceneRatio_i = α_1·speedRatio_i + α_2·loudRatio_i + α_3·param3_i + α_4·param4_i, where param3 and param4 denote two further grading parameters selected from those above.
  • α_1 to α_4 are the weighting factors of the corresponding parameters; each weighting factor can take any value from 0 to 1, and the weighting factors sum to 1.
  • the larger the value of a weighting factor, the higher the importance and weight of the corresponding parameter in the calculation of the sound field grading parameter. If a weighting factor is 0, the corresponding parameter does not participate in the calculation of the sound field grading parameter, that is, the characteristic of the audio signal described by that parameter is not considered when calculating the sound field grading parameter; if a weighting factor is 1, only the corresponding parameter participates in the calculation, that is, the characteristic of the audio signal described by that parameter is the only basis for calculating the sound field grading parameter.
  • the value of the weighting factor may be obtained through preset settings, or may be obtained through adaptive calculation during the execution of the method of this application, which is not specifically limited in this application.
  • alternatively, one of the aforementioned parameters may be used directly as the sound field grading parameter of the i-th audio signal.
  • alternatively, several of the aforementioned parameters of the i-th audio signal, for example several of the motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters, are averaged to obtain the sound field grading parameter of the i-th audio signal.
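  • A sketch of the weighted-average combination, assuming the grading parameters have already been computed; the names spreadRatio and diffRatio in the example are illustrative labels for two further grading parameters:

```python
def scene_ratio(params, weights):
    """Weighted average of the selected grading parameters.

    params and weights are dicts keyed by parameter name; the weights
    must lie in [0, 1] and sum to 1 (a weight of 0 excludes the
    parameter, a weight of 1 uses that parameter alone)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * params[k] for k in weights)
```

  • For example, combining speedRatio 0.8, loudRatio 0.4, spreadRatio 0.2, and diffRatio 0.6 with weights 0.4, 0.3, 0.2, and 0.1 gives sceneRatio 0.54.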
  • the present application may adopt the following method to obtain the priority of the i-th audio signal.
  • the spatial sound field takes the rendering center point as the center of the sphere; audio signals closer to the center of the sphere have higher priority, and audio signals farther from the center of the sphere have lower priority.
  • the priority corresponding to the sound field classification parameter of the i-th audio signal may be determined as the priority of the first audio signal according to the set first corresponding relationship, and the first corresponding relationship includes multiple Correspondence between the sound field grading parameters and multiple priorities, where one or more sound field grading parameters correspond to one priority.
  • the priority level of the audio signal and the corresponding relationship between the sound field grading parameters and each priority level can be preset.
  • Table 2 shows an exemplary first correspondence between the sound field classification parameters and the priority.
  • Table 2 when the sound field classification parameter of the i-th audio signal is 0.4, the corresponding priority is 6, then the priority of the i-th audio signal is 6. When the sound field classification parameter of the i-th audio signal is 0.1, the corresponding priority is 9, then the priority of the i-th audio signal is 9 at this time. It should be noted that Table 2 is an example of the corresponding relationship between the sound field grading parameters and the priority, which does not constitute a limitation on the corresponding relationship involved in this application.
  • the sound field classification parameter of the i-th audio signal may be used as the priority of the i-th audio signal.
  • the priority may not be classified, and the sound field classification parameter of the i-th audio signal may be directly regarded as the priority.
  • the range of the sound field classification parameter of the i-th audio signal may be determined according to the set range threshold, and the priority corresponding to the range of the sound field classification parameter of the i-th audio signal may be determined as The priority of the i-th audio signal.
  • the priority level of the audio signal and the corresponding relationship between the interval of the sound field grading parameter and each priority level can be preset.
  • Table 3 shows another exemplary first correspondence between the sound field classification parameters and the priority.
  • Table 3 when the sound field classification parameter of the i-th audio signal is 0.6, the interval to which it belongs is [0.6, 0.7), and the corresponding priority is 4, then the priority of the i-th audio signal is 4 at this time.
  • the sound field classification parameter of the i-th audio signal is 0.15, the interval to which it belongs is [0.1, 0.2), and the corresponding priority is 9, then the priority of the i-th audio signal is 9 at this time.
  • Table 3 is an example of the corresponding relationship between the sound field grading parameters and the priority, which does not constitute a limitation on the corresponding relationship involved in this application.
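  • The interval lookup can be sketched as below. The boundaries and priority values are reconstructed so as to match the worked examples above (0.4 maps to 6 and 0.1 maps to 9 for Table 2; 0.6 in [0.6, 0.7) maps to 4 and 0.15 in [0.1, 0.2) maps to 9 for Table 3); the full tables in the original filing may differ.

```python
import bisect

# Inferred interval-to-priority table consistent with the worked
# examples in Tables 2 and 3. A numerically smaller priority value
# denotes a higher priority (priority 1 ranks first and, in step 404,
# receives the largest share of bits).
BOUNDS = [i / 10 for i in range(10)]           # [0.0, 0.1), [0.1, 0.2), ...
PRIORITIES = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # larger sceneRatio -> higher priority

def priority_of(scene_ratio):
    """Return the priority of the interval containing scene_ratio."""
    idx = bisect.bisect_right(BOUNDS, scene_ratio) - 1
    return PRIORITIES[max(0, min(idx, len(PRIORITIES) - 1))]
```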
  • Step 404 Perform bit allocation on the M audio signals according to the priority of the M audio signals.
  • This application can perform bit allocation according to the number of currently available bits and the priority of M audio signals. The higher the priority, the more the number of bits allocated for the audio signal.
  • the current number of available bits refers to the total number of bits that the codec in the current frame can use for bit allocation to M audio signals in the first audio signal set before bit allocation.
  • the proportion of the number of bits of the first audio signal can be determined according to the priority of the first audio signal.
  • the first audio signal is any one of the M audio signals.
  • the product of the number of currently available bits and the proportion of the number of bits of the first audio signal is calculated to obtain the number of bits of the first audio signal.
  • One priority can correspond to one proportion of the number of bits, or multiple priorities can correspond to one proportion of the number of bits. Based on the proportion of the number of bits and the number of bits currently available, the number of bits that can be allocated for the corresponding audio signal can be calculated.
  • the priority of the first audio signal is 1, the priority of the second audio signal is 2, and the priority of the third audio signal is 3, assuming that the proportion corresponding to priority 1 is 50%, priority 2 corresponds to 30%, priority 3 corresponds to 20%, and the current number of available bits is 100, then the number of bits allocated for the first audio signal is 50, and the second audio signal The number of bits allocated is 30, and the number of bits allocated for the third audio signal is 20. It should be noted that in different audio frames, the number of bits corresponding to the priority can be adjusted adaptively, which is not specifically limited.
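  • The proportion-based allocation of the example above can be sketched as:

```python
def allocate_by_proportion(priorities, proportions, bits_available):
    """priorities: per-signal priority values; proportions: dict mapping
    a priority to its share of the available bits. Mirrors the worked
    example: priorities 1/2/3 with shares 50%/30%/20% of 100 bits."""
    return [round(bits_available * proportions[p]) for p in priorities]
```

  • In different audio frames the proportions attached to each priority can be adapted, as noted above; only the mapping passed in changes.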
  • the number of bits corresponding to the priority of the first audio signal may be determined as the number of bits of the first audio signal according to a set second correspondence, and the second correspondence includes correspondences between multiple priorities and multiple numbers of bits, where one or more priorities correspond to one number of bits. There is a pre-established correspondence between the priority of the audio signal and the number of bits: one priority can correspond to one number of bits, or multiple priorities can correspond to one number of bits. Based on this correspondence, once the priority of the audio signal is acquired, the corresponding number of bits can be acquired. For example, if M is 3, the priority of the first audio signal is 1, the priority of the second audio signal is 2, and the priority of the third audio signal is 3; assume that the number of bits corresponding to priority 1 is set to 50, the number of bits corresponding to priority 2 is 30, and the number of bits corresponding to priority 3 is 20.
  • when the sound field grading parameters of the audio signals contain the signal grading parameters, the bit allocation between the audio signals can be determined according to the absolute energy ratio between the audio signals in the encoding and decoding process; when the sound field grading parameters of the audio signals do not contain the signal grading parameters and the sound field grading parameters are large, the sound field grading differences between the audio signals are considered large, and the bit allocation between the audio signals can then be determined according to the sound field grading parameters of the audio signals; in other cases, the bit allocation of the audio signals can be determined according to the bit allocation factors of the audio signals.
  • sceneRatio i represents the sound field classification parameter of the i-th audio signal
  • bits_available represents the number of currently available bits
  • bits_object i represents the number of bits allocated for the i-th audio signal.
  • bits_object_i = nrgRatio_i × bits_available, where β represents the upper limit of the sound field grading parameter, and nrgRatio_i represents the absolute energy ratio between the i-th audio signal and the other audio signals.
  • bits_object_i = sceneRatio_i × bits_available
  • where γ represents the lower limit of the sound field grading parameter.
  • bits_object_i = objRatio_i × bits_available, where objRatio_i represents the bit allocation factor of the i-th audio signal.
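  • One plausible reading of the three allocation cases above, with γ as the lower limit of the sound field grading parameter; the exact branch conditions are only partially legible in the source, so this dispatch is an assumption of the sketch:

```python
def allocate_bits(bits_available, *, has_signal_grading, scene_ratio,
                  nrg_ratio, obj_ratio, gamma):
    """Hypothetical dispatch over the three cases described in the text:

    - the sound field grading parameters include the signal grading
      parameter: scale bits_available by the absolute energy ratio
      nrgRatio_i between the signals;
    - otherwise, if sceneRatio_i is large (here: at least the lower
      limit gamma): scale bits_available by sceneRatio_i itself;
    - otherwise: scale bits_available by the bit allocation factor
      objRatio_i."""
    if has_signal_grading:
        return round(nrg_ratio * bits_available)
    if scene_ratio >= gamma:
        return round(scene_ratio * bits_available)
    return round(obj_ratio * bits_available)
```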
  • the present application determines the priorities of the multiple audio signals according to the characteristics of the multiple audio signals included in the current frame and the related information of the audio signals in the metadata, and determines the number of bits to be allocated to each audio signal according to the priority. This not only adapts to the characteristics of the audio signals, but also matches different numbers of coding bits to different audio signals, which improves the coding and decoding efficiency of the audio signals.
  • in step 402, this application determines, from the T audio signals of the current frame, M audio signals to be added to the first audio signal set; the methods of steps 403 and 404 are then used for the M audio signals, first determining the priority of each audio signal, and then determining the number of bits allocated to each audio signal according to its priority.
  • for the N audio signals in the second audio signal set, a simpler method can be used to determine the number of bits allocated. For example, the total number of bits available for the second audio signal set is divided by N to obtain the number of bits for each audio signal; that is, the total number of bits available in the second audio signal set is equally distributed to the N audio signals in the set.
  • the second audio signal set may also adopt other methods to obtain the number of bits of each audio signal in the set, which is not specifically limited in this application.
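  • The equal split over the N signals of the second audio signal set can be sketched as follows; handing out any remainder bits one per signal from the front, so the whole budget is used, is a choice of this sketch:

```python
def equal_allocation(total_bits, n):
    """Split the second set's bit budget equally over its N signals;
    the first (total_bits mod n) signals receive one extra bit so
    every available bit is assigned."""
    base, rem = divmod(total_bits, n)
    return [base + (1 if i < rem else 0) for i in range(n)]
```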
  • this application also provides a priority fusion method based on multiple priority determination methods; that is, for the same audio signal, multiple methods can be used to obtain its priority, and the question is then how to determine the final priority of the audio signal.
  • the following description takes the first audio signal as an example, and the first audio signal is any one of the foregoing M audio signals.
  • the first parameter set and the second parameter set of the first audio signal are acquired according to the first audio signal and/or the metadata corresponding to the first audio signal.
  • the first parameter set includes one or more of the above-mentioned related parameters of the first audio signal: motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters.
  • the second parameter set likewise includes one or more of these related parameters of the first audio signal.
  • the first parameter set and the second parameter set may include the same parameter or may include different parameters.
  • the method of determining the sound field classification parameters of the M audio signals in the first audio signal set in step 403 may be referred to, and other methods may also be used.
  • the method used here is different from the method of calculating the first sound field grading parameters.
  • in this application, the final sound field grading parameter of an audio signal can be determined from the sound field grading parameters obtained by the two methods, for example by a weighted average, a direct average, or by taking the maximum or minimum value; this is not specifically limited. In this way, diversified acquisition of the sound field grading parameters of the audio signal can be realized, and the calculation schemes under various strategies can be accommodated.
  • the first priority of the first audio signal may be obtained according to the first sound field classification parameter.
  • the priority can be obtained by using the method of step 403 above, or by other methods.
  • the method used here is different from the method of calculating the first priority.
  • the priority of the first audio signal is acquired according to the first priority and the second priority.
  • the final priority of the same audio signal can be obtained from the priorities calculated by the two methods, for example by using a weighted average, a direct average, or the maximum or minimum value; this is not specifically limited. In this way, diversified acquisition of the priority of the audio signal can be realized, and the calculation schemes under various strategies can be accommodated.
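  • The fusion options above (weighted average, direct average, maximum, minimum) can be sketched as; the method names and the default weight are conveniences of this sketch:

```python
def fuse_priorities(p1, p2, method="weighted", w=0.5):
    """Combine the two priorities obtained by different methods.

    method: 'weighted' (weight w on p1), 'average', 'max', or 'min';
    the text leaves the choice of fusion rule open."""
    if method == "weighted":
        return w * p1 + (1 - w) * p2
    if method == "average":
        return (p1 + p2) / 2
    if method == "max":
        return max(p1, p2)
    if method == "min":
        return min(p1, p2)
    raise ValueError(method)
```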
  • the present application can generate a code stream according to the numbers of bits of the T audio signals.
  • the code stream includes T first identifiers, T second identifiers, and T third identifiers.
  • T audio signals correspond to T first identifiers, T second identifiers, and T third identifiers respectively.
  • the first identifier is used to indicate the audio signal set to which the corresponding audio signal belongs.
  • the second identifier is used to indicate the priority of the corresponding audio signal
  • the third identifier is used to indicate the number of bits of the corresponding audio signal; the code stream is sent to the decoding device.
  • after receiving the code stream, the decoding device executes the above-mentioned audio signal bit allocation method according to the T first identifiers, T second identifiers, and T third identifiers carried in the code stream, and determines the numbers of bits of the T audio signals.
  • the decoding device can also directly determine, based on the T first identifiers, T second identifiers, and T third identifiers carried in the code stream, the audio signal set to which each of the T audio signals belongs, its priority, and its allocated number of bits, and then decode the code stream to obtain the T audio signals.
  • the above-mentioned first identifier, second identifier, and third identifier are identification information added on the basis of the method embodiment shown in FIG. 4, so that the audio signal encoding end and decoding end can encode or decode the audio signal based on the same method.
  • FIG. 7 is a schematic structural diagram of an embodiment of an apparatus of this application. As shown in FIG. 7, the apparatus can be applied to the encoding device or the decoding device in the foregoing embodiment.
  • the apparatus of this embodiment may include: a processing module 701 and a transceiver module 702.
  • the processing module 701 is configured to obtain T audio signals of the current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine the priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to the priorities of the M audio signals.
  • the processing module 701 is specifically configured to obtain the sound field classification parameters of each audio signal in the M audio signals; according to the sound field classification of each audio signal in the M audio signals The parameter determines the priority of the M audio signals.
  • the processing module 701 is specifically configured to obtain one or more of the motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters of the first audio signal, the first audio signal being any one of the M audio signals; and to obtain the sound field grading parameter of the first audio signal according to one or more of the acquired motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters; wherein the motion grading parameter is used to describe how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter is used to describe the volume of the first audio signal in the spatial sound field, the propagation grading parameter is used to describe the size of the propagation range of the first audio signal in the spatial sound field, the diffusion grading parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field, the state grading parameter is used to describe the degree of sound source division of the first audio signal in the spatial sound field, the ranking grading parameter is used to describe the priority of the first audio signal in the spatial sound field, and the signal grading parameter is used to describe the energy of the first audio signal in the encoding process.
  • the processing module 701 is specifically configured to obtain S groups of metadata in the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata is used to describe the state of the corresponding audio signal in the spatial sound field.
  • the processing module 701 is specifically configured to obtain, according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, one or more of the motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters of the first audio signal, the first audio signal being any one of the M audio signals; and to obtain the sound field grading parameter of the first audio signal according to one or more of the acquired motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters; wherein the motion grading parameter is used to describe how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter is used to describe the volume of the first audio signal in the spatial sound field, the propagation grading parameter is used to describe the size of the propagation range of the first audio signal in the spatial sound field, and the diffusion grading parameter is used to describe the size of the diffusion range of the first audio signal in the spatial sound field
  • the processing module 701 is specifically configured to perform a weighted average on multiple of the acquired motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters to obtain the sound field grading parameter; or, to average multiple of the acquired motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters to obtain the sound field grading parameter; or, to use one of the acquired motion grading parameters, volume grading parameters, propagation grading parameters, diffusion grading parameters, state grading parameters, ranking grading parameters, and signal grading parameters as the sound field grading parameter.
  • the processing module 701 is specifically configured to determine, according to a set first correspondence, the priority corresponding to the sound field grading parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field grading parameters and multiple priorities, one or more of the sound field grading parameters correspond to one priority, and the first audio signal is any one of the M audio signals; or to use the sound field grading parameter of the first audio signal as the priority of the first audio signal; or to determine, according to set range thresholds, the range to which the sound field grading parameter of the first audio signal belongs, and determine the priority corresponding to that range as the priority of the first audio signal.
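The range-threshold alternative above can be sketched as follows; the threshold values are illustrative assumptions, and a larger grading parameter maps to a higher priority level.

```python
import bisect

def priority_from_grading(grading, thresholds=(0.25, 0.5, 0.75)):
    """Map a sound field grading parameter to a priority level by finding
    the range the parameter falls into, given set range thresholds.
    Returns a level in 0..len(thresholds)."""
    return bisect.bisect_right(thresholds, grading)
```

The first correspondence (a direct table from grading parameters to priorities) could equally be a dictionary lookup; the threshold form is just the variant where each priority covers a contiguous range of grading values.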
  • the processing module 701 is specifically configured to perform bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
  • the processing module 701 is specifically configured to determine a bit-number proportion of the first audio signal according to the priority of the first audio signal, where the first audio signal is any one of the M audio signals, and to obtain the number of bits of the first audio signal as the product of the number of currently available bits and the bit-number proportion of the first audio signal.
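The proportional allocation above can be sketched as follows. Mapping a priority directly to its share of the sum of priorities is one plausible choice of bit-number proportion, not a mapping fixed by this application.

```python
def allocate_bits(available_bits, priorities):
    """Allocate the currently available bits across signals in proportion
    to their priorities: bits_i = available_bits * (p_i / sum(p)).
    Integer division guarantees the total never exceeds available_bits."""
    total = sum(priorities)
    return [available_bits * p // total for p in priorities]
```

For example, with 1000 available bits and priorities [3, 1], the two signals receive 750 and 250 bits respectively.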
  • the processing module 701 is specifically configured to determine the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit numbers, one or more of the priorities correspond to one bit number, and the first audio signal is any one of the M audio signals.
  • the processing module 701 is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
  • the processing module 701 is specifically configured to add the audio signals among the T audio signals that correspond to the S groups of metadata to the first audio signal set; or to add the audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
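The metadata- and importance-based selection above can be sketched as follows. The dictionary representation of metadata and the threshold value are illustrative assumptions; a signal joins the first audio signal set when it has metadata whose importance meets the set participation threshold.

```python
def select_first_set(signals, metadata, participation_threshold=4):
    """Build the first audio signal set from the T signals.

    signals:  identifiers of the T audio signals.
    metadata: dict mapping a signal identifier to its metadata group
              (itself a dict; the importance parameter uses the
              'priority' field, range 0..7 in the metadata table).
    """
    selected = []
    for sig in signals:
        meta = metadata.get(sig)
        if meta is not None and meta.get("priority", 0) >= participation_threshold:
            selected.append(sig)
    return selected
```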
  • the processing module 701 is specifically configured to acquire one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, and the diffusion grading parameter of the first audio signal, where the first audio signal is any one of the M audio signals; acquire a first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquire one or more of the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal; acquire a second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquire the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes the volume of the first audio signal during playback in the spatial sound field, the propagation grading parameter describes the size of the propagation range of the first audio signal during playback in the spatial sound field, the diffusion grading parameter describes the size of the diffusion range of the first audio signal in the spatial sound field, the state grading parameter describes the degree of sound source division of the first audio signal in the spatial sound field, and the ranking grading parameter describes the priority ranking of the first audio signal in the spatial sound field
  • the processing module 701 is specifically configured to acquire one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, and the diffusion grading parameter of the first audio signal according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, where the first audio signal is any one of the M audio signals; acquire the first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquire one or more of the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal; acquire the second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquire the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the grading parameters are as defined above.
  • the processing module 701 is specifically configured to acquire a first priority of the first audio signal according to the first sound field grading parameter; acquire a second priority of the first audio signal according to the second sound field grading parameter; and acquire the priority of the first audio signal according to the first priority and the second priority.
  • the processing module 701 is further configured to encode the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
  • the encoded bitstream includes the number of bits of the M audio signals.
  • a transceiver module 702, configured to receive an encoded bitstream; and the processing module 701 is further configured to obtain the respective numbers of bits of the M audio signals, and reconstruct the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
  • the device in this embodiment can be used to implement the technical solution of the method embodiment shown in FIG. 4, and its implementation principles and technical effects are similar, and will not be repeated here.
  • Fig. 8 is a schematic structural diagram of an embodiment of a device of this application.
  • the device may be an encoding device or a decoding device in the foregoing embodiment.
  • the device of this embodiment may include a processor 801 and a memory 802, where the memory 802 is configured to store one or more programs; when the one or more programs are executed by the processor 801, the processor 801 is enabled to implement the technical solution of the method embodiment shown in FIG. 4.
  • the steps of the foregoing method embodiments may be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the present application can be directly embodied as being executed and completed by a hardware encoding processor, or executed and completed by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • by way of example rather than limitation, many forms of random access memory (RAM) are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.


Abstract

A bit allocation method and apparatus for audio signals. The bit allocation method (400) for audio signals includes: acquiring T audio signals in a current frame, where T is a positive integer (401); determining a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M (402); determining priorities of the M audio signals in the first audio signal set (403); and performing bit allocation on the M audio signals according to the priorities of the M audio signals (404). The method can adapt to the characteristics of audio signals while matching different numbers of coding bits to different audio signals, improving the coding and decoding efficiency of audio signals.

Description

Bit allocation method and apparatus for audio signals
This application claims priority to Chinese Patent Application No. 202010368424.9, filed with the China National Intellectual Property Administration on April 30, 2020 and entitled "Bit allocation method and apparatus for audio signals", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to audio processing technologies, and in particular, to a bit allocation method and apparatus for audio signals.
Background
Sound is one of the main ways in which humans obtain information. With the rapid development of high-performance computers and signal processing technologies, immersive audio technologies have attracted increasing attention. Immersive three-dimensional audio (3D audio) technology extends audio into a high-dimensional spatial representation to provide users with a better three-dimensional sound experience. Instead of simply using multiple channels for representation at the playback end, 3D audio technology reconstructs the audio signals in three-dimensional space and presents the audio in three dimensions through rendering techniques.
In domestic and international 3D audio coding standards, the numbers of bits allocated to individual audio signals for coding and decoding neither reflect the differences in the spatial characteristics of the audio signals at the playback end nor adapt to the characteristics of the audio signals, which reduces the coding and decoding efficiency of the audio signals.
Summary
This application provides a bit allocation method and apparatus for audio signals, to adapt to the characteristics of audio signals while matching different numbers of coding bits to different audio signals, thereby improving the coding and decoding efficiency of audio signals.
According to a first aspect, this application provides a bit allocation method for audio signals, including: acquiring T audio signals in a current frame, where T is a positive integer; determining a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determining priorities of the M audio signals in the first audio signal set; and performing bit allocation on the M audio signals according to the priorities of the M audio signals.
In this application, the priorities of the multiple audio signals included in the current frame are determined according to the characteristics of those audio signals and the related information about the audio signals in the metadata, and the number of bits to be allocated to each audio signal is determined according to the priorities. This both adapts to the characteristics of the audio signals and matches different numbers of coding bits to different audio signals, improving the coding and decoding efficiency of the audio signals.
In a possible implementation, the determining priorities of the M audio signals in the first audio signal set includes: acquiring a sound field grading parameter of each of the M audio signals; and determining the priorities of the M audio signals according to the sound field grading parameter of each of the M audio signals.
In a possible implementation, the acquiring a sound field grading parameter of each of the M audio signals includes: acquiring one or more of a motion grading parameter, a volume grading parameter, a propagation grading parameter, a diffusion grading parameter, a state grading parameter, a ranking grading parameter, and a signal grading parameter of a first audio signal, where the first audio signal is any one of the M audio signals; and acquiring the sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, diffusion, state, ranking, and signal grading parameters; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes the volume of the first audio signal in the spatial sound field, the propagation grading parameter describes the size of the propagation range of the first audio signal in the spatial sound field, the diffusion grading parameter describes the size of the diffusion range of the first audio signal in the spatial sound field, the state grading parameter describes the degree of sound source division of the first audio signal in the spatial sound field, the ranking grading parameter describes the priority ranking of the first audio signal in the spatial sound field, and the signal grading parameter describes the magnitude of the energy of the first audio signal during encoding.
By referring to multiple parameters of the audio signal, a priority that incorporates information from multiple dimensions can be obtained.
In a possible implementation, when the T audio signals in the current frame are acquired, the method further includes: acquiring S groups of metadata in the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata describes the state of the corresponding audio signal in the spatial sound field.
As description information of the state of the corresponding audio signal in the spatial sound field, the metadata provides a reliable and effective basis for subsequently acquiring the sound field grading parameter of the audio signal.
In a possible implementation, the acquiring a sound field grading parameter of each of the M audio signals includes: acquiring one or more of a motion grading parameter, a volume grading parameter, a propagation grading parameter, a diffusion grading parameter, a state grading parameter, a ranking grading parameter, and a signal grading parameter of a first audio signal according to metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, where the first audio signal is any one of the M audio signals; and acquiring the sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, diffusion, state, ranking, and signal grading parameters; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes its volume in the spatial sound field, the propagation grading parameter describes the size of its propagation range in the spatial sound field, the diffusion grading parameter describes the size of its diffusion range in the spatial sound field, the state grading parameter describes the degree of its sound source division in the spatial sound field, the ranking grading parameter describes its priority ranking in the spatial sound field, and the signal grading parameter describes the magnitude of its energy during encoding.
By referring to multiple parameters of the audio signal as well as the metadata of the audio signal, a reliable priority that incorporates information from multiple dimensions can be obtained.
In a possible implementation, the acquiring the sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, diffusion, state, ranking, and signal grading parameters includes: obtaining the sound field grading parameter by taking a weighted average of multiple of the acquired grading parameters; or obtaining the sound field grading parameter by averaging multiple of the acquired grading parameters; or using one of the acquired grading parameters as the sound field grading parameter.
In a possible implementation, the determining the priorities of the M audio signals according to the sound field grading parameter of each of the M audio signals includes: determining, according to a set first correspondence, the priority corresponding to the sound field grading parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field grading parameters and multiple priorities, one or more of the sound field grading parameters correspond to one of the priorities, and the first audio signal is any one of the M audio signals; or using the sound field grading parameter of the first audio signal as the priority of the first audio signal; or determining, according to multiple set range thresholds, the range to which the sound field grading parameter of the first audio signal belongs, and determining the priority corresponding to that range as the priority of the first audio signal.
In a possible implementation, the performing bit allocation on the M audio signals according to the priorities of the M audio signals includes: performing bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
In a possible implementation, the performing bit allocation according to the number of currently available bits and the priorities of the M audio signals includes: determining a bit-number proportion of the first audio signal according to the priority of the first audio signal, where the first audio signal is any one of the M audio signals; and obtaining the number of bits of the first audio signal as the product of the number of currently available bits and the bit-number proportion of the first audio signal.
In a possible implementation, the performing bit allocation according to the number of currently available bits and the priorities of the M audio signals includes: determining the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit numbers, one or more of the priorities correspond to one of the bit numbers, and the first audio signal is any one of the M audio signals.
In a possible implementation, the determining a first audio signal set according to the T audio signals includes: adding pre-designated audio signals among the T audio signals to the first audio signal set.
In a possible implementation, the determining a first audio signal set according to the T audio signals includes: adding the audio signals among the T audio signals that correspond to the S groups of metadata to the first audio signal set; or adding the audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
In a possible implementation, the acquiring a sound field grading parameter of each of the M audio signals includes: acquiring one or more of a motion grading parameter, a volume grading parameter, a propagation grading parameter, and a diffusion grading parameter of a first audio signal, where the first audio signal is any one of the M audio signals; acquiring a first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquiring one or more of a state grading parameter, a ranking grading parameter, and a signal grading parameter of the first audio signal; acquiring a second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquiring the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes its volume during playback in the spatial sound field, the propagation grading parameter describes the size of its propagation range during playback in the spatial sound field, the diffusion grading parameter describes the size of its diffusion range in the spatial sound field, the state grading parameter describes the degree of its sound source division in the spatial sound field, the ranking grading parameter describes its priority ranking in the spatial sound field, and the signal grading parameter describes the magnitude of its energy during encoding.
In a possible implementation, the acquiring a sound field grading parameter of each of the M audio signals includes: acquiring one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, and the diffusion grading parameter of the first audio signal according to metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, where the first audio signal is any one of the M audio signals; acquiring the first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquiring one or more of the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal; acquiring the second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquiring the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the grading parameters are as defined above.
For the different characteristics of an audio signal, this application uses multiple methods to acquire multiple sound field grading parameters related to the audio signal, and then determines the priority of the audio signal according to those multiple sound field grading parameters. A priority obtained in this way both refers to multiple characteristics of the audio signal and is compatible with the implementation schemes corresponding to the different characteristics.
In a possible implementation, the determining the priorities of the M audio signals according to the sound field grading parameter of each of the M audio signals includes: acquiring a first priority of the first audio signal according to the first sound field grading parameter; acquiring a second priority of the first audio signal according to the second sound field grading parameter; and acquiring the priority of the first audio signal according to the first priority and the second priority.
For the different characteristics of an audio signal, this application uses multiple methods to acquire multiple priorities related to the audio signal, and then merges those priorities compatibly to obtain the final priority of the audio signal. A priority obtained in this way both refers to multiple characteristics of the audio signal and is compatible with the implementation schemes corresponding to the different characteristics.
According to a second aspect, this application provides an audio signal encoding method. After the bit allocation method for audio signals according to any one of the implementations of the first aspect is performed, the method further includes: encoding the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
In a possible implementation, the encoded bitstream includes the numbers of bits of the M audio signals.
According to a third aspect, this application provides an audio signal decoding method. After the bit allocation method for audio signals according to any one of the implementations of the first aspect is performed, the method further includes: receiving an encoded bitstream; performing the bit allocation method for audio signals according to any one of the implementations of the first aspect to obtain the respective numbers of bits of the M audio signals; and reconstructing the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
According to a fourth aspect, this application provides a bit allocation apparatus for audio signals, including a processing module configured to: acquire T audio signals in a current frame, where T is a positive integer; determine a first audio signal set according to the T audio signals, where the first audio signal set includes M audio signals, M is a positive integer, the T audio signals include the M audio signals, and T ≥ M; determine priorities of the M audio signals in the first audio signal set; and perform bit allocation on the M audio signals according to the priorities of the M audio signals.
In a possible implementation, the processing module is specifically configured to: acquire a sound field grading parameter of each of the M audio signals; and determine the priorities of the M audio signals according to the sound field grading parameter of each of the M audio signals.
In a possible implementation, the processing module is specifically configured to: acquire one or more of a motion grading parameter, a volume grading parameter, a propagation grading parameter, a diffusion grading parameter, a state grading parameter, a ranking grading parameter, and a signal grading parameter of a first audio signal, where the first audio signal is any one of the M audio signals; and acquire the sound field grading parameter of the first audio signal according to the acquired one or more of these grading parameters; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes its volume in the spatial sound field, the propagation grading parameter describes the size of its propagation range in the spatial sound field, the diffusion grading parameter describes the size of its diffusion range in the spatial sound field, the state grading parameter describes the degree of its sound source division in the spatial sound field, the ranking grading parameter describes its priority ranking in the spatial sound field, and the signal grading parameter describes the magnitude of its energy during encoding.
In a possible implementation, the processing module is specifically configured to acquire S groups of metadata in the current frame, where S is a positive integer, T ≥ S, the S groups of metadata correspond to the T audio signals, and the metadata describes the state of the corresponding audio signal in the spatial sound field.
In a possible implementation, the processing module is specifically configured to: acquire one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, the diffusion grading parameter, the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal according to metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, where the first audio signal is any one of the M audio signals; and acquire the sound field grading parameter of the first audio signal according to the acquired one or more of these grading parameters; where the grading parameters are as defined above.
In a possible implementation, the processing module is specifically configured to: obtain the sound field grading parameter by taking a weighted average of multiple of the acquired motion, volume, propagation, diffusion, state, ranking, and signal grading parameters; or obtain the sound field grading parameter by averaging multiple of the acquired grading parameters; or use one of the acquired grading parameters as the sound field grading parameter.
In a possible implementation, the processing module is specifically configured to: determine, according to a set first correspondence, the priority corresponding to the sound field grading parameter of the first audio signal as the priority of the first audio signal, where the first correspondence includes correspondences between multiple sound field grading parameters and multiple priorities, one or more of the sound field grading parameters correspond to one of the priorities, and the first audio signal is any one of the M audio signals; or use the sound field grading parameter of the first audio signal as the priority of the first audio signal; or determine, according to multiple set range thresholds, the range to which the sound field grading parameter of the first audio signal belongs, and determine the priority corresponding to that range as the priority of the first audio signal.
In a possible implementation, the processing module is specifically configured to perform bit allocation according to the number of currently available bits and the priorities of the M audio signals, where an audio signal with a higher priority is allocated more bits.
In a possible implementation, the processing module is specifically configured to: determine a bit-number proportion of the first audio signal according to the priority of the first audio signal, where the first audio signal is any one of the M audio signals; and obtain the number of bits of the first audio signal as the product of the number of currently available bits and the bit-number proportion of the first audio signal.
In a possible implementation, the processing module is specifically configured to determine the number of bits of the first audio signal from a set second correspondence according to the priority of the first audio signal, where the second correspondence includes correspondences between multiple priorities and multiple bit numbers, one or more of the priorities correspond to one of the bit numbers, and the first audio signal is any one of the M audio signals.
In a possible implementation, the processing module is specifically configured to add pre-designated audio signals among the T audio signals to the first audio signal set.
In a possible implementation, the processing module is specifically configured to: add the audio signals among the T audio signals that correspond to the S groups of metadata to the first audio signal set; or add the audio signals whose importance parameter is greater than or equal to a set participation threshold to the first audio signal set, where the metadata includes the importance parameter and the T audio signals include the audio signals corresponding to the importance parameter.
In a possible implementation, the processing module is specifically configured to: acquire one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, and the diffusion grading parameter of the first audio signal, where the first audio signal is any one of the M audio signals; acquire a first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquire one or more of the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal; acquire a second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquire the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the motion grading parameter describes how fast the first audio signal moves per unit time in the spatial sound field, the volume grading parameter describes its volume during playback in the spatial sound field, the propagation grading parameter describes the size of its propagation range during playback in the spatial sound field, the diffusion grading parameter describes the size of its diffusion range in the spatial sound field, the state grading parameter describes the degree of its sound source division in the spatial sound field, the ranking grading parameter describes its priority ranking in the spatial sound field, and the signal grading parameter describes the magnitude of its energy during encoding.
In a possible implementation, the processing module is specifically configured to: acquire one or more of the motion grading parameter, the volume grading parameter, the propagation grading parameter, and the diffusion grading parameter of the first audio signal according to metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal, where the first audio signal is any one of the M audio signals; acquire the first sound field grading parameter of the first audio signal according to the acquired one or more of the motion, volume, propagation, and diffusion grading parameters; acquire one or more of the state grading parameter, the ranking grading parameter, and the signal grading parameter of the first audio signal according to the metadata corresponding to the first audio signal, or according to the first audio signal and the metadata corresponding to the first audio signal; acquire the second sound field grading parameter of the first audio signal according to the acquired one or more of the state, ranking, and signal grading parameters; and acquire the sound field grading parameter of the first audio signal according to the first sound field grading parameter and the second sound field grading parameter; where the grading parameters are as defined above.
In a possible implementation, the processing module is specifically configured to: acquire a first priority of the first audio signal according to the first sound field grading parameter; acquire a second priority of the first audio signal according to the second sound field grading parameter; and acquire the priority of the first audio signal according to the first priority and the second priority.
In a possible implementation, the processing module is further configured to encode the M audio signals according to the numbers of bits allocated to the M audio signals to obtain an encoded bitstream.
In a possible implementation, the encoded bitstream includes the numbers of bits of the M audio signals.
In a possible implementation, the apparatus further includes a transceiver module configured to receive an encoded bitstream; and the processing module is further configured to obtain the respective numbers of bits of the M audio signals, and reconstruct the M audio signals according to the respective numbers of bits of the M audio signals and the encoded bitstream.
According to a fifth aspect, this application provides a device, including: one or more processors; and a memory configured to store one or more programs, where when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method according to any one of the first to third aspects.
According to a sixth aspect, this application provides a computer-readable storage medium including a computer program, where when the computer program is executed on a computer, the computer is enabled to perform the method according to any one of the first to third aspects.
According to a seventh aspect, this application provides a computer-readable storage medium including an encoded bitstream obtained according to the method of the second aspect.
According to an eighth aspect, this application provides an encoding apparatus, including a processor and a communications interface, where the processor reads a stored computer program through the communications interface, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of the first to third aspects.
According to a ninth aspect, this application provides an encoding apparatus, including a processor and a memory, where the processor is configured to perform the method according to the second aspect, and the memory is configured to store the encoded bitstream.
Brief Description of Drawings
FIG. 1A is a schematic block diagram of an audio encoding and decoding system 10 to which this application is applied;
FIG. 1B is an illustrative diagram of an example of an audio coding system 40 according to an exemplary embodiment;
FIG. 2 is a schematic structural diagram of an audio coding device 200 provided in this application;
FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment;
FIG. 4 is a schematic flowchart for implementing a bit allocation method for audio signals of this application;
FIG. 5 is an exemplary schematic diagram of the position of an audio signal in a spatial sound field;
FIG. 6 is an exemplary schematic diagram of the priority of an audio signal in a spatial sound field;
FIG. 7 is a schematic structural diagram of an apparatus embodiment of this application;
FIG. 8 is a schematic structural diagram of a device embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following clearly and completely describes the technical solutions in this application with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", and so on are merely intended to distinguish between descriptions and shall not be understood as indicating or implying relative importance or order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate the following three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or a similar expression means any combination of these items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be singular or plural.
Explanations of related terms involved in this application:
Audio frame: audio data is streamed. In practical applications, to facilitate audio processing and transmission, the amount of audio data within a certain duration is usually taken as one frame of audio. This duration is called the "sampling time", and its value can be determined according to the requirements of the codec and the specific application; for example, the duration is 2.5 ms to 60 ms, where ms stands for millisecond.
Audio signal: an audio signal is an information carrier of the frequency and amplitude variations of regular sound waves carrying speech, music, and sound effects. Audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave. The digital signal obtained from audio through analog-to-digital conversion, or generated by a computer, is the audio signal. A sound wave has three important parameters: frequency, amplitude, and phase, which determine the characteristics of the audio signal.
Metadata: metadata, also known as intermediary data or relay data, is data about data. It mainly describes data properties and supports functions such as indicating storage locations, historical data, resource lookup, and file records. Metadata is information about the organization of data, data fields, and their relationships; in short, metadata is data about data. In this application, metadata is used to describe the state of the corresponding audio signal in the spatial sound field. Three-dimensional audio: see the description of 3D audio technology in the Background.
The following describes the system architecture to which this application is applied.
FIG. 1A is a schematic block diagram of an audio encoding and decoding system 10 to which this application is applied. As shown in FIG. 1A, the audio encoding and decoding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded audio data and may therefore be referred to as an audio encoding apparatus. The destination device 14 can decode the encoded audio data generated by the source device 12 and may therefore be referred to as an audio decoding apparatus. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include but is not limited to a random access memory (RAM), a read-only memory (ROM), a flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer. The source device 12 and the destination device 14 may include various apparatuses, including desktop computers, mobile computing apparatuses, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display apparatuses, digital media players, audio game consoles, in-vehicle computers, wireless communication devices, or the like.
Although FIG. 1A depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
The source device 12 and the destination device 14 may be communicatively connected through a link 13, and the destination device 14 may receive the encoded audio data from the source device 12 via the link 13. The link 13 may include one or more media or apparatuses capable of moving the encoded audio data from the source device 12 to the destination device 14. In one example, the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded audio data directly to the destination device 14 in real time. In this example, the source device 12 may modulate the encoded audio data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated audio data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
The source device 12 includes an encoder 20. Optionally, the source device 12 may further include an audio source 16, an audio preprocessor 18, and a communications interface 22. In a specific implementation, the encoder 20, the audio source 16, the audio preprocessor 18, and the communications interface 22 may be hardware components or software programs in the source device 12. They are described as follows:
The audio source 16 may include or may be any type of audio capture device for capturing, for example, real-world sound, and/or any type of audio generation device, for example, a computer audio processor, or any type of device for obtaining and/or providing real-world audio or computer-animated audio (for example, screen content or audio in virtual reality (VR)), and/or any combination thereof (for example, audio in augmented reality (AR)). The audio source 16 may be a microphone for capturing audio or a memory for storing audio, and the audio source 16 may further include any type of (internal or external) interface for storing previously captured or generated audio and/or for obtaining or receiving audio. When the audio source 16 is a microphone, the audio source 16 may be, for example, a local audio capture apparatus or one integrated in the source device; when the audio source 16 is a memory, the audio source 16 may be a local memory or, for example, an integrated memory in the source device. When the audio source 16 includes an interface, the interface may be, for example, an external interface for receiving audio from an external audio source, such as an external audio capture device (for example, a microphone), an external memory, or an external audio generation device such as an external computer audio processor, a computer, or a server. The interface may be any type of interface according to any proprietary or standardized interface protocol, such as a wired or wireless interface or an optical interface.
Audio can be regarded as a one-dimensional vector of elements; the elements in the vector may also be referred to as sample points. The number of sample points in the vector or audio defines the size of the audio. In this application, the audio transmitted by the audio source 16 to the audio processor may also be referred to as raw audio data 17.
The audio preprocessor 18 is configured to receive the raw audio data 17 and perform preprocessing on the raw audio data 17 to obtain preprocessed audio 19 or preprocessed audio data 19. For example, the preprocessing performed by the audio preprocessor 18 may include trimming, toning, or denoising.
The encoder 20 (also referred to as audio encoder 20) is configured to receive the preprocessed audio data 19 and process the preprocessed audio data 19 to provide encoded audio data 21. In some embodiments, the encoder 20 may be configured to perform the embodiments described below, to implement the encoder-side application of the bit allocation method for audio signals described in this application.
The communications interface 22 may be configured to receive the encoded audio data 21 and transmit the encoded audio data 21 over the link 13 to the destination device 14 or any other device (for example, a memory) for storage or direct reconstruction; the other device may be any device used for decoding or storage. The communications interface 22 may be configured, for example, to encapsulate the encoded audio data 21 into a suitable format, such as data packets, for transmission over the link 13.
The destination device 14 includes a decoder 30. Optionally, the destination device 14 may further include a communications interface 28, an audio postprocessor 32, and a playback device 34. They are described as follows:
The communications interface 28 may be configured to receive the encoded audio data 21 from the source device 12 or from any other source, such as a storage device, for example, an encoded audio data storage device. The communications interface 28 may be configured to transmit or receive the encoded audio data 21 via the link 13 between the source device 12 and the destination device 14 or via any type of network; the link 13 is, for example, a direct wired or wireless connection, and the network is, for example, a wired or wireless network or any combination thereof, or any type of private or public network, or any combination thereof. The communications interface 28 may be configured, for example, to decapsulate the data packets transmitted by the communications interface 22 to obtain the encoded audio data 21.
Both the communications interface 28 and the communications interface 22 may be configured as unidirectional or bidirectional communications interfaces and may be used, for example, to send and receive messages to establish connections, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as the transmission of encoded audio data.
The decoder 30 is configured to receive the encoded audio data 21 and provide decoded audio data 31 or decoded audio 31. In some embodiments, the decoder 30 may be configured to perform the embodiments described below, to implement the decoder-side application of the bit allocation method for audio signals described in this application.
The audio postprocessor 32 is configured to perform postprocessing on the decoded audio data 31 (also referred to as reconstructed audio data) to obtain postprocessed audio data 33. The postprocessing performed by the audio postprocessor 32 may include trimming or resampling, or any other processing, and the audio postprocessor 32 may further be configured to transmit the postprocessed audio data 33 to the playback device 34.
The playback device 34 is configured to receive the postprocessed audio data 33 to play audio, for example, to a user or listener. The playback device 34 may be or may include any type of player for presenting reconstructed audio, for example, an integrated or external speaker or loudspeaker.
Although FIG. 1A depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14 or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
It is obvious to a person skilled in the art based on the description that the existence and (exact) division of the functionality of the different units, or of the functionality of the source device 12 and/or the destination device 14 shown in FIG. 1A, may vary with the actual device and application. The source device 12 and the destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a mobile phone, a smartphone, a tablet or tablet computer, a video camera, a desktop computer, a set-top box, a television, a camera, an in-vehicle device, a playback device, a digital media player, a game console, a media streaming device (such as a content service server or a content distribution server), a broadcast receiver device, or a broadcast transmitter device, and may use no operating system or any type of operating system.
Both the encoder 20 and the decoder 30 may be implemented as any of a variety of suitable circuits, for example, one or more microprocessors, digital signal processors (digital signal processor, DSP), application-specific integrated circuits (application-specific integrated circuit, ASIC), field-programmable gate arrays (field-programmable gate array, FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, and the like) may be considered one or more processors.
In some cases, the audio encoding and decoding system 10 shown in FIG. 1A is merely an example, and the techniques of this application may apply to audio coding settings (for example, audio encoding or audio decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data may be retrieved from local memory, streamed over a network, and so on. An audio encoding device may encode data and store the data in a memory, and/or an audio decoding device may retrieve data from a memory and decode the data. In some examples, encoding and decoding are performed by devices that do not communicate with each other but only encode data to a memory and/or retrieve data from a memory and decode the data.
FIG. 1B is an illustrative diagram of an example of an audio coding system 40 according to an exemplary embodiment. The audio coding system 40 can implement combinations of the various techniques of this application. In the illustrated implementation, the audio coding system 40 may include a microphone 41, an encoder 20, a decoder 30 (and/or an audio encoder/decoder implemented by a logic circuit 47 of a processing unit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a playback device 45.
As shown in FIG. 1B, the microphone 41, the antenna 42, the processing unit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the playback device 45 can communicate with one another. As discussed, although the audio coding system 40 is depicted with both the encoder 20 and the decoder 30, in different examples the audio coding system 40 may include only the encoder 20 or only the decoder 30.
In some examples, the antenna 42 may be configured to transmit or receive an encoded bitstream of audio data. In addition, in some examples, the playback device 45 may be configured to play audio data. In some examples, the logic circuit 47 may be implemented by the processing unit 46. The processing unit 46 may include application-specific integrated circuit (application-specific integrated circuit, ASIC) logic, a graphics processor, a general-purpose processor, or the like. The audio coding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a general-purpose processor, or the like. In some examples, the logic circuit 47 may be implemented by hardware, such as dedicated audio coding hardware, and the processor 43 may be implemented by general-purpose software, an operating system, or the like. In addition, the memory 44 may be any type of memory, such as a volatile memory (for example, static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM)) or a non-volatile memory (for example, flash memory). In a non-limiting example, the memory 44 may be implemented by cache memory. In some examples, the logic circuit 47 may access the memory 44. In other examples, the logic circuit 47 and/or the processing unit 46 may include a memory (for example, a cache) for implementing a buffer or the like.
In some examples, the encoder 20 implemented by the logic circuit may include a buffer (implemented, for example, by the processing unit 46 or the memory 44) and an audio processing unit (implemented, for example, by the processing unit 46). The audio processing unit may be communicatively coupled to the buffer. The audio processing unit may include the encoder 20 implemented by the logic circuit 47, to implement the various modules discussed for any other encoder system or subsystem described herein. The logic circuit may be configured to perform the various operations discussed herein.
In some examples, the decoder 30 may be implemented by the logic circuit 47 in a similar manner, to implement the various modules discussed for any other decoder system or subsystem described herein. In some examples, the decoder 30 implemented by the logic circuit may include a buffer (implemented by the processing unit 46 or the memory 44) and an audio processing unit (implemented, for example, by the processing unit 46). The audio processing unit may be communicatively coupled to the buffer. The audio processing unit may include the decoder 30 implemented by the logic circuit 47, to implement the various modules discussed for any other decoder system or subsystem described herein.
In some examples, the antenna 42 may be configured to receive an encoded bitstream of audio data. As discussed, the encoded bitstream may include audio signal data, metadata, and the like related to audio frames as discussed herein. The audio coding system 40 may further include the decoder 30 coupled to the antenna 42 and configured to decode the encoded bitstream. The playback device 45 is configured to play audio frames.
It should be understood that, for the examples described in this application with reference to the encoder 20, the decoder 30 may be configured to perform the reverse process. Regarding metadata, the decoder 30 may be configured to receive and parse such metadata and decode the related audio data accordingly. In some examples, the encoder 20 may entropy-encode the metadata into an encoded audio bitstream. In such examples, the decoder 30 may parse such metadata and decode the related audio data accordingly.
FIG. 2 is a schematic structural diagram of an audio coding device 200 (for example, an audio encoding device or an audio decoding device) provided in this application. The audio coding device 200 is suitable for implementing the embodiments described in this application. In an embodiment, the audio coding device 200 may be an audio decoder (for example, the decoder 30 of FIG. 1A) or an audio encoder (for example, the encoder 20 of FIG. 1A). In another embodiment, the audio coding device 200 may be one or more components of the decoder 30 of FIG. 1A or the encoder 20 of FIG. 1A.
The audio coding device 200 includes: an ingress port 210 and a receiver unit (Rx) 220 for receiving data; a processor, logic unit, or central processing unit (CPU) 230 for processing data; a transmitter unit (Tx) 240 and an egress port 250 for transmitting data; and a memory 260 for storing data. The audio coding device 200 may further include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress port 210, the receiver unit 220, the transmitter unit 240, and the egress port 250, for the egress or ingress of optical or electrical signals.
The processor 230 is implemented by hardware and software. The processor 230 may be implemented as one or more CPU chips, cores (for example, multi-core processors), FPGAs, ASICs, and DSPs. The processor 230 communicates with the ingress port 210, the receiver unit 220, the transmitter unit 240, the egress port 250, and the memory 260. The processor 230 includes a coding module 270 (for example, an encoding module 270 or a decoding module 270). The encoding/decoding module 270 implements the embodiments disclosed herein, to implement the bit allocation method for audio signals provided in this application. For example, the encoding/decoding module 270 implements, processes, or provides various coding operations. Therefore, the encoding/decoding module 270 substantially improves the functions of the audio coding device 200 and affects the transitions of the audio coding device 200 between different states. Alternatively, the encoding/decoding module 270 is implemented as instructions stored in the memory 260 and executed by the processor 230.
The memory 260 includes one or more disks, tape drives, and solid-state drives, and may be used as an overflow data storage device to store programs when such programs are selectively executed, and to store instructions and data that are read during program execution. The memory 260 may be volatile and/or non-volatile and may be a read-only memory (ROM), a random access memory (RAM), a ternary content-addressable memory (ternary content-addressable memory, TCAM), and/or a static random access memory (SRAM).
FIG. 3 is a simplified block diagram of an apparatus 300 according to an exemplary embodiment. The apparatus 300 can implement the techniques of this application. In other words, FIG. 3 is a schematic block diagram of an implementation of an encoding device or a decoding device (referred to as a coding device 300) of this application. The apparatus 300 may include a processor 310, a memory 330, and a bus system 350. The processor and the memory are connected through the bus system; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory. The memory of the coding device stores program code, and the processor can invoke the program code stored in the memory to perform the methods described in this application. To avoid repetition, details are not described here.
In this application, the processor 310 may be a central processing unit (Central Processing Unit, "CPU" for short), or the processor 310 may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 330 may include a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 330. The memory 330 may include code and data 331 accessed by the processor 310 using the bus 350. The memory 330 may further include an operating system 333 and application programs 335.
In addition to a data bus, the bus system 350 may further include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all marked as the bus system 350 in the figure.
Optionally, the coding device 300 may further include one or more output devices, such as a speaker 370. In an example, the speaker 370 may be a headphone or a loudspeaker. The speaker 370 may be connected to the processor 310 via the bus 350.
Based on the description of the foregoing embodiments, this application provides a bit allocation method for audio signals. FIG. 4 is a schematic flowchart for implementing a bit allocation method for audio signals of this application. The process 400 may be performed by the source device 12 or the destination device 14. The process 400 is described as a series of steps or operations. It should be understood that the process 400 may be performed in various orders and/or concurrently and is not limited to the execution order shown in FIG. 4. As shown in FIG. 4, the method includes:
Step 401: Acquire T audio signals in a current frame.
T is a positive integer. The current frame is the audio frame acquired at the current moment during the execution of the method of this application. To create an immersive stereo sound effect, 3D audio technology no longer simply uses multiple channels for representation; instead, different sounds are represented by different audio signals. For example, an environment includes human voices, music, car sounds, and so on; three audio signals represent the human voice, the music, and the car sound respectively, and the individual sounds are then reconstructed in three-dimensional space according to these three audio signals, realizing the representation of multiple sounds in three-dimensional space. That is, an audio frame may contain multiple audio signals, each representing one real-world voice, piece of music, or sound effect. It should be noted that any technique for extracting audio signals from an audio frame can be used in this application, which is not specifically limited.
In a possible implementation, S groups of metadata in the current frame are acquired, and the S groups of metadata correspond to the T audio signals. For example, each of the T audio signals corresponds to one group of metadata, in which case S = T. For another example, only some of the T audio signals have corresponding metadata, in which case T > S. This is not specifically limited.
In this application, based on the pre-processing of the original voice, music, sound effects, and so on at the encoder side, the audio data and the metadata have already been generated separately in that process. Following the principle of audio frames, the encoder side can take the metadata within the time range corresponding to the start time (sample point) and end time (sample point) of the current frame as the metadata of the current frame. At the decoder side, the metadata of the current frame can be parsed from the received bitstream.
This application uses metadata to describe the state of an audio signal in the spatial sound field. By way of example, Table 1 shows an example of metadata. The parameters included in the metadata are an object index (object_index), an azimuth (position_azimuth), an elevation (position_elevation), a position radius (position_radius), a gain factor (gain_factor), a uniform spread (spread_uniform), a spread width (spread_width), a spread height (spread_height), a spread depth (spread_depth), a diffuseness (diffuseness), an importance (priority), a divergence (divergence), and a speed (speed). The metadata records the value ranges and numbers of bits of these parameters. It should be noted that the metadata may also include other parameters and other recording forms of the parameters, which are not specifically limited in this application.
表1
元数据 取值范围(精度) 比特数
object_index 1;128(1) 7
position_azimuth -180;180(2) 8
position_elevation -90;90(5) 6
position_radius 0.5;16(non-linear) 4
gain_factor 0.004;5.957(non-linear) 7
spread_uniform 0;180 7
spread_width 0;180 7
spread_height 0;90 5
spread_depth 0;15.5 4
diffuseness 0;1 7
priority 0;7 3
divergence 0;1 8
speed 0;1 4
步骤402、根据T个音频信号确定第一音频信号集合。
该第一音频信号集合包括M个音频信号，M为正整数，T个音频信号包括M个音频信号，T≥M。本申请中可以将T个音频信号中有对应的元数据的音频信号加入第一音频信号集合。即如果上述T个音频信号均对应元数据，则可以将T个音频信号全部加入第一音频信号集合中，如果上述T个音频信号中只有部分音频信号对应元数据，则只需将这部分音频信号加入第一音频信号集合。本申请还可以将T个音频信号中预先指定的音频信号加入第一音频信号集合。通过高层信令或用户指定的方式，可以将上述T个音频信号中的部分或全部音频信号加入第一音频信号集合。可选的，高层信令直接配置要加入第一音频信号集合的音频信号的索引。或者，用户指定语音、音乐或音效，将指定对象的音频信号加入第一音频信号集合。本申请还可以参考元数据中记录的音频信号的重要度参数，该重要度参数用于表示对应音频信号在三维音频中的重要性，当重要度参数大于或等于设定的参与阈值时，在上述T个音频信号中将重要度参数对应的音频信号加入第一音频信号集合。
需要说明的是,上述提供了几种对当前帧中的T个音频信号进行归类处理(即将T个音频信号中的全部或部分音频信号加入第一音频信号集合)的方法,应当理解,其并不能成为本申请的全部限定,还可以采用其他方法,包括参考高层信令的其他指定方式、元数据中的其他参数等,均可用于本申请。
步骤403、确定第一音频信号集合中的M个音频信号的优先级。
本申请可以先获取M个音频信号中每个音频信号的声场分级参数,然后根据M个音频信号中每个音频信号的声场分级参数确定M个音频信号的优先级。
声场分级参数可以是根据音频信号的相关参数获取的音频信号的重要性指标,该相关参数可以包括运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,这些参数中可以根据音频信号自身的信号特征获取,也可以根据音频信号的元数据获取。其中,运动分级参数用于描述第一音频信号在空间声场中单位时间内移动快慢,音量分级参数用于描述第一音频信号在空间声场中回放时的音量大小,传播分级参数用于描述第一音频信号在空间声场中回放时的传播范围的大小,扩散分级参数用于描述第一音频信号在空间声场中扩散范围的大小,状态分级参数用于描述第一音频信号在空间声场中声源分割的大小,排序分级参数用于描述第一音频信号在空间声场中优先排序的大小,信号分级参数用于描述第一音频信号编码过程中能量的大小。
以下以第i个音频信号为例,对上述参数的获取方法进行说明,第i个音频信号是上述M个音频信号中的任意一个。需要说明的是,以下几种参数是示例性的说明,还可以采用音频信号的其他参数或特性计算声场分级参数,本申请对此不作具体限定。
(1)运动分级参数
可以通过以下公式计算运动分级参数：

speedRatio_i = f(d_i) / Σ_{j=1}^{M} f(d_j)

其中，speedRatio_i表示第i个音频信号的运动分级参数；f(d_i)表示第i个音频信号在空间声场的运动状态与元数据之间的映射关系；d_i表示第i个音频信号在单位时间内移动的距离，

d_i = √( r_i² + r_0² − 2·r_i·r_0·(cos φ_i·cos φ_0·cos(θ_i − θ_0) + sin φ_i·sin φ_0) )

θ_i表示第i个音频信号移动后相较于渲染中心点的方位角，φ_i表示第i个音频信号移动后相较于渲染中心点的俯仰角，r_i表示第i个音频信号移动后相较于渲染中心点的距离，θ_0表示第i个音频信号移动前相较于渲染中心点的方位角，φ_0表示第i个音频信号移动前相较于渲染中心点的俯仰角，r_0表示第i个音频信号移动前相较于渲染中心点的距离。如图5所示，假设以球坐标表示三维音频在空间场中的位置，球心作为渲染中心点，球体的半径是第i个音频信号在空间场中的位置与球心的距离，第i个音频信号在空间场中的位置与水平面之间的夹角为第i个音频信号的俯仰角，第i个音频信号在空间场中的位置在水平面上的投影与渲染中心点的正前方的夹角为第i个音频信号的方位角；Σ_{j=1}^{M} f(d_j)表示上述M个音频信号分别在空间声场的运动状态与元数据之间的映射关系之和。
或者，还可以通过以下公式计算运动分级参数：

speedRatio_i = d_i / Σ_{j=1}^{M} d_j

其中，Σ_{j=1}^{M} d_j表示上述M个音频信号分别在单位时间内移动的距离之和。
需要说明的是,运动分级参数还可以采用其他方法计算,本申请对此不作具体限定。
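上述第二种运动分级参数的计算方式可以用如下Python代码示意（仅为便于理解的草案：将球坐标换算为直角坐标后求欧氏距离，函数名均为说明性假设，并非本申请的限定）：

```python
import math

def moved_distance(theta0, phi0, r0, theta1, phi1, r1):
    """第i个音频信号在单位时间内移动的距离d_i：
    由移动前(theta0, phi0, r0)与移动后(theta1, phi1, r1)的球坐标
    （方位角/俯仰角，单位为度；半径）换算为直角坐标后求欧氏距离。"""
    def cart(theta, phi, r):
        t, p = math.radians(theta), math.radians(phi)
        return (r * math.cos(p) * math.cos(t),
                r * math.cos(p) * math.sin(t),
                r * math.sin(p))
    return math.dist(cart(theta0, phi0, r0), cart(theta1, phi1, r1))

def speed_ratios(distances):
    """speedRatio_i = d_i / sum_j d_j：各信号移动距离占M个信号总移动距离的比例。"""
    total = sum(distances)
    return [d / total if total > 0 else 0.0 for d in distances]
```

例如，某信号在单位时间内从方位角0°移动到90°（俯仰角0°、半径1）时，d_i=√2；若其余信号均未移动，则其运动分级参数为1。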
(2)音量分级参数
可以通过以下公式计算音量分级参数：

loudRatio_i = f(A_i, gain_i, r_i) / Σ_{j=1}^{M} f(A_j, gain_j, r_j)

其中，loudRatio_i表示第i个音频信号的音量分级参数；f(A_i, gain_i, r_i)表示第i个音频信号在空间声场的回放音量与信号特征和元数据之间的映射关系；A_i表示第i个音频信号在当前帧中的各个采样点的幅度之和或平均值，采样点的幅度可以通过第i个音频信号的元数据获取；gain_i表示第i个音频信号在当前帧中的增益值，可以通过第i个音频信号的元数据获取；r_i表示第i个音频信号在当前帧中距离渲染中心点的距离，可以通过第i个音频信号的元数据获取；Σ_{j=1}^{M} f(A_j, gain_j, r_j)表示上述M个音频信号在空间声场的回放音量与信号特征和元数据之间的映射关系之和。
或者，还可以通过以下公式计算音量分级参数：

loudRatio_i = mean(A_i) / Σ_{j=1}^{M} mean(A_j)

其中，mean(A_i)表示第i个音频信号在当前帧中的各个采样点的幅度之和或平均值，采样点的幅度可以通过第i个音频信号的元数据获取；Σ_{j=1}^{M} mean(A_j)表示上述M个音频信号分别在当前帧中的各个采样点的幅度之和或平均值之和。
或者，还可以通过以下公式计算音量分级参数：

loudRatio_i = (1/r_i) / Σ_{j=1}^{M} (1/r_j)

其中，r_i表示第i个音频信号与渲染中心点之间的距离，可以通过第i个音频信号的元数据获取；Σ_{j=1}^{M} (1/r_j)表示上述M个音频信号分别与渲染中心点之间的距离的倒数之和。
或者，还可以通过以下公式计算音量分级参数：

loudRatio_i = gain_i / Σ_{j=1}^{M} gain_j

其中，gain_i表示第i个音频信号在渲染中的增益，该增益可以由用户通过对第i个音频信号的自定义获取，也可以由译码器通过设定的规则生成；Σ_{j=1}^{M} gain_j表示上述M个音频信号分别在渲染中的增益之和。
需要说明的是,音量分级参数还可以采用其他方法计算,本申请对此不作具体限定。
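上述按平均幅度和按距离倒数两种音量分级参数的计算方式，可以用如下Python代码示意（函数名均为说明性假设，并非本申请的限定）：

```python
def loud_ratio_amplitude(frames):
    """loudRatio_i = mean(A_i) / sum_j mean(A_j)：
    按各信号在当前帧内采样点幅度的平均值占比计算音量分级参数。"""
    means = [sum(f) / len(f) for f in frames]
    total = sum(means)
    return [m / total if total > 0 else 0.0 for m in means]

def loud_ratio_distance(radii):
    """loudRatio_i = (1/r_i) / sum_j (1/r_j)：
    距渲染中心点越近的信号，音量分级参数越大。"""
    inv = [1.0 / r for r in radii]
    total = sum(inv)
    return [v / total for v in inv]
```

两种方式均把各信号的音量指标归一化为占比，故M个信号的音量分级参数之和为1。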
(3)传播分级参数
传播分级参数描述了第i个音频信号在当前帧中的传播度,可以通过第i个音频信号的spread相关元数据获取。需要说明的是,传播分级参数还可以采用其他方法计算,本申请对此不作具体限定。
(4)扩散分级参数
扩散分级参数描述了第i个音频信号在当前帧中的扩散度,可以通过第i个音频信号的diffuseness相关元数据获取。需要说明的是,扩散分级参数还可以采用其他方法计算,本申请对此不作具体限定。
(5)状态分级参数
状态分级参数描述了第i个音频信号在当前帧中的分割度,可以通过第i个音频信号的divergence相关元数据获取。需要说明的是,状态分级参数还可以采用其他方法计算,本申请对此不作具体限定。
(6)排序分级参数
排序分级参数描述了第i个音频信号在当前帧中的优先排序度,可以通过第i个音频信号的priority相关元数据获取。需要说明的是,排序分级参数还可以采用其他方法计算,本申请对此不作具体限定。
(7)信号分级参数
信号分级参数描述了第一音频信号在当前帧编码过程中的能量,可以通过第i个音频信号的原始能量获取,也可以通过第i个音频信号经过预处理后的信号能量获取。需要说明的是,信号分级参数还可以采用其他方法计算,本申请对此不作具体限定。
获取到第i个音频信号的上述一个或多个参数后，可以基于该一个或多个参数计算第i个音频信号的声场分级参数sceneRatio_i，即第i个音频信号的声场分级参数sceneRatio_i可以是关于该一个或多个参数的函数，可以表示为：

sceneRatio_i = f(speedRatio_i, loudRatio_i, ......)
该函数可以是线性的,也可以是非线性的,本申请对此不作具体限定。
在一种可能的实现方式中，可以对第i个音频信号的上述一个或多个参数，例如，运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个，进行加权平均获取第i个音频信号的声场分级参数。即

sceneRatio_i = f(speedRatio_i, loudRatio_i, ......) = α_1×speedRatio_i + α_2×loudRatio_i + ......

其中，α_1、α_2等分别是对应参数的权重因子，该权重因子的值可以为从0-1的任意值，其总和为1。权重因子的值越大，表示其所对应的参数在计算声场分级参数时的重要性、比重越高，如果为0表示其所对应的参数不参与声场分级参数的计算，亦即该参数所对应的音频信号的特性不被考虑来计算声场分级参数；如果为1表示只考虑其所对应的参数参与声场分级参数的计算，亦即该参数所对应的音频信号的特性是计算声场分级参数的唯一依据。权重因子的值可以通过预先设置获取，也可以在本申请的方法执行过程中自适应计算获取，本申请对此不作具体限定。可选的，如果只获取了第i个音频信号的上述参数中的一个参数，那么就把该参数作为第i个音频信号的声场分级参数。
在一种可能的实现方式中，可以对第i个音频信号的上述一个或多个参数，例如，运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个，求平均获取第i个音频信号的声场分级参数。即

sceneRatio_i = (speedRatio_i + loudRatio_i + ......) / P

其中，P为参与计算的参数个数。
需要说明的是，上述提供了两种计算第i个音频信号的声场分级参数的函数实现方法，本申请还可以采用其他的计算方法，对此不作具体限定。
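上述加权平均与直接求平均两种声场分级参数的计算方式，可以合并为如下Python示意性函数（仅为草案，函数名为假设；权重之和须为1的校验方式亦为说明性写法）：

```python
def scene_ratio(params, weights=None):
    """由已获取的各分级参数计算声场分级参数sceneRatio_i：
    给定权重weights时按加权平均（权重之和须为1），
    未给定权重时退化为直接求平均。"""
    if weights is None:
        return sum(params) / len(params)
    assert abs(sum(weights) - 1.0) < 1e-9, "权重因子之和应为1"
    return sum(w * p for w, p in zip(weights, params))
```

当只获取到一个参数时，令其权重为1（或直接求平均），结果即该参数本身，与上文"把该参数作为声场分级参数"的做法一致。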
基于第i个音频信号的声场分级参数,本申请可以采用以下方法获取第i个音频信号的优先级。第i个音频信号的声场分级参数和优先级之间是线性关系,即声场分级参数越大,优先级越高,如图6所示,空间声场以渲染中心为球心,距离该球心越近的音频信号的优先级越高,距离该球心越远的音频信号的优先级越低。
在一种可能的实现方式中,可以根据设定的第一对应关系将与第i个音频信号的声场分级参数对应的优先级确定为第一音频信号的优先级,第一对应关系包括多个声场分级参数和多个优先级之间的对应关系,其中,一个或多个声场分级参数对应一个优先级。
根据音频信号编码的历史数据和/或经验积累,可以预先设定音频信号的优先级等级,以及声场分级参数和各个优先级之间的对应关系。示例性的,表2示出了声场分级参数和优先级的一个示例性的第一对应关系。
表2
声场分级参数 优先级
0.9 1
0.8 2
0.7 3
0.6 4
0.5 5
0.4 6
0.3 7
0.2 8
0.1 9
0 10
根据表2,当第i个音频信号的声场分级参数为0.4时,其对应的优先级为6,那么此时第i个音频信号的优先级为6。当第i个音频信号的声场分级参数为0.1时,其对应的优先级为9,那么此时第i个音频信号的优先级为9。需要说明的是,表2是声场分级参数和优先级的对应关系的一个示例,其并不构成对本申请涉及到此类对应关系的限定。
在一种可能的实现方式中,可以将第i个音频信号的声场分级参数作为第i个音频信号的优先级。
本申请可以不对优先级分出等级,直接将第i个音频信号的声场分级参数当作其优先级。
在一种可能的实现方式中,可以根据设定的范围阈值确定第i个音频信号的声场分级参数的所属范围,将与第i个音频信号的声场分级参数的所属范围对应的优先级确定为第i个音频信号的优先级。
根据音频信号编码的历史数据和/或经验积累,可以预先设定音频信号的优先级等级,以及声场分级参数的区间和各个优先级之间的对应关系。示例性的,表3示出了声场分级参数和优先级的另一个示例性的第一对应关系。
表3
声场分级参数区间 优先级
[0.9,1) 1
[0.8,0.9) 2
[0.7,0.8) 3
[0.6,0.7) 4
[0.5,0.6) 5
[0.4,0.5) 6
[0.3,0.4) 7
[0.2,0.3) 8
[0.1,0.2) 9
[0,0.1) 10
根据表3,当第i个音频信号的声场分级参数为0.6时,其所属的区间为[0.6,0.7),对应的优先级为4,那么此时第i个音频信号的优先级为4。当第i个音频信号的声场分级参数为0.15时,其所属的区间为[0.1,0.2),对应的优先级为9,那么此时第i个音频信号的优先级为9。需要说明的是,表3是声场分级参数和优先级的对应关系的一个示例,其并不构成对本申请涉及到此类对应关系的限定。
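表3的区间到优先级的映射可以用如下Python示意性函数表达（区间划分取自表3，函数名为说明性假设）：

```python
def priority_from_scene_ratio(ratio):
    """按表3的区间把声场分级参数映射为优先级：
    [0.9,1)->1，[0.8,0.9)->2，……，[0,0.1)->10。"""
    for level in range(1, 11):
        # 优先级level对应区间的下界：level=1时为0.9，level=10时为0.0
        if ratio >= (10 - level) / 10.0:
            return level
    return 10  # ratio小于0时兜底处理（说明性假设）
```

与正文示例一致：声场分级参数0.6落在[0.6,0.7)，优先级为4；0.15落在[0.1,0.2)，优先级为9。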
步骤404、根据M个音频信号的优先级对M个音频信号进行比特分配。
本申请可以根据当前可用比特数和M个音频信号的优先级进行比特分配,优先级越高的音频信号分配的比特数越多。当前可用比特数是指当前帧中编解码器在进行比特分配前可以用于对第一音频信号集合中的M个音频信号进行比特分配的总的比特数。
在一种可能的实现方式中，可以根据第一音频信号的优先级确定第一音频信号的比特数占比，第一音频信号为M个音频信号中的任意一个，对当前可用比特数和第一音频信号的比特数占比计算乘积获取第一音频信号的比特数。音频信号的优先级和比特数占比之间预先建立了对应关系，可以一个优先级对应一个比特数占比，也可以多个优先级对应一个比特数占比。基于该比特数占比，以及当前可用比特数，就可以计算获取对应的音频信号可以被分配的比特数。例如，M为3，第一个音频信号的优先级为1，第二个音频信号的优先级为2，第三个音频信号的优先级为3，假设设定优先级1对应的占比为50%，优先级2对应的占比为30%，优先级3对应的占比为20%，当前可用比特数为100，那么第一个音频信号分配的比特数为50，第二个音频信号分配的比特数为30，第三个音频信号分配的比特数为20。需要说明的是，在不同的音频帧中，优先级对应的比特数占比是可以自适应调整的，对此不作具体限定。
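上述按比特数占比分配的过程可以用如下Python示意性函数表达（取整方式采用四舍五入，为说明性假设，并非本申请的限定）：

```python
def allocate_bits_by_share(bits_available, shares):
    """按优先级对应的比特数占比分配当前可用比特数：
    bits_i = bits_available × share_i（各占比之和为1）。"""
    return [round(bits_available * s) for s in shares]
```

按上文示例，当前可用比特数为100、三个信号的占比依次为50%、30%、20%时，分配结果依次为50、30、20比特。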
在一种可能的实现方式中，可以根据设定的第二对应关系将与第一音频信号的优先级对应的比特数确定为第一音频信号的比特数，第二对应关系包括多个优先级和多个比特数之间的对应关系，其中，一个或多个优先级对应一个比特数。音频信号的优先级和比特数之间预先建立了对应关系，可以一个优先级对应一个比特数，也可以多个优先级对应一个比特数。基于该对应关系，只要获取了音频信号的优先级，就可以获取与其对应的比特数。例如，M为3，第一个音频信号的优先级为1，第二个音频信号的优先级为2，第三个音频信号的优先级为3，假设设定优先级1对应的比特数为50，优先级2对应的比特数为30，优先级3对应的比特数为20。
在一种可能的实现方式中，当音频信号的声场分级参数不含有信号分级参数，且声场分级参数较小时，认为音频信号间声场分级差异很小，此时音频信号间的比特分配可以根据编解码过程中音频信号间的绝对能量比确定；当音频信号的声场分级参数不含有信号分级参数，且声场分级参数较大时，认为音频信号间声场分级差异很大，此时音频信号间的比特分配可以根据音频信号的声场分级参数确定；其他情况下，音频信号的比特分配可以根据音频信号的比特分配因子确定。因此可以有以下公式，其中，sceneRatio_i表示第i个音频信号的声场分级参数，bits_available表示当前可用比特数，bits_object_i表示第i个音频信号分配的比特数。

当sceneRatio_i≤δ时，bits_object_i = nrgRatio_i×bits_available，其中，δ表示声场分级参数的上限，nrgRatio_i表示第i个音频信号和其他音频信号之间的绝对能量比。

当sceneRatio_i≥τ时，bits_object_i = sceneRatio_i×bits_available，τ表示声场分级参数的下限。

除上述两种情况外，bits_object_i = objRatio_i×bits_available，其中，objRatio_i表示第i个音频信号的比特分配因子。
需要说明的是,除上述描述的音频信号分配的比特数的确定方法外,还可以采用其他方法实现,本申请对此不作具体限定。
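上述分三种情况的比特分配公式可以整理为如下Python示意性函数（δ、τ的默认取值仅为演示用的假设，本申请未给定其具体数值）：

```python
def allocate_bits(scene_ratio_i, bits_available, nrg_ratio_i, obj_ratio_i,
                  delta=0.1, tau=0.9):
    """分三种情况确定第i个音频信号分配的比特数：
    sceneRatio_i <= delta 时按绝对能量比nrgRatio_i分配；
    sceneRatio_i >= tau   时按声场分级参数本身分配；
    其他情况按比特分配因子objRatio_i分配。
    delta、tau的默认取值为示例性假设，并非本申请给定。"""
    if scene_ratio_i <= delta:
        return nrg_ratio_i * bits_available
    if scene_ratio_i >= tau:
        return scene_ratio_i * bits_available
    return obj_ratio_i * bits_available
```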
本申请根据当前帧中包括的多个音频信号的特征及元数据中的音频信号的相关信息,确定该多个音频信号的优先级,根据该优先级确定要分配给各个音频信号的比特数,既可以自适应音频信号的特征,也可以针对不同音频信号匹配不同的编码比特数,提高了音频信号的编解码效率。
本申请在步骤402中从当前帧的T个音频信号中确定出了M个音频信号加入第一音频信号集合，对该M个音频信号采用步骤403和步骤404的方法，先确定各音频信号的优先级，再根据音频信号的优先级确定分配给各音频信号的比特数。当T>M时，第一音频信号集合中的音频信号并不是当前帧中的所有音频信号，可以将剩余的音频信号加入第二音频信号集合，该第二音频信号集合包括N个音频信号，N=T-M。针对该N个音频信号，可以采用较为简单的方法确定其分配的比特数，例如，将第二音频信号集合可用的总比特数对N求平均，获取每个音频信号的比特数，即将第二音频信号集合可用的总比特数平均分配给该集合中的N个音频信号。需要说明的是，第二音频信号集合还可以采用其他的方法获取集合中的各音频信号的比特数，本申请对此不作具体限定。
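将第二音频信号集合可用的总比特数平均分配给N个音频信号的过程，可以用如下Python示意性函数表达（不能整除时余数的处理方式为说明性假设，并非本申请的限定）：

```python
def split_evenly(total_bits, n):
    """将第二音频信号集合可用的总比特数平均分配给其中的N个音频信号；
    不能整除时，把余数逐个加到靠前的信号上（说明性假设）。"""
    base, rem = divmod(total_bits, n)
    return [base + (1 if i < rem else 0) for i in range(n)]
```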
另外，除上述步骤403中描述的音频信号的优先级确定方法外，本申请还提供了一种基于多种优先级确定方法的优先级融合方法，即针对同一音频信号，可以采用多种方法获取其优先级，再融合确定该音频信号最终的优先级。以下以第一音频信号为例进行描述，第一音频信号为上述M个音频信号中的任意一个。
在一种可能的实现方式中，根据第一音频信号和/或与第一音频信号对应的元数据获取第一音频信号的第一参数集和第二参数集，第一参数集包括第一音频信号的上述相关参数中的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个，第二参数集也包括第一音频信号的上述相关参数中的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个。第一参数集和第二参数集可以包含相同的参数，也可以包含不同的参数。根据第一参数集获取第一音频信号的第一声场分级参数。此处可以参照上述步骤403中确定第一音频信号集合中的M个音频信号的声场分级参数的方法，也可以采用其他方法。根据第二参数集获取第一音频信号的第二声场分级参数。此处所采用的方法与计算第一声场分级参数的方法不相同。根据第一声场分级参数和第二声场分级参数获取第一音频信号的声场分级参数。本申请中对于同一音频信号的两种方法计算获取的声场分级参数，可以采用加权平均的方法，也可以采用直接求平均的方法，还可以采用取最大值或取最小值的方法确定该音频信号最终的声场分级参数，对此不作具体限定。这样可以实现音频信号的声场分级参数的多样性获取，兼容各种策略下的计算方案。
在一种可能的实现方式中，获取到第一音频信号的第一声场分级参数和第二声场分级参数后，可以根据第一声场分级参数获取第一音频信号的第一优先级。此时可以采用上述步骤403的方法获取该优先级，也可以采用其他方法获取。根据第二声场分级参数获取第一音频信号的第二优先级。此处所采用的方法与计算第一优先级的方法不相同。根据第一优先级和第二优先级获取第一音频信号的优先级。本申请中对于同一音频信号的两种方法计算获取的优先级，可以采用加权平均的方法，也可以采用求平均的方法，还可以采用取最大值或取最小值的方法确定该音频信号最终的优先级，对此不作具体限定。这样可以实现音频信号的优先级的多样性获取，兼容各种策略下的计算方案。
当采用上述实施例的方法确定了当前帧的T个音频信号分配的比特数后,本申请可以根据T个音频信号的比特数生成码流,该码流包括T个第一标识、T个第二标识和T个第三标识,T个音频信号分别和T个第一标识、T个第二标识和T个第三标识对应,第一标识用于表示对应音频信号所属的音频信号集合,第二标识用于表示对应音频信号的优先级,第三标识用于表示对应音频信号的比特数;将码流发送给解码设备。解码设备收到码流后,根据码流中携带的T个第一标识、T个第二标识和T个第三标识执行上述音频信号的比特分配方法,确定T个音频信号的比特数。解码设备也可以直接根据码流中携带的T个第一标识、T个第二标识和T个第三标识确定T个音频信号所属的音频信号集合、优先级及分配的比特数,进而对码流进行解码获取T个音频信号。上述第一标识、第二标识和第三标识是在图4所示的方法实施例的基础上添加的标识信息,以便于音频信号的编解码端可以基于相同的方法对音频信号进行编码或解码。
图7为本申请装置实施例的结构示意图,如图7所示,该装置可以应用于上述实施例中的编码设备或解码设备。本实施例的装置可以包括:处理模块701和收发模块702。其中,处理模块701,用于获取当前帧中的T个音频信号,T为正整数;根据所述T个音频信号确定第一音频信号集合,所述第一音频信号集合包括M个音频信号,M为正整数,所述T个音频信号包括所述M个音频信号,T≥M;确定所述第一音频信号集合中的所述M个音频信号的优先级;根据所述M个音频信号的优先级对所述M个音频信号进行比特分配。
在一种可能的实现方式中，所述处理模块701，具体用于获取所述M个音频信号中每个音频信号的声场分级参数；根据所述M个音频信号中每个音频信号的声场分级参数确定所述M个音频信号的优先级。
在一种可能的实现方式中,所述处理模块701,具体用于获取第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数;其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
在一种可能的实现方式中,所述处理模块701,具体用于获取所述当前帧中的S组元数据,S为正整数,T≥S,所述S组元数据和所述T个音频信号对应,所述元数据用于描述对应的音频信号在空间声场中的状态。
在一种可能的实现方式中,所述处理模块701,具体用于根据与第一音频信号对应的元数据,或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数;其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
在一种可能的实现方式中,所述处理模块701,具体用于对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个加权平均获取所述声场分级参数;或者,对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个求平均获取所述声场分级参数;或者,将获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个作为所述声场分级参数。
在一种可能的实现方式中，所述处理模块701，具体用于根据设定的第一对应关系将与所述第一音频信号的声场分级参数对应的优先级确定为所述第一音频信号的优先级，所述第一对应关系包括多个声场分级参数和多个优先级之间的对应关系，其中，一个或多个所述声场分级参数对应一个所述优先级，所述第一音频信号为所述M个音频信号中的任意一个；或者，将所述第一音频信号的声场分级参数作为所述第一音频信号的优先级；或者，根据设定的范围阈值确定所述第一音频信号的声场分级参数的所属范围，将与所述第一音频信号的声场分级参数的所属范围对应的优先级确定为所述第一音频信号的优先级。
在一种可能的实现方式中,所述处理模块701,具体用于根据当前可用比特数和所述M个音频信号的优先级进行比特分配,优先级越高的音频信号分配的比特数越多。
在一种可能的实现方式中,所述处理模块701,具体用于根据第一音频信号的优先级确定所述第一音频信号的比特数占比,所述第一音频信号为所述M个音频信号中的任意一个;根据所述当前可用比特数和所述第一音频信号的比特数占比的乘积获取所述第一音频信号的比特数。
在一种可能的实现方式中，所述处理模块701，具体用于根据第一音频信号的优先级从设定的第二对应关系中确定所述第一音频信号的比特数，所述第二对应关系包括多个优先级和多个比特数之间的对应关系，其中，一个或多个所述优先级对应一个所述比特数，所述第一音频信号为所述M个音频信号中的任意一个。
在一种可能的实现方式中,所述处理模块701,具体用于将所述T个音频信号中预先指定的音频信号加入所述第一音频信号集合。
在一种可能的实现方式中,所述处理模块701,具体用于将所述S组元数据在所述T个音频信号中对应的音频信号加入所述第一音频信号集合;或者,将大于或等于设定的参与阈值的重要度参数对应的音频信号加入所述第一音频信号集合,所述元数据包括所述重要度参数,所述T个音频信号包括所述重要度参数对应的音频信号。
在一种可能的实现方式中,所述处理模块701,具体用于获取第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数;获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个;根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数;根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数;其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小,所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
在一种可能的实现方式中，所述处理模块701，具体用于根据与第一音频信号对应的元数据，或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个，所述第一音频信号为所述M个音频信号中的任意一个；根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数；根据与所述第一音频信号对应的元数据，或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个；根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数；根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数；其中，所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢，所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小，所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小，所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小，所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小，所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小，所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
在一种可能的实现方式中,所述处理模块701,具体用于根据所述第一声场分级参数获取所述第一音频信号的第一优先级;根据所述第二声场分级参数获取所述第一音频信号的第二优先级;根据所述第一优先级和所述第二优先级获取所述第一音频信号的优先级。
在一种可能的实现方式中,所述处理模块701,还用于根据所述M个音频信号所分配的比特数对所述M个音频信号进行编码以获取编码码流。
在一种可能的实现方式中,所述编码码流包括所述M个音频信号的比特数。
在一种可能的实现方式中,还包括:收发模块702,用于接收编码码流;所述处理模块701,还用于获取所述M个音频信号各自的比特数;根据所述M个音频信号各自的比特数以及所述编码码流重建所述M个音频信号。
本实施例的装置,可以用于执行图4所示方法实施例的技术方案,其实现原理和技术效果类似,此处不再赘述。
图8为本申请设备实施例的结构示意图，如图8所示，该设备可以是上述实施例中的编码设备或解码设备。本实施例的设备可以包括：处理器801和存储器802，存储器802用于存储一个或多个程序；当所述一个或多个程序被所述处理器801执行时，使得所述处理器801实现如图4所示方法实施例的技术方案。
在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、特定应用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
上述各实施例中提及的存储器可以是易失性存储器或非易失性存储器，或可包括易失性和非易失性存储器两者。其中，非易失性存储器可以是只读存储器（read-only memory，ROM）、可编程只读存储器（programmable ROM，PROM）、可擦除可编程只读存储器（erasable PROM，EPROM）、电可擦除可编程只读存储器（electrically EPROM，EEPROM）或闪存。易失性存储器可以是随机存取存储器（random access memory，RAM），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（static RAM，SRAM）、动态随机存取存储器（dynamic RAM，DRAM）、同步动态随机存取存储器（synchronous DRAM，SDRAM）、双倍数据速率同步动态随机存取存储器（double data rate SDRAM，DDR SDRAM）、增强型同步动态随机存取存储器（enhanced SDRAM，ESDRAM）、同步连接动态随机存取存储器（synchlink DRAM，SLDRAM）和直接内存总线随机存取存储器（direct rambus RAM，DR RAM）。应注意，本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (41)

  1. 一种音频信号的比特分配方法,其特征在于,包括:
    获取当前帧中的T个音频信号,T为正整数;
    根据所述T个音频信号确定第一音频信号集合,所述第一音频信号集合包括M个音频信号,M为正整数,所述T个音频信号包括所述M个音频信号,T≥M;
    确定所述第一音频信号集合中的所述M个音频信号的优先级;
    根据所述M个音频信号的优先级对所述M个音频信号进行比特分配。
  2. 根据权利要求1所述的方法,其特征在于,所述确定所述第一音频信号集合中的所述M个音频信号的优先级,包括:
    获取所述M个音频信号中每个音频信号的声场分级参数;
    根据所述M个音频信号中每个音频信号的声场分级参数确定所述M个音频信号的优先级。
  3. 根据权利要求2所述的方法,其特征在于,所述获取所述M个音频信号中每个音频信号的声场分级参数,包括:
    获取第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;
    根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数;
    其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  4. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    获取所述当前帧中的S组元数据,S为正整数,T≥S,所述S组元数据和所述T个音频信号对应,所述元数据用于描述对应的音频信号在空间声场中的状态。
  5. 根据权利要求4所述的方法,其特征在于,所述获取所述M个音频信号中每个音频信号的声场分级参数,包括:
    根据与第一音频信号对应的元数据,或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;
    根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数；
    其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  6. 根据权利要求3或5所述的方法,其特征在于,所述根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数,包括:
    对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个加权平均获取所述声场分级参数;或者,
    对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个求平均获取所述声场分级参数;或者,
    将获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个作为所述声场分级参数。
  7. 根据权利要求2-6中任一项所述的方法,其特征在于,所述根据所述M个音频信号中每个音频信号的声场分级参数确定所述M个音频信号的优先级,包括:
    根据设定的第一对应关系将与第一音频信号的声场分级参数对应的优先级确定为所述第一音频信号的优先级,所述第一对应关系包括多个声场分级参数和多个优先级之间的对应关系,其中,一个或多个所述声场分级参数对应一个所述优先级,所述第一音频信号为所述M个音频信号中的任意一个;或者,
    将所述第一音频信号的声场分级参数作为所述第一音频信号的优先级;或者,
    根据设定的多个范围阈值确定所述第一音频信号的声场分级参数的所属范围,将与所述第一音频信号的声场分级参数的所属范围对应的优先级确定为所述第一音频信号的优先级。
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述根据所述M个音频信号的优先级对所述M个音频信号进行比特分配,包括:
    根据当前可用比特数和所述M个音频信号的优先级进行比特分配,优先级越高的音频信号分配的比特数越多。
  9. 根据权利要求8所述的方法,其特征在于,所述根据当前可用比特数和所述M个音频信号的优先级进行比特分配,包括:
    根据第一音频信号的优先级确定所述第一音频信号的比特数占比,所述第一音频信号为所述M个音频信号中的任意一个;
    根据所述当前可用比特数和所述第一音频信号的比特数占比的乘积获取所述第一音频信号的比特数。
  10. 根据权利要求8所述的方法,其特征在于,所述根据当前可用比特数和所述M个音频信号的优先级进行比特分配,包括:
    根据第一音频信号的优先级从设定的第二对应关系中确定所述第一音频信号的比特数,所述第二对应关系包括多个优先级和多个比特数之间的对应关系,其中,一个或多个所述优先级对应一个所述比特数,所述第一音频信号为所述M个音频信号中的任意一个。
  11. 根据权利要求1-10中任一项所述的方法,其特征在于,所述根据所述T个音频信号确定第一音频信号集合,包括:
    将所述T个音频信号中预先指定的音频信号加入所述第一音频信号集合。
  12. 根据权利要求4所述的方法,其特征在于,所述根据所述T个音频信号确定第一音频信号集合,包括:
    将所述S组元数据在所述T个音频信号中对应的音频信号加入所述第一音频信号集合;或者,
    将大于或等于设定的参与阈值的重要度参数对应的音频信号加入所述第一音频信号集合,所述元数据包括所述重要度参数,所述T个音频信号包括所述重要度参数对应的音频信号。
  13. 根据权利要求2所述的方法,其特征在于,所述获取所述M个音频信号中每个音频信号的声场分级参数,包括:
    获取第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;
    根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数;
    获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个;
    根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数;
    根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数;
    其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小,所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  14. 根据权利要求4所述的方法,其特征在于,所述获取所述M个音频信号中每个音频信号的声场分级参数,包括:
    根据与第一音频信号对应的元数据,或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;
    根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数;
    根据与所述第一音频信号对应的元数据,或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个;
    根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数;
    根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数;
    其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小,所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  15. 根据权利要求13或14所述的方法,其特征在于,所述根据所述M个音频信号中每个音频信号的声场分级参数确定所述M个音频信号的优先级,包括:
    根据所述第一声场分级参数获取所述第一音频信号的第一优先级;
    根据所述第二声场分级参数获取所述第一音频信号的第二优先级;
    根据所述第一优先级和所述第二优先级获取所述第一音频信号的优先级。
  16. 一种音频信号的编码方法,其特征在于,执行完权利要求1-15中任一项所述的音频信号的比特分配方法之后,还包括:
    根据所述M个音频信号所分配的比特数对所述M个音频信号进行编码以获取编码码流。
  17. 根据权利要求16所述的音频信号的编码方法,其特征在于,所述编码码流包括所述M个音频信号的比特数。
  18. 一种音频信号的解码方法,其特征在于,执行完权利要求1-15中任一项所述的音频信号的比特分配方法之后,还包括:
    接收编码码流;
    执行如权利要求1-15中任一项所述的音频信号的比特分配方法获取所述M个音频信号各自的比特数;
    根据所述M个音频信号各自的比特数以及所述编码码流重建所述M个音频信号。
  19. 一种音频信号的比特分配装置,其特征在于,包括:
    处理模块,用于获取当前帧中的T个音频信号,T为正整数;根据所述T个音频信号确定第一音频信号集合,所述第一音频信号集合包括M个音频信号,M为正整数,所述T个音频信号包括所述M个音频信号,T≥M;确定所述第一音频信号集合中的所述M个音频信号的优先级;根据所述M个音频信号的优先级对所述M个音频信号进行比特分配。
  20. 根据权利要求19所述的装置,其特征在于,所述处理模块,具体用于获取所述M个音频信号中每个音频信号的声场分级参数;根据所述M个音频信号中每个音频信号的声场分级参数确定所述M个音频信号的优先级。
  21. 根据权利要求20所述的装置,其特征在于,所述处理模块,具体用于获取第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数;其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  22. 根据权利要求20所述的装置,其特征在于,所述处理模块,具体用于获取所述当前帧中的S组元数据,S为正整数,T≥S,所述S组元数据和所述T个音频信号对应,所述元数据用于描述对应的音频信号在空间声场中的状态。
  23. 根据权利要求22所述的装置,其特征在于,所述处理模块,具体用于根据与第一音频信号对应的元数据,或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个,所述第一音频信号为所述M个音频信号中的任意一个;根据获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的声场分级参数;其中,所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢,所述音量分级参数用于描述所述第一音频信号在空间声场中音量的大小,所述传播分级参数用于描述所述第一音频信号在空间声场中传播范围的大小,所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小,所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小,所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小,所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  24. 根据权利要求21或23所述的装置,其特征在于,所述处理模块,具体用于对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个加权平均获取所述声场分级参数;或者,对获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的多个求平均获取所述声场分级参数;或者,将获取的所述运动分级参数、音量分级参数、传播分级参数、扩散分级参数、状态分级参数、排序分级参数和信号分级参数中的一个作为所述声场分级参数。
  25. 根据权利要求20-24中任一项所述的装置，其特征在于，所述处理模块，具体用于根据设定的第一对应关系将与第一音频信号的声场分级参数对应的优先级确定为所述第一音频信号的优先级，所述第一对应关系包括多个声场分级参数和多个优先级之间的对应关系，其中，一个或多个所述声场分级参数对应一个所述优先级，所述第一音频信号为所述M个音频信号中的任意一个；或者，将所述第一音频信号的声场分级参数作为所述第一音频信号的优先级；或者，根据设定的多个范围阈值确定所述第一音频信号的声场分级参数的所属范围，将与所述第一音频信号的声场分级参数的所属范围对应的优先级确定为所述第一音频信号的优先级。
  26. 根据权利要求19-25中任一项所述的装置,其特征在于,所述处理模块,具体用于根据当前可用比特数和所述M个音频信号的优先级进行比特分配,优先级越高的音频信号分配的比特数越多。
  27. 根据权利要求26所述的装置,其特征在于,所述处理模块,具体用于根据第一音频信号的优先级确定所述第一音频信号的比特数占比,所述第一音频信号为所述M个音频信号中的任意一个;根据所述当前可用比特数和所述第一音频信号的比特数占比的乘积获取所述第一音频信号的比特数。
  28. 根据权利要求26所述的装置,其特征在于,所述处理模块,具体用于根据第一音频信号的优先级从设定的第二对应关系中确定所述第一音频信号的比特数,所述第二对应关系包括多个优先级和多个比特数之间的对应关系,其中,一个或多个所述优先级对应一个所述比特数,所述第一音频信号为所述M个音频信号中的任意一个。
  29. 根据权利要求19-28中任一项所述的装置,其特征在于,所述处理模块,具体用于将所述T个音频信号中预先指定的音频信号加入所述第一音频信号集合。
  30. 根据权利要求22所述的装置,其特征在于,所述处理模块,具体用于将所述S组元数据在所述T个音频信号中对应的音频信号加入所述第一音频信号集合;或者,将大于或等于设定的参与阈值的重要度参数对应的音频信号加入所述第一音频信号集合,所述元数据包括所述重要度参数,所述T个音频信号包括所述重要度参数对应的音频信号。
  31. 根据权利要求20所述的装置，其特征在于，所述处理模块，具体用于获取第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个，所述第一音频信号为所述M个音频信号中的任意一个；根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数；获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个；根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数；根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数；其中，所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢，所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小，所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小，所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小，所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小，所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小，所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  32. 根据权利要求22所述的装置，其特征在于，所述处理模块，具体用于根据与第一音频信号对应的元数据，或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个，所述第一音频信号为所述M个音频信号中的任意一个；根据与所述第一音频信号对应的元数据，或者根据所述第一音频信号以及与所述第一音频信号对应的元数据获取所述第一音频信号的状态分级参数、排序分级参数和信号分级参数中的一个或多个；根据获取的所述运动分级参数、音量分级参数、传播分级参数和扩散分级参数中的一个或多个获取所述第一音频信号的第一声场分级参数；根据获取的所述状态分级参数、排序分级参数和信号分级参数中的一个或多个获取所述第一音频信号的第二声场分级参数；根据所述第一声场分级参数和所述第二声场分级参数获取所述第一音频信号的声场分级参数；其中，所述运动分级参数用于描述所述第一音频信号在空间声场中单位时间内移动快慢，所述音量分级参数用于描述所述第一音频信号在空间声场中回放时的音量大小，所述传播分级参数用于描述所述第一音频信号在空间声场中回放时的传播范围的大小，所述扩散分级参数用于描述所述第一音频信号在空间声场中扩散范围的大小，所述状态分级参数用于描述所述第一音频信号在空间声场中声源分割的大小，所述排序分级参数用于描述所述第一音频信号在空间声场中优先排序的大小，所述信号分级参数用于描述所述第一音频信号编码过程中能量的大小。
  33. 根据权利要求31或32所述的装置,其特征在于,所述处理模块,具体用于根据所述第一声场分级参数获取所述第一音频信号的第一优先级;根据所述第二声场分级参数获取所述第一音频信号的第二优先级;根据所述第一优先级和所述第二优先级获取所述第一音频信号的优先级。
  34. 根据权利要求19-33中任一项所述的装置,其特征在于,所述处理模块,还用于根据所述M个音频信号所分配的比特数对所述M个音频信号进行编码以获取编码码流。
  35. 根据权利要求34所述的装置,其特征在于,所述编码码流包括所述M个音频信号的比特数。
  36. 根据权利要求34或35所述的装置,其特征在于,还包括:收发模块,用于接收编码码流;所述处理模块,还用于获取所述M个音频信号各自的比特数;根据所述M个音频信号各自的比特数以及所述编码码流重建所述M个音频信号。
  37. 一种设备,其特征在于,包括:
    一个或多个处理器;
    存储器,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-18中任一项所述的方法。
  38. 一种计算机可读存储介质,其特征在于,包括计算机程序,所述计算机程序在计算机上被执行时,使得所述计算机执行权利要求1-18中任一项所述的方法。
  39. 一种计算机可读存储介质,其特征在于,包括根据如权利要求16所述的方法获取的编码码流。
  40. 一种编码装置,其特征在于,包括处理器和通信接口,所述处理器通过所述通信接口读取存储计算机程序,所述计算机程序包括程序指令,所述处理器用于调用所述程序指令,执行如权利要求1至18中任一项所述的方法。
  41. 一种编码装置,其特征在于,包括处理器和存储器,所述处理器用于执行权利要求16所述的方法,所述存储器用于存放所述编码码流。