WO2024213147A1 - Audio encoding method and apparatus, electronic device, and storage medium - Google Patents
Audio encoding method and apparatus, electronic device, and storage medium
- Publication number
- WO2024213147A1 (PCT/CN2024/087626)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frequency band
- channel
- channel group
- matrix
- channels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
- G10L19/032—Quantisation or dequantisation of spectral components
Definitions
- the present disclosure relates to the field of audio processing technology, and in particular to an audio encoding method and an audio decoding method and a device, an electronic device, a storage medium, a computer program product, and a computer program.
- although the existing two-dimensional mid-side (2D M/S) audio algorithm can effectively reduce data redundancy between multiple channels, in a variety of audio scenarios it increases the transmission cost during data transmission, resulting in wasted data.
- the embodiments of the present disclosure provide an audio encoding method and an audio decoding method and apparatus thereof, an electronic device, a computer-readable storage medium, a computer program product, and a computer program to solve the problem of waste of transmission and storage media during multi-channel audio transmission.
- the technical solution of the present disclosure is as follows:
- an embodiment of the present disclosure provides an audio encoding method, which is executed by an encoder, and the method includes: grouping a channel sequence to obtain multiple channel groups, each of the channel groups includes a number of continuous channels in the channel sequence, and there are one or more identical channels between adjacent channel groups; performing frequency domain conversion on the audio signal of each channel in the channel sequence frame by frame to obtain a frequency domain coefficient of each channel per frame; determining, from a transformation matrix set, a target transformation matrix for each band in a frequency band set corresponding to the channel group according to the frequency domain coefficient of each channel; performing same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain encoding information of the channel group; obtaining an encoded bitstream based on the encoding information of the channel group, and sending the encoded bitstream to a decoder for decoding.
- an embodiment of the present disclosure provides an audio decoding method, which is executed by a decoder, the method comprising: receiving an encoded bitstream sent by an encoder, the encoded bitstream comprising encoding information of a plurality of channel groups, the channel groups being obtained by grouping a channel sequence in sequence, each of the channel groups comprising a plurality of continuous channels in the channel sequence, and one or more identical channels existing between adjacent channel groups; decoding the plurality of channel groups in sequence, and for a current channel group being decoded, determining, according to the encoding information of the current channel group, a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group; obtaining, based on the target decoding matrix of the current channel group on each frequency band, the decoded frequency domain coefficients of the current channel group from the encoding information of the current channel group; and obtaining the decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple channel groups.
- an embodiment of the present disclosure provides an audio encoding device, comprising: a channel grouping module, configured to perform grouping of the channel sequence to obtain multiple channel groups, each of the channel groups includes a number of continuous channels in the channel sequence, and there are one or more identical channels between adjacent channel groups; a frequency domain processing module, configured to perform frequency domain conversion of the audio signal of each channel in the channel sequence on a frame-by-frame basis to obtain frequency domain coefficients of each channel per frame; a matrix determination module, configured to perform, based on the frequency domain coefficients of each channel, determination of a target transformation matrix for each band in a frequency band set corresponding to the channel group from a transformation matrix set; an encoding module, configured to perform same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix for each frequency band to obtain encoding information of the channel group; a sending module, configured to obtain an encoding code stream based on the encoding information of the channel group, and send the encoding
- an embodiment of the present disclosure provides an audio decoding device, including a receiving module, configured to receive an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of multiple channel groups, the channel groups being obtained by grouping a channel sequence in sequence, each of the channel groups including a number of continuous channels in the channel sequence, and one or more identical channels existing between adjacent channel groups; a matrix determination module, configured to decode the multiple channel groups in sequence, and for a current channel group being decoded, determine a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group according to the encoding information of the current channel group; a decoding module, configured to obtain, based on the target decoding matrix of the current channel group on each frequency band, a decoded frequency domain coefficient of the current channel group from the encoding information of the current channel group, and obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple
- an embodiment of the present disclosure provides an encoder, comprising a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method described in the first aspect of the embodiment of the present disclosure.
- an embodiment of the present disclosure provides a decoder, comprising a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method described in the second aspect of the embodiment of the present disclosure.
- an embodiment of the present disclosure provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method described in the first aspect of the embodiment of the present disclosure.
- an embodiment of the present disclosure provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method described in the second aspect of the embodiment of the present disclosure.
- an embodiment of the present disclosure provides an encoder, which includes a processor and an interface circuit, wherein the interface circuit is used to receive code instructions and transmit them to the processor, and the processor is used to run the code instructions to enable the device to execute the method described in the first aspect above.
- an embodiment of the present disclosure provides a decoder, the device comprising a processor and an interface circuit, the interface circuit being used to receive code instructions and transmit them to the processor, the processor being used to execute the code instructions to enable the device to execute the method described in the second aspect above.
- the embodiments of the present disclosure provide a coding and decoding system, which includes the encoding device described in the third aspect and the decoding device described in the fourth aspect, or the system includes the encoder described in the fifth aspect and the decoder described in the sixth aspect, or the system includes the encoder described in the seventh aspect and the encoding device described in the eighth aspect, or the system includes the encoder described in the ninth aspect and the decoder described in the tenth aspect.
- an embodiment of the present invention provides a computer-readable storage medium for storing instructions for the above-mentioned encoder, and when the instructions are executed, the encoder executes the method described in the first aspect.
- an embodiment of the present invention provides a computer-readable storage medium for storing instructions for the above-mentioned decoder, and when the instructions are executed, the decoder executes the method described in the second aspect.
- an embodiment of the present disclosure further provides a computer program product comprising a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect above.
- an embodiment of the present disclosure further provides a computer program product comprising a computer program, which, when executed on a computer, enables the computer to execute the method described in the second aspect above.
- an embodiment of the present disclosure provides a chip system, which includes at least one processor and an interface, and is used to support a network device to implement the functions involved in the first aspect, for example, to determine or process at least one of the data and information involved in the above method.
- the chip system also includes a memory, and the memory is used to store computer programs and data necessary for the network device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- an embodiment of the present disclosure provides a chip system, which includes at least one processor and an interface, and is used to support a terminal device to implement the functions involved in the second aspect, for example, determining or processing at least one of the data and information involved in the above method.
- the chip system also includes a memory, and the memory is used to store computer programs and data necessary for the terminal device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- an embodiment of the present disclosure provides a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect above.
- an embodiment of the present disclosure provides a computer program, which, when executed on a computer, enables the computer to execute the method described in the second aspect above.
- the encoder obtains frequency domain coefficients by dividing and grouping the channel signals, and the target transformation matrix of each frequency band corresponding to the channel group can be determined based on the frequency domain coefficients of each channel. Further, the frequency domain coefficients of the channels are decorrelated according to the target transformation matrix to obtain the coding information of the channel group, and a coded bitstream is obtained based on the coding information and then sent to a decoder for decoding.
- the frequency domain coefficients on each frequency band in the channel group are coded by the target transformation matrix, thereby compressing the audio signals of multiple channels, reducing the redundancy between multiple channels, reducing the burden on the encoder, and reducing the transmission and storage costs.
- Fig. 1 is a flowchart showing an audio encoding method according to an exemplary embodiment.
- Fig. 2 is a flowchart showing an audio encoding method according to another exemplary embodiment.
- Fig. 3 is a schematic diagram of a cross-correlation matrix according to an exemplary embodiment.
- Fig. 4 is a flowchart showing an audio encoding method according to another exemplary embodiment.
- Fig. 5 is a flowchart showing an audio encoding method according to another exemplary embodiment.
- Fig. 6 is a schematic diagram showing an audio encoding method according to an exemplary embodiment.
- Fig. 7 is a flowchart showing an audio decoding method according to an exemplary embodiment.
- Fig. 8 is a flowchart showing an audio decoding method according to another exemplary embodiment.
- Fig. 9 is a schematic diagram showing an audio decoding method according to an exemplary embodiment.
- Fig. 10 is a flowchart showing an audio encoding method according to another exemplary embodiment.
- Fig. 11 is a block diagram showing an audio encoding device according to an exemplary embodiment.
- Fig. 12 is a block diagram showing an audio decoding device according to an exemplary embodiment.
- Fig. 13 is a block diagram showing an audio processing device according to an exemplary embodiment.
- Fig. 14 is a block diagram of another audio processing chip according to an exemplary embodiment.
- the term “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to determining”.
- the terms used herein to characterize size relationships are “greater than” or “less than”, and “higher than” or “lower than”.
- “greater than” also covers the meaning of “greater than or equal to”.
- “less than” also covers the meaning of “less than or equal to”.
- “higher than” also covers the meaning of “higher than or equal to”.
- “lower than” also covers the meaning of “lower than or equal to”.
- An audio encoding/decoding method disclosed in the embodiments of the present disclosure may be applicable to various communication systems, for example, a third generation (3G) universal mobile telecommunications system (UMTS), a long term evolution (LTE) system, a fifth generation (5G) mobile communication system, a 5G new radio (NR) system, a sixth generation (6G) mobile communication system, or other future new mobile communication systems.
- 3G third generation
- UMTS universal mobile telecommunications system
- LTE long term evolution
- 5G fifth generation
- NR 5G new radio
- 6G sixth generation
- An audio encoding/decoding method disclosed in the embodiments of the present disclosure may also be applicable to a streaming media transmission system and an OTT (Over The Top) media transmission system.
- Fig. 1 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
- the audio coding method can be executed by an encoder. As shown in Fig. 1, the method can include but is not limited to steps S101 to S105.
- S101 Group the channel sequence to obtain multiple channel groups, each channel group including a plurality of continuous channels in the channel sequence, with one or more identical channels existing between adjacent channel groups.
- the encoder may group the M channels in the channel sequence to obtain multiple channel groups.
- each channel group includes a number of continuous channels in the channel sequence, for example, may include 3 continuous channels.
- for example, take two adjacent channel groups, a first channel group and a second channel group.
- the first channel group and the second channel group each include three continuous channels in the channel sequence, and the two groups share two identical channels.
- for example, the 5 channels in a channel sequence are divided into channel group 1, channel group 2 and channel group 3:
- channel group 1 includes channel 1, channel 2 and channel 3;
- channel group 2 includes channel 2, channel 3 and channel 4;
- channel group 3 includes channel 3, channel 4 and channel 5.
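The grouping above can be sketched as a sliding window over the channel sequence. This is a minimal illustration, not the disclosed implementation: the function name `group_channels` and the parameters `group_size` and `hop` are assumptions introduced here, chosen so that a group size of 3 and a hop of 1 reproduce the 5-channel example.

```python
def group_channels(channels, group_size=3, hop=1):
    """Slide a window of `group_size` consecutive channels over the sequence.

    Adjacent groups share `group_size - hop` channels, so hop must be
    smaller than group_size for the groups to overlap.
    """
    if not 0 < hop < group_size:
        raise ValueError("hop must satisfy 0 < hop < group_size")
    if len(channels) < group_size:
        raise ValueError("need at least group_size channels")
    # Trailing channels that do not fill a whole window are not grouped here.
    return [channels[i:i + group_size]
            for i in range(0, len(channels) - group_size + 1, hop)]


groups = group_channels([1, 2, 3, 4, 5])
print(groups)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

With `hop=1` each pair of adjacent groups shares two channels, which matches the first/second channel group example in the text.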
- the encoder divides the audio signal of each channel in the channel sequence into a plurality of frames of fixed length, and performs a modified discrete cosine transform (MDCT) on each frame, thereby obtaining a frequency domain representation of each frame. Based on the frequency domain representation of each frame, the MDCT coefficient of each frame can be extracted from the frequency domain representation as the frequency domain coefficient of each frame.
- MDCT modified discrete cosine transform
- the channel sequence of the embodiment of the present disclosure includes M channels.
- each frame of audio data of each channel may include 2N sampling points, and the sampling rate is fs.
- Each frame after MDCT transformation may include N frequency points, and accordingly, the spectrum distribution range of MDCT coefficients is (0, fs/2), and the frequency resolution is fs/(2N).
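As an illustration of the frame sizes above, the following sketch computes a direct (unoptimized) MDCT of one frame using the textbook definition. It is an assumption-laden example: the analysis window and the overlap between successive frames, which a practical codec needs, are omitted, and the function names `mdct` and `frequency_resolution` are introduced here for illustration only.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT: 2N time-domain samples in, N coefficients out."""
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number (2N)")
    n_half = two_n // 2
    phase = 0.5 + n_half / 2.0
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + phase) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def frequency_resolution(fs, n_half):
    """Frequency resolution fs / (2N), with N coefficients per frame."""
    return fs / (2.0 * n_half)


coeffs = mdct([0.0] * 16)                 # 2N = 16 samples
print(len(coeffs))                        # N = 8 frequency points
print(frequency_resolution(48000, 1024))  # resolution in Hz for fs = 48 kHz, N = 1024
```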
- S103 Determine, from the transformation matrix set, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group according to the frequency domain coefficients of each channel.
- the frequency bands may be divided in advance according to a psychoacoustic frequency band division method to obtain a frequency band set, wherein the frequency band set may include multiple divided frequency bands, for example, the frequency band set may include B divided frequency bands, wherein B is an integer greater than or equal to 1.
- Each frequency band in the frequency band set has a different frequency range, and the frequency ranges of adjacent frequency bands are continuous.
- the index of each frequency domain coefficient within the frame is multiplied by the frequency resolution to determine the frequency value of that coefficient.
- n represents the nth sampling point corresponding to any frequency domain coefficient, and the value of n ranges from 1 to N, where N is the number of frequency points in the frame.
- the frequency value f corresponding to the frequency domain coefficient can be obtained by formula (1): $$f = n \cdot \frac{f_s}{2N} \qquad (1)$$
- the frequency value of the frequency domain coefficient is compared with the frequency range of each frequency band to obtain the frequency range where the frequency domain coefficient is located, so as to determine the frequency domain coefficient in each frequency band.
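The assignment of coefficients to bands described above can be sketched as follows. The frequency of coefficient n is taken as n multiplied by the frequency resolution fs/(2N). The band edges in the example are hypothetical values, not the psychoacoustic division of the disclosure, and treating each band as the half-open interval (lower edge, upper edge] is an assumption made here so that adjacent bands are continuous without overlapping.

```python
from bisect import bisect_left


def bin_frequency(n, fs, n_bins):
    """Frequency of the n-th coefficient (n = 1..N): n * fs / (2N)."""
    return n * fs / (2.0 * n_bins)


def band_of(freq, band_edges):
    """Return the 1-based band index b such that edges[b-1] < freq <= edges[b]."""
    band = bisect_left(band_edges, freq)
    if not 1 <= band <= len(band_edges) - 1:
        raise ValueError("frequency lies outside the band set")
    return band


def group_bins(fs, n_bins, band_edges):
    """Map each band index to the list of coefficient indices it contains."""
    bands = {}
    for n in range(1, n_bins + 1):
        b = band_of(bin_frequency(n, fs, n_bins), band_edges)
        bands.setdefault(b, []).append(n)
    return bands


# Hypothetical band edges in Hz, for illustration only.
print(group_bins(48000, 8, [0, 6000, 12000, 24000]))
# {1: [1, 2], 2: [3, 4], 3: [5, 6, 7, 8]}
```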
- the mutual correlation coefficients between different channels in the same frequency band are calculated based on the frequency domain coefficients between channels in the same frequency band. That is, for each frequency band in the frequency band set, the mutual correlation coefficients between two channels corresponding to the frequency band can be obtained based on the frequency domain coefficients between channels in the same frequency band.
- the mutual correlation coefficients between the two channels in the channel group on the frequency band b can be determined from the mutual correlation coefficients between the two channels corresponding to the frequency band b based on the channels included in the channel group.
- the target transformation matrix corresponding to the channel group on the frequency band b is determined from the transformation matrix set. It can be understood that the frequency band set includes B frequency bands, and the target transformation matrix of the channel group on each frequency band can be obtained.
- the transformation matrix set includes multiple transformation matrices, each of which may correspond to a decorrelation mode.
- the transformation matrix set may include M0, M1, M2, M3, and M4. The specific values of the transformation matrix are as follows:
- the decorrelation mode of the channel group in each frequency band can be determined based on the target transformation matrix of the channel group in each frequency band.
- if the target transformation matrices corresponding to different frequency bands are different, the decorrelation modes corresponding to those frequency bands are also different.
- if the target transformation matrices corresponding to different frequency bands are the same, the decorrelation modes corresponding to those frequency bands are also the same.
- for example, suppose the target transformation matrix of frequency band 1 is M1, the target transformation matrix of frequency band 2 is M2, and the target transformation matrix of frequency band 3 is M1.
- in this case, the decorrelation modes of frequency band 1 and frequency band 3 are the same, and both differ from the decorrelation mode of frequency band 2.
- frequency domain coefficients of channels in a channel group are differentiated by frequency bands to obtain frequency domain coefficients on each frequency band, and decorrelation processing is performed on the frequency domain coefficients on the same frequency band of the channels in the channel group based on a target transformation matrix corresponding to the same frequency band, thereby obtaining channel group coding information.
- for example, for a channel group consisting of channels L, C, and R, the target transformation matrix of the channel group on frequency band b can be determined to be M4.
- the frequency domain coefficients of the channels L, C, and R on frequency band b can form a frequency domain coefficient matrix. A matrix operation is performed on this coefficient matrix and the target transformation matrix M4 corresponding to frequency band b to obtain the first coding information of the channel group on frequency band b. In other words, the frequency domain coefficients of channels L, C, and R on frequency band b are decorrelated through M4 to obtain the coding information of the channel group on that band. The channels L, C, and R in the channel group are thus decorrelated within the same frequency band, which reduces same-band interference, redundancy, and transmission cost.
- the frequency domain coefficients of the channel group in each frequency band can be decorrelated by using the target transformation matrix corresponding to each frequency band to obtain the coding information of the channel group in each frequency band. After the coding information is obtained, the coding information of the channel group in the whole frequency band can be obtained according to the coding information in each frequency band. It can be understood that the coding information of the channel group includes the coding information of the channel group in all frequency bands.
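The per-band matrix operation can be sketched as a product of a 3x3 transformation matrix with a 3xK coefficient matrix (one row per channel, one column per frequency point in the band). The values of M0 to M4 are not reproduced in this text, so the matrix below is a hypothetical orthogonal mid/side-style matrix used purely for illustration and is not the disclosure's M4. Recovering the coefficients with the transpose likewise relies on the assumption that the matrix is orthogonal.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


S = 1.0 / math.sqrt(2.0)
# Hypothetical matrix: mid/side rotation of L and R, C passed through unchanged.
HYPOTHETICAL_MATRIX = [[S, 0.0, S],
                       [0.0, 1.0, 0.0],
                       [S, 0.0, -S]]


def decorrelate_band(coeffs, transform):
    """Rows of `coeffs` are channels (L, C, R); columns are bins of one band."""
    return matmul(transform, coeffs)


def restore_band(encoded, transform):
    """Inverse step, valid only because `transform` is assumed orthogonal."""
    return matmul(transpose(transform), encoded)


band = [[1.0, 2.0],   # L
        [0.5, 0.5],   # C
        [1.0, 2.0]]   # R, identical to L in this band
encoded = decorrelate_band(band, HYPOTHETICAL_MATRIX)
print(encoded[2])     # side row is zero when L and R carry the same content
print(restore_band(encoded, HYPOTHETICAL_MATRIX))
```

When two channels carry nearly identical content in a band, one output row carries almost no energy, which is where the redundancy reduction comes from.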
- the coding information of the channel group may be encoded in binary to obtain a binary coding stream. That is, the coding information of each channel is converted into a binary code, and the binary codes of all channel groups are connected to form a coding stream. In one implementation, the coding stream is sent to a decoder for decoding to restore the original channel signal.
- the target transformation matrix of the channel group in each frequency band needs to be sent.
- the target transformation matrix of each frequency band can be written into the coded bitstream and sent together with the coding information of the channel, or can be sent to the decoder separately and synchronously with the coded bitstream.
- the encoder obtains frequency domain coefficients by dividing and grouping the channel signals in frequency bands, and the target transformation matrix of each frequency band corresponding to the channel group can be determined based on the frequency domain coefficients of each channel.
- the frequency domain coefficients of the channel are further decorrelated according to the target transformation matrix to obtain the encoding information of the channel group, and the encoded code stream is obtained based on the encoding information and then sent to the decoder for decoding.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- Fig. 2 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
- the audio coding method can be executed by an encoder. As shown in Fig. 2, the method can include but is not limited to steps S201 to S207.
- S201 Group the channel sequence to obtain multiple channel groups, each channel group including a plurality of continuous channels in the channel sequence, with one or more identical channels existing between adjacent channel groups.
- the implementation method of S201 can be implemented by any method in the embodiments of the present disclosure, which is not limited here and will not be repeated.
- the implementation method of S202 can be implemented by any method in the embodiments of the present disclosure, which is not limited here and will not be repeated.
- S203 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
- the energy value of any channel in each frequency band is determined according to the frequency domain coefficient of any channel, and the first cross-correlation matrix corresponding to each frequency band is determined according to the energy value of each channel in each frequency band.
- the root mean square energy RMS of any channel can be calculated based on the frequency domain coefficients of any channel to determine the energy value of any channel in each frequency band.
- the RMS follows the standard root-mean-square definition: $$\mathrm{RMS}_{c,b} = \sqrt{\frac{1}{N_{c,b}} \sum_{i=1}^{N_{c,b}} X_i^{2}}$$
- c represents the channel index value, which ranges from 1 to M, where M is the number of input channels; b is the frequency band index value; Xi is the i-th frequency domain coefficient of channel c in frequency band b; and Nc,b is the number of frequency points of channel c in frequency band b.
- the number of frequency points can be calculated based on the width of the frequency band and the frequency resolution corresponding to the frequency band.
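The energy computation can be sketched as below, using the standard root-mean-square definition over the coefficients of one channel in one band. The function names are introduced here for illustration, and deriving the number of frequency points as band width divided by frequency resolution, rounded to the nearest integer, is an assumption about how the calculation mentioned above is carried out.

```python
import math


def num_bins_in_band(band_width_hz, resolution_hz):
    """Number of frequency points in a band: band width / frequency resolution."""
    return int(round(band_width_hz / resolution_hz))


def band_rms(coeffs):
    """Root-mean-square energy of one channel's coefficients in one band."""
    if not coeffs:
        raise ValueError("band contains no frequency points")
    return math.sqrt(sum(x * x for x in coeffs) / len(coeffs))


print(num_bins_in_band(6000, 3000))     # 2
print(band_rms([2.0, 2.0, 2.0, 2.0]))   # 2.0
```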
- the energy ratio of each pair of channels in the channel sequence in any frequency band b is obtained.
- the mutual correlation coefficient between each pair of channels in any frequency band b is determined based on the energy ratio between the channels in any frequency band b.
- the energy ratio Q of two channels in the channel sequence in any frequency band b is obtained, wherein the formula for calculating the energy ratio Q is as follows:
- c represents the channel index value, which ranges from 1 to M, where M is the number of input channels; b is the number of frequency band divisions; c1 can be the same as or different from c2 , and similarly b1 can be the same as or different from b2 .
- energy discrimination can be performed using the energy ratio Q, and the mutual correlation coefficient between the channels in any frequency band b is determined based on the energy discrimination result. If the energy ratio on any frequency band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, the mutual correlation coefficient between the two channels in that frequency band is determined to be zero. The first set threshold is less than the second set threshold.
- if the energy ratio lies strictly between the first set threshold and the second set threshold, the mutual correlation coefficient between the two channels in any frequency band b is determined according to the frequency domain coefficients of the two channels in that frequency band.
- for example, with a first set threshold of 0.5 and a second set threshold of 2: if the energy ratio Q of the two channels on any frequency band b is less than or equal to 0.5 or greater than or equal to 2, the mutual correlation coefficient of the two channels is 0; if Q lies in the interval (0.5, 2), the mutual correlation coefficient of the two channels is further calculated.
- the mutual correlation coefficient may be determined according to the size of Q.
- the formula for calculating the mutual correlation coefficient is as follows:
- a first mutual correlation matrix corresponding to any frequency band b can be obtained. For example, if the channel sequence includes M channels, the first mutual correlation matrix shown below can be obtained:
- the first row and column of the first cross-correlation matrix correspond to channel 1
- the second row and column correspond to channel 2
- the Mth row and column correspond to channel M.
- a corresponding first mutual correlation matrix can be determined in the above manner. If the frequency band set includes b frequency bands, there are b first mutual correlation matrices.
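- The energy discrimination and the construction of the first mutual correlation matrix for one frequency band can be sketched as follows. The formulas for Q and for the mutual correlation coefficient are not reproduced in this text, so the sketch assumes that Q is the ratio of the two RMS energies and that the coefficient is the sum of products of the two channels' frequency domain coefficients; the thresholds 0.5 and 2 come from the example above. All function names are hypothetical.

```python
import math


def rms(values):
    """Root mean square of the coefficients of one channel in one band."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def gated_correlation(xa, xb, low=0.5, high=2.0):
    """Correlation of two channels in one band, zeroed by the energy ratio test."""
    ea, eb = rms(xa), rms(xb)
    if eb == 0.0:
        return 0.0  # assumption: treat an undefined ratio as uncorrelated
    q = ea / eb  # assumed definition of the energy ratio Q
    if q <= low or q >= high:
        return 0.0
    # Assumed correlation measure: sum of products of the coefficients.
    return sum(a * b for a, b in zip(xa, xb))


def first_correlation_matrix(band_coeffs):
    """band_coeffs[c] holds the coefficients of channel c in one band."""
    m = len(band_coeffs)
    return [[gated_correlation(band_coeffs[i], band_coeffs[j]) for j in range(m)]
            for i in range(m)]


# Illustrative data: channels 0 and 1 have similar energy, channel 2 is much louder.
example_band = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
example_matrix = first_correlation_matrix(example_band)
print(example_matrix)
```

- Running the same function once per band yields one M*M matrix per frequency band, matching the statement that there are as many first mutual correlation matrices as frequency bands.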
- a second correlation matrix corresponding to the channel group can be extracted from the first correlation matrix.
- the channel index of the channel in the channel group is determined, and the second correlation matrix of the channel group is extracted from the first correlation matrix according to the channel index.
- the channel group includes three channels, and the second correlation matrix corresponding to the channel group can be determined from the first correlation matrix based on the correlation coefficients between each pair of channels included in the channel group, and the second correlation matrix is a 3*3 matrix.
- the channel sequence includes 5 channels, and the channels are arranged in order as channel 1, channel 2, channel 3, channel 4 and channel 5.
- channel group 1 may include channel 1, channel 2 and channel 3
- channel group 2 may include channel 2, channel 3 and channel 4
- channel group 3 includes channel 3, channel 4 and channel 5.
- the first mutual correlation matrix is a 5*5 matrix, which can be shown in Figure 3.
- the matrix elements at the intersection of rows 2 to 4 and columns 2 to 4 can be extracted from the first mutual correlation matrix as the second mutual correlation matrix of channel group 2.
- the second mutual correlation matrix of channel group 2 can be the part within the dotted box in the first mutual correlation matrix as shown in Figure 3, that is, the second mutual correlation matrix of channel group 2 is as follows:
- the channel group has a second cross-correlation matrix in each frequency band.
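- A minimal sketch of this extraction, using the 5-channel example: the second mutual correlation matrix is the sub-matrix of the first matrix at the rows and columns of the channels in the group. The function name is hypothetical, and the matrix values are placeholders chosen so that each element encodes its own row and column.

```python
def second_correlation_matrix(first, channel_indices):
    """Extract the sub-matrix for a channel group.

    first: first mutual correlation matrix (list of rows) for one band.
    channel_indices: zero-based indices of the channels in the group.
    """
    return [[first[r][c] for c in channel_indices] for r in channel_indices]


# Placeholder 5*5 matrix: the element for channels (r, c) is 10*r + c (1-based).
example_first = [[10 * r + c for c in range(1, 6)] for r in range(1, 6)]

# Channel group 2 holds channels 2, 3 and 4, i.e. zero-based indices 1, 2, 3.
example_second = second_correlation_matrix(example_first, [1, 2, 3])
print(example_second)
```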
- S205 Determine a target transformation matrix of the channel group in each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band.
- for any frequency band b, if the mutual correlation coefficients between each pair of channels included in the second mutual correlation matrix in frequency band b meet the condition for selecting a specified transformation matrix in the transformation matrix set, the specified transformation matrix is selected as the target transformation matrix for frequency band b;
- otherwise, a transformation matrix other than the specified transformation matrix is selected from the transformation matrix set as the target transformation matrix for frequency band b.
- a mutual correlation coefficient threshold may be set, and a target transformation matrix of the channel group may be determined from the transformation matrix set according to the mutual correlation coefficient threshold.
- the second mutual correlation matrix of the channel group in each frequency band includes the mutual correlation coefficient between each pair of channels in the channel group. If the mutual correlation coefficient of every pair of channels is greater than the set threshold, a specified transformation matrix is selected as the target transformation matrix of any frequency band b; otherwise, a transformation matrix other than the specified transformation matrix is selected from the transformation matrix set, according to the maximum mutual correlation coefficient among the channel pairs, as the target transformation matrix of any frequency band b.
- the transformation matrix set may include M0, M1, M2, M3 and M4.
- the designated transformation matrix may be M4. If the correlation coefficients between every pair of the three channels are all greater than a threshold, the target transformation matrix is determined to be M4; otherwise, if the maximum value of the correlation coefficients among the three channels is greater than the threshold, the target transformation matrix is determined according to the maximum value.
- S206 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- S207 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- Fig. 4 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
- the audio coding method may be executed by an encoder. As shown in Fig. 4, the method may include but is not limited to steps S401 to S408.
- S401 Group a channel sequence to obtain a plurality of channel groups, each channel group including a plurality of continuous channels in the channel sequence, and one or more identical channels exist between adjacent channel groups.
- S401 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- S402 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- S403 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
- S403 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- the first mutual correlation matrix of each frequency band between channels is normalized to obtain a normalized first mutual correlation matrix of each frequency band, so as to extract a second mutual correlation matrix of each frequency band corresponding to the channel group from the normalized first mutual correlation matrix of each frequency band.
- the second mutual correlation matrix is a normalized matrix.
- the channel identifier associated with any matrix element in the first mutual correlation matrix is determined.
- the normalized matrix element corresponding to any matrix element can be determined according to the associated channel identifier. Any matrix element is a mutual correlation coefficient between two channels: the channel identifier corresponding to the row where the matrix element is located is one channel identifier associated with the matrix element, and the channel identifier corresponding to the column where the matrix element is located is the other channel identifier associated with the matrix element.
- the matrix element Corr[2,3],b in the first correlation matrix corresponding to frequency band b is taken as an example for explanation. The channel identifiers associated with Corr[2,3],b are 2 and 3, that is, the channels associated with Corr[2,3],b are channel 2 and channel 3.
- since the channels associated with Corr[2,3],b are channel 2 and channel 3, it can be determined that the normalized matrix elements of Corr[2,3],b are the diagonal elements Corr[2,2],b and Corr[3,3],b.
- a normalized result of any matrix element is obtained according to any matrix element and a normalized matrix element.
- the normalized formula corresponding to any matrix element is as follows:
- b represents the frequency band index value; Corr[i,j],b represents any matrix element; Corr[i,i],b and Corr[j,j],b represent the normalized matrix elements.
- the diagonal is the diagonal from the upper left corner to the lower right corner.
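- The normalization step can be sketched as follows. The normalization formula itself is not reproduced in this text, so the sketch assumes the common form in which each element is divided by the square root of the product of the two diagonal elements for its row and column channels; this is an assumption, and the function name is hypothetical.

```python
import math


def normalize_correlation(corr):
    """Normalize a correlation matrix by its diagonal elements.

    Assumed form: corr[i][j] / sqrt(corr[i][i] * corr[j][j]).
    Elements whose denominator is zero are set to 0.0.
    """
    m = len(corr)
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            denom = math.sqrt(corr[i][i] * corr[j][j])
            out[i][j] = corr[i][j] / denom if denom > 0 else 0.0
    return out


# Illustrative 2*2 matrix only.
example_normalized = normalize_correlation([[4.0, 2.0], [2.0, 9.0]])
print(example_normalized)
```

- Under this assumption the diagonal of the normalized matrix is 1 wherever the channel has non-zero energy, so the off-diagonal elements can be compared directly against a single threshold.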
- S405 Determine the cross-correlation coefficients between any two channels in the channel group in any frequency band b based on the second cross-correlation matrix in any frequency band b.
- the correlation coefficients between any two channels in the channel group at any frequency band b can be determined based on the second correlation matrix.
- S406 Determine a target transformation matrix of the channel group in any frequency band b from the transformation matrix set according to the mutual correlation coefficients between any two channels in the channel group in any frequency band b.
- a threshold of the mutual correlation coefficient between two channels in a channel group in any frequency band b may be set as Thr, and a target transformation matrix may be determined according to preset conditions by comparing the mutual correlation coefficient between two channels in the channel group with the threshold.
- the specified transformation matrix M4 is selected as the target transformation matrix for any frequency band b.
- [L, C, R] are three channels in the channel group. If the mutual correlation coefficient between channel L and channel C, the mutual correlation coefficient between channel L and channel R, and the mutual correlation coefficient between channel C and channel R are all greater than Thr, then the specified transformation matrix M4 is selected as the target transformation matrix for any frequency band b.
- a transformation matrix other than the specified transformation matrix is selected as the target transformation matrix for any frequency band b according to the maximum mutual correlation coefficient between two channels.
- the maximum cross-correlation coefficient is selected from three cross-correlation coefficients: the one between channel L and channel C, the one between channel L and channel R, and the one between channel C and channel R. If the maximum cross-correlation coefficient is greater than Thr, the target transformation matrix of any frequency band b is selected from the transformation matrices other than the specified transformation matrix. For example, if the specified transformation matrix is M4, the target transformation matrix is selected from M0 to M3 according to the maximum cross-correlation coefficient.
- the target transformation matrix is selected according to formula (6):
- M1 is selected as the target transformation matrix
- M2 is selected as the target transformation matrix
- M3 is selected as the target transformation matrix
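- The selection logic of S406 can be sketched as below. Formula (6) and the conditions attached to M1, M2 and M3 are not reproduced in this text, so the assignment of channel pairs to matrices (L/C to M1, L/R to M2, C/R to M3) and the use of M0 when no coefficient exceeds Thr are hypothetical choices made only to give a runnable illustration.

```python
def select_transform(corr_lc, corr_lr, corr_cr, thr):
    """Pick a transformation matrix name from three pairwise coefficients.

    The pair-to-matrix mapping and the M0 fallback are assumptions.
    """
    by_matrix = {"M1": corr_lc, "M2": corr_lr, "M3": corr_cr}
    if all(value > thr for value in by_matrix.values()):
        return "M4"  # specified transformation matrix: every pair exceeds Thr
    best = max(by_matrix, key=by_matrix.get)
    if by_matrix[best] > thr:
        return best  # chosen according to the maximum coefficient
    return "M0"  # assumed fallback when no pair exceeds Thr


print(select_transform(0.9, 0.9, 0.9, 0.5))
print(select_transform(0.1, 0.2, 0.8, 0.5))
```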
- S407 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- S408 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- Fig. 5 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
- the audio coding method may be executed by an encoder. As shown in Fig. 5, the method may include but is not limited to steps S501 to S507.
- S501 Grouping a channel sequence to obtain a plurality of channel groups, each channel group including a plurality of continuous channels in the channel sequence, and one or more overlapping channels exist between adjacent channel groups.
- S501 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
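- As a minimal sketch of the grouping step, the following forms groups of consecutive channels with a sliding window. The group size of three and the step of one channel follow the 5-channel example given elsewhere in the disclosure; the function name and parameters are hypothetical.

```python
def group_channels(channels, size=3, step=1):
    """Split a channel sequence into overlapping groups of consecutive channels.

    Adjacent groups share size - step channels.
    """
    last_start = len(channels) - size
    return [channels[i:i + size] for i in range(0, last_start + 1, step)]


example_groups = group_channels([1, 2, 3, 4, 5])
print(example_groups)
```

- With five channels this gives three groups, and each adjacent pair of groups shares two channels.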
- S502 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- S503 Determine, from the transformation matrix set, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group according to the frequency domain coefficients of each channel.
- S503 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- the first channel group includes channel 1, channel 2, and channel 3. Based on the frequency domain coefficients of the channels in the first channel group in any frequency band b and the target transformation matrix corresponding to that frequency band, first encoding information of the first channel group in frequency band b is obtained, and the first encoding information includes center information M1, side information S1, and first information T1 of the first channel group.
- the first encoding information of the remaining channel group on any frequency band b is obtained, and the first encoding information includes the first information Ti of the remaining channel group.
- the first coded information of the first channel group completely includes the center information, the side information and the first information, and the first coded information of the remaining channel groups may only include the first information.
- [L, C, R] are the three channels in the channel group; M is the target transformation matrix determined according to the frequency domain coefficients; and [M S T] is the coding information of the channel group.
- when performing decorrelation calculation, the first decorrelation coding unit outputs the coding information [M S T] in full, and the remaining coding units only output the first information [T].
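- The decorrelation of one channel group in one frequency band can be sketched as a 3*3 matrix applied to the [L, C, R] coefficients of each frequency point, with the first group keeping all three outputs and later groups keeping only T. The entries of M0 to M4 are not reproduced in this text, so `EXAMPLE_MATRIX` below is a hypothetical mid/side style matrix used purely for illustration, and the function names are assumptions.

```python
# Hypothetical matrix: M = (L + R) / 2, S = (L - R) / 2, T = C.
EXAMPLE_MATRIX = [
    [0.5, 0.0, 0.5],
    [0.5, 0.0, -0.5],
    [0.0, 1.0, 0.0],
]


def transform_bin(matrix, lcr):
    """Multiply a 3*3 matrix by one [L, C, R] column of coefficients."""
    return [sum(row[k] * lcr[k] for k in range(3)) for row in matrix]


def encode_group(matrix, l, c, r, is_first_group):
    """Return the first encoding information of a channel group in one band."""
    m_out, s_out, t_out = [], [], []
    for lcr in zip(l, c, r):
        m, s, t = transform_bin(matrix, lcr)
        m_out.append(m)
        s_out.append(s)
        t_out.append(t)
    if is_first_group:
        return {"M": m_out, "S": s_out, "T": t_out}
    return {"T": t_out}  # remaining groups only output the first information


example_first = encode_group(EXAMPLE_MATRIX, [1.0, 2.0], [5.0, 6.0], [3.0, 2.0], True)
example_other = encode_group(EXAMPLE_MATRIX, [1.0, 2.0], [5.0, 6.0], [3.0, 2.0], False)
print(example_first, example_other)
```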
- S505 Obtain second encoding information of the channel group according to the first encoding information on each frequency band of the channel group.
- a target transformation matrix of the channel group may be determined according to first encoding information of the channel group in each frequency band, and a decorrelation pattern of the channel group in each frequency band may be determined based on the target transformation matrix, that is, second encoding information of the channel group.
- if the target transformation matrices corresponding to the different frequency bands are different, the decorrelation modes corresponding to the different frequency bands are also different.
- if the target transformation matrices corresponding to the different frequency bands are the same, the decorrelation modes corresponding to the different frequency bands are also the same.
- S506 Obtain coding information of the channel group based on the second coding information and the target transformation matrix corresponding to each frequency band.
- the frequency domain coefficients of the channel group in each frequency band may be decorrelated according to the second coding information by using the target transformation matrix corresponding to each frequency band, so as to obtain the coding information of the channel group in each frequency band.
- the coding information of the channel group in the whole frequency band may be obtained according to the coding information on each frequency band. It is understood that the coding information of the channel group includes the coding information of the channel group in all frequency bands.
- S507 Obtain a coded bitstream based on the coded information of the channel group, and send the coded bitstream to a decoder for decoding.
- S507 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- the audio signal in each channel is subjected to MDCT transformation to obtain the MDCT coefficients (frequency domain coefficients) of each channel in each frame.
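- For reference, a direct (unoptimized) evaluation of the forward MDCT can be sketched as follows, using the standard definition X[k] = sum over n of x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)) for a block of 2N samples. The disclosure does not state the frame length, window, or scaling, so windowing is omitted here and the function name is hypothetical.

```python
import math


def mdct(block):
    """Forward MDCT of one block of 2N samples, returning N coefficients.

    Direct O(N^2) evaluation without windowing; for illustration only.
    """
    if len(block) % 2 != 0 or not block:
        raise ValueError("block length must be a positive even number")
    n = len(block) // 2
    return [
        sum(
            block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n)
        )
        for k in range(n)
    ]


# Eight input samples produce four frequency domain coefficients.
example_coeffs = mdct([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
print(len(example_coeffs))
```

- A practical encoder would apply an analysis window and a fast algorithm; this sketch only shows how 2N time samples map to N frequency domain coefficients per frame.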
- the channel signal is input into the band division processing unit for frequency band division to obtain the frequency domain coefficients on each frequency band.
- the energy value of the channel on each frequency band is calculated by energy calculation, and the energy value is input into the cross-correlation calculation unit to obtain the cross-correlation coefficients between the channels on each frequency band, so as to obtain the first cross-correlation matrix between the channels on each frequency band.
- each channel group corresponds to a decorrelation unit
- the frequency domain coefficients of the three channels in the channel group and the first cross-correlation matrix of each frequency band are input into the decorrelation unit, and the decorrelation unit performs co-frequency decorrelation processing on the channel group to obtain the coding information of the channel group.
- the band division result of the frequency domain coefficients of channel 1, channel 2 and channel 3 in the first channel group and the first mutual correlation matrix of each frequency band are input into the decorrelation unit 1, and the decorrelation unit 1 performs the same frequency decorrelation processing, and outputs the coding information of channel group 1, and the coding information of channel group 1 includes the center information M1, the side information S1 and the first information T1;
- the band division result of the frequency domain coefficients of channel 2, channel 3 and channel 4 in channel group 2 and the first mutual correlation matrix of each frequency band are input into the decorrelation unit 2, and the decorrelation unit 2 performs the same frequency decorrelation processing, and outputs the coding information of channel group 2, and the coding information of channel group 2 includes the first information T2;
- the band division result of the frequency domain coefficients of channel 3, channel 4 and channel 5 in channel group 3 and the first mutual correlation matrix of each frequency band are input into the decorrelation unit 3, and the decorrelation unit 3 performs the same frequency decorrelation processing, and outputs the coding information of channel group 3, and the coding information of channel group 3 includes the first information T3.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- Fig. 7 is a flow chart of an audio decoding method provided by an embodiment of the present disclosure.
- the audio decoding method may be executed by a decoder. As shown in Fig. 7, the method may include but is not limited to steps S701 to S704.
- S701 Receive an encoded bitstream sent by an encoder, where the encoded bitstream includes encoding information of multiple channel groups.
- the channel groups are obtained by grouping the channel sequences in order, each channel group includes a number of continuous channels in the channel sequence, and there are one or more identical channels between adjacent channel groups.
- the decoder receives the encoded bitstream sent by the encoder, reads the encoding information of multiple channel groups from the encoded bitstream, and performs inverse transformation on the input channel signal to obtain the original channel signal.
- the encoder side can group the M channels in the channel sequence to obtain multiple channel groups.
- each channel group includes three consecutive channels in the channel sequence, and there are one or more identical channels between adjacent channel groups.
- the specific process can be referred to the above embodiment, which will not be repeated here.
- S702 Sequentially decode the multiple channel groups, and for the current channel group being decoded, determine a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group according to encoding information of the current channel group.
- second encoding information of the channel group in the entire frequency band may be obtained from the encoding information, the second encoding information including the first encoding information of each frequency band and the target transformation matrix corresponding to each frequency band.
- the target decoding matrix of the current channel group in any frequency band b is obtained.
- the transformation matrix corresponds to the decoding matrix one-to-one.
- the decoding matrix may include the following matrix:
- transformation matrix M0 corresponds to decoding matrix J0
- transformation matrix M1 corresponds to decoding matrix J1
- transformation matrix M2 corresponds to decoding matrix J2
- transformation matrix M3 corresponds to decoding matrix J3
- transformation matrix M4 corresponds to decoding matrix J4. If the target transformation matrix of the current channel group is M4, the target decoding matrix of the current channel group on any frequency band b is J4.
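- The one-to-one correspondence can be sketched as a lookup from the transformation matrix to its decoding matrix. The entries of J0 to J4 are not reproduced in this text; the sketch assumes that each decoding matrix is the inverse of its transformation matrix, which is an assumption rather than something the disclosure states, and it uses a hypothetical mid/side style matrix as the example.

```python
DECODING_NAME = {"M0": "J0", "M1": "J1", "M2": "J2", "M3": "J3", "M4": "J4"}


def invert_3x3(m):
    """Invert a 3*3 matrix using the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_3x3(matrix, vec):
    """Multiply a 3*3 matrix by a length-3 column vector."""
    return [sum(row[k] * vec[k] for k in range(3)) for row in matrix]


# Hypothetical transformation matrix: M = (L + R) / 2, S = (L - R) / 2, T = C.
example_m = [[0.5, 0.0, 0.5], [0.5, 0.0, -0.5], [0.0, 1.0, 0.0]]
example_j = invert_3x3(example_m)  # assumed decoding matrix

encoded = apply_3x3(example_m, [1.0, 5.0, 3.0])  # [M, S, T]
decoded = apply_3x3(example_j, encoded)          # back to [L, C, R]
print(DECODING_NAME["M4"], encoded, decoded)
```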
- the encoding information of the current channel group includes second encoding information of the channel group in the entire frequency band, and the second encoding information includes first encoding information of the channel group in each frequency band.
- the first encoding information of the any frequency band b is decoded to obtain the first decoded frequency domain coefficient of the current channel group on the any frequency band b.
- the decoded frequency domain coefficients of the channel group on the entire frequency band can be obtained.
- S704 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
- the signals of the channel groups may be converted from the frequency domain to the time domain based on the decoded frequency domain coefficients of the multiple channel groups.
- the frequency domain signals of the channels may be converted into time domain signals based on the decoded frequency domain coefficients using an inverse MDCT transform, thereby obtaining decoded audio signals of each channel.
- the decoder receives the coded bit stream sent by the encoder and obtains the coding information of each channel group therefrom.
- the coded information is decoded by a decoding unit in the order of multiple channel groups to obtain the decoded frequency domain coefficients of each channel group, and the decoded frequency domain coefficients of each channel group are converted from frequency domain to time domain to obtain a decoded audio signal of each channel.
- the decoder uses frequency division decoding similar to that on the encoder side to realize the recovery of multi-channel audio signals. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
- Fig. 8 is a flow chart of an audio decoding method provided by an embodiment of the present disclosure.
- the audio decoding method may be executed by a decoder. As shown in Fig. 8, the method may include but is not limited to steps S801 to S809.
- S801 Obtain first encoding information of the first channel group in any frequency band b from encoding information of the first channel group.
- each channel group includes three consecutive channels, wherein the first channel group includes channel 1, channel 2, and channel 3.
- coding information of the first channel group can be obtained from the coded bitstream, and the coding information of the first channel group includes first coding information of the first channel group in each frequency band, and the first coding information at least includes center information M1, side information S1, and first information T1 of the first channel group.
- S802 Decode first coded information of the first channel group in each frequency band based on a target decoding matrix of the first channel group in each frequency band to obtain first decoded frequency domain coefficients of the first channel group in any frequency band b.
- S803 Obtain second decoded frequency domain coefficients of the first channel group according to the first decoded frequency domain coefficients of the first channel group in each frequency band, wherein the second decoded frequency domain coefficients are output as three channels.
- the decoder can obtain the target transformation matrix of the first channel group on any frequency band b from the encoding information corresponding to the first channel group. In one implementation, a correspondence between the transformation matrices and the decoding matrices is pre-established, so that the target decoding matrix of the first channel group on any frequency band b can be determined based on the target transformation matrix on that frequency band.
- the first coded information on any frequency band b is inversely transformed to obtain the first decoded frequency domain coefficient of the first channel group on any frequency band b.
- the first encoding information of the any frequency band b is decoded to obtain the first decoded frequency domain coefficient of the first channel group on the any frequency band b.
- the first encoding information of the first channel group includes M1, S1 and T1
- the first decoded frequency domain coefficient on the any frequency band b after decoding includes three outputs
- the first decoded frequency domain coefficient may include
- the decoded frequency domain coefficients of the first channel group on the full frequency band can be obtained, wherein the decoded frequency domain coefficients of the first channel group on the full frequency band include three outputs:
- the decoding formula for the first channel group is as follows:
- [M1 S1 T1] represents the second coding information of the first channel group
- J is the target decoding matrix, Represents the three-way decoded frequency-domain coefficients of the first channel group.
- S804 Determine a plurality of decoded channel groups that are adjacent to and continuous with the current channel group as upmix channel groups corresponding to the current channel group.
- a plurality of decoded channel groups adjacent to and continuous with the current channel group may be determined as the upmix channel groups corresponding to the current channel group and used in its decoding operation.
- the plurality of decoded channel groups include two channel groups.
- if the current channel group is channel group 2, the corresponding upmix channel group includes the decoded frequency domain coefficients of the last two outputs of channel group 1; if the current channel group is channel group 3, the corresponding upmix channel group includes the decoded frequency domain coefficients of the last output of channel group 1 and one decoded frequency domain coefficient output by channel group 2; if the current channel group is channel group 4, the corresponding upmix channel group includes one decoded frequency domain coefficient output by channel group 2 and one decoded frequency domain coefficient output by channel group 3.
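- The bookkeeping behind these cases can be sketched as follows, assuming groups of three consecutive channels that advance by one channel, so that channel group 1 yields channels 1 to 3 and every later group k yields only channel k + 2. The function names are hypothetical.

```python
def decoding_group_of(channel):
    """Channel group whose decoding first yields the given 1-based channel."""
    return 1 if channel <= 3 else channel - 2


def upmix_sources(group):
    """List (channel, source group) pairs that feed the decoding of a group.

    Group i covers channels i, i + 1 and i + 2; the first two are already
    decoded and act as the upmix inputs for recovering channel i + 2.
    """
    if group < 2:
        raise ValueError("the first channel group has no upmix inputs")
    return [(ch, decoding_group_of(ch)) for ch in (group, group + 1)]


for g in (2, 3, 4):
    print(g, upmix_sources(g))
```

- For channel group 2 both inputs come from channel group 1, for channel group 3 they come from channel groups 1 and 2, and for channel group 4 from channel groups 2 and 3, in line with the cases listed above.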
- S805 Obtain first encoding information of the current channel group in any frequency band b from the encoding information.
- the decoder can obtain the first encoding information of the current channel group in any frequency band b from the encoding information of the current channel group, and the first encoding information is the first information Ti of the current channel group in any frequency band b.
- a target transformation matrix of the current upmix channel group in each frequency band can be determined from the coded information received by the decoder, and a target decoding matrix of the upmix channel group is determined based on the target transformation matrix.
- the decoder performs an inverse transformation using the target decoding matrix to obtain decoded frequency domain coefficients of the upmix channel group in each frequency band.
- the decoder obtains the target transformation matrix of the current channel group in each frequency band from the encoding information, and further determines the target decoding matrix of the current channel group in each frequency band based on the target transformation matrix.
- the first encoded information Ti on any frequency band b and the first decoded frequency domain coefficient of the upmixed channel group on any frequency band b are decoded to obtain the first decoded frequency domain coefficient of the current channel group on any frequency band b.
- the first coded information of the current channel group includes the first information Ti , and the first decoded frequency domain coefficients on any frequency band b after decoding include one output, and the first decoded frequency domain coefficients may include
- S808 Obtain a decoded frequency domain coefficient of the current channel group according to the first decoded frequency domain coefficients of each frequency band of the current channel group, wherein the decoded frequency domain coefficients of the current channel group are output as one channel.
- the decoded frequency domain coefficients of the current channel group on the full frequency band can be obtained, wherein the decoded frequency domain coefficients of the current channel group on the full frequency band include one output:
- the decoding formula for the current channel group i is as follows:
- S809 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
- S809 may be implemented using any method in the embodiments of the present disclosure; it is not limited here and is not repeated.
- Upmix decoding unit 1 is the first upmix decoding unit.
- the first coding information [M1 S1 T1] is input into the first upmix decoding unit, which outputs three decoded frequency domain coefficients. The first coded information T2 and the decoded frequency domain coefficients output by the first upmix decoding unit are input to the second upmix decoding unit, which outputs one decoded frequency domain coefficient. The first coded information T3, the decoded frequency domain coefficient output by the first upmix decoding unit and the decoded frequency domain coefficient output by the second upmix decoding unit are input to the third upmix decoding unit, which outputs one decoded frequency domain coefficient. In general, the first coded information Ti, the decoded frequency domain coefficient output by the (i-2)th upmix decoding unit and the decoded frequency domain coefficient output by the (i-1)th upmix decoding unit are input to the i-th upmix decoding unit, which outputs one decoded frequency domain coefficient.
- the decoder uses frequency division decoding similar to that on the encoder side to restore the multi-channel audio signal. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
- Fig. 10 is a schematic flow chart of an audio coding method provided by an embodiment of the present disclosure. As shown in Fig. 10, the method may include but is not limited to steps S1001 to S1015.
- S1001 Group a channel sequence to obtain a plurality of channel groups, each channel group including a plurality of continuous channels in the channel sequence, and one or more overlapping channels exist between adjacent channel groups.
- S1003 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
- S1004 Determine a second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band.
- the second cross-correlation matrix includes cross-correlation coefficients between channels in the channel group.
- S1005 Determine a target transformation matrix of the channel group in each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band.
- S1007 Obtain second encoding information of the channel group according to the first encoding information on each frequency band of the channel group.
- S1008 Obtain coding information of the channel group based on the second coding information and the target transformation matrix corresponding to each frequency band.
- S1009 Obtain a coded bitstream based on the coded information of the channel group, and send the coded bitstream to a decoder for decoding.
- S1010 Receive an encoded bitstream sent by an encoder.
- S1015 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden of the encoder, and reducing transmission and storage costs.
- the decoder uses frequency division decoding similar to that on the encoder side to achieve recovery of multi-channel audio signals.
- FIG11 is a block diagram of an audio encoding device according to an exemplary embodiment.
- the audio encoding device 1100 of the embodiment of the present disclosure includes: a channel grouping module 1101 , a frequency domain conversion module 1102 , a matrix determination module 1103 , an encoding module 1104 and a sending module 1105 .
- the channel grouping module 1101 is configured to perform grouping on the channel sequence to obtain a plurality of channel groups, each of which includes a plurality of continuous channels in the channel sequence, and one or more identical channels exist between adjacent channel groups.
- the frequency domain conversion module 1102 is configured to perform frequency domain conversion on the audio signal of each channel in the channel sequence frame by frame to obtain frequency domain coefficients of each frame of each channel.
- the matrix determination module 1103 is configured to determine, from a transformation matrix set, a target transformation matrix for each frequency band in a frequency band set corresponding to a channel group according to frequency domain coefficients of each channel.
- the encoding module 1104 is configured to perform, based on the target transformation matrix of each frequency band, same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group to obtain encoding information of the channel group.
- the sending module 1105 is configured to obtain an encoded bitstream based on the encoding information of the channel group, and send the encoded bitstream to the decoder for decoding.
- the matrix determination module 1103 is further configured to perform: determining a first cross-correlation matrix between channels corresponding to each frequency band based on the frequency domain coefficients of each channel; determining a second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band, the second cross-correlation matrix including cross-correlation coefficients between channels in the channel group; and determining a target transformation matrix of the channel group on each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group on each frequency band.
- the encoding module 1104 is further configured to perform: obtaining frequency domain coefficients of the channels in the channel group on any frequency band b, and obtaining first encoding information of the channel group on the frequency band b according to the frequency domain coefficients on the frequency band b and the target transformation matrix corresponding to the frequency band b; obtaining second encoding information of the channel group according to the first encoding information on each frequency band of the channel group; and obtaining the encoding information of the channel group based on the second encoding information and the target transformation matrix corresponding to each frequency band.
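The band-wise encoding performed by the encoding module can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: each band has its own 3x3 target transformation matrix, formula (7) is applied per coefficient with row vectors ([L C R]*M=[M S T]), and the per-band results are simply concatenated across bands. The name `encode_bands` and the `band_edges` layout (half-open index ranges) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def encode_bands(l: Sequence[float], c: Sequence[float], r: Sequence[float],
                 band_edges: Sequence[Tuple[int, int]],
                 matrices: Sequence[Matrix]
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Transform each band with its own matrix and concatenate the results.

    band_edges[b] = (lo, hi) is the half-open coefficient range of band b;
    matrices[b] is the (assumed) target transformation matrix of band b.
    """
    m_out: List[float] = []
    s_out: List[float] = []
    t_out: List[float] = []
    for (lo, hi), mat in zip(band_edges, matrices):
        for n in range(lo, hi):
            vec = (l[n], c[n], r[n])
            # Row vector times matrix: output k = sum_j vec[j] * mat[j][k].
            m_out.append(sum(vec[j] * mat[j][0] for j in range(3)))
            s_out.append(sum(vec[j] * mat[j][1] for j in range(3)))
            t_out.append(sum(vec[j] * mat[j][2] for j in range(3)))
    return m_out, s_out, t_out
```

Choosing a matrix per band lets a strongly correlated band use a sum/difference style transform while a weakly correlated band can fall back to the identity.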
- the matrix determination module 1103 is further configured to perform: determining the cross-correlation coefficient between each pair of channels in the channel group on any frequency band b based on the second cross-correlation matrix on the frequency band b; and determining the target transformation matrix of the channel group on the frequency band b from the transformation matrix set according to the cross-correlation coefficients between the pairs of channels in the channel group on the frequency band b.
- the matrix determination module 1103 is further configured to perform: if the cross-correlation coefficients between the pairs of channels meet the condition for selecting a specified transformation matrix in the transformation matrix set, selecting the specified transformation matrix as the target transformation matrix for the frequency band b; if the cross-correlation coefficients between the pairs of channels do not meet the condition, selecting, according to the maximum cross-correlation coefficient between the pairs of channels, a transformation matrix other than the specified transformation matrix from the transformation matrix set as the target transformation matrix for the frequency band b.
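The two-branch selection rule can be sketched in Python. The text does not state the actual condition or the contents of the transformation matrix set, so everything concrete here is an assumption: the condition is taken to be "all three pairwise coefficients reach a threshold", the threshold value 0.5 is arbitrary, and the returned strings are hypothetical keys into a matrix set.

```python
from typing import Dict, Sequence, Tuple


def select_transform_key(corr: Sequence[Sequence[float]],
                         threshold: float = 0.5) -> str:
    """Pick a key into a (hypothetical) transformation matrix set for one band.

    corr is the 3x3 second cross-correlation matrix of a three-channel group.
    """
    pairs: Dict[Tuple[int, int], float] = {
        (0, 1): abs(corr[0][1]),
        (0, 2): abs(corr[0][2]),
        (1, 2): abs(corr[1][2]),
    }
    # Assumed condition: every pair is strongly correlated.
    if all(value >= threshold for value in pairs.values()):
        return "specified"
    # Otherwise fall back to the matrix tied to the most correlated pair.
    i, j = max(pairs, key=lambda pair: pairs[pair])
    return f"pair_{i}{j}"
```

For example, a band in which only the first and third channels are strongly correlated would map to the key `pair_02` under this assumed rule.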
- the matrix determination module 1103 is further configured to execute: determining the energy value of any channel in each frequency band according to the frequency domain coefficient of any channel; and determining the first cross-correlation matrix corresponding to each frequency band according to the energy value of each channel in each frequency band.
- the matrix determination module 1103 is further configured to perform: obtaining the energy ratio of each pair of channels in the channel sequence on any frequency band b; if the energy ratio on the frequency band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, determining that the cross-correlation coefficient between that pair of channels on the frequency band b is zero, wherein the first set threshold is less than the second set threshold; if the energy ratio is between the first set threshold and the second set threshold, determining the cross-correlation coefficient between that pair of channels on the frequency band b according to the frequency domain coefficients of the pair of channels on the frequency band b; and obtaining the first cross-correlation matrix corresponding to the frequency band b based on the cross-correlation coefficients between each pair of channels on the frequency band b.
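The energy-ratio gate can be illustrated with a short Python sketch. The threshold values 0.1 and 10.0, the use of a plain inner product as the unnormalized cross-correlation, and the handling of a silent channel (coefficient set to zero) are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List, Sequence


def band_energy(coeffs: Sequence[float], lo: int, hi: int) -> float:
    """Sum of squared frequency domain coefficients in the band [lo, hi)."""
    return sum(c * c for c in coeffs[lo:hi])


def gated_correlation(x: Sequence[float], y: Sequence[float], lo: int, hi: int,
                      low_thr: float = 0.1, high_thr: float = 10.0) -> float:
    """Cross-correlation of two channels on one band, zeroed by the energy gate."""
    ex, ey = band_energy(x, lo, hi), band_energy(y, lo, hi)
    if ex == 0 or ey == 0:
        return 0.0  # assumption: a silent channel is treated as uncorrelated
    ratio = ex / ey
    if ratio <= low_thr or ratio >= high_thr:
        return 0.0  # energies too different: skip the correlation computation
    return sum(a * b for a, b in zip(x[lo:hi], y[lo:hi]))


def first_correlation_matrix(channels: Sequence[Sequence[float]], lo: int, hi: int,
                             low_thr: float = 0.1,
                             high_thr: float = 10.0) -> List[List[float]]:
    """Symmetric matrix of gated cross-correlations for one band."""
    n = len(channels)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = gated_correlation(channels[i], channels[j], lo, hi,
                                      low_thr, high_thr)
            mat[i][j] = mat[j][i] = value
    return mat
```

With these defaults the diagonal holds each channel's band energy, since a channel compared with itself has an energy ratio of 1.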
- the matrix determination module 1103 is further configured to perform: normalizing the first cross-correlation matrix of each frequency band, and extracting the second cross-correlation matrix of each frequency band corresponding to the channel group from the normalized first cross-correlation matrix of each frequency band according to the channels included in the channel group.
- the matrix determination module 1103 is further configured to perform: determining the channel identifiers associated with any matrix element in the first cross-correlation matrix; determining the normalizing matrix element corresponding to that matrix element based on the associated channel identifiers; and obtaining the normalized result of that matrix element based on the matrix element and the normalizing matrix element.
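One plausible reading of the normalization and extraction steps is sketched below. The text does not give the normalization formula, so the sketch assumes that element (i, j) is divided by the square root of the product of the diagonal elements of channels i and j, which is the usual way to turn a correlation matrix into coefficients in [-1, 1]. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_correlation(mat: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide element (i, j) by sqrt(mat[i][i] * mat[j][j]) (assumed rule)."""
    n = len(mat)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(mat[i][i] * mat[j][j])
            out[i][j] = mat[i][j] / denom if denom > 0 else 0.0
    return out


def extract_group_matrix(mat: Sequence[Sequence[float]],
                         channel_indices: Sequence[int]) -> List[List[float]]:
    """Keep only the rows and columns of the channels in one channel group."""
    return [[mat[i][j] for j in channel_indices] for i in channel_indices]
```

Here the channel identifiers of an element are simply its row and column indices, and the second cross-correlation matrix of a group is the sub-matrix selected by the group's channel indices.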
- the encoding module 1104 is further configured to perform: for the first channel group, obtaining first encoding information of the first channel group on any frequency band b according to the frequency domain coefficients of the channels in the first channel group on the frequency band b and the target transformation matrix corresponding to the frequency band b, the first encoding information including the center information, the side information and the first information of the first channel group; for each remaining channel group other than the first channel group, obtaining first encoding information of the remaining channel group on the frequency band b according to the frequency domain coefficients of the channels in the remaining channel group on the frequency band b and the target transformation matrix corresponding to the frequency band b, wherein the first encoding information includes the first information of the remaining channel group.
- the channel grouping module 1101 is further configured to execute: determining that adjacent channel groups include a first channel group and a second channel group, wherein the first channel group and the second channel group respectively include three consecutive channels in a channel sequence, and the first channel group and the second channel group include two identical channels.
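The overlapping grouping can be shown with a small sliding-window sketch in Python. The default group size of three and overlap of two follow the example in the text; the function name `group_channels` and the channel labels in the usage note are illustrative assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_channels(channels: Sequence[T], group_size: int = 3,
                   overlap: int = 2) -> List[List[T]]:
    """Split a channel sequence into consecutive groups sharing `overlap` channels."""
    if not 0 < overlap < group_size:
        raise ValueError("overlap must be positive and smaller than group_size")
    step = group_size - overlap
    last_start = len(channels) - group_size
    return [list(channels[i:i + group_size])
            for i in range(0, last_start + 1, step)]
```

For a five-channel sequence labelled L, C, R, Ls, Rs this yields the groups (L, C, R), (C, R, Ls) and (R, Ls, Rs), so each group after the first contributes exactly one new channel.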
- the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
- Fig. 12 is a block diagram of an audio decoding device according to an exemplary embodiment.
- the audio decoding device 1200 according to the embodiment of the present disclosure includes: a receiving module 1201 , a matrix determining module 1202 , and a decoding module 1203 .
- the receiving module 1201 is configured to receive the encoded bitstream sent by the encoder, wherein the encoded bitstream includes encoding information of multiple channel groups, the channel groups are obtained by grouping the channel sequence in order, each channel group includes several consecutive channels in the channel sequence, and there are one or more identical channels between adjacent channel groups.
- the matrix determination module 1202 is configured to decode the multiple channel groups in order and, for the current channel group being decoded, determine a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group according to the encoding information of the current channel group.
- the decoding module 1203 is configured to decode the encoding information of the current channel group based on the target decoding matrix of the current channel group on each frequency band to obtain the decoded frequency domain coefficients of the current channel group, and to obtain the decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple channel groups.
- the matrix determination module 1202 is further configured to execute: obtaining a target transformation matrix for each frequency band from the encoding information; querying a mapping relationship between the transformation matrix and the decoding matrix based on the target transformation matrix for any frequency band b, and obtaining a target decoding matrix for the current channel group on any frequency band b.
- the decoding module 1203 is further configured to perform: obtaining first encoding information of the first channel group on any frequency band b from the encoding information of the first channel group; decoding the first encoding information of the first channel group on the frequency band b based on the target decoding matrix of the first channel group on the frequency band b to obtain first decoded frequency domain coefficients of the first channel group on the frequency band b; and obtaining decoded frequency domain coefficients of the first channel group according to the first decoded frequency domain coefficients of the first channel group on each frequency band, wherein the first decoded frequency domain coefficients and the decoded frequency domain coefficients of the first channel group each include three outputs.
- the first encoding information of the first channel group on any frequency band b includes at least the center information, the side information and the first information on the frequency band b.
- the decoding module 1203 is further configured to perform: determining a number of decoded channel groups adjacent to and continuous with the current channel group as the upmixed channel group corresponding to the current channel group; obtaining the first encoding information of the current channel group on any frequency band b from the encoding information; obtaining the decoded frequency domain coefficients of the upmixed channel group on each frequency band; decoding the first encoding information on any frequency band b according to the target decoding matrix corresponding to any frequency band b and the decoded frequency domain coefficients on any frequency band b to obtain the first decoded frequency domain coefficients on any frequency band b; obtaining the decoded frequency domain coefficients of the current channel group according to the first decoded frequency domain coefficients on each frequency band of the current channel group, and the decoded frequency domain coefficients of the current channel group include one output.
- the first encoding information of the current channel group on any frequency band b includes the first information of the current channel group on the frequency band b.
- the decoder uses frequency division decoding similar to that on the encoder side to achieve recovery of multi-channel audio signals. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
- FIG13 is a schematic diagram of the structure of another audio processing device 1300 provided in an embodiment of the present disclosure.
- the audio processing device 1300 may be an encoder, or a decoder, or a chip, a chip system, or a processor that supports the encoder to implement the above method, or a chip, a chip system, or a processor that supports the decoder to implement the above method.
- the device may be used to implement the method described in the above method embodiment, and the details may refer to the description in the above method embodiment.
- the audio processing device 1300 may include one or more processors 1301.
- the processor 1301 may be a general-purpose processor or a dedicated processor, etc. For example, it may be a baseband processor or a central processing unit.
- the baseband processor may be used to process the communication protocol and communication data
- the central processing unit may be used to control the audio processing device (such as a base station, a baseband chip, a decoder, a decoder chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.
- the audio processing device 1300 may further include one or more memories 1302, on which a computer program 1304 may be stored, and the processor 1301 executes the computer program 1304 to enable the audio processing device 1300 to perform the method described in the above method embodiment.
- data may also be stored in the memory 1302.
- the audio processing device 1300 and the memory 1302 may be provided separately or integrated together.
- the audio processing device 1300 may further include a transceiver 1305 and an antenna 1306.
- the transceiver 1305 may also be referred to as a transceiver unit or a transceiver circuit, etc., and is used to implement a transceiver function.
- the transceiver 1305 may include a receiver and a transmitter; the receiver may also be referred to as a receiving circuit, etc., and is used to implement a receiving function; the transmitter may also be referred to as a transmitting circuit, etc., and is used to implement a transmitting function.
- the audio processing device 1300 may further include one or more interface circuits 1307.
- the interface circuit 1307 is used to receive code instructions and transmit them to the processor 1301.
- the processor 1301 executes the code instructions to enable the audio processing device 1300 to perform the method described in the above method embodiment.
- the processor 1301 may include a transceiver for implementing the receiving and sending functions.
- the transceiver may be a transceiver circuit, an interface, or an interface circuit.
- the transceiver circuit, interface, or interface circuit for implementing the receiving and sending functions may be separate or integrated.
- the above-mentioned transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or the above-mentioned transceiver circuit, interface, or interface circuit may be used for transmitting or delivering signals.
- the processor 1301 may store a computer program 1303, which runs on the processor 1301 and enables the audio processing device 1300 to perform the method described in the above method embodiment.
- the computer program 1303 may be fixed in the processor 1301, in which case the processor 1301 may be implemented by hardware.
- the audio processing device 1300 may include a circuit that can implement the functions of sending or receiving or communicating in the aforementioned method embodiments.
- the processor and transceiver described in the present disclosure may be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit RFIC, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
- the processor and transceiver may also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
- the audio processing device described in the above embodiments may be an encoder or a decoder, but the scope of the audio processing device described in the present disclosure is not limited thereto, and the structure of the audio processing device may not be limited by FIG. 13.
- the audio processing device may be an independent device or may be part of a larger device.
- the audio processing device may be:
- a set of one or more ICs, which in some embodiments may also include a storage component for storing data and computer programs;
- an ASIC, such as a modem.
- where the audio processing device is a chip or a chip system, refer to the schematic diagram of the chip structure shown in FIG. 14.
- the chip shown in FIG. 14 includes a processor 1401 and an interface 1402.
- the number of processors 1401 can be one or more, and the number of interfaces 1402 can be multiple.
- the chip further includes a memory 1403, which is used to store necessary computer programs and data.
- the chip can be used to implement the functions of the decoder in the above-mentioned embodiments of the present disclosure.
- the chip can be used to implement the functions of the encoder in the above-mentioned embodiments of the present disclosure.
- the present disclosure also provides an audio processing system, which includes the audio processing device as an encoder and the audio processing device as a decoder in the aforementioned embodiment of FIG. 13, or the system includes the audio processing device as an encoder and the audio processing device as a decoder in the aforementioned embodiment of FIG. 14.
- the embodiment of the present disclosure also provides a readable storage medium on which instructions are stored. When the instructions are executed by a computer, the functions of any of the above method embodiments are implemented.
- the embodiments of the present disclosure also provide a computer program product including a computer program, which implements the functions of any of the above method embodiments when executed by a computer.
- the present disclosure also provides a computer program, which, when executed on a computer, enables the computer to execute the functions of any of the above method embodiments.
- all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
- all or part of the embodiments may be implemented in the form of a computer program product.
- the computer program product includes one or more computer programs.
- when the computer program is loaded and executed on a computer, the processes or functions described in the embodiments of the present disclosure are generated in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer program may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another computer-readable storage medium.
- for example, the computer program can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.
- the computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
- "at least one" in the present disclosure may also be described as one or more, and "a plurality" may be two, three, four or more, which is not limited in the present disclosure.
- where technical features of the same kind are distinguished by "first", "second", "third", "A", "B", "C" and "D", etc., there is no order of precedence or magnitude among the technical features so described.
- the correspondences shown in the tables of the present disclosure can be configured or predefined.
- the values of the information in each table are only examples and can be configured to other values, which are not limited by the present disclosure.
- the correspondences shown in some rows may not be configured.
- appropriate deformation adjustments can be made based on the above table, such as splitting, merging, etc.
- the names of the parameters shown in the titles of the above tables can also use other names that can be understood by the audio processing device, and the values or representations of the parameters can also use other values or representations that can be understood by the audio processing device.
- other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
- "predefined" in the present disclosure may be understood as defined, defined in advance, stored, pre-stored, pre-negotiated, pre-configured, hard-wired, or pre-burned.
Abstract
Description
f=n*fs/2N (1)
[L C R]*M=[M S T] (7)
Claims (27)
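- An audio encoding method, characterized in that it is performed by an encoder, the method comprising: grouping a channel sequence to obtain a plurality of channel groups, each channel group including several consecutive channels in the channel sequence, with one or more identical channels between adjacent channel groups; performing frequency domain conversion frame by frame on the audio signal of each channel in the channel sequence to obtain frequency domain coefficients of each frame of each channel; determining, from a transformation matrix set and according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in a frequency band set corresponding to the channel group; performing, based on the target transformation matrix of each frequency band, same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group to obtain encoding information of the channel group; and obtaining an encoded bitstream based on the encoding information of the channel group, and sending the encoded bitstream to a decoder for decoding.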
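- The method according to claim 1, characterized in that determining, from the transformation matrix set and according to the frequency domain coefficients of each channel, the target transformation matrix of each frequency band corresponding to the channel group comprises: determining, according to the frequency domain coefficients of each channel, a first cross-correlation matrix between channels corresponding to each frequency band; determining a second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band, the second cross-correlation matrix including cross-correlation coefficients between channels in the channel group; and determining, based on the second cross-correlation matrix of the channel group on each frequency band, the target transformation matrix of the channel group on each frequency band from the transformation matrix set.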
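- The method according to claim 1 or 2, characterized in that performing, based on the target transformation matrix of each frequency band, same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group to obtain the encoding information of the channel group comprises: obtaining frequency domain coefficients of the channels in the channel group on any frequency band b, and obtaining first encoding information of the channel group on the frequency band b according to the frequency domain coefficients on the frequency band b and the target transformation matrix corresponding to the frequency band b; obtaining second encoding information of the channel group according to the first encoding information on each frequency band of the channel group; and obtaining the encoding information of the channel group based on the second encoding information and the target transformation matrix corresponding to each frequency band.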
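- The method according to claim 2, characterized in that determining, based on the second cross-correlation matrix of the channel group on each frequency band, the target transformation matrix of the channel group on each frequency band from the transformation matrix set comprises: determining, based on the second cross-correlation matrix on any frequency band b, the cross-correlation coefficient between each pair of channels in the channel group on the frequency band b; and determining, according to the cross-correlation coefficients between the pairs of channels in the channel group on the frequency band b, the target transformation matrix of the channel group on the frequency band b from the transformation matrix set.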
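- The method according to claim 4, characterized in that determining, according to the cross-correlation coefficients between the pairs of channels in the channel group on any frequency band b, the target transformation matrix of the channel group on the frequency band b from the transformation matrix set comprises: if the cross-correlation coefficients between the pairs of channels meet the condition for selecting a specified transformation matrix in the transformation matrix set, selecting the specified transformation matrix as the target transformation matrix of the frequency band b; and if the cross-correlation coefficients between the pairs of channels do not meet the condition, selecting, according to the maximum cross-correlation coefficient between the pairs of channels, a transformation matrix other than the specified transformation matrix from the transformation matrix set as the target transformation matrix of the frequency band b.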
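- The method according to claim 2, characterized in that determining, according to the frequency domain coefficients of each channel, the first cross-correlation matrix between channels corresponding to each frequency band comprises: determining, according to the frequency domain coefficients of any channel, the energy value of that channel on each frequency band; and determining the first cross-correlation matrix corresponding to each frequency band according to the energy value of each channel on each frequency band.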
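- The method according to claim 6, characterized in that determining the first cross-correlation matrix corresponding to each frequency band according to the energy value of each channel on each frequency band comprises: obtaining the energy ratio of each pair of channels in the channel sequence on any frequency band b; if the energy ratio on the frequency band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, determining that the cross-correlation coefficient between the pair of channels on the frequency band b is zero, wherein the first set threshold is less than the second set threshold; if the energy ratio on the frequency band b is between the first set threshold and the second set threshold, determining the cross-correlation coefficient between the pair of channels on the frequency band b according to the frequency domain coefficients of the pair of channels on the frequency band b; and obtaining the first cross-correlation matrix corresponding to the frequency band b based on the cross-correlation coefficients between the pairs of channels on the frequency band b.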
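- The method according to claim 2, characterized in that determining the second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band further comprises: normalizing the first cross-correlation matrix of each frequency band, and extracting, according to the channels included in the channel group, the second cross-correlation matrix of each frequency band corresponding to the channel group from the normalized first cross-correlation matrix of each frequency band.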
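- The method according to claim 7, characterized in that normalizing the first cross-correlation matrix comprises: determining the channel identifiers associated with any matrix element in the first cross-correlation matrix; determining, according to the associated channel identifiers, the normalizing matrix element corresponding to that matrix element; and obtaining the normalized result of that matrix element according to the matrix element and the normalizing matrix element.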
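- The method according to any one of claims 3 to 9, characterized in that the method further comprises: for the first channel group, obtaining first encoding information of the first channel group on any frequency band b according to the frequency domain coefficients of the channels in the first channel group on the frequency band b and the target transformation matrix corresponding to the frequency band b, the first encoding information including center information, side information and first information of the first channel group; and for each remaining channel group other than the first channel group, obtaining first encoding information of the remaining channel group on the frequency band b according to the frequency domain coefficients of the channels in the remaining channel group on the frequency band b and the target transformation matrix corresponding to the frequency band b, the first encoding information including first information of the remaining channel group.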
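- The method according to any one of claims 1 to 9, characterized in that the existence of one or more overlapping channels between the adjacent channel groups comprises: determining that the adjacent channel groups include a first channel group and a second channel group, wherein the first channel group and the second channel group each include three consecutive channels in the channel sequence, and the first channel group and the second channel group include two identical channels.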
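- An audio decoding method, characterized in that it is performed by a decoder, the method comprising: receiving an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of a plurality of channel groups, the channel groups being obtained by grouping a channel sequence in order, each channel group including several consecutive channels in the channel sequence, with one or more identical channels between adjacent channel groups; decoding the plurality of channel groups in order and, for the current channel group being decoded, determining a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group according to the encoding information of the current channel group; decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group on each frequency band to obtain decoded frequency domain coefficients of the current channel group; and obtaining a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.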
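- The method according to claim 12, characterized in that determining the target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to the encoding information of the current channel group comprises: obtaining the target transformation matrix of each frequency band from the encoding information; and querying, according to the target transformation matrix of any frequency band b, a mapping relationship between transformation matrices and decoding matrices to obtain the target decoding matrix of the current channel group on the frequency band b.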
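- The method according to claim 12 or 13, characterized in that, where the current channel group is the first channel group among the plurality of channel groups, decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group on each frequency band to obtain the decoded frequency domain coefficients of the current channel group comprises: obtaining, from the encoding information of the first channel group, first encoding information of the first channel group on any frequency band b; decoding the first encoding information of the first channel group on the frequency band b based on the target decoding matrix of the first channel group on the frequency band b to obtain first decoded frequency domain coefficients of the first channel group on the frequency band b; and obtaining the decoded frequency domain coefficients of the first channel group according to the first decoded frequency domain coefficients of the first channel group on each frequency band, wherein the first decoded frequency domain coefficients and the decoded frequency domain coefficients of the first channel group each include three outputs.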
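- The method according to claim 14, characterized in that the first encoding information of the first channel group on any frequency band b includes at least center information, side information and first information on the frequency band b.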
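- The method according to claim 12 or 13, characterized in that, where the current channel group is a channel group other than the first channel group among the plurality of channel groups, decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group on each frequency band to obtain the decoded frequency domain coefficients of the current channel group comprises: determining several decoded channel groups that are adjacent to and continuous with the current channel group as the upmixed channel groups corresponding to the current channel group; obtaining, from the encoding information, first encoding information of the current channel group on any frequency band b; obtaining decoded frequency domain coefficients of the upmixed channel groups on each frequency band; decoding the first encoding information on the frequency band b according to the target decoding matrix corresponding to the frequency band b and the decoded frequency domain coefficients on the frequency band b to obtain first decoded frequency domain coefficients on the frequency band b; and obtaining the decoded frequency domain coefficients of the current channel group according to the first decoded frequency domain coefficients on each frequency band of the current channel group, the decoded frequency domain coefficients of the current channel group being one output.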
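- The method according to claim 16, characterized in that the first encoding information of the current channel group on any frequency band b includes first information of the current channel group on the frequency band b.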
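- An audio encoding device, characterized by comprising: a channel grouping module configured to group a channel sequence to obtain a plurality of channel groups, each channel group including several consecutive channels in the channel sequence, with one or more identical channels between adjacent channel groups; a frequency domain processing module configured to perform frequency domain conversion frame by frame on the audio signal of each channel in the channel sequence to obtain frequency domain coefficients of each frame of each channel; a matrix determination module configured to determine, from a transformation matrix set and according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in a frequency band set corresponding to the channel group; an encoding module configured to perform, based on the target transformation matrix of each frequency band, same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group to obtain encoding information of the channel group; and a sending module configured to obtain an encoded bitstream based on the encoding information of the channel group and send the encoded bitstream to a decoder for decoding.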
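- An audio decoding device, characterized by comprising: a receiving module configured to receive an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of a plurality of channel groups, the channel groups being obtained by grouping a channel sequence in order, each channel group including several consecutive channels in the channel sequence, with one or more identical channels between adjacent channel groups; a matrix determination module configured to decode the plurality of channel groups in order and, for the current channel group being decoded, determine a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group according to the encoding information of the current channel group; and a decoding module configured to decode the encoding information of the current channel group based on the target decoding matrix of the current channel group on each frequency band to obtain decoded frequency domain coefficients of the current channel group, and to obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.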
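- An encoder, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method according to any one of claims 1 to 11.
- A decoder, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method according to any one of claims 12 to 17.
- A computer-readable storage medium having computer program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.
- A computer-readable storage medium having computer program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 12 to 17.
- A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the audio encoding method according to any one of claims 1 to 11.
- A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the audio decoding method according to any one of claims 12 to 17.
- A computer program, characterized in that, when run on a computer, it causes the computer to perform the audio encoding method according to any one of claims 1 to 11.
- A computer program, characterized in that, when run on a computer, it causes the computer to perform the audio decoding method according to any one of claims 12 to 17.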
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020257037772A KR20250168677A (ko) | 2023-04-14 | 2024-04-12 | Audio encoding method, apparatus, electronic device and storage medium |
| EP24788243.4A EP4697325A4 (en) | 2023-04-14 | 2024-04-12 | AUDIO CODING METHOD AND APPARATUS, AND ELECTRONIC DEVICE AND STORAGE MEDIA |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310403661.8 | 2023-04-14 | ||
| CN202310403661.8A CN116434760A (zh) | 2023-04-14 | 2023-04-14 | Audio encoding method, apparatus, electronic device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024213147A1 true WO2024213147A1 (zh) | 2024-10-17 |
Family
ID=87079338
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/087626 Ceased WO2024213147A1 (zh) | Audio encoding method, apparatus, electronic device and storage medium | 2023-04-14 | 2024-04-12 |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4697325A4 (zh) |
| KR (1) | KR20250168677A (zh) |
| CN (1) | CN116434760A (zh) |
| WO (1) | WO2024213147A1 (zh) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
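| CN116434760A (zh) * | 2023-04-14 | 2023-07-14 | 北京小米移动软件有限公司 | Audio encoding method, apparatus, electronic device and storage medium |
| CN117730532A (zh) * | 2023-10-31 | 2024-03-19 | 北京小米移动软件有限公司 | Encoding and decoding method, terminal, network device and storage medium |
| CN120108406B (zh) * | 2023-11-30 | 2025-11-28 | 荣耀终端股份有限公司 | Audio processing method, in-vehicle audio device, electronic device and vehicle |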
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101071570A (zh) * | 2007-06-21 | 2007-11-14 | 北京中星微电子有限公司 | Encoding and decoding processing method for coupled channels, audio encoding device and decoding device |
| US20090112606A1 (en) * | 2007-10-26 | 2009-04-30 | Microsoft Corporation | Channel extension coding for multi-channel source |
| US20120020482A1 (en) * | 2010-07-22 | 2012-01-26 | Samsung Electronics Co., Ltd. | Apparatus and method for encoding and decoding multi-channel audio signal |
| CN102982805A (zh) * | 2012-12-27 | 2013-03-20 | 北京理工大学 | Multi-channel audio signal compression method based on tensor decomposition |
| CN103400582A (zh) * | 2013-08-13 | 2013-11-20 | 武汉大学 | Encoding and decoding method and system for multi-channel three-dimensional audio |
| CN104240712A (zh) * | 2014-09-30 | 2014-12-24 | 武汉大学深圳研究院 | Three-dimensional audio multi-channel grouping and clustering coding method and system |
| CN113948095A (zh) * | 2020-07-17 | 2022-01-18 | 华为技术有限公司 | Encoding and decoding method and apparatus for multi-channel audio signal |
| CN116434760A (zh) * | 2023-04-14 | 2023-07-14 | 北京小米移动软件有限公司 | Audio encoding method, apparatus, electronic device and storage medium |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7502743B2 (en) * | 2002-09-04 | 2009-03-10 | Microsoft Corporation | Multi-channel audio encoding and decoding with multi-channel transform selection |
| WO2006056100A1 (en) * | 2004-11-24 | 2006-06-01 | Beijing E-World Technology Co., Ltd | Coding/decoding method and device utilizing intra-channel signal redundancy |
| MX2007014570A (es) * | 2005-05-25 | 2008-02-11 | Koninkl Philips Electronics Nv | Predictive encoding of a multi-channel signal |
| US8190425B2 (en) * | 2006-01-20 | 2012-05-29 | Microsoft Corporation | Complex cross-correlation parameters for multi-channel audio |
| US8046214B2 (en) * | 2007-06-22 | 2011-10-25 | Microsoft Corporation | Low complexity decoder for complex transform coding of multi-channel sound |
| KR101698439B1 (ko) * | 2010-04-09 | 2017-01-20 | 돌비 인터네셔널 에이비 | MDCT-based complex prediction stereo coding |
| CN105336334B (zh) * | 2014-08-15 | 2021-04-02 | 北京天籁传音数字技术有限公司 | Multi-channel sound signal encoding method, decoding method and device |
| EP4440151A4 (en) * | 2021-11-26 | 2024-11-27 | Beijing Xiaomi Mobile Software Co., Ltd. | METHOD AND DEVICE FOR STEREO AUDIO SIGNAL PROCESSING, ENCODING DEVICE, DECODING DEVICE AND STORAGE MEDIUM |
- 2023
  - 2023-04-14: CN application CN202310403661.8A, published as CN116434760A (zh), status: Pending
- 2024
  - 2024-04-12: WO application PCT/CN2024/087626, published as WO2024213147A1 (zh), status: Ceased
  - 2024-04-12: KR application KR1020257037772A, published as KR20250168677A (ko), status: Pending
  - 2024-04-12: EP application EP24788243.4A, published as EP4697325A4 (en), status: Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4697325A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250168677A (ko) | 2025-12-02 |
| EP4697325A4 (en) | 2026-04-08 |
| EP4697325A1 (en) | 2026-02-18 |
| CN116434760A (zh) | 2023-07-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
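| WO2024213147A1 (zh) | | Audio encoding method, apparatus, electronic device and storage medium |
| CN112822491B (zh) | | Image data encoding and decoding method and apparatus |
| EP3874492A1 (en) | | Determination of spatial audio parameter encoding and associated decoding |
| EP3732678A1 (en) | | Determination of spatial audio parameter encoding and associated decoding |
| EP3973460A1 (en) | | Linear neural reconstruction for deep neural network compression |
| CN103167289B (zh) | | Image encoding and decoding methods and encoding and decoding apparatuses |
| WO2024164284A1 (zh) | | Audio signal processing, apparatus, device and storage medium |
| JP2023510556A (ja) | | Audio encoding and decoding method and audio encoding and decoding device |
| EP3991170A1 (en) | | Determination of spatial audio parameter encoding and associated decoding |
| US20140372389A1 (en) | | Data Encoding and Processing Columnar Data |
| CN102547291B (zh) | | FPGA-based JPEG2000 image decoding apparatus and method |
| CN109391358B (zh) | | Polar code encoding method and apparatus |
| KR100804640B1 (ko) | | Subband synthesis filtering method and apparatus |
| EP3367575A1 (en) | | Removal of dummy bits prior to bit collection for 3gpp lte circular buffer rate matching |
| JP2026514117A (ja) | | Audio encoding method, apparatus, electronic device and storage medium |
| WO2025138715A1 (zh) | | Image processing method and related device |
| CN109076224B (zh) | | Video decoder and manufacturing method thereof, and data processing circuit, system and method |
| CN107436876A (zh) | | File splitting system and method |
| KR102938940B1 (ko) | | Multi-channel audio signal encoding/decoding method and apparatus |
| JPWO2020009082A1 (ja) | | Encoding device and encoding method |
| WO2023202296A1 (zh) | | Signal processing method and device |
| WO2024108449A1 (zh) | | Signal quantization method, apparatus, device and storage medium |
| US9955163B2 (en) | | Two pass quantization of video data |
| WO2024020904A1 (zh) | | Method and apparatus for sending and receiving phase shift configuration of an intelligent reflecting surface (IRS) |
| CN112715009B (zh) | | Precoding matrix indication method, communication apparatus and storage medium |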
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24788243; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025559952; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025559952; Country of ref document: JP |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112025022203; Country of ref document: BR |
| | WWE | Wipo information: entry into national phase | Ref document number: 202517108312; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 1020257037772; Country of ref document: KR; Free format text: ST27 STATUS EVENT CODE: A-0-1-A10-A15-NAP-PA0105 (AS PROVIDED BY THE NATIONAL OFFICE) |
| | WWE | Wipo information: entry into national phase | Ref document number: KR1020257037772; Country of ref document: KR; Ref document number: 1020257037772; Country of ref document: KR |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024788243; Country of ref document: EP; Ref document number: 2025131047; Country of ref document: RU |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | WWE | Wipo information: entry into national phase | Ref document number: 11202506928Y; Country of ref document: SG |
| | WWP | Wipo information: published in national office | Ref document number: 11202506928Y; Country of ref document: SG |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | WWP | Wipo information: published in national office | Ref document number: 1020257037772; Country of ref document: KR |
| | WWP | Wipo information: published in national office | Ref document number: 202517108312; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | ENP | Entry into the national phase | Ref document number: 2024788243; Country of ref document: EP; Effective date: 20251114 |
| | WWP | Wipo information: published in national office | Ref document number: 2025131047; Country of ref document: RU |
| | WWP | Wipo information: published in national office | Ref document number: 2024788243; Country of ref document: EP |