WO2024213147A1 - Audio encoding method and apparatus, electronic device, and storage medium - Google Patents

Audio encoding method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024213147A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency band
channel
channel group
matrix
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2024/087626
Other languages
English (en)
French (fr)
Inventor
张广硕
王宾
刘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to KR1020257037772A (patent/KR20250168677A/ko)
Priority to EP24788243.4A (patent/EP4697325A4/en)
Publication of WO2024213147A1 (patent/WO2024213147A1/zh)
Anticipated expiration
Ceased (current legal status)


Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
                    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
                    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
                        • G10L19/0204 ... using subband decomposition
                        • G10L19/0212 ... using orthogonal transformation
                        • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
                        • G10L19/032 Quantisation or dequantisation of spectral components

Definitions

  • the present disclosure relates to the field of audio processing technology, and in particular to an audio encoding method, an audio decoding method, apparatuses thereof, an electronic device, a storage medium, a computer program product, and a computer program.
  • although the existing two-dimensional mid-side algorithm (2D M/S) can effectively reduce data redundancy between multiple channels, in many audio scenarios it increases the transmission cost, resulting in wasted data.
  • the embodiments of the present disclosure provide an audio encoding method, an audio decoding method, apparatuses thereof, an electronic device, a computer-readable storage medium, a computer program product, and a computer program, to address the waste of transmission and storage resources in multi-channel audio transmission.
  • the technical solution of the present disclosure is as follows:
  • an embodiment of the present disclosure provides an audio encoding method, executed by an encoder, the method including: grouping a channel sequence to obtain multiple channel groups, where each channel group includes several consecutive channels in the channel sequence and adjacent channel groups share one or more channels; performing frequency domain conversion on the audio signal of each channel in the channel sequence frame by frame to obtain the frequency domain coefficients of each channel per frame; determining, from a transformation matrix set and according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group; performing same-band decorrelation on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain the encoding information of the channel group; and obtaining an encoded bitstream based on the encoding information of the channel group and sending the encoded bitstream to a decoder for decoding.
  • an embodiment of the present disclosure provides an audio decoding method, executed by a decoder, the method including: receiving an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of multiple channel groups, where the channel groups are obtained by grouping a channel sequence in order, each channel group includes several consecutive channels in the channel sequence, and adjacent channel groups share one or more channels; decoding the channel groups in order and, for the current channel group being decoded, determining, according to its encoding information, a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group; obtaining the decoded frequency domain coefficients of the current channel group from its encoding information based on its target decoding matrix on each frequency band; and obtaining the decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple channel groups.
  • an embodiment of the present disclosure provides an audio encoding device, including: a channel grouping module, configured to group the channel sequence to obtain multiple channel groups, where each channel group includes several consecutive channels in the channel sequence and adjacent channel groups share one or more channels; a frequency domain processing module, configured to perform frequency domain conversion on the audio signal of each channel in the channel sequence frame by frame to obtain the frequency domain coefficients of each channel per frame; a matrix determination module, configured to determine, from a transformation matrix set and based on the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group; an encoding module, configured to perform same-band decorrelation on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain the encoding information of the channel group; and a sending module, configured to obtain an encoded bitstream based on the encoding information of the channel group and send the encoded bitstream to a decoder for decoding.
  • an embodiment of the present disclosure provides an audio decoding device, including: a receiving module, configured to receive an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of multiple channel groups, where the channel groups are obtained by grouping a channel sequence in order, each channel group includes several consecutive channels in the channel sequence, and adjacent channel groups share one or more channels; a matrix determination module, configured to decode the channel groups in order and, for the current channel group being decoded, determine a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to its encoding information; and a decoding module, configured to obtain the decoded frequency domain coefficients of the current channel group from its encoding information based on its target decoding matrix on each frequency band, and to obtain the decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple channel groups.
  • an embodiment of the present disclosure provides an encoder, comprising a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method described in the first aspect of the embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a decoder, comprising a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of the method described in the second aspect of the embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method described in the first aspect of the embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method described in the second aspect of the embodiment of the present disclosure.
  • an embodiment of the present disclosure provides an encoder, which includes a processor and an interface circuit, wherein the interface circuit is used to receive code instructions and transmit them to the processor, and the processor is used to run the code instructions to enable the device to execute the method described in the first aspect above.
  • an embodiment of the present disclosure provides a decoder, the device comprising a processor and an interface circuit, the interface circuit being used to receive code instructions and transmit them to the processor, the processor being used to execute the code instructions to enable the device to execute the method described in the second aspect above.
  • the embodiments of the present disclosure provide a coding and decoding system, which includes the encoding device described in the third aspect and the decoding device described in the fourth aspect, or the system includes the encoder described in the fifth aspect and the decoder described in the sixth aspect, or the system includes the encoder described in the seventh aspect and the decoder described in the eighth aspect, or the system includes the encoder described in the ninth aspect and the decoder described in the tenth aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium for storing instructions for the above-mentioned encoder; when the instructions are executed, the encoder executes the method described in the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium for storing instructions for the above-mentioned decoder; when the instructions are executed, the decoder executes the method described in the second aspect.
  • an embodiment of the present disclosure further provides a computer program product comprising a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect above.
  • an embodiment of the present disclosure further provides a computer program product comprising a computer program, which, when executed on a computer, enables the computer to execute the method described in the second aspect above.
  • an embodiment of the present disclosure provides a chip system, which includes at least one processor and an interface, and is used to support a network device to implement the functions involved in the first aspect, for example, to determine or process at least one of the data and information involved in the above method.
  • the chip system also includes a memory, and the memory is used to store computer programs and data necessary for the network device.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • an embodiment of the present disclosure provides a chip system, which includes at least one processor and an interface, and is used to support a terminal device to implement the functions involved in the second aspect, for example, determining or processing at least one of the data and information involved in the above method.
  • the chip system also includes a memory, and the memory is used to store computer programs and data necessary for the terminal device.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • an embodiment of the present disclosure provides a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect above.
  • an embodiment of the present disclosure provides a computer program, which, when executed on a computer, enables the computer to execute the method described in the second aspect above.
  • the encoder obtains frequency domain coefficients by dividing and grouping the channel signals, and determines the target transformation matrix of each frequency band corresponding to the channel group based on the frequency domain coefficients of each channel. The frequency domain coefficients of the channels are then decorrelated according to the target transformation matrix to obtain the coding information of the channel group, from which a coded bitstream is obtained and sent to a decoder for decoding.
  • the frequency domain coefficients on each frequency band in the channel group are coded by the target transformation matrix, thereby compressing the audio signals of multiple channels, reducing the redundancy between multiple channels, reducing the burden on the encoder, and reducing the transmission and storage costs.
  • Fig. 1 is a flowchart showing an audio encoding method according to an exemplary embodiment.
  • Fig. 2 is a flowchart showing an audio encoding method according to another exemplary embodiment.
  • Fig. 3 is a schematic diagram of a cross-correlation matrix according to an exemplary embodiment.
  • Fig. 4 is a flowchart showing an audio encoding method according to another exemplary embodiment.
  • Fig. 5 is a flowchart showing an audio encoding method according to another exemplary embodiment.
  • Fig. 6 is a schematic diagram showing an audio encoding method according to an exemplary embodiment.
  • Fig. 7 is a flowchart showing an audio decoding method according to an exemplary embodiment.
  • Fig. 8 is a flowchart showing an audio decoding method according to another exemplary embodiment.
  • Fig. 9 is a schematic diagram showing an audio decoding method according to an exemplary embodiment.
  • Fig. 10 is a flowchart showing an audio encoding method according to another exemplary embodiment.
  • Fig. 11 is a block diagram showing an audio encoding device according to an exemplary embodiment.
  • Fig. 12 is a block diagram showing an audio decoding device according to an exemplary embodiment.
  • Fig. 13 is a block diagram showing an audio processing device according to an exemplary embodiment.
  • Fig. 14 is a block diagram of another audio processing chip according to an exemplary embodiment.
  • the term “if” as used herein may be interpreted as “at the time of” or “when” or “in response to determining”
  • the terms used herein to express a size relationship include “greater than”, “less than”, “higher than”, and “lower than”.
  • “greater than” also covers the meaning of “greater than or equal to”;
  • “less than” also covers the meaning of “less than or equal to”;
  • “higher than” also covers the meaning of “higher than or equal to”;
  • “lower than” also covers the meaning of “lower than or equal to”.
  • An audio encoding/decoding method disclosed in the embodiments of the present disclosure may be applicable to various communication systems, for example, a third generation (3G) Universal Mobile Telecommunications System (UMTS), a long term evolution (LTE) system, a fifth generation (5G) mobile communication system, a 5G new radio (NR) system, a sixth generation (6G) mobile communication system, or other future mobile communication systems.
  • 3G third generation
  • UMTS Universal Mobile Telecommunications System
  • LTE long term evolution
  • 5G fifth generation
  • NR new radio
  • 6G sixth generation
  • An audio encoding/decoding method disclosed in the embodiments of the present disclosure may also be applicable to a streaming media transmission system and an OTT (Over The Top) media transmission system.
  • Fig. 1 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
  • the audio coding method can be executed by an encoder. As shown in Fig. 1, the method can include but is not limited to steps S101 to S105.
  • S101 Group the channel sequence to obtain multiple channel groups, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more channels.
  • the encoder may group the M channels in the channel sequence to obtain multiple channel groups.
  • each channel group includes several consecutive channels in the channel sequence, for example, 3 consecutive channels.
  • for example, let the adjacent channel groups be a first channel group and a second channel group.
  • the first channel group and the second channel group each include three consecutive channels in the channel sequence and share two channels.
  • as an example, the 5 channels in the channel sequence are divided into channel group 1, channel group 2, and channel group 3:
  • channel group 1 includes channel 1, channel 2 and channel 3;
  • channel group 2 includes channel 2, channel 3 and channel 4;
  • channel group 3 includes channel 3, channel 4 and channel 5.
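The grouping rule above (fixed-size windows of consecutive channels, with adjacent groups overlapping) can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation; the function name `group_channels` and the `group_size`/`hop` parameters are hypothetical, and the defaults of 3 channels with a hop of 1 are chosen only to reproduce the 5-channel example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_channels(channels: Sequence[T], group_size: int = 3, hop: int = 1) -> List[List[T]]:
    """Split a channel sequence into groups of consecutive channels.

    Adjacent groups overlap because hop < group_size (hypothetical parameters).
    """
    if not 0 < hop < group_size:
        raise ValueError("need 0 < hop < group_size so that adjacent groups overlap")
    if len(channels) < group_size:
        return [list(channels)]  # too few channels: a single group
    starts = list(range(0, len(channels) - group_size + 1, hop))
    if starts[-1] + group_size < len(channels):
        # add a final window so the last channel is always covered
        starts.append(len(channels) - group_size)
    return [list(channels[s:s + group_size]) for s in starts]
```

With the defaults, `group_channels([1, 2, 3, 4, 5])` returns `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, matching channel groups 1 to 3. A hop larger than 1 still keeps adjacent groups overlapping, and the extra final window prevents trailing channels from being dropped.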
  • the encoder divides the audio signal of each channel in the channel sequence into multiple fixed-length frames and performs a modified discrete cosine transform (MDCT) on each frame to obtain its frequency domain representation; the MDCT coefficients extracted from this representation serve as the frequency domain coefficients of the frame.
  • MDCT modified discrete cosine transform
  • the channel sequence of the embodiment of the present disclosure includes M channels.
  • each frame of audio data of each channel may include 2N sampling points, at a sampling rate of $f_s$.
  • Each frame after the MDCT includes N frequency points; accordingly, the MDCT coefficients span the range $(0, f_s/2)$ with a frequency resolution of $f_s/(2N)$.
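A direct-form MDCT makes the 2N-in, N-out relationship concrete. The sketch below assumes a sine window and 50%-overlapped frames with a hop of N samples, neither of which is specified above; `frame_signal`, `mdct`, and `frequency_resolution` are hypothetical helper names, and the O(N²) matrix form is chosen for clarity rather than speed (a production codec would use an FFT-based MDCT).

```python
import numpy as np


def frame_signal(x: np.ndarray, n_half: int) -> np.ndarray:
    """Cut x into frames of 2N samples with a hop of N (assumed 50% overlap)."""
    num_frames = (len(x) - n_half) // n_half
    if num_frames <= 0:
        raise ValueError("signal shorter than one frame of 2N samples")
    return np.stack([x[i * n_half:i * n_half + 2 * n_half] for i in range(num_frames)])


def mdct(frame: np.ndarray) -> np.ndarray:
    """Direct MDCT of one frame of 2N samples, returning N coefficients."""
    two_n = frame.shape[0]
    if two_n % 2:
        raise ValueError("frame length must be even (2N samples)")
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    window = np.sin(np.pi * (n + 0.5) / two_n)  # assumed sine window
    # basis[k, n] = cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)


def frequency_resolution(fs: float, n_half: int) -> float:
    """Spacing between MDCT frequency points: fs / (2N)."""
    return fs / (2 * n_half)
```

For example, at $f_s$ = 48 kHz with N = 1024, each frame yields 1024 coefficients spaced 23.4375 Hz apart, covering 0 to 24 kHz.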
  • S103 Determine, from the transformation matrix set, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group according to the frequency domain coefficients of each channel.
  • the frequency bands may be divided in advance according to a psychoacoustic band division method to obtain a frequency band set containing multiple bands, for example, B bands, where B is an integer greater than or equal to 1.
  • Each frequency band in the frequency band set has a different frequency range, and the frequency ranges of adjacent frequency bands are continuous.
  • the index of each frequency domain coefficient is multiplied by the frequency resolution to determine its frequency value.
  • n denotes the index of the frequency point corresponding to a given frequency domain coefficient,
  • where n ranges from 1 to N, and N is the number of frequency points per frame.
  • the frequency value f corresponding to the frequency domain coefficient is obtained by formula (1):
  • $f = n \cdot \dfrac{f_s}{2N} \qquad (1)$
  • the frequency value of each frequency domain coefficient is compared with the frequency range of each frequency band to find the band it falls in, which determines the frequency domain coefficients in each frequency band.
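Formula (1) and the band lookup can be combined into a short sketch. The band edges passed to `assign_bands` are hypothetical placeholders; the disclosure only says that the bands come from a psychoacoustic division and does not list them here. Both helper names are illustrative.

```python
import numpy as np


def coefficient_frequencies(fs: float, n_half: int) -> np.ndarray:
    """Formula (1): f = n * fs / (2N) for n = 1..N."""
    n = np.arange(1, n_half + 1)
    return n * fs / (2 * n_half)


def assign_bands(freqs: np.ndarray, band_edges: np.ndarray) -> np.ndarray:
    """Return the 0-based band index of each coefficient frequency.

    band_edges: B+1 increasing edges in Hz (hypothetical values);
    band b covers [band_edges[b], band_edges[b+1]).
    """
    idx = np.searchsorted(band_edges, freqs, side="right") - 1
    # fs/2 lies exactly on the last edge; clip it into the last band
    return np.clip(idx, 0, len(band_edges) - 2)
```

`np.searchsorted(..., side="right") - 1` picks the band whose lower edge is at or below each frequency. The `clip` handles the top frequency point $f_s/2$, which would otherwise fall one past the last band.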
  • the cross-correlation coefficients between different channels in the same frequency band are calculated from the frequency domain coefficients of those channels in that band. That is, for each frequency band in the frequency band set, the cross-correlation coefficient between every two channels can be obtained from their frequency domain coefficients in that band.
  • the cross-correlation coefficients between each pair of channels of the channel group on frequency band b are then taken, according to the channels the group contains, from the cross-correlation coefficients computed for all channel pairs on frequency band b.
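Because every pair of channels in the sequence already has a per-band coefficient, a group's coefficients are just the rows and columns of the full matrix that belong to its channels. A minimal NumPy sketch, assuming the coefficients are stored as a (B, M, M) array with 0-based channel indices (both layout choices are assumptions, as is the name `group_correlation`):

```python
import numpy as np


def group_correlation(full_corr: np.ndarray, group: list) -> np.ndarray:
    """Select one channel group's coefficients from the full per-band matrices.

    full_corr: (B, M, M) array, one M x M cross-correlation matrix per band.
    group: 0-based indices of the group's channels, e.g. [1, 2, 3].
    Returns a (B, G, G) array.
    """
    idx = np.asarray(group)
    # broadcast row and column indices to a G x G grid, keeping the band axis
    return full_corr[:, idx[:, None], idx[None, :]]
```

For channel group 2 in the 5-channel example (channels 2, 3, and 4), the call would be `group_correlation(full_corr, [1, 2, 3])`.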
  • based on these coefficients, the target transformation matrix of the channel group on frequency band b is determined from the transformation matrix set. Since the frequency band set includes B frequency bands, repeating this for each band yields the target transformation matrix of the channel group on every frequency band.
  • the transformation matrix set includes multiple transformation matrices, each of which may correspond to a decorrelation mode.
  • the transformation matrix set may include M0, M1, M2, M3, and M4. The specific values of the transformation matrix are as follows:
  • the decorrelation mode of the channel group in each frequency band can be determined based on the target transformation matrix of the channel group in each frequency band.
  • if the target transformation matrices of different frequency bands differ, their decorrelation modes also differ.
  • if the target transformation matrices of different frequency bands are the same, their decorrelation modes are also the same.
  • for example, the target transformation matrix of frequency band 1 is M1,
  • the target transformation matrix of frequency band 2 is M2,
  • and the target transformation matrix of frequency band 3 is M1;
  • then frequency bands 1 and 3 share the same decorrelation mode, which differs from that of frequency band 2.
  • the frequency domain coefficients of the channels in a channel group are split by frequency band, and the coefficients of the channels in the same frequency band are decorrelated using the target transformation matrix of that band, yielding the coding information of the channel group.
  • for example, for a channel group consisting of channels L, C, and R, suppose the target transformation matrix of the channel group on frequency band b is determined to be M4.
  • the frequency domain coefficients of channels L, C, and R on frequency band b form a frequency domain coefficient matrix, which is multiplied by the target transformation matrix M4 of frequency band b to obtain the coding information of the channel group on frequency band b. In other words, the coefficients of channels L, C, and R on band b are decorrelated within the same band through M4, which reduces co-frequency interference, redundancy, and transmission cost.
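The per-band matrix operation can be sketched as below. The actual values of M0 to M4 are not reproduced in this text, so `EXAMPLE_MATRIX` is a hypothetical orthonormal mid/side-style matrix that leaves C unchanged and is NOT one of the disclosed matrices. Taking the decoder's target decoding matrix to be the inverse of the encoding matrix is likewise an assumption made for the round-trip check; the function names are illustrative.

```python
import numpy as np

# Hypothetical 3x3 matrix for an (L, C, R) group, NOT one of M0-M4:
# row 0 = mid of L and R, row 1 = C unchanged, row 2 = side of L and R.
_S = 1.0 / np.sqrt(2.0)
EXAMPLE_MATRIX = np.array([[_S, 0.0, _S],
                           [0.0, 1.0, 0.0],
                           [_S, 0.0, -_S]])


def decorrelate_group(coeffs: np.ndarray, band_of_bin: np.ndarray,
                      band_matrices: list) -> np.ndarray:
    """Apply each band's G x G matrix to the group's coefficients in that band.

    coeffs: (G, N) coefficients of the G channels in the group for one frame.
    band_of_bin: (N,) band index of each frequency point.
    band_matrices: B matrices, one per band.
    """
    out = np.array(coeffs, dtype=float, copy=True)
    for b, matrix in enumerate(band_matrices):
        cols = band_of_bin == b
        out[:, cols] = matrix @ out[:, cols]  # same-band decorrelation
    return out


def recorrelate_group(coded: np.ndarray, band_of_bin: np.ndarray,
                      band_matrices: list) -> np.ndarray:
    """Decoder side, assuming each decoding matrix is the encoding matrix's inverse."""
    return decorrelate_group(coded, band_of_bin, [np.linalg.inv(m) for m in band_matrices])
```

When L and R are identical in a band, the third output row is all zeros. This is the kind of inter-channel redundancy that same-band decorrelation is meant to remove.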
  • the frequency domain coefficients of the channel group in each frequency band are decorrelated using the target transformation matrix of that band to obtain the coding information of the channel group in each band. Combining the per-band coding information gives the coding information of the channel group over the full band, which therefore includes its coding information in all frequency bands.
  • the coding information of the channel groups may be binary-encoded: the coding information of each channel group is converted into a binary code, and the binary codes of all channel groups are concatenated to form the coded bitstream. In one implementation, the bitstream is sent to a decoder, which decodes it to restore the original channel signals.
  • the target transformation matrix of the channel group in each frequency band also needs to be sent to the decoder.
  • the target transformation matrix of each frequency band can be written into the coded bitstream and sent together with the coding information of the channel, or can be sent to the decoder separately and synchronously with the coded bitstream.
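One simple way to carry the per-band matrix choice as side information is a fixed-width index per band; with five candidate matrices (M0 to M4), 3 bits per band suffice. This fixed-length layout is an assumption for illustration only: the disclosure does not specify how the matrices are signalled, and `pack_matrix_indices`/`unpack_matrix_indices` are hypothetical names.

```python
def pack_matrix_indices(indices: list, bits_per_index: int = 3) -> bytes:
    """Write one fixed-width matrix index per band, MSB first, zero-padded to a byte."""
    acc = 0
    for i in indices:
        if not 0 <= i < (1 << bits_per_index):
            raise ValueError(f"index {i} does not fit in {bits_per_index} bits")
        acc = (acc << bits_per_index) | i
    total_bits = len(indices) * bits_per_index
    pad = (-total_bits) % 8  # zero bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((total_bits + pad) // 8, "big")


def unpack_matrix_indices(data: bytes, count: int, bits_per_index: int = 3) -> list:
    """Read back `count` fixed-width indices written by pack_matrix_indices."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << bits_per_index) - 1
    return [(acc >> (total_bits - (k + 1) * bits_per_index)) & mask for k in range(count)]
```

Packing the earlier example (bands 1 to 3 using M1, M2, M1) gives the 9 bits `001 010 001`, padded to the two bytes `0x28 0x80`.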
  • the encoder obtains frequency domain coefficients by dividing and grouping the channel signals in frequency bands, and the target transformation matrix of each frequency band corresponding to the channel group can be determined based on the frequency domain coefficients of each channel.
  • the frequency domain coefficients of the channels are then decorrelated according to the target transformation matrix to obtain the encoding information of the channel group, from which the encoded bitstream is obtained and sent to the decoder for decoding.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • Fig. 2 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
  • the audio coding method can be executed by an encoder. As shown in Fig. 2, the method can include but is not limited to steps S201 to S207.
  • S201 Group the channel sequence to obtain multiple channel groups, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more channels.
  • S201 can be implemented by any of the methods in the embodiments of the present disclosure; this is not limited here and is not repeated.
  • S202 can likewise be implemented by any of the methods in the embodiments of the present disclosure; this is not limited here and is not repeated.
  • S203 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
  • the energy value of each channel in each frequency band is determined from that channel's frequency domain coefficients, and the first cross-correlation matrix of each frequency band is determined from the energy values of the channels in that band.
  • the root-mean-square (RMS) energy of each channel can be calculated from its frequency domain coefficients to determine its energy value in each frequency band.
  • the RMS is calculated as follows:
  • $\mathrm{RMS}_{c,b} = \sqrt{\dfrac{1}{N_{c,b}} \sum_{i \in b} X_i^2}$
  • where c is the channel index, ranging from 1 to M, M being the number of input channels; b is the frequency band index; $X_i$ is the i-th frequency domain coefficient of channel c in band b; and $N_{c,b}$ is the number of frequency points of channel c in frequency band b.
  • the number of frequency points can be calculated based on the width of the frequency band and the frequency resolution corresponding to the frequency band.
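The per-band RMS computation above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. The helper names `band_rms` and `num_points` are hypothetical, and so is the representation of each band as a (start, stop) pair of coefficient indices.

```python
import math


def band_rms(coeffs, band_edges):
    """Return the RMS energy of one channel in each frequency band.

    coeffs:     frequency domain (e.g. MDCT) coefficients of one channel
    band_edges: hypothetical list of (start, stop) coefficient indices per band
    """
    energies = []
    for start, stop in band_edges:
        band = coeffs[start:stop]
        n_cb = len(band)  # N_{c,b}: number of frequency points in band b
        energies.append(math.sqrt(sum(x * x for x in band) / n_cb))
    return energies


def num_points(band_width_hz, resolution_hz):
    """Number of frequency points = band width / frequency resolution."""
    return int(round(band_width_hz / resolution_hz))
```

For example, `band_rms([3.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0], [(0, 2), (2, 4), (4, 8)])` yields sqrt(12.5), 0.0 and 1.0 for the three bands.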
  • the energy ratio of each pair of channels in the channel sequence in frequency band b is obtained.
  • the cross-correlation coefficient between each pair of channels in band b is determined based on the energy ratio between those channels in band b.
  • the energy ratio Q of two channels in the channel sequence in frequency band b is obtained. Q is calculated as follows: $Q = RMS_{c_1,b_1} / RMS_{c_2,b_2}$
  • where c_1 and c_2 are channel index values ranging from 1 to M, and M is the number of input channels. b_1 and b_2 are frequency band indices. c_1 may be the same as or different from c_2, and likewise b_1 may be the same as or different from b_2.
  • energy discrimination can be performed using the energy ratio Q, and the cross-correlation coefficient between the channels in frequency band b is determined from the result. If Q in band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, the cross-correlation coefficient between the two channels in band b is set to zero. The first set threshold is less than the second set threshold.
  • otherwise, the cross-correlation coefficient between the two channels in band b is determined from the frequency domain coefficients of the two channels in band b.
  • for example, if the energy ratio Q of the two channels in band b is less than or equal to 0.5 or greater than or equal to 2, their cross-correlation coefficient is 0. If Q lies in (0.5, 2), the cross-correlation coefficient is calculated further.
  • in this way, the cross-correlation coefficient is determined according to the magnitude of Q.
  • the formula for calculating the cross-correlation coefficient is as follows:
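  • a first cross-correlation matrix corresponding to frequency band b can then be obtained. For example, if the channel sequence includes M channels, the first cross-correlation matrix is: $\mathrm{Corr}_b = \begin{bmatrix} \mathrm{Corr}_{[1,1],b} & \mathrm{Corr}_{[1,2],b} & \cdots & \mathrm{Corr}_{[1,M],b} \\ \mathrm{Corr}_{[2,1],b} & \mathrm{Corr}_{[2,2],b} & \cdots & \mathrm{Corr}_{[2,M],b} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Corr}_{[M,1],b} & \mathrm{Corr}_{[M,2],b} & \cdots & \mathrm{Corr}_{[M,M],b} \end{bmatrix}$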
  • the first row and column of the first cross-correlation matrix correspond to channel 1
  • the second row and column correspond to channel 2
  • the Mth row and column correspond to channel M.
  • a first cross-correlation matrix is determined in this way for each frequency band. If the frequency band set includes b frequency bands, there are b first cross-correlation matrices, one per band.
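A minimal Python sketch of building the first cross-correlation matrix for one frequency band follows. The formulas are not reproduced in this text, so the sketch rests on three assumptions. First, the energy ratio Q is taken as the ratio of the two channels' RMS values in the same band. Second, the cross-correlation coefficient is taken as the unnormalized sum of products of the two channels' coefficients. Third, the diagonal holds each channel's self-correlation, which the later normalization step relies on. The thresholds 0.5 and 2 follow the example above, and all function names are hypothetical.

```python
import math

LOW_THR, HIGH_THR = 0.5, 2.0  # first / second set thresholds (example values)


def rms(band):
    return math.sqrt(sum(x * x for x in band) / len(band))


def cross_corr(a, b):
    # Assumed unnormalized cross-correlation: sum of products over the band.
    return sum(x * y for x, y in zip(a, b))


def first_corr_matrix(bands):
    """bands[c] holds the coefficients of channel c in one frequency band b.

    Returns the M x M first cross-correlation matrix for that band.
    """
    m = len(bands)
    energy = [rms(x) for x in bands]
    corr = [[0.0] * m for _ in range(m)]
    for c1 in range(m):
        for c2 in range(m):
            if c1 == c2:
                corr[c1][c2] = cross_corr(bands[c1], bands[c1])  # self-correlation
                continue
            if energy[c2] == 0.0:
                continue  # ratio undefined: treated as uncorrelated (assumption)
            q = energy[c1] / energy[c2]  # energy ratio Q (assumed RMS ratio)
            if LOW_THR < q < HIGH_THR:
                corr[c1][c2] = cross_corr(bands[c1], bands[c2])
            # otherwise the coefficient stays zero
    return corr
```

Because Q for (c2, c1) is the reciprocal of Q for (c1, c2), and (0.5, 2) is closed under reciprocals, the resulting matrix is symmetric. Calling `first_corr_matrix` once per band gives the b matrices.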
  • a second cross-correlation matrix corresponding to the channel group can be extracted from the first cross-correlation matrix.
  • specifically, the channel indices of the channels in the channel group are determined, and the second cross-correlation matrix of the channel group is extracted from the first cross-correlation matrix according to these indices.
  • for example, if the channel group includes three channels, its second cross-correlation matrix is formed from the cross-correlation coefficients between each pair of channels in the group, taken from the first cross-correlation matrix. It is therefore a 3×3 matrix.
  • the channel sequence includes 5 channels, and the channels are arranged in order as channel 1, channel 2, channel 3, channel 4 and channel 5.
  • channel group 1 may include channel 1, channel 2 and channel 3
  • channel group 2 may include channel 2, channel 3 and channel 4
  • channel group 3 includes channel 3, channel 4 and channel 5.
  • the first cross-correlation matrix is then a 5×5 matrix, as shown in Figure 3.
  • the matrix elements at the intersection of rows 2 to 4 and columns 2 to 4 can be extracted from the first cross-correlation matrix as the second cross-correlation matrix of channel group 2.
  • that is, the second cross-correlation matrix of channel group 2 is the part within the dotted box of the first cross-correlation matrix in Figure 3: $\begin{bmatrix} \mathrm{Corr}_{[2,2],b} & \mathrm{Corr}_{[2,3],b} & \mathrm{Corr}_{[2,4],b} \\ \mathrm{Corr}_{[3,2],b} & \mathrm{Corr}_{[3,3],b} & \mathrm{Corr}_{[3,4],b} \\ \mathrm{Corr}_{[4,2],b} & \mathrm{Corr}_{[4,3],b} & \mathrm{Corr}_{[4,4],b} \end{bmatrix}$
  • the channel group has a second cross-correlation matrix in each frequency band.
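The grouping and extraction steps can be sketched as follows. The sketch assumes, as in the 5-channel example, groups of three consecutive channels that advance by one channel, with 1-based channel indices. `group_channels` and `second_corr_matrix` are hypothetical names.

```python
def group_channels(num_channels, group_size=3):
    """Slide a window of group_size consecutive channels with a stride of one,
    so adjacent groups share group_size - 1 channels (1-based channel indices)."""
    return [list(range(start, start + group_size))
            for start in range(1, num_channels - group_size + 2)]


def second_corr_matrix(first, group):
    """Extract the rows and columns of `first` for the 1-based channels in `group`."""
    return [[first[r - 1][c - 1] for c in group] for r in group]
```

For five channels, `group_channels(5)` gives `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`. Applying `second_corr_matrix` with `[2, 3, 4]` keeps rows and columns 2 to 4, matching channel group 2.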
  • S205 Determine a target transformation matrix of the channel group in each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band.
  • for any frequency band b, the cross-correlation coefficients between the channel pairs in the second cross-correlation matrix on band b are checked. If they meet the condition for selecting a specified transformation matrix in the transformation matrix set, the specified transformation matrix is selected as the target transformation matrix for band b.
  • otherwise, a transformation matrix other than the specified transformation matrix is selected from the transformation matrix set as the target transformation matrix for band b.
  • for example, a cross-correlation coefficient threshold may be set, and the target transformation matrix of the channel group is determined from the transformation matrix set according to this threshold.
  • the second cross-correlation matrix of the channel group in each frequency band contains the cross-correlation coefficient between each pair of channels in the group. If the coefficients of all pairs are greater than the set threshold, the specified transformation matrix is selected as the target transformation matrix for band b. Otherwise, a transformation matrix other than the specified one is selected from the set according to the maximum pairwise cross-correlation coefficient.
  • the transformation matrix set may include M0, M1, M2, M3 and M4.
  • the specified transformation matrix may be M4. If the cross-correlation coefficients among the three channels are all greater than the threshold, the target transformation matrix is M4. Otherwise, if the maximum cross-correlation coefficient among the three channels is greater than the threshold, the target transformation matrix is determined according to that maximum.
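The selection rule can be sketched as follows. Only the M4 condition is stated explicitly in the text. The conditions of formula (6) are not reproduced here, so two parts of the sketch are assumptions for illustration only: the mapping from the most correlated channel pair to M1, M2 or M3, and the fallback to M0 when no pair exceeds the threshold.

```python
from itertools import combinations


def select_transform(second, thr):
    """Pick a target transformation matrix label for one band of a 3-channel group.

    second: 3x3 second cross-correlation matrix (assumed normalized)
    thr:    cross-correlation coefficient threshold
    """
    pairs = {pair: second[pair[0]][pair[1]] for pair in combinations(range(3), 2)}
    if all(v > thr for v in pairs.values()):
        return "M4"  # specified matrix: all three channels strongly correlated
    best_pair, best = max(pairs.items(), key=lambda kv: kv[1])
    if best > thr:
        # Hypothetical mapping from the most correlated pair to M1..M3.
        return {(0, 1): "M1", (0, 2): "M2", (1, 2): "M3"}[best_pair]
    return "M0"  # assumed fallback when no pair exceeds the threshold
```

With pairs indexed (L, C) = (0, 1), (L, R) = (0, 2) and (C, R) = (1, 2), this mirrors the three-channel description given for the [L, C, R] group.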
  • S206 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • S207 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • Fig. 4 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
  • the audio coding method may be executed by an encoder. As shown in Fig. 4, the method may include but is not limited to steps S401 to S408.
  • S401 Group the channel sequence to obtain a plurality of channel groups. Each channel group includes a plurality of consecutive channels in the channel sequence, and adjacent channel groups share one or more channels.
  • S401 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • S402 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • S403 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
  • S403 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be described in detail.
  • the first cross-correlation matrix of each frequency band is normalized. The second cross-correlation matrix of the channel group in each frequency band is then extracted from the normalized first cross-correlation matrix of that band.
  • the second cross-correlation matrix is therefore a normalized matrix.
  • to normalize a matrix element of the first cross-correlation matrix, the channel identifiers associated with that element are first determined.
  • the normalization matrix elements for that element can then be determined from its associated channel identifiers. Each matrix element is the cross-correlation coefficient between two channels. The channel identifier of the row containing the element is one associated channel identifier, and the channel identifier of the column containing the element is the other.
  • take the matrix element Corr[2,3],b of the first cross-correlation matrix of frequency band b as an example. Its associated channel identifiers are 2 and 3, that is, the element relates channel 2 and channel 3.
  • its normalization matrix elements are therefore Corr[2,2],b and Corr[3,3],b.
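  • the normalized result of a matrix element is obtained from the element and its normalization matrix elements.
  • the normalization formula for a matrix element is as follows: $\overline{\mathrm{Corr}}_{[c_1,c_2],b} = \frac{\mathrm{Corr}_{[c_1,c_2],b}}{\sqrt{\mathrm{Corr}_{[c_1,c_1],b} \, \mathrm{Corr}_{[c_2,c_2],b}}}$
  • where b is the frequency band index value and $\mathrm{Corr}_{[c_1,c_2],b}$ is the matrix element being normalized. $\mathrm{Corr}_{[c_1,c_1],b}$ and $\mathrm{Corr}_{[c_2,c_2],b}$ are its normalization matrix elements, which lie on the diagonal of the first cross-correlation matrix.
  • here, the diagonal is the main diagonal, running from the upper-left corner to the lower-right corner.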
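The normalization step can be sketched as follows. The sketch assumes that each element is divided by the geometric mean of its two diagonal (self-correlation) elements. `normalize` is a hypothetical helper, and setting an element to 0 when its reference energy is zero is a choice made here.

```python
import math


def normalize(corr):
    """Normalize every element by the diagonal elements of its row and column
    channels: Corr'[i][j] = Corr[i][j] / sqrt(Corr[i][i] * Corr[j][j])."""
    m = len(corr)
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            denom = math.sqrt(corr[i][i] * corr[j][j])
            out[i][j] = corr[i][j] / denom if denom > 0 else 0.0
    return out
```

After this step the diagonal elements of channels with non-zero energy become 1. The off-diagonal elements fall in [-1, 1], so they can be compared directly with a single threshold Thr.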
  • S405 Determine the cross-correlation coefficients between any two channels in the channel group in any frequency band b based on the second cross-correlation matrix in any frequency band b.
  • the cross-correlation coefficients between any two channels of the channel group in band b can be read from the second cross-correlation matrix.
  • S406 Determine a target transformation matrix of the channel group in any frequency band b from the transformation matrix set according to the mutual correlation coefficients between any two channels in the channel group in any frequency band b.
  • a threshold Thr may be set for the cross-correlation coefficient between two channels of a channel group in band b. The target transformation matrix is then determined according to preset conditions by comparing each pairwise cross-correlation coefficient in the channel group with Thr.
  • for example, let [L, C, R] be the three channels in the channel group. If the cross-correlation coefficients between channels L and C, between channels L and R, and between channels C and R are all greater than Thr, the specified transformation matrix M4 is selected as the target transformation matrix for band b.
  • otherwise, a transformation matrix other than the specified transformation matrix is selected as the target transformation matrix for band b, according to the maximum pairwise cross-correlation coefficient.
  • specifically, the maximum is taken over the three cross-correlation coefficients between channels L and C, L and R, and C and R. If this maximum is greater than Thr, the target transformation matrix for band b is selected from the transformation matrices other than the specified one. For example, if the specified transformation matrix is M4, the target transformation matrix is selected from M0 to M3 according to the maximum cross-correlation coefficient.
  • the target transformation matrix is selected according to formula (6):
  • M1 is selected as the target transformation matrix
  • M2 is selected as the target transformation matrix
  • M3 is selected as the target transformation matrix
  • S407 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be described in detail.
  • S408 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • Fig. 5 is a flow chart of an audio coding method provided by an embodiment of the present disclosure.
  • the audio coding method may be executed by an encoder. As shown in Fig. 5, the method may include but is not limited to steps S501 to S507.
  • S501 Group a channel sequence to obtain a plurality of channel groups. Each channel group includes a plurality of consecutive channels in the channel sequence, and adjacent channel groups share one or more overlapping channels.
  • S501 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • S502 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be repeated.
  • S503 Determine, from the transformation matrix set, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group according to the frequency domain coefficients of each channel.
  • S503 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be described in detail.
  • for example, the first channel group includes channel 1, channel 2 and channel 3. The first encoding information of the first channel group in band b is obtained from the frequency domain coefficients of its channels in band b and the target transformation matrix for band b. This first encoding information includes the center information M_1, the side information S_1 and the first information T_1 of the first channel group.
  • similarly, the first encoding information of each remaining channel group in band b is obtained. It includes the first information T_i of that channel group.
  • the first coded information of the first channel group completely includes the center information, the side information and the first information, and the first coded information of the remaining channel groups may only include the first information.
  • [L, C, R] are the three channels in the channel group; M is the target transformation matrix determined according to the frequency domain coefficients; and [M S T] is the coding information of the channel group.
  • when performing the decorrelation calculation, the first decorrelation coding unit outputs the full coding information [M S T], while the remaining coding units output only the first information [T].
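The outputs of the decorrelation coding units for one frequency band can be sketched as follows. The sketch assumes the coding information is the matrix-vector product of the target transformation matrix with each frequency bin's [L, C, R] coefficients. The matrix variable is called `mat` to avoid clashing with the center information M, and the function names are hypothetical.

```python
def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def encode_band(groups_lcr, matrices):
    """groups_lcr[g]: list of per-bin [L, C, R] triples of group g in band b.
    matrices[g]:     3x3 target transformation matrix of group g in band b.
    Returns what each decorrelation coding unit outputs for this band."""
    outputs = []
    for g, (bins, mat) in enumerate(zip(groups_lcr, matrices)):
        mst = [matvec(mat, lcr) for lcr in bins]  # [M, S, T] per bin
        if g == 0:
            outputs.append(mst)  # first unit: full [M S T]
        else:
            outputs.append([[t] for _, _, t in mst])  # other units: only [T]
    return outputs
```

For example, take the hypothetical matrix `[[1, 1, 1], [1, 0, -1], [1, -2, 1]]`, which is not one of the disclosed M0 to M4. Group 1's bin [1, 2, 3] becomes [6, -2, 0], while group 2's bin [2, 3, 5] contributes only T = 1.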
  • S505 Obtain second encoding information of the channel group according to the first encoding information on each frequency band of the channel group.
  • the target transformation matrix of the channel group in each frequency band may be determined according to its first encoding information in that band. The decorrelation mode of the channel group in each band, that is, the second encoding information of the channel group, is then determined based on these target transformation matrices.
  • if the target transformation matrices of different frequency bands are different, the corresponding decorrelation modes are also different.
  • if the target transformation matrices of different frequency bands are the same, the corresponding decorrelation modes are also the same.
  • S506 Obtain coding information of the channel group based on the second coding information and the target transformation matrix corresponding to each frequency band.
  • the frequency domain coefficients of the channel group in each frequency band may be decorrelated according to the second coding information by using the target transformation matrix corresponding to each frequency band, so as to obtain the coding information of the channel group in each frequency band.
  • the coding information of the channel group in the whole frequency band may be obtained according to the coding information on each frequency band. It is understood that the coding information of the channel group includes the coding information of the channel group in all frequency bands.
  • S507 Obtain a coded bitstream based on the coded information of the channel group, and send the coded bitstream to a decoder for decoding.
  • S507 can be implemented by any method in the embodiments of the present disclosure. This is not limited here and will not be described in detail.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • the audio signal in each channel is subjected to MDCT transformation to obtain the MDCT coefficients (frequency domain coefficients) of each channel in each frame.
  • the channel signal is input into the band division processing unit for frequency band division to obtain the frequency domain coefficients on each frequency band.
  • the energy value of the channel on each frequency band is calculated by energy calculation, and the energy value is input into the cross-correlation calculation unit to obtain the cross-correlation coefficients between the channels on each frequency band, so as to obtain the first cross-correlation matrix between the channels on each frequency band.
  • each channel group corresponds to a decorrelation unit
  • the frequency domain coefficients of the three channels in the channel group and the first cross-correlation matrix of each frequency band are input into the decorrelation unit. The unit performs same-frequency decorrelation on the channel group to obtain the coding information of the channel group.
  • the band division results of the frequency domain coefficients of channel 1, channel 2 and channel 3 in channel group 1 are input into decorrelation unit 1, together with the first cross-correlation matrix of each frequency band. Decorrelation unit 1 performs same-frequency decorrelation and outputs the coding information of channel group 1, which comprises the center information M_1, the side information S_1 and the first information T_1;
  • the band division results of the frequency domain coefficients of channel 2, channel 3 and channel 4 in channel group 2 are input into decorrelation unit 2, together with the first cross-correlation matrix of each frequency band. Decorrelation unit 2 performs same-frequency decorrelation and outputs the coding information of channel group 2, which comprises the first information T_2;
  • the band division results of the frequency domain coefficients of channel 3, channel 4 and channel 5 in channel group 3 are input into decorrelation unit 3, together with the first cross-correlation matrix of each frequency band. Decorrelation unit 3 performs same-frequency decorrelation and outputs the coding information of channel group 3, which comprises the first information T_3.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • Fig. 7 is a flow chart of an audio decoding method provided by an embodiment of the present disclosure.
  • the audio decoding method may be executed by a decoder. As shown in Fig. 7, the method may include but is not limited to steps S701 to S704.
  • S701 Receive an encoded bitstream sent by an encoder, where the encoded bitstream includes encoding information of multiple channel groups.
  • the channel groups are obtained by grouping the channel sequence in order. Each channel group includes a plurality of consecutive channels in the channel sequence, and adjacent channel groups share one or more channels.
  • the decoder receives the encoded bitstream sent by the encoder, reads the encoding information of the multiple channel groups from it, and performs an inverse transformation to obtain the original channel signals.
  • for example, the encoder can group the M channels in the channel sequence to obtain multiple channel groups.
  • each channel group includes three consecutive channels in the channel sequence, and adjacent channel groups share overlapping channels.
  • the specific process can be referred to the above embodiment, which will not be repeated here.
  • S702 Decode the multiple channel groups in sequence. For the current channel group being decoded, determine a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to the encoding information of the current channel group.
  • second encoding information of the channel group in the entire frequency band may be obtained from the encoding information, the second encoding information including the first encoding information of each frequency band and the target transformation matrix corresponding to each frequency band.
  • the target decoding matrix of the current channel group in any frequency band b is obtained.
  • the transformation matrices correspond one-to-one to the decoding matrices.
  • the decoding matrix may include the following matrix:
  • transformation matrix M0 corresponds to decoding matrix J0
  • transformation matrix M1 corresponds to decoding matrix J1
  • transformation matrix M2 corresponds to decoding matrix J2
  • transformation matrix M3 corresponds to decoding matrix J3
  • transformation matrix M4 corresponds to decoding matrix J4. If the target transformation matrix of the current channel group on band b is M4, the target decoding matrix of the current channel group on band b is J4.
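The one-to-one correspondence can be illustrated by assuming that each decoding matrix J_k is the inverse of its transformation matrix M_k. The text does not state this explicitly. The sketch computes a 3×3 inverse exactly with `fractions.Fraction` through the adjugate (cofactor) formula, and `inverse3` is a hypothetical helper.

```python
from fractions import Fraction


def inverse3(m):
    """Exact inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    a = [[Fraction(x) for x in row] for row in m]
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    if det == 0:
        raise ValueError("transformation matrix is not invertible")
    # For 3x3 matrices the signed cofactor C[r][c] follows a cyclic index pattern.
    cof = [[a[(r + 1) % 3][(c + 1) % 3] * a[(r + 2) % 3][(c + 2) % 3]
            - a[(r + 1) % 3][(c + 2) % 3] * a[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    # inverse = transpose(cofactor matrix) / det
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]
```

A table such as `{"J4": inverse3(M4)}` could then be precomputed once on both sides, so only the matrix index needs to be signalled per band.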
  • the encoding information of the current channel group includes second encoding information of the channel group in the entire frequency band, and the second encoding information includes first encoding information of the channel group in each frequency band.
  • the first encoding information of frequency band b is decoded to obtain the first decoded frequency domain coefficients of the current channel group on band b.
  • the decoded frequency domain coefficients of the channel group on the entire frequency band can be obtained.
  • S704 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
  • the signals of the channel groups may be converted from the frequency domain to the time domain based on the decoded frequency domain coefficients of the multiple channel groups.
  • the frequency domain signals of the channels may be converted into time domain signals based on the decoded frequency domain coefficients using an inverse MDCT transform, thereby obtaining decoded audio signals of each channel.
  • the decoder receives the coded bit stream sent by the encoder and obtains the coding information of each channel group therefrom.
  • the coded information is decoded by a decoding unit in the order of multiple channel groups to obtain the decoded frequency domain coefficients of each channel group, and the decoded frequency domain coefficients of each channel group are converted from frequency domain to time domain to obtain a decoded audio signal of each channel.
  • the decoder uses frequency division decoding similar to that on the encoder side to realize the recovery of multi-channel audio signals. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
  • Fig. 8 is a flow chart of an audio decoding method provided by an embodiment of the present disclosure.
  • the audio decoding method may be executed by a decoder. As shown in Fig. 8, the method may include but is not limited to steps S801 to S809.
  • S801 Obtain first encoding information of the first channel group in any frequency band b from encoding information of the first channel group.
  • each channel group includes three consecutive channels, wherein the first channel group includes channel 1, channel 2, and channel 3.
  • coding information of the first channel group can be obtained from the coded bitstream. It includes the first coding information of the first channel group in each frequency band, and the first coding information includes at least the center information M_1, the side information S_1 and the first information T_1 of the first channel group.
  • S802 Decode first coded information of the first channel group in each frequency band based on a target decoding matrix of the first channel group in each frequency band to obtain first decoded frequency domain coefficients of the first channel group in any frequency band b.
  • S803 Obtain the second decoded frequency domain coefficients of the first channel group from its first decoded frequency domain coefficients in each frequency band. The second decoded frequency domain coefficients comprise three outputs.
  • the decoder can obtain the target transformation matrix of the first channel group on any frequency band b from the encoding information of the first channel group. In one implementation, a correspondence between transformation matrices and decoding matrices is pre-established. The target decoding matrix of the first channel group on band b can then be determined from the target transformation matrix on band b.
  • the first coded information on band b is then inversely transformed (decoded) with the target decoding matrix to obtain the first decoded frequency domain coefficients of the first channel group on band b.
  • the first encoding information of the first channel group includes [M_1 S_1 T_1];
  • after decoding, the first decoded frequency domain coefficients on band b comprise three outputs, one for each channel of the first channel group.
  • combining all frequency bands gives the decoded frequency domain coefficients of the first channel group on the full frequency band. These also comprise three outputs, corresponding to channel 1, channel 2 and channel 3.
  • the decoding formula for the first channel group is as follows: $[\hat{X}_1 \ \hat{X}_2 \ \hat{X}_3]^{T} = J \cdot [M_1 \ S_1 \ T_1]^{T}$
  • where [M_1 S_1 T_1] represents the second coding information of the first channel group;
  • J is the target decoding matrix, and $\hat{X}_1$, $\hat{X}_2$ and $\hat{X}_3$ represent the three decoded frequency domain coefficients of the first channel group.
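The decoding formula for the first channel group amounts to one matrix-vector product per frequency bin. The sketch below assumes that J is the inverse of the target transformation matrix. `J_EXAMPLE` is the exact inverse of the hypothetical matrix `[[1, 1, 1], [1, 0, -1], [1, -2, 1]]` and is not taken from the disclosure.

```python
from fractions import Fraction as F

# Exact inverse of the hypothetical transformation matrix [[1,1,1],[1,0,-1],[1,-2,1]].
J_EXAMPLE = [[F(1, 3), F(1, 2), F(1, 6)],
             [F(1, 3), F(0), F(-1, 3)],
             [F(1, 3), F(-1, 2), F(1, 6)]]


def decode_first_group(j, mst_bins):
    """Apply the target decoding matrix J to each bin's [M1, S1, T1] to recover
    the three decoded coefficients of channels 1, 2 and 3."""
    return [[sum(j[r][k] * v[k] for k in range(3)) for r in range(3)]
            for v in mst_bins]
```

Decoding the bin [6, -2, 0] produced by that matrix from [1, 2, 3] returns [1, 2, 3] exactly.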
  • S804 Determine a plurality of decoded channel groups that are adjacent to and continuous with the current channel group as upmix channel groups corresponding to the current channel group.
  • specifically, the decoded channel groups that are adjacent to and continuous with the current channel group may be determined as the upmix channel groups of the current channel group and used to decode it.
  • for example, the plurality of decoded channel groups may include two channel groups.
  • if the current channel group is channel group 2, the corresponding upmix channel group includes the last two decoded frequency domain coefficient outputs of channel group 1. If the current channel group is channel group 3, it includes the last output of channel group 1 and the one output of channel group 2. If the current channel group is channel group 4, it includes the one output of channel group 2 and the one output of channel group 3.
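The bookkeeping of which already-decoded outputs form the upmix channel group can be sketched as follows. The sketch assumes three-channel groups that advance by one channel. Under that assumption, group g covers channels g to g+2, group 1 outputs channels 1 to 3, and each later group g outputs only channel g+2. The function names are hypothetical.

```python
def producer(channel):
    """Return (group, output_position) of the unit that decoded `channel` (1-based)."""
    if channel <= 3:
        return (1, channel)  # group 1 outputs channels 1, 2 and 3
    return (channel - 2, 1)  # group g >= 2 outputs only channel g + 2


def upmix_sources(group):
    """Already-decoded outputs forming the upmix channel group of `group`.
    Group g shares channels g and g + 1 with the groups decoded before it."""
    if group == 1:
        return []  # the first group is decoded on its own from [M1 S1 T1]
    return [producer(ch) for ch in (group, group + 1)]
```

This reproduces the cases listed above. Group 2 uses outputs 2 and 3 of group 1, group 3 uses output 3 of group 1 and the output of group 2, and group 4 uses the outputs of groups 2 and 3.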
  • S805 Obtain first encoding information of the current channel group in any frequency band b from the encoding information.
  • the decoder can obtain the first encoding information of the current channel group in frequency band b from the encoding information of the current channel group. This first encoding information is the first information T_i of the current channel group in band b.
  • a target transformation matrix of the current upmix channel group in each frequency band can be determined from the coded information received by the decoder, and a target decoding matrix of the upmix channel group is determined based on the target transformation matrix.
  • the decoder performs the inverse transformation using the target decoding matrix to obtain the decoded frequency domain coefficients of the upmix channel group in each frequency band.
  • the decoder obtains the target transformation matrix of the current channel group in each frequency band from the encoding information, and further determines the target decoding matrix of the current channel group in each frequency band based on the target transformation matrix.
  • the first encoded information T_i on band b and the first decoded frequency domain coefficients of the upmix channel group on band b are decoded to obtain the first decoded frequency domain coefficients of the current channel group on band b.
  • since the first coded information of the current channel group includes only the first information T_i, the first decoded frequency domain coefficients on band b after decoding comprise a single output.
  • S808 Obtain a decoded frequency domain coefficient of the current channel group according to the first decoded frequency domain coefficients of each frequency band of the current channel group, wherein the decoded frequency domain coefficients of the current channel group are output as one channel.
  • the decoded frequency domain coefficients of the first channel group on the full frequency band can be obtained, wherein the decoded frequency domain coefficients of the first channel group on the full frequency band include three outputs:
  • the decoding formula for the current channel group i is as follows:
  • S809 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
  • the implementation method of S809 can be implemented by any method in the embodiments of the present disclosure, which is not limited here and will not be repeated.
  • Upmix decoding unit 1 is the first upmix decoding unit.
  • the first coding information [M1 S1 T1] is input into the first upmix decoding unit, which outputs three channels of decoded frequency domain coefficients; the first coded information T2 and the last two decoded frequency domain coefficients output by the first upmix decoding unit are input into the second upmix decoding unit, which outputs one channel of decoded frequency domain coefficients; the first coded information T3, the last decoded frequency domain coefficient output by the first upmix decoding unit, and the decoded frequency domain coefficient output by the second upmix decoding unit are input into the third upmix decoding unit, which outputs one channel of decoded frequency domain coefficients; in general, the first coded information Ti, the decoded frequency domain coefficient output by the (i-2)-th upmix decoding unit, and the decoded frequency domain coefficient output by the (i-1)-th upmix decoding unit are input into the i-th upmix decoding unit.
  • the decoder uses frequency division decoding similar to that on the encoder side to restore the multi-channel audio signal. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
  • Fig. 10 is a schematic flow chart of an audio coding method provided by an embodiment of the present disclosure. As shown in Fig. 10, the method may include but is not limited to steps S1001 to S1015.
  • S1001 Group a channel sequence to obtain a plurality of channel groups, each channel group including a plurality of continuous channels in the channel sequence, and one or more overlapping channels exist between adjacent channel groups.
  • S1003 Determine a first cross-correlation matrix between channels corresponding to each frequency band according to the frequency domain coefficients of each channel.
  • S1004 Determine the second cross-correlation matrix of the channel group in each frequency band from the first cross-correlation matrix of that frequency band.
  • the second cross-correlation matrix includes the cross-correlation coefficients between the channels in the channel group.
  • S1005 Determine a target transformation matrix of the channel group in each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band.
  • S1007 Obtain second encoding information of the channel group according to the first encoding information on each frequency band of the channel group.
  • S1008 Obtain coding information of the channel group based on the second coding information and the target transformation matrix corresponding to each frequency band.
  • S1009 Obtain a coded bitstream based on the coded information of the channel group, and send the coded bitstream to a decoder for decoding.
  • S1010 Receive an encoded bitstream sent by an encoder.
  • S1015 Obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden of the encoder, and reducing transmission and storage costs.
  • the decoder uses frequency division decoding similar to that on the encoder side to achieve recovery of multi-channel audio signals.
  • FIG. 11 is a block diagram of an audio encoding device according to an exemplary embodiment.
  • the audio encoding device 1100 of the embodiment of the present disclosure includes: a channel grouping module 1101, a frequency domain processing module 1102, a matrix determination module 1103, an encoding module 1104, and a sending module 1105.
  • the channel grouping module 1101 is configured to perform grouping on the channel sequence to obtain a plurality of channel groups, each of which includes a plurality of continuous channels in the channel sequence, and one or more identical channels exist between adjacent channel groups.
  • the frequency domain processing module 1102 is configured to perform frequency domain conversion on the audio signal of each channel in the channel sequence frame by frame to obtain a frequency domain coefficient of each frame of each channel.
  • the matrix determination module 1103 is configured to determine, from a transformation matrix set, a target transformation matrix for each frequency band in a frequency band set corresponding to a channel group according to frequency domain coefficients of each channel.
  • the encoding module 1104 is configured to perform same-band decorrelation processing on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band, to obtain encoding information of the channel group.
  • the sending module 1105 is configured to obtain the encoded bitstream based on the encoding information of the channel group and send the encoded bitstream to the decoder for decoding.
  • the matrix determination module 1103 is further configured to perform: determining a first cross-correlation matrix between channels corresponding to each frequency band based on the frequency domain coefficients of each channel; determining a second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band, the second cross-correlation matrix including the cross-correlation coefficients between the channels in the channel group; and determining a target transformation matrix of the channel group on each frequency band from a transformation matrix set based on the second cross-correlation matrix of the channel group on that frequency band.
  • the encoding module 1104 is further configured to perform: obtaining the frequency domain coefficients of the channels in the channel group on any frequency band b, and obtaining first encoding information of the channel group on frequency band b according to those frequency domain coefficients and the target transformation matrix corresponding to frequency band b; obtaining second encoding information of the channel group according to its first encoding information on each frequency band; and obtaining the encoding information of the channel group based on the second encoding information and the target transformation matrix corresponding to each frequency band.
  • the matrix determination module 1103 is further configured to perform: determining the pairwise cross-correlation coefficients between the channels in the channel group on any frequency band b based on the second cross-correlation matrix on frequency band b; and determining the target transformation matrix of the channel group on frequency band b from the transformation matrix set according to those pairwise cross-correlation coefficients.
  • the matrix determination module 1103 is further configured to perform: if the pairwise cross-correlation coefficients meet the condition for selecting a specified transformation matrix in the transformation matrix set, selecting the specified transformation matrix as the target transformation matrix for frequency band b; if they do not meet the condition, selecting, according to the maximum pairwise cross-correlation coefficient, a transformation matrix other than the specified transformation matrix from the transformation matrix set as the target transformation matrix for frequency band b.
  • the matrix determination module 1103 is further configured to perform: determining the energy value of each channel in each frequency band according to the frequency domain coefficients of that channel; and determining the first cross-correlation matrix corresponding to each frequency band according to the energy value of each channel in that frequency band.
  • the matrix determination module 1103 is further configured to perform: obtaining the energy ratio of each pair of channels in the channel sequence on any frequency band b; if the energy ratio on frequency band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, where the first set threshold is less than the second set threshold, determining that the cross-correlation coefficient between that pair of channels in frequency band b is zero; if the energy ratio lies between the first set threshold and the second set threshold, determining the cross-correlation coefficient between that pair of channels in frequency band b according to their frequency domain coefficients in frequency band b; and obtaining the first cross-correlation matrix corresponding to frequency band b based on the cross-correlation coefficients between each pair of channels in frequency band b.
  • the matrix determination module 1103 is further configured to perform: normalizing the first cross-correlation matrix of each frequency band, and extracting, according to the channels included in the channel group, the second cross-correlation matrix of the channel group in each frequency band from the normalized first cross-correlation matrix of that frequency band.
  • the matrix determination module 1103 is further configured to perform: determining the channel identifiers associated with any matrix element in the first cross-correlation matrix; determining the normalization matrix element corresponding to that matrix element based on the associated channel identifiers; and obtaining the normalized result of the matrix element based on the matrix element and the normalization matrix element.
  • the encoding module 1104 is further configured to perform: for the first channel group, obtaining the first encoding information of the first channel group on any frequency band b according to the frequency domain coefficients of its channels on frequency band b and the target transformation matrix corresponding to frequency band b, the first encoding information including the center information, the side information, and the first information of the first channel group; and for each remaining channel group other than the first channel group, obtaining the first encoding information of that channel group on frequency band b according to the frequency domain coefficients of its channels on frequency band b and the target transformation matrix corresponding to frequency band b, wherein the first encoding information includes the first information of that channel group.
  • the channel grouping module 1101 is further configured to execute: determining that adjacent channel groups include a first channel group and a second channel group, wherein the first channel group and the second channel group respectively include three consecutive channels in a channel sequence, and the first channel group and the second channel group include two identical channels.
  • the frequency domain coefficients on each frequency band in the channel group are encoded through the target transformation matrix, thereby achieving compression of audio signals of multiple channels, reducing redundancy between multiple channels, reducing the burden on the encoder, and reducing transmission and storage costs.
  • Fig. 12 is a block diagram of an audio decoding device according to an exemplary embodiment.
  • the audio decoding device 1200 according to the embodiment of the present disclosure includes: a receiving module 1201 , a matrix determining module 1202 , and a decoding module 1203 .
  • the receiving module 1201 is configured to receive the encoded bitstream sent by the encoder, wherein the encoded bitstream includes encoding information of multiple channel groups, the channel groups are obtained by sequentially grouping the channel sequence, each channel group includes several consecutive channels in the channel sequence, and adjacent channel groups share one or more identical channels.
  • the matrix determination module 1202 is configured to decode the multiple channel groups in sequence and, for the current channel group being decoded, determine a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to the encoding information of the current channel group.
  • the decoding module 1203 is configured to decode the encoding information of the current channel group based on its target decoding matrix in each frequency band to obtain the decoded frequency domain coefficients of the current channel group, and to obtain the decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the multiple channel groups.
  • the matrix determination module 1202 is further configured to perform: obtaining the target transformation matrix for each frequency band from the encoding information; and, for any frequency band b, querying a mapping relationship between transformation matrices and decoding matrices based on the target transformation matrix of frequency band b to obtain the target decoding matrix of the current channel group on frequency band b.
  • the decoding module 1203 is further configured to perform: obtaining the first encoding information of the first channel group on any frequency band b from the encoding information of the first channel group; decoding that first encoding information based on the target decoding matrix of the first channel group on frequency band b to obtain the first decoded frequency domain coefficients of the first channel group on frequency band b; and obtaining the decoded frequency domain coefficients of the first channel group according to its first decoded frequency domain coefficients on each frequency band, wherein both the first decoded frequency domain coefficients and the decoded frequency domain coefficients of the first channel group include three outputs.
  • for the decoding module 1203, the first encoding information of the first channel group in any frequency band b includes at least the center information, the side information, and the first information in frequency band b.
  • the decoding module 1203 is further configured to perform: determining several decoded channel groups that are adjacent to and consecutive with the current channel group as the upmix channel group corresponding to the current channel group; obtaining the first encoding information of the current channel group on any frequency band b from the encoding information; obtaining the decoded frequency domain coefficients of the upmix channel group on each frequency band; decoding the first encoding information on frequency band b according to the target decoding matrix corresponding to frequency band b and the decoded frequency domain coefficients of the upmix channel group on frequency band b, to obtain the first decoded frequency domain coefficients on frequency band b; and obtaining the decoded frequency domain coefficients of the current channel group according to its first decoded frequency domain coefficients on each frequency band, the decoded frequency domain coefficients of the current channel group including one output.
  • for the decoding module 1203, the first encoded information of the current channel group in any frequency band b includes the first information of the current channel group in frequency band b.
  • the decoder uses frequency division decoding similar to that on the encoder side to achieve recovery of multi-channel audio signals. Since the encoder performs compression, the multi-channel signal is easier to transmit, saving transmission space.
  • FIG. 13 is a schematic diagram of the structure of another audio processing device 1300 provided in an embodiment of the present disclosure.
  • the audio processing device 1300 may be an encoder, or a decoder, or a chip, a chip system, or a processor that supports the encoder to implement the above method, or a chip, a chip system, or a processor that supports the decoder to implement the above method.
  • the device may be used to implement the method described in the above method embodiment, and the details may refer to the description in the above method embodiment.
  • the audio processing device 1300 may include one or more processors 1301.
  • the processor 1301 may be a general-purpose processor or a dedicated processor, etc. For example, it may be a baseband processor or a central processing unit.
  • the baseband processor may be used to process the communication protocol and communication data
  • the central processing unit may be used to control the audio processing device (such as a base station, a baseband chip, a decoder, a decoder chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.
  • the audio processing device 1300 may further include one or more memories 1302, on which a computer program 1304 may be stored, and the processor 1301 executes the computer program 1304 to enable the audio processing device 1300 to perform the method described in the above method embodiment.
  • data may also be stored in the memory 1302.
  • the audio processing device 1300 and the memory 1302 may be provided separately or integrated together.
  • the audio processing device 1300 may further include a transceiver 1305 and an antenna 1306.
  • the transceiver 1305 may be referred to as a transceiver unit, a transceiver, or a transceiver circuit, etc., and is used to implement a transceiver function.
  • the transceiver 1305 may include a receiver and a transmitter, the receiver may be referred to as a receiver or a receiving circuit, etc., and is used to implement a receiving function; the transmitter may be referred to as a transmitter or a transmitting circuit, etc., and is used to implement a transmitting function.
  • the audio processing device 1300 may further include one or more interface circuits 1307.
  • the interface circuit 1307 is used to receive code instructions and transmit them to the processor 1301.
  • the processor 1301 executes the code instructions to enable the audio processing device 1300 to perform the method described in the above method embodiment.
  • the processor 1301 may include a transceiver for implementing the receiving and sending functions.
  • the transceiver may be a transceiver circuit, an interface, or an interface circuit.
  • the transceiver circuit, interface, or interface circuit for implementing the receiving and sending functions may be separate or integrated.
  • the above-mentioned transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or the above-mentioned transceiver circuit, interface, or interface circuit may be used for transmitting or delivering signals.
  • the processor 1301 may store a computer program 1303, which runs on the processor 1301 and enables the audio processing device 1300 to perform the method described in the above method embodiment.
  • the computer program 1303 may be fixed in the processor 1301, in which case the processor 1301 may be implemented by hardware.
  • the audio processing device 1300 may include a circuit that can implement the functions of sending or receiving or communicating in the aforementioned method embodiments.
  • the processor and transceiver described in the present disclosure may be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver may also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • the audio processing device described in the above embodiments may be an encoder or a decoder, but the scope of the audio processing device described in the present disclosure is not limited thereto, and the structure of the audio processing device may not be limited by FIG. 13.
  • the audio processing device may be an independent device or may be part of a larger device.
  • the audio processing device may be:
  • a set of one or more ICs, which in some embodiments may also include a storage component for storing data and computer programs;
  • an ASIC, such as a modem;
  • the audio processing device can be a chip or a chip system
  • the schematic diagram of the chip structure shown in Figure 14 includes a processor 1401 and an interface 1402.
  • the number of processors 1401 can be one or more, and the number of interfaces 1402 can be multiple.
  • the chip further includes a memory 1403, which is used to store necessary computer programs and data.
  • the chip can be used to implement the functions of the decoder in the above-mentioned embodiments of the present disclosure.
  • the chip can be used to implement the functions of the encoder in the above-mentioned embodiments of the present disclosure.
  • the present disclosure also provides an audio processing system, which includes the audio processing device acting as an encoder and the audio processing device acting as a decoder in the aforementioned embodiment of FIG. 13, or includes the audio processing device acting as an encoder and the audio processing device acting as a decoder in the aforementioned embodiment of FIG. 14.
  • the embodiment of the present disclosure also provides a readable storage medium on which instructions are stored. When the instructions are executed by a computer, the functions of any of the above method embodiments are implemented.
  • the embodiments of the present disclosure also provide a computer program product including a computer program, which implements the functions of any of the above method embodiments when executed by a computer.
  • the present disclosure also provides a computer program, which, when executed on a computer, enables the computer to execute the functions of any of the above method embodiments.
  • all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product includes one or more computer programs.
  • when the computer program is loaded and executed on a computer, the processes or functions described in the embodiments of the present disclosure are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer program may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another.
  • the computer program can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium can be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
  • in the present disclosure, "at least one" may also be described as "one or more", and "a plurality" may be two, three, four, or more, which is not limited in the present disclosure.
  • technical features of the same kind are distinguished by "first", "second", "third", "A", "B", "C", "D", and the like, and these labels imply no order of precedence or magnitude among the technical features they describe.
  • the correspondences shown in the tables of the present disclosure can be configured or predefined.
  • the values of the information in each table are only examples and can be configured to other values, which are not limited by the present disclosure.
  • the correspondences shown in some rows may not be configured.
  • appropriate modifications, such as splitting or merging, can be made based on the above tables.
  • the names of the parameters shown in the titles of the above tables can also use other names that can be understood by the audio processing device, and the values or representations of the parameters can also use other values or representations that can be understood by the audio processing device.
  • other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
  • the predefined in the present disclosure may be understood as defined, predefined, stored, pre-stored, pre-negotiated, pre-configured, solidified, or pre-burned.
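The chain of upmix decoding units described above (channel group 2 draws on the last two outputs of channel group 1, channel group 3 on the last output of group 1 and the output of group 2, and group i on the outputs of units i-2 and i-1) can be checked with a minimal Python sketch. It assumes 3-channel groups overlapping by 2 channels, as in the grouping example of S101; the function names are hypothetical and the sketch only tracks which unit produces which channel, not the decoding matrices themselves.

```python
from typing import Dict, List, Tuple


def source_unit(channel: int) -> int:
    """Index of the upmix decoding unit that outputs `channel`.

    Assumes group g holds channels g, g+1, g+2: unit 1 outputs channels
    1-3 and every later unit g outputs only its new channel g + 2.
    """
    return 1 if channel <= 3 else channel - 2


def upmix_inputs(group: int) -> List[Tuple[int, int]]:
    """(unit, channel) pairs that supply the two shared channels of `group`."""
    if group < 2:
        raise ValueError("group 1 is decoded from [M1 S1 T1] alone")
    return [(source_unit(c), c) for c in (group, group + 1)]


def decode_order(num_groups: int) -> Dict[int, int]:
    """Simulate sequential decoding; return {channel: unit that produced it}."""
    produced: Dict[int, int] = {1: 1, 2: 1, 3: 1}  # unit 1 has three outputs
    for g in range(2, num_groups + 1):
        for unit, ch in upmix_inputs(g):
            # each input must already exist and come from an earlier unit
            if produced.get(ch) != unit or unit >= g:
                raise RuntimeError(f"channel {ch} not yet available for group {g}")
        produced[g + 2] = g  # unit g adds one output channel
    return produced


print(upmix_inputs(2))  # [(1, 2), (1, 3)]
print(upmix_inputs(3))  # [(1, 3), (2, 4)]
print(upmix_inputs(5))  # [(3, 5), (4, 6)]
```

Because every input of unit g is produced by unit g-1 or g-2, the groups can be decoded strictly in order with only the two most recent outputs kept in memory.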

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

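Provided are an audio encoding method, an audio decoding method, and apparatuses, an electronic device, a storage medium, a computer program product, and a computer program therefor. The method includes: grouping a channel sequence to obtain a plurality of channel groups, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more identical channels; performing frame-by-frame frequency domain conversion on the audio signal of each channel in the channel sequence to obtain frequency domain coefficients of each frame of each channel; determining, according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in the frequency band set corresponding to a channel group from a transformation matrix set; performing same-band decorrelation on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band, to obtain encoding information of the channel group; and obtaining an encoded bitstream based on the encoding information of the channel group and sending the encoded bitstream to a decoder for decoding.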

Description

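Audio Encoding Method and Apparatus, Electronic Device, and Storage Medium
Cross-Reference to Related Application
This disclosure claims priority to Chinese Patent Application No. 202310403661.8, filed in China on April 14, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of audio processing, and in particular to an audio encoding method, an audio decoding method, and apparatuses, an electronic device, a storage medium, a computer program product, and a computer program therefor.
Background
With the development of multimedia technology, requirements on audio signals keep rising. Although the existing two-dimensional mid-side algorithm (2 dimension mid-side, 2D M/S) can effectively reduce data redundancy between multiple channels, in many different audio scenarios it increases transmission cost during data transmission, resulting in wasted data.
Summary
Embodiments of the present disclosure provide an audio encoding method, an audio decoding method, and apparatuses, an electronic device, a computer-readable storage medium, a computer program product, and a computer program therefor, to solve problems such as wasted transmission and storage resources during multi-channel audio transmission. The technical solutions of the present disclosure are as follows: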
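In a first aspect, an embodiment of the present disclosure provides an audio encoding method performed by an encoder. The method includes: grouping a channel sequence to obtain a plurality of channel groups, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more identical channels; performing frame-by-frame frequency domain conversion on the audio signal of each channel in the channel sequence to obtain frequency domain coefficients of each frame of each channel; determining, according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group from a transformation matrix set; performing same-band decorrelation on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band, to obtain encoding information of the channel group; and obtaining an encoded bitstream based on the encoding information of the channel group and sending the encoded bitstream to a decoder for decoding.
In a second aspect, an embodiment of the present disclosure provides an audio decoding method performed by a decoder. The method includes: receiving an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of a plurality of channel groups, the channel groups being obtained by sequentially grouping a channel sequence, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more identical channels; decoding the plurality of channel groups in sequence and, for the current channel group being decoded, determining a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to the encoding information of the current channel group; decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band, to obtain decoded frequency domain coefficients of the current channel group; and obtaining a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.
In a third aspect, an embodiment of the present disclosure provides an audio encoding apparatus, including: a channel grouping module configured to group the channel sequence to obtain a plurality of channel groups, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more identical channels; a frequency domain processing module configured to perform frame-by-frame frequency domain conversion on the audio signal of each channel in the channel sequence to obtain frequency domain coefficients of each frame of each channel; a matrix determination module configured to determine, according to the frequency domain coefficients of each channel, a target transformation matrix for each frequency band in the frequency band set corresponding to the channel group from a transformation matrix set; an encoding module configured to perform same-band decorrelation on the frequency domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band, to obtain encoding information of the channel group; and a sending module configured to obtain an encoded bitstream based on the encoding information of the channel group and send the encoded bitstream to a decoder for decoding.
In a fourth aspect, an embodiment of the present disclosure provides an audio decoding apparatus, including: a receiving module configured to receive an encoded bitstream sent by an encoder, the encoded bitstream including encoding information of a plurality of channel groups, the channel groups being obtained by sequentially grouping a channel sequence, each channel group including several consecutive channels in the channel sequence, with adjacent channel groups sharing one or more identical channels; a matrix determination module configured to decode the plurality of channel groups in sequence and, for the current channel group being decoded, determine a target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group according to the encoding information of the current channel group; and a decoding module configured to decode the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band to obtain decoded frequency domain coefficients of the current channel group, and to obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency domain coefficients of the plurality of channel groups.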
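In a fifth aspect, an embodiment of the present disclosure provides an encoder, including a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to implement the steps of the method according to the first aspect of the embodiments of the present disclosure.
In a sixth aspect, an embodiment of the present disclosure provides a decoder, including a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to implement the steps of the method according to the second aspect of the embodiments of the present disclosure.
In a seventh aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the steps of the method according to the first aspect of the embodiments of the present disclosure.
In an eighth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the steps of the method according to the second aspect of the embodiments of the present disclosure.
In a ninth aspect, an embodiment of the present disclosure provides an encoder, the apparatus including a processor and an interface circuit, the interface circuit being configured to receive code instructions and transmit them to the processor, and the processor being configured to run the code instructions to cause the apparatus to perform the method according to the first aspect.
In a tenth aspect, an embodiment of the present disclosure provides a decoder, the apparatus including a processor and an interface circuit, the interface circuit being configured to receive code instructions and transmit them to the processor, and the processor being configured to run the code instructions to cause the apparatus to perform the method according to the second aspect.
In an eleventh aspect, an embodiment of the present disclosure provides a codec system. The system includes the encoding apparatus according to the third aspect and the decoding apparatus according to the fourth aspect; or the encoder according to the fifth aspect and the decoder according to the sixth aspect; or the encoder according to the seventh aspect and the encoding apparatus according to the eighth aspect; or the encoder according to the ninth aspect and the decoder according to the tenth aspect.
In a twelfth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing instructions used by the above encoder; when the instructions are executed, the encoder is caused to perform the method according to the first aspect.
In a thirteenth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing instructions used by the above decoder; when the instructions are executed, the decoder is caused to perform the method according to the second aspect.
In a fourteenth aspect, an embodiment of the present disclosure further provides a computer program product including a computer program that, when run on a computer, causes the computer to perform the method according to the first aspect.
In a fifteenth aspect, an embodiment of the present disclosure further provides a computer program product including a computer program that, when run on a computer, causes the computer to perform the method according to the second aspect.
In a sixteenth aspect, an embodiment of the present disclosure provides a chip system including at least one processor and an interface, configured to support a network device in implementing the functions involved in the first aspect, for example, determining or processing at least one of the data and information involved in the above method. In a possible design, the chip system further includes a memory configured to store the computer programs and data necessary for the network device. The chip system may consist of chips, or may include chips and other discrete devices.
In a seventeenth aspect, an embodiment of the present disclosure provides a chip system including at least one processor and an interface, configured to support a terminal device in implementing the functions involved in the second aspect, for example, determining or processing at least one of the data and information involved in the above method. In a possible design, the chip system further includes a memory configured to store the computer programs and data necessary for the terminal device. The chip system may consist of chips, or may include chips and other discrete devices.
In an eighteenth aspect, an embodiment of the present disclosure provides a computer program that, when run on a computer, causes the computer to perform the method according to the first aspect.
In a nineteenth aspect, an embodiment of the present disclosure provides a computer program that, when run on a computer, causes the computer to perform the method according to the second aspect.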
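The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: the encoder divides the channel signals into frequency bands and groups the channels to obtain frequency domain coefficients, and based on the frequency domain coefficients of each channel it can determine the target transformation matrix of each frequency band corresponding to a channel group. The frequency domain coefficients of the channels are then decorrelated according to the target transformation matrix to obtain the encoding information of the channel group, from which an encoded bitstream is obtained and sent to the decoder for decoding. In the embodiments of the present disclosure, the frequency domain coefficients on each frequency band in a channel group are encoded through the target transformation matrix, which compresses the audio signals of multiple channels, reduces redundancy between channels, reduces the burden on the encoder, and lowers transmission and storage costs.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.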
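Brief Description of the Drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the application.
FIG. 1 is a flowchart of an audio encoding method according to an exemplary embodiment.
FIG. 2 is a flowchart of an audio encoding method according to another exemplary embodiment.
FIG. 3 is a schematic diagram of a cross-correlation matrix according to an exemplary embodiment.
FIG. 4 is a flowchart of an audio encoding method according to another exemplary embodiment.
FIG. 5 is a flowchart of an audio encoding method according to another exemplary embodiment.
FIG. 6 is a schematic diagram of an audio encoding method according to an exemplary embodiment.
FIG. 7 is a flowchart of an audio decoding method according to an exemplary embodiment.
FIG. 8 is a flowchart of an audio decoding method according to another exemplary embodiment.
FIG. 9 is a schematic diagram of an audio decoding method according to an exemplary embodiment.
FIG. 10 is a flowchart of an audio encoding method according to another exemplary embodiment.
FIG. 11 is a block diagram of an audio encoding apparatus according to an exemplary embodiment.
FIG. 12 is a block diagram of an audio decoding apparatus according to an exemplary embodiment.
FIG. 13 is a block diagram of an audio processing apparatus according to an exemplary embodiment.
FIG. 14 is a block diagram of another audio processing chip according to an exemplary embodiment.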
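Detailed Description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the embodiments of the present disclosure. The singular forms "a", "an", and "the" used in the embodiments of the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein.
Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining". For brevity and ease of understanding, the terms "greater than" or "less than" and "higher than" or "lower than" are used herein to characterize magnitude relationships. However, those skilled in the art will understand that "greater than" also covers "greater than or equal to", "less than" also covers "less than or equal to", "higher than" also covers "higher than or equal to", and "lower than" also covers "lower than or equal to".
The audio encoding/decoding method disclosed in the embodiments of the present disclosure can be applied to various communication systems, for example, the third generation (3G) Universal Mobile Telecommunications System (UMTS), the Long Term Evolution (LTE) system, the fifth generation (5G) mobile communication system, the 5G New Radio (NR) system, the sixth generation (6G) mobile communication system, or other future mobile communication systems. The audio encoding/decoding method disclosed in the embodiments of the present disclosure can also be applied to streaming media transmission systems and OTT (Over The Top) media transmission systems.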
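FIG. 1 is a schematic flowchart of an audio encoding method according to an embodiment of the present disclosure. The method may be performed by an encoder. As shown in FIG. 1, the method may include, but is not limited to, steps S101 to S105.
S101: Group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
In one implementation, the encoder may group the M channels in the channel sequence to obtain a plurality of channel groups. In one implementation, each channel group contains several consecutive channels of the channel sequence, for example three consecutive channels. In the embodiments of the present disclosure, adjacent channel groups share one or more channels.

Take two adjacent channel groups, a first channel group and a second channel group. Each contains three consecutive channels of the channel sequence, and the two groups have two channels in common. For example, five channels in a channel sequence are divided into three groups:
- channel group 1 includes channels 1, 2, and 3;
- channel group 2 includes channels 2, 3, and 4;
- channel group 3 includes channels 3, 4, and 5.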
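The sliding-window grouping described above can be sketched as follows. This is a minimal illustration that assumes groups of three consecutive channels with a stride of one, so adjacent groups share two channels as in the five-channel example. The helper name `group_channels` is hypothetical.

```python
from typing import List, Tuple


def group_channels(num_channels: int, group_size: int = 3) -> List[Tuple[int, ...]]:
    """Split channels 1..num_channels into overlapping groups of consecutive channels.

    Hypothetical helper: each group starts one channel after the previous one,
    so adjacent groups share group_size - 1 channels.
    """
    if num_channels < group_size:
        raise ValueError("need at least group_size channels")
    return [
        tuple(range(start, start + group_size))
        for start in range(1, num_channels - group_size + 2)
    ]


# Five channels give the three groups of the example above.
print(group_channels(5))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
```

With M channels this yields M-2 groups. That count matches the numbering up to channel group M-2 used later in the encoder and decoder descriptions.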
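S102: Convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the frequency-domain coefficients of each frame of each channel.
In the embodiments of the present disclosure, the encoder divides the audio signal of each channel in the channel sequence into multiple fixed-length frames and applies a Modified Discrete Cosine Transform (MDCT) to each frame to obtain its frequency-domain representation. The MDCT coefficients of each frame are extracted from this representation and used as the frequency-domain coefficients of that frame.
In one implementation, the channel sequence in the embodiments of the present disclosure contains M channels.
In one implementation, each frame of audio data of each channel may contain 2N samples at a sampling rate of fs. After the MDCT, each frame contains N frequency bins. Accordingly, the MDCT coefficients span the range (0, fs/2) with a frequency resolution of fs/2N.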
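A direct MDCT of one 2N-sample frame can be sketched as below to show the 2N-to-N relationship. The direct form costs O(N²) per frame. The source does not specify a window, so the sine window and the hop of N samples (50% overlap) are assumptions typical of MDCT codecs. A production encoder would use a fast FFT-based algorithm instead.

```python
import math
from typing import List, Sequence


def sine_window(two_n: int) -> List[float]:
    """Sine window of length 2N (an assumed choice; the source names no window)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(frame: Sequence[float]) -> List[float]:
    """Direct MDCT: 2N input samples -> N coefficients."""
    two_n = len(frame)
    if two_n % 2:
        raise ValueError("frame length must be even (2N)")
    n_half = two_n // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]


def frames_to_mdct(signal: Sequence[float], n_half: int) -> List[List[float]]:
    """Cut the signal into 2N-sample frames with an assumed hop of N, window, then MDCT."""
    two_n = 2 * n_half
    window = sine_window(two_n)
    coeffs = []
    for start in range(0, len(signal) - two_n + 1, n_half):
        frame = [s * w for s, w in zip(signal[start:start + two_n], window)]
        coeffs.append(mdct(frame))
    return coeffs


fs, n_half = 48000, 8
frames = frames_to_mdct([0.0] * 32, n_half)
print(len(frames), len(frames[0]), fs / (2 * n_half))  # 3 8 3000.0
```

The last value printed is the frequency resolution fs/2N stated in the text.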
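S103: According to the frequency-domain coefficients of each channel, determine, from a transformation matrix set, the target transformation matrix of each frequency band in the frequency band set corresponding to the channel group.
In one implementation, the frequency bands may be divided in advance with a psychoacoustic band-division method to obtain a frequency band set. The set may include a plurality of bands, for example B bands, where B is an integer greater than or equal to 1. Each band in the set covers a different frequency range, and the frequency ranges of adjacent bands are contiguous.
In one implementation, after the per-frame MDCT of a channel, the frequency value of each frequency-domain coefficient is determined by multiplying its bin index by the frequency resolution.
In one implementation, the frequency value of any frequency-domain coefficient is calculated as follows:
f=n*fs/2N   (1)
Here n is the index of the frequency bin to which the coefficient corresponds, n ranges from 1 to N, and N is the number of bins. Formula (1) gives the frequency value f of the coefficient.
In one implementation, the frequency value of each coefficient is compared with the frequency range of each band to find the band it falls in, which determines the frequency-domain coefficients belonging to each band.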
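The following sketch applies formula (1) and the band-range comparison. The band edges in the example are hypothetical placeholders, not the psychoacoustic partition the embodiment refers to. Each band is taken as [lower edge, next edge), and the top edge (fs/2) is included in the last band.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def bin_frequency(n: int, fs: float, n_bins: int) -> float:
    """Formula (1): f = n * fs / (2N) for bin n = 1..N."""
    return n * fs / (2 * n_bins)


def assign_bins_to_bands(fs: float, n_bins: int, edges: Sequence[float]) -> Dict[int, List[int]]:
    """Map each bin n (1..N) to a 0-based band index using ascending band edges in Hz."""
    n_bands = len(edges) - 1
    bands: Dict[int, List[int]] = {b: [] for b in range(n_bands)}
    for n in range(1, n_bins + 1):
        f = bin_frequency(n, fs, n_bins)
        if f < edges[0] or f > edges[-1]:
            continue  # outside the partitioned range
        # The clamp puts a bin lying exactly on the top edge into the last band.
        b = min(bisect_right(edges, f) - 1, n_bands - 1)
        bands[b].append(n)
    return bands


# Hypothetical edges: three bands covering 0-24 kHz for fs = 48 kHz, N = 8.
print(assign_bins_to_bands(48000, 8, [0, 6000, 12000, 24000]))
# {0: [1], 1: [2, 3], 2: [4, 5, 6, 7, 8]}
```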
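In one implementation, the cross-correlation coefficient between two channels in a frequency band is calculated from their frequency-domain coefficients in that band. That is, for each band in the frequency band set, the pairwise cross-correlation coefficients between channels for that band can be obtained from the channels' frequency-domain coefficients in the band.
In one implementation, for any band b in the set, the cross-correlation coefficients between each pair of channels within a channel group in band b are selected from the pairwise coefficients of band b, based on the channels the group contains. In one implementation, the target transformation matrix of the channel group in band b is then determined from the transformation matrix set based on these coefficients. It can be understood that when the frequency band set includes B bands, a target transformation matrix is obtained for the channel group in each of them.
S104: Based on the target transformation matrix of each band, perform same-band decorrelation on the frequency-domain coefficients of the channels in the channel group to obtain the encoding information of the channel group.
In one implementation, the transformation matrix set includes a plurality of transformation matrices, each corresponding to one decorrelation mode. In one implementation, the set may include M0, M1, M2, M3, and M4, whose specific values are as follows: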


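In one implementation, the decorrelation mode of the channel group in each band is determined by its target transformation matrix in that band. Bands with different target transformation matrices have different decorrelation modes, and bands with the same target transformation matrix have the same decorrelation mode. For example, suppose the target transformation matrix of band 1 is M1, that of band 2 is M2, and that of band 3 is M1. Then bands 1 and 3 share the same decorrelation mode, and it differs from the mode of band 2.
In one implementation, the frequency-domain coefficients of the channels in the channel group are split by band to obtain the coefficients in each band. The coefficients of the channels in each band are then decorrelated with that band's target transformation matrix to obtain the encoding information of the channel group.
For example, let a channel group contain three consecutive channels, denoted L, C, and R, and suppose the group's target transformation matrix in some band b is determined to be M4. The frequency-domain coefficients of L, C, and R in band b form a frequency-domain coefficient matrix. Multiplying this matrix by the target transformation matrix M4 of band b yields the first encoding information of the channel group in band b. In other words, M4 decorrelates the coefficients of L, C, and R in band b to produce the group's encoding information in that band. This same-band decorrelation of L, C, and R reduces same-frequency interference as well as redundancy and transmission cost.
It can be understood that the frequency-domain coefficients of the channel group in each band are decorrelated with that band's target transformation matrix, which gives the group's encoding information in every band. The encoding information of all bands is then combined into the group's encoding information over the full band. That is, the encoding information of a channel group includes its encoding information in all bands.
S105: Obtain an encoded bitstream based on the encoding information of the channel groups, and send the encoded bitstream to the decoder for decoding.
In one implementation, the encoding information of the channel groups may be binary-encoded to obtain a binary encoded bitstream. That is, the encoding information of each channel group is converted into binary code, and the binary codes of all channel groups are concatenated to form the encoded bitstream. In one implementation, the encoded bitstream is sent to the decoder, which decodes it to recover the original channel signals.
It should be noted that the decoder also needs the target transformation matrices of the channel group in each band in order to decode, so these matrices must be sent as well. In one implementation, the target transformation matrix of each band may be written into the encoded bitstream together with the encoding information of the channel groups, or sent to the decoder separately, synchronized with the encoded bitstream.
In the audio encoding method provided by the embodiments of the present disclosure, the encoder groups the channels and splits the channel signals into frequency bands to obtain frequency-domain coefficients. From these coefficients it determines the target transformation matrix of each band for each channel group. It then decorrelates the channels' frequency-domain coefficients with the target transformation matrices to obtain the encoding information of the channel group. Finally, it derives an encoded bitstream from the encoding information and sends it to the decoder for decoding. Encoding the frequency-domain coefficients in each band of a channel group with the target transformation matrix compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.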
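FIG. 2 is a schematic flowchart of an audio encoding method according to an embodiment of the present disclosure. The method may be performed by an encoder. As shown in FIG. 2, the method may include, but is not limited to, steps S201 to S207.
S201: Group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
In the embodiments of the present disclosure, S201 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S202: Convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the frequency-domain coefficients of each frame of each channel.
In the embodiments of the present disclosure, S202 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S203: According to the frequency-domain coefficients of each channel, determine the first cross-correlation matrix between channels for each frequency band.
In one implementation, the energy value of each channel in each band is determined from that channel's frequency-domain coefficients. The first cross-correlation matrix of each band is then determined from the energy values of all channels in that band.
In one implementation, the energy value of a channel in each band is obtained as the root-mean-square (RMS) energy of the channel's frequency-domain coefficients in that band. The RMS is calculated as follows:
$$\mathrm{RMS}_{c,b} = \sqrt{\frac{1}{N_{c,b}} \sum_{i=1}^{N_{c,b}} X_i^2}$$
Here c is the channel index, ranging from 1 to M, where M is the number of input channels. b is the band index. X_i is the frequency-domain coefficient of the corresponding channel, and N_{c,b} is the number of frequency bins of the channel in that band.
It should be noted that the number of frequency bins can be calculated from the bandwidth of the band and the frequency resolution.
Usually all channels have the same data length. With the same band-division method, the data length in a given band is therefore the same across channels, so the number of bins in that band is identical for every channel, i.e., N_{c,b} = N_b.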
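A sketch of the per-band RMS energy follows. It assumes the usual definition: the square root of the mean of the squared coefficients over the N_b bins of the band. Band membership is passed as lists of 0-based coefficient indices, and the helper name `band_rms` is hypothetical.

```python
import math
from typing import List, Sequence


def band_rms(coeffs: Sequence[float], bands: Sequence[Sequence[int]]) -> List[float]:
    """Return one channel's RMS energy in every band (assumed RMS definition).

    bands[b] lists the 0-based indices of the coefficients in band b;
    an empty band gets 0.0.
    """
    energies = []
    for idx in bands:
        if not idx:
            energies.append(0.0)
            continue
        energies.append(math.sqrt(sum(coeffs[i] ** 2 for i in idx) / len(idx)))
    return energies


print(band_rms([3.0, 4.0, 0.0, 0.0], [[0, 1], [2, 3]]))  # approximately [3.5355, 0.0]
```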
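In one implementation, once the energy value of each channel in each band is available, the energy ratio of each pair of channels in the channel sequence is obtained for any band b. In one implementation, the cross-correlation coefficient between the pair of channels in band b is then determined from this energy ratio.
In one implementation, the energy ratio Q of a pair of channels in the channel sequence in band b is obtained with the following formula:
Here c is the channel index, ranging from 1 to M, where M is the number of input channels, and b is the band index. c1 may be the same as or different from c2, and likewise b1 may be the same as or different from b2.
In one implementation, the energy ratio Q is used for an energy check, and the result of the check determines the cross-correlation coefficient between the channels in band b:
- If the energy ratio in band b is less than or equal to a first set threshold, the cross-correlation coefficient between the pair of channels in band b is set to zero.
- If the energy ratio in band b is greater than or equal to a second set threshold, the coefficient is likewise set to zero.
The first set threshold is less than the second set threshold.
If the energy ratio in band b lies between the first and second set thresholds, i.e., it is greater than the first and less than the second, the cross-correlation coefficient of the pair of channels in band b is computed from their frequency-domain coefficients in band b.
That is, when the energy ratio Q of a pair of channels in band b is too large or too small, their cross-correlation coefficient is set to 0. When Q lies within (0.5, 2), the cross-correlation coefficient of the pair is computed.
In one implementation, the cross-correlation coefficient is determined according to the value of Q with the following formula:
Here [x, y] are the channel indices, b is the band index, and N_b is the number of frequency bins of the channel in the band.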
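The energy-ratio gate and the first cross-correlation matrix for one band b can be sketched as follows. The exact formulas for Q and for the cross-correlation coefficient are not reproduced above, so this sketch makes two assumptions:
- Q is the ratio of the two channels' band RMS energies.
- The cross-correlation coefficient is the mean product of the two channels' coefficients over the N_b bins.

The thresholds 0.5 and 2 are the values given in the text.

```python
import math
from typing import List, Sequence

LOW_THR, HIGH_THR = 0.5, 2.0  # first and second set thresholds from the text


def rms(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values) / len(values))


def first_cross_corr_matrix(band_coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """band_coeffs[c] holds channel c's coefficients in band b (same N_b for every channel)."""
    m = len(band_coeffs)
    n_b = len(band_coeffs[0])
    energy = [rms(ch) for ch in band_coeffs]
    corr = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(x, m):  # symmetric: fill the upper triangle and mirror it
            if energy[x] == 0.0 or energy[y] == 0.0:
                continue  # assumed: a silent channel gets zero correlation
            q = energy[x] / energy[y]  # assumed form of the energy ratio Q
            if q <= LOW_THR or q >= HIGH_THR:
                continue  # energies too different: the coefficient stays 0
            # Assumed form of the coefficient: mean product over the N_b bins.
            value = sum(u * v for u, v in zip(band_coeffs[x], band_coeffs[y])) / n_b
            corr[x][y] = corr[y][x] = value
    return corr


corr = first_cross_corr_matrix([[1.0, 1.0], [1.5, 1.5], [10.0, 10.0]])
print(corr[0][1], corr[0][2], corr[2][2])  # 1.5 0.0 100.0
```

In this example, channel 3's energy is far outside the (0.5, 2) ratio window of the other two channels. Its off-diagonal coefficients are therefore forced to zero, while its own diagonal entry is still computed.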
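In one implementation, the first cross-correlation matrix of band b is assembled from the pairwise cross-correlation coefficients in band b. For example, when the channel sequence contains M channels, the first cross-correlation matrix is as follows:
Row 1 and column 1 of the first cross-correlation matrix both correspond to channel 1, row 2 and column 2 to channel 2, and so on, with row M and column M corresponding to channel M.
It should be noted that a first cross-correlation matrix is determined in this way for each band in the frequency band set. If the set contains B bands, there are B first cross-correlation matrices.
S204: Determine the second cross-correlation matrix of the channel group from the first cross-correlation matrix of each band. The second cross-correlation matrix contains the cross-correlation coefficients between the channels within the channel group.
In one implementation, the group's second cross-correlation matrix is extracted from the M×M first cross-correlation matrix shown above, based on the channels in the group. In one implementation, the channel indices of the group's channels are determined, and the second cross-correlation matrix is extracted from the first matrix according to those indices. In one implementation, the channel group contains three channels. Its second cross-correlation matrix is built from the first matrix using the pairwise coefficients between the group's channels and is therefore a 3×3 matrix.
As an illustration, the channel sequence contains five channels, arranged in order as channels 1 to 5, and each channel group contains three channels: channel group 1 includes channels 1, 2, and 3; channel group 2 includes channels 2, 3, and 4; and channel group 3 includes channels 3, 4, and 5. The first cross-correlation matrix is then a 5×5 matrix, as shown in FIG. 3. For channel group 2, the elements where rows 2 to 4 intersect columns 2 to 4 of the first matrix form its second cross-correlation matrix, which is the part inside the dashed box in FIG. 3. The second cross-correlation matrix of channel group 2 is as follows:
It should be noted that the channel group has one second cross-correlation matrix in each band.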
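Extracting a group's second cross-correlation matrix amounts to taking the rows and columns of the group's channels, as in the dashed box of FIG. 3. A minimal sketch with 1-based channel numbers follows; the helper name is hypothetical, and the 5×5 matrix is placeholder data.

```python
from typing import List, Sequence


def group_submatrix(corr: Sequence[Sequence[float]], group: Sequence[int]) -> List[List[float]]:
    """Return the |group| x |group| block of corr for 1-based channel numbers in group."""
    idx = [c - 1 for c in group]
    return [[corr[r][c] for c in idx] for r in idx]


# Placeholder 5x5 matrix whose entry at (row i, column j) is 10*i + j.
first = [[10 * i + j for j in range(1, 6)] for i in range(1, 6)]
print(group_submatrix(first, (2, 3, 4)))  # [[22, 23, 24], [32, 33, 34], [42, 43, 44]]
```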
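S205: Based on the channel group's second cross-correlation matrix in each band, determine the group's target transformation matrix in each band from the transformation matrix set.
In one implementation, for any band b, the target transformation matrix is chosen as follows:
- If the pairwise cross-correlation coefficients in the second cross-correlation matrix of band b satisfy the condition for selecting a designated transformation matrix in the set, the designated transformation matrix is selected as the target transformation matrix of band b.
- If they do not satisfy the condition, a transformation matrix other than the designated one is selected from the set as the target transformation matrix of band b, according to the maximum pairwise cross-correlation coefficient.
In one implementation, a cross-correlation threshold may be set, and the target transformation matrix of the channel group is determined from the set according to this threshold. The group's second cross-correlation matrix in each band contains the pairwise cross-correlation coefficients between the group's channels. If all pairwise coefficients are greater than the threshold, the designated transformation matrix is selected as the target transformation matrix of band b. If not all of them are greater than the threshold, a transformation matrix other than the designated one is selected from the set according to the maximum pairwise coefficient.
For example, the transformation matrix set may include M0, M1, M2, M3, and M4, with M4 as the designated transformation matrix. If the cross-correlation coefficients among the three channels are all greater than the threshold, the target transformation matrix is M4. Otherwise, if the maximum of the coefficients is greater than the threshold, the target transformation matrix is determined according to that maximum.
S206: Based on the target transformation matrix of each band, perform same-band decorrelation on the frequency-domain coefficients of the channels in the channel group to obtain the encoding information of the channel group.
In the embodiments of the present disclosure, S206 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S207: Obtain an encoded bitstream based on the encoding information of the channel groups, and send the encoded bitstream to the decoder for decoding.
In the embodiments of the present disclosure, S207 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.
FIG. 4 is a schematic flowchart of an audio encoding method according to an embodiment of the present disclosure. The method may be performed by an encoder. As shown in FIG. 4, the method may include, but is not limited to, steps S401 to S408.
S401: Group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
In the embodiments of the present disclosure, S401 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S402: Convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the frequency-domain coefficients of each frame of each channel.
In the embodiments of the present disclosure, S402 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S403: According to the frequency-domain coefficients of each channel, determine the first cross-correlation matrix between channels for each frequency band.
In the embodiments of the present disclosure, S403 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.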
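S404: Normalize the first cross-correlation matrix of each band. Then, according to the channels included in the channel group, extract the group's second cross-correlation matrix for each band from the normalized first cross-correlation matrix of that band.
In one implementation, the first cross-correlation matrix of each band is normalized to obtain a normalized first cross-correlation matrix, from which the group's second cross-correlation matrix for that band is extracted. It can be understood that the second cross-correlation matrix is then also normalized.
In one implementation, the channel identifiers associated with any matrix element of the first cross-correlation matrix are determined first. In one implementation, the normalization matrix elements for that element are then determined from the associated channel identifiers. Each matrix element is the cross-correlation coefficient between two channels: the channel identifier of its row is one associated identifier, and the channel identifier of its column is the other.
As an illustration, take the element Corr[2,3],b of the first cross-correlation matrix of band b. Its associated channel identifiers are 2 and 3; that is, element Corr[2,3],b relates channel 2 and channel 3.
Because element Corr[2,3],b relates channels 2 and 3, its normalization matrix elements are Corr[2,2],b and Corr[3,3],b.
In one implementation, the normalization result of a matrix element is obtained from the element and its normalization matrix elements. In one implementation, the normalization formula for a matrix element is as follows:
Here b is the band index, and the remaining symbols denote the matrix element and its normalization matrix elements, respectively.
It should be noted that the cross-correlation matrix is symmetric about its diagonal, so the coefficients below the diagonal need not be computed; the normalization can be performed on the elements above the diagonal. During this calculation, the normalization proceeds when the denominator is greater than 0. When the denominator is not greater than 0, the cross-correlation coefficient at that position is set to 0. Here the diagonal runs from the upper-left corner to the lower-right corner.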
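The normalization can be sketched as below. The normalization formula itself is not reproduced above, so this sketch assumes the common form Corr[x,y],b / sqrt(Corr[x,x],b · Corr[y,y],b). It computes only the upper triangle and mirrors it. As the text describes, an entry is set to 0 when the denominator is not positive.

```python
import math
from typing import List, Sequence


def normalize_corr(corr: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize a symmetric cross-correlation matrix by its diagonal (assumed formula)."""
    m = len(corr)
    out = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(x, m):  # upper triangle, including the diagonal
            denom = corr[x][x] * corr[y][y]
            if denom > 0:
                out[x][y] = corr[x][y] / math.sqrt(denom)
            out[y][x] = out[x][y]  # mirror into the lower triangle
    return out


print(normalize_corr([[4.0, 2.0], [2.0, 9.0]]))
# [[1.0, 0.3333333333333333], [0.3333333333333333, 1.0]]
```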
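S405: Based on the second cross-correlation matrix in any band b, determine the cross-correlation coefficients between each pair of channels within the channel group in band b.
In one implementation, the elements of the second cross-correlation matrix are exactly the pairwise cross-correlation coefficients of the group's channels in band b, so these coefficients can be read directly from that matrix.
S406: According to the pairwise cross-correlation coefficients within the channel group in band b, determine the group's target transformation matrix in band b from the transformation matrix set.
In one implementation, a threshold Thr may be set for the pairwise cross-correlation coefficients within the channel group in band b. The target transformation matrix is then determined by comparing these coefficients with Thr according to preset conditions.
In one implementation, the pairwise coefficients may satisfy the condition for selecting the designated transformation matrix in the set, that is, every pairwise coefficient within the group is greater than Thr. In that case, the designated transformation matrix M4 is selected as the target transformation matrix of band b.
For example, let [L, C, R] be the three channels in the group. If the cross-correlation coefficients between L and C, between L and R, and between C and R are all greater than Thr, the designated transformation matrix M4 is selected as the target transformation matrix of band b.
In one implementation, if the pairwise coefficients in band b do not satisfy the condition, a transformation matrix other than the designated one is selected as the target transformation matrix of band b according to the maximum pairwise coefficient.
That is, the maximum is taken among the three cross-correlation coefficients (L and C, L and R, C and R). If this maximum is greater than Thr, the target transformation matrix of band b is selected from the matrices other than the designated one. For example, if the designated matrix is M4, one of M0 to M3 is selected according to the maximum coefficient.
In one implementation, when the maximum cross-correlation coefficient is greater than Thr, the target transformation matrix is selected according to formula (6):
For example, the choice depends on which pair of channels gives the group's maximum pairwise coefficient in band b:
- the pair L and C: M1 is selected as the target transformation matrix;
- the pair L and R: M2 is selected;
- the pair C and R: M3 is selected.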
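The selection rule of steps S405 and S406 can be sketched as follows. Formula (6) is not reproduced above. The text also does not say which matrix applies when even the largest coefficient is not above Thr, so this sketch assumes M0 in that case. Matrices are represented only by their names.

```python
def select_transform(r_lc: float, r_lr: float, r_cr: float, thr: float) -> str:
    """Pick the target transformation matrix name for one band of an [L, C, R] group."""
    if min(r_lc, r_lr, r_cr) > thr:
        return "M4"  # designated matrix: all three pairs are strongly correlated
    candidates = {"M1": r_lc, "M2": r_lr, "M3": r_cr}
    best = max(candidates, key=candidates.get)  # pair with the largest coefficient
    if candidates[best] > thr:
        return best
    return "M0"  # assumption: no pair exceeds Thr


print(select_transform(0.9, 0.8, 0.95, 0.7))  # M4
print(select_transform(0.9, 0.1, 0.2, 0.5))   # M1
print(select_transform(0.1, 0.2, 0.3, 0.5))   # M0
```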
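S407: Based on the target transformation matrix of each band, perform same-band decorrelation on the frequency-domain coefficients of the channels in the channel group to obtain the encoding information of the channel group.
In the embodiments of the present disclosure, S407 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S408: Obtain an encoded bitstream based on the encoding information of the channel groups, and send the encoded bitstream to the decoder for decoding.
In the embodiments of the present disclosure, S408 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.
FIG. 5 is a schematic flowchart of an audio encoding method according to an embodiment of the present disclosure. The method may be performed by an encoder. As shown in FIG. 5, the method may include, but is not limited to, steps S501 to S507.
S501: Group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups have one or more overlapping channels.
In the embodiments of the present disclosure, S501 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S502: Convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the frequency-domain coefficients of each frame of each channel.
In the embodiments of the present disclosure, S502 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
S503: According to the frequency-domain coefficients of each channel, determine, from the transformation matrix set, the target transformation matrix of each frequency band in the frequency band set corresponding to the channel group.
In the embodiments of the present disclosure, S503 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.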
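S504: Obtain the frequency-domain coefficients of the channels in the channel group in any band b. From these coefficients and the target transformation matrix of band b, obtain the first encoding information of the channel group in band b.
In one implementation, consider the first channel group, which includes channels 1, 2, and 3. Its first encoding information in band b is obtained from its channels' frequency-domain coefficients in band b and the target transformation matrix of band b. This first encoding information includes the mid information M_1, the side information S_1, and the first information T_1 of the first channel group.
It should be noted that each remaining channel group, other than the first, is handled in the same way. Its first encoding information in band b is obtained from its channels' frequency-domain coefficients in band b and the target transformation matrix of band b. That first encoding information includes the first information T_i of the remaining channel group.
It can be understood that the first encoding information of the first channel group contains the complete mid, side, and first information, whereas that of each remaining channel group may contain only the first information.
The encoding information of a channel group is calculated as follows: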
[L C R]*M=[M S T]    (7)
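Here [L, C, R] are the three channels in the channel group, M is the target transformation matrix determined from the frequency-domain coefficients, and [M S T] is the encoding information of the channel group.
It should be noted that during the decorrelation calculation, the first decorrelation encoding unit outputs the complete encoding information [M S T]. The remaining encoding units output only the first information [T].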
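Formula (7) is a row-vector-times-matrix product applied to every bin of band b. The actual values of M0 to M4 are not reproduced above, so the sketch uses a hypothetical orthonormal 3×3 matrix, `HYPOTHETICAL_M`, purely for illustration. It follows the rule that only the first group outputs [M S T] while the other groups output T alone.

```python
import math
from typing import List, Sequence

# Hypothetical orthonormal matrix (NOT one of M0-M4): its columns give mid-, side-
# and T-type combinations of [L, C, R].
HYPOTHETICAL_M = [
    [1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6)],
    [1 / math.sqrt(3), 0.0, -2 / math.sqrt(6)],
    [1 / math.sqrt(3), -1 / math.sqrt(2), 1 / math.sqrt(6)],
]


def encode_band(l: Sequence[float], c: Sequence[float], r: Sequence[float],
                mat: Sequence[Sequence[float]], first_group: bool) -> List[List[float]]:
    """Apply [L C R] * M = [M S T] bin by bin; return [M, S, T] or just [T]."""
    mid, side, third = [], [], []
    for row in zip(l, c, r):
        out = [sum(row[k] * mat[k][j] for k in range(3)) for j in range(3)]
        mid.append(out[0])
        side.append(out[1])
        third.append(out[2])
    return [mid, side, third] if first_group else [third]


m, s, t = encode_band([1.0], [1.0], [1.0], HYPOTHETICAL_M, first_group=True)
print(round(m[0], 6))  # 1.732051
```

When all three channels are identical, all the energy lands in the mid column and the side and T outputs vanish. That collapse is the redundancy reduction the text describes.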
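S505: Obtain the second encoding information of the channel group from its first encoding information in each band.
In one implementation, the group's target transformation matrix is determined from its first encoding information in each band. The group's decorrelation mode in each band is then determined by that target transformation matrix, and these decorrelation modes form the second encoding information of the channel group.
It can be understood that bands with different target transformation matrices have different decorrelation modes, and bands with the same target transformation matrix have the same decorrelation mode.
S506: Obtain the encoding information of the channel group based on the second encoding information and the target transformation matrix of each band.
In one implementation, the group's frequency-domain coefficients in each band are decorrelated with that band's target transformation matrix according to the second encoding information, which gives the group's encoding information in every band. The encoding information of all bands is then combined into the group's encoding information over the full band. It can be understood that the encoding information of a channel group includes its encoding information in all bands.
S507: Obtain an encoded bitstream based on the encoding information of the channel groups, and send the encoded bitstream to the decoder for decoding.
In the embodiments of the present disclosure, S507 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.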
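FIG. 6 shows one possible encoding flow of the embodiments of the present disclosure:
1. The audio signal of each channel is MDCT-transformed to obtain the MDCT coefficients (frequency-domain coefficients) of each frame of each channel.
2. The channel signals are fed to a band-splitting unit, which divides them into bands and gives the frequency-domain coefficients in each band.
3. An energy calculation unit computes the energy value of each channel in each band.
4. The energy values are fed to a cross-correlation calculation unit, which produces the inter-channel cross-correlation coefficients in each band and hence the first cross-correlation matrix between channels for each band.
5. Each channel group has its own decorrelation unit. The frequency-domain coefficients of the group's three channels and the first cross-correlation matrix of each band are fed into that unit, which performs same-band decorrelation on the group and outputs the encoding information of that channel group.

As shown in FIG. 6, the decorrelation units operate as follows:
- Decorrelation unit 1 receives the band-split frequency-domain coefficients of channels 1, 2, and 3 in the first channel group, together with the first cross-correlation matrix of each band. It performs same-band decorrelation and outputs the encoding information of channel group 1, which comprises the mid information M_1, the side information S_1, and the first information T_1.
- Decorrelation unit 2 receives the band-split coefficients of channels 2, 3, and 4 in channel group 2, together with the first cross-correlation matrices. It outputs the encoding information of channel group 2, which comprises the first information T_2.
- Decorrelation unit 3 receives the band-split coefficients of channels 3, 4, and 5 in channel group 3, together with the first cross-correlation matrices. It outputs the encoding information of channel group 3, which comprises T_3.
- Continuing in this way, decorrelation unit M-2 handles the last channel group, M-2. It receives the band-split coefficients of channels M-2, M-1, and M, together with the first cross-correlation matrices, and outputs the encoding information of channel group M-2, which comprises T_{M-2}.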
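In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.
FIG. 7 is a schematic flowchart of an audio decoding method according to an embodiment of the present disclosure. The method may be performed by a decoder. As shown in FIG. 7, the method may include, but is not limited to, steps S701 to S704.
S701: Receive the encoded bitstream sent by the encoder. The encoded bitstream includes the encoding information of a plurality of channel groups.
In the embodiments of the present disclosure, the channel groups are obtained by grouping the channel sequence in order. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
In the embodiments of the present disclosure, the decoder receives the encoded bitstream sent by the encoder, reads the encoding information of the channel groups from it, and applies the inverse transformation to recover the original channel signals.
As described in the above embodiments, the encoder side may group the M channels in the channel sequence into a plurality of channel groups. Each group contains three consecutive channels of the sequence, and adjacent groups share one or more channels. For the process by which the encoder encodes the channel groups, refer to the above embodiments; it is not repeated here.
S702: Decode the channel groups in order. For the current channel group being decoded, determine the target decoding matrix of each band in the frequency band set corresponding to that group according to its encoding information.
In one implementation, the group's second encoding information over the full band is obtained from the encoding information. The second encoding information includes the first encoding information of each band and the target transformation matrix of each band. For any band b in the set, the target transformation matrix of band b is used to look up the correspondence between transformation matrices and decoding matrices, which gives the target decoding matrix of the current channel group in band b.
It should be noted that transformation matrices and decoding matrices correspond one to one. In one implementation, the decoding matrices may include the following: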


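For example, transformation matrix M0 corresponds to decoding matrix J0, M1 to J1, M2 to J2, M3 to J3, and M4 to J4. If the target transformation matrix of the current channel group is M4, its target decoding matrix in band b is J4.
S703: Based on the current channel group's target decoding matrix in each band, decode the group's encoding information to obtain its decoded frequency-domain coefficients.
In one implementation, the encoding information of the current channel group includes the group's second encoding information over the full band. The second encoding information includes the group's first encoding information in each band.
For any band b in the set, the first encoding information of band b is decoded with the target decoding matrix of band b, which gives the first decoded frequency-domain coefficients of the current channel group in band b. In one implementation, the group's decoded frequency-domain coefficients over the full band are assembled from the first decoded coefficients of all bands.
S704: Obtain the decoded audio signal of each channel in the channel sequence from the decoded frequency-domain coefficients of the channel groups.
In one implementation, the signals of the channel groups are converted from the frequency domain to the time domain using their decoded frequency-domain coefficients. In one implementation, an inverse MDCT converts each channel's frequency-domain signal into a time-domain signal based on the decoded coefficients, yielding the decoded audio signal of each channel.
In the audio decoding method provided by the embodiments of the present disclosure, the decoder receives the encoded bitstream sent by the encoder and obtains the encoding information of each channel group from it. Decoding units decode this information in the order of the channel groups to obtain the decoded frequency-domain coefficients of each group. These coefficients are then converted from the frequency domain to the time domain to obtain the decoded audio signal of each channel. Because the decoder uses band-wise decoding similar to that on the encoder side, it can recover the multi-channel audio signal. Because the encoder has compressed the signal, the multi-channel signal is easier to transmit and needs less transmission capacity.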
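FIG. 8 is a schematic flowchart of an audio decoding method according to an embodiment of the present disclosure. The method may be performed by a decoder. As shown in FIG. 8, the method may include, but is not limited to, steps S801 to S809.
S801: From the encoding information of the first channel group, obtain the first channel group's first encoding information in any band b.
In one implementation, each channel group contains three consecutive channels, and the first channel group contains channels 1, 2, and 3. In one implementation, the encoding information of the first channel group is obtained from the encoded bitstream. It includes the group's first encoding information in each band, which contains at least the mid information M_1, the side information S_1, and the first information T_1 of the first channel group.
S802: Decode the first channel group's first encoding information in each band with its target decoding matrix in that band to obtain the group's first decoded frequency-domain coefficients in any band b.
S803: Obtain the second decoded frequency-domain coefficients of the first channel group from its first decoded frequency-domain coefficients in each band. The second decoded frequency-domain coefficients consist of three outputs.
In one implementation, the decoder obtains the first channel group's target transformation matrix in band b from the group's encoding information. In one implementation, a correspondence between transformation matrices and decoding matrices is established in advance. In one implementation, the group's target decoding matrix in band b is then determined from its target transformation matrix in band b.
Using the target decoding matrix of band b, the first encoding information in band b is inverse-transformed to obtain the first channel group's first decoded frequency-domain coefficients in band b.
In one implementation, for any band b in the set, the first encoding information of band b is decoded with the target decoding matrix of band b, which gives the first channel group's first decoded frequency-domain coefficients in band b. It can be understood that the first encoding information of the first channel group includes M_1, S_1, and T_1, and that after decoding, the first decoded frequency-domain coefficients in band b consist of three outputs.
In one implementation, the first channel group's decoded frequency-domain coefficients over the full band are assembled from the first decoded coefficients of all bands and likewise consist of three outputs.
In one implementation, the decoding formula of the first channel group is as follows:
Here [M_1 S_1 T_1] denotes the second encoding information of the first channel group, J is the target decoding matrix, and the result is the three decoded frequency-domain coefficient outputs of the first channel group.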
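For the first channel group, decoding inverts formula (7). If the target transformation matrix is invertible, the decoding matrix J can be taken as its inverse, so that [M_1 S_1 T_1] · J recovers the three channels. The real J0 to J4 are not reproduced above, so the sketch below assumes the same hypothetical orthonormal matrix as an illustration. For that matrix the inverse is simply the transpose.

```python
import math
from typing import List, Sequence

# Hypothetical orthonormal transformation matrix (not one of M0-M4).
HYPOTHETICAL_M = [
    [1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6)],
    [1 / math.sqrt(3), 0.0, -2 / math.sqrt(6)],
    [1 / math.sqrt(3), -1 / math.sqrt(2), 1 / math.sqrt(6)],
]


def row_times(vec: Sequence[float], mat: Sequence[Sequence[float]]) -> List[float]:
    """Row vector times matrix: out[j] = sum_k vec[k] * mat[k][j]."""
    return [sum(vec[k] * mat[k][j] for k in range(len(vec))) for j in range(len(mat[0]))]


def transpose(mat: Sequence[Sequence[float]]) -> List[List[float]]:
    return [list(col) for col in zip(*mat)]


# For an orthonormal M, J = M^-1 = M^T.
J = transpose(HYPOTHETICAL_M)
lcr = [0.2, -0.5, 1.0]                  # one bin of [L, C, R]
mst = row_times(lcr, HYPOTHETICAL_M)    # encoder side: [M_1 S_1 T_1]
decoded = row_times(mst, J)             # decoder side: three outputs
print([round(v, 9) for v in decoded])   # [0.2, -0.5, 1.0]
```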
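S804: Take several already decoded channel groups that are adjacent to and consecutive with the current channel group as the upmix channel groups of the current channel group.
In one implementation, when the current channel group is not the first channel group, several decoded channel groups adjacent to and consecutive with it are taken as its upmix channel groups for the decoding operation. In one implementation, the several decoded channel groups are two channel groups.
As an illustration, the upmix input depends on which group is current:
- For channel group 2, the upmix input consists of the decoded frequency-domain coefficients of the last two outputs of channel group 1.
- For channel group 3, it consists of the decoded coefficients of the last output of channel group 1 and the single output of channel group 2.
- For channel group 4, it consists of the single output of channel group 2 and the single output of channel group 3.
S805: Obtain the current channel group's first encoding information in any band b from the encoding information.
From the encoding process on the encoder side, each channel group from channel group 2 onward has a single output. The decoder obtains the current channel group's first encoding information in band b from the group's encoding information; this is the current group's first information T_i in band b.
S806: Obtain the decoded frequency-domain coefficients of the upmix channel groups in each band.
In one implementation, the target transformation matrices of the upmix channel groups in each band are determined from the encoding information received by the decoder, and the corresponding target decoding matrices are derived from them. The decoder then applies the inverse transform with these target decoding matrices to obtain the decoded frequency-domain coefficients of the upmix channel groups in each band.
S807: Decode the first encoding information in any band b using the target decoding matrix of band b and the decoded frequency-domain coefficients in band b, which gives the first decoded frequency-domain coefficients in band b.
In one implementation, the decoder obtains the current channel group's target transformation matrix in each band from the encoding information and derives the group's target decoding matrix in each band from it.
For any band b in the set, the target decoding matrix of band b is applied to the first encoding information T_i together with the upmix channel groups' first decoded frequency-domain coefficients in band b. This gives the current channel group's first decoded frequency-domain coefficients in band b.
It can be understood that the first encoding information of the current channel group includes the first information T_i, and that after decoding, the first decoded frequency-domain coefficients in band b consist of one output.
S808: Obtain the current channel group's decoded frequency-domain coefficients from its first decoded frequency-domain coefficients in each band. The current channel group's decoded frequency-domain coefficients consist of one output.
In one implementation, the current channel group's decoded frequency-domain coefficients over the full band are assembled from the first decoded coefficients of all bands and consist of one output.
In one implementation, the decoding formula of current channel group i is as follows:
The formula uses the following terms:
- first input: the one-output decoded frequency-domain coefficients of the (i-2)-th channel group, an upmix channel group;
- second input: the one-output decoded frequency-domain coefficients of the (i-1)-th channel group, also an upmix channel group;
- T_i: the first encoding information of current channel group i;
- J: the target decoding matrix;
- result: the one-output decoded frequency-domain coefficients of the current channel group.
It should be noted that the current channel group uses a different coding mode and its decoding unit outputs only one stream of decoded frequency-domain coefficients. Only the value of T_i is therefore needed, so the decoding matrix J takes different values, as follows: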


Here * denotes an entry that has no meaning.
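For the remaining groups only T_i is transmitted, and the other two channels of the group are already available from the upmix channel groups. This sketch assumes that the third column (a, b, c) of the group's transformation matrix produced T = a·L + b·C + c·R with c ≠ 0. The missing channel then follows as R = (T − a·L − b·C)/c. In other words, the only meaningful column of J is (−a/c, −b/c, 1/c), and the other entries play the role of the "*" entries. The column used below is hypothetical, not one of the actual J values.

```python
import math
from typing import List, Sequence


def decode_next_channel(prev_a: Sequence[float], prev_b: Sequence[float], t: Sequence[float],
                        third_col: Sequence[float]) -> List[float]:
    """Recover the group's new channel from two already decoded channels and T_i.

    third_col = (a, b, c) is the assumed column of the transformation matrix that
    produced T = a*L + b*C + c*R; the decoding column is (-a/c, -b/c, 1/c).
    """
    a, b, c = third_col
    if c == 0:
        raise ValueError("third column must weight the new channel (c != 0)")
    j = (-a / c, -b / c, 1 / c)
    return [x * j[0] + y * j[1] + z * j[2] for x, y, z in zip(prev_a, prev_b, t)]


col = (1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6))  # hypothetical column
left, center, right = [0.3], [0.7], [-1.2]
t_i = [col[0] * left[0] + col[1] * center[0] + col[2] * right[0]]
print(round(decode_next_channel(left, center, t_i, col)[0], 9))  # -1.2
```

In the chain of upmix decoding units, each recovered channel becomes one of the two known inputs for the next group.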
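S809: Obtain the decoded audio signal of each channel in the channel sequence from the decoded frequency-domain coefficients of the channel groups.
In the embodiments of the present disclosure, S809 may be implemented in any of the ways described in the embodiments of the present disclosure. This is not limited here and is not repeated.
As shown in the decoding flowchart of FIG. 9, the decoder side decodes unit by unit, in order:
- Upmix decoding unit 1 is the first upmix decoding unit. It receives the first encoding information [M_1 S_1 T_1] and outputs three streams of decoded frequency-domain coefficients.
- The second upmix decoding unit receives the first encoding information T_2 and the decoded coefficients output by the first unit, and outputs one stream of decoded coefficients.
- The third upmix decoding unit receives the first encoding information T_3 and the decoded coefficients output by the first and second units, and outputs one stream of decoded coefficients.
- In general, the i-th upmix decoding unit receives the first encoding information T_i and the decoded coefficients output by the (i-2)-th and (i-1)-th units, and outputs one stream of decoded coefficients.
- Continuing in this way, the (M-2)-th upmix decoding unit receives the first encoding information T_{M-2} and the decoded coefficients output by the (M-4)-th and (M-3)-th units, and outputs the last stream of decoded coefficients.
In the embodiments of the present disclosure, the decoder uses band-wise decoding similar to that on the encoder side and can therefore recover the multi-channel audio signal. Because the encoder has compressed the signal, the multi-channel signal is easier to transmit and needs less transmission capacity.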
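FIG. 10 is a schematic flowchart of an audio encoding method according to an embodiment of the present disclosure. As shown in FIG. 10, the method may include, but is not limited to, steps S1001 to S1015.
S1001: Group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups have one or more overlapping channels.
S1002: Convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the candidate frequency-domain coefficients of each frame of each channel.
S1003: According to the frequency-domain coefficients of each channel, determine the first cross-correlation matrix between channels for each frequency band.
S1004: Determine the second cross-correlation matrix of the channel group from the first cross-correlation matrix of each band. The second cross-correlation matrix contains the cross-correlation coefficients between the channels within the channel group.
S1005: Based on the channel group's second cross-correlation matrix in each band, determine the group's target transformation matrix in each band from the transformation matrix set.
S1006: Obtain the frequency-domain coefficients of the channels in the channel group in any band b. From these coefficients and the target transformation matrix of band b, obtain the first encoding information of the channel group in band b.
S1007: Obtain the second encoding information of the channel group from its first encoding information in each band.
S1008: Obtain the encoding information of the channel group based on the second encoding information and the target transformation matrix of each band.
S1009: Obtain an encoded bitstream based on the encoding information of the channel groups, and send the encoded bitstream to the decoder for decoding.
S1010: Receive the encoded bitstream sent by the encoder.
S1011: Decode the channel groups in order, taking each one in turn as the current channel group.
S1012: Obtain the target transformation matrix of each band from the encoding information.
S1013: Using the target transformation matrix of any band b, look up the mapping between transformation matrices and decoding matrices to obtain the current channel group's target decoding matrix in band b.
S1014: Based on the current channel group's target decoding matrix in each band, decode the group's encoding information to obtain its decoded frequency-domain coefficients.
S1015: Obtain the decoded audio signal of each channel in the channel sequence from the decoded frequency-domain coefficients of the channel groups.
In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs. During decoding, the decoder uses band-wise decoding similar to that on the encoder side and can therefore recover the multi-channel audio signal.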
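FIG. 11 is a block diagram of an audio encoding apparatus according to an exemplary embodiment. Referring to FIG. 11, the audio encoding apparatus 1100 of the embodiments of the present disclosure includes a channel grouping module 1101, a frequency-domain processing module 1102, a matrix determination module 1103, an encoding module 1104, and a sending module 1105.
The channel grouping module 1101 is configured to group a channel sequence to obtain a plurality of channel groups. Each channel group includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
The frequency-domain processing module 1102 is configured to convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain the frequency-domain coefficients of each frame of each channel.
The matrix determination module 1103 is configured to determine, according to the frequency-domain coefficients of each channel, the target transformation matrix of each frequency band in the frequency band set corresponding to the channel group from the transformation matrix set.
The encoding module 1104 is configured to perform same-band decorrelation on the frequency-domain coefficients of the channels in the channel group, based on the target transformation matrix of each band, to obtain the encoding information of the channel group.
The sending module 1105 is configured to obtain an encoded bitstream based on the encoding information of the channel groups and send the encoded bitstream to the decoder for decoding.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to:
- determine, according to the frequency-domain coefficients of each channel, the first cross-correlation matrix between channels for each band;
- determine the second cross-correlation matrix of the channel group from the first cross-correlation matrix of each band, where the second cross-correlation matrix contains the cross-correlation coefficients between the channels within the group;
- determine the group's target transformation matrix in each band from the transformation matrix set, based on the group's second cross-correlation matrix in each band.
In an embodiment of the present disclosure, the encoding module 1104 is further configured to:
- obtain the frequency-domain coefficients of the channels in the channel group in any band b;
- obtain the group's first encoding information in band b from these coefficients and the target transformation matrix of band b;
- obtain the group's second encoding information from its first encoding information in each band;
- obtain the group's encoding information based on the second encoding information and the target transformation matrix of each band.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to:
- determine, based on the second cross-correlation matrix in any band b, the pairwise cross-correlation coefficients within the channel group in band b;
- determine the group's target transformation matrix in band b from the transformation matrix set according to these coefficients.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured as follows:
- If the pairwise cross-correlation coefficients satisfy the condition for selecting the designated transformation matrix in the set, select the designated transformation matrix as the target transformation matrix of band b.
- If they do not satisfy the condition, select a transformation matrix other than the designated one from the set as the target transformation matrix of band b, according to the maximum pairwise coefficient.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to:
- determine each channel's energy value in each band from its frequency-domain coefficients;
- determine the first cross-correlation matrix of each band from the energy values of the channels in each band.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to:
- obtain the energy ratio of each pair of channels in the channel sequence in any band b;
- set the cross-correlation coefficient between the pair of channels in band b to zero if the energy ratio in band b is less than or equal to a first set threshold, or greater than or equal to a second set threshold, where the first set threshold is less than the second set threshold;
- if the energy ratio lies between the first and second set thresholds, determine the cross-correlation coefficient of the pair of channels in band b from their frequency-domain coefficients in band b;
- obtain the first cross-correlation matrix of band b from the pairwise cross-correlation coefficients in band b.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to normalize the first cross-correlation matrix of each band and, according to the channels included in the channel group, extract the group's second cross-correlation matrix for each band from the normalized first cross-correlation matrix of that band.
In an embodiment of the present disclosure, the matrix determination module 1103 is further configured to:
- determine the channel identifiers associated with any matrix element of the first cross-correlation matrix;
- determine the normalization matrix elements for that element from the associated channel identifiers;
- obtain the normalization result of the element from the element and its normalization matrix elements.
In an embodiment of the present disclosure, the encoding module 1104 is further configured as follows:
- For the first channel group, obtain its first encoding information in band b from its channels' frequency-domain coefficients in band b and the target transformation matrix of band b. This first encoding information includes the mid information, side information, and first information of the first channel group.
- For each remaining channel group other than the first, obtain its first encoding information in band b from its channels' frequency-domain coefficients in band b and the target transformation matrix of band b. This first encoding information includes the first information of the remaining channel group.
In an embodiment of the present disclosure, the channel grouping module 1101 is further configured to determine that the adjacent channel groups include a first channel group and a second channel group. Each of them includes three consecutive channels of the channel sequence, and the two groups have two channels in common.
In the embodiments of the present disclosure, the frequency-domain coefficients in each band of a channel group are encoded with the target transformation matrix. This compresses the multi-channel audio signal, reduces redundancy between channels, lightens the load on the encoder, and lowers transmission and storage costs.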
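FIG. 12 is a block diagram of an audio decoding apparatus according to an exemplary embodiment. Referring to FIG. 12, the audio decoding apparatus 1200 of the embodiments of the present disclosure includes a receiving module 1201, a matrix determination module 1202, and a decoding module 1203.
The receiving module 1201 is configured to receive the encoded bitstream sent by the encoder. The encoded bitstream includes the encoding information of a plurality of channel groups. The channel groups are obtained by grouping the channel sequence in order; each includes several consecutive channels of the channel sequence, and adjacent channel groups share one or more channels.
The matrix determination module 1202 is configured to decode the channel groups in order. For the current channel group being decoded, it determines the target decoding matrix of each band in the frequency band set corresponding to that group according to the group's encoding information.
The decoding module 1203 is configured to decode the current channel group's encoding information, based on the group's target decoding matrix in each band, to obtain its decoded frequency-domain coefficients. It then obtains the decoded audio signal of each channel in the channel sequence from the decoded frequency-domain coefficients of the channel groups.
In an embodiment of the present disclosure, the matrix determination module 1202 is further configured to:
- obtain the target transformation matrix of each band from the encoding information;
- use the target transformation matrix of any band b to look up the mapping between transformation matrices and decoding matrices, which gives the current channel group's target decoding matrix in band b.
In an embodiment of the present disclosure, the decoding module 1203 is further configured to:
- obtain the first channel group's first encoding information in any band b from that group's encoding information;
- decode this first encoding information with the first channel group's target decoding matrix in band b to obtain the group's first decoded frequency-domain coefficients in band b;
- obtain the first channel group's decoded frequency-domain coefficients from its first decoded frequency-domain coefficients in each band.
Both the first decoded frequency-domain coefficients and the decoded frequency-domain coefficients of the first channel group consist of three outputs.
In an embodiment of the present disclosure, the decoding module 1203 is further configured such that the first channel group's first encoding information in any band b includes at least the mid information, side information, and first information in band b.
In an embodiment of the present disclosure, the decoding module 1203 is further configured to:
- take several already decoded channel groups adjacent to and consecutive with the current channel group as its upmix channel groups;
- obtain the current channel group's first encoding information in any band b from the encoding information;
- obtain the decoded frequency-domain coefficients of the upmix channel groups in each band;
- decode the first encoding information in band b using the target decoding matrix of band b and the decoded frequency-domain coefficients in band b, which gives the first decoded frequency-domain coefficients in band b;
- obtain the current channel group's decoded frequency-domain coefficients from its first decoded frequency-domain coefficients in each band.
The current channel group's decoded frequency-domain coefficients consist of one output.
In an embodiment of the present disclosure, the decoding module 1203 is further configured such that the current channel group's first encoding information in any band b includes the current channel group's first information in band b.
In the embodiments of the present disclosure, the decoder uses band-wise decoding similar to that on the encoder side and can therefore recover the multi-channel audio signal. Because the encoder has compressed the signal, the multi-channel signal is easier to transmit and needs less transmission capacity.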
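FIG. 13 is a schematic structural diagram of another audio processing apparatus 1300 according to an embodiment of the present disclosure. The audio processing apparatus 1300 may be an encoder or a decoder. It may also be a chip, chip system, or processor that supports an encoder, or one that supports a decoder, in implementing the above methods. The apparatus can be used to implement the methods described in the above method embodiments; for details, refer to the descriptions in those embodiments.
The audio processing apparatus 1300 may include one or more processors 1301. The processor 1301 may be a general-purpose or special-purpose processor, for example, a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data. The central processing unit may be used to control the audio processing apparatus (such as a base station, baseband chip, decoder, decoder chip, DU, or CU), execute computer programs, and process the data of the computer programs.
In some embodiments, the audio processing apparatus 1300 may further include one or more memories 1302 storing a computer program 1304. The processor 1301 executes the computer program 1304 so that the audio processing apparatus 1300 performs the methods described in the above method embodiments. In some embodiments, the memory 1302 may also store data. The audio processing apparatus 1300 and the memory 1302 may be provided separately or integrated together.
In some embodiments, the audio processing apparatus 1300 may further include a transceiver 1305 and an antenna 1306. The transceiver 1305 may be called a transceiver unit, transceiver, or transceiver circuit, and implements the transceiving function. It may include a receiver and a transmitter. The receiver, which may be called a receiving machine or receiving circuit, implements the receiving function. The transmitter, which may be called a transmitting machine or transmitting circuit, implements the sending function.
In some embodiments, the audio processing apparatus 1300 may further include one or more interface circuits 1307. The interface circuit 1307 is configured to receive code instructions and transmit them to the processor 1301. The processor 1301 runs the code instructions to cause the audio processing apparatus 1300 to perform the methods described in the above method embodiments.
In one implementation, the processor 1301 may include a transceiver for implementing the receiving and sending functions, for example, a transceiver circuit, an interface, or an interface circuit. The transceiver circuit, interface, or interface circuit that implements the receiving and sending functions may be separate or integrated. It may be used for reading and writing code/data, or for transmitting or conveying signals.
In one implementation, the processor 1301 may store a computer program 1303. When run on the processor 1301, the computer program 1303 causes the audio processing apparatus 1300 to perform the methods described in the above method embodiments. The computer program 1303 may be embedded in the processor 1301, in which case the processor 1301 may be implemented in hardware.
In one implementation, the audio processing apparatus 1300 may include a circuit that implements the sending, receiving, or communication functions in the foregoing method embodiments. The processor and transceiver described in the present disclosure may be implemented on an integrated circuit (IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed-signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, and the like. They may also be manufactured with various IC process technologies, such as complementary metal oxide semiconductor (CMOS), n-type metal oxide semiconductor (NMOS), p-type metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), and gallium arsenide (GaAs).
The audio processing apparatus described in the above embodiments may be an encoder or a decoder. However, the scope of the audio processing apparatus described in the present disclosure is not limited thereto, and its structure is not limited by FIG. 13. The audio processing apparatus may be a standalone device or part of a larger device. For example, the audio processing apparatus may be:
(1) a standalone integrated circuit (IC), chip, chip system, or subsystem;
(2) a set of one or more ICs, which in some embodiments may also include storage components for storing data and computer programs;
(3) an ASIC, such as a modem;
(4) a module that can be embedded in other devices;
(5) a receiver, decoder, smart decoder, cellular phone, wireless device, handset, mobile unit, in-vehicle device, encoder, cloud device, artificial intelligence device, and so on;
(6) others.
For the case where the audio processing apparatus is a chip or chip system, refer to the schematic structural diagram of the chip shown in FIG. 14. The chip shown in FIG. 14 includes a processor 1401 and an interface 1402. There may be one or more processors 1401 and multiple interfaces 1402.
In some embodiments, the chip further includes a memory 1403 for storing the necessary computer programs and data.
In some implementations, the chip may be used to implement the functions of the decoder in the above embodiments of the present disclosure.
In some implementations, the chip may be used to implement the functions of the encoder in the above embodiments of the present disclosure.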
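Those skilled in the art will also understand that the various illustrative logical blocks and steps listed in the embodiments of the present disclosure may be implemented by electronic hardware, computer software, or a combination of both. Whether such functions are implemented in hardware or software depends on the particular application and the design requirements of the overall system. Those skilled in the art may implement the described functions in various ways for each particular application, but such implementations should not be understood as going beyond the protection scope of the embodiments of the present disclosure.
An embodiment of the present disclosure further provides an audio processing system. The system includes the audio processing apparatus serving as the encoder and the audio processing apparatus serving as the decoder in the embodiment of FIG. 13. Alternatively, it includes the audio processing apparatus serving as the encoder and the audio processing apparatus serving as the decoder in the embodiment of FIG. 14.
An embodiment of the present disclosure further provides a readable storage medium storing instructions which, when executed by a computer, implement the functions of any of the above method embodiments.
An embodiment of the present disclosure further provides a computer program product including a computer program which, when executed by a computer, implements the functions of any of the above method embodiments.
An embodiment of the present disclosure further provides a computer program which, when run on a computer, causes the computer to perform the functions of any of the above method embodiments.
It should be noted that the foregoing explanations of the method and apparatus embodiments also apply to the electronic device, computer-readable storage medium, computer program product, and computer program of the above embodiments, and are not repeated here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer programs. When the computer programs are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced wholly or partly.

The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer programs may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).

The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
Those of ordinary skill in the art will understand that numerical designations such as "first" and "second" in the present disclosure are merely for convenience of description. They are not intended to limit the scope of the embodiments of the present disclosure, nor do they indicate a sequence.
In the present disclosure, "at least one" may also be described as "one or more", and "a plurality of" may mean two, three, four, or more; the present disclosure does not limit this. In the embodiments of the present disclosure, technical features of one kind are distinguished by "first", "second", "third", "A", "B", "C", "D", and the like, and the features so described have no order of precedence or magnitude.
The correspondences shown in the tables of the present disclosure may be configured or predefined. The values of the information in the tables are merely examples and may be configured as other values; the present disclosure does not limit this. When configuring the correspondence between information and parameters, it is not necessary to configure every correspondence shown in the tables. For example, the correspondences shown in some rows of a table may be left unconfigured. As another example, the tables may be adapted as appropriate, for example by splitting or merging. The parameter names shown in the table headings may also be other names understandable to the audio processing apparatus, and the parameter values or representations may be other values or representations it can understand. When implemented, the tables may also use other data structures, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
"Predefined" in the present disclosure may be understood as defined, defined in advance, stored, pre-stored, pre-negotiated, pre-configured, solidified, or pre-burned.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The above are merely specific implementations of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
All embodiments of the present disclosure may be performed alone or in combination with other embodiments, and all such implementations are regarded as within the scope of protection claimed by the present disclosure.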

Claims (27)

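  1. An audio encoding method, performed by an encoder, the method comprising:
    grouping a channel sequence to obtain a plurality of channel groups, each channel group comprising several consecutive channels of the channel sequence, and adjacent channel groups sharing one or more identical channels;
    converting the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain frequency-domain coefficients of each frame of each channel;
    determining, according to the frequency-domain coefficients of each channel and from a transformation matrix set, a target transformation matrix for each frequency band in a frequency band set corresponding to the channel group;
    performing same-band decorrelation on the frequency-domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain encoding information of the channel group; and
    obtaining an encoded bitstream based on the encoding information of the channel group, and sending the encoded bitstream to a decoder for decoding.
  2. The method according to claim 1, wherein determining, according to the frequency-domain coefficients of each channel and from the transformation matrix set, the target transformation matrix of each frequency band corresponding to the channel group comprises:
    determining, according to the frequency-domain coefficients of each channel, a first cross-correlation matrix between channels for each frequency band;
    determining a second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band, the second cross-correlation matrix comprising cross-correlation coefficients between the channels within the channel group; and
    determining the target transformation matrix of the channel group in each frequency band from the transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band.
  3. The method according to claim 1 or 2, wherein performing same-band decorrelation on the frequency-domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain the encoding information of the channel group comprises:
    obtaining frequency-domain coefficients of the channels in the channel group in any frequency band b, and obtaining first encoding information of the channel group in the frequency band b according to the frequency-domain coefficients in the frequency band b and the target transformation matrix corresponding to the frequency band b;
    obtaining second encoding information of the channel group according to the first encoding information of the channel group in each frequency band; and
    obtaining the encoding information of the channel group based on the second encoding information and the target transformation matrix corresponding to each frequency band.
  4. The method according to claim 2, wherein determining the target transformation matrix of the channel group in each frequency band from the transformation matrix set based on the second cross-correlation matrix of the channel group in each frequency band comprises:
    determining, based on the second cross-correlation matrix in any frequency band b, cross-correlation coefficients between each pair of channels within the channel group in the frequency band b; and
    determining the target transformation matrix of the channel group in the frequency band b from the transformation matrix set according to the cross-correlation coefficients between each pair of channels within the channel group in the frequency band b.
  5. The method according to claim 4, wherein determining the target transformation matrix of the channel group in the frequency band b from the transformation matrix set according to the cross-correlation coefficients between each pair of channels within the channel group in the frequency band b comprises:
    if the cross-correlation coefficients between the pairs of channels satisfy a condition for selecting a designated transformation matrix in the transformation matrix set, selecting the designated transformation matrix as the target transformation matrix of the frequency band b; and
    if the cross-correlation coefficients between the pairs of channels do not satisfy the condition, selecting, according to a maximum cross-correlation coefficient between the pairs of channels, a transformation matrix other than the designated transformation matrix from the transformation matrix set as the target transformation matrix of the frequency band b.
  6. The method according to claim 2, wherein determining, according to the frequency-domain coefficients of each channel, the first cross-correlation matrix between channels for each frequency band comprises:
    determining, according to the frequency-domain coefficients of any channel, energy values of the channel in each frequency band; and
    determining the first cross-correlation matrix for each frequency band according to the energy values of the channels in each frequency band.
  7. The method according to claim 6, wherein determining the first cross-correlation matrix for each frequency band according to the energy values of the channels in each frequency band comprises:
    obtaining an energy ratio of each pair of channels in the channel sequence in any frequency band b;
    if the energy ratio in the frequency band b is less than or equal to a first set threshold, or the energy ratio in the frequency band b is greater than or equal to a second set threshold, determining that the cross-correlation coefficient between the pair of channels in the frequency band b is zero, wherein the first set threshold is less than the second set threshold;
    if the energy ratio in the frequency band b lies between the first set threshold and the second set threshold, determining the cross-correlation coefficient of the pair of channels in the frequency band b according to the frequency-domain coefficients of the pair of channels in the frequency band b; and
    obtaining the first cross-correlation matrix corresponding to the frequency band b based on the cross-correlation coefficients of the pairs of channels in the frequency band b.
  8. The method according to claim 2, wherein determining the second cross-correlation matrix of the channel group from the first cross-correlation matrix of each frequency band further comprises:
    normalizing the first cross-correlation matrix of each frequency band, and extracting, according to the channels included in the channel group, the second cross-correlation matrix of each frequency band corresponding to the channel group from the normalized first cross-correlation matrix of each frequency band.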
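  9. The method according to claim 8, wherein normalizing the first cross-correlation matrix comprises:
    determining channel identifiers associated with any matrix element in the first cross-correlation matrix;
    determining, according to the associated channel identifiers, normalization matrix elements corresponding to the matrix element; and
    obtaining a normalization result of the matrix element according to the matrix element and the normalization matrix elements.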
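  10. The method according to any one of claims 3 to 9, further comprising:
    for a first channel group, obtaining the first encoding information of the first channel group in the frequency band b based on the frequency-domain coefficients of the channels in the first channel group in the frequency band b and the target transformation matrix corresponding to the frequency band b, the first encoding information comprising mid information, side information, and first information of the first channel group; and
    for each remaining channel group other than the first channel group, obtaining the first encoding information of the remaining channel group in the frequency band b based on the frequency-domain coefficients of the channels in the remaining channel group in the frequency band b and the target transformation matrix corresponding to the frequency band b, the first encoding information comprising first information of the remaining channel group.
  11. The method according to any one of claims 1 to 9, wherein the adjacent channel groups sharing one or more overlapping channels comprises:
    determining that the adjacent channel groups comprise a first channel group and a second channel group, wherein the first channel group and the second channel group each comprise three consecutive channels of the channel sequence, and the first channel group and the second channel group have two channels in common.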
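  12. An audio decoding method, performed by a decoder, the method comprising:
    receiving an encoded bitstream sent by an encoder, the encoded bitstream comprising encoding information of a plurality of channel groups, wherein the channel groups are obtained by grouping a channel sequence in order, each channel group comprises several consecutive channels of the channel sequence, and adjacent channel groups share one or more identical channels;
    decoding the plurality of channel groups in order, and for a current channel group being decoded, determining, according to the encoding information of the current channel group, a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group;
    decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band to obtain decoded frequency-domain coefficients of the current channel group; and
    obtaining a decoded audio signal of each channel in the channel sequence according to the decoded frequency-domain coefficients of the plurality of channel groups.
  13. The method according to claim 12, wherein determining, according to the encoding information of the current channel group, the target decoding matrix for each frequency band in the frequency band set corresponding to the current channel group comprises:
    obtaining a target transformation matrix of each frequency band from the encoding information; and
    looking up, according to the target transformation matrix of any frequency band b, a mapping between transformation matrices and decoding matrices to obtain the target decoding matrix of the current channel group in the frequency band b.
  14. The method according to claim 12 or 13, wherein, in a case where the current channel group is a first channel group among the plurality of channel groups, decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band to obtain the decoded frequency-domain coefficients of the current channel group comprises:
    obtaining first encoding information of the first channel group in any frequency band b from the encoding information of the first channel group;
    decoding the first encoding information of the first channel group in the frequency band b based on the target decoding matrix of the first channel group in the frequency band b to obtain first decoded frequency-domain coefficients of the first channel group in the frequency band b; and
    obtaining the decoded frequency-domain coefficients of the first channel group according to the first decoded frequency-domain coefficients of the first channel group in each frequency band, wherein the first decoded frequency-domain coefficients and the decoded frequency-domain coefficients of the first channel group comprise three outputs.
  15. The method according to claim 14, wherein the first encoding information of the first channel group in any frequency band b comprises at least mid information, side information, and first information in the frequency band b.
  16. The method according to claim 12 or 13, wherein, in a case where the current channel group is a channel group other than the first channel group among the plurality of channel groups, decoding the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band to obtain the decoded frequency-domain coefficients of the current channel group comprises:
    determining several decoded channel groups adjacent to and consecutive with the current channel group as upmix channel groups corresponding to the current channel group;
    obtaining first encoding information of the current channel group in any frequency band b from the encoding information;
    obtaining decoded frequency-domain coefficients of the upmix channel groups in each frequency band;
    decoding the first encoding information in the frequency band b according to the target decoding matrix corresponding to the frequency band b and the decoded frequency-domain coefficients in the frequency band b to obtain first decoded frequency-domain coefficients in the frequency band b; and
    obtaining the decoded frequency-domain coefficients of the current channel group according to the first decoded frequency-domain coefficients of the current channel group in each frequency band, the decoded frequency-domain coefficients of the current channel group being one output.
  17. The method according to claim 16, wherein the first encoding information of the current channel group in any frequency band b comprises first information of the current channel group in the frequency band b.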
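  18. An audio encoding apparatus, comprising:
    a channel grouping module configured to group a channel sequence to obtain a plurality of channel groups, each channel group comprising several consecutive channels of the channel sequence, and adjacent channel groups sharing one or more identical channels;
    a frequency-domain processing module configured to convert the audio signal of each channel in the channel sequence to the frequency domain frame by frame to obtain frequency-domain coefficients of each frame of each channel;
    a matrix determination module configured to determine, according to the frequency-domain coefficients of each channel and from a transformation matrix set, a target transformation matrix for each frequency band in a frequency band set corresponding to the channel group;
    an encoding module configured to perform same-band decorrelation on the frequency-domain coefficients of the channels in the channel group based on the target transformation matrix of each frequency band to obtain encoding information of the channel group; and
    a sending module configured to obtain an encoded bitstream based on the encoding information of the channel group and send the encoded bitstream to a decoder for decoding.
  19. An audio decoding apparatus, comprising:
    a receiving module configured to receive an encoded bitstream sent by an encoder, the encoded bitstream comprising encoding information of a plurality of channel groups, wherein the channel groups are obtained by grouping a channel sequence in order, each channel group comprises several consecutive channels of the channel sequence, and adjacent channel groups share one or more identical channels;
    a matrix determination module configured to decode the plurality of channel groups in order and, for a current channel group being decoded, determine, according to the encoding information of the current channel group, a target decoding matrix for each frequency band in a frequency band set corresponding to the current channel group; and
    a decoding module configured to decode the encoding information of the current channel group based on the target decoding matrix of the current channel group in each frequency band to obtain decoded frequency-domain coefficients of the current channel group, and to obtain a decoded audio signal of each channel in the channel sequence according to the decoded frequency-domain coefficients of the plurality of channel groups.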
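  20. An encoder, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to implement the steps of the method according to any one of claims 1 to 11.
  21. A decoder, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to implement the steps of the method according to any one of claims 12 to 17.
  22. A computer-readable storage medium storing computer program instructions, wherein the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 11.
  23. A computer-readable storage medium storing computer program instructions, wherein the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 12 to 17.
  24. A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the audio encoding method according to any one of claims 1 to 11.
  25. A computer program product comprising a computer program which, when run on a computer, causes the computer to perform the audio decoding method according to any one of claims 12 to 17.
  26. A computer program which, when run on a computer, causes the computer to perform the audio encoding method according to any one of claims 1 to 11.
  27. A computer program which, when run on a computer, causes the computer to perform the audio decoding method according to any one of claims 12 to 17.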
PCT/CN2024/087626 2023-04-14 2024-04-12 Audio encoding method and apparatus, electronic device, and storage medium Ceased WO2024213147A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020257037772A KR20250168677A (ko) 2023-04-14 2024-04-12 Audio encoding method, apparatus, electronic device, and storage medium
EP24788243.4A EP4697325A4 (en) 2023-04-14 2024-04-12 AUDIO CODING METHOD AND APPARATUS, AND ELECTRONIC DEVICE AND STORAGE MEDIA

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310403661.8 2023-04-14
CN202310403661.8A CN116434760A (zh) 2023-04-14 2023-04-14 Audio encoding method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024213147A1 true WO2024213147A1 (zh) 2024-10-17

Family

ID=87079338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/087626 Ceased WO2024213147A1 (zh) 2023-04-14 2024-04-12 Audio encoding method and apparatus, electronic device, and storage medium

Country Status (4)

Country Link
EP (1) EP4697325A4 (zh)
KR (1) KR20250168677A (zh)
CN (1) CN116434760A (zh)
WO (1) WO2024213147A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
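CN116434760A (zh) * 2023-04-14 2023-07-14 Beijing Xiaomi Mobile Software Co., Ltd. Audio encoding method and apparatus, electronic device, and storage medium
CN117730532A (zh) * 2023-10-31 2024-03-19 Beijing Xiaomi Mobile Software Co., Ltd. Encoding and decoding method, terminal, network device, and storage medium
CN120108406B (zh) * 2023-11-30 2025-11-28 Honor Device Co., Ltd. Audio processing method, in-vehicle audio device, electronic device, and vehicle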

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
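CN101071570A (zh) * 2007-06-21 2007-11-14 北京中星微电子有限公司 Encoding and decoding processing method for coupled channels, audio encoding apparatus, and decoding apparatus
US20090112606A1 (en) * 2007-10-26 2009-04-30 Microsoft Corporation Channel extension coding for multi-channel source
US20120020482A1 (en) * 2010-07-22 2012-01-26 Samsung Electronics Co., Ltd. Apparatus and method for encoding and decoding multi-channel audio signal
CN102982805A (zh) * 2012-12-27 2013-03-20 Beijing Institute of Technology Multi-channel audio signal compression method based on tensor decomposition
CN103400582A (zh) * 2013-08-13 2013-11-20 Wuhan University Encoding and decoding method and system for multi-channel three-dimensional audio
CN104240712A (zh) * 2014-09-30 2014-12-24 Shenzhen Research Institute of Wuhan University Multi-channel grouping and clustering coding method and system for three-dimensional audio
CN113948095A (zh) * 2020-07-17 2022-01-18 Huawei Technologies Co., Ltd. Method and apparatus for encoding and decoding multi-channel audio signals
CN116434760A (zh) * 2023-04-14 2023-07-14 Beijing Xiaomi Mobile Software Co., Ltd. Audio encoding method and apparatus, electronic device, and storage medium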

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7502743B2 (en) * 2002-09-04 2009-03-10 Microsoft Corporation Multi-channel audio encoding and decoding with multi-channel transform selection
WO2006056100A1 (en) * 2004-11-24 2006-06-01 Beijing E-World Technology Co., Ltd Coding/decoding method and device utilizing intra-channel signal redundancy
MX2007014570A (es) * 2005-05-25 2008-02-11 Koninkl Philips Electronics Nv Predictive coding of a multi-channel signal.
US8190425B2 (en) * 2006-01-20 2012-05-29 Microsoft Corporation Complex cross-correlation parameters for multi-channel audio
US8046214B2 (en) * 2007-06-22 2011-10-25 Microsoft Corporation Low complexity decoder for complex transform coding of multi-channel sound
KR101698439B1 (ko) * 2010-04-09 2017-01-20 돌비 인터네셔널 에이비 Mdct-기반의 복소수 예측 스테레오 코딩
CN105336334B (zh) * 2014-08-15 2021-04-02 北京天籁传音数字技术有限公司 多声道声音信号编码方法、解码方法及装置
EP4440151A4 (en) * 2021-11-26 2024-11-27 Beijing Xiaomi Mobile Software Co., Ltd. METHOD AND DEVICE FOR STEREO AUDIO SIGNAL PROCESSING, ENCODING DEVICE, DECODING DEVICE AND STORAGE MEDIUM

Non-Patent Citations (1)

Title
See also references of EP4697325A4 *

Also Published As

Publication number Publication date
KR20250168677A (ko) 2025-12-02
EP4697325A4 (en) 2026-04-08
EP4697325A1 (en) 2026-02-18
CN116434760A (zh) 2023-07-14

Similar Documents

Publication Publication Date Title
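WO2024213147A1 (zh) Audio encoding method and apparatus, electronic device, and storage medium
CN112822491B (zh) Image data encoding and decoding method and device
EP3874492A1 (en) Determination of spatial audio parameter encoding and associated decoding
EP3732678A1 (en) Determination of spatial audio parameter encoding and associated decoding
EP3973460A1 (en) Linear neural reconstruction for deep neural network compression
CN103167289B (zh) Image encoding and decoding methods, and encoding and decoding devices
WO2024164284A1 (zh) Audio signal processing method, apparatus, device, and storage medium
JP2023510556A (ja) Audio encoding and decoding method and audio encoding and decoding device
EP3991170A1 (en) Determination of spatial audio parameter encoding and associated decoding
US20140372389A1 (en) Data Encoding and Processing Columnar Data
CN102547291B (zh) FPGA-based JPEG2000 image decoding device and method
CN109391358B (zh) Polar code encoding method and device
KR100804640B1 (ko) Subband synthesis filtering method and apparatus
EP3367575A1 (en) Removal of dummy bits prior to bit collection for 3gpp lte circular buffer rate matching
JP2026514117A (ja) Audio encoding method, apparatus, electronic device, and storage medium
WO2025138715A1 (zh) Image processing method and related device
CN109076224B (zh) Video decoder and manufacturing method thereof, data processing circuit, system, and method
CN107436876A (zh) File segmentation system and method
KR102938940B1 (ko) Multi-channel audio signal encoding/decoding method and apparatus
JPWO2020009082A1 (ja) Encoding device and encoding method
WO2023202296A1 (zh) Signal processing method and device
WO2024108449A1 (zh) Signal quantization method, apparatus, device, and storage medium
US9955163B2 (en) Two pass quantization of video data
WO2024020904A1 (zh) Method and apparatus for sending and receiving a phase shift configuration of an intelligent reflecting surface (IRS)
CN112715009B (zh) Precoding matrix indication method, communication device, and storage medium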

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24788243

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025559952

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025559952

Country of ref document: JP

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112025022203

Country of ref document: BR

WWE Wipo information: entry into national phase

Ref document number: 202517108312

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 1020257037772

Country of ref document: KR

Free format text: ST27 STATUS EVENT CODE: A-0-1-A10-A15-NAP-PA0105 (AS PROVIDED BY THE NATIONAL OFFICE)

WWE Wipo information: entry into national phase

Ref document number: KR1020257037772

Country of ref document: KR

Ref document number: 1020257037772

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2024788243

Country of ref document: EP

Ref document number: 2025131047

Country of ref document: RU

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2024788243

Country of ref document: EP

Effective date: 20251114

WWE Wipo information: entry into national phase

Ref document number: 11202506928Y

Country of ref document: SG

WWP Wipo information: published in national office

Ref document number: 11202506928Y

Country of ref document: SG

WWP Wipo information: published in national office

Ref document number: 1020257037772

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 202517108312

Country of ref document: IN

WWP Wipo information: published in national office

Ref document number: 2025131047

Country of ref document: RU

WWP Wipo information: published in national office

Ref document number: 2024788243

Country of ref document: EP