WO2017193551A1 - Multi-channel signal encoding method and encoder - Google Patents

Multi-channel signal encoding method and encoder

Info

Publication number
WO2017193551A1
WO2017193551A1 PCT/CN2016/103596 CN2016103596W WO2017193551A1 WO 2017193551 A1 WO2017193551 A1 WO 2017193551A1 CN 2016103596 W CN2016103596 W CN 2016103596W WO 2017193551 A1 WO2017193551 A1 WO 2017193551A1
Authority
WO
WIPO (PCT)
Prior art keywords
current frame
itd parameter
frame
itd
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/103596
Other languages
English (en)
Chinese (zh)
Inventor
张兴涛
刘泽新
苗磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of WO2017193551A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Definitions

  • Embodiments of the present invention relate to the field of audio coding and decoding, and more particularly, to an encoding method and an encoder for a multi-channel signal.
  • stereo audio has the sense of orientation and distribution of each sound source, which can improve the clarity, intelligibility and presence of sound, and is therefore favored by people.
  • Stereo processing techniques mainly include Mid/Side (MS) encoding, Intensity Stereo (IS) encoding, and Parametric Stereo (PS) encoding.
  • MS Mid/Side
  • IS Intensity Stereo
  • PS Parametric Stereo
  • the MS code combines and converts the two signals based on the inter-channel correlation.
  • the energy of each channel is mainly concentrated in the sum channel, so that the inter-channel redundancy is removed.
  • the rate saving depends on the correlation of the input signals.
  • When the correlation between the left and right channel signals is poor, the left channel signal and the right channel signal need to be transmitted separately.
  • IS coding is based on the fact that the human auditory system is insensitive to the fine structure of the phase difference of the high-frequency components of the channels (for example, components above 2 kHz), and it simplifies the high-frequency components of the left and right signals accordingly.
  • PS coding is based on the binaural auditory model. At the encoding end, it converts the stereo signal into a mono signal plus a small number of spatial parameters (or spatial perception parameters) that describe the spatial sound field, as shown in Figure 1 (in Figure 1, x_L is the left channel time domain signal and x_R is the right channel time domain signal).
  • After the decoder receives the mono signal, it combines it with the spatial parameters to restore the stereo signal, as shown in Figure 2.
  • the PS coding compression ratio is high. Under the premise of maintaining good sound quality, higher coding gain can be obtained, and it can work in the full audio bandwidth, which can well restore the spatial sensing effect of stereo sound.
  • Spatial parameters include Inter-channel Coherence (IC), Inter-channel Level Difference (ILD), Inter-channel Time Difference (ITD), and Inter-channel Phase Difference (IPD).
  • IC Inter-channel Coherence
  • ILD Inter-channel Level Difference
  • ITD Inter-channel Time Difference
  • IPD Inter-channel Phase Difference
  • the IC describes the cross-correlation or coherence between channels, which determines the perception of the sound field range and improves the spatial and acoustic stability of the audio signal.
  • ILD is used to distinguish the horizontal direction of the stereo source and describes the difference in intensity between the channels, which will affect the frequency content of the entire spectrum.
  • ITD and IPD are spatial parameters that represent the horizontal orientation of the sound source. They describe the difference in time and phase between channels, and mainly affect frequency components below 2 kHz.
  • ILD, ITD and IPD can determine the human ear's perception of the sound source position, can effectively determine the sound field position, and play an important role in the recovery of stereo signals.
  • stereo can be encoded in units of frames.
  • the ITD parameter corresponding to the current frame may be extracted based on the multi-channel signal in the current frame.
  • the ITD parameter of the current frame may be extracted based on the time domain signal, or the ITD parameter of the current frame may be extracted based on the frequency domain signal.
  • the ITD parameter extraction methods of all frames are consistent throughout the encoding process, and the ITD parameter extraction method is not flexible enough.
  • the application provides a coding method and an encoder for a multi-channel signal to improve the flexibility of the ITD parameter extraction method.
  • a method for encoding a multi-channel signal includes: acquiring a multi-channel signal of a current frame; determining feature information according to the multi-channel signal, where the feature information includes at least one of a frame type and a signal type of the current frame, the frame type includes a speech frame and/or a non-speech frame, and the signal type includes unvoiced and/or voiced; determining an ITD parameter of the current frame based on the feature information; and encoding the ITD parameter.
  • the ITD parameter of the current frame may represent the ITD parameter of the multi-channel signal in the current frame.
  • the solution determines the ITD parameter of the current frame according to the feature information, rather than, as in the prior art, extracting the ITD parameter in a fixed manner regardless of the type or features of the multi-channel signal in the current frame. Therefore, the solution can improve the flexibility of ITD parameter extraction.
  • Determining the feature information according to the multi-channel signal includes: determining a frame type of the current frame according to the multi-channel signal. Determining the ITD parameter of the current frame according to the feature information includes: in the case that the current frame is a non-speech frame, determining the ITD parameter of the current frame by using a first ITD parameter extraction manner; in the case that the current frame is a speech frame, determining the ITD parameter of the current frame by using a second ITD parameter extraction manner.
  • Determining the ITD parameter of the current frame by using the first ITD parameter extraction manner includes: determining the ITD parameter of the previous frame or the previous subframe of the current frame as the ITD parameter of the current frame.
  • the multi-channel signal may be processed in units of frames, usually 20 ms per frame.
  • the frame may be further divided into subframes for processing. For example, when a 20 ms frame is divided into two subframes, each subframe is 10 ms; when a 20 ms frame is divided into four subframes, each subframe is 5 ms.
  • the previous frame of the current frame may refer to the frame immediately preceding the current frame, that is, the audio samples in the 20 ms before the start of the current frame.
  • the previous subframe of the current frame may refer to the last subframe of the previous frame that is immediately adjacent to the current frame.
  • A non-speech frame generally carries a background noise signal, whose ITD parameter generally fluctuates little, so the ITD parameter of the previous frame of the current frame can be directly determined as the ITD parameter of the current frame, which can improve coding efficiency.
  • Determining the ITD parameter of the current frame according to the multi-channel signal includes: determining an initial ITD parameter of the current frame according to the multi-channel signal; and smoothing the initial ITD parameter of the current frame according to the ITD parameter of the previous frame or the previous subframe of the current frame, to obtain the ITD parameter of the current frame.
  • the initial frame of the current frame is determined according to an ITD parameter of a previous frame or a previous subframe of the current frame
  • Determining the ITD parameter of the current frame according to the multi-channel signal includes: determining initial ITD parameters of K subframes of the current frame according to the multi-channel signal, where K is an integer greater than 1; smoothing the initial ITD parameter of each of the K subframes according to the ITD parameter of the previous subframe of that subframe, to obtain the ITD parameter of each subframe; and determining the ITD parameters of the K subframes as the ITD parameter of the current frame.
  • the previous subframe of each of the above subframes may refer to the previous subframe immediately adjacent to each subframe.
  • the previous subframe of the first subframe is the last subframe of the frame immediately preceding the current frame, and the previous subframe of the i-th subframe (i ≥ 2) of the K subframes is the (i-1)-th subframe of the K subframes.
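The subframe chaining described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name, the weighted-sum form of the smoothing, and the default weights `w1` and `w2` are assumptions chosen for the example.

```python
from typing import List


def smooth_subframe_itds(initial_itds: List[float], prev_itd: float,
                         w1: float = 0.5, w2: float = 0.5) -> List[float]:
    """Smooth each subframe's initial ITD against the subframe before it.

    prev_itd is the ITD of the last subframe of the previous frame.
    w1 and w2 are hypothetical weights, not values taken from the text.
    """
    smoothed = []
    previous = prev_itd
    for initial in initial_itds:
        current = w1 * previous + w2 * initial
        smoothed.append(current)
        previous = current  # subframe i-1 becomes the reference for subframe i
    return smoothed
```

For example, with a previous ITD of 0 and two subframes whose initial ITDs are both 4, the smoothed values approach 4 gradually instead of jumping to it.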
  • the value of the smoothing factor is determined based on a signal type of the current frame.
  • the smoothing factor is determined according to the signal type, which can further improve the flexibility of ITD parameter extraction.
  • Determining the ITD parameter of the current frame according to the multi-channel signal includes: generating a target frequency domain signal according to the multi-channel signal; performing a frequency-time transform on the target frequency domain signal to obtain a target time domain signal; and determining the ITD parameter of the current frame according to the target time domain signal.
  • the phase of the target frequency domain signal is linearly related to the IPD of the multi-channel signal. In some implementations, the phase of the target frequency domain signal is the IPD of the multi-channel signal. It should be understood that a frequency domain signal may be represented by complex numbers, each of which may be represented by an amplitude and a phase; the phase of the target frequency domain signal refers to the phase of the complex numbers that represent the target frequency domain signal.
  • the target frequency domain signal can be a cross-correlation signal of the multi-channel frequency domain signal.
  • Determining the ITD parameter of the current frame according to the target time domain signal includes: selecting a target sampling point from the N sampling points of the target time domain signal, where the target sampling point is the sampling point with the largest sample value among the N sampling points and N is the number of sampling points of the target time domain signal; and determining the ITD parameter of the current frame according to the index value corresponding to the target sampling point.
  • the index value is used to indicate the ordering of the target sampling point among the N sampling points.
  • the index value is used to indicate the position of the target sampling point among the N sampling points.
  • the index values of the N sampling points may be in the range of (-N/2, N/2). Assuming that the target sampling point is the last one of the N sampling points, the index value corresponding to the target sampling point is N/2.
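The peak-picking step can be sketched as below. The mapping from a sample's position to its index value is an assumption: it is chosen so that the last of the N samples maps to N/2, as stated above, with N taken to be even. The function name is hypothetical.

```python
def itd_from_peak(target_time_signal):
    """Return the index value of the largest sample.

    Position p (1-based) is mapped to p - N/2, so the last sample
    gets index N/2. This centering is an assumption for illustration.
    """
    n = len(target_time_signal)
    peak_pos = max(range(n), key=lambda j: target_time_signal[j])  # 0-based position
    return peak_pos + 1 - n // 2
```

With N = 4 and the peak on the last sample, the function returns 2, that is, N/2.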
  • Generating the target frequency domain signal according to the multi-channel signal includes: determining an amplitude of the target frequency domain signal according to the multi-channel signal; determining an IPD parameter of the multi-channel signal of the current frame according to the multi-channel signal; and generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal of the current frame.
  • Determining the amplitude of the target frequency domain signal according to the multi-channel signal includes: determining $A_M(k)$ from $A_1(k)$ and $A_2(k)$, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $A_1(k)$ and $A_2(k)$ respectively represent the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, $k$ represents the frequency bin, $0 \le k \le L/2$, and $L$ represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • Generating the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the current frame (specifically, the IPD parameter of the multi-channel signal in the current frame) includes: generating the target frequency domain signal from $A_M(k)$ and $\mathrm{IPD}(k)$, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $X_{M\_real}(k)$ represents the real part of the target frequency domain signal, $X_{M\_image}(k)$ represents the imaginary part of the target frequency domain signal, $\mathrm{IPD}(k)$ represents the IPD parameter, $k$ represents the frequency bin, $0 \le k \le L/2$, and $L$ represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
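The hybrid-domain idea (build a frequency domain signal from an amplitude and the IPD, transform it back to the time domain, and look for the peak) can be sketched as follows. Several choices here are assumptions rather than the embodiment's formulas: the amplitude is taken as the product of the two channel amplitudes, the IPD is taken as the phase of the cross-spectrum, the real and imaginary parts are formed as amplitude times cosine and sine of the IPD, the full spectrum is used instead of $0 \le k \le L/2$, and the sign convention of the returned lag is arbitrary. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(L^2) discrete Fourier transform of a real sequence."""
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / L) for n in range(L))
            for k in range(L)]


def target_time_signal(left, right):
    """Build X_M(k) from an amplitude and the IPD, then transform back to time."""
    L = len(left)
    X1, X2 = dft(left), dft(right)
    XM = []
    for k in range(L):
        a_m = abs(X1[k]) * abs(X2[k])                 # assumed amplitude rule
        ipd = cmath.phase(X1[k] * X2[k].conjugate())  # per-bin phase difference
        XM.append(complex(a_m * math.cos(ipd), a_m * math.sin(ipd)))
    # Naive inverse DFT; the result is real for real inputs, so keep the real part.
    return [sum(XM[k] * cmath.exp(2j * math.pi * n * k / L) for k in range(L)).real / L
            for n in range(L)]


def peak_lag(signal):
    """Lag of the largest sample, wrapped into (-L/2, L/2]."""
    L = len(signal)
    n = max(range(L), key=lambda i: signal[i])
    return n if n <= L // 2 else n - L
```

Under these assumptions the target frequency domain signal equals the cross-spectrum of the two channels, so the target time domain signal is their circular cross-correlation and its peak sits at the inter-channel delay.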
  • Determining the frame type of the current frame according to the multi-channel signal includes: determining an energy of the multi-channel signal; determining the current frame to be a non-speech frame if the energy of the multi-channel signal is less than or equal to a preset energy threshold; and determining the current frame to be a speech frame if the energy of the multi-channel signal is greater than the energy threshold.
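A minimal sketch of the energy-based frame typing follows. How the energy is computed is not specified above; summing squared samples over all channels is an assumption, as are the function name and the string labels.

```python
def classify_frame(channels, energy_threshold):
    """Label a frame as 'speech' or 'non-speech' from its total energy.

    channels is a list of per-channel sample lists. The energy measure
    (sum of squared samples over all channels) is an assumption.
    """
    energy = sum(s * s for ch in channels for s in ch)
    return "non-speech" if energy <= energy_threshold else "speech"
```

Note that a frame whose energy equals the threshold is classified as non-speech, matching the "less than or equal to" condition.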
  • the method further includes: determining an initial ITD parameter of the current frame according to the multi-channel signal. Determining the ITD parameter of the current frame by using the first ITD parameter extraction manner includes: determining the initial ITD parameter of the current frame as the ITD parameter of the current frame. Determining the ITD parameter of the current frame by using the second ITD parameter extraction manner includes: adjusting the initial ITD parameter of the current frame to obtain the ITD parameter of the current frame.
  • Adjusting the initial ITD parameter of the current frame to obtain the ITD parameter of the current frame includes: determining the ITD parameter of the current frame according to the frame type of the previous frame or the previous N frames of the current frame and the initial ITD parameter of the current frame, where N is an integer greater than 1.
  • Determining the ITD parameter of the current frame according to the frame type of the previous frame or the previous N frames of the current frame and the initial ITD parameter of the current frame can improve the flexibility of ITD parameter extraction.
  • Determining the ITD parameter of the current frame according to the frame type of the previous frame or the previous N frames of the current frame and the initial ITD parameter of the current frame includes: in the case that the frame type of the previous frame or the previous N frames of the current frame is a speech frame, determining the ITD parameter of the current frame according to the ITD parameter of the previous frame of the current frame and the initial ITD parameter of the current frame.
  • When the previous frame or the previous N frames are speech frames, the current frame is one of a run of consecutive speech frames, and the ITD parameters of consecutive speech frames are correlated. Determining the ITD parameter of the current frame according to the ITD parameter of the previous frame and the initial ITD parameter of the current frame can therefore improve the flexibility of ITD parameter extraction.
  • Determining the ITD parameter of the current frame according to the ITD parameter of the previous frame of the current frame and the initial ITD parameter of the current frame includes: if the ITD parameter of the previous frame of the current frame is not a preset value and the initial ITD parameter of the current frame is the preset value, determining the ITD parameter of the previous frame of the current frame as the ITD parameter of the current frame; otherwise, determining the initial ITD parameter of the current frame as the ITD parameter of the current frame.
  • Because the current frame is one of a run of consecutive speech frames and the ITD parameter of consecutive speech frames generally fluctuates little, determining the ITD parameter of the previous frame of the current frame as the ITD parameter of the current frame can avoid calculation errors in the ITD parameter.
  • Determining the ITD parameter of the current frame according to the ITD parameter of the previous frame of the current frame and the initial ITD parameter of the current frame includes: if the ITD parameter of the previous frame of the current frame is not a preset value, the initial ITD parameter of the current frame is the preset value, and the number of consecutively calculated ITD parameters equal to the preset value is less than a preset threshold, determining the ITD parameter of the previous frame of the current frame as the ITD parameter of the current frame; otherwise, determining the initial ITD parameter of the current frame as the ITD parameter of the current frame.
  • the preset value is zero.
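The two adjustment rules above can be sketched as follows, with the preset value set to zero. The names `adjust_itd`, `ItdAdjuster`, and `max_run` are hypothetical, and whether the run counter includes the current frame is an assumption.

```python
PRESET = 0  # the preset value named in the text


def adjust_itd(prev_itd, initial_itd):
    """Simple rule: hold the previous ITD when the new estimate drops to the preset."""
    if prev_itd != PRESET and initial_itd == PRESET:
        return prev_itd
    return initial_itd


class ItdAdjuster:
    """Variant that only holds the previous ITD for a limited run of preset estimates."""

    def __init__(self, max_run):
        self.max_run = max_run  # hypothetical name for the preset threshold
        self.run = 0            # consecutive initial ITDs equal to PRESET

    def update(self, prev_itd, initial_itd):
        # Assumption: the counter includes the current frame's estimate.
        self.run = self.run + 1 if initial_itd == PRESET else 0
        if prev_itd != PRESET and initial_itd == PRESET and self.run < self.max_run:
            return prev_itd
        return initial_itd
```

The counter in the second variant keeps a single spurious zero estimate from resetting the ITD, while still letting the ITD fall to zero once zero estimates persist.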
  • an encoder is provided, comprising units capable of performing each step of the multi-channel signal encoding method of the first aspect.
  • an encoder is provided, comprising a memory and a processor, the memory being used to store a program and the processor being used to execute the program; when the program is executed, the processor performs the method of the first aspect.
  • FIG. 3 is an exemplary flow chart of a time domain based ITD parameter extraction method in the prior art.
  • FIG. 4 is an exemplary flow chart of a frequency domain based ITD parameter extraction method in the prior art.
  • FIG. 5 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • FIG. 9 is an exemplary flow chart of the manner in which the ITD parameters of the current frame are extracted.
  • FIG. 10 is an exemplary flowchart of an extraction manner of an ITD parameter of a current frame.
  • Figure 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • Figure 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • Taking as an example the case where the signal picked up by the first microphone is the first channel signal and the signal picked up by the second microphone is the second channel signal:
  • the ILD describes the difference in intensity between the first channel signal and the second channel signal. If the ILD is greater than 0, the energy of the first channel signal is higher than the energy of the second channel signal; if the ILD is equal to 0, the energy of the first channel signal is equal to the energy of the second channel signal; if the ILD is less than 0, the energy of the first channel signal is lower than the energy of the second channel signal;
  • the ITD describes the time difference between the first channel signal and the second channel signal, that is, the difference between the times at which the sound from the source reaches the first microphone and the second microphone. If the ITD is greater than 0, the sound reaches the first microphone earlier than the second microphone; if the ITD is equal to 0, the sound reaches the first microphone and the second microphone at the same time; if the ITD is less than 0, the sound reaches the first microphone later than the second microphone;
  • the IPD describes the phase difference between the first channel signal and the second channel signal, which is usually combined with the ITD parameter so that the decoder recovers the phase information of the multi-channel signal.
  • the method for extracting ITD parameters is mainly divided into a time domain based ITD parameter extraction method and a frequency domain based ITD parameter extraction method.
  • The two ITD parameter extraction methods are introduced below with reference to FIG. 3 and FIG. 4, respectively.
  • FIG. 3 is an exemplary flow chart of a time domain based ITD parameter extraction method.
  • the method of Figure 3 includes:
  • the ITD parameter may be extracted by using a time domain cross-correlation function based on the left and right channel time domain signals, for example, by calculating the cross-correlation functions $C_n(i)$ and $C_p(i)$ in the range $0 \le i \le T_{max}$.
  • $T_1$ takes the opposite of the index value corresponding to $\max(C_n(i))$; otherwise, $T_1$ takes the index value corresponding to $\max(C_p(i))$, where $i$ is the index value used to calculate the cross-correlation function, $T_{max}$ is the maximum ITD value at the given sampling rate, and Length is the frame length.
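A minimal sketch of time domain ITD extraction by cross-correlation follows. The exact definitions of $C_n(i)$ and $C_p(i)$ and the condition for choosing between them are assumptions here: $C_n(i)$ correlates the right channel with the left channel shifted by $i$, $C_p(i)$ does the reverse, and the negative branch is taken when $\max(C_n(i))$ exceeds $\max(C_p(i))$. The function name is hypothetical.

```python
def time_domain_itd(left, right, t_max):
    """Estimate T_1 from two cross-correlation functions over 0 <= i <= t_max.

    Assumed definitions (not taken from the text):
      C_n(i) = sum_j right[j] * left[j + i]
      C_p(i) = sum_j left[j]  * right[j + i]
    Requires t_max < len(left) and equal-length channels.
    """
    length = len(left)

    def corr(a, b, i):
        return sum(a[j] * b[j + i] for j in range(length - i))

    c_n = [corr(right, left, i) for i in range(t_max + 1)]
    c_p = [corr(left, right, i) for i in range(t_max + 1)]
    best_n = max(range(t_max + 1), key=lambda i: c_n[i])
    best_p = max(range(t_max + 1), key=lambda i: c_p[i])
    if c_n[best_n] > c_p[best_p]:
        return -best_n  # opposite of the index of max(C_n(i))
    return best_p       # index of max(C_p(i))
```

With these definitions, a left channel that leads the right channel by two samples yields 2, and the reverse case yields -2.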
  • FIG. 4 is an exemplary flow chart of a frequency domain based ITD parameter extraction method.
  • the method of Figure 4 includes:
  • the time-frequency transform may use a Discrete Fourier Transform (DFT) or a Modified Discrete Cosine Transform (MDCT) to transform the time domain signal into a frequency domain signal.
  • DFT Discrete Fourier Transform
  • MDCT Modified Discrete Cosine Transform
  • the time-frequency transform may employ a DFT transform, and specifically, the DFT transform may be performed using the following formula.
  • n is the index value of the sample of the time domain signal
  • k is the index value of the frequency point of the frequency domain signal
  • L is the time frequency transform length.
  • x(n) is the left channel time domain signal or the right channel time domain signal.
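For reference, the textbook DFT written with the symbols defined above ($n$, $k$, $L$, $x(n)$) takes the following form. This is the standard definition, given here as an assumption; the formula used in the embodiment may differ in normalization or in the range of $k$.

```latex
X(k) = \sum_{n=0}^{L-1} x(n)\, e^{-j \frac{2\pi n k}{L}}, \qquad 0 \le k < L
```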
  • the L frequency bins of the frequency domain signal may be divided into N subbands; the $b$-th subband contains the frequency bins $A_{b-1} \le k \le A_b - 1$.
  • the amplitude can be calculated using the following formula:
  • the ITD parameter of the $b$-th subband can be the index value of the sample corresponding to the maximum value calculated by formula (4).
  • the ITD parameters may be extracted in units of frames, sub-frames, or sub-bands, which are not specifically limited in this embodiment of the present invention.
  • When the ITD parameter is extracted in units of frames, the ITD parameter of the current frame may be a single ITD parameter; when the ITD parameter is extracted in units of subframes or subbands, the ITD parameter of the current frame may be multiple ITD parameters, i.e., each subframe or each subband corresponds to one ITD parameter.
  • the ITD parameters can be extracted in units of frames or subframes.
  • the time-frequency transform can be performed in units of the current frame (i.e., 20 ms) to extract the ITD parameter of the current frame; when the current frame is divided into two subframes, the time-frequency transform can be performed in units of subframes (i.e., 10 ms) to extract the ITD parameter corresponding to each subframe; when the current frame is divided into four subframes, the time-frequency transform can be performed in units of subframes (i.e., 5 ms) to extract the ITD parameter corresponding to each subframe.
  • in the frequency domain based ITD parameter extraction method, the ITD parameter can be extracted in units of frames or subframes.
  • the ITD parameters can also be extracted in units of sub-bands.
  • the ITD parameter extraction manner of all frames of the multi-channel signal is fixed, and cannot be flexibly adjusted according to actual conditions.
  • different frames of multi-channel signals have different characteristics. For example, some frames contain speech signals, some frames contain background noise signals; some frames have unvoiced speech signals, and some frames have voiced signals. Some frames have high energy and some have low energy.
  • Different types of frames or different types of signals in a multi-channel signal can use the same or different ITD parameter extraction methods. For example, the ITD parameters of a background noise signal usually do not change greatly within a certain time range, so recalculating the ITD parameters of the background noise signal for every frame wastes coding resources and reduces coding efficiency.
  • FIG. 5 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the encoding method of the multi-channel signal of FIG. 5 includes:
  • the multi-channel signal can be a multi-channel time domain signal; in some embodiments, the multi-channel signal can be a multi-channel frequency domain signal.
  • the feature information may be used to indicate the feature of the multi-channel signal.
  • the feature information may include at least one of a frame type and a signal type of the current frame, the frame type may include a voice frame and/or a non-speech frame; the signal type may include unvoiced and/or voiced.
  • the speech frame is a frame containing a speech signal.
  • a non-speech frame may also be referred to as a background frame.
  • the signal in the background frame can be, for example, a background noise signal.
  • In voice activity detection (VAD), a frame including a voice signal may be referred to as a voice active frame (or active frame), and a non-speech frame may be referred to as a voice inactive frame (or inactive frame).
  • The following takes as an example the case where the speech frame is a voice active frame and the non-speech frame is a voice inactive frame.
  • the embodiment of the present invention does not limit the specific manner of determining the signal type of the multi-channel signal according to the multi-channel signal.
  • ZCR Zero Crossing Rate
  • the signal type of the multi-channel signal is unvoiced (or the current frame is an unvoiced frame); otherwise, the signal type of the multi-channel signal is voiced (or the current frame is a voiced frame).
  • the signal type of the multi-channel signal is voiced (or the current frame is a voiced frame); otherwise, the signal type of the multi-channel signal is unvoiced (or the current frame is an unvoiced frame).
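One way a zero crossing rate could drive the voiced/unvoiced decision is sketched below. The decision direction (a high ZCR suggests unvoiced sound) reflects the usual behavior of speech signals; the threshold value and function names are hypothetical and not taken from the text.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def signal_type(frame, zcr_threshold=0.3):
    """Classify a frame as 'unvoiced' or 'voiced'; zcr_threshold is a made-up value."""
    return "unvoiced" if zero_crossing_rate(frame) > zcr_threshold else "voiced"
```

In practice the threshold would need tuning for the sampling rate and frame length in use.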
  • the ITD parameter of the current frame may be determined according to the frame type of the current frame. For example, different ITD parameter extraction methods are employed for voice activated frames and voice inactive frames.
  • the ITD parameter of the current frame may be determined according to the signal type of the multi-channel signal. For example, different ITD parameter extraction methods are used for unvoiced signals and voiced signals. The following text will be described in detail in conjunction with specific examples, which will not be described in detail herein.
  • the method of FIG. 5 may further include transmitting the encoded ITD parameters to the decoding end.
  • step 520 can include: determining a frame type of the current frame according to the multi-channel signal; step 530 can include: determining the ITD parameter of the current frame by using the first ITD parameter extraction manner if the current frame is a non-speech frame; and determining the ITD parameter of the current frame by using a second ITD parameter extraction manner if the current frame is a speech frame.
  • the manner of determining the frame type of the current frame according to the multi-channel signal is not specifically limited.
  • the frame type of the current frame can be determined based on the VAD.
  • first ITD parameter extraction mode and the second ITD parameter extraction mode are not specifically limited in the embodiment of the present invention, as long as the first ITD parameter extraction mode and the second ITD parameter extraction mode are different.
  • the first ITD parameter extraction manner may be determining an ITD parameter of a previous frame or a previous subframe of the current frame as an ITD parameter of the current frame.
  • the second ITD parameter extraction manner may be determining an ITD parameter of the current frame according to the multi-channel signal.
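The choice between the two extraction manners can be sketched as a small dispatcher. The function name, the string labels for the frame type, and the callable argument are assumptions made for the example.

```python
def determine_itd(frame_type, prev_itd, extract_from_signal):
    """Pick an ITD extraction manner from the frame type.

    extract_from_signal is a zero-argument callable that computes the ITD
    from the multi-channel signal (second manner). It is only called for
    speech frames, so non-speech frames skip the computation entirely.
    """
    if frame_type == "non-speech":
        return prev_itd           # first manner: reuse the previous ITD
    return extract_from_signal()  # second manner: compute from the signal
```

Passing the extractor as a callable makes the saving explicit: for a non-speech frame no cross-correlation or transform is evaluated at all.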
  • the ITD parameters of the current frame may be extracted in a time domain and frequency domain based manner in the prior art.
  • the extracted initial ITD parameters may be smoothed on the basis of the prior art to obtain an ITD parameter of the current frame.
  • the hybrid domain (time domain and frequency domain) based method may be used to extract the ITD parameters according to the embodiment of the present invention. The following describes the ITD parameters based on the hybrid domain in detail, and details are not described herein again.
  • the multi-channel signal is taken as an example for the left and right channel signals, but the embodiment of the present invention is not limited thereto.
  • the solution in the present application can be applied to processing any two channels of a two-channel or multi-channel signal.
  • the left and right channels below may be any two channels of the multi-channel signal.
  • FIG. 6 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention. It should be understood that the processing steps or operations illustrated in FIG. 6 are merely examples, and that other operations of the present invention or variations of the various operations in FIG. 6 may be performed. Moreover, the various steps in FIG. 6 may be performed in a different order than that presented in FIG. 6, and it is possible that not all operations in FIG. 6 are to be performed.
  • the method of Figure 6 includes:
  • VAD may be performed on the current frame, and whether the current frame is a voice activated frame or a voice inactive frame is determined according to the detection result.
  • if the current frame is a voice inactive frame, step 630 may be performed; if it is a voice activated frame, step 640 may be performed.
  • the first ITD parameter extraction manner may include determining an ITD parameter of a previous frame or a previous subframe of the current frame as an ITD parameter of the current frame.
  • the ITD parameters of the current frame may be extracted in the manner described in FIG. 3, that is, the ITD parameters of the current frame are extracted in the time domain.
  • the ITD parameters of the current frame may be extracted in the manner described in FIG. 4, that is, the ITD parameters of the current frame are extracted in the frequency domain.
  • the ITD parameter of the current frame may be extracted in the hybrid domain. The manner of extracting the ITD parameter in the hybrid domain according to the embodiment of the present invention is described in detail below with reference to FIG. 7 and FIG. 8, and is not detailed here.
  • the initial ITD parameter of the current frame may be extracted first; then the initial ITD parameter of the current frame is smoothed to obtain an ITD parameter of the current frame.
  • the initial ITD parameter of the current frame may be extracted in the manner described in FIG. 3, that is, the initial ITD parameter of the current frame is extracted in the time domain.
  • the initial ITD parameter of the current frame may be extracted in the manner described in FIG. 4, that is, the initial ITD parameter of the current frame is extracted in the frequency domain.
  • an initial ITD parameter of the current frame may be extracted in the hybrid domain.
  • the manner of extracting the ITD parameter in the hybrid domain according to an embodiment of the present invention is described in detail below with reference to FIG. 7 and FIG. 8.
  • $T_{sm} = w_1 \cdot T_{sm}^{[-1]} + w_2 \cdot T_1$ (5)
  • each subframe may correspond to an initial ITD parameter (the subframe ITD parameter is extracted in a manner similar to the frame ITD parameter, and the extraction may likewise be time-domain, frequency-domain, or hybrid-domain based; to avoid repetition, details are not described here).
  • the initial ITD parameter of each subframe can be smoothed by the following formula:
  • $T_{sm}(j) = w_1 \cdot T_{sm}(j-1) + w_2 \cdot T(j)$ (6)
  • the smoothing process can be implemented on the encoding side or on the decoding side.
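  • The subframe smoothing recursion of formula (6) can be sketched as follows. This is only an illustration: the function name `smooth_itd` and the default weights are assumptions, since the text does not fix the values of w1 and w2.

```python
def smooth_itd(initial_itds, prev_smoothed, w1=0.5, w2=0.5):
    """Apply T_sm(j) = w1 * T_sm(j-1) + w2 * T(j) to each subframe in order.

    initial_itds  -- initial ITD parameter T(j) of each subframe
    prev_smoothed -- smoothed ITD of the subframe preceding the first one
    w1, w2        -- smoothing weights (example values, an assumption)
    """
    smoothed = []
    last = prev_smoothed
    for t in initial_itds:
        last = w1 * last + w2 * t   # one step of the recursion
        smoothed.append(last)
    return smoothed
```

  • With equal weights, a jump of the initial ITD from 0 to 8 is approached gradually (4, then 6) instead of being adopted at once.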
  • the hybrid domain-based ITD parameter extraction method of the embodiment of the present invention is described in detail below with reference to FIG. 7 and FIG. 8.
  • the ITD parameter extraction manners described in FIG. 7 and FIG. 8 can be used to extract the ITD parameter of the current frame; in addition, in embodiments where smoothing is required, they can also be used to extract the initial ITD parameter of the current frame.
  • the implementations of FIG. 7 and FIG. 8 both construct a target frequency domain signal in the frequency domain.
  • the phase of the target frequency domain signal is the IPD of the multi-channel signal, so that a target time domain signal is obtained when the target frequency domain signal is converted to the time domain.
  • the ITD parameter of the current frame is located at the index value corresponding to the sampling point at which the sample value of the target time domain signal is largest.
  • the difference between FIG. 7 and FIG. 8 is that the target frequency domain signals are constructed differently.
  • FIG. 7 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the target frequency domain signal is mainly a frequency domain signal constructed by calculating, frequency point by frequency point, the amplitude of the mono frequency domain signal and the IPD of the left and right channel signals.
  • it should be understood that the processing steps or operations illustrated in FIG. 7 are merely examples, and that embodiments of the present invention may also perform other operations or variations of the various operations in FIG. 7.
  • the various steps in FIG. 7 may be performed in a different order than that presented in FIG. 7, and it is possible that not all operations in FIG. 7 are to be performed.
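  • the time domain signals of the left and right channels may be converted into frequency domain signals, for example by Discrete Fourier Transformation (DFT), where:
  • $x_L(n)$ and $x_R(n)$ are the time domain signals of the left and right channels, respectively
  • Length is the frame length or subframe length
  • k is the index value of the frequency point of the frequency domain signal
  • L is the time-frequency transform length.
  • the time-frequency transform may also be performed by Fast Fourier Transformation (FFT).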
  • the frequency domain signal obtained after the time-frequency transform is a complex signal that includes a real part and an imaginary part.
  • for the left channel, the real part is $X_{L\_real}(k)$ and the imaginary part is $X_{L\_image}(k)$.
  • for the right channel, the real part is $X_{R\_real}(k)$ and the imaginary part is $X_{R\_image}(k)$.
  • the values of the real part and the imaginary part can be calculated as follows:
  • the obtained frequency domain signal includes 256 frequency points, wherein the 256th frequency point corresponds to the 8 kHz spectrum.
  • the 128th frequency point corresponds to the 4 kHz spectrum, and so on.
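  • As an illustration of the real and imaginary parts and of the frequency point numbering, the following sketch uses the standard DFT definition with the Length input samples zero-padded to L points. The function names, and the 16 kHz sampling rate with L = 512 used in the usage note, are assumptions chosen because they reproduce the 256th point at 8 kHz and the 128th point at 4 kHz.

```python
import math

def dft_real_imag(x, L):
    """Return (real, imag) lists of the L-point DFT of x.

    x holds `Length` time domain samples; samples beyond len(x) are
    treated as zero (zero-padding up to the transform length L).
    """
    length = len(x)
    real, imag = [], []
    for k in range(L):
        # X(k) = sum_n x(n) * exp(-j*2*pi*k*n/L), split into cos/sin terms
        re = sum(x[n] * math.cos(2 * math.pi * k * n / L) for n in range(length))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / L) for n in range(length))
        real.append(re)
        imag.append(im)
    return real, imag

def bin_to_hz(k, L, fs):
    """Centre frequency in Hz of frequency point k for transform length L."""
    return k * fs / L
```

  • Under these assumptions `bin_to_hz(256, 512, 16000)` gives 8000.0 and `bin_to_hz(128, 512, 16000)` gives 4000.0.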
  • the amplitude $A_M(k)$ of the target frequency domain signal and the inter-channel phase difference $IPD(k)$ may be calculated frequency point by frequency point, where k is the frequency point, $0 \le k \le L/2$, and L is the time-frequency transform length used to convert the time domain signals of the left and right channels into the frequency domain signals of the left and right channels.
  • the amplitude $A_M(k)$ of the target frequency domain signal may be calculated first:
  • the amplitude of the left channel frequency domain signal can be:
  • the amplitude of the right channel frequency domain signal can be:
  • $IPD(k) = \angle\left(L(k) \cdot R^{*}(k)\right),\quad k_1 \le k \le k_2$ (18)
  • L(k) and R(k) are the kth frequency point values of the left and right channel frequency domain signals, respectively; each frequency point value includes a real part and an imaginary part.
  • $R^{*}(k)$ represents the conjugate of the kth frequency point value of the right channel frequency domain signal. The real and imaginary parts of L(k) and R(k) can be constructed based on $X_L(k)$ and $X_R(k)$, so that the expression can be further organized as:
  • the target frequency domain signal is further processed:
  • the target frequency domain signal may also be obtained by table lookup, for example by setting up tables of the sin and cos functions and looking up their values to obtain the target frequency domain signal, which can effectively reduce the computational complexity of the algorithm.
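  • The per-frequency-point construction can be sketched as below. The IPD follows formula (18). Taking the amplitude $A_M(k)$ as the mean of the left and right amplitudes is an assumption made only for illustration, and the helper names `ipd` and `target_bin` are hypothetical. The sketch calls `math.cos` and `math.sin` directly where an implementation following the table lookup idea would read precomputed tables.

```python
import cmath
import math

def ipd(l_k, r_k):
    """Inter-channel phase difference: angle of L(k) * conj(R(k))."""
    return cmath.phase(l_k * r_k.conjugate())

def target_bin(l_k, r_k):
    """Build one frequency point of the target frequency domain signal.

    ASSUMPTION: A_M(k) is the mean of the two channel amplitudes; the
    amplitude rule is not fixed by this sketch's source text.
    """
    a_m = 0.5 * (abs(l_k) + abs(r_k))
    phase = ipd(l_k, r_k)
    # real part = A_M * cos(IPD), imaginary part = A_M * sin(IPD)
    return complex(a_m * math.cos(phase), a_m * math.sin(phase))
```

  • For L(k) = 2j and R(k) = 2 the product L(k)·R*(k) is 4j, so the IPD is π/2 and the constructed point is approximately 2j.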
  • the target frequency domain signal may be windowed and subjected to Inverse Discrete Fourier Transform (IDFT).
  • the target frequency domain signal may be windowed first:
  • k is the frequency point, $0 \le k \le L/2$
  • L is the time-frequency transform length used when converting the time domain signals of the left and right channels into the frequency domain signals of the left and right channels.
  • IDFT transform is performed on the windowed signal to obtain a target time domain signal:
  • n is the index value of the sampling point of the time domain signal, $0 \le n \le L/2$.
  • step 730 may perform the frequency-time transform by using IDFT, or by using the Inverse Fast Fourier Transform (IFFT).
  • the frequency-time transform may not be performed on all the frequency points, and the frequency-time transform may be performed only in a specific frequency domain, which can effectively reduce the computational complexity of the algorithm.
  • a frequency-time transform can be performed within the frequency range $[k_3, k_4]$, where $k_3 > 0$ and $k_4 \le L/2$.
  • the magnitude of the target time domain signal can be expressed by:
  • the phase of the target frequency domain signal obtained after the frequency domain coefficient processing is the IPD of the first channel and the second channel. Further, because the IPD and the ITD are linearly related, the target frequency domain signal can be approximately rewritten into the following formula:
  • the sample point with the largest sample value of the target time domain signal is therefore located at the index value corresponding to the ITD.
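  • The peak-picking idea can be sketched as follows: a frequency domain signal whose phase is linear in k transforms to a time domain signal with a single dominant sample, and the index of that sample gives the delay. The sketch is an illustration only. It uses all L frequency points with no window, the function names are hypothetical, and mapping indices above L/2 to negative delays is an assumed convention.

```python
import cmath
import math

def idft(X):
    """Inverse DFT of a full-length spectrum X (length L)."""
    L = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / L) for k in range(L)) / L
            for n in range(L)]

def itd_from_target(X):
    """Return the signed index of the largest-magnitude time domain sample."""
    x = idft(X)
    L = len(x)
    peak = max(range(L), key=lambda n: abs(x[n]))
    # Indices in the upper half of the circular buffer represent negative lags.
    return peak if peak <= L // 2 else peak - L
```

  • For example, a unit-amplitude spectrum with phase -2πk·3/16 over 16 points yields a peak at index 3. Which channel is delayed for a positive index depends on which channel is conjugated when the IPD is formed.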
  • FIG. 8 is a schematic flowchart of a method for encoding a multi-channel signal according to an embodiment of the present invention.
  • the target frequency domain signal is mainly a frequency domain signal constructed based on the conjugate of the signal of one channel of the left and right channel signals and the signal of the other channel.
  • it should be understood that the processing steps or operations illustrated in FIG. 8 are merely examples, and that embodiments of the present invention may also perform other operations or variations of the various operations in FIG. 8.
  • the various steps in FIG. 8 may be performed in a different order than that presented in FIG. 8, and it is possible that not all operations in FIG. 8 are to be performed.
  • each step in FIG. 8 corresponds to each step in FIG. 7 , except that the processing manner of step 820 is different from the processing manner of step 720. Other steps may refer to FIG. 7 and will not be described in detail herein.
  • the frequency domain signal of one channel is multiplied by the conjugate of the frequency domain signal of the other channel, and the phase of the obtained frequency domain signal is the IPD of the two channels.
  • the target frequency domain signal $X_M(k)$ can be calculated by $X_M(k) = L(k) \cdot R^{*}(k)$, where:
  • L(k) and R(k) are the kth frequency point values of the left and right channel frequency domain signals, respectively; each frequency point value includes a real part and an imaginary part.
  • $R^{*}(k)$ represents the conjugate of the kth frequency point value of the right channel frequency domain signal; the real and imaginary parts of L(k) and R(k) can be constructed based on $X_L(k)$ and $X_R(k)$.
  • alternatively, $X_M(k) = R(k) \cdot L^{*}(k)$, where R(k) is the kth frequency point value of the frequency domain signal of the right channel and $L^{*}(k)$ is the conjugate of the kth frequency point value of the frequency domain signal of the left channel, $0 \le k \le L/2$.
  • after $X_M(k)$ is obtained, $X_M(k)$ may be further normalized to obtain the target frequency domain signal.
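  • A minimal sketch of this construction is given below: each frequency point of one channel is multiplied by the conjugate of the same point of the other channel, and the product is normalized to unit amplitude so that only the phase (the IPD) remains. The function names and the small `eps` guard against division by zero are assumptions, and all L points are used for simplicity.

```python
import cmath
import math

def dft(x):
    """Plain N-point DFT of a real or complex sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def normalized_cross_spectrum(left_f, right_f, eps=1e-12):
    """Return X_M(k) = L(k) * conj(R(k)), normalized to unit amplitude."""
    out = []
    for l_k, r_k in zip(left_f, right_f):
        x_m = l_k * r_k.conjugate()
        out.append(x_m / max(abs(x_m), eps))  # eps avoids dividing by zero
    return out
```

  • For a right channel that is the left channel delayed by two samples, every normalized point has amplitude 1 and a phase that grows linearly with k, which is the property the frequency-time transform exploits.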
  • step 610 in FIG. 6 may be performed in multiple manners, for example, the frame type of the current frame may be detected in the time domain; and the frame type of the current frame may also be detected in the frequency domain.
  • the VAD detection algorithm can be employed to detect the frame type of the current frame. Specifically, the frame type of the current frame can be detected based on the energy of the signal in the current frame.
  • the energy-based frame type detection mode is exemplified below with reference to FIG. 9.
  • FIG. 9 is an exemplary flow chart of the manner in which the ITD parameters of the current frame are extracted.
  • FIG. 9 mainly performs VAD detection on the current frame based on the energy of the signal in the current frame to determine whether the current frame is a voice activated frame or a voice inactive frame.
  • it should be understood that the processing steps or operations illustrated in FIG. 9 are merely examples, and that embodiments of the present invention may also perform other operations or variations of the various operations in FIG. 9.
  • the various steps in FIG. 9 may be performed in a different order than that presented in FIG. 9, and it is possible that not all operations in FIG. 9 are to be performed.
  • the time domain signals of the left and right channels can each be subjected to a Fast Fourier Transformation (FFT) to obtain the frequency domain signals of the left and right channels:
  • x L (n) and x R (n) represent the time domain signals of the left and right channels, respectively
  • k is the index value of the frequency point of the frequency domain signal
  • Length is the frame length
  • L is the time frequency transform length
  • the complex signal obtained after the FFT includes a real part and an imaginary part.
  • for the left channel, the real part is $X_{L\_real}(k)$ and the imaginary part is $X_{L\_image}(k)$.
  • for the right channel, the real part is $X_{R\_real}(k)$ and the imaginary part is $X_{R\_image}(k)$, where $0 \le k \le L/2$.
  • $X_{L\_real}(k)$ and $X_{L\_image}(k)$ may be calculated as follows ($X_{R\_real}(k)$ and $X_{R\_image}(k)$ are calculated in the same way and are not described here):
  • the energy of the current frame/subframe can be calculated according to the following formula:
  • the energy threshold $E_{VAD}$ can be set to a fixed value, or can be adaptively adjusted according to the current frame/subframe energy.
  • if $E_{tot} \le E_{VAD}$, step 930 can be performed; if $E_{tot} > E_{VAD}$, step 940 can be performed.
  • the first ITD parameter extraction manner may be: maintaining the ITD value of the previous frame/subframe of the current frame.
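  • The energy-based decision can be sketched as follows. Summing the squared real and imaginary parts over the frequency points of both channels is an assumed energy measure used only for illustration, and the function names and the `extract` callback are hypothetical. A frame whose energy does not exceed the threshold keeps the ITD of the previous frame/subframe, and otherwise a new ITD is extracted.

```python
def frame_energy(left_f, right_f):
    """Total energy of a frame from its complex frequency points.

    ASSUMPTION: energy is the sum of squared magnitudes over both channels.
    """
    return sum(v.real ** 2 + v.imag ** 2 for v in list(left_f) + list(right_f))

def select_itd(e_tot, e_vad, prev_itd, extract):
    """Choose between holding the previous ITD and extracting a new one.

    e_vad   -- energy threshold (fixed or adaptively adjusted by the caller)
    extract -- zero-argument callable implementing the second extraction manner
    """
    if e_tot <= e_vad:
        return prev_itd      # voice inactive: first ITD extraction manner
    return extract()         # voice activated: second ITD extraction manner
```

  • Passing the extraction as a callable means the costlier second manner runs only for voice activated frames.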
  • for the implementation of the second ITD parameter extraction manner, reference may be made to step 640 of FIG. 6. The following description again uses the hybrid-domain-based ITD parameter extraction method as an example.
  • Step 1: frequency domain coefficient processing may be performed in combination with the energy of the current frame/subframe.
  • let the energy of the kth frequency point of the current frame/subframe be E(k). If $E(k) \cdot L < E_{tot}$, the current frequency point of the target frequency domain signal can be set to 0; otherwise, the amplitude and IPD of the current frequency point of the target frequency domain signal can be calculated and processed in a manner similar to that described in FIG. 7 to obtain the target frequency domain signal, where L is the time-frequency transform length used to convert the time domain signals of the left and right channels into the frequency domain signals of the left and right channels.
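  • The energy gating in Step 1 can be sketched as follows: a frequency point whose energy, scaled by the transform length L, falls below the total energy is zeroed, and the remaining points are kept for the amplitude and IPD calculation. The function name is hypothetical, and using the squared magnitude as E(k) and a strict comparison are assumptions.

```python
def gate_low_energy_bins(spectrum, e_tot, L):
    """Zero the frequency points whose energy is small relative to the frame.

    spectrum -- complex frequency points of the current frame/subframe
    e_tot    -- total energy of the current frame/subframe
    L        -- time-frequency transform length
    """
    gated = []
    for v in spectrum:
        e_k = v.real ** 2 + v.imag ** 2      # ASSUMPTION: E(k) = |X(k)|^2
        gated.append(0j if e_k * L < e_tot else v)
    return gated
```

  • Since E_tot / L is an average energy per frequency point, the test suppresses points below that average, so they do not contribute to the target frequency domain signal.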
  • the amplitude A M (k) can be calculated using the following formula:
  • the amplitude of the left channel frequency domain signal at the kth frequency point is:
  • the amplitude of the right channel frequency domain signal at the kth frequency point is:
  • the inter-channel phase difference $IPD(k)$ of the left and right channel signals can be calculated by the following formula:
  • $IPD(k) = \angle\left(L(k) \cdot R^{*}(k)\right)$
  • L(k) and R(k) are the kth frequency point values of the left and right channel frequency domain signals, respectively; each frequency point value includes a real part and an imaginary part.
  • $R^{*}(k)$ represents the conjugate of the kth frequency point value of the right channel frequency domain signal; the real and imaginary parts of L(k) and R(k) can be constructed based on $X_L(k)$ and $X_R(k)$.
  • the target frequency domain signal is constructed such that the phase of the target frequency domain signal is linearly related to the IPD of the left and right channel signals.
  • the target frequency domain signal can be constructed using the following formula:
  • Step 2: perform a frequency-time transform on the target frequency domain signal to obtain a target time domain signal.
  • the target frequency domain signal may be windowed and IDFT transformed to obtain a target time domain signal.
  • the target frequency domain signal may be windowed by using the following formula:
  • the IDFT transform of the windowed frequency domain signal may be performed by using the following formula to obtain a target time domain signal:
  • the amplitude A(n) of the target time domain signal may also be smoothed to obtain an amplitude smoothed value A sm (n).
  • different smoothing factors can be used for smoothing in combination with signal types. For example, for unvoiced frames, a smaller smoothing factor is used, and for voiced frames, a larger smoothing factor is used.
  • the target time domain signal amplitude A(n) can be calculated by the following formula:
  • A(n) can be smoothed by the following formula to obtain the amplitude smoothing value A sm (n):
  • Step 3: determine the ITD parameter of the current frame or the current subframe according to the index value corresponding to the sampling point at which the sample value of the target time domain signal is largest.
  • the index value corresponding to the sampling point with the largest sample value of the target time domain signal may be determined directly as the ITD parameter of the current frame or the current subframe.
  • alternatively, the index value corresponding to the sampling point with the largest sample value of the target time domain signal may be transformed (for example, normalized or scaled), and the transformed value is determined as the ITD parameter of the current frame or the current subframe.
  • the ITD parameter of the previous frame or the previous subframe of the current frame may be determined as the ITD parameter of the current frame, but the embodiment of the present invention is not limited thereto.
  • the ITD parameter of the current frame may be extracted in the time domain, the frequency domain, or the hybrid domain. If the current frame is a voice activated frame and the previous frame of the current frame is also a voice activated frame (that is, the current frame is one of a run of consecutive speech frames), the ITD parameter is not expected to change abruptly, because the ITD parameters of consecutive speech frames generally do not fluctuate greatly. Consequently, if the ITD parameter of the previous frame is not the preset value (the preset value may be, for example, 0) but the ITD parameter calculated for the current frame is the preset value, this may be caused by an error in the calculation of the ITD parameter of the current frame. In this case, the ITD parameter of the previous frame or the previous subframe of the current frame may be determined as the ITD parameter of the current frame. This implementation is described in detail below with reference to FIG. 10.
  • FIG. 10 is an exemplary flowchart of an extraction manner of an ITD parameter of a current frame. It should be understood that the processing steps or operations illustrated in FIG. 10 are merely examples, and that other operations of the present invention or variations of the various operations in FIG. 10 may be performed. Moreover, the various steps in FIG. 10 may be performed in a different order than that presented in FIG. 10, and it is possible that not all operations in FIG. 10 are to be performed.
  • this step is similar to step 910; reference may be made to step 910, and to avoid repetition, details are not described here.
  • VAD detection can be performed based on the frequency domain signals of the left and right channels. If the current frame is a voice inactive frame, step 1030 is performed; if the current frame is a voice activated frame, step 1040 is performed.
  • the ITD parameter of the current frame may be calculated according to the frequency domain cross-correlation algorithm based on the left and right channel frequency domain coefficients.
  • the frequency domain cross-correlation algorithm can be implemented by the following formula:
  • L(k) and R(k) are the kth frequency point values of the left and right channel frequency domain signals, respectively; each frequency point value includes a real part and an imaginary part.
  • $R^{*}(k)$ represents the conjugate of the kth frequency point value of the right channel frequency domain signal; the real and imaginary parts of L(k) and R(k) can be constructed based on $X_L(k)$ and $X_R(k)$.
  • the ITD parameter calculated for the current frame from the left and right channel frequency domain signals may be adjusted in combination with the ITD parameter of the previous frame of the current frame and/or the number of consecutively calculated ITD parameters.
  • for example, if the ITD parameter of the previous frame of the current frame is not the preset value (the preset value may be, for example, 0) and the ITD parameter calculated for the current frame is the preset value, the ITD parameter of the previous frame may be used as the ITD parameter of the current frame; otherwise, the initial ITD parameter of the current frame may be determined as the ITD parameter of the current frame.
  • for another example, if VAD detection shows that the ITD parameter of the previous frame of the current frame is not the preset value (the preset value may be, for example, 0), the ITD parameter calculated for the current frame is the preset value, and the consecutively calculated ITD parameters (including the ITD parameter of the current frame) are the preset value, the ITD parameter of the previous frame may be used as the ITD parameter of the current frame, and the number of consecutive ITD parameters equal to the preset value is incremented; otherwise, the initial ITD parameter of the current frame may be determined as the ITD parameter of the current frame.
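  • The adjustment rule can be sketched as follows. The basic rule follows the text: when the previous ITD is not the preset value and the newly calculated ITD is the preset value, the previous ITD is kept. The counter handling is an assumption: the optional `max_hold` limit and resetting the counter to 0 whenever the initial ITD is accepted are hypothetical choices, not taken from the text.

```python
def finalize_itd(initial_itd, prev_itd, hold_count, preset=0, max_hold=None):
    """Return (itd, new_hold_count) for the current frame.

    initial_itd -- ITD calculated for the current frame
    prev_itd    -- ITD of the previous frame
    hold_count  -- how many consecutive frames already reused prev_itd
    max_hold    -- hypothetical cap on consecutive reuse (None = no cap)
    """
    suspicious = prev_itd != preset and initial_itd == preset
    if suspicious and (max_hold is None or hold_count < max_hold):
        return prev_itd, hold_count + 1   # reuse the previous frame's ITD
    return initial_itd, 0                 # accept the calculated ITD
```

  • A cap of this kind lets a genuine change to the preset value take effect once it has persisted, instead of the previous ITD being held indefinitely.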
  • Figure 11 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 1100 of Figure 11 includes:
  • An obtaining unit 1110 configured to acquire a current frame that includes a multi-channel signal
  • a first determining unit 1120, configured to determine feature information according to the multi-channel signal, where the feature information includes at least one of a frame type and a signal type of the current frame, the frame type includes a voice frame and/or a non-speech frame, and the signal type includes unvoiced and/or voiced;
  • a second determining unit 1130 configured to determine an inter-channel time difference ITD parameter of the current frame according to the feature information
  • a coding unit 1140, configured to encode the ITD parameter.
  • the first determining unit 1110 is specifically configured to determine a frame type of the current frame according to the multi-channel signal; the second determining unit 1120 is specifically configured to: when the current frame is a non-speech frame, determine the ITD parameter of the current frame by using a first ITD parameter extraction manner; and when the current frame is a voice frame, determine the ITD parameter of the current frame by using a second ITD parameter extraction manner.
  • the second determining unit 1120 is specifically configured to determine an ITD parameter of a previous frame or a previous subframe of the current frame as an ITD parameter of the current frame.
  • the second determining unit 1120 is specifically configured to determine an ITD parameter of the current frame according to the multi-channel signal.
  • the second determining unit 1120 is specifically configured to: generate a target frequency domain signal according to the multi-channel signal; perform frequency-time transform on the target frequency domain signal to obtain a target time domain signal; and according to the target time domain signal, Determining an ITD parameter of the current frame.
  • the second determining unit 1120 is specifically configured to: determine an amplitude of the target frequency domain signal according to the multi-channel signal; determine an IPD parameter of the multi-channel signal of the current frame according to the multi-channel signal; and generate the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal of the current frame.
  • the second determining unit 1120 is specifically configured to determine the amplitude of the target frequency domain signal, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $A_1(k)$ and $A_2(k)$ respectively represent the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, k represents the frequency point, $0 \le k \le L/2$, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • the second determining unit 1120 is specifically configured to generate the target frequency domain signal, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $X_{M\_real}(k)$ represents the real part of the target frequency domain signal, $X_{M\_image}(k)$ represents the imaginary part of the target frequency domain signal, $IPD(k)$ represents the IPD parameter, k represents the frequency point, $0 \le k \le L/2$, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • the second determining unit 1120 is specifically configured to: determine an initial ITD parameter of the current frame according to the multi-channel signal; and smooth the initial ITD parameter of the current frame according to the ITD parameter of a previous frame or a previous subframe of the current frame, to obtain the ITD parameter of the current frame.
  • the second determining unit 1120 is specifically configured to: determine initial ITD parameters of K subframes of the current frame according to the multi-channel signal, where K is an integer greater than 1; smooth the initial ITD parameter of each of the K subframes according to the ITD parameter of the subframe preceding it, to obtain the ITD parameter of each subframe; and determine the ITD parameters of the K subframes as the ITD parameter of the current frame.
  • T sm (j) represents the ITD parameter of the jth subframe
  • the value of the smoothing factor is determined based on a signal type of the current frame.
  • the first determining unit 1110 is specifically configured to: determine an energy of the multi-channel signal; determine the current frame as a non-speech frame if the energy of the multi-channel signal is less than or equal to a preset energy threshold; and determine the current frame as a voice frame if the energy of the multi-channel signal is greater than the energy threshold.
  • the encoder further includes a third determining unit, configured to determine an initial ITD parameter of the current frame according to the multi-channel signal; the second determining unit 1120 is specifically configured to determine the initial ITD parameter of the current frame as the ITD parameter of the current frame, or to adjust the initial ITD parameter of the current frame to obtain the ITD parameter of the current frame.
  • the second determining unit 1120 is specifically configured to determine the ITD parameter of the current frame according to the frame type of a previous frame or previous N frames of the current frame and the initial ITD parameter of the current frame, where N is an integer greater than 1.
  • the second determining unit 1120 is specifically configured to: when the frame type of the previous frame or the previous N frames of the current frame is a voice frame, determine the ITD parameter of the current frame according to the ITD parameter of the previous frame of the current frame and the initial ITD parameter of the current frame.
  • the second determining unit 1120 is specifically configured to: when the ITD parameter of the previous frame of the current frame is not a preset value and the initial ITD parameter of the current frame is the preset value, determine the ITD parameter of the previous frame of the current frame as the ITD parameter of the current frame; otherwise, determine the initial ITD parameter of the current frame as the ITD parameter of the current frame.
  • Figure 12 is a schematic structural diagram of an encoder according to an embodiment of the present invention.
  • the encoder 1200 of Figure 12 includes:
  • a memory 1210 configured to store a program
  • a processor 1220, configured to execute the program in the memory 1210; when the program is executed, the processor 1220 acquires a current frame including a multi-channel signal; determines feature information according to the multi-channel signal, where the feature information includes at least one of a frame type and a signal type of the current frame, the frame type includes a voice frame and/or a non-speech frame, and the signal type includes unvoiced and/or voiced; determines an inter-channel time difference ITD parameter of the current frame according to the feature information; and encodes the ITD parameter.
  • the processor 1220 is specifically configured to: determine a frame type of the current frame according to the multi-channel signal; when the current frame is a non-speech frame, determine the ITD parameter of the current frame by using a first ITD parameter extraction manner; and when the current frame is a voice frame, determine the ITD parameter of the current frame by using a second ITD parameter extraction manner.
  • the processor 1220 is specifically configured to determine an ITD parameter of a previous frame or a previous subframe of the current frame as an ITD parameter of the current frame.
  • the processor 1220 is specifically configured to determine an ITD parameter of the current frame according to the multi-channel signal.
  • the processor 1220 is specifically configured to: generate a target frequency domain signal according to the multi-channel signal; perform frequency-time transform on the target frequency domain signal to obtain a target time domain signal; and determine, according to the target time domain signal, The ITD parameters of the current frame.
  • the processor 1220 is specifically configured to: determine an amplitude of the target frequency domain signal according to the multi-channel signal; determine an IPD parameter of the multi-channel signal of the current frame according to the multi-channel signal; and generate the target frequency domain signal according to the amplitude of the target frequency domain signal and the IPD parameter of the multi-channel signal of the current frame.
  • the processor 1220 is specifically configured to determine the amplitude of the target frequency domain signal, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $A_1(k)$ and $A_2(k)$ respectively represent the amplitudes of the frequency domain signals of any two channels of the multi-channel signal, k represents the frequency point, $0 \le k \le L/2$, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • the processor 1220 is specifically configured to generate the target frequency domain signal, where $A_M(k)$ represents the amplitude of the target frequency domain signal, $X_{M\_real}(k)$ represents the real part of the target frequency domain signal, $X_{M\_image}(k)$ represents the imaginary part of the target frequency domain signal, $IPD(k)$ represents the IPD parameter, k represents the frequency point, $0 \le k \le L/2$, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain.
  • X_1(k) represents the frequency domain signal of the first channel in the multi-channel signal, X*_2(k) represents the conjugate of the frequency domain signal of the second channel in the multi-channel signal, k represents the frequency bin, 0 ≤ k ≤ L/2, and L represents the time-frequency transform length used when transforming the multi-channel signal from the time domain to the frequency domain; the amplitude of the frequency domain signal X_M(k) is then normalized to obtain the target frequency domain signal.
  • the processor 1220 is specifically configured to: determine, according to the multi-channel signal, an initial ITD parameter of the current frame; and smooth the initial ITD parameter of the current frame according to the ITD parameter of a previous frame or a previous subframe of the current frame, to obtain the ITD parameter of the current frame.
  • the processor 1220 is specifically configured to: determine initial ITD parameters of K subframes of the current frame according to the multi-channel signal, where K is an integer greater than 1; smooth the initial ITD parameter of each of the K subframes according to the ITD parameter of the subframe preceding it, to obtain the ITD parameter of each subframe; and determine the ITD parameters of the K subframes as the ITD parameter of the current frame.
  • the value of the smoothing factor is determined based on a signal type of the current frame.
  • the processor 1220 is specifically configured to: determine an energy of the multi-channel signal; in a case where the energy of the multi-channel signal is less than or equal to a preset energy threshold, determine the current frame to be a non-speech frame; and in a case where the energy of the multi-channel signal is greater than the energy threshold, determine the current frame to be a speech frame.
  • the processor 1220 is further configured to determine an initial ITD parameter of the current frame according to the multi-channel signal, and is specifically configured to either determine the initial ITD parameter of the current frame as the ITD parameter of the current frame, or adjust the initial ITD parameter of the current frame to obtain the ITD parameter of the current frame.
  • the processor 1220 is specifically configured to determine the ITD parameter of the current frame according to the frame type of the previous frame or of the previous N frames of the current frame and the initial ITD parameter of the current frame, where N is an integer greater than one.
  • the processor 1220 is specifically configured to: in a case where the frame type of the previous frame or of the previous N frames of the current frame is a speech frame, determine the ITD parameter of the current frame according to the ITD parameter of the previous frame of the current frame and the initial ITD parameter of the current frame.
  • the processor 1220 is specifically configured to: in a case where the ITD parameter of the previous frame of the current frame is not a preset value and the initial ITD parameter of the current frame is the preset value, determine the ITD parameter of the previous frame of the current frame as the ITD parameter of the current frame; otherwise, the initial ITD parameter of the current frame may be determined as the ITD parameter of the current frame.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be another division manner, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
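
The frequency-domain ITD estimation described in the list above (cross spectrum of two channels, amplitude normalization, frequency-time transform, then a search of the target time domain signal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names `dft` and `estimate_itd`, the `max_lag` search range, the peak-picking rule, and the sign convention are all hypothetical, and a naive DFT over all L bins stands in for the fast transform over bins 0 to L/2 that an encoder would more likely use.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, kept simple for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def estimate_itd(ch1, ch2, max_lag):
    """Return d (in samples) such that ch1[n] is approximately ch2[n - d].

    The sign convention and the search range are assumptions of this sketch.
    """
    n = len(ch1)
    x1, x2 = dft(ch1), dft(ch2)
    # Cross spectrum X_M(k) = X_1(k) * conj(X_2(k)).
    xm = [a * b.conjugate() for a, b in zip(x1, x2)]
    # Normalize the amplitude so that only the phase difference remains.
    xm = [v / abs(v) if abs(v) > 1e-12 else 0j for v in xm]
    # Frequency-time transform yields the target time domain signal.
    target = [v.real for v in dft(xm, inverse=True)]
    # Search lags -max_lag..max_lag; negative lags wrap to the end.
    return max(range(-max_lag, max_lag + 1), key=lambda d: target[d % n])
```

For a channel that is a circularly delayed copy of the other, the normalized cross spectrum is a pure phase ramp, so the target time domain signal has a single peak at the delay; real frames are not exact circular shifts, which is one reason smoothing and frame-type checks are applied afterwards.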

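The frame-type decision and the post-processing of the initial ITD parameter described in the list above can be sketched as follows. This is only an illustration: the energy definition (sum of squared samples), the first-order form of the smoothing, the default preset value of 0, and all function names are assumptions that the text does not specify.

```python
def frame_energy(channels):
    """Sum of squared samples over all channels of the frame (assumed definition)."""
    return sum(s * s for ch in channels for s in ch)


def is_speech_frame(channels, energy_threshold):
    # Energy <= threshold: non-speech frame; energy > threshold: speech frame.
    return frame_energy(channels) > energy_threshold


def smooth_itd(initial_itd, prev_itd, factor):
    """First-order smoothing; `factor` weights the previous value (assumed form)."""
    return factor * prev_itd + (1.0 - factor) * initial_itd


def select_itd(initial_itd, prev_itd, preset=0):
    """Reuse the previous ITD when only the current estimate equals the preset value."""
    if prev_itd != preset and initial_itd == preset:
        return prev_itd
    return initial_itd
```

In such a sketch the smoothing factor would be chosen per frame from the signal type, and `select_itd` would be applied only when the preceding frame or frames were classified as speech.
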
Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Disclosed are a method for encoding a multi-channel signal, and an encoder, the method comprising: acquiring a current frame containing a multi-channel signal (510); determining feature information according to the multi-channel signal (520), the feature information comprising at least one of the frame type and the signal type of the current frame, the frame type comprising a speech frame and/or a non-speech frame, and the signal type comprising unvoiced sound and/or voiced sound; determining an ITD parameter of the current frame according to the feature information (530); and encoding the ITD parameter (540). The present invention can improve the accuracy of ITD parameter extraction.
PCT/CN2016/103596 2016-05-10 2016-10-27 Multi-channel signal encoding method, and encoder Ceased WO2017193551A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610303992.4A CN107358959B (zh) 2016-05-10 2016-05-10 Multi-channel signal encoding method and encoder
CN201610303992.4 2016-05-10

Publications (1)

Publication Number Publication Date
WO2017193551A1 (fr) 2017-11-16

Family

ID=60266105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103596 Ceased WO2017193551A1 (fr) 2016-05-10 2016-10-27 Multi-channel signal encoding method, and encoder

Country Status (2)

Country Link
CN (1) CN107358959B (fr)
WO (1) WO2017193551A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859749A (zh) 2017-11-30 2019-06-07 阿里巴巴集团控股有限公司 Speech signal recognition method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
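CN1428953A (zh) * 2002-04-22 2003-07-09 西安大唐电信有限公司 Method and device for implementing a multi-channel AMR vocoder
CN101147191A (zh) * 2005-03-25 2008-03-19 松下电器产业株式会社 Speech encoding apparatus and speech encoding method
CN101517637A (zh) * 2006-09-18 2009-08-26 皇家飞利浦电子股份有限公司 Encoding and decoding of audio objects
US20110123031A1 (en) * 2009-05-08 2011-05-26 Nokia Corporation Multi channel audio processing
CN103180899A (zh) * 2010-11-17 2013-06-26 松下电器产业株式会社 Stereo signal encoding device, stereo signal decoding device, stereo signal encoding method, and stereo signal decoding method
CN103295577A (zh) * 2013-05-27 2013-09-11 深圳广晟信源技术有限公司 Analysis window switching method and apparatus for audio signal encoding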

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8725500B2 (en) * 2008-11-19 2014-05-13 Motorola Mobility Llc Apparatus and method for encoding at least one parameter associated with a signal source
EP2671221B1 (fr) * 2011-02-03 2017-02-01 Telefonaktiebolaget LM Ericsson (publ) Determining the inter-channel time difference of a multi-channel audio signal

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1428953A (zh) * 2002-04-22 2003-07-09 西安大唐电信有限公司 Method and device for implementing a multi-channel AMR vocoder
CN101147191A (zh) * 2005-03-25 2008-03-19 松下电器产业株式会社 Speech encoding apparatus and speech encoding method
CN101517637A (zh) * 2006-09-18 2009-08-26 皇家飞利浦电子股份有限公司 Encoding and decoding of audio objects
US20110123031A1 (en) * 2009-05-08 2011-05-26 Nokia Corporation Multi channel audio processing
US9129593B2 (en) * 2009-05-08 2015-09-08 Nokia Technologies Oy Multi channel audio processing
CN103180899A (zh) * 2010-11-17 2013-06-26 松下电器产业株式会社 Stereo signal encoding device, stereo signal decoding device, stereo signal encoding method, and stereo signal decoding method
CN103295577A (zh) * 2013-05-27 2013-09-11 深圳广晟信源技术有限公司 Analysis window switching method and apparatus for audio signal encoding

Also Published As

Publication number Publication date
CN107358959A (zh) 2017-11-17
CN107358959B (zh) 2021-10-26

Similar Documents

Publication Publication Date Title
US12334084B2 (en) Multi-channel signal encoding method and encoder
US12154577B2 (en) Method for encoding multi-channel signal and encoder
US10573328B2 (en) Determining the inter-channel time difference of a multi-channel audio signal
EP2352145B1 (fr) Transient speech signal encoding method and device, decoding method and device, processing system, and computer-readable storage medium
WO2017193551A1 (fr) Multi-channel signal encoding method, and encoder
US12620406B2 (en) System and method for speech enhancement in multichannel audio processing systems
CN107358960B (zh) Multi-channel signal encoding method and encoder
CN107358961B (zh) Multi-channel signal encoding method and encoder
US20250087230A1 (en) System and Method for Speech Enhancement in Multichannel Audio Processing Systems
HK40126983A (en) Transient signal encoding method and device, decoding method and device, and processing system
HK40002235A (en) Method for encoding multi-channel signal and encoder

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16901506

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16901506

Country of ref document: EP

Kind code of ref document: A1