WO2016141732A1 - Method and apparatus for determining inter-channel time difference parameter - Google Patents

Method and apparatus for determining inter-channel time difference parameter

Info

Publication number
WO2016141732A1
WO2016141732A1 PCT/CN2015/095097 CN2015095097W WO2016141732A1 WO 2016141732 A1 WO2016141732 A1 WO 2016141732A1 CN 2015095097 W CN2015095097 W CN 2015095097W WO 2016141732 A1 WO2016141732 A1 WO 2016141732A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
domain signal
time domain
value
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2015/095097
Other languages
English (en)
French (fr)
Inventor
张兴涛
苗磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to AU2015385490A priority Critical patent/AU2015385490B2/en
Priority to EP15884410.0A priority patent/EP3252756B1/en
Priority to JP2017547541A priority patent/JP6487569B2/ja
Priority to BR112017018600-4A priority patent/BR112017018600A2/zh
Priority to MX2017011460A priority patent/MX365619B/es
Priority to RU2017135269A priority patent/RU2670843C9/ru
Priority to CA2977846A priority patent/CA2977846A1/en
Priority to SG11201706998QA priority patent/SG11201706998QA/en
Priority to KR1020177026484A priority patent/KR20170120645A/ko
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2016141732A1 publication Critical patent/WO2016141732A1/zh
Priority to US15/698,107 priority patent/US10210873B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S3/00Systems employing more than two channels, e.g. quadraphonic

Definitions

  • the present invention relates to the field of audio processing and, more particularly, to a method and apparatus for determining inter-channel time difference parameters.
  • stereo audio has the sense of orientation and distribution of each source, which can improve the clarity and intelligibility of information, and is therefore favored by people.
  • a transmission technology for stereo audio signals is known in which the encoding end converts a stereo signal into a mono audio signal and an Inter-Channel Time Difference (ITD) parameter, which are encoded and transmitted separately.
  • the decoding end then restores the stereo signal according to parameters such as the ITD parameter, thereby enabling low-bit-rate, high-quality transmission of the stereo signal.
  • the encoding end can determine the limit value T max of the ITD parameter from the sampling rate of the time domain signal of the mono audio and then, based on the frequency domain signal, perform a search calculation sub-band by sub-band within the range [-T max , T max ] to obtain the ITD parameter.
  • this large search range means that the prior-art process of calculating the ITD parameter in the frequency domain requires a large amount of calculation, which raises the performance requirement on the encoding end and reduces processing efficiency.
  • Embodiments of the present invention provide a method and apparatus for determining a time difference parameter between channels, which can reduce the calculation amount of the inter-channel time difference parameter search calculation process in the stereo coding process.
  • a method for determining a time difference parameter between channels, comprising: determining a reference parameter according to a time domain signal of the first channel and a time domain signal of the second channel, the reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel, wherein the time domain signal of the first channel and the time domain signal of the second channel correspond to the same time period; determining a search range according to the reference parameter and the limit value T max , wherein the limit value T max is determined according to a sampling rate of the time domain signal of the first channel, and the search range belongs to [-T max , 0] or to [0, T max ]; and performing search processing within the search range, based on the frequency domain signal of the first channel and the frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and determining the reference parameter according to the size relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the reference parameter is the index value corresponding to the larger one of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite of that index value.
  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and determining the reference parameter according to the size relationship between the first index value and the second index value.
  • the method further includes: performing smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • an apparatus for determining a time difference parameter between channels, comprising: a determining unit, configured to determine a reference parameter according to a time domain signal of the first channel and a time domain signal of the second channel, the reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel, wherein the time domain signal of the first channel and the time domain signal of the second channel correspond to the same time period, and to determine a search range according to the reference parameter and the limit value T max , wherein the limit value T max is determined according to a sampling rate of the time domain signal of the first channel, and the search range belongs to [-T max , 0] or to [0, T max ]; and a processing unit, configured to perform search processing within the search range determined according to the reference parameter, based on the frequency domain signal of the first channel and the frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.
  • the determining unit is configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, and to determine the reference parameter according to the size relationship between the first cross-correlation processing value and the second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel.
  • the determining unit is specifically configured to determine, as the reference parameter, the index value corresponding to the larger one of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite of that index value.
  • the determining unit is configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, and to determine the reference parameter according to the size relationship between the first index value and the second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range.
  • the processing unit is further configured to perform smoothing processing on the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • according to the method and apparatus for determining an inter-channel time difference parameter of the embodiments of the present invention, a reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain, a search range is determined according to the reference parameter, and search processing is performed in the frequency domain, within the search range, on the frequency domain signal of the first channel and the frequency domain signal of the second channel to determine the inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel. The search range determined according to the reference parameter belongs to [-T max , 0] or [0, T max ], which is smaller than the prior-art search range [-T max , T max ]. This can reduce the amount of search calculation for the ITD parameter, lower the performance requirement on the encoding end, and improve the processing efficiency of the encoding end.
  • FIG. 1 is a schematic flow chart of a method of determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a search range determination process in accordance with an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a search range determination process according to another embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a search range determination process according to still another embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
  • the execution body of the method 100 may be an encoding end device for transmitting an audio signal (also referred to as a transmitting device). As shown in FIG. 1, the method 100 includes:
  • the reference parameter corresponds to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel, wherein the time domain signal of the first channel and the time domain signal of the second channel correspond to the same time period
  • the method 100 of determining an inter-channel time difference parameter of an embodiment of the present invention may be applied to an audio system having at least two channels, in which a stereo signal is synthesized from the mono signals of at least two channels (i.e., including the first channel and the second channel). For example, a mono signal from the left channel (i.e., an example of the first channel) and a mono signal from the right channel (i.e., an example of the second channel) are synthesized into a stereo signal.
  • a parametric stereo (PS) technique can be cited as a method for transmitting the stereo signal.
  • the encoding end converts the stereo signal into a mono signal and spatial parameters and encodes them separately; after the decoding end obtains the mono audio, it restores the stereo signal according to the spatial parameters.
  • the inter-channel time difference (ITD) parameter is a spatial parameter indicating the horizontal orientation of the sound source and is an important component of the spatial parameter.
  • the embodiment of the present invention mainly relates to the process of determining the ITD parameter.
  • the process of encoding and decoding the stereo signal and the mono signal according to the ITD parameter is similar to the prior art, and a detailed description thereof is omitted herein to avoid redundancy.
  • the audio system may also have three or more channels, and the mono signals of any two of those channels can be combined into a stereo signal.
  • the following describes the processing procedure of applying the method 100 to an audio system having two channels (i.e., a left channel and a right channel), where the left channel is used as the first channel and the right channel is used as the second channel.
  • the encoding end device can acquire an audio signal corresponding to the left channel through an audio input device, such as a microphone, corresponding to the left channel, and sample the audio signal at a preset sampling rate (i.e., an example of the sampling rate of the time domain signal of the first channel) to generate a time domain signal of the left channel (i.e., an example of the time domain signal of the first channel; hereinafter, for ease of understanding and distinction, denoted as time domain signal #L). In the embodiment of the present invention, the process of acquiring the time domain signal #L may be similar to the prior art, and a detailed description is omitted here to avoid redundancy.
  • the sampling rate of the time domain signal of the first channel is the same as the sampling rate of the time domain signal of the second channel. Therefore, similarly, the encoding end device may acquire an audio signal corresponding to the right channel through an audio input device, such as a microphone, corresponding to the right channel, and sample the audio signal at the same sampling rate to generate a time domain signal of the right channel (i.e., an example of the time domain signal of the second channel; hereinafter, for ease of understanding and distinction, denoted as time domain signal #R).
  • the time domain signal #L and the time domain signal #R are time domain signals corresponding to the same time period (or time domain signals acquired in the same time period). For example, the time domain signal #L and the time domain signal #R may be time domain signals corresponding to the same frame (i.e., 20 ms). In this case, one ITD parameter corresponding to the one frame signal can be obtained based on the time domain signal #L and the time domain signal #R.
  • the time domain signal #L and the time domain signal #R may also be time domain signals corresponding to the same subframe (i.e., 10 ms or 5 ms, etc.) in the same frame. In this case, a plurality of ITD parameters corresponding to the one frame signal can be obtained. For example, if the subframe corresponding to the time domain signal #L and the time domain signal #R is 10 ms, two ITD parameters can be obtained from the one frame (i.e., 20 ms) signal; if the subframe is 5 ms, four ITD parameters can be obtained from the one frame (i.e., 20 ms) signal.
  • the lengths of the time periods corresponding to the time domain signal #L and the time domain signal #R enumerated above are merely illustrative, and the present invention is not limited thereto, and the length of the time period may be arbitrarily changed as needed.
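The frame and subframe arithmetic above can be sketched as follows. This is a minimal illustration only; the function name `itd_count_per_frame` is hypothetical and not taken from the patent, and it assumes the frame length is a whole multiple of the segment length.

```python
def itd_count_per_frame(frame_ms, segment_ms):
    """Number of ITD parameters obtained per frame when one ITD parameter
    is determined for every segment (frame or subframe) of segment_ms."""
    if segment_ms <= 0 or frame_ms % segment_ms != 0:
        raise ValueError("frame length must be a whole multiple of the segment length")
    return frame_ms // segment_ms


# The three cases mentioned in the text, for a 20 ms frame.
for segment_ms in (20, 10, 5):
    print(segment_ms, "ms ->", itd_count_per_frame(20, segment_ms), "ITD parameter(s)")
```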
  • the encoding end device can determine the reference parameter based on the time domain signal #L and the time domain signal #R.
  • the reference parameter may correspond to the acquisition order of the time domain signal #L and the time domain signal #R (for example, the order in which they are input to the audio input device); this correspondence is described in detail below together with the process of determining the reference parameter.
  • the reference parameter may be determined by performing cross-correlation processing on the time domain signal #L and the time domain signal #R (i.e., mode 1), or by searching for the maximum amplitude values of the time domain signal #L and the time domain signal #R (i.e., mode 2). Mode 1 and mode 2 are described in detail below.
  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and determining the reference parameter according to the size relationship between the first cross-correlation processing value and the second cross-correlation processing value.
  • the encoding end device may determine the cross-correlation function c n (i) of the time domain signal #L with respect to the time domain signal #R according to Equation 1 below, that is,
  • T max represents the limit value of the ITD parameter (or the maximum value of the acquisition time difference between the time domain signal #L and the time domain signal #R). It may be determined according to the above sampling rate, and the determination method may be similar to the prior art, so a detailed description is omitted here to avoid redundancy.
  • x R (j) represents the signal value of the time domain signal #R at the jth sampling point
  • x L (j+i) represents the signal value of the time domain signal #L at the j+ith sampling point
  • Length represents the total number of sampling points included in the time domain signal #R, or the length of the time domain signal #R; for example, it may be the length of one frame (i.e., 20 ms) or the length of one subframe (for example, 10 ms or 5 ms, etc.).
  • the encoding end device can determine the maximum value of the cross correlation function c n (i)
  • the encoding end device can determine the cross-correlation function c p (i) of the time domain signal #R with respect to the time domain signal #L according to Equation 2 below, namely:
  • the encoding end device can determine the maximum value of the cross correlation function c p (i)
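The two cross-correlation functions and their maxima can be sketched as below. Equations 1 and 2 are not reproduced in this text, so the sketch assumes the natural forms suggested by the symbol definitions: c n (i) sums x R (j) * x L (j+i) and c p (i) sums x L (j) * x R (j+i), for j from 0 to Length-1-i and i from 0 to T max. The summation limits, the range of i, the helper names, and the toy signals are all assumptions made for illustration.

```python
def cross_correlation(x_ref, x_shifted, t_max):
    """Return [c(0), ..., c(t_max)] with c(i) = sum_j x_ref[j] * x_shifted[j + i].

    Assumed summation limits: j runs from 0 to Length - 1 - i.
    """
    length = len(x_ref)
    return [
        sum(x_ref[j] * x_shifted[j + i] for j in range(length - i))
        for i in range(t_max + 1)
    ]


def max_with_index(values):
    """Return (maximum value, index of its first occurrence)."""
    best = max(range(len(values)), key=lambda i: values[i])
    return values[best], best


# Toy frame: the right channel is the left channel delayed by 2 samples.
x_left = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, 0.0, 0.0]
x_right = [0.0, 0.0] + x_left[:-2]
T_MAX = 4  # hypothetical limit value for this toy example

c_n = cross_correlation(x_right, x_left, T_MAX)  # terms x_R(j) * x_L(j + i)
c_p = cross_correlation(x_left, x_right, T_MAX)  # terms x_L(j) * x_R(j + i)
max_c_n, idx_n = max_with_index(c_n)
max_c_p, idx_p = max_with_index(c_p)
print(max_c_n, idx_n, max_c_p, idx_p)
```

Under these assumptions, the maximum of c p dominates when the right channel lags the left one, and its index matches the delay in samples.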
  • the encoding end device may determine the reference parameter according to the relationship between the maximum value of c n (i) and the maximum value of c p (i), using mode 1A or mode 1B below.
  • the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number.
  • the reference parameter T can be set to 1.
  • the encoding end device may determine that the reference parameter is greater than 0, thereby determining that the search range is [0, T max ]. That is, when the time domain signal #L is acquired before the time domain signal #R, the ITD parameter is a positive number and the search range is [0, T max ] (i.e., an example in which the search range belongs to [0, T max ]).
  • the encoding end device can determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number.
  • the reference parameter T can be set to zero.
  • the encoding end device may determine that the reference parameter is not greater than 0, thereby determining that the search range is [-T max , 0]. That is, when the time domain signal #L is acquired after the time domain signal #R, the ITD parameter is a negative number and the search range is [-T max , 0] (i.e., an example in which the search range belongs to [-T max , 0]).
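Mode 1A can be sketched as a small decision function. The comparison that triggers each branch is not stated in the surviving text, so this sketch assumes that the maximum of c p being at least as large as the maximum of c n indicates that #L was acquired first; that rule and the function name are assumptions.

```python
def search_range_mode_1a(max_c_n, max_c_p, t_max):
    """Return (reference parameter T, search range) for mode 1A.

    Assumption: max(c_n) <= max(c_p) means #L was acquired before #R,
    so the ITD is positive and T is set to 1; otherwise T is set to 0.
    """
    reference = 1 if max_c_n <= max_c_p else 0
    if reference > 0:
        return reference, (0, t_max)   # search only non-negative lags
    return reference, (-t_max, 0)      # search only non-positive lags


print(search_range_mode_1a(0.2, 1.38, 40))  # left channel leads
print(search_range_mode_1a(1.38, 0.2, 40))  # right channel leads
```

Either way the frequency-domain search covers only half of [-T max , T max ], which is where the saving in calculation comes from.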
  • the reference parameter is the index value corresponding to the larger one of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite of that index value.
  • the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number.
  • the reference parameter T can be set to the index value corresponding to the larger cross-correlation processing value.
  • the encoding end device may further determine whether the reference parameter T is greater than or equal to T max /2, and determine a search range according to the determination result. For example, when T ≥ T max /2, the search range is [T max /2, T max ] (i.e., an example in which the search range belongs to [0, T max ]). When T < T max /2, the search range is [0, T max /2] (i.e., another example in which the search range belongs to [0, T max ]).
  • the encoding end device can determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number.
  • the reference parameter T can be set to the opposite of the index value corresponding to the larger cross-correlation processing value.
  • the encoding end device may further determine whether the reference parameter T is less than or equal to -T max /2, and determine a search range according to the determination result. For example, when T ≤ -T max /2, the search range is [-T max , -T max /2] (i.e., an example in which the search range belongs to [-T max , 0]). When T > -T max /2, the search range is [-T max /2, 0] (i.e., another example in which the search range belongs to [-T max , 0]).
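Mode 1B narrows the range further, to a quarter of [-T max , T max ]. The sketch below follows the thresholds described above; which cross-correlation maximum corresponds to which sign is an assumption (the larger c p maximum is taken to mean a positive ITD), and the function name is hypothetical.

```python
def search_range_mode_1b(max_c_n, idx_n, max_c_p, idx_p, t_max):
    """Return (reference parameter T, search range) for mode 1B.

    Assumption: when max(c_n) <= max(c_p), T is the index of the c_p
    maximum (positive ITD); otherwise T is the opposite of the index of
    the c_n maximum (negative ITD).
    """
    half = t_max / 2  # may be fractional when t_max is odd
    if max_c_n <= max_c_p:
        reference = idx_p
        return reference, ((half, t_max) if reference >= half else (0, half))
    reference = -idx_n
    return reference, ((-t_max, -half) if reference <= -half else (-half, 0))


print(search_range_mode_1b(0.2, 1, 1.38, 2, 4))
print(search_range_mode_1b(1.38, 1, 0.2, 2, 4))
```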
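  • determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and determining the reference parameter according to the size relationship between the first index value and the second index value.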
  • the encoding end device can detect the maximum value max(L(j)), j ∈ [0, Length-1], of the amplitude value of the time domain signal #L (represented as L(j)), and record the index value p left corresponding to max(L(j)), where Length represents the total number of sampling points included in the time domain signal #L.
  • the encoding end device can detect the maximum value max(R(j)), j ∈ [0, Length-1], of the amplitude value of the time domain signal #R (represented as R(j)), and record the index value p right corresponding to max(R(j)), where Length represents the total number of sampling points included in the time domain signal #R.
  • the encoding end device can determine the size relationship between p left and p right .
  • the encoding end device can determine that the time domain signal #L is acquired before the time domain signal #R, that is, the ITD parameter between the left and right channels is a positive number.
  • the reference parameter T can be set to 1.
  • the encoding end device may determine that the reference parameter is greater than 0, thereby determining that the search range is [0, T max ]. That is, when the time domain signal #L is acquired before the time domain signal #R, the ITD parameter is a positive number and the search range is [0, T max ] (i.e., an example in which the search range belongs to [0, T max ]).
  • the encoding end device may determine that the time domain signal #L is acquired after the time domain signal #R, that is, the ITD parameter between the left and right channels is a negative number. In this case, the reference parameter T is set to zero.
  • the encoding end device may determine that the reference parameter is not greater than 0, thereby determining that the search range is [-T max , 0]. That is, when the time domain signal #L is acquired after the time domain signal #R, the ITD parameter is a negative number and the search range is [-T max , 0] (i.e., an example in which the search range belongs to [-T max , 0]).
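Mode 2 can be sketched in the same way. The sketch assumes that "amplitude value" means the absolute sample value and that an earlier (or equal) peak index in the left channel means #L was acquired first; both points, and the function names, are assumptions rather than statements from the patent.

```python
def peak_index(signal):
    """Index of the first sample with the largest absolute value."""
    return max(range(len(signal)), key=lambda j: abs(signal[j]))


def search_range_mode_2(x_left, x_right, t_max):
    """Return (reference parameter T, search range) from peak positions.

    Assumption: p_left <= p_right means #L was acquired before #R.
    """
    p_left = peak_index(x_left)
    p_right = peak_index(x_right)
    reference = 1 if p_left <= p_right else 0
    return reference, ((0, t_max) if reference > 0 else (-t_max, 0))


# Toy frame: the right channel is the left channel delayed by 2 samples.
x_left = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, 0.0, 0.0]
x_right = [0.0, 0.0] + x_left[:-2]
print(search_range_mode_2(x_left, x_right, 4))
print(search_range_mode_2(x_right, x_left, 4))
```

Compared with mode 1, this needs only one pass over each signal, at the cost of relying on a single peak per channel.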
  • the encoding end device may perform time-frequency transform processing on the time domain signal #L to obtain a frequency domain signal of the left channel (i.e., an example of the frequency domain signal of the first channel; hereinafter, for ease of understanding and distinction, denoted as frequency domain signal #L).
  • likewise, the time domain signal #R may be subjected to time-frequency transform processing to obtain a frequency domain signal of the right channel (i.e., an example of the frequency domain signal of the second channel; hereinafter, for ease of understanding and distinction, denoted as frequency domain signal #R).
  • the time-frequency transform processing may be performed using a Fast Fourier Transform (FFT) technique based on the following Equation 3.
  • X(k) represents the frequency domain signal and FFT_LENGTH represents the time-frequency transform length.
  • x(n) represents a time domain signal (ie, time domain signal #L or time domain signal #R), and Length represents the total number of sampling points included in the time domain signal.
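The time-frequency transform can be sketched as below. Equation 3 is not reproduced in this text, so the sketch assumes the standard discrete Fourier transform, X(k) = sum over n from 0 to Length-1 of x(n) * exp(-j*2*pi*n*k / FFT_LENGTH), for k from 0 to FFT_LENGTH-1, which amounts to zero-padding the signal to FFT_LENGTH. It evaluates the sum directly for clarity; a real encoder would use a fast algorithm.

```python
import cmath


def dft(x, fft_length):
    """Direct evaluation of the assumed Equation 3.

    x is the time domain signal (Length samples); the result has
    fft_length frequency points. Requires fft_length >= len(x).
    """
    length = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / fft_length) for n in range(length))
        for k in range(fft_length)
    ]


# A unit impulse has a flat spectrum: every frequency point equals 1.
X = dft([1.0, 0.0, 0.0, 0.0], 8)
print([round(abs(v), 6) for v in X])
```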
  • the encoding end device can perform search processing on the frequency domain signal #L and the frequency domain signal #R, within the search range determined as described above, to determine ITD parameters between the left channel and the right channel.
  • the following search processing procedure can be cited as an example:
  • the encoding end device may divide the FFT_LENGTH frequency points of the frequency domain signal into N subband subbands (for example, N subband = 1) according to the preset bandwidth A, where the frequency points b included in the kth subband A k satisfy A k-1 ≤ b ≤ A k -1,
  • the correlation function mag(j) of the frequency domain signal #L is calculated according to the following Equation 4.
  • X L (b) represents the signal value of the frequency domain signal #L at the bth frequency point
  • X R (b) represents the signal value of the frequency domain signal #R at the bth frequency point
  • FFT_LENGTH represents the time frequency conversion length.
  • the range of values of j is the search range determined as described above. For ease of understanding and explanation, the search range is denoted as [a, b].
  • the ITD parameter value of the kth subband is the index value j corresponding to the maximum value of mag(j) within the search range.
  • one or more ITD parameter values (corresponding to the number of sub-bands determined as described above) between the left channel and the right channel can thus be obtained.
  • the encoding end device may further perform quantization processing or the like on the ITD parameter value, and send the processed ITD parameter value, together with the mono signal obtained by processing (for example, down-mixing) the left and right channel signals, to the decoding end device (or receiving device).
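The per-subband search can be sketched as follows. Equation 4 is not reproduced in this text, so the form of mag(j) used here is an assumption: the magnitude of the sum, over the frequency points b of the subband, of conj(X L (b)) * X R (b) * exp(j*2*pi*b*j / FFT_LENGTH). The conjugation and the use of the magnitude are choices made so that a positive result means the left channel leads, matching the sign convention in the text. All function names are hypothetical.

```python
import cmath


def dft(x, fft_length):
    """Direct DFT with implicit zero-padding to fft_length points."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / fft_length) for n in range(len(x)))
        for k in range(fft_length)
    ]


def subband_itd(X_L, X_R, band_start, band_end, search_range, fft_length):
    """Return the lag in [lo, hi] that maximizes the assumed mag(j).

    The subband covers frequency points band_start <= b <= band_end - 1.
    """
    lo, hi = search_range

    def mag(lag):
        acc = sum(
            X_L[b].conjugate() * X_R[b] * cmath.exp(2j * cmath.pi * b * lag / fft_length)
            for b in range(band_start, band_end)
        )
        return abs(acc)

    return max(range(lo, hi + 1), key=mag)


# Toy frame: the right channel is the left channel delayed by 2 samples.
x_left = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, 0.0, 0.0]
x_right = [0.0, 0.0] + x_left[:-2]
FFT_LENGTH = 16
X_L = dft(x_left, FFT_LENGTH)
X_R = dft(x_right, FFT_LENGTH)

# One subband covering the lower half of the spectrum, searched over [0, 4].
itd = subband_itd(X_L, X_R, 0, FFT_LENGTH // 2, (0, 4), FFT_LENGTH)
print(itd)
```

Because the search range was already restricted to [0, T max ] in the time domain, mag(j) is evaluated for about half as many lags as a search over [-T max , T max ].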
  • the decoder device can recover the stereo audio signal based on the mono audio signal and the ITD parameter value.
  • the method further includes: smoothing the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • before quantizing the ITD parameter value, the encoding end device may further smooth the ITD parameter value obtained as described above. As an example and not a limitation, the encoding end device can perform this smoothing according to Equation 5 below:
  • $T_{sm}(k) = w_1 \cdot T_{sm}^{[-1]}(k) + w_2 \cdot T(k)$ (Equation 5)
  • $T_{sm}(k)$ represents the smoothed ITD parameter value corresponding to the k-th frame or the k-th subframe
  • $T_{sm}^{[-1]}$ represents the smoothed ITD parameter value corresponding to the (k-1)-th frame or the (k-1)-th subframe
  • $T(k)$ represents the unsmoothed ITD parameter value corresponding to the k-th frame or the k-th subframe
  • $w_1$ and $w_2$ are smoothing factors; they may be constants or may be set according to the difference between $T_{sm}^{[-1]}$ and $T(k)$, provided that $w_1 + w_2 = 1$
  • when k = 1, $T_{sm}^{[-1]}$ can be a preset value.
  • the foregoing smoothing process may be performed by the encoding end device or by the decoding end device, and the present invention is not particularly limited in this respect. That is, the encoding end device may also send the ITD parameter value obtained as described above directly to the decoding end device without smoothing it, and the decoding end device then smooths the ITD parameter value. The method and process of the smoothing performed by the decoding end device may be similar to those of the smoothing performed by the encoding end device described above; a detailed description is omitted here to avoid redundancy.
  • according to the method for determining an inter-channel time difference parameter of the embodiment of the present invention, a reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain, a search range is determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference ITD parameter corresponding to the first channel and the second channel. The search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the search range [-Tmax, Tmax] in the prior art; this reduces the search computation for the ITD parameter, lowers the performance requirement on the encoding end, and improves the processing efficiency of the encoding end.
  • FIG. 5 shows a schematic block diagram of an apparatus 200 for determining an inter-channel time difference parameter in accordance with an embodiment of the present invention. As shown in FIG. 5, the apparatus 200 includes:
  • the determining unit 210 is configured to determine a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, where the reference parameter corresponds to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel, and the two time domain signals correspond to the same time period; and to determine a search range according to the reference parameter and the limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax];
  • the processing unit 220 is configured to perform search processing, based on the frequency domain signal of the first channel and the frequency domain signal of the second channel and according to the reference parameter, to determine the first inter-channel time difference ITD parameter corresponding to the first channel and the second channel.
  • the determining unit 210 is specifically configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, and to determine the reference parameter according to the magnitude relationship between the two values, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel.
  • the determining unit 210 is specifically configured to determine, as the reference parameter, the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
  • the determining unit 210 is specifically configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, and to determine the reference parameter according to the magnitude relationship between the two index values, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range.
  • the processing unit 220 is further configured to smooth the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • the apparatus 200 for determining the inter-channel time difference parameter according to the embodiment of the present invention may correspond to the encoding end device in the method of the embodiment of the present invention, and the units and modules of the apparatus 200 and the other operations and/or functions described above serve to implement the corresponding processes of the method 100 in FIG. 1; for brevity, they are not described again here.
  • according to the apparatus for determining an inter-channel time difference parameter of the embodiment of the present invention, a reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain, a search range is determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference ITD parameter corresponding to the first channel and the second channel. The search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the search range [-Tmax, Tmax] in the prior art; this reduces the search computation for the ITD parameter, lowers the performance requirement on the encoding end, and improves the processing efficiency of the encoding end.
  • the method of determining an inter-channel time difference parameter according to an embodiment of the present invention has been described in detail above with reference to FIGS. 1 through 4. A device for determining an inter-channel time difference parameter according to an embodiment of the present invention is described in detail below with reference to FIG. 6.
  • FIG. 6 shows a schematic block diagram of an apparatus 300 for determining an inter-channel time difference parameter in accordance with an embodiment of the present invention.
  • the device 300 can include: a bus 310; a processor 320 connected to the bus; and a memory 330 connected to the bus
  • the processor 320 calls, through the bus 310, the program stored in the memory 330, and is configured to determine a reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel, where the reference parameter corresponds to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel, and the two time domain signals correspond to the same time period;
  • to determine a search range according to the reference parameter and the limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax];
  • and to perform search processing within the search range, based on the frequency domain signal of the first channel and the frequency domain signal of the second channel, to determine the first inter-channel time difference ITD parameter corresponding to the first channel and the second channel.
  • the processor 320 is specifically configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and to determine the reference parameter according to the magnitude relationship between the two values.
  • the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
  • the processor 320 is specifically configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
  • the processor 320 is further configured to smooth the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
  • bus 310 includes a power bus, a control bus, and a status signal bus in addition to the data bus.
  • various buses are labeled as bus 310 in the figure.
  • the processor 320 can implement or perform the steps and logic blocks disclosed in the method embodiments of the present invention.
  • Processor 320 can be a microprocessor or the processor can be any conventional processor, decoder or the like.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 330, and the processor reads the information in the memory 330 and performs the steps of the above method in combination with its hardware.
  • the processor 320 may be a central processing unit ("CPU"), and the processor 320 may also be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 330 can include read only memory and random access memory and provides instructions and data to the processor 320. A portion of the memory 330 may also include a non-volatile random access memory. For example, the memory 330 can also store information of the device type.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 320 or an instruction in a form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented as a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the device 300 for determining the inter-channel time difference parameter according to the embodiment of the present invention may correspond to the encoding end device in the method of the embodiment of the present invention, and the units and modules of the device 300 and the other operations and/or functions described above serve to implement the corresponding processes of the method 100 in FIG. 1; for brevity, they are not described again here.
  • according to the device for determining an inter-channel time difference parameter of the embodiment of the present invention, a reference parameter corresponding to the acquisition order between the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain, a search range is determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference ITD parameter corresponding to the first channel and the second channel. The search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the search range [-Tmax, Tmax] in the prior art; this reduces the search computation for the ITD parameter, lowers the performance requirement on the encoding end, and improves the processing efficiency of the encoding end.
  • the magnitude of the sequence numbers of the above processes does not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
  • the disclosed systems, devices, and methods can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in an actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Stereophonic System (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

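A method and apparatus for determining an inter-channel time difference parameter are provided, which can reduce the computational cost of searching for the inter-channel time difference parameter during stereo encoding. The method includes: determining a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the time domain signal of the first channel and the time domain signal of the second channel are acquired, and the two time domain signals correspond to the same time period (S110); determining a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax] (S120); and performing search processing within the search range, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel (S130).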

Description

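Method and Apparatus for Determining an Inter-Channel Time Difference Parameter
This application claims priority to Chinese Patent Application No. 201510101315.X, filed with the Chinese Patent Office on March 9, 2015 and entitled "Method and Apparatus for Determining an Inter-Channel Time Difference Parameter", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of audio processing, and more specifically, to a method and apparatus for determining an inter-channel time difference parameter.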
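Background
As quality of life improves, demand for high-quality audio keeps growing. Compared with mono audio, stereo audio conveys the direction and spatial distribution of each sound source and improves the clarity and intelligibility of the information, and is therefore widely favored.
A transmission technique for stereo audio signals is currently known in which the encoding end converts the stereo signal into a mono audio signal and parameters such as the inter-channel time difference (ITD), encodes them separately, and transmits them to the decoding end. After obtaining the mono audio signal, the decoding end restores the stereo signal according to the ITD and other parameters, so that the stereo signal can be transmitted at a low bit rate with high quality.
In the above technique, the encoding end can determine, based on the sampling rate of the time domain signal of the mono audio, the limit value Tmax of the ITD parameter at that sampling rate, and can then search, subband by subband, within the range [-Tmax, Tmax] based on the frequency domain signal to obtain the ITD parameter.
However, this large search range makes the prior-art process of determining the ITD parameter in the frequency domain computationally expensive, which raises the performance requirements on the encoding end and reduces processing efficiency.
A technique is therefore desired that reduces the computation of the ITD parameter search while preserving the accuracy of the ITD parameter.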
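Summary
Embodiments of the present invention provide a method and apparatus for determining an inter-channel time difference parameter, which can reduce the computational cost of the inter-channel time difference parameter search during stereo encoding.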
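In a first aspect, a method for determining an inter-channel time difference parameter is provided. The method includes: determining a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the time domain signal of the first channel and the time domain signal of the second channel are acquired, and the two time domain signals correspond to the same time period; determining a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax]; and performing search processing within the search range, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.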
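With reference to the first aspect, in a first implementation of the first aspect, determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing cross-correlation processing on the two time domain signals to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
With reference to the first aspect and the foregoing implementation, in a second implementation of the first aspect, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
With reference to the first aspect and the foregoing implementations, in a third implementation of the first aspect, determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes: performing peak detection processing on the two time domain signals to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
With reference to the first aspect and the foregoing implementations, in a fourth implementation of the first aspect, the method further includes: smoothing the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.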
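In a second aspect, an apparatus for determining an inter-channel time difference parameter is provided. The apparatus includes: a determining unit, configured to determine a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the two time domain signals are acquired and the two time domain signals correspond to the same time period, and to determine a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax]; and a processing unit, configured to perform search processing, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel and according to the reference parameter, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.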
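With reference to the second aspect, in a first implementation of the second aspect, the determining unit is specifically configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, and to determine the reference parameter according to the magnitude relationship between the two values, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel.
With reference to the second aspect and the foregoing implementation, in a second implementation of the second aspect, the determining unit is specifically configured to determine, as the reference parameter, the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
With reference to the second aspect and the foregoing implementations, in a third implementation of the second aspect, the determining unit is specifically configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, and to determine the reference parameter according to the magnitude relationship between the two index values, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range.
With reference to the second aspect and the foregoing implementations, in a fourth implementation of the second aspect, the processing unit is further configured to smooth the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.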
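According to the method and apparatus for determining an inter-channel time difference parameter of the embodiments of the present invention, a reference parameter corresponding to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain; a search range is then determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel. Because the search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the prior-art search range [-Tmax, Tmax], the computation required to search for the ITD parameter is reduced, which lowers the performance requirements on the encoding end and improves its processing efficiency.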
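Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required by the embodiments are briefly introduced below. The drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for determining an inter-channel time difference parameter according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a search range determination process according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a search range determination process according to another embodiment of the present invention.
FIG. 4 is a schematic diagram of a search range determination process according to yet another embodiment of the present invention.
FIG. 5 is a schematic block diagram of an apparatus for determining an inter-channel time difference parameter according to an embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a device for determining an inter-channel time difference parameter according to an embodiment of the present invention.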
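Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.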
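FIG. 1 is a schematic flowchart of a method 100 for determining an inter-channel time difference parameter according to an embodiment of the present invention. The method 100 may be performed by an encoding end device (also called a sending end device) that transmits audio signals. As shown in FIG. 1, the method 100 includes:
S110: determine a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the time domain signal of the first channel and the time domain signal of the second channel are acquired, and the two time domain signals correspond to the same time period;
S120: determine a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax];
S130: perform search processing within the search range, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.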
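The method 100 may be applied to an audio system having at least two channels. In such a system, a stereo signal is synthesized from the mono signals of at least two channels (that is, including a first channel and a second channel), for example, from the mono signal of the left channel (an example of the first channel) and the mono signal of the right channel (an example of the second channel).
Parametric stereo (PS) is one example of a method for transmitting the stereo signal. Based on spatial perception characteristics, the encoding end converts the stereo signal into a mono signal and spatial perception parameters and encodes them separately; after obtaining the mono audio, the decoding end restores the stereo signal according to the spatial parameters. This technique allows stereo signals to be transmitted at a low bit rate with high quality. The inter-channel time difference (ITD) parameter is a spatial parameter that indicates the horizontal direction of the sound source and is an important component of the spatial parameters. The embodiments of the present invention mainly concern the process of determining this ITD parameter. In addition, the process of encoding and decoding the stereo signal and the mono signal according to the ITD parameter is similar to the prior art, and a detailed description is omitted here to avoid redundancy.
It should be understood that the number of channels listed above is merely an example and the present invention is not limited to it. For example, the audio system may have three or more channels, and a stereo signal can be synthesized from the mono signals of any two of them. For ease of understanding, the following describes the processing when the method 100 is applied to an audio system having two channels (a left channel and a right channel); for ease of distinction, the left channel is taken as the first channel and the right channel as the second channel.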
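Specifically, in S110, the encoding end device may acquire the audio signal corresponding to the left channel through an audio input device such as a microphone corresponding to the left channel, and sample that audio signal at a preset sampling rate α (an example of the sampling rate of the time domain signal of the first channel) to generate the time domain signal of the left channel (an example of the time domain signal of the first channel; hereinafter denoted time domain signal #L for ease of understanding and distinction). The process of acquiring time domain signal #L may be similar to the prior art, and a detailed description is omitted here to avoid redundancy.
In the embodiments of the present invention, the sampling rate of the time domain signal of the first channel is the same as that of the time domain signal of the second channel. Similarly, therefore, the encoding end device may acquire the audio signal corresponding to the right channel through an audio input device such as a microphone corresponding to the right channel, and sample that audio signal at the sampling rate α to generate the time domain signal of the right channel (an example of the time domain signal of the second channel; hereinafter denoted time domain signal #R).
It should be noted that time domain signal #L and time domain signal #R correspond to the same time period (in other words, they are acquired within the same time period). For example, time domain signal #L and time domain signal #R may correspond to the same frame (that is, 20 ms), in which case one ITD parameter corresponding to that frame can be obtained from them.
As another example, time domain signal #L and time domain signal #R may correspond to the same subframe (for example, 10 ms or 5 ms) within the same frame, in which case multiple ITD parameters corresponding to that frame can be obtained. For example, if the subframe is 10 ms, two ITD parameters can be obtained from one frame (20 ms); if the subframe is 5 ms, four ITD parameters can be obtained from one frame (20 ms).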
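The frame and subframe arithmetic can be sketched as follows. This is only an illustration: the helper name `itd_params_per_frame` is hypothetical, and the 20 ms frame with 10 ms or 5 ms subframes are the example values given in the text, not fixed requirements.

```python
def itd_params_per_frame(frame_ms: int, segment_ms: int) -> int:
    """Number of ITD parameters obtained from one frame when one ITD
    parameter is computed per segment (the whole frame or a subframe)."""
    if segment_ms <= 0 or frame_ms % segment_ms != 0:
        raise ValueError("segment length must evenly divide the frame length")
    return frame_ms // segment_ms


# One parameter per 20 ms frame, two per 10 ms subframe, four per 5 ms subframe.
counts = {seg: itd_params_per_frame(20, seg) for seg in (20, 10, 5)}
print(counts)  # {20: 1, 10: 2, 5: 4}
```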
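It should be understood that the lengths of the time period listed above are merely examples and the present invention is not limited to them; the length of the time period may be changed as needed.
The encoding end device may then determine a reference parameter according to time domain signal #L and time domain signal #R. The reference parameter may correspond to the order in which time domain signal #L and time domain signal #R are acquired (for example, the order in which they are input to the audio input devices). This correspondence is described in detail below together with the process of determining the reference parameter.
In the embodiments of the present invention, the reference parameter may be determined by performing cross-correlation processing on time domain signal #L and time domain signal #R (Mode 1), or by searching for the maximum amplitude of time domain signal #L and time domain signal #R (Mode 2). Mode 1 and Mode 2 are described in detail below.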
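Mode 1
Optionally, determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes:
performing cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel; and
determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.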
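Specifically, in the embodiments of the present invention, the encoding end device may determine the cross-correlation function cn(i) of time domain signal #L relative to time domain signal #R according to Equation 1 below:
Figure PCTCN2015095097-appb-000001
  Equation 1
Here, Tmax denotes the limit value of the ITD parameter (in other words, the maximum acquisition time difference between time domain signal #L and time domain signal #R). It can be determined from the sampling rate α, and the determination method may be similar to the prior art; a detailed description is omitted here to avoid redundancy. xR(j) denotes the signal value of time domain signal #R at the j-th sampling point, xL(j+i) denotes the signal value of time domain signal #L at the (j+i)-th sampling point, and Length denotes the total number of sampling points in time domain signal #R, that is, the length of time domain signal #R, which may be, for example, the length of one frame (20 ms) or of one subframe (for example, 10 ms or 5 ms).
In addition, the encoding end device may determine the maximum value of the cross-correlation function cn(i):
Figure PCTCN2015095097-appb-000002
Similarly, the encoding end device may determine the cross-correlation function cp(i) of time domain signal #R relative to time domain signal #L according to Equation 2 below:
Figure PCTCN2015095097-appb-000003
  Equation 2
In addition, the encoding end device may determine the maximum value of the cross-correlation function cp(i):
Figure PCTCN2015095097-appb-000004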
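The equation images are not reproduced in this text. One plausible form of the two cross-correlation functions and their maxima, consistent with the symbol definitions given for xR(j), xL(j+i), Length, and Tmax, is sketched below. The summation limits and the lag range $i \in [0, T_{max}]$ are assumptions, not a transcription of the original figures.

```latex
c_n(i) = \sum_{j=0}^{Length-1-i} x_R(j)\, x_L(j+i), \qquad i \in [0, T_{max}]

c_p(i) = \sum_{j=0}^{Length-1-i} x_L(j)\, x_R(j+i), \qquad i \in [0, T_{max}]

\max_{0 \le i \le T_{max}} c_n(i), \qquad \max_{0 \le i \le T_{max}} c_p(i)
```

Under this form, $c_n(i)$ peaks when #L is a delayed copy of #R, and $c_p(i)$ peaks when #R is a delayed copy of #L.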
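In the embodiments of the present invention, based on the relationship between the two maxima
Figure PCTCN2015095097-appb-000005
Figure PCTCN2015095097-appb-000006
the encoding end device may determine the value of the reference parameter in Mode 1A or Mode 1B below.
Mode 1A
As shown in FIG. 2, if
Figure PCTCN2015095097-appb-000007
the encoding end device may determine that time domain signal #L was acquired before time domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to 1.
Accordingly, in the determination of S120, the encoding end device may determine that the reference parameter is greater than 0 and therefore set the search range to [0, Tmax]. That is, when time domain signal #L is acquired before time domain signal #R, the ITD parameter is positive and the search range is [0, Tmax] (an example of a search range belonging to [0, Tmax]).
Alternatively, if
Figure PCTCN2015095097-appb-000008
the encoding end device may determine that time domain signal #L was acquired after time domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to 0.
Accordingly, in the determination of S120, the encoding end device may determine that the reference parameter is not greater than 0 and therefore set the search range to [-Tmax, 0]. That is, when time domain signal #L is acquired after time domain signal #R, the ITD parameter is negative and the search range is [-Tmax, 0] (an example of a search range belonging to [-Tmax, 0]).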
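A minimal Python sketch of the Mode 1A decision follows. The function names are hypothetical. The summation limits and the direction of the comparison (the positive half-range is chosen when the maximum of cn does not exceed the maximum of cp, ties included) are assumptions, because the condition images are missing from this text.

```python
from typing import List, Sequence, Tuple


def xcorr(ref: Sequence[float], other: Sequence[float], t_max: int) -> List[float]:
    """c(i) = sum over j of ref[j] * other[j + i], for i = 0..t_max."""
    n = min(len(ref), len(other))
    return [
        sum(ref[j] * other[j + i] for j in range(max(n - i, 0)))
        for i in range(t_max + 1)
    ]


def search_range_mode_1a(
    x_l: Sequence[float], x_r: Sequence[float], t_max: int
) -> Tuple[int, Tuple[int, int]]:
    """Return (reference parameter T, search range) for one frame."""
    c_n = xcorr(x_r, x_l, t_max)  # #L relative to #R: x_R(j) * x_L(j + i)
    c_p = xcorr(x_l, x_r, t_max)  # #R relative to #L: x_L(j) * x_R(j + i)
    if max(c_n) <= max(c_p):      # assumed direction: #L acquired first
        return 1, (0, t_max)
    return 0, (-t_max, 0)


# Toy frame: the same pulse reaches the left channel 3 samples earlier.
left = [0, 0, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 0, 0]
print(search_range_mode_1a(left, right, 4))  # (1, (0, 4))
```

Only the sign of the time difference is decided here; the ITD value itself is still found by the frequency-domain search in S130.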
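Mode 1B
Optionally, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
Specifically, as shown in FIG. 3, if
Figure PCTCN2015095097-appb-000009
the encoding end device may determine that time domain signal #L was acquired before time domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to the index value corresponding to
Figure PCTCN2015095097-appb-000010
.
Accordingly, in the subsequent determination, after determining that the reference parameter T is greater than 0, the encoding end device may further determine whether T is greater than or equal to Tmax/2 and determine the search range from the result. For example, when T ≥ Tmax/2, the search range is [Tmax/2, Tmax] (an example of a search range belonging to [0, Tmax]); when T < Tmax/2, the search range is [0, Tmax/2] (another example of a search range belonging to [0, Tmax]).
Alternatively, if
Figure PCTCN2015095097-appb-000011
the encoding end device may determine that time domain signal #L was acquired after time domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to the negative of the index value corresponding to
Figure PCTCN2015095097-appb-000012
.
Accordingly, in the determination of S120, after determining that the reference parameter T is less than or equal to 0, the encoding end device may further determine whether T is less than or equal to -Tmax/2 and determine the search range from the result. For example, when T ≤ -Tmax/2, the search range is [-Tmax, -Tmax/2] (an example of a search range belonging to [-Tmax, 0]); when T > -Tmax/2, the search range is [-Tmax/2, 0] (another example of a search range belonging to [-Tmax, 0]).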
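The quarter-range selection of Mode 1B can be sketched as below. The function names are hypothetical. The direction of the comparison between the two correlation maxima and the rounding of Tmax/2 for odd Tmax are assumptions; the thresholds on T follow the text.

```python
from typing import Sequence, Tuple


def reference_mode_1b(c_n: Sequence[float], c_p: Sequence[float]) -> int:
    """Signed lag index of the larger cross-correlation peak."""
    if max(c_n) <= max(c_p):          # assumed direction, as in Mode 1A
        return list(c_p).index(max(c_p))
    return -list(c_n).index(max(c_n))


def search_range_mode_1b(t_ref: int, t_max: int) -> Tuple[int, int]:
    """Pick one quarter of [-t_max, t_max] from the reference parameter."""
    half = t_max // 2                 # rounding for odd t_max is an assumption
    if t_ref > 0:
        return (half, t_max) if 2 * t_ref >= t_max else (0, half)
    return (-t_max, -half) if 2 * t_ref <= -t_max else (-half, 0)


c_n = [0.0] * 9
c_p = [0.0, 1.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
t_ref = reference_mode_1b(c_n, c_p)
print(t_ref, search_range_mode_1b(t_ref, 8))  # 3 (0, 4)
```

Compared with Mode 1A, this halves the search range once more, at the cost of trusting the time-domain peak position as well as its sign.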
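Mode 2
Optionally, determining the reference parameter according to the time domain signal of the first channel and the time domain signal of the second channel includes:
performing peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range; and
determining the reference parameter according to the magnitude relationship between the first index value and the second index value.
Specifically, in the embodiments of the present invention, the encoding end device may detect the maximum max(L(j)), j ∈ [0, Length-1], of the amplitude values of time domain signal #L (denoted L(j)) and record the index value pleft corresponding to max(L(j)), where Length denotes the total number of sampling points in time domain signal #L.
Likewise, the encoding end device may detect the maximum max(R(j)), j ∈ [0, Length-1], of the amplitude values of time domain signal #R (denoted R(j)) and record the index value pright corresponding to max(R(j)), where Length denotes the total number of sampling points in time domain signal #R.
The encoding end device may then determine the magnitude relationship between pleft and pright.
As shown in FIG. 4, if pleft ≥ pright, the encoding end device may determine that time domain signal #L was acquired before time domain signal #R, that is, the ITD parameter between the left and right channels is positive. In this case, the reference parameter T may be set to 1.
Accordingly, in the determination of S120, the encoding end device may determine that the reference parameter is greater than 0 and therefore set the search range to [0, Tmax]. That is, when time domain signal #L is acquired before time domain signal #R, the ITD parameter is positive and the search range is [0, Tmax] (an example of a search range belonging to [0, Tmax]).
Alternatively, if pleft < pright, the encoding end device may determine that time domain signal #L was acquired after time domain signal #R, that is, the ITD parameter between the left and right channels is negative. In this case, the reference parameter T may be set to 0.
Accordingly, in the determination of S120, the encoding end device may determine that the reference parameter is not greater than 0 and therefore set the search range to [-Tmax, 0]. That is, when time domain signal #L is acquired after time domain signal #R, the ITD parameter is negative and the search range is [-Tmax, 0] (an example of a search range belonging to [-Tmax, 0]).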
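The peak-detection variant can be sketched in a few lines. The function names are hypothetical, and taking the absolute value as the "amplitude" is an assumption. The comparison pleft ≥ pright is used exactly as the text states it.

```python
from typing import Sequence, Tuple


def peak_index(x: Sequence[float]) -> int:
    """Index of the sample with the largest amplitude (absolute value assumed);
    the first such index is returned when several samples tie."""
    return max(range(len(x)), key=lambda j: abs(x[j]))


def search_range_mode_2(
    x_l: Sequence[float], x_r: Sequence[float], t_max: int
) -> Tuple[int, Tuple[int, int]]:
    """Return (reference parameter T, search range) from the two peak positions."""
    p_left, p_right = peak_index(x_l), peak_index(x_r)
    if p_left >= p_right:             # comparison as stated in the text
        return 1, (0, t_max)
    return 0, (-t_max, 0)


print(search_range_mode_2([0, 0, 1, 0], [0, 3, 0, 0], 4))  # (1, (0, 4))
```

This costs a single pass over each signal rather than a correlation over Tmax + 1 lags, but it relies on both channels having a single dominant peak from the same sound event.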
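In S130, the encoding end device may perform time-frequency transform processing on time domain signal #L to obtain the frequency domain signal of the left channel (an example of the frequency domain signal of the first channel; hereinafter denoted frequency domain signal #L for ease of understanding and distinction). It may likewise perform time-frequency transform processing on time domain signal #R to obtain the frequency domain signal of the right channel (an example of the frequency domain signal of the second channel; hereinafter denoted frequency domain signal #R).
For example, in the embodiments of the present invention, the time-frequency transform may be performed using the Fast Fourier Transformation (FFT) technique based on Equation 3 below.
Figure PCTCN2015095097-appb-000013
  Equation 3
Here, X(k) denotes the frequency domain signal and FFT_LENGTH denotes the time-frequency transform length. x(n) denotes the time domain signal (that is, time domain signal #L or time domain signal #R), and Length denotes the total number of sampling points in the time domain signal.
It should be understood that the time-frequency transform process described above is merely an example and the present invention is not limited to it. The method and process of the time-frequency transform may be similar to the prior art; for example, a technique such as the Modified Discrete Cosine Transform (MDCT) may also be used.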
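The image for Equation 3 is not reproduced in this text. Assuming it is the standard discrete Fourier transform of a frame of Length samples evaluated at FFT_LENGTH frequency points (with implicit zero-padding when Length is smaller than FFT_LENGTH), it would read as follows. The exact summation limit is an assumption.

```latex
X(k) = \sum_{n=0}^{Length-1} x(n)\, e^{-j \frac{2\pi n k}{FFT\_LENGTH}}, \qquad 0 \le k < FFT\_LENGTH
```

Here $j$ in the exponent is the imaginary unit, not the search index used for mag(j).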
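The encoding end device may then perform search processing on frequency domain signal #L and frequency domain signal #R, determined as described above, within the search range determined as described above, to determine the ITD parameter between the left channel and the right channel. For example, the following search procedure may be used:
First, the encoding end device may divide the FFT_LENGTH frequency bins of the frequency domain signal into Nsubband (for example, 1) subbands according to a preset bandwidth A, where the k-th subband Ak contains the frequency bins b satisfying $A_{k-1} \le b \le A_k - 1$.
Within the above search range, the correlation function mag(j) of frequency domain signal #L is calculated according to Equation 4 below:
Figure PCTCN2015095097-appb-000014
  Equation 4
Here, XL(b) denotes the signal value of frequency domain signal #L at the b-th frequency bin, XR(b) denotes the signal value of frequency domain signal #R at the b-th frequency bin, and FFT_LENGTH denotes the time-frequency transform length. The range of j is the search range determined as described above; for ease of understanding and description, this search range is denoted [a, b].
The ITD parameter value of the k-th subband is then
Figure PCTCN2015095097-appb-000015
that is, the index value corresponding to the maximum value of mag(j).
One or more ITD parameter values between the left channel and the right channel (one per subband determined as described above) can thereby be obtained.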
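The restricted frequency-domain search can be sketched as follows. The image for Equation 4 is missing, so the correlation used here is an assumption: the real part of conj(XL(b)) · XR(b) · exp(2πi·b·j/FFT_LENGTH), summed over the bins of one subband and chosen so that a positive j corresponds to #L leading #R. The function names are hypothetical, and a direct DFT stands in for an FFT to keep the sketch self-contained.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float], fft_length: int) -> List[complex]:
    """Direct O(N^2) DFT, used instead of an FFT to keep the sketch short."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / fft_length)
            for n in range(len(x)))
        for k in range(fft_length)
    ]


def subband_itd(X_l: Sequence[complex], X_r: Sequence[complex],
                band_lo: int, band_hi: int,
                lo: int, hi: int, fft_length: int) -> int:
    """Lag j in [lo, hi] that maximizes mag(j) over bins band_lo..band_hi-1."""
    def mag(j: int) -> float:
        return sum(
            (X_l[b].conjugate() * X_r[b]
             * cmath.exp(2j * cmath.pi * b * j / fft_length)).real
            for b in range(band_lo, band_hi)
        )
    return max(range(lo, hi + 1), key=mag)


# Toy frame: the right channel is the left channel delayed by 3 samples.
N = 16
left = [0.0] * N
right = [0.0] * N
left[2], right[5] = 1.0, 1.0
X_l, X_r = dft(left, N), dft(right, N)
# A single subband covering all bins, searched only over [0, 8].
print(subband_itd(X_l, X_r, 0, N, 0, 8, N))  # 3
```

With several subbands, the same function would be called once per pair of band edges, producing one ITD value per subband.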
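The encoding end device may then perform quantization or similar processing on the ITD parameter value, and send the processed ITD parameter value, together with the mono signal obtained by processing the left and right channel signals (for example, by down-mixing), to the decoding end device (or receiving end device).
The decoding end device may restore the stereo audio signal according to the mono audio signal and the ITD parameter value.
Optionally, the method further includes:
smoothing the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
Specifically, in the embodiments of the present invention, before quantizing the ITD parameter value, the encoding end device may also smooth the ITD parameter value obtained as described above. By way of example and not limitation, the encoding end device may perform the smoothing according to Equation 5 below:
$T_{sm}(k) = w_1 \cdot T_{sm}^{[-1]}(k) + w_2 \cdot T(k)$  Equation 5
Here, Tsm(k) denotes the smoothed ITD parameter value corresponding to the k-th frame or the k-th subframe, Tsm[-1] denotes the smoothed ITD parameter value corresponding to the (k-1)-th frame or the (k-1)-th subframe, T(k) denotes the unsmoothed ITD parameter value corresponding to the k-th frame or the k-th subframe, and w1 and w2 are smoothing factors. w1 and w2 may be set to constants, or may be set according to the difference between Tsm[-1] and T(k), provided that w1 + w2 = 1. In addition, when k = 1, Tsm[-1] may be a preset value.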
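The recursive smoothing of Equation 5 can be sketched as below, using constant smoothing factors. The function name `smooth_itd`, the default w1 = 0.75, and the initial value 0.0 are hypothetical choices for illustration; the text only requires w1 + w2 = 1 and a preset value for the first frame.

```python
from typing import List, Sequence


def smooth_itd(raw: Sequence[float], w1: float = 0.75,
               initial: float = 0.0) -> List[float]:
    """Apply T_sm(k) = w1 * T_sm(k-1) + w2 * T(k) with w2 = 1 - w1.

    `initial` plays the role of the preset previous smoothed value
    that is used for the first frame or subframe.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w1 + w2 = 1")
    w2 = 1.0 - w1
    smoothed: List[float] = []
    prev = initial
    for t in raw:
        prev = w1 * prev + w2 * t
        smoothed.append(prev)
    return smoothed


print(smooth_itd([4, 4, 8], w1=0.5))  # [2.0, 3.0, 5.5]
```

A larger w1 gives a steadier ITD track that reacts more slowly to a moving source. Setting the factors from the difference between the previous smoothed value and T(k), which the text also allows, would replace the constant `w1` with a per-frame value.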
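It should be noted that, in the method for determining an inter-channel time difference parameter of the embodiments of the present invention, the smoothing may be performed by the encoding end device or by the decoding end device, and the present invention is not particularly limited in this respect. That is, the encoding end device may also send the ITD parameter value obtained as described above directly to the decoding end device without smoothing it, and the decoding end device then smooths the ITD parameter value. The method and process of the smoothing performed by the decoding end device may be similar to those of the smoothing performed by the encoding end device described above; a detailed description is omitted here to avoid redundancy.
According to the method for determining an inter-channel time difference parameter of the embodiments of the present invention, a reference parameter corresponding to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain; a search range is then determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel. Because the search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the prior-art search range [-Tmax, Tmax], the computation required to search for the ITD parameter is reduced, which lowers the performance requirements on the encoding end and improves its processing efficiency.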
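As a rough illustration of the saving, the number of candidate lags per subband can be counted for each search range. This assumes the search visits every integer lag in the range; the value Tmax = 40 and the helper name `candidates` are hypothetical.

```python
def candidates(lo: int, hi: int) -> int:
    """Number of integer lags in the closed range [lo, hi]."""
    return hi - lo + 1


t_max = 40  # hypothetical limit value, for illustration only
full = candidates(-t_max, t_max)          # prior-art range [-Tmax, Tmax]
half = candidates(0, t_max)               # [0, Tmax] or [-Tmax, 0]
quarter = candidates(t_max // 2, t_max)   # Mode 1B range [Tmax/2, Tmax]
print(full, half, quarter)  # 81 41 21
```

The time-domain step that selects the range adds its own cost, so the net saving depends on how that step compares with the frequency-domain evaluations it removes.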
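The method for determining an inter-channel time difference parameter according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 4. An apparatus for determining an inter-channel time difference parameter according to the embodiments of the present invention is described in detail below with reference to FIG. 5.
FIG. 5 is a schematic block diagram of an apparatus 200 for determining an inter-channel time difference parameter according to an embodiment of the present invention. As shown in FIG. 5, the apparatus 200 includes:
a determining unit 210, configured to determine a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the two time domain signals are acquired and the two time domain signals correspond to the same time period, and to determine a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax]; and
a processing unit 220, configured to perform search processing, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel and according to the reference parameter, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.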
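Optionally, the determining unit 210 is specifically configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, and to determine the reference parameter according to the magnitude relationship between the two values, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel.
Optionally, the determining unit 210 is specifically configured to determine, as the reference parameter, the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
Optionally, the determining unit 210 is specifically configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, and to determine the reference parameter according to the magnitude relationship between the two index values, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range.
Optionally, the processing unit 220 is further configured to smooth the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.
The apparatus 200 for determining an inter-channel time difference parameter according to the embodiments of the present invention, as the entity that carries out the method 100, may correspond to the encoding end device in the method of the embodiments. The units and modules of the apparatus 200 and the other operations and/or functions described above serve to implement the corresponding procedures of the method 100 in FIG. 1; for brevity, details are not repeated here.
According to the apparatus for determining an inter-channel time difference parameter of the embodiments of the present invention, a reference parameter corresponding to the acquisition order of the time domain signal of the first channel and the time domain signal of the second channel is determined in the time domain; a search range is then determined based on the reference parameter, and search processing on the frequency domain signal of the first channel and the frequency domain signal of the second channel is performed in the frequency domain within that search range to determine the inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel. Because the search range determined according to the reference parameter belongs to [-Tmax, 0] or [0, Tmax], which is smaller than the prior-art search range [-Tmax, Tmax], the computation required to search for the ITD parameter is reduced, which lowers the performance requirements on the encoding end and improves its processing efficiency.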
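The method for determining an inter-channel time difference parameter according to the embodiments of the present invention has been described in detail above with reference to FIG. 1 to FIG. 4. A device for determining an inter-channel time difference parameter according to the embodiments of the present invention is described in detail below with reference to FIG. 6.
FIG. 6 is a schematic block diagram of a device 300 for determining an inter-channel time difference parameter according to an embodiment of the present invention. As shown in FIG. 6, the device 300 may include:
a bus 310;
a processor 320 connected to the bus; and
a memory 330 connected to the bus;
where the processor 320 calls, through the bus 310, a program stored in the memory 330, so as to determine a reference parameter according to a time domain signal of a first channel and a time domain signal of a second channel, where the reference parameter corresponds to the order in which the two time domain signals are acquired, and the two time domain signals correspond to the same time period;
to determine a search range according to the reference parameter and a limit value Tmax, where the limit value Tmax is determined according to the sampling rate of the time domain signal of the first channel, and the search range belongs to [-Tmax, 0] or to [0, Tmax]; and
to perform search processing within the search range, based on a frequency domain signal of the first channel and a frequency domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.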
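Optionally, the processor 320 is specifically configured to perform cross-correlation processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, where the first cross-correlation processing value is the maximum value, within a preset range, of the cross-correlation function of the time domain signal of the first channel relative to the time domain signal of the second channel, and the second cross-correlation processing value is the maximum value, within the preset range, of the cross-correlation function of the time domain signal of the second channel relative to the time domain signal of the first channel;
and to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.
Optionally, the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the negative of that index value.
Optionally, the processor 320 is specifically configured to perform peak detection processing on the time domain signal of the first channel and the time domain signal of the second channel to determine a first index value and a second index value, where the first index value is the index value corresponding to the maximum amplitude value of the time domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time domain signal of the second channel within the preset range;
and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value.
Optionally, the processor 320 is further configured to smooth the first ITD parameter based on a second ITD parameter, where the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is a smoothed value of the ITD parameter of a second time period, and the second time period is before the first time period.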
在本发明实施例中,设备300的各个组件通过总线310耦合在一起,其中,总线310除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚明起见,在图中将各种总线都标为总线310。
处理器320可以实现或者执行本发明方法实施例中的公开的各步骤及逻辑框图。处理器320可以是微处理器或者该处理器也可以是任何常规的处理器,解码器等。结合本发明实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用解码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器330,处理器读取存储器330中的信息,结合其硬件完成上述方法的步骤。
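It should be understood that, in this embodiment of the present invention, the processor 320 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.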
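The memory 330 may include a read-only memory and a random access memory, and provides instructions and data to the processor 320. A part of the memory 330 may further include a non-volatile random access memory. For example, the memory 330 may further store device type information.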
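In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 320 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.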
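The device 300 for determining an inter-channel time difference parameter according to this embodiment of the present invention, as the entity that performs the method 100 for determining an inter-channel time difference parameter according to the embodiments of the present invention, may correspond to the encoder-side device in the method of the embodiments. The units and modules in the device 300 and the other operations and/or functions described above are respectively intended to implement the corresponding procedures of the method 100 in FIG. 1. For brevity, details are not repeated here.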
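According to the device for determining an inter-channel time difference parameter of this embodiment of the present invention, a reference parameter corresponding to the acquisition order between the time-domain signal of the first channel and the time-domain signal of the second channel is determined in the time domain. A search range can then be determined based on the reference parameter, and search processing on the frequency-domain signal of the first channel and the frequency-domain signal of the second channel is performed in the frequency domain within that search range, to determine the inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel. In the embodiments of the present invention, the search range determined according to the reference parameter falls within [-Tmax, 0] or [0, Tmax] and is smaller than the prior-art search range [-Tmax, Tmax]. This reduces the amount of computation needed to search for the ITD parameter, lowers the performance requirements on the encoder side, and improves the processing efficiency of the encoder side.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.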
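A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present invention.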
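A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.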
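In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.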
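The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.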
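In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.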
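If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.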
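The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.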

Claims (10)

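  1. A method for determining an inter-channel time difference parameter, characterized in that the method comprises:
    determining a reference parameter according to a time-domain signal of a first channel and a time-domain signal of a second channel, the reference parameter corresponding to the acquisition order between the time-domain signal of the first channel and the time-domain signal of the second channel, wherein the time-domain signal of the first channel and the time-domain signal of the second channel correspond to the same time period;
    determining a search range according to the reference parameter and a limit value Tmax, wherein the limit value Tmax is determined according to the sampling rate of the time-domain signal of the first channel, and the search range falls within [-Tmax, 0] or within [0, Tmax]; and
    performing search processing within the search range based on a frequency-domain signal of the first channel and a frequency-domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.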
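  2. The method according to claim 1, characterized in that the determining a reference parameter according to a time-domain signal of a first channel and a time-domain signal of a second channel comprises:
    performing cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel; and
    determining the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value.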
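  3. The method according to claim 2, characterized in that the reference parameter is the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of the index value.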
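  4. The method according to claim 1, characterized in that the determining a reference parameter according to a time-domain signal of a first channel and a time-domain signal of a second channel comprises:
    performing peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range; and
    determining the reference parameter according to the magnitude relationship between the first index value and the second index value.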
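  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    smoothing the first ITD parameter based on a second ITD parameter, wherein the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.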
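  6. An apparatus for determining an inter-channel time difference parameter, characterized in that the apparatus comprises:
    a determining unit, configured to determine a reference parameter according to a time-domain signal of a first channel and a time-domain signal of a second channel, the reference parameter corresponding to the acquisition order between the time-domain signal of the first channel and the time-domain signal of the second channel, wherein the time-domain signal of the first channel and the time-domain signal of the second channel correspond to the same time period, and to determine a search range according to the reference parameter and a limit value Tmax, wherein the limit value Tmax is determined according to the sampling rate of the time-domain signal of the first channel, and the search range falls within [-Tmax, 0] or within [0, Tmax]; and
    a processing unit, configured to perform search processing according to the reference parameter, based on a frequency-domain signal of the first channel and a frequency-domain signal of the second channel, to determine a first inter-channel time difference (ITD) parameter corresponding to the first channel and the second channel.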
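  7. The apparatus according to claim 6, characterized in that the determining unit is specifically configured to perform cross-correlation processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first cross-correlation processing value and a second cross-correlation processing value, and to determine the reference parameter according to the magnitude relationship between the first cross-correlation processing value and the second cross-correlation processing value, wherein the first cross-correlation processing value is the maximum function value, within a preset range, of the cross-correlation function of the time-domain signal of the first channel relative to the time-domain signal of the second channel, and the second cross-correlation processing value is the maximum function value, within the preset range, of the cross-correlation function of the time-domain signal of the second channel relative to the time-domain signal of the first channel.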
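  8. The apparatus according to claim 7, characterized in that the determining unit is specifically configured to determine, as the reference parameter, the index value corresponding to the larger of the first cross-correlation processing value and the second cross-correlation processing value, or the opposite number of the index value.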
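  9. The apparatus according to claim 6, characterized in that the determining unit is specifically configured to perform peak detection processing on the time-domain signal of the first channel and the time-domain signal of the second channel to determine a first index value and a second index value, and to determine the reference parameter according to the magnitude relationship between the first index value and the second index value, wherein the first index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the first channel within a preset range, and the second index value is the index value corresponding to the maximum amplitude value of the time-domain signal of the second channel within the preset range.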
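  10. The apparatus according to any one of claims 6 to 9, characterized in that the processing unit is further configured to smooth the first ITD parameter based on a second ITD parameter, wherein the first ITD parameter is the ITD parameter of a first time period, the second ITD parameter is the smoothed value of the ITD parameter of a second time period, and the second time period precedes the first time period.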
PCT/CN2015/095097 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter Ceased WO2016141732A1 (zh)

Priority Applications (10)

Application Number Priority Date Filing Date Title
SG11201706998QA SG11201706998QA (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
EP15884410.0A EP3252756B1 (en) 2015-03-09 2015-11-20 Method and device for determining inter-channel time difference parameter
JP2017547541A JP6487569B2 (ja) 2015-03-09 2015-11-20 チャネル間時間差パラメータを決定するための方法および装置
BR112017018600-4A BR112017018600A2 (zh) 2015-03-09 2015-11-20 Method and apparatus for determining the time difference between the channel parameters
MX2017011460A MX365619B (es) 2015-03-09 2015-11-20 Metodos y aparato para determinar el parametro de diferencia de tiempo inter-canal.
AU2015385490A AU2015385490B2 (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
CA2977846A CA2977846A1 (en) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter
RU2017135269A RU2670843C9 (ru) 2015-03-09 2015-11-20 Способ и устройство для определения параметра межканальной временной разности
KR1020177026484A KR20170120645A (ko) 2015-03-09 2015-11-20 채널 간 시간차 파라미터를 결정하기 위한 방법 및 디바이스
US15/698,107 US10210873B2 (en) 2015-03-09 2017-09-07 Method and apparatus for determining inter-channel time difference parameter

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510101315.X 2015-03-09
CN201510101315.XA CN106033671B (zh) 2015-03-09 2015-03-09 Method and apparatus for determining inter-channel time difference parameter

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/698,107 Continuation US10210873B2 (en) 2015-03-09 2017-09-07 Method and apparatus for determining inter-channel time difference parameter

Publications (1)

Publication Number Publication Date
WO2016141732A1 true WO2016141732A1 (zh) 2016-09-15

Family

ID=56879923

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/095097 Ceased WO2016141732A1 (zh) 2015-03-09 2015-11-20 Method and apparatus for determining inter-channel time difference parameter

Country Status (12)

Country Link
US (1) US10210873B2 (zh)
EP (1) EP3252756B1 (zh)
JP (1) JP6487569B2 (zh)
KR (1) KR20170120645A (zh)
CN (1) CN106033671B (zh)
AU (1) AU2015385490B2 (zh)
BR (1) BR112017018600A2 (zh)
CA (1) CA2977846A1 (zh)
MX (1) MX365619B (zh)
RU (1) RU2670843C9 (zh)
SG (1) SG11201706998QA (zh)
WO (1) WO2016141732A1 (zh)



