EP1970893A1 - Method for estimating signal coding parameters (Verfahren zur Schätzung von Signalkodierungsparametern) - Google Patents
- Publication number
- EP1970893A1 (application EP07450044A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- segment
- model
- spectrum
- spectral
- coding parameters
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/12: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters (G10L25/03), the extracted parameters being prediction coefficients
- Both fall under G (PHYSICS) › G10 (MUSICAL INSTRUMENTS; ACOUSTICS) › G10L (SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING) › G10L25/00 (Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00)
Definitions
- The present invention relates to an improved technique for encoding a digital signal, in particular a speech signal. More specifically, it concerns a method for estimating the coding parameters of a predictive filter model of a digital signal according to the preamble of claim 1.
- LPC: Linear Predictive Coding
- This technique computes the parameters of an autoregressive filter from the time samples of a digital speech signal.
- The computation of these parameters is well known to those of ordinary skill in the field of the present invention.
- An example of such a computation is found in ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB)", Geneva, 2002.
- Most commercial speech coders, such as the LPC vocoder, Code-Excited Linear Prediction (CELP) and its later variants (ACELP, VSELP), among many others, rely on the LPC technique.
- x[n] is a windowed segment of the input digital signal,
- a_m are the linear prediction coefficients, and
- M is the model order.
- The solution delivered by the LPC technique is thus the set of linear prediction coefficients that minimizes the cost function E.
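The cost function E is not reproduced in this text. Assuming the common autocorrelation-method form E = Σ_n (x[n] - Σ_{m=1..M} a_m x[n-m])², its minimizer solves the Yule-Walker normal equations, which the Levinson-Durbin recursion solves in O(M²) operations. The minimal Python sketch below illustrates that conventional baseline; the Hamming window, the sign convention and all function names are assumptions for illustration, not taken from the patent or from G.722.2.

```python
import math
import random


def hamming(n_samples):
    """Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_M with the Levinson-Durbin recursion.

    Assumed convention: prediction x_hat[n] = sum_{m=1}^{M} a_m x[n - m].
    Returns (a, err) with a = [a_1, ..., a_M] and err the final error energy.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) prediction error and x[n - i].
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return a, err


def lpc(segment, order):
    """Window one segment and return its LPC coefficients and error energy."""
    windowed = [s * w for s, w in zip(segment, hamming(len(segment)))]
    return levinson_durbin(autocorrelation(windowed, order), order)


# Usage: fit a 2nd-order model to an AR(1) process x[n] = 0.9 x[n-1] + e[n].
random.seed(0)
signal = [0.0]
for _ in range(7999):
    signal.append(0.9 * signal[-1] + random.gauss(0.0, 1.0))
a, err = lpc(signal, 2)  # a[0] should come out close to 0.9, a[1] close to 0
```

Note that every time sample contributes equally to E here, which is exactly the property the frequency-selective method described below departs from.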
- Speech coders based on the LPC technique are known to deliver coded speech of acceptable but only moderate quality. Furthermore, the performance of automatic speech recognition systems drops notably when they are fed with such coded speech instead of the raw signal.
- The inventor found that, although a predictive filter model is adequate for describing the physical production of speech, the LPC technique cannot estimate the parameters of that model with sufficient accuracy.
- The term "substantially minimizing" is intended to cover both making the cost function minimal and bringing it to a sufficiently low value, i.e. a value within a given or acceptable tolerance of that minimum.
- The proposed coding of each signal segment comprises two main processing steps. First, a spectral mask is computed that weights the relevance of each spectral sample of the segment spectrum, where the relevance is determined from the fundamental frequency of the speech utterance and the spectral characteristics of the noise in the segment. Second, the coding parameters are computed that make a specific cost function minimal, or at least sufficiently low, where the cost function is built from the segment spectrum, the spectral mask and the parametric filter model.
- The invention is based on the insight that not all spectral samples of an input spectrum necessarily carry valuable information for estimating the linear prediction coefficients. For instance, the spectrum of a voiced speech utterance carries valuable information only at the harmonic frequencies, and in the presence of background noise the speech spectrum is corrupted at those frequency components where its level is lower than that of the noise.
- The novel frequency-selective approach of the invention increases coding precision and efficiency, especially for voiced utterances and/or noise-corrupted signal segments.
- The method of the invention computes the speech coding parameters from the spectrum of the signal segment, with significantly improved accuracy, where these parameters relate to the popular speech production model.
- The method of the invention can replace, e.g., the LPC technique in speech/audio coders that use that technique.
- The invention can also be used in speech/audio coders that do not use the LPC technique, such as harmonic coders and hybrid coders.
- The improved accuracy of the estimated coding parameters also implies a more accurate estimate of the spectral energy.
- The present invention can also be used in automatic speech recognition systems, where spectral-like features can be derived directly from the estimated filter model and gain level instead of from the signal spectrum.
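Deriving spectral-like features from the estimated model rather than from the raw spectrum amounts to sampling the model envelope |H(ω)|² = gain² / |A(e^{jω})|². The sketch below assumes A(z) = 1 - Σ_m a_m z^{-m} and a uniform frequency grid in dB; `log_envelope_features` is a hypothetical feature definition, not one specified by the patent (a practical ASR front-end would more likely use a mel-spaced grid followed by a cepstral transform).

```python
import cmath
import math


def ar_envelope(gain, a, n_points):
    """Sample |H(w)|^2 = gain^2 / |A(e^{jw})|^2 at n_points frequencies on [0, pi].

    Assumed convention: A(z) = 1 - sum_{m=1}^{M} a_m z^{-m}.
    """
    envelope = []
    for k in range(n_points):
        w = math.pi * k / (n_points - 1)
        A = 1 - sum(am * cmath.exp(-1j * w * m) for m, am in enumerate(a, start=1))
        envelope.append(gain ** 2 / abs(A) ** 2)
    return envelope


def log_envelope_features(gain, a, n_points=16):
    """Hypothetical ASR feature vector: the model envelope in dB on a uniform grid."""
    return [10.0 * math.log10(p) for p in ar_envelope(gain, a, n_points)]


# Usage: first-order model with a_1 = 0.5 and unit gain (a low-pass envelope).
features = log_envelope_features(1.0, [0.5], n_points=5)
```

Because the envelope is computed from (gain, a_m) alone, it is smooth by construction and inherits whatever robustness to noise and pitch the parameter estimate has.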
- The coding parameters are the gain level and the filter coefficients of the predictive filter model.
- The cost function can be minimized with any suitable algorithm known in the art; preferably, a multivariate Newton-Raphson algorithm is used.
- The coding parameters determined according to the present invention can be translated into whatever parameterization the subsequent decoding (i.e. synthesis) stage requires.
- The predictive filter model is defined by its synthesis equivalent, which is a parametric all-pole filter model, an autoregressive coefficients filter (ARC) model, a reflection coefficients filter (RC) model and/or a line spectral frequencies (LSF) model built from the determined coding parameters.
- ARC: autoregressive coefficients filter
- RC: reflection coefficients filter
- LSF: line spectral frequencies
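Translating between the ARC and RC parameterizations is a standard step when the decoder expects a different form. The sketch below shows the step-up and step-down (backward Levinson) recursions, assuming the convention A(z) = 1 - Σ_m a_m z^{-m}; the function names are hypothetical. The LSF conversion is omitted because it requires finding the roots of the symmetric and antisymmetric polynomials derived from A(z).

```python
def rc_to_arc(k):
    """Step-up recursion: reflection coefficients k_1..k_M -> a_1..a_M.

    Uses the assumed convention A(z) = 1 - sum_m a_m z^{-m}.
    """
    a = []
    for i, ki in enumerate(k, start=1):
        a = [a[j - 1] - ki * a[i - j - 1] for j in range(1, i)] + [ki]
    return a


def arc_to_rc(a):
    """Step-down (backward Levinson) recursion: a_1..a_M -> k_1..k_M."""
    a = list(a)
    k = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        ki = a[i - 1]  # the last coefficient of the order-i filter is k_i
        if abs(ki) >= 1.0:
            raise ValueError("synthesis filter is not stable (|k_i| >= 1)")
        k[i - 1] = ki
        # Recover the order-(i-1) coefficients from the order-i ones.
        a = [(a[j - 1] + ki * a[i - j - 1]) / (1.0 - ki * ki) for j in range(1, i)]
    return k


# Usage: round trip through both parameterizations.
arc = rc_to_arc([0.3, -0.5, 0.2])  # [0.55, -0.59, 0.2] up to rounding
rc = arc_to_rc(arc)                # [0.3, -0.5, 0.2] up to rounding
```

A convenient side effect of the RC form is the stability check: the all-pole synthesis filter is stable exactly when every |k_i| < 1.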
- The spectral mask plays a vital role in the cost function ML: for each frequency ω it contains a value that weights the relevance of the spectral sample at that frequency.
- The gain level and the parameters a_m that define the synthesis filter H(ω) are the parametric degrees of freedom of the cost function ML.
- Any reference in this disclosure to the cost function ML also covers any mathematically or technically equivalent expression of equation (7), e.g. a cost function that differs from equation (7) by an additive term that does not depend on these parametric degrees of freedom.
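The text cites equation (7) for the cost function ML but does not reproduce it, so the following is only an illustrative stand-in, not the patent's formula: a mask-weighted Whittle (Gaussian spectral log-likelihood) cost Σ_k W_k (S_k / P_k + ln P_k), with model spectrum P_k = gain² / |A(ω_k)|². It shows how a mask removes noise-dominated samples from the fit. For fixed a_m its gain minimizer has the closed form gain² = Σ_k W_k S_k |A(ω_k)|² / Σ_k W_k. All names and the sign convention of A(z) are assumptions.

```python
import cmath
import math


def inverse_filter_sq(a, w):
    """|A(e^{jw})|^2 with the assumed convention A(z) = 1 - sum_m a_m z^{-m}."""
    A = 1 - sum(am * cmath.exp(-1j * w * m) for m, am in enumerate(a, start=1))
    return abs(A) ** 2


def masked_cost(gain_sq, a, freqs, spectrum, mask):
    """Illustrative mask-weighted Whittle cost sum_k W_k (S_k / P_k + ln P_k),
    with model power spectrum P_k = gain_sq / |A(w_k)|^2."""
    total = 0.0
    for w, s, weight in zip(freqs, spectrum, mask):
        if weight == 0.0:
            continue  # masked-out samples do not influence the estimate
        p = gain_sq / inverse_filter_sq(a, w)
        total += weight * (s / p + math.log(p))
    return total


def optimal_gain_sq(a, freqs, spectrum, mask):
    """Closed-form minimizer of masked_cost over gain_sq for fixed a:
    sum_k W_k S_k |A(w_k)|^2 / sum_k W_k."""
    num = sum(weight * s * inverse_filter_sq(a, w)
              for w, s, weight in zip(freqs, spectrum, mask))
    return num / sum(mask)


# Usage: the spectrum matches the model (gain_sq = 4, a = [0.5]) below bin 40
# and is corrupted by additive noise from bin 40 upwards.
freqs = [math.pi * k / 63 for k in range(64)]
spectrum = [4.0 / inverse_filter_sq([0.5], w) + (10.0 if k >= 40 else 0.0)
            for k, w in enumerate(freqs)]
mask = [1.0 if k < 40 else 0.0 for k in range(64)]
g_masked = optimal_gain_sq([0.5], freqs, spectrum, mask)          # 4.0 up to rounding
g_unmasked = optimal_gain_sq([0.5], freqs, spectrum, [1.0] * 64)  # inflated by the noise
```

With the mask, the noisy bins are ignored and the true gain is recovered exactly; without it, the noise inflates the gain estimate.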
- FIG. 1 shows a block diagram of an analysis stage 100 of a speech coder that uses the method of the present invention.
- A signal segmentation block 10 performs the usual segmentation of an input digital signal x into segments, generally denoted x[n].
- A spectral transformation block 20 transforms each segment into the spectral domain.
- Block 20 performs, e.g., a Discrete Fourier Transform, a Discrete Sine Transform and/or a Fan-Chirp Transform, among other popular choices.
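Blocks 10 and 20 can be sketched as overlapping framing followed by a windowed DFT. The frame length, hop size and periodic Hann window below are assumptions chosen for illustration, and the direct O(N²) DFT stands in for an FFT only to keep the example dependency-free.

```python
import cmath
import math


def segment(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing by hop."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def hann(n_samples):
    """Periodic Hann window w[n] = 0.5 - 0.5 cos(2 pi n / N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples) for n in range(n_samples)]


def power_spectrum(frame):
    """|DFT|^2 of the Hann-windowed frame for bins k = 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    xw = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


# Usage: a cosine that falls exactly on DFT bin 8 of a 64-sample frame.
tone = [math.cos(2 * math.pi * 8 * t / 64) for t in range(256)]
frames = segment(tone, 64, 32)    # 7 frames with 50 % overlap
spec = power_spectrum(frames[0])  # 33 bins (0..32)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```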
- A spectral mask block 30 computes the spectral mask.
- The segment x[n] is assumed to be corrupted by background noise whose spectral characteristics are described by the power spectrum N(ω). Furthermore, the segment may contain a speech utterance of "voiced" nature with fundamental frequency ω_0 (for "unvoiced" speech utterances, the fundamental frequency is considered to be zero or very low).
- δ(ω) is the Dirac delta function.
- The purpose of the spectral mask is twofold: on the one hand, to disable those spectral samples of the segment spectrum that are significantly corrupted by noise, and on the other hand, to discard the spectral samples that do not correspond to harmonic frequencies.
- These harmonic frequencies correspond to the high-energy spectral peaks that delineate the spectral envelope of the speech utterance.
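The mask formula itself (built with Dirac deltas at the harmonics) is not reproduced in this text. A discrete, binary stand-in that follows the two stated goals is sketched below: keep a bin only if it lies near a harmonic of ω_0 (dropped for unvoiced segments) and only if the segment spectrum exceeds the noise power spectrum by a margin. The 3 dB threshold, the half-bin tolerance and the function name are hypothetical choices.

```python
def spectral_mask(freqs, spectrum, noise_psd, f0, snr_threshold_db=3.0, tol=None):
    """Binary mask W_k in {0, 1} over uniformly spaced bin frequencies `freqs`.

    A bin is kept only if (a) its local SNR S_k / N_k exceeds the threshold and
    (b) for voiced segments (f0 larger than the bin spacing) it lies within
    `tol` of a harmonic h * f0, h >= 1. For unvoiced segments (f0 zero or very
    low) condition (b) is dropped. f0 and freqs must use the same units.
    """
    df = freqs[1] - freqs[0]
    if tol is None:
        tol = df / 2  # accept the bin(s) nearest to each harmonic
    threshold = 10.0 ** (snr_threshold_db / 10.0)
    voiced = f0 > df
    mask = []
    for f, s, n in zip(freqs, spectrum, noise_psd):
        if voiced:
            h = round(f / f0)
            near_harmonic = h >= 1 and abs(f - h * f0) <= tol
        else:
            near_harmonic = True
        mask.append(1.0 if near_harmonic and s > threshold * n else 0.0)
    return mask


# Usage: 50 Hz bins from 0 to 4000 Hz, with noise dominating above 3 kHz.
freqs_hz = [50.0 * k for k in range(81)]
seg_psd = [1.0] * 81
noise = [0.1 if f <= 3000.0 else 2.0 for f in freqs_hz]
voiced_mask = spectral_mask(freqs_hz, seg_psd, noise, f0=200.0)  # harmonics 200..3000 Hz
unvoiced_mask = spectral_mask(freqs_hz, seg_psd, noise, f0=0.0)  # every bin up to 3 kHz
```

A soft mask (e.g. weights growing with local SNR) would fit the text's notion of "weighting relevance" equally well; the binary form is simply the easiest to inspect.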
- The power spectrum N(ω) is estimated by a noise estimation block 35 using known techniques, such as Kalman filter estimation.
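One way to realize a Kalman-filter noise estimate is a scalar filter per frequency bin, under the assumptions of a random-walk model for the true noise level and additive measurement noise on each periodogram. The sketch below makes both assumptions explicit; the variances `q` and `r` are hypothetical tuning values, and deciding which frames are speech-free (e.g. via a voice activity detector) is outside its scope.

```python
def kalman_noise_psd(noise_frames_psd, q=1e-3, r=1.0):
    """Track the noise power spectrum N(w_k) bin by bin with a scalar Kalman filter.

    Assumed models: N_t = N_{t-1} + process noise (variance q, random walk);
    observed periodogram P_t = N_t + measurement noise (variance r).
    `noise_frames_psd` are power spectra of frames assumed to contain no speech.
    """
    estimate = list(noise_frames_psd[0])
    variance = [r] * len(estimate)
    for psd in noise_frames_psd[1:]:
        for k, z in enumerate(psd):
            p_pred = variance[k] + q                 # predict: random-walk state
            gain = p_pred / (p_pred + r)             # Kalman gain
            estimate[k] += gain * (z - estimate[k])  # correct with the new sample
            variance[k] = (1.0 - gain) * p_pred
    return estimate


# Usage: two-bin periodograms fluctuating around a true noise level of [2, 5].
frames_psd = [[1.0, 4.0] if t % 2 == 0 else [3.0, 6.0] for t in range(200)]
noise_psd = kalman_noise_psd(frames_psd)  # close to [2.0, 5.0]
```

The ratio q / r controls the trade-off: a larger q tracks non-stationary noise faster but smooths the periodogram fluctuations less.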
- The fundamental frequency ω_0 is estimated by a pitch analysis block 40 using known methods, e.g. peak detection on the autocorrelation of the segment.
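Pitch estimation by autocorrelation peak detection can be sketched as follows: remove the mean, compute the normalized autocorrelation over the lags of a plausible pitch range, and take the highest peak. The 60 to 400 Hz search range and the 0.3 voicing threshold are assumptions; returning 0.0 for unvoiced segments follows the text's convention that the fundamental frequency is then considered zero.

```python
import math


def estimate_f0(segment, fs, f_min=60.0, f_max=400.0, voicing_threshold=0.3):
    """Return f0 in Hz from the highest normalized autocorrelation peak, or 0.0 if unvoiced."""
    n = len(segment)
    mean = sum(segment) / n
    x = [s - mean for s in segment]
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return 0.0  # silent segment
    lag_min = max(1, int(fs / f_max))
    lag_max = min(n - 1, int(fs / f_min))
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[t] * x[t - lag] for t in range(lag, n)) / r0
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag == 0 or best_val < voicing_threshold:
        return 0.0  # no convincing periodicity: treat as unvoiced
    return fs / best_lag


# Usage: a 200 Hz "voiced" signal with one harmonic, sampled at 8 kHz.
fs = 8000
voiced = [math.cos(2 * math.pi * 200 * t / fs) + 0.5 * math.cos(2 * math.pi * 400 * t / fs)
          for t in range(400)]
f0_hat = estimate_f0(voiced, fs)  # 200.0 (a lag of 40 samples)
```

Because the lag grid is integer-valued, the resolution is limited to fs / lag; real pitch trackers usually refine the peak by interpolation.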
- A cost function minimization block 50 computes the gain level and the parameters a_m of the filter model that make the cost function ML minimal, or at least bring it below a predetermined level.
- This minimization is straightforward to implement in software.
- One possible implementation of the minimization is the multivariate Newton-Raphson algorithm.
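A minimal sketch of a multivariate Newton-Raphson minimizer is shown below. In block 50 the parameter vector would be (gain, a_1, ..., a_M) and f the cost function ML; here a smooth toy function stands in because ML is not reproduced in this text. The derivatives are taken by central finite differences, an assumption made for brevity (an implementation would more likely use the analytic gradient and Hessian of ML), and there is no line search or Hessian safeguard, so the starting point must lie where the cost is locally convex.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _shift(x, i, di, j=None, dj=0.0):
    """Copy of x with di added to x[i] and, optionally, dj added to x[j]."""
    y = list(x)
    y[i] += di
    if j is not None:
        y[j] += dj
    return y


def newton_minimize(f, x0, h=1e-4, tol=1e-9, max_iter=50):
    """Multivariate Newton-Raphson with central finite-difference derivatives."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        grad = [(f(_shift(x, i, h)) - f(_shift(x, i, -h))) / (2 * h) for i in range(n)]
        hess = [[(f(_shift(x, i, h, j, h)) - f(_shift(x, i, h, j, -h))
                  - f(_shift(x, i, -h, j, h)) + f(_shift(x, i, -h, j, -h))) / (4 * h * h)
                 for j in range(n)] for i in range(n)]
        step = solve_linear(hess, [-g for g in grad])  # Newton step: H d = -g
        x = [xi + di for xi, di in zip(x, step)]
        if max(abs(d) for d in step) < tol:
            break
    return x


# Usage: a smooth toy cost with its minimum at (1, 2) stands in for ML.
def example_cost(v):
    return math.exp(v[0] - 1.0) - v[0] + (v[1] - 2.0) ** 2


x_min = newton_minimize(example_cost, [0.0, 0.0])
```

Starting the iteration from the conventional LPC solution is a natural choice, since it is cheap and usually already close to the minimum.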
- The output parameters of the speech coder analysis stage 100 are the gain level, the parameters a_m of the predictive filter and, if desired, the pitch of the excitation ω_0, which can be taken from the output of block 40. These parameters correspond to the output of the analysis stage of conventional speech coders, e.g. those relying on the LPC technique. Therefore, the method of the present invention can supersede the LPC technique in such coders.
- The present invention estimates the coding parameters of a predictive filter model with higher accuracy, as shown by a resulting spectral envelope that closely matches the energy of the harmonics (see Fig. 2b) and by a prediction error closer to the actual excitation (see Fig. 2c).
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP07450044A EP1970893A1 (de) | 2007-03-13 | 2007-03-13 | Method for estimating signal coding parameters |
| PCT/AT2008/000087 WO2008109904A1 (en) | 2007-03-13 | 2008-03-12 | A method for estimating signal coding parameters |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP07450044A EP1970893A1 (de) | 2007-03-13 | 2007-03-13 | Method for estimating signal coding parameters |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP1970893A1 (de) | 2008-09-17 |
Family
ID=38229678
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP07450044A (EP1970893A1, de; withdrawn) | Method for estimating signal coding parameters | 2007-03-13 | 2007-03-13 |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP1970893A1 (de) |
| WO (1) | WO2008109904A1 (de) |
- 2007-03-13: EP application EP07450044A, published as EP1970893A1 (de); not active (withdrawn)
- 2008-03-12: PCT application PCT/AT2008/000087, published as WO2008109904A1 (en); not active (ceased)
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0851406A2 (de) * | 1996-12-27 | 1998-07-01 | Nec Corporation | Spectral feature extraction system based on estimation of a frequency weighting function |
Non-Patent Citations (4)
| Title |
|---|
| GU L ET AL: "Perceptual harmonic cepstral coefficients for speech recognition in noisy environment", 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). SALT LAKE CITY, UT, MAY 7 - 11, 2001, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 6, 7 May 2001 (2001-05-07), pages 125 - 128, XP010803060, ISBN: 0-7803-7041-4 * |
| HERMANSKY H ET AL: "Perceptual linear predictive (PLP) analysis-resynthesis technique", EUROSPEECH 91. 2ND EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY PROCEEDINGS ISTITUTO INT. COMUNICAZIONI GENOVA, ITALY, 1991, pages 329 - 332 vol.1, XP002442693 * |
| LUKASIAK J ET AL: "Linear prediction incorporating simultaneous masking", 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS (CAT. NO.00CH37100) IEEE PISCATAWAY, NJ, USA, vol. 3, 2000, pages 1471 - 1474 vol., XP002442694, ISBN: 0-7803-6293-4 * |
| ZHAO Y: "FREQUENCY-DOMAIN MAXIMUM LIKELIHOOD ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION IN ADDITIVE AND CONVOLUTIVE NOISES", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 8, no. 3, May 2000 (2000-05-01), pages 255 - 266, XP011054019, ISSN: 1063-6676 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2363853A1 (de) * | 2010-03-04 | 2011-09-07 | Österreichische Akademie der Wissenschaften | Method for estimating the noise-free spectrum of a signal |
| RU2577180C2 (ru) * | 2010-08-03 | 2016-03-10 | Стормингсвисс Гмбх | Device and method for evaluating and optimizing signals on the basis of algebraic invariants |
| CN107068157A (zh) * | 2017-02-21 | 2017-08-18 | 中国科学院信息工程研究所 | Information hiding method and system based on an audio carrier |
| CN107068157B (zh) * | 2017-02-21 | 2020-04-10 | 中国科学院信息工程研究所 | Information hiding method and system based on an audio carrier |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2008109904A1 (en) | 2008-09-18 |
Similar Documents
| Publication | Title |
|---|---|
| Kleijn | Encoding speech using prototype waveforms |
| CN102089803B (zh) | Method and discriminator for classifying different segments of a signal |
| US7272556B1 (en) | Scalable and embedded codec for speech and audio signals |
| RU2389085C2 (ru) | Methods and devices for introducing low-frequency pre-emphasis during ACELP/TCX-based audio compression |
| Zolnay et al. | Acoustic feature combination for robust speech recognition |
| EP3029670B1 (de) | Determining a low-complexity weighting function for quantizing linear predictive coding coefficients |
| EP1953737A1 (de) | Transform coder and transform method |
| Wang et al. | Phonetically-based vector excitation coding of speech at 3.6 kbps |
| US8909539B2 (en) | Method and device for extending bandwidth of speech signal |
| EP3121813B1 (de) | Noise suppression without side information for CELP coders |
| US10395665B2 (en) | Apparatus and method determining weighting function for linear prediction coding coefficients quantization |
| WO1999026234A1 (en) | Method and apparatus for pitch estimation using perception based analysis by synthesis |
| CN106104682A (zh) | Weighting function determination apparatus and method for quantizing linear predictive coding coefficients |
| EP1970893A1 (de) | Method for estimating signal coding parameters |
| Ramabadran et al. | Enhancing distributed speech recognition with back-end speech reconstruction |
| KR101761820B1 (ko) | Apparatus and method for determining a weighting function for LPC coefficient quantization |
| Ramabadran et al. | The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction |
| Chen et al. | Analysis-by-synthesis speech coding |
| Schroeder | Parameter estimation in speech: a lesson in unorthodoxy |
| Addou et al. | A noise-robust front-end for distributed speech recognition in mobile communications |
| El-Maleh | Classification-based Techniques for Digital Coding of Speech-plus-noise |
| KR101867596B1 (ko) | Apparatus and method for determining a weighting function for LPC coefficient quantization |
| Bhaskar et al. | Low bit-rate voice compression based on frequency domain interpolative techniques |
| Rämö et al. | Segmental speech coding model for storage applications |
| Beritelli et al. | A simple and efficient two-band speech coder at 2.4 kbit/s for real-time implementation on a single low-cost DSP |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A1; designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
| | AX | Request for extension of the European patent | Extension state: AL BA HR MK RS |
| | AKX | Designation fees paid | |
| | REG | Reference to a national code | Ref country code: DE; ref legal event code: 8566 |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 2009-03-18 |