EP1970893A1 - Method for estimating signal coding parameters - Google Patents

Method for estimating signal coding parameters

Info

Publication number
EP1970893A1
EP1970893A1 EP07450044A EP07450044A EP1970893A1 EP 1970893 A1 EP1970893 A1 EP 1970893A1 EP 07450044 A EP07450044 A EP 07450044A EP 07450044 A EP07450044 A EP 07450044A EP 1970893 A1 EP1970893 A1 EP 1970893A1
Authority
EP
European Patent Office
Prior art keywords
segment
model
spectrum
spectral
coding parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07450044A
Other languages
English (en)
French (fr)
Inventor
Luis Dr. Weruaga
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovationsagentur GmbH
Österreichische Akademie der Wissenschaften
Original Assignee
Innovationsagentur GmbH
Österreichische Akademie der Wissenschaften
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovationsagentur GmbH and Österreichische Akademie der Wissenschaften
Priority to EP07450044A
Priority to PCT/AT2008/000087 (WO2008109904A1)
Publication of EP1970893A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being prediction coefficients

Definitions

  • the present invention relates to an improved technique for encoding a digital signal, in particular a speech signal. More specifically, the invention concerns a method for estimating coding parameters of a predictive filter model of a digital signal according to the preamble of claim 1.
  • LPC: Linear Predictive Coding
  • Said technique computes the parameters of an autoregressive filter from the time samples of a digital speech signal.
  • the computation of those parameters is well-known to those of ordinary skill in the field of the present invention.
  • An example of such computation is found in ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB)", Geneva 2002.
  • Most commercial speech coders, such as the LPC vocoder, Code-Excited Linear Prediction (CELP) and its later variants (ACELP, VSELP), among many others, rely on the LPC technique.
  • In the LPC cost function E, the energy of the linear prediction error, x[n] is a windowed segment of the input digital signal,
  • a_m are the linear prediction coefficients, and
  • M is the model order.
  • The solution delivered by the LPC technique is thus the set of linear prediction coefficients that makes the cost function E minimal.
  • Speech coders based on the LPC technique are known to deliver coded speech of acceptable but moderate quality. Furthermore, the performance of automatic speech recognition systems drops notably when fed with said coded signal instead of the raw signal.
  • The author of the present invention found that, although a predictive filter model is adequate for describing the physical production of speech, the LPC technique is unable to obtain the parameters of said model with sufficient accuracy.
  • The term "substantially minimizing" is intended to comprise both making the cost function minimal and bringing the cost function to a sufficiently low value, i.e. a value within a given or acceptable tolerance of that minimum.
  • The proposed coding of each signal segment comprises two main processing steps. The first is the computation of a spectral mask that weights the relevance of each spectral sample of the segment spectrum, where the relevance is determined on the basis of the fundamental frequency of the speech utterance and the spectral characteristics of the noise in the segment. The second is the computation of the coding parameters that make a specific cost function minimal, or at least bring it to an appropriately low level, where said cost function is built from said segment spectrum, said spectral mask and the parametric filter model.
  • The invention is based on the insight that not all spectral samples in an input spectrum necessarily contain valuable information for the estimation of linear prediction coefficients. For instance, the spectrum of voiced speech utterances contains valuable information only at harmonic frequencies, and in the presence of background noise the speech spectrum can be corrupted at those frequency components where its level is lower than that of the noise.
  • the novel frequency-selective approach of the invention increases the coding precision and efficiency, especially in the case of voiced utterances and/or of noise-corrupted signal segments.
  • the method of the invention computes the speech coding parameters on the basis of the spectrum of the signal segment, where said parameters are related to the popular speech formation model, with significantly improved accuracy.
  • the method of the invention can replace e.g. the LPC technique in those speech/audio coders that operate with said technique.
  • the invention can also be used in speech/audio coders that do not operate with said LPC technique, such as Harmonic Coders and Hybrid Coders.
  • the improved accuracy of the estimation of the coding parameters also implies a more accurate estimation of the spectral energy.
  • the present invention can be used in automatic speech recognition systems also in such a way that spectral-like features can be drawn directly from the estimated filter model and gain level instead of from the signal spectrum.
  • said coding parameters are the gain level and the filter coefficients of said predictive filter model.
  • the step of minimizing said cost function can be performed by means of any suitable algorithm of the art; preferably, a multivariate Newton-Raphson algorithm is used.
  • The coding parameters determined according to the present invention can be translated into any parameterization needed for the subsequent decoding, i.e. synthesis, stage.
  • said predictive filter model is defined by its synthesis equivalent being a parametric all-pole filter model, an autoregressive coefficients filter (ARC) model, a reflection coefficients filter (RC) model, and/or a line spectral frequencies (LSF) model using the coding parameters determined.
  • ARC: autoregressive coefficients filter
  • RC: reflection coefficients filter
  • LSF: line spectral frequencies
  • The spectral mask plays a vital role in the cost function ML in that it contains, for each frequency, a value that weights the relevance of the spectral sample at that frequency.
  • The gain level and the parameters a_m that define the synthesis filter H correspond to the parametric degrees of freedom of the cost function ML.
  • any reference in this disclosure to the cost function ML also comprises any mathematically or technically equivalent expression of equation (7), e.g. a cost function differing from equation (7) in an additive term that does not depend on said parametric degrees of freedom.
  • FIG. 1 shows in the form of a block diagram an analysis stage 100 of a speech coder that uses the method of the present invention.
  • A signal segmentation block 10 performs the usual segmentation of an input digital signal x into segments, generally denoted by x[n].
  • A spectral transformation block 20 performs the spectral transformation of said segment.
  • Block 20 performs e.g. a Discrete Fourier Transform, a Discrete Sine Transform and/or a Fan-Chirp Transform, among other popular choices.
  • A spectral mask block 30 performs the computation of the spectral mask.
  • The segment x[n] is assumed to be corrupted by background noise whose spectral characteristics are described by the power spectrum N. Furthermore, said segment may contain a speech utterance of "voiced" nature with a given fundamental frequency (in the case of "unvoiced" speech utterances, the fundamental frequency is considered zero or very low).
  • δ(·) is the Dirac delta function.
  • The purpose of the spectral mask is two-fold: on the one hand, to disable those spectral samples of the segment spectrum that are appreciably corrupted by noise, and on the other hand, to discard the spectral samples that do not correspond to harmonic frequencies.
  • Said harmonic frequencies point to the high-energy spectral peaks that delineate the spectral envelope of the speech utterance.
  • The estimation of the noise power spectrum N is carried out by a noise estimation block 35 according to known ad-hoc techniques, such as Kalman filter estimation, etc.
  • The estimation of the fundamental frequency is carried out by a pitch analysis block 40 according to known ad-hoc methods, e.g. peak detection of the autocorrelation of the segment, etc.
  • A cost function minimization block 50 carries out the computation of the gain level and the parameters a_m as coding parameters of the filter model that make the cost function ML minimal, or at least below a predetermined level.
  • This minimization task is a readily feasible computer programming task.
  • a possible choice for the implementation of the minimization task is the multivariate Newton-Raphson algorithm.
  • The output parameters of the speech coder analysis stage 100 are the gain level, the parameters a_m of the predictive filter and, if desired, the pitch of the excitation (the fundamental frequency), which can be taken from the output of block 40. Said parameters correspond to the output of the analysis stage of conventional speech coders, e.g. those relying on the LPC technique. Therefore, the method of the present invention can supersede the LPC technique in said coders.
  • The present invention achieves higher accuracy in estimating the coding parameters of a predictive filter model, manifested by a resulting spectral envelope that closely matches the energy of the harmonics (see Fig. 2b) and a prediction error closer to the actual excitation (see Fig. 2c).
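
The two processing steps described above (spectral mask, then parameter estimation on the masked spectrum) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method of the application. The mask is taken to be binary and the fundamental frequency is expressed in DFT bins. The cost function ML of equation (7), which is not reproduced in this text, is replaced by a simpler masked least-squares criterion solved in closed form rather than by Newton-Raphson. The names spectral_mask, masked_allpole_fit, f0_bin and gain2 are hypothetical.

```python
import numpy as np


def spectral_mask(power, noise_power, f0_bin):
    """Binary mask over the bins of a one-sided power spectrum.

    Assumption: a bin is kept when it lies on a harmonic of f0_bin
    (fundamental frequency in bins) and its power exceeds the noise
    estimate. f0_bin <= 0 stands for "unvoiced": every bin is then
    treated as a harmonic candidate.
    """
    n_bins = len(power)
    if f0_bin > 0:
        k = np.arange(1, int((n_bins - 1) / f0_bin) + 1)
        harmonic = np.zeros(n_bins, dtype=bool)
        harmonic[np.round(k * f0_bin).astype(int)] = True
    else:
        harmonic = np.ones(n_bins, dtype=bool)
    return (harmonic & (power > noise_power)).astype(float)


def masked_allpole_fit(power, mask, order, n_fft):
    """Fit predictor coefficients a_1..a_M on the masked spectrum.

    Minimises sum_k mask[k] * power[k] * |1 - sum_m a_m e^{-j w_k m}|^2.
    This is a stand-in criterion, NOT the cost function ML of the
    application. power must hold the n_fft // 2 + 1 bins of an rfft.
    """
    n_bins = len(power)
    omega = 2.0 * np.pi * np.arange(n_bins) / n_fft
    # Each one-sided bin represents two mirrored bins, except DC and
    # (for even n_fft) the Nyquist bin.
    fold = np.full(n_bins, 2.0)
    fold[0] = 1.0
    if n_fft % 2 == 0:
        fold[-1] = 1.0
    weight = fold * mask * power
    lags = np.arange(order + 1)
    # Masked autocorrelation sequence r[0..M] obtained from the spectrum.
    r = np.cos(np.outer(lags, omega)) @ weight
    # Toeplitz normal equations R a = r[1..M].
    R = r[np.abs(lags[1:, None] - lags[None, 1:])]
    a = np.linalg.lstsq(R, r[1:], rcond=None)[0]
    # Assumed gain definition: mean residual power over the retained bins.
    active = np.sum(fold * mask)
    gain2 = (r[0] - a @ r[1:]) / active if active > 0 else 0.0
    return a, gain2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.standard_normal(256) * np.hanning(256)
    n_fft = 512
    power = np.abs(np.fft.rfft(segment, n_fft)) ** 2
    noise = np.full(len(power), 1e-3)          # hypothetical flat noise floor
    mask = spectral_mask(power, noise, f0_bin=8)
    a, gain2 = masked_allpole_fit(power, mask, order=10, n_fft=n_fft)
    print(a, gain2)
```

With an all-ones mask, and provided the spectrum was computed with enough zero-padding, the fit reduces to the classical autocorrelation-method LPC solution, which gives a convenient sanity check. With a harmonic mask, only the retained bins influence the estimated coefficients. If the mask retains too few bins for the chosen order, the system becomes rank-deficient and the least-squares solver returns the minimum-norm solution.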

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP07450044A 2007-03-13 2007-03-13 Method for estimating signal coding parameters Withdrawn EP1970893A1 (de)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07450044A EP1970893A1 (de) 2007-03-13 2007-03-13 Method for estimating signal coding parameters
PCT/AT2008/000087 WO2008109904A1 (en) 2007-03-13 2008-03-12 A method for estimating signal coding parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP07450044A EP1970893A1 (de) 2007-03-13 2007-03-13 Method for estimating signal coding parameters

Publications (1)

Publication Number Publication Date
EP1970893A1 (de) 2008-09-17

Family

ID=38229678

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07450044A Withdrawn EP1970893A1 (de) 2007-03-13 2007-03-13 Method for estimating signal coding parameters

Country Status (2)

Country Link
EP (1) EP1970893A1 (de)
WO (1) WO2008109904A1 (de)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0851406A2 (de) * 1996-12-27 1998-07-01 Nec Corporation Spectral feature extraction system based on the estimation of a frequency weighting function

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GU L ET AL: "Perceptual harmonic cepstral coefficients for speech recognition in noisy environment", 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). SALT LAKE CITY, UT, MAY 7 - 11, 2001, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 6, 7 May 2001 (2001-05-07), pages 125 - 128, XP010803060, ISBN: 0-7803-7041-4 *
HERMANSKY H ET AL: "Perceptual linear predictive (PLP) analysis-resynthesis technique", EUROSPEECH 91. 2ND EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY PROCEEDINGS ISTITUTO INT. COMUNICAZIONI GENOVA, ITALY, 1991, pages 329 - 332 vol.1, XP002442693 *
LUKASIAK J ET AL: "Linear prediction incorporating simultaneous masking", 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS (CAT. NO.00CH37100) IEEE PISCATAWAY, NJ, USA, vol. 3, 2000, pages 1471 - 1474 vol., XP002442694, ISBN: 0-7803-6293-4 *
ZHAO Y: "FREQUENCY-DOMAIN MAXIMUM LIKELIHOOD ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION IN ADDITIVE AND CONVOLUTIVE NOISES", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 8, no. 3, May 2000 (2000-05-01), pages 255 - 266, XP011054019, ISSN: 1063-6676 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2363853A1 (de) * 2010-03-04 2011-09-07 Österreichische Akademie der Wissenschaften Method for estimating the noise-free spectrum of a signal
RU2577180C2 (ru) * 2010-08-03 2016-03-10 Стормингсвисс Гмбх Device and method for evaluating and optimizing signals on the basis of algebraic invariants
CN107068157A (zh) * 2017-02-21 2017-08-18 中国科学院信息工程研究所 Information hiding method and system based on an audio carrier
CN107068157B (zh) * 2017-02-21 2020-04-10 中国科学院信息工程研究所 Information hiding method and system based on an audio carrier

Also Published As

Publication number Publication date
WO2008109904A1 (en) 2008-09-18

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

AKX Designation fees paid
REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20090318