EP1970893A1 - Method for estimating signal coding parameters - Google Patents

Method for estimating signal coding parameters

Info

Publication number: EP1970893A1
Authority: EP; European Patent Office
Prior art keywords: segment; model; spectrum; spectral; coding parameters
Prior art date: 2007-03-13
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Withdrawn

Application number

EP07450044A

Other languages

English (en)

French (fr)

Inventor

Luis Dr. Weruaga

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Innovationsagentur GmbH

Osterreichische Akademie der Wissenschaften

Original Assignee

Innovationsagentur GmbH

Osterreichische Akademie der Wissenschaften

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2007-03-13

Filing date

2007-03-13

Publication date

2008-09-17

2007-03-13 Application filed by Innovationsagentur GmbH, Osterreichische Akademie der Wissenschaften

2007-03-13 Priority to EP07450044A (EP1970893A1)

2008-03-12 Priority to PCT/AT2008/000087 (WO2008109904A1)

2008-09-17 Publication of EP1970893A1

Status: Withdrawn

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients

Definitions

The present invention relates to an improved technique for encoding a digital signal, in particular a speech signal. More specifically, the invention concerns a method for estimating coding parameters of a predictive filter model of a digital signal according to the preamble of claim 1.
LPC: Linear Predictive Coding
Said technique computes the parameters of an autoregressive filter from the time samples of a digital speech signal.
The computation of those parameters is well known to those of ordinary skill in the field of the present invention.
An example of such computation is found in ITU-T Recommendation G.722.2, "Wideband coding of speech at around 16 kbit/s using adaptive multi-rate wideband (AMR-WB)", Geneva 2002.
Most commercial speech coders, such as the LPC vocoder, Code-Excited Linear Prediction (CELP) and its later variants (ACELP, VSELP), among many others, rely on the LPC technique.
x[n] is a windowed segment of the input digital signal
a_m are the linear prediction coefficients
M is the model order.
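For reference, one common way to write the LPC cost function E with these symbols is shown below. The sign of the prediction term is an assumption; some texts define the coefficients with the opposite sign, and the original equation may differ in that respect.

```latex
E = \sum_{n} \Bigl( x[n] + \sum_{m=1}^{M} a_m \, x[n-m] \Bigr)^{2}
```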
The solution delivered by the LPC technique is thus the set of linear prediction coefficients that makes the cost function E minimal.
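As an illustration of the conventional LPC baseline described above, the following is a minimal sketch of the autocorrelation method in Python with NumPy. The function name `lpc_autocorrelation`, the sign convention (prediction error e[n] = x[n] + sum of a_m x[n-m]) and the direct solution of the normal equations are assumptions made for this example; they are not taken from the patent text.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Estimate linear prediction coefficients a_1..a_M of a windowed segment x.

    Assumed convention: e[n] = x[n] + sum_m a_m * x[n - m], and the cost
    E = sum_n e[n]^2 is minimized over the whole (zero-extended) segment.
    Returns (a, err) where err is the minimal value of E.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation values r[0..order] of the segment.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    # Normal equations: sum_m a_m * r[|k - m|] = -r[k] for k = 1..order.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    err = r[0] + np.dot(a, r[1:])
    return a, err

# Usage: a synthetic second-order autoregressive signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
signal = np.zeros(4000)
for i in range(2, 4000):
    signal[i] = 1.5 * signal[i - 1] - 0.7 * signal[i - 2] + noise[i]
coeffs, residual_energy = lpc_autocorrelation(signal, order=2)
```

With this sign convention the estimated coefficients should come out close to -1.5 and 0.7 for the synthetic signal.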
Speech coders based on the LPC technique are known to deliver coded speech of acceptable but moderate quality. Furthermore, the performance of automatic speech recognition systems drops notably when fed with said coded signal instead of the raw signal.
The author of the present invention found that, although a predictive filter model is adequate for describing the physical production of speech, the LPC technique is unable to obtain the parameters of said model with sufficient accuracy.
The term "substantially minimizing" is intended to cover both making the cost function minimal and bringing it to a sufficiently low value, i.e. a value within a given or acceptable tolerance interval of that minimum.
The proposed coding of each signal segment comprises two main processing steps. The first is the computation of a spectral mask that weights the relevance of each spectral sample of the segment spectrum, where the relevance is determined from the fundamental frequency of the speech utterance and the spectral characteristics of the noise in the segment. The second is the computation of the coding parameters that make a specific cost function minimal, or at least bring it to an appropriate level, where said cost function is built from the segment spectrum, the spectral mask and the parametric filter model.
The invention is based on the insight that not all spectral samples in an input spectrum necessarily contain valuable information for the estimation of linear prediction coefficients: for instance, the spectrum of voiced speech utterances contains valuable information only at harmonic frequencies, and in the presence of background noise the spectrum of the speech can be corrupted at certain frequency components if its level is lower than that of the noise at said components.
The novel frequency-selective approach of the invention increases the coding precision and efficiency, especially in the case of voiced utterances and/or of noise-corrupted signal segments.
The method of the invention computes the speech coding parameters on the basis of the spectrum of the signal segment, where said parameters are related to the popular speech formation model, with significantly improved accuracy.
The method of the invention can replace e.g. the LPC technique in those speech/audio coders that operate with said technique.
The invention can also be used in speech/audio coders that do not operate with the LPC technique, such as harmonic coders and hybrid coders.
The improved accuracy of the estimation of the coding parameters also implies a more accurate estimation of the spectral energy.
The present invention can also be used in automatic speech recognition systems in such a way that spectral-like features are drawn directly from the estimated filter model and gain level instead of from the signal spectrum.
Said coding parameters are the gain level and the filter coefficients of said predictive filter model.
The step of minimizing said cost function can be performed by means of any suitable algorithm known in the art; preferably, a multivariate Newton-Raphson algorithm is used.
The coding parameters determined according to the present invention can be translated into any parameterization that is needed for the subsequent decoding, i.e. synthesis, stage.
Said predictive filter model is defined by its synthesis equivalent, which is a parametric all-pole filter model, an autoregressive coefficients filter (ARC) model, a reflection coefficients filter (RC) model, and/or a line spectral frequencies (LSF) model using the coding parameters determined.
ARC: autoregressive coefficients filter
RC: reflection coefficients filter
LSF: line spectral frequencies
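As an example of translating the estimated coefficients into another parameterization, the sketch below converts autoregressive coefficients into reflection coefficients with the standard step-down recursion. The function name `ar_to_reflection` and the sign convention (A(z) = 1 + sum of a_m z^-m, with k_m equal to the last coefficient of the order-m polynomial) are assumptions for illustration; other texts use the opposite sign for reflection coefficients.

```python
import numpy as np

def ar_to_reflection(a):
    """Convert coefficients a_1..a_M of A(z) = 1 + sum_m a_m z^-m
    into reflection coefficients k_1..k_M (step-down recursion)."""
    a = np.asarray(a, dtype=float).copy()
    order = len(a)
    k = np.zeros(order)
    for m in range(order, 0, -1):
        k[m - 1] = a[m - 1]                 # k_m is the last coefficient at order m
        if abs(k[m - 1]) >= 1.0:
            raise ValueError("filter is not stable (|k_m| >= 1)")
        if m > 1:
            # a^(m-1)_i = (a^(m)_i - k_m * a^(m)_(m-i)) / (1 - k_m^2), i = 1..m-1
            a = (a[:m - 1] - k[m - 1] * a[m - 2::-1]) / (1.0 - k[m - 1] ** 2)
    return k

reflection = ar_to_reflection([-0.9, 0.5])
```

For the second-order example, the recursion gives k_2 = 0.5 and k_1 = (-0.9 + 0.45) / 0.75 = -0.6.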
The spectral mask plays a vital role in the cost function ML in that it contains, for each frequency, a value that weights the relevance of the spectral sample at that frequency.
The gain level and the parameters a_m that define the synthesis filter H(ω) are the parametric degrees of freedom of the cost function ML.
Any reference in this disclosure to the cost function ML also comprises any mathematically or technically equivalent expression of equation (7), e.g. a cost function differing from equation (7) in an additive term that does not depend on said parametric degrees of freedom.
FIG. 1 shows, in the form of a block diagram, an analysis stage 100 of a speech coder that uses the method of the present invention.
A signal segmentation block 10 performs the usual segmentation of an input digital signal x into segments, generally denoted by x[n].
A spectral transformation block 20 performs the spectral transformation of said segment.
Block 20 performs e.g. a Discrete Fourier Transform, a Discrete Sine Transform and/or a Fan-Chirp Transform, among other popular choices.
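The segmentation and spectral transformation performed by blocks 10 and 20 can be sketched as follows for the Discrete Fourier Transform case. The function name `segment_power_spectra`, the Hamming window, and the frame length and hop size are illustrative assumptions; the text above does not specify them.

```python
import numpy as np

def segment_power_spectra(x, frame_len=512, hop=256):
    """Split x into overlapping windowed segments and return their power spectra.

    Returns an array of shape (n_frames, frame_len // 2 + 1) holding |X(omega)|^2
    of each segment on the DFT frequency grid.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(frame_len)          # assumed analysis window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Usage: a sinusoid lying exactly on DFT bin 32 of a 512-point frame.
n = np.arange(2048)
spectra = segment_power_spectra(np.sin(2 * np.pi * 32 * n / 512))
```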
A spectral mask block 30 performs the computation of the spectral mask.
The segment x[n] is assumed to be corrupted by background noise whose spectral characteristics are described by the power spectrum N(ω). Furthermore, said segment may contain a speech utterance of "voiced" nature with fundamental frequency ω_0 (for "unvoiced" speech utterances, the fundamental frequency is considered zero or very low).
δ(ω) is the "Dirac delta" function
The purpose of the spectral mask is two-fold: on the one hand, to disable those spectral samples of the segment spectrum that are appreciably corrupted by noise, and on the other hand, to discard the spectral samples that do not correspond to harmonic frequencies.
Said harmonic frequencies mark the high-energy spectral peaks that delineate the spectral envelope of the speech utterance.
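The two-fold purpose of the mask can be illustrated with a simple binary mask on a discrete frequency grid. This is only a sketch under stated assumptions: the patent's actual mask formula is not reproduced in this text, and the function name `spectral_mask`, the hard 0/1 weights, the "segment power above noise power" criterion and the rounding of harmonics to the nearest bin are all choices made for this example.

```python
import numpy as np

def spectral_mask(power, noise_power, f0_bin):
    """Binary relevance mask over frequency bins (illustrative assumption).

    power       : power spectrum of the segment, one value per bin
    noise_power : estimated noise power spectrum on the same grid
    f0_bin      : fundamental frequency in units of bins (0 for unvoiced)
    """
    power = np.asarray(power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Disable bins that are dominated by noise.
    above_noise = power > noise_power
    if f0_bin <= 0:
        # Unvoiced segment: no harmonic structure, keep every clean bin.
        return above_noise.astype(float)
    # Voiced segment: keep only the bins nearest to the harmonics k * f0.
    n_harmonics = int((len(power) - 1) / f0_bin)
    harmonic_bins = np.rint(np.arange(1, n_harmonics + 1) * f0_bin).astype(int)
    harmonic = np.zeros(len(power), dtype=bool)
    harmonic[harmonic_bins] = True
    return (above_noise & harmonic).astype(float)

# Usage: 17 bins, harmonics every 4 bins, bin 8 buried in noise.
seg_power = np.full(17, 2.0)
noise = np.ones(17)
noise[8] = 5.0
mask = spectral_mask(seg_power, noise, f0_bin=4.0)
```

In this usage example the mask keeps bins 4, 12 and 16, and drops the harmonic at bin 8 because the noise exceeds the segment power there.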
The estimation of the power spectrum N(ω) is carried out by a noise estimation block 35 according to known ad-hoc techniques, such as Kalman filter estimation, etc.
The estimation of the fundamental frequency ω_0 is carried out by a pitch analysis block 40 according to known ad-hoc methods, e.g. peak detection of the autocorrelation of the segment, etc.
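The autocorrelation peak detection mentioned for the pitch analysis block can be sketched as below. The function name `estimate_f0`, the search range of 60 to 400 Hz and the absence of any voicing decision are assumptions made to keep the example short.

```python
import numpy as np

def estimate_f0(x, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a segment from the highest
    autocorrelation peak within an assumed plausible pitch-lag range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation for lags 0..N-1.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

# Usage: a 200 Hz tone with one overtone, sampled at 8 kHz.
fs = 8000.0
t = np.arange(800) / fs
tone = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0 = estimate_f0(tone, fs)
```

A practical pitch analysis would add a voicing decision and sub-sample interpolation of the peak; both are omitted here.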
A cost function minimization block 50 computes the gain level and the parameters a_m as the coding parameters of the filter model that make the cost function ML minimal, or at least bring it below a predetermined level.
This minimization task is a readily feasible computer programming task.
A possible choice for the implementation of the minimization task is the multivariate Newton-Raphson algorithm.
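A damped multivariate Newton-Raphson minimization of a mask-weighted spectral cost can be sketched as follows. Equation (7) is not reproduced in this text, so the cost used here is an assumption: a mask-weighted likelihood of the form sum_k mask_k (log S_k + P_k / S_k), where P_k is the segment power spectrum and S_k = exp(g) / |A(ω_k)|² is the model spectrum with A(ω) = 1 + sum of a_m e^(-jωm). The function names, the log-gain parameter g, the starting point and the Levenberg-style damping are likewise illustrative choices.

```python
import numpy as np

def masked_cost(theta, power, mask, omega):
    """Assumed cost: sum_k mask_k * (log S_k + P_k / S_k), theta = [g, a_1..a_M]."""
    g, a = theta[0], theta[1:]
    A = 1.0 + np.exp(-1j * np.outer(omega, np.arange(1, len(a) + 1))) @ a
    A2 = np.abs(A) ** 2
    return np.sum(mask * (g - np.log(A2) + power * np.exp(-g) * A2))

def grad_hess(theta, power, mask, omega):
    """Analytic gradient and Hessian of masked_cost."""
    g, a = theta[0], theta[1:]
    M = len(a)
    c = np.exp(-1j * np.outer(omega, np.arange(1, M + 1)))      # shape (K, M)
    A = 1.0 + c @ a
    A2 = np.abs(A) ** 2
    D = 2.0 * np.real(A[:, None] * np.conj(c))                  # d|A|^2 / da_i
    C = 2.0 * np.real(c[:, :, None] * np.conj(c[:, None, :]))   # d2|A|^2 / da_i da_j
    w = power * np.exp(-g)
    grad = np.empty(M + 1)
    grad[0] = np.sum(mask * (1.0 - w * A2))
    grad[1:] = (mask * (w - 1.0 / A2)) @ D
    H = np.empty((M + 1, M + 1))
    H[0, 0] = np.sum(mask * w * A2)
    H[0, 1:] = H[1:, 0] = -(mask * w) @ D
    H[1:, 1:] = (np.einsum("k,kij->ij", mask * (w - 1.0 / A2), C)
                 + np.einsum("k,ki,kj->ij", mask / A2 ** 2, D, D))
    return grad, H

def newton_raphson_fit(power, mask, omega, order, iters=200, tol=1e-10):
    """Damped Newton-Raphson: returns theta = [log gain, a_1..a_M]."""
    theta = np.zeros(order + 1)
    theta[0] = np.log(np.sum(mask * power) / np.sum(mask))      # flat starting model
    cost = masked_cost(theta, power, mask, omega)
    for _ in range(iters):
        grad, H = grad_hess(theta, power, mask, omega)
        if np.linalg.norm(grad) < tol:
            break
        lam = 0.0                                               # lam = 0 is a pure Newton step
        while True:
            step = np.linalg.solve(H + lam * np.eye(order + 1), -grad)
            new_cost = masked_cost(theta + step, power, mask, omega)
            if np.isfinite(new_cost) and new_cost < cost:
                theta, cost = theta + step, new_cost
                break
            lam = max(10.0 * lam, 1e-3)                         # damp and retry
            if lam > 1e12:                                      # no further decrease possible
                return theta
    return theta

# Usage: exact all-pole spectrum sampled at 32 "harmonic" bins out of 256.
omega = np.pi * (np.arange(256) + 0.5) / 256
A_true = 1.0 + np.exp(-1j * np.outer(omega, [1, 2])) @ np.array([-0.9, 0.5])
target = 2.0 / np.abs(A_true) ** 2
harmonic_mask = np.zeros(256)
harmonic_mask[4::8] = 1.0
theta_hat = newton_raphson_fit(target, harmonic_mask, omega, order=2)
```

The damping only accepts steps that lower the cost, which makes the iteration tolerant of an indefinite Hessian far from the minimum. It reduces to the plain Newton-Raphson step whenever that step already decreases the cost.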
The output parameters of the speech coder analysis stage 100 are the gain level, the parameters a_m of the predictive filter, and, if desired, the pitch of the excitation ω_0, which can be taken from the output of block 40. Said parameters correspond to the output of the analysis stage of conventional speech coders, e.g. those relying on the LPC technique. Therefore, the method of the present invention can supersede the LPC technique in said coders.
The present invention achieves higher accuracy in estimating the coding parameters of a predictive filter model, manifested by a resulting spectral envelope interpolating narrowly, i.e. matching closely, the energy of the harmonics, see Fig. 2b, and a prediction error closer to the actual excitation, see Fig. 2c.

Landscapes

Engineering & Computer Science (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Physics & Mathematics (AREA)
Acoustics & Sound (AREA)
Multimedia (AREA)
Compression, Expansion, Code Conversion, And Decoders (AREA)

EP07450044A 2007-03-13 2007-03-13 Method for estimating signal coding parameters Withdrawn EP1970893A1 (de)

Priority Applications (2)

Application Number	Priority Date	Filing Date	Title
EP07450044A EP1970893A1 (de)	2007-03-13	2007-03-13	Method for estimating signal coding parameters
PCT/AT2008/000087 WO2008109904A1 (en)	2007-03-13	2008-03-12	A method for estimating signal coding parameters

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
EP07450044A EP1970893A1 (de)	2007-03-13	2007-03-13	Method for estimating signal coding parameters

Publications (1)

Publication Number	Publication Date
EP1970893A1 (de)	2008-09-17

Family

ID=38229678

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
EP07450044A Withdrawn EP1970893A1 (de)	2007-03-13	2007-03-13	Method for estimating signal coding parameters

Country Status (2)

Country	Link
EP (1)	EP1970893A1 (de)
WO (1)	WO2008109904A1 (de)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
EP0851406A2 (de) *	1996-12-27	1998-07-01	Nec Corporation	Spectral feature extraction system based on the estimation of a frequency weighting function

2007-03-13 EP EP07450044A, published as EP1970893A1 (de): Withdrawn
2008-03-12 WO PCT/AT2008/000087, published as WO2008109904A1 (en): Ceased

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GU L ET AL: "Perceptual harmonic cepstral coefficients for speech recognition in noisy environment", 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. (ICASSP). SALT LAKE CITY, UT, MAY 7 - 11, 2001, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 6, 7 May 2001 (2001-05-07), pages 125 - 128, XP010803060, ISBN: 0-7803-7041-4 *
HERMANSKY H ET AL: "Perceptual linear predictive (PLP) analysis-resynthesis technique", EUROSPEECH 91. 2ND EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY PROCEEDINGS ISTITUTO INT. COMUNICAZIONI GENOVA, ITALY, 1991, pages 329 - 332 vol.1, XP002442693 *
LUKASIAK J ET AL: "Linear prediction incorporating simultaneous masking", 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS (CAT. NO.00CH37100) IEEE PISCATAWAY, NJ, USA, vol. 3, 2000, pages 1471 - 1474 vol., XP002442694, ISBN: 0-7803-6293-4 *
ZHAO Y: "FREQUENCY-DOMAIN MAXIMUM LIKELIHOOD ESTIMATION FOR AUTOMATIC SPEECH RECOGNITION IN ADDITIVE AND CONVOLUTIVE NOISES", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 8, no. 3, May 2000 (2000-05-01), pages 255 - 266, XP011054019, ISSN: 1063-6676 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
EP2363853A1 (de) *	2010-03-04	2011-09-07	Österreichische Akademie der Wissenschaften	Method for estimating the noise-free spectrum of a signal
RU2577180C2 (ru) *	2010-08-03	2016-03-10	Стормингсвисс Гмбх	Device and method for evaluating and optimizing signals on the basis of algebraic invariants
CN107068157A (zh) *	2017-02-21	2017-08-18	中国科学院信息工程研究所	Information hiding method and system based on an audio carrier
CN107068157B (zh) *	2017-02-21	2020-04-10	中国科学院信息工程研究所	Information hiding method and system based on an audio carrier

Also Published As

Publication number	Publication date
WO2008109904A1 (en)	2008-09-18

Legal Events

Date	Code	Title	Description
2008-08-15	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2008-09-17	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR
2008-09-17	AX	Request for extension of the european patent	Extension state: AL BA HR MK RS
2009-05-20	AKX	Designation fees paid
2009-07-02	REG	Reference to a national code	Ref country code: DE Ref legal event code: 8566
2009-08-14	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
2009-09-16	18D	Application deemed to be withdrawn	Effective date: 20090318

Publication	Publication Date	Title
Kleijn	2002	Encoding speech using prototype waveforms
CN102089803B (zh)	2013-02-27	Method and discriminator for classifying different segments of a signal
US7272556B1 (en)	2007-09-18	Scalable and embedded codec for speech and audio signals
RU2389085C2 (ru)	2010-05-10	Methods and devices for introducing low-frequency emphasis during audio compression based on ACELP/TCX
Zolnay et al.	2005	Acoustic feature combination for robust speech recognition
EP3029670B1 (de)	2021-12-01	Determination of a low-complexity weighting function for quantizing coefficients for linear predictive coding
EP1953737A1 (de)	2008-08-06	Transform coder and transform method
Wang et al.	1989	Phonetically-based vector excitation coding of speech at 3.6 kbps
US8909539B2 (en)	2014-12-09	Method and device for extending bandwidth of speech signal
EP3121813B1 (de)	2020-03-18	Noise suppression without side information for CELP coders
US10395665B2 (en)	2019-08-27	Apparatus and method determining weighting function for linear prediction coding coefficients quantization
WO1999026234A1 (en)	1999-05-27	Method and apparatus for pitch estimation using perception based analysis by synthesis
CN106104682A (zh)	2016-11-09	Weighting function determination device and method for quantizing linear predictive coding coefficients
EP1970893A1 (de)	2008-09-17	Method for estimating signal coding parameters
Ramabadran et al.	2001	Enhancing distributed speech recognition with back-end speech reconstruction.
KR101761820B1 (ko)	2017-07-26	Apparatus and method for determining a weighting function for LPC coefficient quantization
Ramabadran et al.	2004	The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction
Chen et al.	2008	Analysis-by-synthesis speech coding
Schroeder	2005	Parameter estimation in speech: a lesson in unorthodoxy
Addou et al.	2007	A noise-robust front-end for distributed speech recognition in mobile communications
El-Maleh	2004	Classification-based Techniques for Digital Coding of Speech-plus-noise
KR101867596B1 (ko)	2018-06-14	Apparatus and method for determining a weighting function for LPC coefficient quantization
Bhaskar et al.	2006	Low bit-rate voice compression based on frequency domain interpolative techniques
Rämö et al.	2004	Segmental speech coding model for storage applications.
Beritelli et al.	1996	A simple and efficient two-band speech coder at 2.4 kbit/s for real-time implementation on a single low-cost DSP