EP1521243A1 - Speech coding method applying noise reduction by modifying the codebook gain - Google Patents

Speech coding method applying noise reduction by modifying the codebook gain

Info

Publication number: EP1521243A1
Authority: EP; European Patent Office
Prior art keywords: signal; gain; speech; fixed gain; noise
Prior art date: 2003-10-01
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Withdrawn

Application number

EP03022251A

Other languages

English (en)

French (fr)

Inventor

Christophe Dr. Beaugeant

Nicolas Dütsch

Herbert Dr. Heiss

Hervé Dr. Taddei

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Siemens AG

Siemens Corp

Original Assignee

Siemens AG

Siemens Corp

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

2003-10-01

Filing date

2003-10-01

Publication date

2005-04-06

2003-10-01 Application filed by Siemens AG, Siemens Corp

2003-10-01 Priority to EP03022251A (patent EP1521243A1)

2004-08-17 Priority to PCT/EP2004/051810 (patent WO2005031709A1)

2005-04-06 Publication of EP1521243A1

Status: Withdrawn

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02168—Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses

Definitions

The invention refers to a speech coding method applying noise reduction.
Noise reduction methods have been developed in speech processing. Most of the methods are performed in the frequency domain. They commonly comprise three major components:
The suppression rule modifies only the spectral amplitude, not the phase. It has been shown that there is no need to modify the phase in speech enhancement processing. This approximation is, however, only valid for a Signal to Noise Ratio (SNR) greater than 6 dB, a condition that is supposed to be satisfied in the majority of noise reduction algorithms.
A scheme of the treatment of a speech signal with noise reduction is depicted in Fig. 1.
The speech component $s(p)$, where $p$ denotes a time interval, is superimposed with a noise component $n(p)$. This results in the total signal $y(p)$.
The total signal $y(p)$ undergoes an FFT.
The result is a set of Fourier components $Y(p, f_k)$, where $f_k$ denotes a quantized frequency.
The noise reduction NR is applied, thus producing modified Fourier components $\hat{S}(p, f_k)$. After an IFFT, this leads to a clean speech signal estimate $\hat{s}(p)$.
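The processing chain of Fig. 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the suppression rule (plain magnitude subtraction with a zero floor), the naive DFT helpers and the names `dft`, `idft`, `spectral_weighting` and `noise_mag` are assumptions made for the example. Only the spectral amplitude is modified; the phase of Y(p, f_k) is reused unchanged.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stands in for the FFT of Fig. 1)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive inverse transform; returns the real part of each sample."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_weighting(y, noise_mag):
    """Return an estimate of the clean frame from the noisy frame y.

    noise_mag[k] is an assumed per-bin estimate of the noise magnitude.
    The rule used here (magnitude subtraction) is only one possible choice.
    """
    Y = dft(y)
    S_hat = []
    for Y_k, N_k in zip(Y, noise_mag):
        magnitude = max(abs(Y_k) - N_k, 0.0)  # modify the amplitude only
        phase = cmath.phase(Y_k)              # keep the noisy phase
        S_hat.append(cmath.rect(magnitude, phase))
    return idft(S_hat)
```

With an all-zero noise estimate the frame passes through unchanged, which is a quick sanity check for the transform pair.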
A problem of any spectral weighting noise reduction method is its computational complexity, e.g. if decoding, FFT, speech enhancement, IFFT and encoding have to be performed successively.
In a method for transmitting speech data, said speech data are encoded by using an analysis-through-synthesis method.
A synthesised signal is produced for approximating the original signal.
The production of the synthesised signal is performed by using at least a fixed codebook with a respective fixed gain and optionally an adaptive codebook and an adaptive gain. The entries of the codebook and the gain are chosen such that the synthesised signal resembles the original signal.
Parameters describing these quantities are transmitted from a sender to a receiver, e.g. from a near-end speaker to a far-end speaker or vice versa.
The invention is based on the idea of modifying the fixed gain determined for the signal containing a noise component and a speech component. The objective of this modification is to obtain a useful estimate of the fixed gain of the speech component or clean signal.
The modification is done by subtracting an estimate of the fixed gain of the noise component.
The fixed gain of the noise component may be derived from an analysis of the power of the signal in a predetermined time window.
One advantage of this procedure is its low computational complexity, particularly if the speech enhancement through noise reduction is done independently of an encoding/decoding unit, e.g. at a certain position within a network, where a noise reduction method in the time domain would require all the steps of decoding, FFT, speech enhancement, IFFT and encoding to be performed one after the other. This is not necessary for a noise reduction method based on the modification of parameters.
Another advantage is that, by using the parameters for any modification, a repeated encoding and decoding process, the so-called "tandeming", can be avoided, because the modification takes place in the parameter itself. Any tandeming decreases the speech quality. Furthermore, the delay due to the additional encoding/decoding, which in GSM, for example, is typically 5 ms, can be avoided.
The procedure is furthermore applicable within a communications network.
An encoding apparatus set up for performing the above-described encoding method includes at least a processing unit.
The encoding apparatus may be part of a communications device, e.g. a cellular phone, or it may be situated in a communication network or a component thereof.
The AMR codec consists of a multi-rate speech codec (that is, it can switch between the following bit rates: 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s), a source-controlled rate scheme including a Voice Activity Detection (VAD), a comfort noise generation system and an error concealment mechanism to compensate for the effects of transmission errors.
Fig. 2 shows the scheme of the AMR encoder. It uses an LTP (long-term prediction) filter, which is transformed to an equivalent structure called the adaptive codebook. This codebook stores former LPC-filtered excitation signals. Instead of subtracting a long-term prediction as the LTP filter does, an adaptive codebook search is done to get an excitation vector from further LPC-filtered speech samples. The amplitude of this excitation is adjusted by a gain factor $g_a$.
The encoder transforms the speech signal to parameters which describe the speech.
These parameters, namely the LSF (or LPC) coefficients, the lag of the adaptive codebook, the index of the fixed codebook and the codebook gains, are referred to as "speech coding parameters".
The domain will be called the "(speech) codec parameter domain", and the signals of this domain are subscripted with frame index $k$.
Fig. 3 shows the signal flow of the decoder.
The decoder receives the speech coding parameters and computes the excitation signal of the synthesis filter.
This excitation signal is the sum of the excitations of the fixed and adaptive codebooks, each scaled with its respective gain factor.
The speech signal is then post-processed.
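The excitation computed by the decoder can be illustrated with a short sketch. The function name `excitation`, the argument names and the symbol `g_f` for the fixed codebook gain are hypothetical, introduced only for this example; the text itself names only the adaptive gain g_a. The synthesis filter and the post-processing are left out.

```python
def excitation(adaptive_vec, fixed_vec, g_a, g_f):
    """Sum of the adaptive and fixed codebook vectors, each scaled by its gain.

    adaptive_vec: excitation vector taken from the adaptive codebook
    fixed_vec:    excitation vector taken from the fixed codebook
    g_a, g_f:     adaptive and fixed codebook gains (g_f is an assumed name)
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("codebook vectors must have the same length")
    return [g_a * a + g_f * c for a, c in zip(adaptive_vec, fixed_vec)]


# Example sub-frame of two samples.
print(excitation([1.0, 2.0], [0.5, -0.5], g_a=0.5, g_f=2.0))
```

Because the fixed gain enters the excitation as a plain scale factor, changing it in the parameter domain changes the decoded signal without decoding and re-encoding the speech.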
A (total) signal containing clean speech, i.e. a speech component, and a noise component is encoded.
A fixed gain $g_y(m)$ of the total signal is calculated.
This fixed gain $g_y(m)$ of the total signal is subject to a gain modification which is based on a noise gain estimation.
An estimate $\hat{g}_n(m)$ of the fixed gain of the noise component is determined, which is used for the gain modification.
The result of the gain modification is an estimate $\hat{g}_s(m)$ of the fixed gain of the clean speech or the speech component.
This parameter is transmitted from a sender to a receiver. At the receiver side it is decoded.
$\hat{g}_s(m) = g_y(m) - \hat{g}_n(m)$, where $m$ denotes a time interval, e.g. a frame or a subframe, $\hat{g}_n(m)$ the estimate of the fixed gain of the noise component and $\hat{g}_s(m)$ the estimate of the clean codebook gain. How the estimate $\hat{g}_n(m)$ of the fixed gain of the noise component can be calculated is described in the next section with reference to a different embodiment.
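The gain modification amounts to one subtraction per (sub)frame, as the following sketch shows. The function name `modify_fixed_gain` and the `floor` parameter are assumptions of this example: the text only states the subtraction, and the floor is added here so that an overestimated noise gain cannot produce a negative result.

```python
def modify_fixed_gain(g_y, g_n_hat, floor=0.0):
    """Estimate the clean-speech fixed gain for one (sub)frame m.

    g_y:     fixed gain of the total (noisy) signal
    g_n_hat: estimate of the fixed gain of the noise component
    floor:   assumed lower bound, not taken from the source text
    """
    return max(g_y - g_n_hat, floor)


# One value per subframe; the noise gain estimate is held constant here.
noisy_gains = [2.0, 0.75, 0.25]
clean_gains = [modify_fixed_gain(g, 0.5) for g in noisy_gains]
print(clean_gains)
```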
The window of length $D$ is divided into $U$ sub-windows of length $V$.
The minimum value in the window of length $D$ is the minimum of the set of minima of the sub-windows.
A buffer Min_I of $U$ elements contains the set of minima from the last $U$ sub-windows. It is renewed each time $V$ values of $P$ have been computed: the oldest element of the buffer is deleted and replaced by the minimum of the last $V$ values of $P$.
The minimum over the window of length $D$, $\hat{\sigma}_N^2$, for each sub-frame $m$ is the minimum of the buffer minimum and the last computed value of $P$.
$\hat{\sigma}_N^2$ can be increased by a gain parameter omin to compensate for the bias of the estimation.
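The sub-window minimum search can be sketched as below. Several points are assumptions of this example rather than statements of the source: P is taken to be a per-sub-frame power value, omin is applied as a multiplicative factor, and the class and attribute names (`MinimumTracker`, `min_buffer`, `current`) are hypothetical. The buffer Min_I is modelled with a bounded deque, so appending a new sub-window minimum automatically drops the oldest one.

```python
from collections import deque


class MinimumTracker:
    """Tracks the minimum of P over a window of D = U * V values."""

    def __init__(self, U, V, omin=1.0):
        self.V = V
        self.omin = omin                    # assumed multiplicative bias compensation
        self.min_buffer = deque(maxlen=U)   # Min_I: minima of the last U sub-windows
        self.current = []                   # values of P in the sub-window being filled

    def update(self, P):
        """Feed one value of P and return the compensated minimum."""
        self.current.append(P)
        if len(self.current) == self.V:
            # A sub-window is complete: store its minimum, oldest entry drops out.
            self.min_buffer.append(min(self.current))
            self.current = []
        # Minimum between the buffer minimum and the last value of P.
        return self.omin * min(list(self.min_buffer) + [P])


tracker = MinimumTracker(U=2, V=2)
estimates = [tracker.update(p) for p in [5.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 2.0]]
print(estimates)
```

Only U stored minima and at most V pending values are kept, so the cost per sub-frame stays small compared with a search over all D values.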
A bias might be due to a continued overestimation of the noise, e.g. if a continually present murmuring is considered as noise only.
The noise reduction may cause some artefacts during voice activity periods, e.g. the speech signal may be attenuated due to an overestimation of the noise component.

Landscapes

Engineering & Computer Science (AREA)
Computational Linguistics (AREA)
Signal Processing (AREA)
Health & Medical Sciences (AREA)
Audiology, Speech & Language Pathology (AREA)
Human Computer Interaction (AREA)
Physics & Mathematics (AREA)
Acoustics & Sound (AREA)
Multimedia (AREA)
Quality & Reliability (AREA)
Compression, Expansion, Code Conversion, And Decoders (AREA)

EP03022251A 2003-10-01 2003-10-01 Speech coding method applying noise reduction by modifying the codebook gain Withdrawn EP1521243A1 (de)

Priority Applications (2)

Application Number	Priority Date	Filing Date	Title
EP03022251A EP1521243A1 (de)	2003-10-01	2003-10-01	Speech coding method applying noise reduction by modifying the codebook gain
PCT/EP2004/051810 WO2005031709A1 (en)	2003-10-01	2004-08-17	Speech coding method applying noise reduction by modifying the codebook gain

Applications Claiming Priority (1)

Application Number	Priority Date	Filing Date	Title
EP03022251A EP1521243A1 (de)	2003-10-01	2003-10-01	Speech coding method applying noise reduction by modifying the codebook gain

Publications (1)

Publication Number	Publication Date
EP1521243A1 (de)	2005-04-06

Family

ID=34306818

Family Applications (1)

Application Number	Priority Date	Filing Date	Title
EP03022251A Withdrawn EP1521243A1 (de)	2003-10-01	2003-10-01	Speech coding method applying noise reduction by modifying the codebook gain

Country Status (2)

Country	Link
EP (1)	EP1521243A1 (de)
WO (1)	WO2005031709A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
EP3079151A1 (de)	2015-04-09	2016-10-12	Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.	Audio encoder and method for encoding an audio signal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
US20020184010A1 (en) *	2001-03-30	2002-12-05	Anders Eriksson	Noise suppression
EP1301018A1 (de) *	2001-10-02	2003-04-09	Alcatel	Method and device for modifying a digital signal in the coded domain

2003
- 2003-10-01 EP EP03022251A patent/EP1521243A1/de not_active Withdrawn
2004
- 2004-08-17 WO PCT/EP2004/051810 patent/WO2005031709A1/en not_active Ceased

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANDRAN R ET AL: "COMPRESSED DOMAIN NOISE REDUCTION AND ECHO SUPPRESSION FOR NETWORK SPEECH ENHANCEMENT", PROCEEDINGS OF THE 43RD. IEEE MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS. MWSCAS 2000. LANSING, MI. NEW YORK, NY: IEEE, US, vol. 1 OF 3, 8 August 2000 (2000-08-08) - 11 August 2000 (2000-08-11), pages 10 - 13, XP002951730, ISBN: 0-7803-6476-7 *
LIM J S ET AL: "ENHANCEMENT AND BANDWIDTH COMPRESSION OF NOISY SPEECH", PROCEEDINGS OF THE IEEE, IEEE. NEW YORK, US, vol. 67, no. 12, December 1979 (1979-12-01), pages 1586 - 1604, XP000891496, ISSN: 0018-9219 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
RU2437172C1 (ru) *	2007-11-04	2011-12-20	Qualcomm Incorporated	Method for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs
US8515767B2 (en)	2007-11-04	2013-08-20	Qualcomm Incorporated	Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs

Also Published As

Publication number	Publication date
WO2005031709A1 (en)	2005-04-07

Legal Events

Date	Code	Title	Description
2005-02-18	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2005-04-06	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR
2005-04-06	AX	Request for extension of the european patent	Extension state: AL LT LV MK
2005-12-28	AKX	Designation fees paid
2006-02-02	REG	Reference to a national code	Ref country code: DE Ref legal event code: 8566
2006-05-19	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
2006-06-21	18D	Application deemed to be withdrawn	Effective date: 20051007

Publication	Publication Date	Title
EP1363273B1 (de)	2009-04-01	Speech transmission system and method for handling lost data frames
EP0743634B1 (de)	1999-10-06	Method for adapting the noise masking level in an analysis-by-synthesis speech coder with a perceptual short-term filter
US7529660B2 (en)	2009-05-05	Method and device for frequency-selective pitch enhancement of synthesized speech
EP1338003B1 (de)	2006-10-18	Gain factor quantization for a CELP speech coder
US7379866B2 (en)	2008-05-27	Simple noise suppression model
US7606703B2 (en)	2009-10-20	Layered CELP system and method with varying perceptual filter or short-term postfilter strengths
EP1313091A2 (de)	2003-05-21	Method for analysis, synthesis and quantization of speech
EP0899718B1 (de)	2003-12-10	Nonlinear filter for noise suppression in linear prediction speech coding devices
US5884251A (en)	1999-03-16	Voice coding and decoding method and device therefor
US10672411B2 (en)	2020-06-02	Method for adaptively encoding an audio signal in dependence on noise information for higher encoding accuracy
EP1301018A1 (de)	2003-04-09	Method and device for modifying a digital signal in the coded domain
EP1521243A1 (de)	2005-04-06	Speech coding method applying noise reduction by modifying the codebook gain
EP1521242A1 (de)	2005-04-06	Speech coding method applying noise reduction by modifying the codebook gain
EP1521241A1 (de)	2005-04-06	Transmission of speech coding parameters with echo cancellation
Lee	1999	An enhanced ADPCM coder for voice over packet networks
EP1944761A1 (de)	2008-07-16	Interference reduction in digital signal processing
EP0984433A2 (de)	2000-03-08	Noise suppression in a speech communication unit and method of operation
JP2003029798A (ja)	2003-01-31	Acoustic signal encoding method, acoustic signal decoding method, apparatuses therefor, programs therefor, and recording medium therefor
KR20110124528A (ko)	2011-11-17	Signal pre-processing method and apparatus for high-quality encoding in a speech coder