EP1431962B1 - Wideband speech coding device and method - Google Patents
Wideband speech coding device and method
- Publication number
- EP1431962B1 (application EP04100553A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- highband
- lowband
- khz
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- The present invention relates to electronic devices and, more particularly, to speech coding, transmission, storage, and decoding/synthesis methods and systems.
- The performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications.
- Both dedicated channel and packetized-over-network (VoIP) transmission benefit from compression of speech signals.
- The widely used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech.
- M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network (PSTN) sampling for digital transmission); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames).
- Various windowing operations may be applied to the samples of the input speech frame.
- Minimizing Σ r(n)² yields the {a(j)} that furnish the best linear prediction.
- The coefficients {a(j)} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage.
- The {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1).
- The LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an LP excitation from the encoded parameters.
- The LP compression approach basically transmits/stores only updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and the (quantized) gain.
- A receiver regenerates the speech with the same perceptual characteristics as the input speech.
- Figure 9 shows the blocks in an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
- The ITU standard G.729 Annex E with a bit rate of 11.8 kb/s uses LP analysis with codebook excitation (CELP) to compress voiceband speech and has performance comparable to the 64 kb/s PCM used for PSTN digital transmission.
- CELP codebook excitation
- Another approach uses split-band CELP or MPLPC by coding a 4-8 kHz highband separately from the 0-4 kHz lowband and with fewer bits allocated to the highband; see Drogo de Jacovo et al, Some Experiments of 7 kHz Audio Coding at 16 kbit/s, IEEE ICASSP 1989, pp.192-195. Similarly, Tucker, Low Bit-Rate Frequency Extension Coding, IEE Colloquium on Audio and Music Technology 1998, pp.3/1-3/5, provides standard coding of the lowband 0-4 kHz plus codes the 4-8 kHz highband speech only for unvoiced frames (as determined in the lowband) and uses an LP filter of order 2-4 with noise excitation.
- The present invention provides a method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b) including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d) including said first method of quantization.
- A wideband speech decoder comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
- The preferred embodiment systems include preferred embodiment encoders and decoders that process a wideband speech frame as the sum of a lowband signal and a highband signal, in which the lowband signal has standalone speech encoding/decoding and the highband signal has encoding/decoding incorporating information from the lowband signal to modulate a noise excitation. This allows a minimal number of bits to sufficiently encode the highband and yields an embedded coder.
- Figure 1a shows in functional block format a first preferred embodiment system for wideband speech encoding, transmission (storage), and decoding including first preferred embodiment encoders and decoders.
- The encoders and decoders use CELP lowband encoding and decoding plus a highband encoding and decoding incorporating information from the (decoded) lowband for modulation of a noise excitation with LP coding.
- First preferred embodiment encoders proceed as follows.
- The baseband of the decimated highband has a reversed spectrum because the baseband is an aliased image; see Figure 3b.
- Encode the first baseband (decimated lowband) signal with a (standard) narrowband speech coder.
- Decoding reverses the encoding process by separating the highband and lowband code, using information from the decoded lowband to help decode the highband, and adding the decoded highband to the decoded lowband speech to synthesize wideband speech. See Figure 1c.
- This split-band approach allows most of the code bits to be allocated to the lowband; for example, the lowband may consume 11.8 kb/s and the highband may add 2.2 kb/s for a total of 14 kb/s.
- Figures 2a-2b illustrate the typical magnitudes of voiced and unvoiced speech, respectively, as functions of frequency over the range 0-8 kHz.
- The bulk of the energy in voiced speech resides in the 0-3 kHz band.
- The pitch structure (the fundamental frequency is about 125 Hz in Figure 2a) clearly appears in the range 0-3.5 kHz and persists (although jumbled) at higher frequencies.
- The perceptual critical bandwidth at higher frequencies is roughly 10% of a band center frequency, so the individual pitch harmonics become indistinguishable and should require fewer bits for inclusion in a highband code.
- The higher band (above 4 kHz) should require fewer bits to encode than the lower band (0-4 kHz).
- This underlies the preferred embodiment methods of partitioning wideband (0-8 kHz) speech into a lowband (0-4 kHz) and a highband (4-8 kHz), recognizing that the lowband may be encoded by any convenient narrowband coder, and separately coding the highband with a relatively small number of bits as described in the following sections.
- Figure 1b illustrates the flow of a first preferred embodiment speech coder which encodes at 14 kb/s with the following steps.
- A first preferred embodiment decoding method essentially reverses the encoding steps for a bitstream encoded by the first preferred embodiment method.
- For a coded frame in the bitstream:
- FIGS 8-9 show in functional block form preferred embodiment systems that use the preferred embodiment encoding and decoding.
- The encoding and decoding can be performed with digital signal processors (DSPs), general-purpose programmable processors, application-specific circuitry, or systems on a chip, such as a DSP and a RISC processor on the same chip with the RISC processor controlling.
- Codebooks would be stored in memory at both the encoder and decoder, and a stored program in an onboard ROM or external flash EEPROM for a DSP or programmable processor could perform the signal processing.
- Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
- The encoded speech can be packetized and transmitted over networks such as the Internet.
- The preferred embodiments may be modified in various ways while retaining the features of separately coding a lowband from a wideband signal and using information from the lowband to help encode the highband (remainder of the wideband) and/or using spectrum reversal for decimated highband LP coefficient quantization in order to obtain efficiency comparable to that for the lowband LP coefficient quantization.
- The upper (2.8-3.8 kHz) portion of the lowband (0-4 kHz) could be replaced by some other portion(s) of the lowband for use as a modulation for the highband excitation.
- The wideband may be partitioned into a lowband plus two or more highbands; the lowband coder could be a parametric or even non-LP coder and a highband coder could be a waveform coder; and so forth.
- The scope of the invention is limited only by the appended claims.
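The LP analysis summarized earlier in this description (a frame of samples {s(n)}, coefficients {a(j)} of order M, residual {r(n)}) can be illustrated with a minimal sketch. It assumes the standard autocorrelation method solved by the Levinson-Durbin recursion and the usual convention r(n) = s(n) - Σ a(j) s(n-j); the description does not state which solver or windowing the embodiments use. The function names `lp_coefficients` and `lp_residual` are hypothetical.

```python
import numpy as np


def lp_coefficients(frame, order=10):
    """Estimate a(1..order) so that s(n) is predicted by sum_j a(j) * s(n - j).

    Assumption: autocorrelation method with the Levinson-Durbin recursion.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags R(0)..R(order) of the frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused; a[j] holds a(j)
    err = r[0]               # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        # Reflection coefficient for order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(frame, a):
    """Residual r(n) = s(n) - sum_j a(j) * s(n - j), assuming zero history before the frame."""
    frame = np.asarray(frame, dtype=float)
    analysis_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # A(z)
    return np.convolve(frame, analysis_filter)[: len(frame)]
```

With an 8 kHz sampling rate, a 20 ms frame would be passed in as 160 samples with `order` set to 10-12, matching the values quoted in the description.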
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (4)
- A method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b), including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d), including said first method of quantization.
- A method of wideband speech decoding, comprising: (a) decoding a first portion of an input signal as a lowband speech signal, including use of a first codebook; (b) decoding a second portion of an input signal as a highband speech signal, including use of said first codebook; and (c) combining the results of the foregoing steps (a) and (b) to form a decoded wideband speech signal.
- A wideband speech encoder, comprising: (a) a lowband filter and a highband filter for digital speech; (b) a first encoder with an input from said lowband filter, said first encoder using a first quantizer; (c) a second encoder with an input from said highband filter, said second encoder including said first quantizer; and (d) a combiner for said first encoder and said second encoder, to output coded wideband speech.
- A wideband speech decoder, comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
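Steps (b) and (d) of the coding method can be illustrated on a single test tone. The sketch below assumes a 16 kHz wideband sampling rate (for the 0-8 kHz band), decimation by simply keeping every second sample of an already band-limited signal, and spectrum reversal by multiplying with (-1)^n. These choices and the function names are illustrative assumptions; the text above does not specify how the reversal is carried out.

```python
import numpy as np

FS_WIDE = 16000  # assumed wideband sampling rate in Hz (covers 0-8 kHz)


def peak_frequency(x, fs):
    """Frequency in Hz of the largest-magnitude bin of the real FFT of x."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


def decimate_by_two(x):
    """Keep every second sample; assumes x is already limited to one half-band."""
    return x[::2]


def reverse_spectrum(x):
    """Multiply by (-1)^n, which maps a component at f to fs/2 - f."""
    signs = np.where(np.arange(len(x)) % 2 == 0, 1.0, -1.0)
    return x * signs


# A 5 kHz tone lies in the highband, 1 kHz above its lower edge at 4 kHz.
n = np.arange(1600)
tone = np.cos(2 * np.pi * 5000 * n / FS_WIDE)

decimated = decimate_by_two(tone)            # now sampled at 8 kHz
reversed_band = reverse_spectrum(decimated)  # spectrum flipped back

for label, signal, fs in [
    ("wideband", tone, FS_WIDE),
    ("decimated highband", decimated, FS_WIDE // 2),
    ("after reversal", reversed_band, FS_WIDE // 2),
]:
    print(label, peak_frequency(signal, fs))
```

After decimation the tone appears near 3 kHz, the mirrored (aliased) position. After the reversal it appears near 1 kHz, its offset from the bottom of the original highband, so low frequencies of the highband again sit at low frequencies of the baseband signal.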
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US20615600P | 2000-05-22 | 2000-05-22 | |
| US206156P | 2000-05-22 | | |
| EP01000172A EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Wideband speech coding device and method |
Related Parent Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP01000172A Division EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Wideband speech coding device and method |
| EP01000172A Division-Into EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Wideband speech coding device and method |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| EP1431962A2 (fr) | 2004-06-23 |
| EP1431962A3 (fr) | 2004-12-01 |
| EP1431962B1 (fr) | 2006-04-05 |
Family
ID=32395343
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP04100553A Expired - Lifetime EP1431962B1 (fr) | 2000-05-22 | 2001-05-22 | Wideband speech coding device and method |
Country Status (1)
| Country | Link |
|---|---|
| EP (1) | EP1431962B1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8996362B2 (en) | 2008-01-31 | 2015-03-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Device and method for a bandwidth extension of an audio signal |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101006495A (zh) | 2004-08-31 | 2007-07-25 | 松下电器产业株式会社 | Speech encoding device, speech decoding device, communication device, and speech encoding method |
| EP1793373A4 (fr) | 2004-09-17 | 2008-10-01 | Matsushita Electric Industrial Co Ltd | Audio encoding apparatus, audio decoding apparatus, communication apparatus, and audio encoding method |
| CN101023471B (zh) | 2004-09-17 | 2011-05-25 | 松下电器产业株式会社 | Scalable encoding device, scalable decoding device, scalable encoding method, scalable decoding method, communication terminal device, and base station device |
| WO2008081777A1 (fr) | 2006-12-25 | 2008-07-10 | Kyushu Institute Of Technology | High-frequency signal interpolation device and high-frequency signal interpolation method |
| RU2432624C1 (ru) * | 2010-04-21 | 2011-10-27 | Государственное образовательное учреждение высшего профессионального образования Академия Федеральной службы охраны Российской Федерации (Академия ФСО России) | Method for reducing data volume in wideband coding of a speech signal |
-
2001
- 2001-05-22 EP EP04100553A patent/EP1431962B1/fr not_active Expired - Lifetime
Also Published As
| Publication number | Publication date |
|---|---|
| EP1431962A3 (fr) | 2004-12-01 |
| EP1431962A2 (fr) | 2004-06-23 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US7330814B2 (en) | Wideband speech coding with modulated noise highband excitation system and method | |
| US7136810B2 (en) | Wideband speech coding system and method | |
| US6795805B1 (en) | Periodicity enhancement in decoding wideband signals | |
| JP4662673B2 (ja) | Gain smoothing in wideband speech and audio signal decoder | |
| US8374853B2 (en) | Hierarchical encoding/decoding device | |
| CA2862715C (fr) | Multi-mode audio codec and CELP coding adapted to that codec | |
| US6260009B1 (en) | CELP-based to CELP-based vocoder packet translation | |
| US8260620B2 (en) | Device for perceptual weighting in audio encoding/decoding | |
| US7184953B2 (en) | Transcoding method and system between CELP-based speech codes with externally provided status | |
| US8532998B2 (en) | Selective bandwidth extension for encoding/decoding audio/speech signal | |
| EP1273005B1 (fr) | Wideband speech codec using different sampling rates | |
| RU2584463C2 (ru) | Low-delay audio coding comprising alternating predictive coding and transform coding | |
| KR20090104846A (ko) | Improved coding/decoding of digital audio signals | |
| EP0981816A1 (fr) | Audio coding methods and systems | |
| EP1158495B1 (fr) | Wideband speech coding device and method | |
| EP1222659A1 (fr) | Linear predictive coding (LPC) harmonic vocoder with superframe structure | |
| US6847929B2 (en) | Algebraic codebook system and method | |
| US6687667B1 (en) | Method for quantizing speech coder parameters | |
| US20040111257A1 (en) | Transcoding apparatus and method between CELP-based codecs using bandwidth extension | |
| KR100499047B1 (ko) | Apparatus and method for transcoding between CELP-based codecs having different bandwidths | |
| EP1431962B1 (fr) | Wideband speech coding device and method | |
| CN101405792A (zh) | Method for post-processing a signal in an audio decoder | |
| JP3092653B2 (ja) | Wideband speech encoding device, speech decoding device, and speech encoding/decoding device | |
| Schnitzler | A 13.0 kbit/s wideband speech codec based on SB-ACELP | |
| US6801887B1 (en) | Speech coding exploiting the power ratio of different speech signal components | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AC | Divisional application: reference to earlier application | Ref document number: 1158495; Country of ref document: EP; Kind code of ref document: P |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
| | PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
| | AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
| | RIC1 | Information provided on ipc code assigned before grant | Ipc: 7G 10L 21/02 A; Ipc: 7G 10L 19/02 B |
| | 17P | Request for examination filed | Effective date: 20050601 |
| | AKX | Designation fees paid | Designated state(s): DE FR GB |
| | GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| | GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| | GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| | AC | Divisional application: reference to earlier application | Ref document number: 1158495; Country of ref document: EP; Kind code of ref document: P |
| | AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB |
| | REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
| | REF | Corresponds to: | Ref document number: 60118627; Country of ref document: DE; Date of ref document: 20060518; Kind code of ref document: P |
| | ET | Fr: translation filed | |
| | PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
| | 26N | No opposition filed | Effective date: 20070108 |
| | REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 16 |
| | REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 17 |
| | PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB, Payment date: 20170426, Year of fee payment: 17; Ref country code: FR, Payment date: 20170418, Year of fee payment: 17; Ref country code: DE, Payment date: 20170531, Year of fee payment: 17 |
| | REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 60118627; Country of ref document: DE |
| | GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20180522 |
| | PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB, LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES, Effective date: 20180522; Ref country code: DE, LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES, Effective date: 20181201; Ref country code: FR, LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES, Effective date: 20180531 |