JPH05507796A

JPH05507796A - Method and apparatus for low-throughput encoding of speech

Info

Publication number: JPH05507796A
Application number: JP91508756A
Authority: JP
Inventors: Mouy, Benoît; Laurent, Pierre-André
Original assignee: Thomson-CSF
Priority date: 1990-04-27
Filing date: 1991-04-19
Publication date: 1993-11-04
Also published as: CA2079884A1; FR2661541A1; EP0454552A3; EP0454552A2; WO1991017541A1

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention] Method and apparatus for low-throughput encoding of speech. The present invention relates to a method and an apparatus for low-throughput encoding of speech.

The invention applies in particular to the production of vocoders for HF radio links or of vocoders used for voice communications.

In these fields, the amount of information to be transmitted is gradually reaching the technical limits of the equipment capable of transmitting speech. For example, for communications whose throughput is below 2400 bits per second, known encoding techniques (MIC, DELTA, NLP, etc.) are no longer suitable, and the speech signal can no longer be transmitted by its waveform. Such communication requires the higher-performance encoding techniques of vocoders. Most very-low-throughput vocoders therefore use vector quantization techniques for the digital filters that model the vocal tract. This modeling is done by searching a dictionary (codebook) for reference vectors. However, this technique, which is very complex and costly to implement, does not allow accurate quantization of the speech signal. Further difficulties arise because the signal energy is poorly represented and therefore poorly encoded, with the result that sudden variations in the amplitude of the speech signal can no longer be properly restored.

The aim of the invention is to alleviate the above-mentioned drawbacks.

To this end, the invention provides a method for low-throughput encoding of speech. After the speech signal has been cut into frames of fixed length, and in order to encode the characteristics of N filters modeling the vocal tract together with all the characteristics of the speech signal, namely fundamental period (pitch), voicing and energy, the method is characterized in that it calculates these characteristics at intervals of N consecutive frames, computing the energy of the speech signal a specific number P of times per frame. Other features and advantages of the invention will become apparent from the description given with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart illustrating the speech encoding method used by the present invention.

FIG. 2 is an explanatory diagram showing the method of encoding the LSP coefficients of the analysis filter, computed in FIG. 1, that models the vocal tract.

FIG. 3 is a table of LSP coefficients.

FIG. 4 is a graph of frame encoding by interpolation.

FIG. 5 is a "pitch" encoding table.

FIG. 6 is a flowchart illustrating the speech signal synthesis method used by the present invention.

FIG. 7 is a graph illustrating the synthesis filter interpolation method used by the present invention.

FIG. 8 is an explanatory diagram showing an embodiment of an apparatus for carrying out the method according to the invention.

The encoding method according to the invention consists in determining and encoding the characteristics of the speech signal for every N consecutive frames, after the speech signal has been cut into frames of fixed length of about 20 to 25 ms, the signal energy being determined P times per frame.
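A minimal Python sketch of the framing described above, assuming an 8 kHz sampling rate (not stated in the text), the 22.5 ms frame length used in the 800 bit/s example later in this description, and N = 3. The helper names are hypothetical, and a trailing partial frame or group is simply dropped.

```python
def split_frames(samples, fs=8000, frame_ms=22.5):
    """Cut a sampled signal into consecutive, non-overlapping frames of fixed length.

    A trailing partial frame is discarded (assumption).
    """
    frame_len = int(round(fs * frame_ms / 1000.0))  # 180 samples at 8 kHz and 22.5 ms
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def group_frames(frames, n=3):
    """Group consecutive frames into blocks of N frames that are analyzed and coded together."""
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]


signal = [0.0] * 8000            # one second of silence at 8 kHz
frames = split_frames(signal)    # 44 frames of 180 samples
groups = group_frames(frames)    # 14 groups of 3 frames; the last 2 frames are left over
```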

The speech signal of each frame is synthesized after deframing and decoding of the encoded values of the speech signal characteristics.

Typical steps of the encoding method according to the invention, applied when N = 3 consecutive frames are analyzed, are shown in the flowchart of FIG. 1. In this flowchart, the method begins, in steps 1 to 6, by calculating the "LSP" coefficients of the first frame to be analyzed, "LSP" being the English abbreviation of "Line Spectrum Pairs" of the analysis filter that models the vocal tract. This calculation may be carried out, for example, according to the known method described in the paper by Peter KABAL and Ravi PRAKASH entitled "The computation of line spectral frequencies using Chebyshev polynomials", published in IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, Dec. 1986.

After the speech signal of each frame has been sampled and the samples quantized over a specific number of bits, these samples are pre-emphasized in step 3.

Because sampling makes the spectrum of the speech signal periodic, the number of samples taken into account to determine the coefficients of the vocal tract modeling filter is limited, in a known manner, by multiplying the samples pre-emphasized in step 3 by a Hamming window whose duration equals that of one frame. This window also has the advantage of reinforcing the resonances.
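The pre-emphasis of step 3 followed by Hamming windowing can be sketched as follows. The text does not give the pre-emphasis filter, so the first-order form y[n] = x[n] - mu * x[n-1] with mu = 0.9375 is an assumption; the window is the standard symmetric Hamming window spanning one frame.

```python
import math


def pre_emphasize(x, mu=0.9375):
    """First-order pre-emphasis y[n] = x[n] - mu * x[n-1]; mu is an assumed value.

    Applied per frame for simplicity, so the first sample passes through unchanged.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]


def hamming(length):
    """Symmetric Hamming window w[n] = 0.54 - 0.46 cos(2 pi n / (L - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]


def window_frame(frame, mu=0.9375):
    """Pre-emphasize one frame, then multiply it sample by sample by a one-frame Hamming window."""
    y = pre_emphasize(frame, mu)
    return [a * b for a, b in zip(y, hamming(len(y)))]
```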

In step 5, the coefficients k_i of the vocal tract modeling filter are calculated from the autocorrelation coefficients R_i defined by a relation (1) of the following form:

In the above relation, i is an integer ranging, for example, over [1, 10], and S_n represents the pre-emphasized and windowed signal samples.

The coefficients k_i can be calculated in step 5 by applying the known LE ROUX-GUEGUEN algorithm. A description of this algorithm can be found in the paper entitled "A fixed point computation of partial correlation coefficients", IEEE Transactions on Acoustics, Speech, and Signal Processing, June 1977. This calculation amounts to inverting a square matrix whose elements are the coefficients R_i of relation (1).
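As a hedged illustration, the sketch below computes autocorrelation coefficients under the assumed form R_i = sum_n S_n S_(n+i) (relation (1) itself is not reproduced in this publication) and then obtains the reflection coefficients k_i. It swaps in the floating-point Levinson-Durbin recursion for the fixed-point Le Roux-Gueguen algorithm cited above: both solve the same Toeplitz system without explicitly inverting the matrix and give the same k_i in exact arithmetic, but the fixed-point behavior that motivates Le Roux-Gueguen is not reproduced here.

```python
def autocorrelation(s, order=10):
    """R_i = sum_n s[n] * s[n + i] for i = 0..order (assumed form of relation (1))."""
    return [sum(s[n] * s[n + i] for n in range(len(s) - i)) for i in range(order + 1)]


def reflection_coefficients(r):
    """Reflection (PARCOR) coefficients k_1..k_p by the Levinson-Durbin recursion.

    Stand-in for the fixed-point Le Roux-Gueguen algorithm cited in the text.
    Convention assumed: s_hat[n] = sum_j a_j * s[n - j], so A(z) = 1 - sum_j a_j z^-j.
    """
    order = len(r) - 1
    if r[0] <= 0.0:                      # silent frame: no meaningful prediction
        return [0.0] * order
    a = []                               # a[j] holds a_(j+1) for the current order
    err = r[0]                           # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks
```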

The transition from the reflection coefficients k_i to the prediction coefficients a_i takes place in step 8. This transition also uses the algorithm known as the Levinson algorithm, a description of which can be found in the paper entitled "The Wiener RMS (root mean square) error criterion in filter design and prediction" (J. Math. Phys., vol. 25, 1947).
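The conversion from reflection coefficients to prediction coefficients is the Levinson step-up recursion. A minimal sketch, assuming the predictor convention s_hat[n] = sum_j a_j s[n - j], that is A(z) = 1 - sum_j a_j z^-j; other texts flip the sign of k_i or a_i, and the function name is hypothetical.

```python
def step_up(ks):
    """Convert reflection coefficients k_1..k_p into prediction coefficients a_1..a_p.

    Order-i update: a_j <- a_j - k_i * a_(i-j) for j < i, then a_i <- k_i.
    """
    a = []
    for i, k in enumerate(ks, start=1):
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
    return a


a = step_up([0.5, 0.2])   # second-order predictor built from two reflection coefficients
```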

Finally, the LSP coefficients of the filter are calculated from two polynomials P and Q in the z-transform plane, where z is the complex variable of these polynomials and p is the order of the prediction filter A(z):

$$P(z) = A(z) - z^{-(p+1)}A(z^{-1}) \qquad (2)$$

$$Q(z) = A(z) + z^{-(p+1)}A(z^{-1}) \qquad (3)$$

When $e^{j\alpha_i}$ and $e^{j\beta_i}$ denote the roots of the polynomials P and Q, the LSP coefficients are, by definition, the frequencies corresponding to the arguments of these roots:

$$f_i = \frac{\alpha_i F_e}{2\pi} \qquad (5) \qquad\qquad g_i = \frac{\beta_i F_e}{2\pi} \qquad (6)$$

In this calculation, F_e represents the sampling frequency of the speech signal.
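The LSP definition above can be illustrated with the pure-Python sketch below. Rather than the Chebyshev-polynomial method cited earlier, it evaluates P and Q on the unit circle as real trigonometric sums and locates their zeros in (0, pi) by a grid search refined with bisection. The sign convention A(z) = 1 - sum_j a_j z^-j, the even filter order, the 8 kHz sampling rate and the grid size are assumptions.

```python
import math


def lsp_frequencies(a, fs=8000.0, grid=1000, iters=60):
    """Line spectral frequencies in Hz of A(z) = 1 - sum_j a_j z^-j, sorted ascending.

    Builds P(z) = A(z) - z^-(p+1) A(1/z) and Q(z) = A(z) + z^-(p+1) A(1/z), writes them on
    the unit circle as real sums of sines and cosines, then brackets each zero in (0, pi)
    on a grid and refines it by bisection. Assumes an even order p.
    """
    p = len(a)
    c = [1.0] + [-x for x in a] + [0.0]                  # A(z) coefficients, padded to p + 1
    pc = [c[k] - c[p + 1 - k] for k in range(p + 2)]     # antisymmetric P
    qc = [c[k] + c[p + 1 - k] for k in range(p + 2)]     # symmetric Q
    half = (p + 1) / 2.0

    def fp(w):  # e^{jw(p+1)/2} P(e^{jw}) / (2j), real-valued
        return sum(pc[k] * math.sin(w * (half - k)) for k in range(p // 2 + 1))

    def fq(w):  # e^{jw(p+1)/2} Q(e^{jw}) / 2, real-valued
        return sum(qc[k] * math.cos(w * (half - k)) for k in range(p // 2 + 1))

    roots = []
    ws = [math.pi * j / grid for j in range(1, grid)]   # skip the trivial zeros at 0 and pi
    for f in (fp, fq):
        for lo, hi in zip(ws, ws[1:]):
            if f(lo) * f(hi) < 0.0:
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
    return [w * fs / (2.0 * math.pi) for w in sorted(roots)]   # f = alpha * F_e / (2 pi)
```

For a flat filter (all a_j = 0, p = 10) the LSPs fall at multiples of F_e/22, which gives a simple sanity check.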

The frequencies f_i and g_i are stored in a memory (not shown), and the above calculation starts again on the samples of the next two frames. Once the parameters of three consecutive frames have been calculated and the three sets of coefficients stored, the method proceeds to their encoding in step 13.

The fundamental period (pitch) of the signal and the voicing are calculated in a known manner by executing steps 9 and 10.

During these steps, the speech signal is classified into two categories of sounds: voiced and unvoiced. Voiced sounds, produced by the vocal cords, are treated as a pulse train whose fundamental period is called the "pitch". Unvoiced sounds, produced by turbulence, are treated as white noise. Thus, in step 10 and for each frame, the method recognizes a voiced sound when the speech signal exhibits marked periodicity, and an unvoiced sound otherwise. This recognition takes place after preprocessing of the signal, which reinforces the useful information and limits the non-useful information. The preprocessing consists of a first low-pass filtering of the signal, followed by clipping and a second low-pass filtering. Since the fundamental frequency of speech varies between 50 and 400 Hz, the first filtering is performed, for example, by a simple third-order Butterworth filter whose 3 dB cutoff frequency can be set at 600 Hz. The clipping then sets to zero amplitude the signal samples whose level is below a predetermined threshold, which may be varied according to the amplitude of the speech signal. This clipping reinforces the periodic aspect of the signal while reducing the parts that are harmful to subsequent processing.

Finally, the second filtering smooths the result of the clipping by removing the high frequencies. A Butterworth filter identical to the first one can be used for this purpose.
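The clipping stage of this preprocessing can be sketched as follows. The text only says the threshold is predetermined and may vary with the signal amplitude; tying it to 30% of the frame's peak magnitude is an assumption, as is leaving samples at or above the threshold unchanged. The function name is hypothetical.

```python
def center_clip(x, ratio=0.3):
    """Zero every sample whose magnitude is below a threshold tied to the signal amplitude.

    Assumption: threshold = ratio * peak magnitude; other samples are kept unchanged.
    """
    peak = max((abs(v) for v in x), default=0.0)
    threshold = ratio * peak
    return [v if abs(v) >= threshold else 0.0 for v in x]
```

With SciPy available, the two third-order Butterworth low-pass stages with a 600 Hz cutoff could be designed with `scipy.signal.butter(3, 600, fs=8000)` and applied with `scipy.signal.lfilter`, again assuming 8 kHz sampling.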

The pitch and voicing are calculated in a known manner using the AMDF (Average Magnitude Difference Function). These calculations are carried out in the following five stages:

1. Calculate a preliminary voicing decision from the energy values, the modeling filter and the number of zero crossings of the signal.

2. Calculate a voicing threshold from the preliminary voicing decision, the low-frequency energy and internal constants.

3. For each value of k, calculate the function

$$\mathrm{AMDF}(k) = \sum_n \left| S(n) - S(n-k) \right| \qquad (8)$$

where S(.) is the preprocessed signal, and calculate the maxima of this function.

4. Compare and examine the maxima obtained in order to infer the voicing and the pitch of the frame.

5. Correct the voicing and pitch of the preceding frame according to the result of the current frame, in order to maintain a certain stationarity in the voicing.
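Relation (8) can be computed as in the sketch below, over the lags corresponding to the 50 to 400 Hz range of fundamental frequencies mentioned earlier (8 kHz sampling assumed). The sketch stops at the AMDF values themselves: the decision logic of stages 1, 2, 4 and 5 is not specified in enough detail to reproduce, and the helper names are hypothetical.

```python
def amdf(s, k):
    """Relation (8): AMDF(k) = sum_n |S(n) - S(n - k)|, summed where both samples exist."""
    return sum(abs(s[n] - s[n - k]) for n in range(k, len(s)))


def amdf_curve(s, fs=8000, fmin=50.0, fmax=400.0):
    """AMDF(k) for every lag k covering fundamental frequencies from fmax down to fmin."""
    kmin = int(fs // fmax)                       # 20 samples for 400 Hz at 8 kHz
    kmax = min(int(fs // fmin), len(s) - 1)      # 160 samples for 50 Hz at 8 kHz
    return {k: amdf(s, k) for k in range(kmin, kmax + 1)}


# Square wave with a period of 80 samples (100 Hz at 8 kHz), six periods long
s = ([1.0] * 40 + [-1.0] * 40) * 6
curve = amdf_curve(s)
```

For a strictly periodic test signal the AMDF drops to zero at the period and its multiples, which is the periodicity cue that the later stages interpret.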

The energy calculation performed in step 8 is carried out over four subframes. It consists in taking the base-2 logarithm of the sum of the energies of the pre-emphasized samples of a subframe.

The subframes within each frame are juxtaposed or overlapped so that their length is a multiple of the "pitch".
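A minimal sketch of the subframe energy calculation of step 8, assuming P = 4 equal, non-overlapping subframes (the pitch-multiple adjustment of the subframe length is not modeled) and a small floor on the energy so that a silent subframe does not produce log2(0). The function name is hypothetical.

```python
import math


def subframe_log_energies(frame, p=4, floor=1e-10):
    """log2 of the summed squared (pre-emphasized) samples in each of P equal subframes."""
    n = len(frame) // p
    return [math.log2(max(sum(v * v for v in frame[i * n:(i + 1) * n]), floor))
            for i in range(p)]
```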

As soon as the filter modeling characteristics and the energy, voicing and pitch characteristics of the speech signal have been obtained over three consecutive frames, the method encodes them according to steps 13 to 16. The filters of the three frames, denoted below frame 1, frame 2 and frame 3, are encoded in two stages, starting with frame 3.

Frame 3 is encoded scalarly.

This is done, for example, by applying the algorithm known as "Backward Sequential Adaptive", described in the paper by N. SUGAMURA and N. FARVARDIN entitled "Quantizer design in LSP speech analysis-synthesis", IEEE Journal on Selected Areas in Communications, Vol. 6, Feb. 1988.

This encoding algorithm is executed in decreasing order of the LSP coefficients, starting from the last one, in the manner shown in FIGS. 2 and 3. For example, for a vocal tract modeling filter with 10 LSP coefficients, the last coefficient LSP(10) is encoded linearly between two frequency values $F_{10\,\mathrm{MIN}}$ and $F_{10\,\mathrm{MAX}}$, over $N_{10}$ values coded linearly over $B_{10}$ bits.

The other coefficients LSP(i) (i = 9, 8, ..., 1) are encoded by comparing the quantized coefficient LSPQ(i+1) with the maximum frequency value $F_{i\,\mathrm{MAX}}$.

If LSPQ(i+1) > $F_{i\,\mathrm{MAX}}$, the coefficient is encoded linearly between the two values $F_{i\,\mathrm{MIN}}$ and $F_{i\,\mathrm{MAX}}$, over $N_i$ values and hence over $B_i$ bits.

If LSPQ(i+1) < $F_{i\,\mathrm{MAX}}$, the coefficient is encoded linearly between $F_{i\,\mathrm{MIN}}$ and LSPQ(i+1), over $N_i$ values and hence over $B_i$ bits.
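The adaptive-range scalar quantization of frame 3 can be sketched as follows. The bounds F_MIN and F_MAX and the number of levels per coefficient come from FIG. 3, which is not reproduced here, so any table passed in is hypothetical, and nearest-level rounding is also an assumption. Because each lower coefficient's range is capped at min(F_MAX, LSPQ(i+1)), the decoder can rebuild the same ranges from the indices alone, which keeps the quantized LSPs ordered as long as each range stays non-empty.

```python
def level(lo, hi, n, idx):
    """Value of level idx among n evenly spaced levels on [lo, hi]."""
    if n < 2 or hi <= lo:
        return lo
    return lo + idx * (hi - lo) / (n - 1)


def upper_bound(i, q, fmax):
    """F_MAX for the last coefficient; otherwise min(F_MAX, LSPQ(i+1))."""
    return fmax[i] if i == len(q) - 1 else min(fmax[i], q[i + 1])


def encode_lsp(lsp, fmin, fmax, nvals):
    """Quantize LSP(p) first, then LSP(p-1) down to LSP(1) with the adaptive upper bound."""
    p = len(lsp)
    idx, q = [0] * p, [0.0] * p
    for i in range(p - 1, -1, -1):
        lo, hi, n = fmin[i], upper_bound(i, q, fmax), nvals[i]
        step = (hi - lo) / (n - 1) if n > 1 and hi > lo else 1.0
        idx[i] = min(n - 1, max(0, int(round((lsp[i] - lo) / step))))
        q[i] = level(lo, hi, n, idx[i])
    return idx, q


def decode_lsp(idx, fmin, fmax, nvals):
    """Rebuild the same ranges from the values already decoded above each coefficient."""
    q = [0.0] * len(idx)
    for i in range(len(idx) - 1, -1, -1):
        q[i] = level(fmin[i], upper_bound(i, q, fmax), nvals[i], idx[i])
    return q
```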

When frames 1 and 2 are encoded, a good approximation of their LSP coefficient values is obtained by interpolation between frame 0 (frame 0 being frame 3 of the preceding group of three frames) and frame 3. In this process, frames 1 and 2 are not encoded directly; what is encoded is the type of interpolation that allows them to be quantized as faithfully as possible.

For each odd-numbered LSP coefficient of frames 1 and 2, the encoder determines, among the three interpolations shown in the graph of FIG. 4, the one that appears to give the best approximation of the frame 1 and frame 2 values.

The three available interpolations, namely case 0, case 1 and case 2, give for frames 1 and 2 the coefficients LSPQ defined as follows with reference to FIG. 4, where LSPQ(frame j) denotes the quantized LSP value of frame j:

Case 0:
$$\mathrm{LSPQ}(\text{case }0,\text{frame }1) = \frac{2\,\mathrm{LSPQ}(\text{frame }0) + \mathrm{LSPQ}(\text{frame }3)}{3}, \qquad \mathrm{LSPQ}(\text{case }0,\text{frame }2) = \frac{\mathrm{LSPQ}(\text{frame }0) + 2\,\mathrm{LSPQ}(\text{frame }3)}{3}$$

Case 1:
$$\mathrm{LSPQ}(\text{case }1,\text{frame }1) = \frac{\mathrm{LSPQ}(\text{frame }0) + 2\,\mathrm{LSPQ}(\text{frame }3)}{3}, \qquad \mathrm{LSPQ}(\text{case }1,\text{frame }2) = \mathrm{LSPQ}(\text{frame }3)$$

Case 2:
$$\mathrm{LSPQ}(\text{case }2,\text{frame }1) = \mathrm{LSPQ}(\text{frame }0), \qquad \mathrm{LSPQ}(\text{case }2,\text{frame }2) = \frac{2\,\mathrm{LSPQ}(\text{frame }0) + \mathrm{LSPQ}(\text{frame }3)}{3}$$

The method then selects, from these three interpolations, the one that minimizes the quantization error evaluated by the function $D_{\mathrm{FILTER}}$ defined below, and adopts the corresponding code value:

$$D_{\mathrm{FILTER}}(i) = W_1\left[\mathrm{LSPQ}(\text{case }i,\text{frame }1) - \mathrm{LSP}(\text{frame }1)\right]^2 + W_2\left[\mathrm{LSPQ}(\text{case }i,\text{frame }2) - \mathrm{LSP}(\text{frame }2)\right]^2$$

where LSPQ(case i, frame j) is the value of the odd-numbered LSP coefficient of frame j quantized by interpolation of type i, LSP(frame j) is the actual value in frame j of the odd-numbered LSP coefficient to be quantized, $W_1$ is the energy of frame 1 and $W_2$ is the energy of frame 2.
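The choice among the three interpolations for one odd-numbered coefficient can be sketched as follows; the function names are hypothetical and the frame energies W1 and W2 are passed in as plain numbers.

```python
def interpolation_cases(q0, q3):
    """(frame 1, frame 2) candidate values for cases 0, 1 and 2, from LSPQ(frame 0) and LSPQ(frame 3)."""
    return [
        ((2 * q0 + q3) / 3, (q0 + 2 * q3) / 3),   # case 0
        ((q0 + 2 * q3) / 3, q3),                  # case 1
        (q0, (2 * q0 + q3) / 3),                  # case 2
    ]


def best_case(q0, q3, lsp1, lsp2, w1, w2):
    """Case index minimizing D_FILTER = W1 (LSPQ1 - LSP1)^2 + W2 (LSPQ2 - LSP2)^2."""
    costs = [w1 * (c1 - lsp1) ** 2 + w2 * (c2 - lsp2) ** 2
             for c1, c2 in interpolation_cases(q0, q3)]
    return costs.index(min(costs))
```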

Thus, one of three cases is chosen for each of the five odd-numbered coefficients, giving 3^5 = 243 possible combinations. The resulting code is equal to code LSP1 + 3 × code LSP3 + 9 × code LSP5 + 27 × code LSP7 + 81 × code LSP9.

This code fits in 8 bits (243 < 256).
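The combination of the five three-valued case codes is a base-3 number. A minimal sketch with hypothetical helper names, showing that the largest code is 242 and therefore fits in 8 bits:

```python
def pack_interp_codes(cases):
    """Base-3 code: case(LSP1) + 3 case(LSP3) + 9 case(LSP5) + 27 case(LSP7) + 81 case(LSP9)."""
    if len(cases) != 5 or any(c not in (0, 1, 2) for c in cases):
        raise ValueError("expected five case indices in {0, 1, 2}")
    return sum(c * 3 ** i for i, c in enumerate(cases))


def unpack_interp_codes(code):
    """Recover the five case indices from the combined code."""
    return [(code // 3 ** i) % 3 for i in range(5)]


largest = pack_interp_codes([2, 2, 2, 2, 2])   # 242, the largest code
assert largest < 2 ** 8                          # so the code fits in 8 bits
```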

Pitch and voicing are encoded over three consecutive frames in step 14.

The current voicing type is determined among six possible cases from the voicing of frames 1, 2 and 3 and the voicing of frame 0 that precedes each group of frames 1, 2 and 3.

The possible case types are as follows:

Type   Frame 1    Frame 2    Frame 3
1      unvoiced   unvoiced   unvoiced
2      unvoiced   unvoiced   voiced
3      unvoiced   voiced     voiced
4      voiced     unvoiced   unvoiced
5      voiced     voiced     unvoiced
6      voiced     voiced     voiced

The encoding table shown in FIG. 5 makes it possible to associate with any pitch value the number of the table entry whose value is closest to it; this number is referred to below as the "N chart" value.

The six possible case types above are then encoded as follows.

Code 0 is assigned to type 1. A code equal to the "N chart" value of the pitch of frame 3 is assigned to type 2. A code equal to 64 plus the "N chart" value of the pitch of frame 3 is assigned to type 3. A code equal to 128 plus the "N chart" value of the pitch of frame 1 is assigned to type 4. A code equal to 192 plus the "N chart" value of the pitch of frame 1 is assigned to type 5. Type 6 is encoded in a very particular way: the vector formed by the pitch values of the three frames is projected onto three eigenvectors (vector 1, vector 2, vector 3), and the three resulting projections are encoded. These three vectors approximate the first three eigenvectors of the cross-correlation matrix. Since the projection onto the first eigenvector gives the mean pitch, it is even simpler to take directly, as the code of the first projection, the "N chart" value closest to the mean pitch (P1 + P2 + P3)/3 of frames 1, 2 and 3. The corresponding code is then encoded over the 63 values of the encoding table.
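A sketch of the code assignment for types 1 to 5. The FIG. 5 table is not reproduced here, so any table passed in is hypothetical, and numbering its entries from 1 (so that code 0 stays free for type 1 and each type occupies its own block of 64 codes) is an assumption. Voicing patterns not listed in the table above raise a KeyError, and type 6, which is coded by projection, is left out.

```python
# (frame 1, frame 2, frame 3) voiced flags -> case type, as listed in the table above
VOICING_TYPES = {
    (False, False, False): 1, (False, False, True): 2, (False, True, True): 3,
    (True, False, False): 4, (True, True, False): 5, (True, True, True): 6,
}


def n_chart(pitch, table):
    """1-based number of the table entry closest to the pitch (hypothetical numbering)."""
    return 1 + min(range(len(table)), key=lambda j: abs(table[j] - pitch))


def pitch_voicing_code(voiced, pitches, table):
    """Code for voicing types 1 to 5; type 6 is coded by projection and not handled here."""
    t = VOICING_TYPES[tuple(voiced)]
    if t == 1:
        return 0
    if t in (2, 3):                          # 0 + N or 64 + N, from the pitch of frame 3
        return (t - 2) * 64 + n_chart(pitches[2], table)
    if t in (4, 5):                          # 128 + N or 192 + N, from the pitch of frame 1
        return (t - 2) * 64 + n_chart(pitches[0], table)
    raise NotImplementedError("type 6 is coded by eigenvector projection")
```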

The projection onto the second eigenvector (vector 2) is equal to the scalar product of the pitch vector of frames 1, 2, 3 with the second eigenvector, and the projection onto the third eigenvector (vector 3) is equal to the scalar product of the pitch vector of frames 1, 2, 3 with the third eigenvector.

The corresponding codes can be obtained using only four values and only three values of the encoding table, respectively.
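The change of basis used for type 6 can be illustrated with three orthonormal vectors. The actual eigenvector approximations are not given in the text; the vectors below are hypothetical stand-ins, chosen so that the first one is proportional to (1, 1, 1), which makes its projection proportional to the mean pitch as the description states. The quantization of the second and third projections onto four and three values is not shown.

```python
import math

# Hypothetical orthonormal stand-ins for vector 1, vector 2 and vector 3.
V1 = (1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3))
V2 = (1 / math.sqrt(2), 0.0, -1 / math.sqrt(2))
V3 = (1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_pitches(p):
    """Mean pitch (standing for the projection on V1) and scalar products with V2 and V3."""
    return sum(p) / 3.0, dot(p, V2), dot(p, V3)


def rebuild_pitches(mean, c2, c3):
    """Inverse change of basis; the projection on V1 equals mean * sqrt(3)."""
    c1 = mean * math.sqrt(3)
    return [c1 * a + c2 * b + c3 * c for a, b, c in zip(V1, V2, V3)]
```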

The energy encoding performed in step 15 is carried out over three consecutive frames in a known manner described in patent application FR 2 631 146. The four energy values corresponding to the four subframes of each of the three frames are encoded. However, to remove the redundant information in these 12 values, a principal component analysis is performed, of the type described in the book by DIDAY, LEMAIRE, POUGET and TESTU entitled "Éléments d'analyse de données", published by Dunod.

The encoding takes place in two stages. The first stage consists of a change of basis: the 12-dimensional energy vector formed by the 12 energy values of the three frames is projected onto the first three principal axes determined during the principal component analysis (more than 97% of the information is contained in these three projections).

The second stage consists of quantizing these three projections.

The first projection is quantized over 4 bits, the second over 3 bits and the third over 2 bits.

The energy encoding thus obtained is therefore defined over 4 + 3 + 2 = 9 bits.
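A hedged sketch of the two-stage energy encoding: center the 12 energy values, project them onto three principal axes, and quantize the projections over 4, 3 and 2 bits. Centering on a mean vector is an assumption, and the mean, axes and quantization ranges would come from a principal component analysis of training data that is not described here, so they are inputs and any values used with them are illustrative.

```python
def uniform_index(x, lo, hi, bits):
    """Index of the nearest of 2**bits evenly spaced levels on [lo, hi]."""
    n = 2 ** bits
    step = (hi - lo) / (n - 1)
    return min(n - 1, max(0, int(round((x - lo) / step))))


def encode_energies(e12, mean, axes, ranges, bits=(4, 3, 2)):
    """Center the 12 energies, project on three principal axes, quantize over 4 + 3 + 2 bits."""
    centred = [e - m for e, m in zip(e12, mean)]
    projections = [sum(c * a for c, a in zip(centred, axis)) for axis in axes]
    return [uniform_index(x, lo, hi, b) for x, (lo, hi), b in zip(projections, ranges, bits)]


def decode_energies(codes, mean, axes, ranges, bits=(4, 3, 2)):
    """Approximate the 12 energies from the three quantized projections."""
    values = [lo + c * (hi - lo) / (2 ** b - 1) for c, (lo, hi), b in zip(codes, ranges, bits)]
    return [m + sum(v * axis[j] for v, axis in zip(values, axes)) for j, m in enumerate(mean)]
```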

The framing performed in step 16 consists of grouping all the following codes into a single contiguous 54-bit word, broken down as follows:

1) energy code of the three frames, over 9 bits;
2) pitch code of the three frames, over 10 bits;
3) filter code of frame 3, over 27 bits;
4) filter code of frames 1 and 2, over 8 bits.

That is, 9 + 10 + 27 + 8 = 54 bits in total.

For example, with a frame duration of 22.5 ms, the method achieves under these conditions a bit rate of 54 / (3 × 0.0225) = 800 bits per second.
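The framing and the resulting bit rate can be checked with the sketch below. The order of the four fields inside the 54-bit word (most significant first, in the order listed) is an assumption; the field widths and the 22.5 ms frame duration are taken from the text.

```python
# (field name, width in bits); the field order inside the word is an assumption
FIELDS = (("energy", 9), ("pitch", 10), ("filter_frame3", 27), ("filter_frames12", 8))


def pack_superframe(codes):
    """Concatenate the four codes, most significant field first, into one 54-bit word."""
    word = 0
    for name, width in FIELDS:
        value = codes[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_superframe(word):
    """Split a 54-bit word back into its four codes."""
    codes = {}
    for name, width in reversed(FIELDS):
        codes[name] = word & ((1 << width) - 1)
        word >>= width
    return codes


TOTAL_BITS = sum(width for _, width in FIELDS)       # 54
BIT_RATE = TOTAL_BITS * 1000 / (3 * 22.5)            # 800.0 bit/s for three 22.5 ms frames
```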

Synthesis, that is, decoding of the speech signal, is carried out according to steps 17 to 28 of the flowchart of FIG. 6: on the one hand, steps 17 to 21, which cover deframing (step 17), decoding of the LSP coefficient values of the filters over three consecutive frames (step 18), decoding of the pitch values (step 19) and decoding of the voicing and energy values (step 20); on the other hand, steps 22 to 28, which synthesize the continuous speech signal for each of the three frames from the information obtained in steps 17 to 21. Deframing and decoding follow the reverse of the framing and encoding procedures defined for analysis and shown in the flowchart of FIG. 1. Shaping of the synthesis filter in step 23 consists of interpolating the LSP coefficients over the four subframes and converting these LSP coefficients into coefficients a_i. Following this calculation, step 24 computes the gain of the synthesis filter for the four subframes, together with the energy of the filter excitation signal.

To avoid abrupt transitions between different filters, these transitions are made in step 23 in four steps, one every quarter frame. The four interpolated filters must then satisfy a relation of the form:

$$\mathrm{LSP}(SF_i, Tr_N) = \frac{(4-i)\,\mathrm{LSP}(Tr_{N-1}) + i\,\mathrm{LSP}(Tr_N)}{4}$$

where $\mathrm{LSP}(SF_i, Tr_N)$ denotes the value of the interpolated filter in subframe i of frame N.
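The interpolation relation, applied to every LSP of the filter, as a minimal sketch with subframes numbered i = 1 to 4 (an assumption consistent with the fourth subframe reaching the current frame's LSPs); the function name is hypothetical.

```python
def interpolate_subframes(prev, cur):
    """LSP sets for subframes i = 1..4: ((4 - i) * LSP(Tr_N-1) + i * LSP(Tr_N)) / 4."""
    return [[((4 - i) * a + i * b) / 4.0 for a, b in zip(prev, cur)] for i in range(1, 5)]
```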

The interpolation is performed according to the diagram of FIG. 7.

Since the 12 decoded energies correspond to the energy of the pre-emphasized speech signal, the energy of the excitation signal is obtained by dividing this energy by the gain of the filter.

The gain of the filter of each subframe is calculated from the coefficients k_i according to the following relation:

The final step is to determine a standard-deviation-type value of the energy of each subframe (the value used when calculating the excitation).

Both the encoding and the decoding methods according to the invention can be implemented by a microprogrammed structure, organized as illustrated in FIG. 8 around a signal processing microprocessor 29 such as the one marketed by Texas Instruments under the name TMS320C25. In this structure, the speech signal is first sampled by an A/D converter 30 before being applied to the data bus 31 of the microprocessor 29. An analog filter 32 coupled to an automatic gain control device 33 filters the speech signal before sampling. The programs and data used to execute the method are written into a read-only memory 35 and a random access memory 34 connected to the microprocessor 29. An interface circuit 36 connects the microprocessor 29, via a data transmission line 37, to a transmission device (not shown) external to the vocoder.

A speech receiving device formed by a loudspeaker 38, an output amplifier 39 and an analog filter 40 is connected to the microprocessor 29 via a digital-to-analog converter 41.

Abstract: A method for low-throughput encoding of speech, consisting, after the speech signal has been cut into frames of fixed length, in calculating the characteristics of N filters modeling the vocal tract and all the characteristics of fundamental period (pitch), voicing and energy of the speech signal.

Block coding is performed on the one hand on said filters and on the other hand on the pitch and voicing.

The energy of the speech signal is determined P times per frame for each of the N frames and is then encoded as a single block.

Application to a low-throughput vocoder operating at 800 bits per second.
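To make the energy parameter concrete, the following Python sketch computes the energy of P equal subframes in each of N consecutive frames and gathers the N×P values into one vector, which is the form a single block would take before encoding. It assumes N = 3 and P = 4 (the values given in claims 3 and 8), a hypothetical function name `block_energies`, energy measured in dB as the mean square of each subframe, and an arbitrary floor of -100 dB. The quantizer applied to the block is not specified in this document and is left out.

```python
import numpy as np


def block_energies(signal, frame_len, n_frames=3, p_sub=4, floor_db=-100.0):
    """Return the energies (dB) of p_sub equal subframes in each of n_frames
    consecutive frames, flattened into one vector of n_frames * p_sub values.

    Hypothetical helper: energy is taken as the mean square of each subframe.
    """
    x = np.asarray(signal, dtype=float)
    if frame_len % p_sub != 0:
        raise ValueError("frame_len must be a multiple of p_sub")
    needed = n_frames * frame_len
    if x.size < needed:
        raise ValueError("signal is shorter than one block of frames")
    # One row per subframe, taken in time order across the whole block.
    sub = x[:needed].reshape(n_frames * p_sub, frame_len // p_sub)
    power = np.mean(sub ** 2, axis=1)
    # Convert to dB, guarding against log(0), then apply the floor.
    db = 10.0 * np.log10(np.maximum(power, 1e-30))
    return np.maximum(db, floor_db)
```

With a hypothetical 180-sample frame, `block_energies(x, frame_len=180)` returns 12 values covering three frames, which a block (vector) quantizer could then encode jointly.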

Submission of Translation of Amendment (Patent Act, Article 184-8), October 27, 1992

Claims


1. A method of low-throughput speech encoding which, after dividing the speech signal into frames of fixed length, computes the characteristics of N filters modeling the vocal tract as well as all the characteristics of the fundamental period (pitch), voicing and energy of the speech signal in order to encode them, the method being characterized in that the energy of the speech signal is calculated a specific number P of times for each frame.

2. A method according to claim 1, characterized in that the characteristics of the filters modeling the vocal tract are formed by LSP (line spectral pair) coefficients.

3. A method according to claim 1 or 2, characterized in that the number N is equal to three.

4. A method according to claim 3, characterized in that the LSP coefficients are encoded scalarly in the first frame and by interpolation in the other two frames.

5. A method according to claim 4, characterized in that the scalar encoding of the coefficients of the third frame is performed by a "Backward Sequential Adaptive" algorithm.

6. A method according to claim 4 or 5, characterized in that the encoding by interpolation in the other two frames is performed by searching, among three possible interpolations, for the one exhibiting the smallest quantization error.

7. A method according to any one of claims 1 to 6, characterized in that the fundamental period (pitch) and the voicing are encoded over three consecutive frames, the encoding being performed, when at least one of the frames contains an unvoiced sound, by direct addressing of an encoding table with the pitch values, and, when the sound is voiced over all three frames, by a vector transformation of the pitch values of the three frames, in which the vector formed by the three pitch values is projected onto the first three eigenvectors of a cross-correlation matrix and the three values of these projections are encoded.

8. A method according to any one of claims 1 to 7, characterized in that the energy is encoded over four subframes within each frame.

9. Apparatus for carrying out the method according to any one of claims 1 to 8, characterized in that it comprises a microprogrammed structure formed by a read-only memory 34 and a random access memory 35 connected to a signal processing microprocessor 29, the microprocessor 29 being connected on the one hand to an A/D converter 31 that converts the speech signal into digital samples, on the other hand to a D/A converter that converts the speech samples formed by the microprocessor into an analog signal for exciting a speech restoration device 38, and further to an interface circuit 36 for connection to an external data transmission line 37.
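The interpolation choice of claim 6 can be sketched as an exhaustive search: each candidate interpolation law predicts the two interpolated frames from two already-quantized anchor LSP vectors, and the law with the smallest error is kept, so that only its index needs to be transmitted. The three weight pairs in `CANDIDATE_WEIGHTS`, the function name `choose_interpolation` and the squared-error measure are hypothetical; the document states neither which three interpolations are used nor how the quantization error is measured.

```python
import numpy as np

# Hypothetical table of three interpolation laws. Each pair gives the weight
# applied to the newer anchor for the first and second interpolated frames.
CANDIDATE_WEIGHTS = (
    (1.0 / 3.0, 2.0 / 3.0),  # linear transition
    (2.0 / 3.0, 1.0),        # early transition
    (0.0, 1.0 / 3.0),        # late transition
)


def choose_interpolation(older, newer, frame_a, frame_b, weights=CANDIDATE_WEIGHTS):
    """Return (index, error) of the interpolation law whose predictions of
    frame_a and frame_b have the smallest total squared error."""
    older = np.asarray(older, dtype=float)
    newer = np.asarray(newer, dtype=float)
    targets = (np.asarray(frame_a, dtype=float), np.asarray(frame_b, dtype=float))
    best_index, best_error = -1, np.inf
    for index, pair in enumerate(weights):
        error = 0.0
        for w, target in zip(pair, targets):
            # Linear interpolation between the two quantized anchors.
            estimate = (1.0 - w) * older + w * newer
            error += float(np.sum((estimate - target) ** 2))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

With three candidates the chosen index fits in two bits, which is far cheaper than scalar-quantizing the LSP vectors of the two interpolated frames.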
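The voiced case of claim 7 projects the three-frame pitch vector onto eigenvectors of a correlation matrix, which amounts to a Karhunen-Loève transform. The following NumPy sketch assumes that the 3×3 matrix is estimated from a hypothetical set of training pitch triples and leaves out the quantization of the three projected values. The hypothetical function `pitch_basis` returns orthonormal eigenvectors sorted by decreasing eigenvalue, `project_pitch` computes the three coefficients to be encoded, and `reconstruct_pitch` inverts the transform at the decoder.

```python
import numpy as np


def pitch_basis(training_triples):
    """Eigenvectors (as columns) of the 3x3 correlation matrix of training
    pitch triples, ordered by decreasing eigenvalue."""
    t = np.asarray(training_triples, dtype=float)
    r = t.T @ t / t.shape[0]                 # correlation matrix estimate
    eigvals, eigvecs = np.linalg.eigh(r)     # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


def project_pitch(triple, basis):
    """Coordinates of a pitch triple in the eigenvector basis."""
    return basis.T @ np.asarray(triple, dtype=float)


def reconstruct_pitch(coeffs, basis):
    """Inverse transform: rebuild the pitch triple from its coordinates."""
    return basis @ np.asarray(coeffs, dtype=float)
```

Because the basis is orthonormal, unquantized coefficients reproduce the pitch triple exactly. When pitch varies slowly across the three frames, most of the energy typically falls on the first coefficient, so an encoder can give fewer bits to the other two.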
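LSP coefficients, named in claim 2, are a standard alternative representation of the all-pole vocal tract filter 1/A(z). As an illustrative sketch not taken from this document, the hypothetical function `lpc_to_lsf` forms the symmetric and antisymmetric polynomials P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1), finds their roots with `numpy.roots`, and returns the angles of the roots lying on the upper half of the unit circle. It assumes an even order p and a minimum-phase A(z); many practical coders use a Chebyshev-polynomial search instead of general root finding.

```python
import numpy as np


def lpc_to_lsf(a, eps=1e-6):
    """Line spectral frequencies (radians, ascending) of
    A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p, with a[0] = 1 and p even."""
    a = np.asarray(a, dtype=float)
    ext = np.append(a, 0.0)   # A(z) padded to order p + 1
    rev = ext[::-1]           # coefficients of z^-(p+1) A(z^-1)
    p_poly = ext + rev        # symmetric polynomial P(z)
    q_poly = ext - rev        # antisymmetric polynomial Q(z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1 and z = -1.
    keep = angles[(angles > eps) & (angles < np.pi - eps)]
    return np.sort(keep)
```

For the flat filter A(z) = 1 of order 2 (`a = [1, 0, 0]`), the sketch returns π/3 and 2π/3, the evenly spaced LSFs of a flat spectrum.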