JPH01998A

JPH01998A - Spectrogram normalization method

Info

Publication number: JPH01998A
Application number: JP62-156958A
Authority: JP
Inventors: 哲中村; 清宏鹿野
Original assignee: 株式会社　エイ・ティ・ア−ル自動翻訳電話研究所
Filing date: 1987-06-24
Publication date: 1989-01-05
Anticipated expiration: 2013-02-04

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Industrial Field of Application] This invention relates to a spectrogram normalization method, and in particular to a method that normalizes spectrograms between different speakers using vector quantization and that can be applied to speaker adaptation for speaker-independent recognition and to voice quality conversion techniques.

[Prior Art and Problems to Be Solved by the Invention] An automatic translation telephone takes speech as its input. That speech comes from unspecified speakers, and it must be recognized accurately regardless of who is speaking. One approach to speaker-independent recognition is to normalize spectrograms between different speakers. Conventional inter-speaker normalization, however, mainly addressed vowel segments, and the only methods available were of the kind that apply deterministic changes to the spectral frequencies.

A method that normalizes spectrograms between different speakers using vector quantization can therefore be considered. In conventional vector quantization, however, effort went into improving the spectral distortion measure so as to raise recognition performance while limiting the growth in computation and memory. Composite spectral distortion measures combining various features have been used. The point of that approach was to mix several kinds of features in the distortion measure, use the dependencies among them as constraints, and map the features into a space with better recognition performance. It had the following problems.

■ For the dependencies between features to be statistically valid within the vector quantization codebook, a very large number of training samples and an enormous amount of computation time are required.

■ In terms of codebook size, the codebook size needed for each feature is reduced by using the dependencies between features as constraints. Even so, the overall codebook size is the product of the codebook sizes required for the individual features, so it becomes very large and requires a huge amount of memory.

■ When a vector quantization codebook is generated using a composite spectral distortion measure, the correlation between the various features degrades the ability to reproduce the spectrum.

Therefore, the main object of this invention is to provide a spectrogram normalization method that can normalize spectrograms between different speakers by using vector quantization to represent each individual's spectra with a finite set of vectors and then finding the correspondence between the vectors of the different speakers.

[Means for Solving the Problems] This invention is a spectrogram normalization method that digitizes speech, extracts a spectrogram as a feature of the speech, and normalizes the extracted spectrogram between different speakers. The speech is vector-quantized, the vector quantization codebooks of the different speakers are then put into correspondence, and the spectrogram is normalized based on this correspondence.

[Operation] In the spectrogram normalization method according to this invention, speech is vector-quantized so that each individual's spectrogram is represented by a finite set of vectors, and the correspondence between the vectors of different speakers is then found. The codebook size is therefore the sum of the codebook sizes required for the individual features, so the overall codebook size can be reduced.
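The product-versus-sum argument can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the sizes 256 and 16 are assumptions chosen for the example, not values given in this publication.

```python
from math import prod

def joint_codebook_size(sizes):
    """Entries needed when a single codebook covers every combination of feature codes."""
    return prod(sizes)

def separate_codebook_size(sizes):
    """Entries needed when each feature has its own codebook."""
    return sum(sizes)

# Hypothetical sizes: 256 spectrum codes and 16 power codes.
sizes = [256, 16]
print(joint_codebook_size(sizes))     # 4096
print(separate_codebook_size(sizes))  # 272
```

With these assumed sizes the separate codebooks hold 272 entries in total, against 4096 for a joint codebook.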

[Embodiments of the Invention] An embodiment of this invention is described in more detail below with reference to the drawings.

FIG. 1 is a schematic block diagram of a speech recognition device to which this invention is applied.

In FIG. 1, the speech recognition device comprises an amplifier 1, a low-pass filter 2, an A/D converter 3, and a processing device 4. The amplifier 1 amplifies the input speech signal, and the low-pass filter 2 removes aliasing noise from the amplified signal. The A/D converter 3 converts the speech signal into a 16-bit digital signal using a 12 kHz sampling signal. The processing device 4 includes a computer 5, a magnetic disk 6, terminals 7, and a printer 8. The computer 5 performs speech recognition based on the digital speech signal input from the A/D converter 3.

FIG. 2 is a flow diagram showing the overall flow, in one embodiment of this invention, from speech input to output of the normalized spectrogram.

Next, the operation of one embodiment of this invention is described with reference to FIGS. 1 to 3. The input speech signal is amplified by the amplifier 1, and aliasing noise is removed by the low-pass filter 2. Then, in step SP1 shown in FIG. 2 (steps are abbreviated SP in the figures), the A/D converter 3 converts the input speech signal into a 16-bit digital signal.

In step SP2, the computer 5 of the processing device 4 extracts features from the digitized speech. This feature extraction uses a technique such as linear predictive analysis (LPC analysis).

In step SP3, it is determined whether a codebook is to be generated. If so, in step SP4 a codebook is generated from the extracted speech features. The LBG algorithm, for example, is used for this. A codebook is generated for each feature and stored in step SP5 as a separate codebook on the magnetic disk 6. The LBG algorithm is described in detail in Linde, Buzo, Gray; "An algorithm for Vector Quantization Design", IEEE COM-28 (1980-01).
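As a rough illustration of the codebook generation step, the following is a minimal sketch of LBG-style training by repeated splitting and nearest-neighbour re-estimation. It is not the implementation used in this publication: the squared Euclidean distance, the additive split perturbation `eps`, the fixed iteration count, and all function names are assumptions made for the example, and the target size is assumed to be a power of two.

```python
def sq_dist(x, y):
    """Squared Euclidean distance (an assumed stand-in for the actual distortion measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(training, size, eps=0.01, iters=20, dist=sq_dist):
    """Grow a codebook from one centroid to `size` codes by splitting and refining."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split every code vector into two slightly shifted copies.
        codebook = [[c + s * eps for c in code]
                    for code in codebook for s in (+1, -1)]
        for _ in range(iters):
            # Assign each training vector to its nearest code.
            cells = [[] for _ in codebook]
            for v in training:
                nearest = min(range(len(codebook)),
                              key=lambda k: dist(v, codebook[k]))
                cells[nearest].append(v)
            # Move each code to the centroid of its cell; keep empty cells unchanged.
            codebook = [centroid(cell) if cell else code
                        for cell, code in zip(cells, codebook)]
    return codebook
```

Passing a different `dist` function is one way a feature-specific distortion measure could be substituted for the assumed Euclidean one.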

When quantization is performed, step SP3 determines that a codebook is not being generated, and in step SP6 the speech features obtained in step SP2 are separate-vector-quantized with reference to the separate codebooks.

Then, in step SP7, it is determined whether conversion vectors are to be learned. If so, in step SP8 the per-feature code sequences produced by separate vector quantization are matched against the standard speaker's learning standard pattern sequences by DP (dynamic programming) matching using the Double Split method. These learning standard pattern sequences are registered on the magnetic disk 6 in advance in step SP9. In step SP10, conversion vectors are generated using the histogram of vector correspondences obtained from the DP matching.

The conversion vectors are registered on the magnetic disk 6 in step SP11.

When step SP7 determines that conversion vectors are not being learned, that is, that normalization is being performed, in step SP12 the per-feature code sequences produced by separate vector quantization are replaced frame by frame using the conversion vectors already stored in step SP11, and a normalized spectrogram is generated and output.

FIG. 3 is a flow diagram for explaining spectrogram normalization using vector quantization, FIG. 4 is a flow diagram for explaining separate vector quantization, FIG. 5 is a flow diagram for explaining the conversion vector learning algorithm, FIG. 6 shows the spectrogram normalization algorithm, and FIG. 7 is a flow diagram for explaining the matching method.

Next, spectrogram normalization using vector quantization is described with reference to FIG. 3.

Spectrogram normalization using vector quantization in this invention consists of two main functions. The first is the vector quantization in step SP23. This is separate vector quantization, in which vector quantization is performed separately for each type of feature. In step SP22, a separate codebook is generated for each feature.

The second is the spectrum conversion (normalization) in step SP24. In step SP24, the unknown speaker utters the learning words so that the vectors can be put into correspondence. A histogram of the correspondences found over all learning words is computed. Using this histogram as weights, each feature vector of the unknown speaker's codebook is expressed as a linear combination of the feature vectors of the standard speaker's codebook, and the result is stored as a conversion codebook in step SP25. At normalization time, each input spectrum is replaced using the conversion codebook, which normalizes the spectrum.

Separate vector quantization is now described in detail. In this invention, speech is divided into two kinds of features, power and spectral information (autocorrelation coefficients and LPC cepstrum coefficients), and each is quantized separately. Because power is a scalar, it is quantized by non-uniform scalar quantization. In FIG. 4, in step SP31, the speech signal converted into a 16-bit digital signal is subjected to LPC analysis by 14th-order autocorrelation analysis, and the power, autocorrelation coefficients, and LPC cepstrum coefficients that characterize the input speech are extracted. In step SP32, it is determined whether a power codebook is to be generated. If so, in step SP33 the power of the input speech is scalar-quantized: a non-uniform quantization technique is used to generate the power codebook in step SP33, and the generated power codebook is stored on the magnetic disk 6 in step SP34.

When a power codebook is not being generated, that is, at quantization time, quantization is performed in step SP35 using the power codebook of step SP34, and a code sequence for power is output.

On the other hand, when step SP36 determines that a codebook of autocorrelation coefficients and LPC cepstrum coefficients is to be generated, in step SP37 the codebook is generated by the LBG algorithm based on the WLR measure, and in step SP38 the generated codebook is stored on the magnetic disk 6. The WLR measure emphasizes the characteristic features of speech and shows high performance in spoken-word recognition. It is described in Murayama and Shikano, "LPC spectral matching measure with peak weighting", Institute of Electronics and Communication Engineers paper, J64-A5 (198-05).

When a codebook of autocorrelation coefficients and LPC cepstrum coefficients is not being generated, that is, at quantization time, the autocorrelation coefficients and LPC cepstrum coefficients of the input speech are vector-quantized in step SP39 using the spectrum codebook of step SP38, and a code sequence for the spectral information is output.

The spectral distortion measures used for codebook generation and quantization are as follows.

$$d_{\mathrm{power}} = \frac{P}{P'} + \frac{P'}{P} - 2 \qquad (1)$$

$$d_{\mathrm{spectrum}} = \sum_{n} \bigl(C(n) - C'(n)\bigr)\bigl(R(n) - R'(n)\bigr) \qquad (2)$$

Here, $d_{\mathrm{power}}$ is the distortion measure for the power term, $d_{\mathrm{spectrum}}$ is the spectral distortion measure, $R(n)$ is the n-th order autocorrelation coefficient of the codebook, $R'(n)$ is the n-th order autocorrelation coefficient of the input, $C(n)$ is the n-th order LPC cepstrum coefficient of the codebook, and $C'(n)$ is the n-th order LPC cepstrum coefficient of the input.
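The two distortion measures just defined (the power term and the spectral term) translate directly into code. The sketch below is a plain transcription under stated assumptions: the function and parameter names are hypothetical, coefficients are passed as equal-length lists, and the sum runs over whatever orders are supplied.

```python
def power_distortion(p, p_ref):
    """Power-term distortion P/P' + P'/P - 2; zero when the two powers are equal."""
    return p / p_ref + p_ref / p - 2.0

def spectral_distortion(c, r, c_ref, r_ref):
    """Spectral distortion: sum over n of (C(n) - C'(n)) * (R(n) - R'(n))."""
    return sum((cn - cn_ref) * (rn - rn_ref)
               for cn, cn_ref, rn, rn_ref in zip(c, c_ref, r, r_ref))

print(power_distortion(4.0, 2.0))                                       # 0.5
print(spectral_distortion([1.0, 2.0], [0.5, 0.25], [0.0, 1.0], [0.25, 0.0]))  # 0.5
```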

Next, the spectrum normalization and conversion codebook generation of steps SP24 and SP25 shown in FIG. 3 are described in detail with reference to FIG. 5. To generate the conversion codebook, the unknown speaker first utters the learning words. In step SP41, this input speech is separate-vector-quantized using the codebook already stored in step SP42. In step SP43, the quantized code sequence is DP-matched by the Double Split method against the standard speaker's learning standard pattern for the same word, already stored in step SP44, to find the vector correspondence for the same learning word uttered by the unknown speaker and the standard speaker. Correspondences are found for all learning words and stored in the form of a histogram.
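Storing the correspondences as a histogram can be sketched as counting, for each aligned frame pair, which unknown-speaker code met which standard-speaker code. This is only an illustration: the representation of the DP alignment as a list of `(i, j)` frame-index pairs and the name `accumulate_histogram` are assumptions, not details given in this publication.

```python
def accumulate_histogram(histogram, path, unknown_codes, reference_codes):
    """Add one learning word's alignment to histogram[n][k].

    histogram[n][k] counts how often unknown-speaker code n was aligned
    with standard-speaker code k; `path` is a list of (i, j) frame pairs.
    """
    for i, j in path:
        histogram[unknown_codes[i]][reference_codes[j]] += 1
    return histogram

# Two codes per speaker, one three-frame word aligned to a two-frame pattern.
hist = [[0, 0], [0, 0]]
accumulate_histogram(hist, [(0, 0), (1, 0), (2, 1)], [0, 0, 1], [0, 1])
print(hist)  # [[2, 0], [0, 1]]
```

Calling the function once per learning word accumulates the counts over the whole learning set.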

In step SP45, using the histogram thus obtained, each feature vector of the unknown speaker is expressed as a weighted sum of the feature vectors of the standard speaker's codebook stored in step SP46, with the correspondence histogram as the weights. This weighted sum can be expressed by the following equation.

$$a'(n) = \frac{\sum_{k} b(k)\, h_n(k)}{\sum_{k} h_n(k)}$$

k: code number in the standard speaker's codebook
n: code number in the unknown speaker's codebook
a': conversion vector from the unknown speaker to the standard speaker
b(k): feature vector of the standard speaker's codebook
h_n(k): histogram of the standard speaker's code k for the unknown speaker's code n, obtained from the correspondence by DP matching

Next, in step SP48, the unknown speaker's codebook is replaced with the conversion vectors a', and steps SP43, SP45, SP47, and SP48 are repeated. The repetition continues for a fixed number of iterations or until the DP distance over all learning words converges. When step SP47 determines that convergence has been reached, the final conversion vectors from the unknown speaker to the standard speaker are obtained.
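The histogram-weighted sum can be sketched as follows for a single pass. Several points are assumptions made for the example rather than details confirmed by this publication: the weights are normalized by the total count for each unknown-speaker code, a code that was never observed keeps its original vector (passed as `fallback`), and the names are hypothetical. The outer iteration until convergence is not shown.

```python
def conversion_vectors(histogram, reference_codebook, fallback):
    """Map each unknown-speaker code n to a weighted mean of standard-speaker vectors.

    histogram[n][k]: count of unknown-speaker code n aligned with standard code k.
    fallback[n]: vector kept for code n when it has no counts (an assumption).
    """
    dim = len(reference_codebook[0])
    converted = []
    for n, counts in enumerate(histogram):
        total = sum(counts)
        if total == 0:
            converted.append(list(fallback[n]))
            continue
        converted.append([
            sum(h * b[d] for h, b in zip(counts, reference_codebook)) / total
            for d in range(dim)
        ])
    return converted

print(conversion_vectors([[3, 1], [0, 0]],
                         [[0.0, 0.0], [4.0, 8.0]],
                         [[9.0, 9.0], [7.0, 7.0]]))
# [[1.0, 2.0], [7.0, 7.0]]
```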

Next, spectrum normalization is described with reference to FIG. 6. In step SP51, the unknown speaker's input speech is separate-vector-quantized using the codebook. The unknown speaker's codebook is stored in advance in step SP52. Then, in step SP53, the unknown speaker's codebook is replaced using the previously obtained conversion vectors from the unknown speaker to the standard speaker of step SP54, the spectra are replaced frame by frame, and a normalized spectrogram is output.
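Frame-wise normalization then reduces to quantizing each input frame with the unknown speaker's codebook and emitting the converted vector stored for that code. The sketch below handles a single feature stream and assumes a squared Euclidean distance for the nearest-code search; both the distance and the function names are illustrative assumptions.

```python
def sq_dist(x, y):
    """Squared Euclidean distance (assumed here in place of the actual distortion measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def quantize(frame, codebook, dist=sq_dist):
    """Return the index of the code vector nearest to `frame`."""
    return min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))

def normalize_frames(frames, unknown_codebook, conversion, dist=sq_dist):
    """Replace every frame by the conversion vector of its nearest unknown-speaker code."""
    return [conversion[quantize(f, unknown_codebook, dist)] for f in frames]

frames = [[0.2], [9.0], [4.0]]
print(normalize_frames(frames, [[0.0], [10.0]], [[1.0], [11.0]]))
# [[1.0], [11.0], [1.0]]
```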

Next, the matching operation used to find the correspondence is described with reference to FIG. 7. Matching uses the Double Split method. In step SP61, the code sequences generated by quantizing power and spectrum separately through separate vector quantization are matched against the standard patterns, which are stored as code sequences.

The standard patterns of power and spectrum, coded by separate vector quantization, are stored in advance in step SP62. In the matching of step SP61, the distance between codes is obtained by table lookup in a distance matrix prepared in advance in step SP63. The correspondences between the vectors of the input speech and the standard patterns, found by matching against the standard patterns in turn, are output to the histogram generation unit of step SP64. Using the histogram obtained there as weights, each feature vector of the unknown speaker is expressed as a linear combination of the standard speaker's feature vectors to give the conversion vector.

Next, the matching method is described in detail.

In conventional matching, both the input and the standard pattern were a single feature sequence or code sequence. In separate vector quantization, they generally consist of several code sequences.

Here, matching of two sequences, a power code sequence and a spectrum code sequence, is described as an example. The PWLR measure is a distance measure that takes both power and spectrum information into account. It is given by the following equation.

$$d_{\mathrm{PWLR}} = \sum_{n} \bigl(C(n) - C'(n)\bigr)\bigl(R(n) - R'(n)\bigr) + a\left(\frac{P}{P'} + \frac{P'}{P} - 2\right) \qquad (3)$$

$$a = 0.01$$

In conventional code sequence matching by the Double Split method, the fact that the whole space is vector-quantized and represented by a finite number of points, as described above, is exploited: the distances between all representative points are computed in advance and stored in a distance matrix. Therefore,

$$d_{\mathrm{PWLR}}(i, j) = DL\bigl(A(i), B(j)\bigr)$$

$$DL\bigl(A(i), B(j)\bigr) = \sum_{n} \bigl(C_K(n) - C_L(n)\bigr)\bigl(R_K(n) - R_L(n)\bigr) + a\left(\frac{P_K}{P_L} + \frac{P_L}{P_K} - 2\right)$$

A(i) is the code number of the i-th frame of the input speech.
B(j) is the code number of the j-th frame of the standard pattern.
DL(K, L) is the distance between codes K and L, obtained by table lookup in the distance matrix.
K and L are the code numbers of A(i) and B(j).

In separate vector quantization, however, there are two sequences, so the distance is obtained as follows.

$$d_{\mathrm{PWLR}}(i, j) = DL_{\mathrm{spect}}\bigl(A_{\mathrm{spect}}(i), B_{\mathrm{spect}}(j)\bigr) + a \cdot DL_{\mathrm{power}}\bigl(A_{\mathrm{power}}(i), B_{\mathrm{power}}(j)\bigr)$$

where

$$DL_{\mathrm{spect}}\bigl(A_{\mathrm{spect}}(i), B_{\mathrm{spect}}(j)\bigr) = \sum_{n} \bigl(C_K(n) - C_L(n)\bigr)\bigl(R_K(n) - R_L(n)\bigr)$$

$$DL_{\mathrm{power}}\bigl(A_{\mathrm{power}}(i), B_{\mathrm{power}}(j)\bigr) = \frac{P_{K'}}{P_{L'}} + \frac{P_{L'}}{P_{K'}} - 2$$

K and L are the code numbers of $A_{\mathrm{spect}}(i)$ and $B_{\mathrm{spect}}(j)$.
K' and L' are the code numbers of $A_{\mathrm{power}}(i)$ and $B_{\mathrm{power}}(j)$.

This amounts to coding the first and second terms of the PWLR measure separately, computing their distances, and summing them. Using this local distance measure, the distance is obtained by DP matching.
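The local distance built from two table lookups, and its use inside DP matching, can be sketched as below. The path constraints of the Double Split method are not spelled out in the text, so this example substitutes a basic symmetric DP recursion with three predecessor moves. That substitution, the function names, and the toy distance matrices are all assumptions made for illustration.

```python
def local_distance(i, j, a_spect, b_spect, a_power, b_power,
                   dl_spect, dl_power, a=0.01):
    """Sum of the spectrum-code distance and a times the power-code distance, both by lookup."""
    return (dl_spect[a_spect[i]][b_spect[j]]
            + a * dl_power[a_power[i]][b_power[j]])

def dp_match(n_in, n_ref, local):
    """Basic DP alignment; returns (total distance, list of aligned (i, j) pairs)."""
    inf = float("inf")
    cost = [[inf] * n_ref for _ in range(n_in)]
    back = [[None] * n_ref for _ in range(n_in)]
    for i in range(n_in):
        for j in range(n_ref):
            d = local(i, j)
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best, prev = inf, None
            # Allowed moves (assumed): diagonal, vertical, horizontal.
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, prev = cost[pi][pj], (pi, pj)
            cost[i][j] = best + d
            back[i][j] = prev
    # Trace the best path back from the final frame pair.
    path, node = [], (n_in - 1, n_ref - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    return cost[-1][-1], path[::-1]

# Toy example: two spectrum codes and two power codes per speaker.
dl_spect = [[0.0, 1.0], [1.0, 0.0]]
dl_power = [[0.0, 0.5], [0.5, 0.0]]
a_spect, a_power = [0, 0, 1], [0, 1, 1]
b_spect, b_power = [0, 1], [0, 1]
total, path = dp_match(3, 2, lambda i, j: local_distance(
    i, j, a_spect, b_spect, a_power, b_power, dl_spect, dl_power))
print(path)  # [(0, 0), (1, 0), (2, 1)]
```

Because both terms come from precomputed tables, each local distance costs two lookups and one multiply-add, whatever the dimensionality of the underlying feature vectors.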

In this way, a very high-performance normalization method using vector quantization can be achieved.

[Effects of the Invention] As described above, according to this invention, a spectrogram is extracted after the speech is vector-quantized, the vector quantization codebooks of different speakers are put into correspondence, and the spectrogram is normalized based on this correspondence. The dependency terms between features can therefore be ignored, fewer training samples are needed, and the amount of computation is reduced. Separating the features does mean that additional vector quantization systems are built, which increases computation somewhat, but because few training samples are needed the computation is still substantially reduced overall. With separate vector quantization, the codebook size is the sum of the codebook sizes required for the individual features, so the overall codebook size can be drastically reduced.

Moreover, because the dependency terms between features are ignored, quantization can be optimized within each feature's codebook, so the spectrogram can be reproduced faithfully.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram of a speech recognition device to which an embodiment of this invention is applied. FIG. 2 is a flow diagram showing the overall processing flow from speech input to normalization. FIG. 3 is a flow diagram for explaining spectrogram normalization using vector quantization. FIG. 4 is a flow diagram for explaining separate vector quantization. FIG. 5 is a flow diagram for explaining the conversion vector learning algorithm. FIG. 6 is a flow diagram showing the spectrogram normalization algorithm. FIG. 7 is a flow diagram for explaining the matching operation. In the figures, 1 is an amplifier, 2 is a low-pass filter, 3 is an A/D converter, 4 is a processing device, and 5 is a computer.

Patent applicant: ATR Automatic Translation Telephone Research Institute (エイ・ティ・アール自動翻訳電話研究所)

Claims

[Claims]

(1) A spectrogram normalization method that digitizes speech, extracts a spectrogram as a feature of the speech, and normalizes the extracted spectrogram between different speakers, wherein the speech is vector-quantized, the vector quantization codebooks of the different speakers are then put into correspondence, and the spectrogram is normalized based on this correspondence.

(2) The spectrogram normalization method according to claim 1, wherein, as the method of establishing the correspondence between the different speakers, the correspondence between the codebook vectors of the different speakers is obtained by learning with certain learning words, and normalization is performed based on it.

(3) The spectrogram normalization method according to claim 2, wherein, during the learning, a histogram of the correspondence is created by dynamic programming, and normalization is performed by rewriting the feature vectors of the unknown speaker as linear combinations of the feature vectors of the reference speaker with the histogram as weights.

(4) The spectrogram normalization method according to claim 3, wherein, when the codebooks of the different speakers are put into correspondence, the learning of the correspondence is performed by constraining the coding path, using the sum of the inter-code distances of the various features as the local distance of the dynamic programming matching.

(5) The spectrogram normalization method according to claim 3, wherein separate vector quantization is performed using two kinds of speech features, power and autocorrelation coefficients, a histogram of the correspondence is created by learning a certain number of learning words, and normalization is performed by replacing the feature vectors of each codebook of the unknown speaker with linear combinations of the feature vectors of the reference speaker with the histogram as weights.