JPS617892A

JPS617892A - Word speech recognition method

Info

Publication number: JPS617892A
Application number: JP59128814A
Authority: JP
Inventors: 沢井　秀文; 中川　聖一
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-06-22
Filing date: 1984-06-22
Publication date: 1986-01-14

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Technical field]

The present invention relates to a speech recognition method using vector quantization.

[Prior art]

One method for pattern matching between the standard (reference) pattern of a word and an unknown input pattern is the DP matching method (dynamic programming). It non-linearly stretches or compresses the time axis of the standard pattern to align it with the time axis of the unknown input pattern so that the two patterns become as similar as possible, that is, so that the distance between the patterns is minimized. However, DP matching requires at least I × J × N computations at matching time (I: number of frames in the unknown input pattern, J: number of frames in the standard pattern, N: number of registered words), which is an enormous amount of computation.
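To make the I × J cost per registered word concrete, here is a minimal sketch of a DP matching distance between one standard pattern and one unknown input. The city-block local cost, the symmetric step pattern, and the function name are assumptions for illustration; the source does not specify them.

```python
def dp_matching_distance(reference, unknown):
    """DP (dynamic time warping style) distance between two frame sequences.

    Assumptions of this sketch: frames are lists of floats, the local cost
    is the city-block distance, and the three usual predecessor steps apply.
    """
    J, I = len(reference), len(unknown)
    inf = float("inf")
    # g[j][i] = minimum accumulated cost aligning reference[:j] with unknown[:i]
    g = [[inf] * (I + 1) for _ in range(J + 1)]
    g[0][0] = 0.0
    local_evaluations = 0
    for j in range(1, J + 1):
        for i in range(1, I + 1):
            cost = sum(abs(r - u) for r, u in zip(reference[j - 1], unknown[i - 1]))
            local_evaluations += 1
            g[j][i] = cost + min(g[j - 1][i], g[j][i - 1], g[j - 1][i - 1])
    return g[J][I], local_evaluations


reference = [[0.0], [1.0], [2.0]]          # J = 3 frames
unknown = [[0.0], [0.0], [1.0], [2.0]]     # I = 4 frames
distance, evaluations = dp_matching_distance(reference, unknown)
print(distance, evaluations)
```

The counter confirms that every (i, j) cell is visited once, so matching against N registered words costs on the order of I × J × N local distance evaluations.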

[Purpose]

The present invention was made to eliminate the drawbacks of the prior art described above. It concerns a speech recognition method that recognizes spoken words based on the shapes of two frequency distribution patterns: that of the pseudo-phoneme pattern vectors obtained by vector-quantizing the feature vectors of a standard pattern, and that of the pseudo-phoneme pattern vectors obtained by vector-quantizing the feature vectors of an input pattern. In such a method, the object is to add the mutual similarity between pattern vectors to the pattern vectors, thereby reducing the amount of computation and improving recognition speed.

[Configuration]

The configuration of the present invention is described below based on one embodiment.

The figure is a block diagram of a system embodying the present invention. In the figure, 1 is a speech input unit, 2 is a spectrum analysis unit, 3 is a codebook storage unit, 4 is a vector quantization unit for unknown input frames, 5 is a unit that generates the usage-frequency distribution pattern of the code vectors, 6 is a standard frequency distribution pattern storage unit, 7 is a similarity table between code vectors, 8 is a unit that generates the frequency distribution pattern converted by similarity table 7, 9 is a pattern matching unit, 10 is a word identification unit, and 11 is a recognition result output unit.

First, consider the standard frequency distribution pattern storage unit 6. Let $R^n$ denote the standard pattern of word $n$:

$$R^n = b_1^n\, b_2^n \cdots b_j^n \cdots b_{J_n}^n \qquad (n = 1, 2, \ldots, N;\ N\text{: number of words})$$

Here $b_j^n$ is the feature vector of the $j$-th frame of word $n$, and $J_n$ is the duration in frames.

Next, the standard pattern $R^n$ is expressed in terms of the pseudo-phoneme pattern vectors (also called code vectors) $C_k$ ($k = 1, 2, \ldots, K$; $K$: number of quantization levels) contained in codebook 3. That is, each $b_j^n$ ($j = 1, 2, \ldots, J_n$) is represented by the nearest of the code vectors $C_k$.

Here, taking $d(b_j^n, C_k)$ as the distance measure, let

$$\hat{b}_j^n = \arg\min_{C_k} d(b_j^n, C_k).$$

The standard pattern expressed by these $\hat{b}_j^n$ ($j = 1, 2, \ldots, J_n$) is then written

$$\hat{R}^n = \hat{b}_1^n\, \hat{b}_2^n \cdots \hat{b}_{J_n}^n.$$
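The nearest-code-vector step can be sketched as follows. The squared Euclidean distance, the toy codebook, and the function names are assumptions for illustration only; the source leaves the distance measure d unspecified. Indices are 0-based here, whereas the text counts k from 1.

```python
def squared_distance(x, y):
    """Squared Euclidean distance (an assumed choice for the measure d)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def quantize(frames, codebook):
    """Return, for each frame, the index k of its nearest code vector C_k."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


# Hypothetical 2-dimensional codebook with K = 3 code vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
labels = quantize(frames, codebook)
print(labels)
```

Each frame is replaced by a single index, so a word of any length becomes a sequence of symbols drawn from a K-letter alphabet.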

For word $n$, let $Y_k^n$ be the usage frequency of code vector $C_k$. Expressing the vector-quantized standard pattern $\hat{R}^n$ by these $Y_k^n$ gives the standard frequency distribution pattern $R'^n$:

$$R'^n = Y_1^n\, Y_2^n \cdots Y_k^n \cdots Y_K^n$$

A similarity table 7, $S(i, j)$ ($i, j = 1, 2, \ldots, K$), reflecting the mutual similarity between the code vectors $C_i$ ($i = 1, 2, \ldots, K$), is prepared in advance. The distance $d(C_i, C_j)$ between code vectors $C_i$ and $C_j$ serves as the measure of similarity: the smaller $d(C_i, C_j)$ is, the larger the value entered in $S(i, j)$. For example, for a given $i$, the $j$ that minimizes $d(C_i, C_j)$ ($j = 1, 2, \ldots, K$), namely $j = i$, where $d(C_i, C_j) = 0$, receives $S(i, j) =$ [...], and [...] receives $S(i, j) = 5$; the elements of the similarity table $S(i, j)$ are determined in this manner.
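The usage-frequency pattern and a rank-based similarity table can be sketched as below. The values in `rank_values` are placeholders: the source text preserves only the value 5 for one entry, so 10 for the nearest vector and 5 for the second nearest are assumptions, as are the squared Euclidean distance and all function names.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def usage_histogram(labels, K):
    """Count how often each code vector index occurs (the Y_k of one word)."""
    hist = [0] * K
    for k in labels:
        hist[k] += 1
    return hist


def similarity_table(codebook, rank_values=(10, 5)):
    """Build S(i, j): closer code vectors get larger entries, the rest get 0.

    rank_values holds hypothetical scores for the nearest, second nearest,
    ... code vector to C_i; C_i itself is always the nearest (distance 0).
    """
    K = len(codebook)
    S = [[0] * K for _ in range(K)]
    for i in range(K):
        order = sorted(
            range(K),
            key=lambda j: (squared_distance(codebook[i], codebook[j]), j != i),
        )
        for value, j in zip(rank_values, order):
            S[i][j] = value
    return S


codebook = [[0.0], [1.0], [3.0]]           # hypothetical K = 3 codebook
Y = usage_histogram([0, 1, 1, 2], K=len(codebook))
S = similarity_table(codebook)
print(Y)
print(S)
```

Because the table depends only on the codebook, it is built once and reused for every standard pattern and every unknown input.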

Next, the standard frequency distribution pattern $R'^n = Y_1^n\, Y_2^n \cdots Y_j^n \cdots Y_K^n$ is converted using the similarity table $S(i, j)$ as follows: [...]. The converted standard frequency distribution pattern can be written $\tilde{R}^n = \tilde{Y}_1^n\, \tilde{Y}_2^n \cdots \tilde{Y}_K^n$. The patterns $\tilde{R}^n$ are computed in advance for all words $n$ ($n = 1, 2, \ldots, N$) and stored in the standard frequency distribution pattern storage unit 6.
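The conversion formula itself did not survive in the source text. One plausible reading, shown here purely as an assumption, is a similarity-weighted sum in which each converted frequency collects the counts of similar code vectors. The function name and the example table are hypothetical.

```python
def convert_histogram(hist, S):
    """Assumed conversion: converted[i] = sum over j of S[i][j] * hist[j].

    This is one possible way to spread counts onto similar code vectors;
    the patent's actual formula is not recoverable from the source.
    """
    K = len(hist)
    return [sum(S[i][j] * hist[j] for j in range(K)) for i in range(K)]


# Hypothetical similarity table for K = 3 (larger value = more similar).
S = [
    [10, 5, 0],
    [5, 10, 0],
    [0, 5, 10],
]
Y = [1, 2, 1]                      # usage frequencies of one standard pattern
Y_converted = convert_histogram(Y, S)
print(Y_converted)
```

Under this reading, two utterances that pick neighbouring code vectors for the same sound still produce overlapping converted histograms, which is the effect the text attributes to the similarity table.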

Meanwhile, the unknown input speech entering the speech input unit 1 is frequency-analyzed by the spectrum analysis unit 2 to obtain the unknown input pattern $T$, which can be written

$$T = a_1\, a_2 \cdots a_i \cdots a_I$$

where $a_i$ is the feature vector of the $i$-th frame and $I$ is the duration in frames.

The unknown input pattern $T$ is likewise vector-quantized against codebook 3 in the vector quantization unit 4. Let $\hat{T}$ be the pattern in which each frame $a_i$ is represented by its nearest code vector $C_k$:

$$\hat{T} = \hat{a}_1\, \hat{a}_2 \cdots \hat{a}_i \cdots \hat{a}_I$$

Next, the pattern generation unit 5 expresses $\hat{T}$ by the usage frequencies $X_k$ of the code vectors $C_k$, giving

$$T' = X_1\, X_2 \cdots X_k \cdots X_K$$

This $T'$ is also converted by the similarity table 7, $S(i, j)$, in the same way as above. The pattern conversion unit 8 produces the converted pattern, written $\tilde{T}$:

$$\tilde{T} = \tilde{X}_1\, \tilde{X}_2 \cdots \tilde{X}_k \cdots \tilde{X}_K$$

Next, the pattern matching unit 9 matches the standard frequency distribution patterns $\tilde{R}^n$ held in storage unit 6 against the frequency distribution pattern $\tilde{T}$ of the unknown input. Let $D(\tilde{R}^n, \tilde{T})$ be the word distance between the standard pattern $\tilde{R}^n$ and the unknown input pattern $\tilde{T}$, and let $d_f$ be the distance between the usage frequency $\tilde{Y}_k^n$ of the standard frequency distribution pattern and the usage frequency $\tilde{X}_k$ of the unknown input. The word distance is normalized by the sum of the input frame length $I$ and the frame length $J_n$ of word $n$:

$$D(\tilde{R}^n, \tilde{T}) = \frac{1}{I + J_n} \sum_{k=1}^{K} d_f(\tilde{Y}_k^n, \tilde{X}_k) \qquad (1)$$

The absolute-value distance is normally used for $d_f$:

$$d_f(\tilde{Y}_k^n, \tilde{X}_k) = |\tilde{Y}_k^n - \tilde{X}_k|$$
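A minimal sketch of the normalized word distance with the absolute-value measure follows. It assumes the distance is the sum of per-code-vector distances divided by I + J_n, which is how the normalization is described in the text; the function name and the sample histograms are hypothetical.

```python
def word_distance(Y, X, J_n, I):
    """Sum of |Y_k - X_k| over all code vectors, normalized by I + J_n.

    Y: frequency distribution of the standard pattern of word n
    X: frequency distribution of the unknown input
    J_n, I: frame counts of the standard pattern and of the input
    """
    if len(Y) != len(X):
        raise ValueError("both histograms must cover the same K code vectors")
    return sum(abs(y - x) for y, x in zip(Y, X)) / (I + J_n)


Y = [1, 2, 1]      # standard pattern, J_n = 4 frames
X = [2, 2, 1]      # unknown input, I = 5 frames
d = word_distance(Y, X, J_n=4, I=5)
print(d)
```

The loop runs over K histogram bins only, independent of how the frames were ordered in time.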

Alternatively, as the distance measure $d_f$ in equation (1), $d_f$ may be set to 0 when the frequency $Y_k^n$ of the standard frequency distribution pattern and the frequency $X_k$ of the unknown-input frequency distribution pattern $T$ differ by a factor within, for example, 1/2 to 2.

A distance measure of the form of equation (2) [...] may also be used, where α, β and γ are parameters. α keeps the denominator of equation (2) from becoming 0 and is set, for example, to α = 1. β is a parameter for adjusting the distance scale and is normally set to β = 0, and γ is used with γ = 1.

Using a distance measure of this kind makes the pattern matching robust against non-linear temporal stretching and compression of the standard pattern and the unknown input pattern. If $d_f(Y_k^n, X_k)$ of equation (2) is computed in advance and stored in a table, the distance $d_f(Y_k^n, X_k)$ for any combination of $Y_k^n$ and $X_k$ can be obtained immediately by looking it up in that table.
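The table-lookup idea can be sketched as below. Because equation (2) is not recoverable from the source, this sketch tabulates the ratio-tolerant variant instead (distance 0 when the two counts are within a factor of 2, absolute difference otherwise). The fallback to the absolute difference outside the tolerance band, the maximum count, and the function names are assumptions.

```python
def ratio_tolerant_distance(y, x):
    """0 if y and x are within a factor of 2 of each other, else |y - x|.

    Written with multiplications so that zero counts need no special case.
    """
    if x <= 2 * y and y <= 2 * x:
        return 0
    return abs(y - x)


def build_distance_table(max_count, distance=ratio_tolerant_distance):
    """Precompute distance(y, x) for all integer counts 0..max_count."""
    return [
        [distance(y, x) for x in range(max_count + 1)]
        for y in range(max_count + 1)
    ]


table = build_distance_table(max_count=8)   # hypothetical upper bound on counts
print(table[3][5], table[1][4])
```

At matching time each term of the word distance becomes a single indexed read, `table[y][x]`, whatever the cost of the underlying formula.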

The word distance $D(\tilde{R}^n, \tilde{T})$ is then computed for all dictionary words $n$ ($n = 1, 2, \ldots, N$). The word identification unit 10 selects the dictionary word $n$ for which $D(\tilde{R}^n, \tilde{T})$ is smallest, and the recognition result output unit 11 outputs it as the recognition result for the unknown input word.

That is, the recognition result $\hat{n}$ is given by

$$\hat{n} = \arg\min_{n} D(\tilde{R}^n, \tilde{T}).$$
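The final decision is an arg-min over the dictionary, which can be sketched as follows. The dictionary layout, the word labels, and the use of the absolute-value distance normalized by I + J_n are assumptions made for this illustration.

```python
def recognize(X, I, dictionary):
    """Return the dictionary word whose stored histogram is closest to X.

    dictionary maps a word label to (Y, J_n): its frequency distribution
    pattern and the frame count of its standard pattern.
    """
    def distance(word):
        Y, J_n = dictionary[word]
        return sum(abs(y - x) for y, x in zip(Y, X)) / (I + J_n)

    return min(dictionary, key=distance)


# Hypothetical two-word dictionary over K = 3 code vectors.
dictionary = {
    "ichi": ([3, 1, 0], 4),
    "ni": ([0, 2, 3], 5),
}
result = recognize(X=[2, 2, 0], I=4, dictionary=dictionary)
print(result)
```

Here the distance to "ichi" is 2/8 and the distance to "ni" is 5/9, so "ichi" is selected.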

The embodiment described above introduces the mutual similarity between code vectors by means of a similarity table, but the present invention is not limited to that embodiment. For example, when the frequency distribution is built, the nearest (first-candidate) code vector may be given a frequency count of, say, 2 and the second-candidate code vector a count of, say, 1, so that the similarity between code vectors is reflected in the counts.
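This alternative can be sketched as a weighted count over the two nearest code vectors of each frame. The weights 2 and 1 come from the text; the squared Euclidean distance, the toy codebook, and the function names are assumptions.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def soft_usage_histogram(frames, codebook, weights=(2, 1)):
    """Add weights[0] to the nearest code vector of each frame, weights[1]
    to the second nearest, and so on for any further weights."""
    K = len(codebook)
    hist = [0] * K
    for frame in frames:
        order = sorted(range(K), key=lambda k: squared_distance(frame, codebook[k]))
        for weight, k in zip(weights, order):
            hist[k] += weight
    return hist


codebook = [[0.0], [1.0], [3.0]]            # hypothetical K = 3 codebook
frames = [[0.1], [0.9], [2.8]]
hist = soft_usage_histogram(frames, codebook)
print(hist)
```

Compared with a separate similarity table, this variant folds the similarity into the counting step, so no conversion pass over the histogram is needed afterwards.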

As described above, whereas DP matching requires I × J × N computations, the present invention needs only about K × N (K << I × J). In addition, performing pattern matching with the mutual similarity between code vectors taken into account improves recognition accuracy, so fast and accurate recognition becomes possible.
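A quick count illustrates the size of the saving. The values of I, J, N and K below are hypothetical and chosen only to make the arithmetic concrete.

```python
def dp_matching_operations(I, J, N):
    """Local distance evaluations for DP matching against N words."""
    return I * J * N


def histogram_matching_operations(K, N):
    """Per-bin distance evaluations for histogram matching against N words."""
    return K * N


# Hypothetical sizes: 50-frame input, 50-frame references, 100 words, 64 code vectors.
I, J, N, K = 50, 50, 100, 64
dp_ops = dp_matching_operations(I, J, N)
hist_ops = histogram_matching_operations(K, N)
print(dp_ops, hist_ops)
```

Quantizing the unknown input by exhaustive search adds I × K distance evaluations, but that cost is paid once per utterance and does not grow with the number of registered words N.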

[Effect]

As is clear from the above description, the present invention introduces the mutual similarity between code vectors when pattern matching is performed on the frequency distribution patterns of the feature vectors of the word standard patterns and of the unknown input pattern. This improves recognition accuracy and reduces the amount of computation needed for pattern matching, so that unknown input spoken words can be recognized quickly and accurately. A further advantage is that the method can be applied as a means of fast and accurate preselection among large-vocabulary words.

[Brief explanation of the drawing]

The figure is a block diagram for explaining one embodiment of the present invention.

1: speech input unit; 2: spectrum analysis unit; 3: codebook storage unit; 4: vector quantization unit for unknown input frames; 5: unit generating the usage-frequency distribution pattern of code vectors; 6: standard frequency distribution pattern storage unit; 7: similarity table between code vectors; 8: frequency distribution pattern conversion unit; 9: pattern matching unit; 10: word identification unit; 11: recognition result output unit.

Procedural amendment (formality)

Indication of the case: Patent Application No. 128814 of 1984 (Showa 59)
Title of the invention: Word speech recognition method
Person making the amendment:
Relationship to the case: patent applicant
Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
Name: (674) Ricoh Co., Ltd.
Representative: Hiroshi Hamada (and one other)
Agent:
Address: シャトレーイン横浜 No. 807, 1-2-7 Furo-cho, Naka-ku, Yokohama 231
Subject of amendment: (1) the "Detailed description of the invention" section of the specification
Content of amendment: (1) "The figure is" on page 3, line 7 and on page 11, line 12 of the specification is corrected to "Fig. 1 is". (2) "Fig. 1" is added to the drawing as marked in red.
Attached documents: written statement, 1 copy

Claims

[Claims]

A speech recognition method in which frequency distribution patterns of pseudo-phoneme pattern vectors, obtained by vector-quantizing the feature vectors of standard patterns of words, are stored in advance, the feature vectors obtained by spectrum analysis of unknown input word speech are likewise vector-quantized and expressed as a frequency distribution pattern of pseudo-phoneme pattern vectors, and pattern matching is performed against the frequency distribution patterns of the word standard patterns, characterized in that the mutual similarity between pattern vectors is added to the pattern vectors.