JPH0247759B2


Info

Publication number: JPH0247759B2
Application number: JP57110530A
Authority: JP
Inventors: Seiichi Nakagawa
Original assignee: Individual
Current assignee: Individual
Priority date: 1982-06-25
Filing date: 1982-06-25
Publication date: 1990-10-22
Also published as: JPS592094A

Description

[Detailed description of the invention]

本発明は、登録された複数種類のパターンと入
力パターンとの比較を行い、入力パターンの識別
を行うパターン比較装置、特に連続して発声した
単語音声の認識などに適用可能なパターン比較装
置に関する。人間にとつて最も自然な情報発生手段である音
声が、人間−機械系の入力手段として真価が発揮
されるためには、話者を限定せず連続的な通常の
会話音声の認識が可能なことが望ましい。このう
ち第１図は単語単位を認識単位とする音声認識装
置のブロツク図である。１は音声信号の入力端
子、２は入力音声信号を周波数分析、LPC分析、
PARCOR分析、相関分析等により幾つかの数値
の組（特徴ベクトル）の系列に変換する音響分析
部３は認識すべき単語が前記特徴ベクトルの系列
として登録されている標準パターン記憶部、４は
音響分析部２で分析された認識すべき入力音声信
号に対する前記特徴ベクトルの系列と前記標準パ
ターンのそれぞれとを比較し、両者の距離あるい
は類似度を計算するパターンマツチング部、５は
パターンマツチング部４の計算結果に基づいて前
記入力音声パターンに最も近い標準パターンに対
応する単語を認識結果として判定する判定部であ
り、６はこの認識結果を出力する出力端子であ
る。このような構成による音声認識装置におい
て、パターンマツチングの方法として、動的計画
法による時間軸非線形伸縮によりマツチング
（DPマツチング）を行う方法が優れている。本発明装置による連続単語認識において、この
DPマツチングは中心的な役割を演ずる。次にDP
マツチングのアルゴリズムについて簡単に説明す
る。いま、Ａ＝a₁，a₂…a_I…A〓Ｂ＝b₁，b₂…b_j…b_J ……(1) を二つの音声パターンとする。すなわち、それら
の音声パターンは、それぞれに対する特徴ベクト
ルa_i，b_jの系列で表わされる。ベクトルa_iとb_jの距離をｄ（ｉ，ｊ）とすると
き、前記両系列を構成するベクトルの種々の対応
づけに対し、ｄ（ｉ，ｊ）の荷重平均を求め、そ
れが最小になる対応づけを両系列間の最適な対応
づけとし、そのときの荷重平均を両系列間の距離
Ｄ（Ａ，Ｂ）とするのであるが、この手続を動的
計画法を用いて効率よく行うのが、DPマツチン
グである。なお、ｄ（ｉ，ｊ）は通常ベクトルa_i
とb_jのユークリツド距離または市街距離が用いら
れる。第２図はこれを二次元的に図示したもので、
Ａ、Ｂ両パターンの時間の対応すなわち時間変換
亟数ｊ(i)は、ｉ−ｊ平面上の格子点ｃ（ｋ）＝（ｉ
（ｋ），ｊ（ｋ））のの系列Ｆ＝ｃ(1)，ｃ(2)…ｃ（ｋ）…ｃ（Ｋ） ……(2) （ｉ（Ｋ）＝Ｉ，ｉ（ｋ）＝Ｊ）で表わされる。このとき、Ｄ（Ａ，Ｂ）は次のよ
うに定義される。ここに、Ｗ（ｋ）は非負の定数で、その値は時
間変換凾数ｊ(i)を点列で近似するときの方式によ
つて定められる。ここで、式(3)の分母をＦに依存
しない定数Ｍ＝_K 〓^l=1 Ｗ（ｋ）とすれば、Ｄ（Ａ，Ｂ）
は動的計画法により効率的に求められる。すなわ
ち、ｇ（ｃ（ｋ））＝min ｃ（１）ｃ（２）…ｃ（ｋ）〔_K 〓^l=1 ｄ（ｃ（ｌ）Ｗ（ｌ）〕＝min ｃ（ｋ）〔min ｃ（１）ｃ（２）…ｃ（ｋ−１）〔_K 〓^l=1 ｄ（ｃ（ｌ））Ｗ（ｌ）〕＋ｄ（ｃ（ｋ））Ｗ（ｋ）〕＝min ＋ｄ（ｃ（ｋ））Ｗ（ｋ）〕＝min （CK）〔ｇ（ｃ（ｋ−１））＋ｄ（ｃ（ｋ））Ｗ（ｋ）
〕……(4) であるから、ｇ（ｃ(1)＝ｇ（１，１）＝ｄ（１，１）
として、漸化式(4)を解き、ｇ（ｃ（ｋ））＝ｇ（Ｉ，
Ｊ）が求められればＤ（Ａ，Ｂ）＝１／Ｍｇ（Ｉ，Ｊ） ……(5) としてＤ（Ａ，Ｂ）が求められる。式(3)の分母を定数化する方法として、Ｍ＝Ｉ＋
Ｊとなるようにする方法（対称型）と、Ｍ＝Ｉま
たはＪとなるようにする方法（非対称型）があ
る。第３図ａ〜ｆは点列Ｆを選ぶ際の拘束条件の
例を示しており、点（ｉ，ｊ）に至る径路は図の
矢線で示される径路のみとり得る。また、各線分
上に示された数字はその線分が径路として選ばれ
た場合の荷重Ｗ（ｋ）を示している。ａ、ｂは前
記対称型の例でＭ＝Ｉ＋Ｊとなり、ｃ〜ｆは前記
非対称型の例でＭ＝Ｉとなる。このようなマツチング法を用いて単語音声の認
識をするには次のようにする。認識の対象となつ
ている単語クラスをｎ（ｎ＝１〜Ｎ）、その標準パ
ターンをBⁿで表す。入力Ａと各標準パターンBⁿ
との距離D_o＝Ｄ（Ａ，Bⁿ）を上記の方法で計算
し、D_o0＝^min _o（D_o）を与えるクラスn₀をＡに対す
る認識結果とする。前記非対称型のDPマツチングでＭ＝Ｉとなる
ようにすれば、Ｍは入力パターン長にのみ関係す
る量となり、式(5)において何れの標準パターンに
対してもＭは一定であるから、Ｄ（Ａ，Ｂ）＝ｇ（Ｉ，Ｊ）＝min〔_I 〓ⁱ⁼¹ ｄ（ｉ，ｊ）〕 ……(6) と定義できる。以後、パターン間の距離は式(6)に
よるものとする。第３図ｃの拘束条件のもとに式
(6)を求める場合には次の漸化式(7)を計算すればよ
い。ｇ（ｉ，ｊ）＝ｄ（ｉ，ｊ）＋minｇ（ｉ−
１，ｊ）〔ｇ（ｉ−１，ｊ−１）ｇ（ｉ−１，ｊ−２）〕 ……(7) 初期条件ｇ（１，１）＝ｄ（１，１）次に連続単語音声の認識について説明する。連
続単語音声認識は次のように定式化できる。い
ま、Ｘ個の単語ｑ(1)，ｑ(2)，…ｑ（ｘ）を連続し
て発声したときの音声パターンをＡで表わす。Ａ＝a₁，a₂…ai…a₁ ……(8) 単語ｑ（ｘ）の標準パターンを B_q(x)＝b₁ ^q(x)b₂ ^q(x)…b^q(x) _j…b^q(x) _Jq(x) ……(9) とするとき、Ｘ個の単語B_q(1)，B_q(2)，…B_q(x)を接
続して得らたる標準パターンは＝B_q(1)B_q(2)…B_q(x) ＝b₁ ^q(1)b₂ ^q(1)…b^q(1) _Jq(1)，b₁ ^q(2)b₂ ^q(2)…b^q(2) _jq(
2)…b₁ ^q(x)
b₂ ^q(x)b^q(x) _jq(x) ……(10) で表わされる。ここではパターンの接続を表わ
す。そこで連続単語音声認識は、このと入力音声
パターンＡとの間でDPマツチングを実行し、そ
の際得られるＤ（Ａ，）が最小になるように、
Ｘとｑ（ｘ）（ｘ＝１，２，…，ｘ）を決めるとい
う問題になる。すなわち、Ｔ＝^min _X,q(x)〔Ｄ（Ａ，B_q(1)B_q(2)…B_q(x)〕
…(11) を計算し、Ｔが最小になる条件を求めればよい。
式(11)の計算をまともに実行しようとすると、膨大
な計算量が必要となる。すなわち、入力音声パタ
−ンにおいて連続発声の単語数の最大値をｋ、単
語標準パターンの数をＮとすれば、N^k回の計算
を実行することになる。そこで、実際にはこの問
題を次の漸化式を解く問題に帰着させている。入力音声パタ−ンＡにおいて、ｉ＝ｌ＋１から
ｉ＝ｍまでの部分区間を、部分パターンＡ（ｌ，
ｍ）で定義する。Ａ（ｌ，ｍ）＝a_l+1a_l+2…a_n ……(12) このとき、式(6)によりパターン間の距離を定義
すれば次のことが言える。Ｄ（Ａ，B₁B₂）＝min ｍ〔Ｄ（Ａ（ｏ，ｍ），B₁）＋Ｄ（Ａ（ｍ，Ｉ），B₂）〕 ……（13）このことを用いれば式(11)は次のように解ける。ここで以後用いる記号の意味を第１表にまとめ
て示す。

The present invention relates to a pattern comparison device that compares an input pattern with a plurality of registered patterns in order to identify the input pattern, and in particular to a pattern comparison device applicable to the recognition of continuously uttered words.

Speech is the most natural means by which humans convey information. For it to show its true value as an input means in human-machine systems, it is desirable that continuous, ordinary conversational speech can be recognized without restricting the speaker. FIG. 1 is a block diagram of a speech recognition device that uses the word as its unit of recognition. 1 is an input terminal for the speech signal. 2 is an acoustic analysis unit that converts the input speech signal into a series of sets of numerical values (feature vectors) by frequency analysis, LPC analysis, PARCOR analysis, correlation analysis, or the like. 3 is a standard pattern storage unit in which the words to be recognized are registered as series of feature vectors. 4 is a pattern matching unit that compares the feature-vector series of the input speech signal, as analyzed by the acoustic analysis unit 2, with each of the standard patterns and calculates the distance or similarity between them. 5 is a determination unit that, based on the result from the pattern matching unit 4, takes the word corresponding to the standard pattern closest to the input speech pattern as the recognition result. 6 is an output terminal for this recognition result.

In a speech recognition device of this configuration, an excellent pattern matching method is matching with nonlinear expansion and contraction of the time axis by dynamic programming (DP matching). DP matching plays a central role in continuous word recognition by the device of the present invention, so its algorithm is briefly explained next.

Let

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I, \qquad B = b_1, b_2, \ldots, b_j, \ldots, b_J \tag{1}$$

be two speech patterns; that is, each pattern is represented by a series of feature vectors $a_i$ or $b_j$. Let $d(i, j)$ be the distance between vectors $a_i$ and $b_j$. Over the various possible correspondences between the vectors of the two series, the weighted average of $d(i, j)$ is computed. The correspondence that minimizes it is taken as the optimal correspondence between the two series, and the weighted average at that point is taken as the distance $D(A, B)$ between them. DP matching carries out this procedure efficiently by dynamic programming. The Euclidean distance or the city-block distance between $a_i$ and $b_j$ is normally used for $d(i, j)$. FIG. 2 illustrates this in two dimensions.
The time correspondence between patterns $A$ and $B$, that is, the time warping function $j(i)$, is represented by a series of lattice points $c(k) = (i(k), j(k))$ on the $i$-$j$ plane:

$$F = c(1), c(2), \ldots, c(k), \ldots, c(K), \qquad i(K) = I,\ j(K) = J. \tag{2}$$

$D(A, B)$ is then defined as

$$D(A, B) = \min_{F} \left[ \frac{\sum_{k=1}^{K} d(c(k))\, W(k)}{\sum_{k=1}^{K} W(k)} \right]. \tag{3}$$

Here $W(k)$ is a non-negative constant whose value is determined by the method used to approximate the time warping function $j(i)$ by a point sequence. If the denominator of equation (3) is a constant $M = \sum_{k=1}^{K} W(k)$ that does not depend on $F$, then $D(A, B)$ can be found efficiently by dynamic programming. That is,

$$\begin{aligned}
g(c(k)) &= \min_{c(1), \ldots, c(k-1)} \left[ \sum_{l=1}^{k} d(c(l))\, W(l) \right] \\
&= \min_{c(k-1)} \left[ \min_{c(1), \ldots, c(k-2)} \left[ \sum_{l=1}^{k-1} d(c(l))\, W(l) \right] + d(c(k))\, W(k) \right] \\
&= \min_{c(k-1)} \left[ g(c(k-1)) + d(c(k))\, W(k) \right],
\end{aligned} \tag{4}$$

so, starting from $g(c(1)) = g(1, 1) = d(1, 1)$, recurrence (4) is solved, and once $g(c(K)) = g(I, J)$ has been found, $D(A, B)$ is obtained as

$$D(A, B) = \frac{1}{M}\, g(I, J). \tag{5}$$

There are two ways to make the denominator of equation (3) constant: arranging that $M = I + J$ (symmetric type), or arranging that $M = I$ or $J$ (asymmetric type). FIGS. 3a to 3f show examples of the constraints imposed when choosing the point sequence $F$: a point $(i, j)$ can be reached only along the paths shown by arrows in the figure. The number on each line segment is the weight $W(k)$ that applies when that segment is chosen as the path. FIGS. 3a and 3b are examples of the symmetric type, with $M = I + J$; FIGS. 3c to 3f are examples of the asymmetric type, with $M = I$.

Word speech is recognized with this matching method as follows. Let the word classes to be recognized be $n$ ($n = 1$ to $N$) and their standard patterns be $B^n$. The distance $D_n = D(A, B^n)$ between the input $A$ and each standard pattern $B^n$ is calculated by the method above, and the class $n_0$ that gives $D_{n_0} = \min_n D_n$ is taken as the recognition result for $A$.

If the asymmetric DP matching is arranged so that $M = I$, then $M$ depends only on the input pattern length and is the same in equation (5) for every standard pattern, so the distance can be defined as

$$D(A, B) = g(I, J) = \min_{F} \left[ \sum_{i=1}^{I} d(i, j(i)) \right]. \tag{6}$$

From here on, the distance between patterns is the one given by equation (6). Under the constraint of FIG. 3c, equation (6) is obtained by computing the recurrence

$$g(i, j) = d(i, j) + \min \begin{cases} g(i-1, j) \\ g(i-1, j-1) \\ g(i-1, j-2) \end{cases} \tag{7}$$

with the initial condition $g(1, 1) = d(1, 1)$.

Continuous word speech recognition is explained next. It can be formulated as follows. Let $A$ be the speech pattern produced when $X$ words $q(1), q(2), \ldots, q(X)$ are uttered continuously:

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I. \tag{8}$$

Let the standard pattern of word $q(x)$ be

$$B_{q(x)} = b_1^{q(x)} b_2^{q(x)} \cdots b_j^{q(x)} \cdots b_{J_{q(x)}}^{q(x)}. \tag{9}$$

The standard pattern $\bar{B}$ obtained by concatenating the $X$ standard patterns $B_{q(1)}, B_{q(2)}, \ldots, B_{q(X)}$ is then

$$\bar{B} = B_{q(1)} B_{q(2)} \cdots B_{q(X)} = b_1^{q(1)} b_2^{q(1)} \cdots b_{J_{q(1)}}^{q(1)}\; b_1^{q(2)} b_2^{q(2)} \cdots b_{J_{q(2)}}^{q(2)} \cdots b_1^{q(X)} b_2^{q(X)} \cdots b_{J_{q(X)}}^{q(X)}, \tag{10}$$

where writing patterns side by side denotes their concatenation. Continuous word speech recognition is then the problem of performing DP matching between $\bar{B}$ and the input speech pattern $A$ and determining $X$ and $q(x)$ ($x = 1, 2, \ldots, X$) so that the resulting $D(A, \bar{B})$ is minimized. That is, it suffices to calculate

$$T = \min_{X,\, q(x)} \left[ D\!\left(A,\; B_{q(1)} B_{q(2)} \cdots B_{q(X)}\right) \right] \tag{11}$$

and find the conditions under which the minimum $T$ is attained.
Evaluating equation (11) directly requires an enormous amount of calculation: if the maximum number of continuously uttered words in the input speech pattern is $k$ and the number of word standard patterns is $N$, then $N^k$ calculations must be carried out. In practice, therefore, the problem is reduced to solving the recurrence described below.

The partial interval of the input speech pattern $A$ from $i = l + 1$ to $i = m$ is defined as the partial pattern $A(l, m)$:

$$A(l, m) = a_{l+1}, a_{l+2}, \ldots, a_m. \tag{12}$$

If the distance between patterns is defined by equation (6), the following then holds:

$$D(A, B_1 B_2) = \min_{m} \left[ D(A(0, m), B_1) + D(A(m, I), B_2) \right]. \tag{13}$$

Using this, equation (11) can be solved as follows. The meanings of the symbols used hereinafter are summarized in Table 1.
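
The asymmetric recurrence (7) and the isolated-word decision rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in the patent: the names `city_block`, `dp_distance`, and `recognize` are hypothetical, feature vectors are plain lists of numbers, the city-block distance is chosen for $d(i, j)$, and no matching window or slope limit beyond the three moves of recurrence (7) is applied.

```python
import math


def city_block(u, v):
    """City-block distance between two feature vectors (one choice for d(i, j))."""
    return sum(abs(x - y) for x, y in zip(u, v))


def dp_distance(a, b):
    """Return g(I, J) from recurrence (7) for input a and standard pattern b.

    g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i-1, j-2)),
    with g(1, 1) = d(1, 1). Returns math.inf if (I, J) cannot be reached.
    """
    I, J = len(a), len(b)
    prev = [math.inf] * J
    prev[0] = city_block(a[0], b[0])          # initial condition g(1, 1)
    for i in range(1, I):
        cur = [math.inf] * J
        for j in range(J):
            best = prev[j]                    # predecessor (i-1, j)
            if j >= 1:
                best = min(best, prev[j - 1])  # predecessor (i-1, j-1)
            if j >= 2:
                best = min(best, prev[j - 2])  # predecessor (i-1, j-2)
            if best < math.inf:
                cur[j] = city_block(a[i], b[j]) + best
        prev = cur
    return prev[J - 1]


def recognize(a, refs):
    """Return the index n0 of the standard pattern with the smallest distance."""
    return min(range(len(refs)), key=lambda n: dp_distance(a, refs[n]))
```

Because the normalizing constant is $M = I$ for every standard pattern, the sketch compares the unnormalized values $g(I, J)$ directly, as equation (6) permits. Only two rows of $g$ are kept at a time, since each row depends only on the previous one.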

【表】入力単語数Ｘが既知の場合 D_x(i)＝min ｎ，ｍ〔D_x-1（ｍ）＋D₀ ⁿ（ｍ＋１：ｉ）〕 ……（14） N_x(i)＝ｎ，B_x(i)＝ｍ（n^，m^は式（14）を満たすｎとｍ）なる漸化式の解を求めれば、認識結果は第４図に
示すフローチヤートにより、Ｘ単語列の最後尾単
語名とセグメンテーシヨン結果から先頭単語名と
セグメンテーシヨン結果まで順次求まる。入力単語数Ｘが末知の場合Ｄ(i)＝min ｎ，ｍ，ｘ〔D_x（ｍ）＋D₀ ⁿ（ｍ＋１：ｉ）〕＝min〔Ｄ（ｍ）＋D₀ ⁿ（ｍ＋１：ｉ）〕
……（15）Ｎ(i)＝ｎ，Ｂ(i)＝ｍ（ｎ，ｍは式（15）を満さすｎとｍ）なる漸化式の解から第５図のフローチヤートによ
り認識結果が求まる。式（14），式（15）の計算において、問題とする
ところは (イ) 計算量が少いこと。 (ロ) 必要とする記憶容量がなるべく少いこと。 (ハ) 実時間向きアルゴリズムであること。である。(イ)に関し、解を求めるための主な計算
は、主にDⁿ ₀（ｓ：ｔ）とこれを求めるために必要
なdⁿ（ｉ，ｊ）である。特に、各レームは、通常
10次元以上のパラメータで表現されるものであ
り、この計算量をいかに減らすかが問題となる。次に、この計算方法として従来行われている２
段DP法について説明する。２段DP法は、先ずDⁿ ₀（ｓ：ｔ）をあらゆるｓ，
ｔの組合せに対してDPで求めておき、その後Ｄ
(i)をDPで求める方法で、DPを２段にしているの
が特徴である。この２段DP法としては前向きア
ルゴリズムと後向きアルゴリズムが提案されてい
るが、ここでは後向きアルゴリズムについて説明
する。入力パターンのフレームｉ−１に対して、Ｄ
（ｉ−１），Ｎ（ｉ−１），Ｂ（ｉ−１）は求まつ
ているとする。単語ｎ（ｎ＝１，２，…，Ｎ）の標準パター
ンと入力パターンを、i₀を始点として逆時間向
きにDPマツチングする。従つて、径路の拘束
条件は第３図ｃ、ｄ、ｅ、ｆに対応して、第７
図ａ、ｂ、ｃ、ｄとなる。マツチング範囲は、
整合窓幅Ｒで行うことも考えられるが、ここで
は傾き1/2〜２の範囲（傾斜制限内、第６図の
斜線部）で行うものとする。このマツチングを
終端フリーとして行う、その結果、Dⁿ ₀（ｓ：
ｉ）が求まる。ただし、ｉ−2Jⁿ＋１ｓｉ
−（1/2）Jⁿである。式（15）のＤ(i)，Ｎ(i)，Ｂ(i)を求める。ｉ＝ｉ＋１としてへもどる。この方法は、入力フレーム毎に各単語につきマ
ツチング範囲内で、フレーム間距離ｄと累積距離
Ｄを計算する必要がある。このため全体の計算回
数はフレーム間距離ｄ、累積距離Ｄ共にＮ・Ｉ・
３／４J²となる（整合窓幅ＲのときはＮ・Ｉ・Ｊ・
Ｒ）。以上が２段DPマツチング法であるが、この方
法の欠点はまだ計算回数が多いという点である。
その理由は各入力フレーム毎に各単語について3/
４J²回、ｄとＤを計算するためである。一方、最
も安易な計算回数の低減化方法は、一度計算した
ｄの値を必要がなくなるまですべて保存しておく
方法である。しかし、この方法であるとｄの計算
量はＮ・Ｉ・Ｊとなるが、この計算のために必要
な記憶量はかなり大きいものになる。しかもこの
方法であると、ｄの計算結果の記憶アドレスを入
力フレーム毎に変更する必要がある。本発明は以上の欠点を除去し、入力パターンと
標準パターンを比較する際に必要な計算量を、記
憶量をそれほど増すことなく大幅に減少し、計算
速度の速いパターン比較装置を提供することを目
的とする。この目的を達成するために本発明は、一定時間
毎に２段DP法の後向きアルゴリズムをまとめて
行うようにしたもので、以下、第８図を用いて本
発明の原理について説明する。第８図において、斜線部分は一回にｄを計算す
る領域（以下、領域Ａという）を示している。こ
の領域Ａは第６図において示した斜線部分の領域
（以下、領域Ｂという）にＷ×Ｊの領域（以下、
領域Ｃという）を付加したものになつている。Ｗ
はフレーム数であり、ｄの計算は以後、Ｗフレー
ムづつずらして行われる。こうすると、ｄの計算量は、Ｗフレーム毎に平
均J²＋Ｗ／２Ｊで済み、これを入力パターンのレーム数Ｉに対し、Ｉ／Ｗ回行うので、結局ｄの全体
の計算量はＮ・Ｉ・Ｊ（1/2＋Ｊ／Ｗ）となる。以下、上記原理を用いた本発明の実施例につい
て図面とともに説明する。第９図は本発明のパターン比較装置を音声認識
装置に適用した場合の一実施例を示すブロツク図
である。図において、１は音声信号の入力端子、２は連
続単語の入力音声信号を特徴ベクトルの系列に変
換し、入力パターンＡ＝a₁，a₂…a_Iとして出力す
る音響分析部である。この音響分析部２より出力
される入力パターンの長さをＩとする。すなわち
入力パターンはＩ個のフレームからなる。３は認
識すべき単語（単語数Ｎ）が特徴ベクトルの系列
Bⁿ＝b₁ ⁿb₂ ⁿ…bⁿ _Joとして記憶されている標準パター
ン記憶部で、この記憶内容は入力パターンとのフ
レーム間距離を計算する際に標準パターンとして
読み出される。この音響分析部２および標準パタ
ーン記憶部３は、第１図において示したものと同
様のものである。７はDPマツチング部で、標準パターンの特徴
ベクトルbⁿ _jと入力パターンの特徴ベクトルa₁との
フレーム間距離dⁿ（ｉ，ｊ）を計算するフレーム
間距離計算部７ａと、このフレーム間距離dⁿ（ｉ，
ｊ）を記憶するフレーム間距離記憶部７ｂと、こ
のフレーム間距離記憶部７ｂに記憶されているフ
レーム間距離dⁿ（ｉ，ｊ）を用いて単語ｎとフレ
ームi′〜ｉの入力パターンとの部分距離Dⁿ ₀（i′：
ｉ）を計算する部分距離計算部７ｃより構成され
る。８はDPマツチング部７におけるフレーム間距
離計算部７ａおよび部分距離計算部７ｃにマツチ
ングの開始フレーム情報を与えるマツチング開始
フレーム設定部で、開始フレームi₀は初期値が１
であり、以後、Ｗ＋１，2W＋１，…とＷフレー
ム毎の値が設定される。９は部分距離計算部７ｃで計算された部分距離
Dⁿ ₀（i′：ｉ）を記憶する部分距離記憶部である。
ただし、ｎ＝１，２，…，Ｎ，ｉ＝i₀，i₀＋１，
…，i₀＋Ｗ−１，i′はｉ−2Jⁿ＋１i′ｉ−1/2Jⁿ
の範囲である。１０は部分距離記憶部９に記憶されている部分
距離から、入力パターンのフレームｉで終端する
単語列の最小累積距離Ｄ(i)を、ｉ＝i₀，i₀＋１，
…，i₀＋Ｗ−１について求めるとともに、このＤ
(i)を与えるフレームｉで終端する単語列の最後尾
単語Ｎ(i)と、この単語Ｎ(i)の始端フレームより１
を減じたフレームを示すＢ(i)を計算する累積距離
計算部である。１１は累積距離計算部１０で計算されたＤ(i)，
Ｎ(i)，Ｂ(i)を記憶する累積距離情報記憶部であ
る。ただし、ｉ＝i₀，i₀＋１，…，i₀＋Ｗ−ｉ，i₀
＝１，Ｗ＋１，2W＋１，…，LW＋１であり、
Ｌ＝〔（Ｉ−１）／Ｗ〕である。なお〔Ｘ〕はＸの
整数部分を示す。１２は入力パターンの最終フレームＩまでの累
積距離の計算が終了した時点で、入力された連続
単語の単語境界を示すバツクポインタＢ（），Ｂ
（Ｂ（）），…，Ｂ（Ｂ（（…Ｂ（Ｂ（））…））
），０
を最終フレームの方から逆順に求めるセグメンテ
ーシヨン部である。１３はセグメンテーシヨン部１２で求められた
バクポインタをもとに当該境界フレームで終端す
る単語を順次累積距離情報記憶部１１から読み出
し、認識単語とする単語決定部である。以下、本実施例の動作について第８図、第９図
を用いて説明する。フレーム間距離計算部７ａは音響分析部２より
入力パターンが出力されると標準パターンの特徴
ベクトルと入力パターンの特徴ベクトルのフレー
ム間距離dⁿ（ｉ，ｊ）の計算を開始する。この計
算を行う範囲は、第８図において示される斜線部
分の領域Ａであり、この領域Ａは、マツチング開
始フレーム設定部８により開始フレームi₀ば１，
Ｗ＋１，…とＷの幅づつ順次変えられるごとにＷ
の幅で移動する。フレーム間距離dⁿ（ｉ，ｊ），部分距離Dⁿ ₀（i′：
ｉ）、最小累積距離Ｄ(i)の計算は、領域Ａ毎に同
じ手順で計算されるので、説明の簡略化のために
以下、開始フレームi₀についてのみ説明する。フレーム間距離dⁿ（ｉ，ｄ）は、標準パターン
記憶部３に記憶されているＮ個の単語に対し、順
次計算される。この計算は領域Ａ内について行わ
れる。なお領域Ａは標準パターン長Jⁿにより変化
する。例えば、１番目の単語（ｎ＝１）に対して
は、点（i₀，Jⁱ）、点（i₀−2J′＋１，１），点（i₀
＋Ｗ−ｉ，J¹）、点（i₀＋Ｗ−1/2J¹，１）の４点
を結んだ領域A₁となる。従つて、ｎ＝１に対し
て、領域A₁内の各点におけるフレーム間距離d¹
（ｉ，ｊ）が計算される。以下、同様にｎ＝２，
ｎ＝３，…，ｎ＝Ｎについて、フレーム間距離d²
（ｉ，ｊ），d³（ｉ，ｊ），…，d^N（ｉ，ｊ）が計算
される。なお、dⁿ（ｉ，ｊ）の計算はDⁿ ₀（i′：ｉ）
の計算と交互に行う。すなわち、ｎ＝１について
フレーム間距離d¹（ｉ，ｊ）の計算が終了すると、
次に部分距離D₀ ¹（i′：ｊ）の計算を行う。部分距離D₀ ¹（i′：ｊ）の計算は、先に求めたフ
レーム間距離d¹（ｉ：ｊ）を基にして行われる。
すなわち部分距離は、始点ｉ＝i₀，ｊ＝J¹より、
領域A₁内で、かつ第７図ａに示す拘束条件の下
で順次経路が選択され選択された経路上の点のフ
レーム間距離の利として求められる。前記拘束条
件上の３つの経路のうちどれを選択するかは、当
該点に到る一つ前の点における、それまでの経路
上のフレーム間距離の和が最小となる経路が選択
される。以上のようにして、始点ｉ＝i₀，ｊ＝J¹
より、終点ｉ＝j′，ｊ＝１までに到る経路が決定
され、部分距離D₀ ¹（i′：i₀）が求められる。なお、
i′はi₀−2J¹＋１i′i₀−1/2J¹の範囲値となる。この（i₀，J¹）を始点としたときの部分距離
D₀ ¹（i′：i₀）は、部分距離記憶部９に記憶される。以下、同様に（i₀＋ｉ，J₁），（i₀＋２，J₁），…
（i₀＋Ｗ−１）を始点とした部分距離D₀ ¹（i′：i₀＋
1D₀ ¹（i′：i₀＋２），…，D₀ ¹（i′：i₀＋Ｗ−１）を
計
算し、部分距離記憶部９に記憶させる。次に２番目の単語（ｎ＝２）について、同様に
フレーム間距離d²（ｉ，ｊ）を計算するとともに
部分距離D₀ ²（i′：ｉ）の計算を行う。以下、Ｎ番
目の単語（ｎ＝Ｎ）まで同様に計算するとともに
部分距離D₀ ²（i′：ｉ），…，D₀ ^N（i′：ｉ）を部分距
離記憶部９に記憶する。次に入力パターンのフレームｉで終端する単語
列の最小累積距離Ｄ(i)は、部分距離記憶部９に記
憶されている部分距離Dⁿ ₀（i′：ｉ）を用いて、累
積距離計算部１０において求められる。すなわち
累積距離計算部１０おいては、フレームi′〜ｉの
入力パターンと単語ｎの標準パターンとの部分距
離Dⁿ ₀（i′：ｉ）と、入力パターンのフレームi′−１
で終端する単語列の最小累積距離Ｄ（i′：１）と
の和を求めるとともに前記和のｎおよびi′に関し
て最小となる累積距離Ｄ(i)を求める。累積距離計
算部１０は前記Ｄ(i)を入力パターンのフレームｉ
で終端する単語例の最小累積距離として累積距離
情報記憶部１１に記憶させる。また累積距離計算
部１０は、前記Ｄ(i)を求めたときのｎ，ｉをそれ
ぞれｎ，i′とするとき、Ｎ(i)＝ｎ，Ｂ(i)＝i′−１
として、この認識候補単語Ｎ(i)、単語境界を示す
バツクポインタＢ(i)をＤ(i)とともに累積距離情報
記憶部１１に記憶さする。 DPマツチング部７および累積距離計算部１０
における以上の処理は、マツチング開始フレーム
設定部８により開始フレームi₀が変えられるごと
に繰り返される。マツチング開始フレーム設定部８により入力パ
ターンの最終フレームＩに相当する開始フレーム
が設定され、この開始フレームにより定まる領域
Ａについて累積距離計算部１０でＤ(i)、Ｎ(i)、Ｂ
(i)が求められ累積距離記憶部１１に記憶された
後、セグメンテーシヨン部１２で入力単語列の単
語境界を定める処理が行われる。この処理は、ま
ずフレームＩにおけるＢ(i)すなわちＢ（Ｉ）を累
積距離情報記憶部１１から読み出し、次にその読
み出したＢ（）の値をもとにフレームＢ（）に
おけるＢ(i)すなわちＢ（Ｂ（））を累積距離情報
記憶部１１から読み出す。以下、同様に読み出し
たＢ(i)の値をもとに順次単語境界として、Ｂ（Ｂ
（Ｂ（）），…，Ｂ（Ｂ（Ｂ（…Ｂ（Ｂ（））…）
）），
０が読み出される。なお０は入力パターンの入力
開始フレームの１つ手前のフレームということで
ある。以上のようにセグメンテーシヨン部１２に
より読み出されたＢ(i)は単語決定部１３に入力さ
れる。単語決定部１３は、このＢ(i)を基に累積距
離情報記憶部１１からＮ(i)を読み出す。すなわち
最初は最終フレームＩで終端する単語Ｎ（）を、
次はＢ（）フレームで終端する単語Ｎ（Ｂ（））
を、以下、同様に、Ｎ（Ｂ（Ｂ（））），…Ｎ（Ｂ（
Ｂ
（Ｂ（…Ｂ（Ｂ（））…））））を読み出す。この読
み
出された単語Ｎ(i)が端子６より認識結果として出
力される。第１０図は第９図に示した装置の機能をソフト
ウエアで実現する場合のフローチヤートを示して
いる。図において、ステツプ１００〜ステツプ１０２
は最小累積距離の初期値設定を行う部分である。
ステツプ１０５〜ステツプ１０７はフレーム間距
離を求める部分、ステツプ１０８〜ステツプ１１
４は部分距離を求める部分、ステツプ１０４はこ
れらをすべての単語について行うことを示してい
る。ステツプ１１５〜ステツプ１１６は最小累積
距離、認識単語、単語境界を求める部分であり、
ステツプ１０３は、ステツプ１０４〜ステツプ１
１６をＷフレーム毎に繰り返し行うことを示して
いる。ステツプ１１７〜ステツプ１２０は最終フ
レームＩまでの認識単語、単語境界が求まつた
後、最終フレームより逆順に単語境界、認識単語
を決定する部分である。第１１図は入力単語数が既知（Ｘ）の場合の実
施例についてソフトウエアで実現したときのフロ
ーチヤートを示している。その各ステツプは第１０図に示した入力単語数
が末知の場合とほとんど同じである。違いは、累
積距離を求める際に、そのフレームに到るまでの
入力単語の個数を仮定し、各単語数について累積
距離D_x(i)を求めるとともに、認識単語N_x(i)，単
語境界B_x(i)を求める点である。この場合、計算
量はＸの値に応じて増加する認識精度は向上す
る。第９図において示した実施例においては、Ｄ(i)
は、ｎ＝１〜Ｎのすべてに対するDⁿ ₀（i′：ｉ）を
計算したのちに求めたが、各ｎ毎に求めるように
しても良い。このようにした場合の実施例につい
てソフトウエアで実現したときのフローチヤート
を第１２図に示す。このフローチヤートにおい
て、第１０図のフローチヤートと異なるのはステ
ツプ１１５′および１１６′である。このようにス
テツプを変えることにより、累積距離Ｄ(i)を計算
するために必要な部分距離Dⁿ ₀（ｉ：ｉ）を記憶し
ておくためのメモリーを大幅に減らすことができ
る。また第９図に示した実施例において、累積距離
記憶部１１に記憶される認識候補単語Ｎ(i)として
は、最小累積距離Ｄ(i)に対応するもののみであつ
たが、最小累積距離Ｄ(i)の次に小さい累積距離
（以下、次最小累積距離という）に対応する認識
候補単語（以下、次認識候補単語N′(i)という）
をも累積距離情報記憶部１１に記憶させるように
してもよい。この場合、単語決定部１３として
は、累積距離情報記憶部１１から読み出されるＮ
(i)，N′(i)を基に各種の認識単語列を認識結果と
して出力することになる。認識単語列としては、
Ｎ(i)のみを用いたもの（第９図の実施例における
認識結果と同じ）、Ｎ(i)の単語列のうちの単語１
個をN′(i)で置換したもの、単語２個を置換した
ものなど種々考えられる。この場合、単語決定部
としては、例えばＮ(i)、N′(i)を記憶する記憶部
と、この記憶部から選択的にＮ(i)、N′(i)を読み
思す選択読出部とで構成できる。またマツチング計算を行う範囲としては第８図
に示す領域Ａとしたが、この領域Ａのうち領域Ｂ
に相当する部分を、第６図の破線で囲またた幅Ｒ
の領域との論理積をとつた領域（以下、領域
B′という）とし、新しい領域Ａとして、領域
B′をＷフレームずらして得られる領域としても
よい。このような領域の設定は必要とする認識精度、
計算速度などを考慮して行う。ところで、マツチ
ング計算の始端をフレームi₀とすると、その終端
フレームは、標準パターン長J_oに相当するフレー
ム長だけ始端フレームi₀より戻つた付近が最も終
端フレームの位置として妥当と考えられ、領域Ａ
の設定もこれらをもとに行われる。従つて、終端
フレームの位置は一般に標準パターン長Jⁿの凾数
として与えられる。すなわち始端フレームをi₀、
終端フレームをi′とすると、i₀＋R₁（Jⁿ）i′i₀
＋R₂（Jⁿ）となる。領域Ａにおいては始端は開始フレームi₀からフ
レームi₀＋Ｗ−１の範囲で変化するので終端i′は、
i₀＋R₁（Jⁿ）i′i₀＋R₂（Jⁿ）＋Ｗとなる。以上のように本発明のパターン比較装置はマツ
チング計算の開始フレームi₀をＷフレーム毎に設
定し、Ｗフレーム毎に定まるパターン比較領域に
ついてｄ（ｉ，ｊ）の計算をまとめて行うように
構成したので、従来の２段DP法を用いたパター
ン比較装置に較べ、フレーム間距離ｄ（ｉ，ｊ）
の計算回数を大幅に減少させることができる。例
えばＪ＝30、Ｗ＝10の場合、 3/4NIJ²／NIJ（1/2＋Ｊ／Ｗ）＝3/4Ｊ／（1/2＋Ｊ／Ｗ）≒6.4 となり、計算回数は約６分の１となる。一方、記
憶量としては、（3/4J²＋WJ）／3/4J² ＝１＋3/4・Ｗ／Ｊ≒1.4 となり４割増加するだけである。

[Table 1]

When the number of input words $X$ is known, the recurrence

$$D_x(i) = \min_{n,\, m} \left[ D_{x-1}(m) + D_0^n(m+1 : i) \right] \tag{14}$$

$$N_x(i) = \hat{n}, \qquad B_x(i) = \hat{m}$$

is solved, where $\hat{n}$ and $\hat{m}$ are the $n$ and $m$ that satisfy equation (14). The recognition result is then obtained by the flowchart of FIG. 4, working in order from the name and segmentation result of the last word of the $X$-word string back to those of the first word.

When the number of input words $X$ is unknown, the recognition result is obtained by the flowchart of FIG. 5 from the solution of the recurrence

$$D(i) = \min_{n,\, m,\, x} \left[ D_x(m) + D_0^n(m+1 : i) \right] = \min_{n,\, m} \left[ D(m) + D_0^n(m+1 : i) \right] \tag{15}$$

$$N(i) = \hat{n}, \qquad B(i) = \hat{m}$$

where $\hat{n}$ and $\hat{m}$ are the $n$ and $m$ that satisfy equation (15).

The points at issue in calculating equations (14) and (15) are:
(a) the amount of calculation should be small;
(b) the required storage capacity should be as small as possible;
(c) the algorithm should be suited to real-time operation.
Regarding (a), the main calculations needed to find the solution are $D_0^n(s : t)$ and the $d^n(i, j)$ needed to obtain it. Since each frame is usually represented by parameters of 10 or more dimensions, the question is how to reduce this amount of calculation.

The two-stage DP method, conventionally used for this calculation, is explained next. The two-stage DP method first obtains $D_0^n(s : t)$ by DP for every combination of $s$ and $t$ and then obtains $D(i)$ by DP; its characteristic feature is that DP is applied in two stages. A forward algorithm and a backward algorithm have been proposed for the two-stage DP method; the backward algorithm is explained here.

Assume that $D(i-1)$, $N(i-1)$, and $B(i-1)$ have already been obtained for frame $i-1$ of the input pattern.
(1) The standard pattern of each word $n$ ($n = 1, 2, \ldots, N$) and the input pattern are DP-matched in the reverse time direction, with the current frame as the start point. The path constraints therefore become those of FIGS. 7a, 7b, 7c, and 7d, corresponding to FIGS. 3c, 3d, 3e, and 3f. The matching could be confined to a matching window of width $R$, but here it is carried out within the slope range 1/2 to 2 (within the slope limits; the hatched region of FIG. 6). The matching is performed with a free end point, and as a result $D_0^n(s : i)$ is obtained for $i - 2J^n + 1 \le s \le i - \tfrac{1}{2}J^n$.
(2) $D(i)$, $N(i)$, and $B(i)$ of equation (15) are obtained.
(3) $i$ is set to $i + 1$ and processing returns to (1).

With this method, the inter-frame distance $d$ and the cumulative distance $D$ must be calculated within the matching range for every word at every input frame. The total number of calculations is therefore $N \cdot I \cdot \tfrac{3}{4}J^2$ for both the inter-frame distance $d$ and the cumulative distance $D$ ($N \cdot I \cdot J \cdot R$ when a matching window of width $R$ is used).

This is the two-stage DP matching method. Its drawback is that the number of calculations is still large.
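
The backward algorithm just described can be sketched in Python as follows. This is a simplified illustration, not the patented device: all names (`city_block`, `backward_partial_distances`, `two_stage_dp`) are hypothetical, the city-block distance is assumed for $d$, and the reverse-time matching uses the mirrored moves of recurrence (7) with the segment length capped at $2J^n$, which is looser than the slope range 1/2 to 2 stated in the text. Frames are numbered from 1, and $D(0) = 0$ stands for the empty prefix.

```python
import math


def city_block(u, v):
    """City-block distance between two feature vectors (assumed choice for d)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def backward_partial_distances(a, b, i):
    """First stage: one reverse-time DP pass from (i, J) with a free end point.

    Returns {s: D0(s:i)} for every start frame s (1-based) that is reachable
    and no more than 2*J frames before the end frame i.
    """
    J = len(b)
    h = [math.inf] * J
    h[J - 1] = city_block(a[i - 1], b[J - 1])   # start point (i, J)
    out = {}
    if J == 1:
        out[i] = h[0]                            # one-frame pattern ends at once
    s = i - 1
    while s >= 1 and i - s + 1 <= 2 * J:
        new = [math.inf] * J
        for j in range(J):
            # Moves seen in reverse time: from (s, j) on to (s+1, j..j+2).
            best = min(h[j:j + 3])
            if best < math.inf:
                new[j] = city_block(a[s - 1], b[j]) + best
        h = new
        if h[0] < math.inf:
            out[s] = h[0]                        # path has reached j = 1
        s -= 1
    return out


def two_stage_dp(a, refs):
    """Second stage: D(i) = min over n, m of D(m) + D0^n(m+1:i), as in eq. (15)."""
    I = len(a)
    D = [0.0] + [math.inf] * I                   # D[0] = 0 for the empty prefix
    N = [None] * (I + 1)                         # last word of the best string
    B = [0] * (I + 1)                            # back pointer (frame before it)
    for i in range(1, I + 1):
        for n, b in enumerate(refs):
            for s, d0 in backward_partial_distances(a, b, i).items():
                cand = D[s - 1] + d0
                if cand < D[i]:
                    D[i], N[i], B[i] = cand, n, s - 1
    # Trace the back pointers from the final frame to recover the word string.
    words, i = [], I
    while i > 0 and N[i] is not None:
        words.append(N[i])
        i = B[i]
    words.reverse()
    return D[I], words
```

The sketch repeats the full reverse-time pass for every end frame `i` and every word, which is exactly the repeated calculation of $d$ and $D$ that the cost estimate above counts.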
The reason is that $d$ and $D$ are each calculated $\tfrac{3}{4}J^2$ times for every word at every input frame. The simplest way to reduce the number of calculations is to keep every value of $d$, once calculated, until it is no longer needed. With this approach the amount of calculation for $d$ becomes $N \cdot I \cdot J$, but the storage needed for it becomes quite large. Moreover, the storage address of the calculated $d$ values must be changed at every input frame.

The object of the present invention is to eliminate these drawbacks and to provide a fast pattern comparison device in which the amount of calculation needed to compare an input pattern with the standard patterns is greatly reduced without much increase in storage.

To achieve this object, the present invention carries out the backward algorithm of the two-stage DP method in batches at fixed time intervals. The principle of the invention is explained below with reference to FIG. 8.

In FIG. 8, the hatched portion is the region over which $d$ is calculated at one time (hereinafter, region A). Region A consists of the hatched region shown in FIG. 6 (hereinafter, region B) with a $W \times J$ region (hereinafter, region C) added. $W$ is a number of frames, and each subsequent calculation of $d$ is shifted by $W$ frames. The amount of calculation for $d$ is then $J^2 + \tfrac{W}{2}J$ on average per $W$ frames, and since this is done $I/W$ times for an input pattern of $I$ frames, the total amount of calculation for $d$ is $N \cdot I \cdot J \left(\tfrac{1}{2} + \tfrac{J}{W}\right)$.

An embodiment of the present invention using this principle is described below with reference to the drawings. FIG. 9 is a block diagram showing an embodiment in which the pattern comparison device of the present invention is applied to a speech recognition device.

In the figure, 1 is an input terminal for the speech signal, and 2 is an acoustic analysis unit that converts the input speech signal of continuous words into a series of feature vectors and outputs it as the input pattern $A = a_1, a_2, \ldots, a_I$. Let $I$ be the length of the input pattern output from the acoustic analysis unit 2; that is, the input pattern consists of $I$ frames. 3 is a standard pattern storage unit in which the words to be recognized ($N$ words) are stored as series of feature vectors $B^n = b_1^n b_2^n \cdots b_{J^n}^n$; its contents are read out as standard patterns when the inter-frame distances to the input pattern are calculated. The acoustic analysis unit 2 and the standard pattern storage unit 3 are the same as those shown in FIG. 1.

7 is a DP matching unit. It consists of an inter-frame distance calculation unit 7a that calculates the inter-frame distance $d^n(i, j)$ between the feature vector $b_j^n$ of a standard pattern and the feature vector $a_i$ of the input pattern; an inter-frame distance storage unit 7b that stores $d^n(i, j)$; and a partial distance calculation unit 7c that uses the $d^n(i, j)$ stored in unit 7b to calculate the partial distance $D_0^n(i' : i)$ between word $n$ and frames $i'$ to $i$ of the input pattern.

8 is a matching start frame setting unit that supplies the matching start frame to the inter-frame distance calculation unit 7a and the partial distance calculation unit 7c of the DP matching unit 7. The start frame $i_0$ is initially 1 and is thereafter set every $W$ frames to $W + 1, 2W + 1, \ldots$

9 is a partial distance storage unit that stores the partial distances $D_0^n(i' : i)$ calculated by the partial distance calculation unit 7c.
Here $n = 1, 2, \ldots, N$; $i = i_0, i_0 + 1, \ldots, i_0 + W - 1$; and $i'$ lies in the range $i - 2J^n + 1 \le i' \le i - \tfrac{1}{2}J^n$.

10 is a cumulative distance calculation unit. From the partial distances stored in the partial distance storage unit 9 it obtains, for $i = i_0, i_0 + 1, \ldots, i_0 + W - 1$, the minimum cumulative distance $D(i)$ of the word strings ending at frame $i$ of the input pattern. It also calculates the last word $N(i)$ of the word string ending at frame $i$ that gives this $D(i)$, and $B(i)$, the frame obtained by subtracting 1 from the start frame of word $N(i)$.

11 is a cumulative distance information storage unit that stores the $D(i)$, $N(i)$, and $B(i)$ calculated by the cumulative distance calculation unit 10. Here $i = i_0, i_0 + 1, \ldots, i_0 + W - 1$; $i_0 = 1, W + 1, 2W + 1, \ldots, LW + 1$; and $L = [(I - 1)/W]$, where $[X]$ denotes the integer part of $X$.

12 is a segmentation unit that, once the cumulative distances up to the final frame $I$ of the input pattern have been calculated, obtains the back pointers $B(I), B(B(I)), \ldots, B(B(\cdots B(B(I)) \cdots)), 0$ marking the word boundaries of the input continuous words, in reverse order from the final frame.

13 is a word determination unit that, based on the back pointers obtained by the segmentation unit 12, reads out from the cumulative distance information storage unit 11, in turn, the words ending at those boundary frames and takes them as the recognized words.

The operation of this embodiment is described below with reference to FIGS. 8 and 9.

When the input pattern is output from the acoustic analysis unit 2, the inter-frame distance calculation unit 7a starts calculating the inter-frame distance $d^n(i, j)$ between the feature vectors of the standard patterns and those of the input pattern. The calculation covers region A, the hatched portion of FIG. 8, and region A moves by $W$ each time the matching start frame setting unit 8 advances the start frame $i_0$ through $1, W + 1, \ldots$ in steps of $W$.

Because the inter-frame distance $d^n(i, j)$, the partial distance $D_0^n(i' : i)$, and the minimum cumulative distance $D(i)$ are calculated by the same procedure for every region A, only the case of start frame $i_0$ is described below for simplicity.

The inter-frame distance $d^n(i, j)$ is calculated in turn for the $N$ words stored in the standard pattern storage unit 3, over region A. Region A varies with the standard pattern length $J^n$. For the first word ($n = 1$), for example, it is the region $A_1$ bounded by the four points $(i_0, J^1)$, $(i_0 - 2J^1 + 1, 1)$, $(i_0 + W - 1, J^1)$, and $(i_0 + W - \tfrac{1}{2}J^1, 1)$. For $n = 1$, therefore, the inter-frame distance $d^1(i, j)$ is calculated at each point in region $A_1$. In the same way, $d^2(i, j), d^3(i, j), \ldots, d^N(i, j)$ are calculated for $n = 2, n = 3, \ldots, n = N$.

The calculation of $d^n(i, j)$ alternates with that of $D_0^n(i' : i)$. That is, when the calculation of $d^1(i, j)$ for $n = 1$ is finished, the partial distance $D_0^1(i' : i)$ is calculated next, based on the inter-frame distances $d^1(i, j)$ just obtained.
That is, starting from the point $i = i_0$, $j = J^1$, a path is selected step by step within region $A_1$ under the constraint shown in FIG. 7a, and the partial distance is obtained as the sum of the inter-frame distances at the points on the selected path. Of the three paths allowed by the constraint, the one chosen is the path for which the sum of the inter-frame distances along the path so far, at the point immediately before the point in question, is smallest. In this way the path from the start point $i = i_0$, $j = J^1$ to the end point $i = i'$, $j = 1$ is determined and the partial distance $D_0^1(i' : i_0)$ is obtained. Here $i'$ takes values in the range $i_0 - 2J^1 + 1 \le i' \le i_0 - \tfrac{1}{2}J^1$.

The partial distance $D_0^1(i' : i_0)$ obtained with $(i_0, J^1)$ as the start point is stored in the partial distance storage unit 9. In the same way, the partial distances $D_0^1(i' : i_0 + 1), D_0^1(i' : i_0 + 2), \ldots, D_0^1(i' : i_0 + W - 1)$ are calculated with $(i_0 + 1, J^1), (i_0 + 2, J^1), \ldots, (i_0 + W - 1, J^1)$ as start points and stored in the partial distance storage unit 9.

Next, for the second word ($n = 2$), the inter-frame distance $d^2(i, j)$ and the partial distance $D_0^2(i' : i)$ are calculated in the same way. The calculation continues in the same way up to the $N$-th word ($n = N$), and the partial distances $D_0^2(i' : i), \ldots, D_0^N(i' : i)$ are stored in the partial distance storage unit 9.

The minimum cumulative distance $D(i)$ of the word strings ending at frame $i$ of the input pattern is then obtained in the cumulative distance calculation unit 10 using the partial distances $D_0^n(i' : i)$ stored in the partial distance storage unit 9. Unit 10 forms the sum of the partial distance $D_0^n(i' : i)$, between frames $i'$ to $i$ of the input pattern and the standard pattern of word $n$, and the minimum cumulative distance $D(i' - 1)$ of the word strings ending at frame $i' - 1$ of the input pattern, and finds the cumulative distance $D(i)$ that is the minimum of this sum over $n$ and $i'$. It stores this $D(i)$ in the cumulative distance information storage unit 11 as the minimum cumulative distance of the word strings ending at frame $i$ of the input pattern. Writing $\hat{n}$ and $\hat{i}'$ for the $n$ and $i'$ at which $D(i)$ was obtained, unit 10 also sets $N(i) = \hat{n}$ and $B(i) = \hat{i}' - 1$, and stores this recognition candidate word $N(i)$ and the back pointer $B(i)$ marking the word boundary in unit 11 together with $D(i)$.

The above processing in the DP matching unit 7 and the cumulative distance calculation unit 10 is repeated each time the matching start frame setting unit 8 changes the start frame $i_0$.

Once the matching start frame setting unit 8 has set the start frame corresponding to the final frame $I$ of the input pattern, and $D(i)$, $N(i)$, and $B(i)$ for the region A determined by that start frame have been obtained by unit 10 and stored in unit 11, the segmentation unit 12 determines the word boundaries of the input word string. It first reads $B(i)$ at frame $I$, that is, $B(I)$, from the cumulative distance information storage unit 11, and then, using the value of $B(I)$ just read, reads $B(i)$ at frame $B(I)$, that is, $B(B(I))$. Continuing in the same way with each value read, the word boundaries $B(B(B(I))), \ldots, B(B(B(\cdots B(B(I)) \cdots))), 0$ are read out in turn. Here 0 denotes the frame immediately before the first frame of the input pattern.

The $B(i)$ values read out by the segmentation unit 12 are input to the word determination unit 13, which reads $N(i)$ from the cumulative distance information storage unit 11 on their basis: first the word $N(I)$ ending at the final frame $I$, then the word $N(B(I))$ ending at frame $B(I)$, and likewise $N(B(B(I))), \ldots, N(B(B(B(\cdots B(B(I)) \cdots))))$. The words $N(i)$ read out are output from terminal 6 as the recognition result.

FIG. 10 is a flowchart for the case where the functions of the device shown in FIG. 9 are realized in software. In the figure, steps 100 to 102 set the initial value of the minimum cumulative distance.
Steps 105 to 107 are the part that calculates the inter-frame distances, steps 108 to 114 are the part that calculates the partial distances, and step 104 indicates that these are performed for all words. Steps 115 and 116 determine the minimum cumulative distance, the recognized word, and the word boundary.
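The update of D(i), N(i) and B(i), and the reverse-order readout performed by the segmentation unit 12 and the word determination unit 13, can be illustrated by the following minimal Python sketch. It is not taken from the disclosed apparatus: the function names `accumulate` and `trace_back`, the dictionary layout used for the partial distances, the initial value D(0) = 0, and the toy distance values are all assumptions made only for illustration, and the partial distances are assumed to have been computed already.

```python
import math

def accumulate(partial, final_frame):
    """Compute D(i), N(i), B(i) for frames 0..final_frame.

    `partial` maps (n, i_start, i_end) to the partial distance between
    standard pattern n and input frames i_start..i_end (1-based, inclusive).
    This layout is a hypothetical stand-in for the partial distance storage.
    """
    D = [math.inf] * (final_frame + 1)
    N = [None] * (final_frame + 1)
    B = [None] * (final_frame + 1)
    D[0] = 0.0  # assumed initial value; frame 0 precedes the first input frame
    # Visit entries in order of end frame so that D[i_start - 1] is already final.
    for (n, i_start, i_end), dist in sorted(partial.items(), key=lambda kv: kv[0][2]):
        candidate = D[i_start - 1] + dist
        if candidate < D[i_end]:
            D[i_end], N[i_end], B[i_end] = candidate, n, i_start - 1
    return D, N, B

def trace_back(D, N, B, final_frame):
    """Read B(I), B(B(I)), ..., 0 and N(I), N(B(I)), ... in reverse order."""
    if math.isinf(D[final_frame]):
        raise ValueError("final frame is not reachable")
    words, boundaries = [], []
    i = final_frame
    while i != 0:
        words.append(N[i])
        boundaries.append(B[i])
        i = B[i]
    return words, boundaries

# Toy data (invented): two words "a" and "b", input of 5 frames.
partial = {
    ("a", 1, 2): 1.0,
    ("b", 1, 3): 2.5,
    ("b", 3, 5): 1.5,
    ("a", 4, 5): 0.5,
    ("a", 1, 5): 4.0,
}
D, N, B = accumulate(partial, 5)
words, boundaries = trace_back(D, N, B, 5)
print(words, boundaries)  # ['b', 'a'] [2, 0]
```

In this toy case the readout yields the words in reverse order, "b" then "a", with boundaries 2 and 0; reversing `words` gives the word string in input order.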
Step 103 indicates that steps 104 to 116 are repeated every W frames. Steps 117 to 120 are the steps in which, after the recognized words and word boundaries up to the final frame I have been determined, the word boundaries and recognized words are determined in reverse order from the final frame.

FIG. 11 shows a flowchart of an embodiment, realized by software, for the case where the number of input words is known (X). Each step is almost the same as in the case where the number of input words is unknown, shown in FIG. 10. The difference is that, when the cumulative distance is calculated, the number of input words up to that frame is assumed, and the cumulative distance D_x(i), the recognized word N_x(i), and the word boundary B_x(i) are obtained for each assumed number of words x. In this case, the amount of calculation increases with the value of X, and the recognition accuracy improves.

In the embodiment shown in FIG. 9, D(i)
is obtained after the partial distances Dⁿ₀(i′:i) have been calculated for all n = 1 to N, but it may instead be obtained for each n. FIG. 12 shows a flowchart of this embodiment implemented by software. This flowchart differs from the flowchart of FIG. 10 in steps 115′ and 116′. Changing the steps in this way significantly reduces the memory required to store the partial distances Dⁿ₀(i′:i) needed to calculate the cumulative distance D(i).

Further, in the embodiment shown in FIG. 9, the only recognition candidate words N(i) stored in the cumulative distance information storage unit 11 were those corresponding to the minimum cumulative distance D(i). However, the recognition candidate word corresponding to the cumulative distance next smallest after D(i) (hereinafter referred to as the next minimum cumulative distance), hereinafter referred to as the next recognition candidate word N′(i), may also be stored in the cumulative distance information storage unit 11. In this case, the word determination unit 13 outputs various recognized word strings as recognition results based on N(i) and N′(i) read from the cumulative distance information storage unit 11. Various recognized word strings are possible: one using only N(i) (the same as the recognition result in the embodiment of FIG. 9), one in which one word of the N(i) word string is replaced by N′(i), one in which two words are replaced, and so on. In this case, the word determination unit can be composed of two parts: a memory unit that stores N(i) and N′(i), and a selective readout unit that selectively reads N(i) and N′(i) from this memory unit.

In addition, although the area A shown in FIG. 8 was used as the range of the matching calculation, the part of area A corresponding to area B may be replaced by the area of width R surrounded by the broken line in FIG. 6 (hereinafter referred to as area B′), and the area obtained by shifting area B′ over W frames may be used as the new area A. Such areas are set in consideration of the required recognition accuracy, calculation speed, and so on.

If the start point of the matching calculation is frame i₀, the most appropriate position of the end frame is considered to lie near the frame separated from the start frame i₀ by a frame length corresponding to the standard pattern length Jⁿ, and area A is also set on this basis. Therefore, the position of the end frame is generally given as a function of the standard pattern length Jⁿ. In other words, if the start frame is i₀ and the end frame is i′, then i₀ + R₁(Jⁿ) ≦ i′ ≦ i₀ + R₂(Jⁿ). In area A, the start point varies over the range from frame i₀ to frame i₀ + W − 1, so the end frame i′ satisfies i₀ + R₁(Jⁿ) ≦ i′ ≦ i₀ + R₂(Jⁿ) + W.

As described above, the pattern comparison device of the present invention is configured to set the start frame i₀ of the matching calculation every W frames, and to calculate d(i, j) collectively for the pattern comparison area determined every W frames. Therefore, compared with a pattern comparison device using the conventional two-stage DP method, the number of calculations of the inter-frame distance d(i, j) can be significantly reduced. For example, when J = 30 and W = 10, (3/4)NIJ² / {NIJ(1/2 + J/W)} = (3/4)J / (1/2 + J/W) ≒ 6.4, so the number of calculations is approximately 1/6. On the other hand, the amount of memory increases by only about 40%, since {(3/4)J² + WJ} / {(3/4)J²} = 1 + (4/3)·W/J ≒ 1.4.
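The two ratios in the numerical example above can be checked directly. The following short Python sketch evaluates the unsimplified expressions (3/4)J / (1/2 + J/W) and {(3/4)J² + WJ} / {(3/4)J²} for J = 30 and W = 10 using exact fractions; the function names are assumptions introduced only for this check.

```python
from fractions import Fraction

def distance_calc_ratio(J, W):
    """Conventional / proposed number of inter-frame distance calculations:
    (3/4)J / (1/2 + J/W). The common factor N*I cancels out."""
    return Fraction(3, 4) * J / (Fraction(1, 2) + Fraction(J, W))

def memory_ratio(J, W):
    """Proposed / conventional memory amount: ((3/4)J^2 + W*J) / ((3/4)J^2)."""
    return (Fraction(3, 4) * J**2 + W * J) / (Fraction(3, 4) * J**2)

calc = distance_calc_ratio(30, 10)
mem = memory_ratio(30, 10)
print(calc, float(calc))  # 45/7, about 6.43
print(mem, float(mem))    # 13/9, about 1.44
```

With these values the calculation count falls to roughly 1/6 while the memory grows by a little over 40%, in line with the figures stated in the text.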

[Brief explanation of drawings]

FIG. 1 is a block diagram of a conventional speech recognition device; FIG. 2 is a diagram showing the correspondence between the feature vectors of patterns A and B; FIGS. 3a to 3f are diagrams showing examples of constraint conditions for selecting grid points on the i-j plane; FIGS. 4 and 5 are flowcharts showing the procedures for segmentation and for determining the recognized words in continuous word speech recognition when the number of input words is known and unknown, respectively; FIG. 6 is an explanatory diagram of the backward algorithm of the two-stage DP method; FIGS. 7a to 7d are diagrams showing examples of constraint conditions for selecting grid points on the i-j plane; FIG. 8 is a diagram explaining the principle of the present invention; FIG. 9 is a block diagram of one embodiment of the present invention; FIG. 10 is a flowchart of software that realizes the functions of the device of that embodiment; and FIGS. 11 and 12 are flowcharts of other embodiments.

1 … acoustic analysis unit, 3 … standard pattern storage unit, 7 … DP matching unit, 7a … inter-frame distance calculation unit, 7b … inter-frame distance storage unit, 7c … partial distance calculation unit, 8 … matching start frame setting unit, 9 … partial distance storage unit, 10 … cumulative distance calculation unit, 11 … cumulative distance information storage unit, 12 … segmentation unit, 13 … word determination unit.

Claims

[Claims]

1. A pattern comparison device comprising:
feature extraction means for converting a continuously input pattern signal into a time series A = a₁, a₂, …, a_I of feature vectors a_i;
standard pattern storage means for storing standard patterns Bⁿ = bⁿ₁, bⁿ₂, …, bⁿ_Jⁿ (n = 1, 2, …, N);
matching start frame setting means for setting a start frame i₀ of the matching calculation for the time series A every W frames, where W satisfies W ≦ J_min/S_max, J_min being the minimum value of Jⁿ over n, and S_max and S_min being the maximum slope and the minimum slope of the matching path in a lattice graph whose horizontal axis is the input pattern and whose vertical axis is the standard pattern;
inter-frame distance calculation means for calculating, each time the start frame i_p is set, the inter-frame distance dⁿ(i, j) between the feature vectors of the standard pattern and the input pattern only for the grid points within the trapezoid surrounded by …, (i_p + W − 1, 1), …, (i_p − Jⁿ/S_min, 1);
inter-frame distance storage means for storing the inter-frame distance dⁿ(i, j);
partial distance calculation means for reading out the inter-frame distances stored in the inter-frame distance storage means and calculating the partial distance Dⁿ₀(i′:i) between the standard pattern Bⁿ and the partial input pattern having frame i as its start point and frame i′ as its end point;
partial distance storage means for storing the partial distance Dⁿ₀(i′:i);
cumulative distance calculation means for minimizing, over i′ and n, the sum of the cumulative distance D(i′ − 1) up to frame i′ − 1 and the partial distance Dⁿ₀(i′:i), and taking the result as the cumulative distance D(i) up to frame i;
cumulative distance information storage means for storing the cumulative distance D(i) together with N(i) = n and B(i) = i′ − 1 for the n and i′ used in calculating the cumulative distance D(i);
segmentation means for reading, when the calculation of the cumulative distance up to the final frame I is completed, B(I), B(B(I)), …, B(B(…B(I)…)), 0 from the cumulative distance information stored in the cumulative distance information storage means, that is, for obtaining the boundaries of the continuously input patterns in reverse order; and
pattern determination means for reading N(I), N(B(I)), …, N(B(…B(I)…)) from the cumulative distance information storage means using B(I), B(B(I)), …, B(B(…B(I)…)), 0, thereby determining the patterns in reverse order.