JPS6019199A

JPS6019199A - Voice recognition

Info

Publication number: JPS6019199A
Application number: JP12623783A
Authority: JP
Inventors: 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1983-07-13
Filing date: 1983-07-13
Publication date: 1985-01-31
Also published as: JPH0311479B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Technical Field) The present invention relates to a speech recognition method designed to improve recognition performance.

(Prior Art) A conventional speech recognition device is configured as shown in FIG. 1, in which 1 is an input terminal, 2 is a frequency analysis section, 3 is a spectrum conversion section, 4 is a speech interval determination section, 5 is a dissimilarity calculation section, 6 is a standard speech spectrum pattern memory, 7 is a decision section, and 8 is a recognition result output terminal.

In a conventional speech recognition device, the dissimilarity D_k between the spectrum-converted input speech spectrum pattern and standard spectrum pattern k (k = 1 to K) is calculated as follows. Let A(m, n) be the element of the m-th channel at the n-th time sample point of the input spectrum pattern, and let S_k(m, n) be the element of the m-th channel at the n-th time sample point of standard spectrum pattern k. Then

$$D_k = \sum_{n=1}^{N}\sum_{m=1}^{M} \lvert A(m,n) - S_k(m,n) \rvert \, W(m,n) \qquad (1)$$

and the category of the standard spectrum pattern that minimizes D_k among the K standard spectrum patterns is taken as the recognition result. Many methods exist for calculating the weight W(m, n), but they are not the object of this invention, so their description is omitted.

In the conventional recognition device, the power information of the input speech is completely lost in the spectrum conversion. As a result, for example, "ichi" (one) may be misrecognized as "ni" (two), or "go" (five) as "roku" (six).

FIG. 2 shows examples of sonagrams of the speech patterns for "ichi", "ni", "go" and "roku". In FIG. 2, the horizontal axis is frequency and the vertical axis is time.

Thus, after spectrum conversion, "ichi" and "ni", and likewise "go" and "roku", have quite similar patterns. They differ mainly in the silent interval between "i" and "chi" and the large silent interval between "ro" and "ku", but because the power information has been lost, misrecognition can occur, which lowers the recognition rate.

(Object of the Invention) An object of the present invention is to provide a speech recognition method that overcomes these drawbacks and improves the recognition rate.

(Summary of the Invention) In the present invention, the power patterns of the input speech and the standard speech are compared during the dissimilarity calculation. This is described in detail below.

(Embodiment of the Invention) FIG. 3 is a block diagram showing one embodiment of the invention. In FIG. 3, 100 is an input terminal and 200 is a frequency analysis section. 300 is a spectrum conversion section consisting of counter 301, multiplication circuit 302, addition circuit 303, register 304, addition circuit 305, register 306, multiplexers 307 and 308, multiplication circuits 309 and 310, subtraction/division circuit 311, register 312, subtraction/division circuit 313, register 314, counter 315, multiplication circuit 316, addition circuit 317, delay circuit 318, subtraction circuit 319, switching circuits 320 and 321, and division circuit 322.

400 is a speech interval determination section. 500 is a dissimilarity calculation section consisting of input speech spectrum pattern memory 501, subtraction circuit 502, absolute value circuit 503, multiplication circuit 504, weight determination circuit 505, constant generation circuit 506, accumulator 507, input speech power pattern memory 508, addition circuit 509, register 510, division circuit 511, standard speech average power memory 512, subtraction circuit 513, standard speech power pattern memory 514, addition circuit 515, and switching circuits 516, 517 and 518.

600 is a standard speech spectrum pattern memory, 700 is a decision section, and 800 is a recognition result output terminal.

The input speech signal applied to input terminal 100 is fed to frequency analysis section 200, where it is frequency-analyzed into quantized signals corresponding to a plurality of frequency bands, and then sent to spectrum conversion section 300.

Let x(m, n) (m = 1 to M) be the M data values produced by frequency analysis section 200 at a certain time n. The spectrum-converted input spectrum data A(m, n) (m = 1 to M) are then given by equation (1).

$$A(m,n) = x(m,n) - (\alpha_n m + \beta_n) \qquad (1)$$

In equation (1), α_n and β_n are the slope and intercept, respectively, of the least-squares approximating straight line of x(m, n) over m, and are obtained from the following equations:

$$\alpha_n = \frac{M\sum_{m=1}^{M} m\,x(m,n) - \sum_{m=1}^{M} m \sum_{m=1}^{M} x(m,n)}{M\sum_{m=1}^{M} m^2 - \left(\sum_{m=1}^{M} m\right)^2} \qquad (2)$$

$$\beta_n = \frac{\sum_{m=1}^{M} m^2 \sum_{m=1}^{M} x(m,n) - \sum_{m=1}^{M} m \sum_{m=1}^{M} m\,x(m,n)}{M\sum_{m=1}^{M} m^2 - \left(\sum_{m=1}^{M} m\right)^2} \qquad (3)$$

Since Σ_{m=1}^{M} m and Σ_{m=1}^{M} m² are constants, setting C1 = Σ_{m=1}^{M} m and C2 = Σ_{m=1}^{M} m² turns equations (2) and (3) into

$$\alpha_n = \frac{M\sum_{m=1}^{M} m\,x(m,n) - C_1 \sum_{m=1}^{M} x(m,n)}{M C_2 - C_1^2}, \qquad \beta_n = \frac{C_2 \sum_{m=1}^{M} x(m,n) - C_1 \sum_{m=1}^{M} m\,x(m,n)}{M C_2 - C_1^2}$$

and the input spectrum data A(m, n) can then be obtained from equation (1). In the circuit of FIG. 3, the input spectrum data A(m, n) are produced as follows. First, multiplication circuit 302 forms the product of the input data x(m, n) from frequency analysis section 200 and the value m generated by counter 301, which counts in synchronization with the input data. Addition circuit 303 and register 304 accumulate these products, so that register 304 holds Σ_{m=1}^{M} m·x(m, n). Likewise, addition circuit 305 and register 306 accumulate x(m, n), so that register 306 holds Σ_{m=1}^{M} x(m, n), and the content of register 306 is also output as the power P_n.

Next, multiplexers 307 and 308 select the values M and C1, respectively, and multiplication circuits 309 and 310 and switching circuits 320 and 321 are connected to the subtraction/division circuit 311 side. Subtraction/division circuit 311 then produces the result, namely the value of α_n, which is set in register 312 and output to the dissimilarity calculation section.

Similarly, multiplexers 307 and 308 select C1 and C2, respectively, and multiplication circuits 309 and 310 and switching circuits 320 and 321 are switched to the subtraction/division circuit 313 side. Subtraction/division circuit 313 then produces the result, namely the value of β_n, which is set in register 314.

Counter 315 then generates m, multiplication circuit 316 computes α_n·m, and addition circuit 317 computes α_n·m + β_n.

Next, subtraction circuit 319 subtracts α_n·m + β_n, obtained by addition circuit 317, from the input data x(m, n) delayed by delay circuit 318, and the resulting spectrum-converted input spectrum data A(m, n) are output to input speech spectrum pattern memory 501.

FIG. 4 shows the relationship between the input data x(m, n), the straight line Y = α_n·m + β_n, and the input spectrum pattern data A(m, n) at a given time n (m = 1 to M). Y = α_n·m + β_n is the least-squares approximating straight line of x(m, n), and A(m, n) is obtained by subtracting α_n·m + β_n from x(m, n).

Speech interval determination section 400 detects the start and end of the speech interval and sends a start detection signal and an end detection signal to the dissimilarity calculation section. A simple detection method is to compute, for each sample period, the average of the M analysis values from the frequency analysis section, take as the start the point at which this average first exceeds a preset threshold, and take as the end the point at which it last falls below the threshold.
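The simple threshold method just described can be sketched as follows. The function name and the inclusive (start, end) frame-index convention are assumptions of this sketch, not details from the text.

```python
from typing import Optional, Sequence, Tuple


def find_speech_interval(
    frames: Sequence[Sequence[float]], threshold: float
) -> Optional[Tuple[int, int]]:
    """Threshold endpoint detection on the per-frame average of the M
    analysis values. Returns (start, end) frame indices, inclusive: start is
    the first frame whose average exceeds the threshold, and end is the last
    such frame (after it the average finally stays at or below the
    threshold). Returns None if no frame exceeds the threshold."""
    averages = [sum(f) / len(f) for f in frames]
    above = [n for n, v in enumerate(averages) if v > threshold]
    if not above:
        return None
    return above[0], above[-1]


print(find_speech_interval([[0, 0], [5, 5], [1, 1], [6, 6], [0, 0]], 2.0))
# prints (1, 3)
```

Taking the first and last crossings, rather than stopping at the first drop, keeps a word-internal pause (frame 2 above) inside the interval, so in this sketch the pause is still present in the stored power pattern.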

When speech interval determination section 400 detects the start of speech, writing of the input spectrum data A(m, n) into input speech spectrum pattern memory 501 and writing of the input speech power information P_n into input speech power pattern memory 508 begin. When the end of speech is detected, writing into input speech spectrum pattern memory 501 and input speech power pattern memory 508 stops, and the dissimilarity calculation begins. Input speech spectrum pattern memory 501 is a two-dimensional memory whose elements are the input spectrum data A(m, n) (m = 1 to M, n = 1 to N). Input speech power pattern memory 508 is a one-dimensional memory whose elements are denoted IP(n) (n = 1 to N). Dissimilarity calculation section 500 calculates the dissimilarity between each of the K standard speech patterns and the input speech; here we consider the calculation of the dissimilarity with the k-th standard speech.

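The dissimilarity D_k is expressed by the following equation:

$$D_k = \sum_{n=1}^{N}\sum_{m=1}^{M} \lvert A(m,n) - S_k(m,n) \rvert \, W(m,n) + \sum_{n=1}^{N} \bigl\lvert IP(n) - \bigl(P_k(n) + (PP - AP_k)\bigr) \bigr\rvert \, w_p \qquad (6)$$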

Here, S_k(m, n) is the element of the spectrum pattern of the k-th standard speech (m = 1 to M, n = 1 to N); W(m, n) is the weight determined by weight determination circuit 505; P_k(n) (n = 1 to N) is the element of the power pattern of standard speech k; PP is the average power of the input speech; AP_k is the average power of standard speech k; and w_p is a weighting coefficient that sets the relative weight of the power-pattern term.

First, accumulator 507 used for the dissimilarity calculation is cleared to zero.

Next, the input speech element A(m, n) is read from input speech spectrum pattern memory 501 through switching circuit 516, and the element S_k(m, n) of standard speech k is read from standard speech spectrum pattern memory 600 through switching circuit 517. Subtraction circuit 502 computes A(m, n) − S_k(m, n), absolute value circuit 503 takes its absolute value, and multiplication circuit 504 multiplies the result by the weighting coefficient W(m, n) supplied through switching circuit 518. The weighting coefficient W(m, n) is determined by weight determination circuit 505. Many weight determination methods exist; one example is disclosed in Japanese Patent Application No. Sho 56-18/I416, "Speech Recognition Device". Since weight determination is not the object of the present invention, its description is omitted. The output of multiplication circuit 504 is then added to accumulator 507.

The above operation is repeated for m = 1 to M and n = 1 to N, which yields the first term of D_k.

Next, the average power PP of the input speech is calculated. The power pattern IP(n), n = 1 to N, of the input speech is read from input speech power pattern memory 508 and accumulated by addition circuit 509 and register 510, so that register 510 holds Σ_{n=1}^{N} IP(n). Division circuit 511 divides this value by N to obtain the average power PP of the input speech:

$$PP = \frac{1}{N}\sum_{n=1}^{N} IP(n) \qquad (7)$$

Next, the average power AP_k of standard speech k is read from standard speech average power memory 512, and subtraction circuit 513 subtracts AP_k from PP to compute the power correction value PP − AP_k.

Next, addition circuit 515 adds the power pattern P_k(n) of standard speech k, read from standard speech power pattern memory 514, to the power correction value (PP − AP_k). The result of the addition is P_k(n) + (PP − AP_k).
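The power normalization performed by circuits 509 to 515 amounts to shifting the standard power pattern so that its average matches the input's. A minimal sketch, assuming IP and P_k have the same length N; the function name is hypothetical.

```python
from typing import List, Sequence


def normalized_standard_power(
    IP: Sequence[float], P_k: Sequence[float], AP_k: float
) -> List[float]:
    """Shift the standard power pattern P_k(n) by the correction PP - AP_k."""
    PP = sum(IP) / len(IP)                # equation (7): circuits 509-511
    correction = PP - AP_k                # subtraction circuit 513
    return [p + correction for p in P_k]  # addition circuit 515


print(normalized_standard_power([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], 2.0))
# prints [3.0, 4.0, 5.0]
```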

Meanwhile, the input speech power pattern IP(n) (n = 1 to N) is read from input speech power pattern memory 508 through switching circuit 516, and switching circuit 517 selects the output of addition circuit 515. Subtraction circuit 502 computes IP(n) − (P_k(n) + (PP − AP_k)), and absolute value circuit 503 takes its absolute value.

Next, constant generation circuit 506 outputs the constant w_p; multiplication circuit 504 multiplies the output of the absolute value circuit by w_p, supplied through switching circuit 518, and the product is added to accumulator 507.

When the additions to the accumulator have been completed for n = 1 to N, the accumulated sum is output to decision section 700 as the result of the dissimilarity calculation. Decision section 700 takes the category of the standard speech with the smallest dissimilarity as the recognition result. Simulation showed that the optimum value of the constant w_p is about 1/2 to 2.

FIG. 5 shows the evaluation carried out to determine the value of the weighting coefficient w_p when the power-pattern comparison is incorporated into the dissimilarity: standard speech patterns were created by training on a vocabulary that includes "hai" (yes) and "iie" (no), and were then evaluated. The number of standard speech patterns was 192, and w_p was varied from 0 to 41.

As FIG. 5 shows, the recognition rate is clearly higher than with the conventional dissimilarity calculation (which corresponds to w_p = 0), and the optimum value of w_p is 1/2 to 2.

As explained above, the first embodiment compares the power patterns of the speech in addition to ordinary pattern matching.

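FIG. 6 compares the power of the utterances "ichi" and "ni". Because "chi" is a voiceless plosive, there is a silence between "i" and "chi", whereas the power of "ni" is continuous. Consequently, when an input speech pattern uttered as "ichi" is compared with the standard speech pattern for "ni" by the dissimilarity calculation section according to the present invention, the dissimilarity becomes larger than with the conventional method.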
There will be no sound during "ch". On the other hand, the power of "two" is continuous, so for example, the input voice pattern of "ichi" and the standard voice i? of "two" are uttered. When the turns are compared using the dissimilarity calculation unit according to the present invention, the dissimilarity becomes larger than that of the conventional method.

Conversely, when an input speech pattern uttered as "ni" is compared with the standard speech pattern for "ni", neither contains a silent interval within the word, and even if the loudness of the voice differs, the power is normalized so that the average powers of the two are equal; the dissimilarity therefore does not become large.
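The two effects described above can be checked on toy numbers with the power term of equation (6) alone. This sketch assumes power is expressed on a logarithmic (dB-like) scale, so that a uniformly louder voice shifts IP(n) by a constant; the text does not state the scale. It also takes AP_k as the mean of P_k, and the function name and power values are hypothetical.

```python
from typing import Sequence


def power_term(IP: Sequence[float], P_k: Sequence[float]) -> float:
    """Second term of D_k with w_p = 1, taking AP_k as the mean of P_k."""
    PP = sum(IP) / len(IP)
    AP_k = sum(P_k) / len(P_k)
    return sum(abs(ip - (p + (PP - AP_k))) for ip, p in zip(IP, P_k))


ni_standard = [4.0, 6.0, 6.0, 5.0]          # continuous power, no gap
ni_louder = [p + 3.0 for p in ni_standard]  # same word, uniformly louder
ichi_input = [5.0, 0.0, 0.0, 6.0]           # silent gap before "chi"

print(power_term(ni_louder, ni_standard))   # prints 0.0
print(power_term(ichi_input, ni_standard))  # prints 14.0
```

The loudness offset is cancelled exactly by the PP − AP_k correction, while the word-internal gap leaves a large residual.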

Therefore, the dissimilarity between the standard speech pattern for "ni" and speech uttered as "ichi" becomes larger, while the dissimilarity with speech uttered as "ni" hardly changes; misrecognitions decrease and the recognition rate improves. The same relationships hold between "go" and "roku" and between "hai" and "hachi".

(Effect of the Invention) In addition to ordinary pattern matching, this invention calculates the dissimilarity by comparing the power patterns of the speech in power-normalized form. Misrecognitions between "ichi" and "ni", "go" and "roku", "hai" and "hachi", and similar pairs are therefore reduced and the recognition rate improves, so the method can be used in speech recognition response systems.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional speech recognition device; FIG. 2 shows examples of speech patterns; FIG. 3 shows one embodiment of a speech recognition device according to the present invention; FIG. 4 shows the relationship between the input data x(m, n) and the input spectrum pattern data A(m, n); FIG. 5 shows the simulation results used to determine the weighting coefficient w_p; and FIG. 6 shows examples of power patterns.

100: input terminal; 200: frequency analysis section; 300: spectrum conversion section; 400: speech interval determination section; 500: dissimilarity calculation section; 501: input speech spectrum pattern memory; 502: subtraction circuit; 503: absolute value circuit; 504: multiplication circuit; 505: weight determination circuit; 506: constant generation circuit; 507: accumulator; 508: input speech power pattern memory; 509: addition circuit; 510: register; 511: division circuit; 512: standard speech average power memory; 513: subtraction circuit; 514: standard speech power pattern memory; 515: addition circuit; 516, 517, 518: switching circuits; 600: standard speech spectrum pattern memory; 700: decision section.

Patent applicant: Oki Electric Industry Co., Ltd.

[Drawings: FIG. 1, FIG. 2, FIG. 5, FIG. 6 (images not reproduced)]

Written Amendment of Procedure
To: Commissioner of the Japan Patent Office
1. Indication of the case: Patent Application No. 126237 of 1983 (Showa 58)
2. Title of the invention: Speech recognition method
3. Person making the amendment
   Relationship to the case: patent applicant
   Address: (postal code 105) 1-7-12 Toranomon, Minato-ku, Tokyo
4. Agent
   Address: (postal code 105) 1-7-12 Toranomon, Minato-ku, Tokyo
6. Contents of the amendment: as set out below.

Contents of the amendment
(1) In the specification, at page 9, lines 16, 17 and 18, and page 10, lines 2, 4, 5 and 9, "detection" is amended to "determination".
(2) Equation (6) on page 11 of the same specification is amended as follows:

$$\cdots + \sum_{n=1}^{N} \bigl\lvert IP(n) - \bigl(P_k(n) + (PP - AP_k)\bigr) \bigr\rvert \times w_p \qquad (6)$$

(3) On page 14, lines 2 and 6, and page 16, line 11 of the same specification, "シュミレーション" is corrected to "シミュレーション" (a spelling correction of the Japanese word for "simulation").
(4) Drawing FIG. 2 is amended as shown in the attached sheet.

Claims

[Claims] A speech recognition method comprising: a step of creating a power pattern of input speech; a step of creating a spectrum pattern of the input speech normalized by the spectral slope; a step of calculating a first dissimilarity by matching a spectrum pattern of standard speech, prepared in advance, against the spectrum pattern of the input speech; a step of normalizing a power pattern of the standard speech, prepared in advance, on the basis of the average power of the standard speech and the average power of the input speech, matching the normalized power pattern against the power pattern of the input speech, and calculating a second dissimilarity; and a step of weighting the first dissimilarity by a factor of 1/2 to 2 and adding the result to the second dissimilarity; wherein the input speech is recognized using the added value as the dissimilarity between the input speech and the standard speech.