JPH0120438B2

Info

Publication number: JPH0120438B2
Application number: JP58056716A
Authority: JP
Inventors: Yasuo Sato; Takayuki Fujimoto; Mitsuo Furumura; Hiroo Tanaka; Koji Tajima; Takahisa Kimura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-03-31
Filing date: 1983-03-31
Publication date: 1989-04-17
Also published as: JPS59181398A

Description

[Detailed description of the invention]

[Technical field of the invention]

The present invention relates to a method for recognizing input speech in which a plurality of words or syllables are uttered in succession, and in particular to a method for improving recognition accuracy when coarticulation occurs between words or syllables.

[Technical background]

In general, when a plurality of words or syllables are uttered in succession, and especially when they are uttered quickly, the ends of adjacent words or syllables undergo deformation and shortening known as coarticulation. This lowers the accuracy of matching against the prepared standard patterns. One conceivable solution is to register in advance, for every standard pattern, the variations caused by coarticulation. In practice this is difficult, because the combinations of standard patterns that coarticulate with each other, together with the variations that depend on the depth of coarticulation, are too numerous.

[Object and structure of the invention]

An object of the present invention is to improve the accuracy of matching against the input pattern, in the recognition of continuous speech input patterns that include coarticulation, by introducing pseudo coarticulation changes into the standard patterns by a simple method.

To this end, the present invention is characterized as follows. The required number of previously stored standard patterns of word speech or syllables are matched to the respective parts of an input pattern, which represents acoustic features obtained by analyzing unknown input speech in which a plurality of words or syllables are uttered in succession. When the ends of adjacent standard patterns overlap at their boundary, the similarity to the input pattern is calculated by using, as the standard pattern for the overlapping portion, an averaged pattern obtained from the overlapping portions of the respective standard patterns. The standard pattern sequence that maximizes the similarity is then determined, and the word or syllable sequence corresponding to the obtained standard pattern sequence is output as the recognition result.

[Embodiments of the invention]

The details of the present invention are described below with reference to the drawings.

FIG. 1 is an explanatory diagram of coarticulation changes in a continuous speech input pattern and of the corresponding pattern matching. The adjoining section $(m_1, l_2)$ of width $Q$ between the partial patterns $C_1$ and $C_2$ in the input pattern is deformed and shortened by coarticulation. As a result, the standard patterns A and B, which should match the input partial patterns $C_1$ and $C_2$ respectively, overlap at their ends. Moreover, both the final portion $P_{eA}$ of pattern A and the initial portion $P_{SB}$ of pattern B match the coarticulation-deformed section $Q$ $(m_1, l_2)$ of the input pattern poorly, which increases ambiguity.

FIGS. 2(a), 2(b), and 2(c) illustrate the basic principle of the present invention. FIG. 2(a) shows two adjacent standard patterns $n_1$ and $n_2$. FIG. 2(b) shows the standard patterns $n_1$ and $n_2$ overlapped with width $P = 5$, and FIG. 2(c) shows them overlapped with width $P = 10$. The pattern within the overlapping section is a pseudo-generated coarticulation pattern. That is, as exemplified by $u_1$ and $u_2$, it is created in advance, during preprocessing for recognition, by overlapping the ends of every pair of standard patterns (for example, $n_1$ and $n_2$) and averaging them.

The averaged pattern can be created by various methods, such as joining the two patterns smoothly, connecting them with a straight line, or simply taking the arithmetic mean at each position.
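As an illustration only, two of these averaging methods might be sketched as follows. The sketch assumes that a pattern is a list of feature frames (each frame a list of floats); the function names `arithmetic_average`, `linear_crossfade`, and `overlap_pattern` are hypothetical, and the linear cross-fade is just one possible reading of "connecting them with a straight line".

```python
def arithmetic_average(tail, head):
    """Frame-by-frame arithmetic mean of two equal-length frame sequences."""
    if len(tail) != len(head):
        raise ValueError("overlapping portions must have the same length")
    return [[(a + b) / 2.0 for a, b in zip(fa, fb)]
            for fa, fb in zip(tail, head)]


def linear_crossfade(tail, head):
    """Blend whose weight moves linearly from the tail to the head (assumed reading)."""
    if len(tail) != len(head):
        raise ValueError("overlapping portions must have the same length")
    p = len(tail)
    blended = []
    for t, (fa, fb) in enumerate(zip(tail, head)):
        w = (t + 1) / (p + 1)  # weight of the following pattern at position t
        blended.append([(1.0 - w) * a + w * b for a, b in zip(fa, fb)])
    return blended


def overlap_pattern(prev_pattern, next_pattern, p, blend=arithmetic_average):
    """Pseudo coarticulation pattern for an overlap of p frames."""
    if p <= 0:
        return []  # no overlap, so there is nothing to average
    if p > len(prev_pattern) or p > len(next_pattern):
        raise ValueError("overlap is longer than one of the patterns")
    return blend(prev_pattern[-p:], next_pattern[:p])


# Toy one-dimensional patterns, made up for this example.
n1 = [[0.0], [2.0], [4.0]]
n2 = [[10.0], [6.0], [8.0]]
u_mean = overlap_pattern(n1, n2, 2)
u_line = overlap_pattern(n1, n2, 2, blend=linear_crossfade)
```

Because the averaged patterns depend only on the standard patterns and the overlap width, they can be computed once for every pair and width before recognition starts, which is how the text describes the preprocessing.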

Next, an embodiment of recognition processing using the above averaged pattern is described.

As shown in FIG. 3, let $D(l, m, n_i, P_{i-1}, P_i)$ denote the distance between the standard pattern $n_i$, with its initial $P_{i-1}$ frames and its final $P_i$ frames removed, and the input partial pattern $C_{i-1}(l, m)$.

Furthermore, let $D_m(l', m', n_{i-1}, n_i, P_{i-1})$ denote the distance between the averaged pattern ($P_{i-1}$ frames), obtained from the final $P_{i-1}$ frames of the standard pattern $n_{i-1}$ and the initial $P_{i-1}$ frames of the standard pattern $n_i$, and the input partial pattern $C_i'(l', m')$ corresponding to the coarticulation section.
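The two distances can be sketched as below. The text does not say which distance measure or time alignment is used, so this sketch assumes a squared Euclidean frame distance with simple linear time alignment; it also assumes 1-based inclusive input indices and frame-wise arithmetic averaging for the overlap. The helper names `frame_dist` and `segment_distance` are hypothetical.

```python
import math


def frame_dist(a, b):
    """Squared Euclidean distance between two feature frames (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_distance(ref, seg):
    """Distance between reference frames and input frames, linearly time-aligned."""
    if not ref and not seg:
        return 0.0       # an empty overlap against an empty segment costs nothing
    if not ref or not seg:
        return math.inf  # a non-empty side cannot be aligned with an empty one
    total = 0.0
    for t, frame in enumerate(seg):
        r = t * len(ref) // len(seg)  # proportionally placed reference frame
        total += frame_dist(ref[r], frame)
    return total


def D(inp, l, m, pattern, p_start, p_end):
    """Trimmed standard pattern versus input frames l..m (1-based, inclusive)."""
    core = pattern[p_start:len(pattern) - p_end]
    return segment_distance(core, inp[l - 1:m])


def Dm(inp, l, m, prev_pattern, next_pattern, p):
    """Averaged overlap of p frames versus input frames l..m (1-based, inclusive)."""
    if p == 0:
        averaged = []
    else:
        averaged = [[(a + b) / 2.0 for a, b in zip(fa, fb)]
                    for fa, fb in zip(prev_pattern[-p:], next_pattern[:p])]
    return segment_distance(averaged, inp[l - 1:m])
```

Any other frame distance or alignment (for example dynamic time warping) could be substituted in `segment_distance` without changing the role that `D` and `Dm` play in the recognition procedure.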

When $l$, $m$, $l'$, and $m'$ are generalized as $l_i' + 1$, $l_i$, $l_{i-1} + 1$, and $l_i'$ respectively, as shown in the figure, the minimum cumulative distance $S_0$ between the patterns can be obtained from the following equation.

$$S_0 = \min_{k} \; \min_{l_i,\, l_i'} \; \min_{n_i,\, P_i} \; \sum_{i=1}^{k} \left[ D_m(l_{i-1}+1,\, l_i',\, n_{i-1},\, n_i,\, P_{i-1}) + D(l_i'+1,\, l_i,\, n_i,\, P_{i-1},\, P_i) \right] \tag{1}$$

The $n_i$ ($i = 1, 2, \ldots, k$) that give this $S_0$ are the recognition result.

FIG. 4 is a block diagram of the system of the embodiment. In the figure, 1 is a standard pattern storage unit; 2 is a storage unit for the averaged patterns created from the standard patterns $n_i$ in preprocessing; 3 is a storage unit for the input pattern to be recognized; 4 is a distance calculation unit that computes $D_m + D$ inside the brackets [ ] of equation (1); 5 is a minimum value calculation unit for the distance calculation results; and 6 is a similarity calculation unit that determines the $n_i$ ($i = 1, 2, 3, \ldots, k$) giving the minimum cumulative distance $S_0$. The processing result of the similarity calculation unit 6 is output as the recognition result.

Next, the concrete procedure for computing equation (1) is described.

First, compute

$$\hat{D}(l, m, n_1, n_2, P_1, P_2) = \min_{j} \left[ D_m(l, j, n_1, n_2, P_1) + D(j+1, m, n_2, P_1, P_2) \right] \tag{2}$$

Next, taking

$$S(1, i, n, P) = D(1, i, n, 0, P) \tag{3}$$

as the initial value, solve the following recurrence:

$$S(k, i, n, P) = \min_{j,\, n',\, P'} \left\{ S(k-1, j, n', P') + \hat{D}(j+1, i, n', n, P', P) \right\} \tag{4}$$

While computing $S(k, i, n, P)$, also compute and store the following:

$$B(k, i, n, P) = \operatorname*{argmin}_{j} \; \min_{n',\, P'} \left\{ S(k-1, j, n', P') + \hat{D}(j+1, i, n', n, P', P) \right\} \tag{5}$$

$$N(k-1, i, n, P) = \operatorname*{argmin}_{n'} \; \min_{j,\, P'} \left\{ S(k-1, j, n', P') + \hat{D}(j+1, i, n', n, P', P) \right\} \tag{6}$$

$$P(k, i, n, P) = \operatorname*{argmin}_{P'} \; \min_{j,\, n'} \left\{ S(k-1, j, n', P') + \hat{D}(j+1, i, n', n, P', P) \right\} \tag{7}$$
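A minimal sketch of the inner minimization over the split point $j$ in equation (2) is given below. The distance functions are passed in as callables, so nothing is assumed about how they are computed. The search range for $j$ is not stated in the text; here it is assumed to run from $l-1$ (an empty coarticulation section) to $m-1$ (at least one frame left for the following pattern). The name `d_hat` and the choice to return the best $j$ alongside the cost are illustrative.

```python
import math


def d_hat(l, m, n1, n2, p1, p2, dm, d):
    """Best split of input frames l..m into an overlap part and a core part.

    dm(l, j, n1, n2, p1)    plays the role of Dm in equation (2)
    d(j + 1, m, n2, p1, p2) plays the role of D in equation (2)
    Returns (minimum cost, split point j that achieves it).
    """
    best_cost, best_j = math.inf, None
    for j in range(l - 1, m):  # assumed search range for the split point
        cost = dm(l, j, n1, n2, p1) + d(j + 1, m, n2, p1, p2)
        if cost < best_cost:
            best_cost, best_j = cost, j
    return best_cost, best_j


# Toy cost functions, made up for this example: the overlap part prefers a
# length of p1 frames, and the core part costs 0.5 per frame.
def toy_dm(l, j, n1, n2, p1):
    return abs((j - l + 1) - p1)


def toy_d(l, m, n2, p1, p2):
    return 0.5 * (m - l + 1)


result = d_hat(1, 5, "n1", "n2", 2, 0, toy_dm, toy_d)
```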

The minimum cumulative distance $S_0$ is then given by

$$S_0 = \min_{k,\, n} S(k, I, n, 0) \tag{8}$$

Furthermore, to obtain the recognition result, set

$$k_0 = \operatorname*{argmin}_{k} \; \min_{n} S(k, I, n, 0) \tag{9}$$

$$N_{k_0} = \operatorname*{argmin}_{n} S(k_0, I, n, 0) \tag{10}$$

Then, taking

$$k = k_0, \quad i = I, \quad P = 0 \tag{11}$$

as initial values, compute

$$N_{k-1} = N(k-1, i, N_k, P) \tag{12}$$

and then replace $i$, $P$, and $k$ simultaneously as follows:

$$i \leftarrow B(k, i, N_k, P), \quad P \leftarrow P(k, i, N_k, P), \quad k \leftarrow k - 1 \tag{13}$$

If $k \geq 2$, repeat from equation (12).

The $N_1, N_2, \ldots, N_{k_0}$ obtained in this way constitute the recognition result.
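The recurrence of equations (3) and (4), the final minimization of equations (8) to (10), and the traceback of equations (11) to (13) can be sketched together as follows. This is an illustrative sketch, not the embodiment's implementation: the function name `recognize`, the callables `d_first` and `d_hat`, and the upper bound `max_words` are assumptions, and a single back-pointer tuple is stored in place of the three separate tables $B$, $N$, and $P$. The set of overlap widths must contain 0 so that the last word can end without an overlap.

```python
import math


def recognize(num_frames, words, overlaps, d_first, d_hat, max_words):
    """Return (S0, word sequence) that minimizes the cumulative distance.

    d_first(i, n, P)            stands for D(1, i, n, 0, P)          in eq. (3)
    d_hat(l, m, n1, n2, P1, P2) stands for D^(l, m, n1, n2, P1, P2)  in eq. (2)
    """
    S = {}     # S[(k, i, n, P)]: best cost of k words ending at input frame i
    back = {}  # back[(k, i, n, P)] = (j, n_prev, P_prev), replacing B, N and P
    for i in range(1, num_frames + 1):                 # eq. (3)
        for n in words:
            for P in overlaps:
                S[(1, i, n, P)] = d_first(i, n, P)
    for k in range(2, max_words + 1):                  # eq. (4) to (7)
        for i in range(k, num_frames + 1):
            for n in words:
                for P in overlaps:
                    best, arg = math.inf, None
                    for j in range(k - 1, i):
                        for n_prev in words:
                            for P_prev in overlaps:
                                prev = S.get((k - 1, j, n_prev, P_prev), math.inf)
                                cost = prev + d_hat(j + 1, i, n_prev, n, P_prev, P)
                                if cost < best:
                                    best, arg = cost, (j, n_prev, P_prev)
                    S[(k, i, n, P)] = best
                    back[(k, i, n, P)] = arg
    S0, k0, n_last = math.inf, None, None              # eq. (8) to (10)
    for k in range(1, max_words + 1):
        for n in words:
            cost = S.get((k, num_frames, n, 0), math.inf)
            if cost < S0:
                S0, k0, n_last = cost, k, n
    if k0 is None:
        return math.inf, []
    sequence = [n_last]                                # eq. (11) to (13)
    k, i, n, P = k0, num_frames, n_last, 0
    while k >= 2:
        i, n, P = back[(k, i, n, P)]
        sequence.append(n)
        k -= 1
    sequence.reverse()
    return S0, sequence


# Toy problem, made up for this example: two "words" with scalar targets and a
# fixed cost of 1.0 per word so that the best segmentation is unique.
signal = [0.0, 0.0, 5.0, 5.0]
targets = {"a": 0.0, "b": 5.0}


def seg_cost(l, m, n):
    return sum(abs(signal[t - 1] - targets[n]) for t in range(l, m + 1))


def toy_first(i, n, P):
    return seg_cost(1, i, n) + 1.0


def toy_hat(l, m, n1, n2, P1, P2):
    return seg_cost(l, m, n2) + 1.0


S0, sequence = recognize(4, ["a", "b"], [0], toy_first, toy_hat, 3)
```

The toy cost functions ignore the overlap widths, so the example only exercises the bookkeeping of the recurrence and the traceback. With real distances, `d_hat` would include the averaged-pattern term for each candidate overlap width.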

[Effect of the invention]

As described above, according to the present invention, accurate pattern matching can be performed relatively easily even when the input pattern includes coarticulation.

[Brief explanation of drawings]

FIG. 1 is an explanatory diagram of pattern matching for an input pattern containing coarticulation; FIG. 2 is an explanatory diagram of the averaged patterns according to the present invention; FIG. 3 is an explanatory diagram of recognition processing according to the present invention; and FIG. 4 is a block diagram of the system of the embodiment.

In the figures, 1 denotes a standard pattern storage unit, 2 an averaged pattern storage unit, 3 an input pattern storage unit, 4 a distance calculation unit, 5 a minimum value calculation unit, and 6 a similarity calculation unit.

Claims

[Claims]

1. A continuous speech recognition method characterized in that, when the required number of previously stored standard patterns of word speech or syllables are matched to the respective parts of an input pattern representing acoustic features obtained by analyzing unknown input speech in which a plurality of words or syllables are uttered in succession, and the ends of adjacent standard patterns overlap at their boundary, the similarity to the input pattern is calculated by using, as the standard pattern for the overlapping portion, an averaged pattern obtained from the overlapping portions of the respective standard patterns; the standard pattern sequence that maximizes the similarity is determined; and the word or syllable sequence corresponding to the obtained standard pattern sequence is output as the recognition result.

2. The continuous speech recognition method according to claim 1, characterized in that, in the matching, partial similarities are calculated between all standard partial patterns, each obtained by removing the initial and final portions of a standard pattern within a predetermined length, and all partial patterns of arbitrary length of the input; averaged-pattern partial similarities are calculated between all averaged patterns, each obtained from an equal-length combination of the final portion and the initial portion of standard patterns within the predetermined length, and all partial patterns of arbitrary length of the input; and the optimum standard pattern sequence is then determined by dynamic programming.