JPH0120439B2

Info

Publication number: JPH0120439B2
Application number: JP58056717A
Authority: JP
Inventors: Takayuki Fujimoto; Yasuo Sato; Mitsuo Furumura; Hiroo Tanaka; Koji Tajima; Takahisa Kimura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-03-31
Filing date: 1983-03-31
Publication date: 1989-04-17
Also published as: JPS59181399A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a method for recognizing input speech in which a plurality of words or syllables are uttered in succession, and in particular to a method for improving recognition accuracy when coarticulation occurs between words or syllables.

[Technical background]

In general, when a plurality of words or syllables are uttered in succession, a deformation called coarticulation occurs at the ends of adjacent words or syllables, and it becomes more pronounced the faster the speech. For example, when [aKa] and [aSa] are uttered in succession, the result is [aKaaSa], and the two a's in the middle tend to merge and shorten. As a result, matching accuracy against the prepared standard patterns is degraded. One conceivable solution is to register in advance, for every standard pattern, the variations caused by coarticulation. However, because there are many combinations of standard patterns that give rise to coarticulation, and many variations depending on the depth of the coarticulation, this was difficult in practice.

[Object and structure of the invention]

An object of the present invention is, in the recognition of continuous speech input patterns that contain coarticulation, to create pseudo standard patterns for the coarticulation changes by a simple method and thereby improve the accuracy of matching against the input pattern.

To that end, the present invention is a continuous speech recognition method in which the required number of previously stored standard patterns of word or syllable speech are matched against the respective parts of an input pattern representing acoustic features obtained by analyzing unknown input speech in which a plurality of words or syllables are uttered in succession. The method is characterized in that, during the matching, when the ends of adjacent standard patterns are separated at their boundary, an interpolation pattern obtained from the ends of those standard patterns is applied as the standard pattern for the separated part; the similarity with the input pattern is calculated; the standard pattern sequence that maximizes the similarity is determined; and the word or syllable sequence corresponding to the obtained standard pattern sequence is output as the recognition result.

[Embodiments of the invention]

The details of the present invention are explained below with reference to the drawings.

FIG. 1 is an explanatory diagram of coarticulation changes in a continuous speech input pattern. The horizontal axis shows the input pattern and the vertical axis shows the standard pattern sequence. The section of width P (between m_1 and l_2) where the partial patterns C_1 and C_2 of the input pattern adjoin is deformed and shortened by coarticulation. Consequently, the standard patterns A and B, which were prepared to be matched against the input partial patterns C_1 and C_2 respectively, overlap at their ends. Moreover, both the terminal part P_eA of pattern A and the initial part P_SB of pattern B match poorly against the coarticulated part (m_1, l_2) of the input pattern, which increases ambiguity.

For this reason, in the present invention the pattern ends that are strongly susceptible to deformation by coarticulation, for example P_SB and P_eB of standard pattern B in FIG. 1, are removed in advance, and a narrow pattern consisting of only the central part is used as the standard pattern. As a result, however, gaps arise between the standard patterns when pattern matching is performed. These gaps correspond to the section Q of the input pattern that has been deformed by coarticulation. Standard patterns for the coarticulation section are therefore created in advance in a pseudo manner. That is, for every combination of two narrow standard patterns, an interpolation pattern connecting their ends is created beforehand, and during pattern matching the appropriate interpolation pattern is selected and fitted into the gap between the narrow standard patterns.

FIG. 2 is an explanatory diagram of an interpolation pattern according to the present invention, in which n_1 and n_2 are narrow standard patterns and n_c is the interpolation pattern. Any suitable interpolation method, such as linear or quadratic interpolation, may be used. The length of the interpolation pattern may also be varied depending on the pair (n_1, n_2).
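As a rough illustration of the trimming and interpolation steps, the following Python sketch removes a fixed number of frames from each end of a standard pattern and then builds a linear interpolation pattern between the last frame of one narrow pattern and the first frame of the next. The frame representation (lists of floats), the function names `trim_ends` and `linear_interpolation_pattern`, and the `margin` and `length` parameters are all assumptions made for this example; the source does not specify how many frames are removed or how the interpolation length is chosen.

```python
from typing import List

Frame = List[float]      # one feature vector (hypothetical representation)
Pattern = List[Frame]    # a pattern is a time-ordered sequence of frames


def trim_ends(pattern: Pattern, margin: int) -> Pattern:
    """Drop `margin` frames from each end, keeping only the central part."""
    if margin < 0 or 2 * margin >= len(pattern):
        raise ValueError("margin must leave at least one frame")
    return pattern[margin:len(pattern) - margin]


def linear_interpolation_pattern(n1: Pattern, n2: Pattern, length: int) -> Pattern:
    """Build n_c: `length` frames on the straight line from the last frame
    of n1 to the first frame of n2 (the two end frames are not repeated)."""
    start, end = n1[-1], n2[0]
    frames = []
    for t in range(1, length + 1):
        w = t / (length + 1)   # weight strictly between 0 and 1
        frames.append([(1 - w) * a + w * b for a, b in zip(start, end)])
    return frames


# Toy one-dimensional patterns (made-up values, for illustration only).
n1 = trim_ends([[0.0], [1.0], [2.0], [3.0]], margin=1)   # [[1.0], [2.0]]
n2 = trim_ends([[9.0], [6.0], [7.0], [8.0]], margin=1)   # [[6.0], [7.0]]
n_c = linear_interpolation_pattern(n1, n2, length=3)     # [[3.0], [4.0], [5.0]]
```

Precomputing `n_c` for every ordered pair of narrow standard patterns, as the description suggests, would amount to calling `linear_interpolation_pattern` once per pair and storing the results.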

Next, an embodiment of continuous speech recognition using such interpolation patterns is described.

As shown in FIG. 3, let D(l, m, n_i) be the distance between a standard pattern n_i with both ends removed and the input partial pattern C_i(l, m), and let D_n(l', m', n_{i-1}, n_i) be the distance between the interpolation pattern n_i' and the corresponding coarticulation section C_i'(l', m') of the input pattern. Writing l, m, l', m' in general form as l_i' + 1, l_i, l_{i-1} + 1, l_i' respectively, the minimum cumulative distance S_0 is obtained from the following equation.

$$S_0 = \min_{k} \sum_{i=1}^{k} \min_{l_i,\, l_i',\, n_i} \left[ D_n(l_{i-1}+1,\, l_i',\, n_{i-1},\, n_i) + D(l_i'+1,\, l_i,\, n_i) \right] \tag{1}$$

The sequence n_i (i = 1, 2, ..., k) that gives this S_0 is taken as the recognition result.
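The description does not say how the distances D and D_n are computed. Purely as an illustration, the sketch below assumes a basic dynamic time warping (DTW) distance with Euclidean local cost between a segment of the input pattern and a template; the function names `dtw_distance` and `segment_distance`, the symmetric step pattern, and the 1-indexed inclusive frame convention are assumptions of this example, not details taken from the source.

```python
from math import dist, inf
from typing import List

Frame = List[float]
Pattern = List[Frame]


def dtw_distance(segment: Pattern, template: Pattern) -> float:
    """Accumulated DTW cost between two frame sequences (assumed measure)."""
    rows, cols = len(segment), len(template)
    acc = [[inf] * (cols + 1) for _ in range(rows + 1)]
    acc[0][0] = 0.0
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            local = dist(segment[r - 1], template[c - 1])   # Euclidean frame cost
            acc[r][c] = local + min(acc[r - 1][c], acc[r][c - 1], acc[r - 1][c - 1])
    return acc[rows][cols]


def segment_distance(input_pattern: Pattern, l: int, m: int, template: Pattern) -> float:
    """D(l, m, n): distance between input frames l..m (1-indexed, inclusive)
    and the template of pattern n."""
    return dtw_distance(input_pattern[l - 1:m], template)


# Toy data (made-up values): frames 2..4 of the input align perfectly with the template.
toy_input = [[0.0], [1.0], [1.0], [2.0], [5.0]]
toy_template = [[1.0], [2.0]]
toy_cost = segment_distance(toy_input, 2, 4, toy_template)   # 0.0
```

Under this assumption, D_n would be the same function applied to an interpolation pattern instead of a trimmed standard pattern.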

FIG. 4 is a configuration diagram of the embodiment system. In the figure, 1 is a standard pattern storage unit, 2 is a storage unit for the interpolation patterns created from the standard patterns n_i by preprocessing, 3 is a storage unit for the input pattern to be recognized, 4 is a distance calculation unit that calculates D_n + D inside the brackets of equation (1), 5 is a minimum value calculation unit for the distance calculation results, and 6 is a similarity calculation unit that determines the n_i (i = 1, 2, 3, ..., k) giving the minimum cumulative distance S_0. The processing result of the similarity calculation unit 6 is taken out as the recognition output.

Next, one example of a procedure for computing equation (1) is described. First, the minimum distance

$$\hat{D}(l, m, n_1, n_2) = \min_{j} \left[ D_n(l, j, n_1, n_2) + D(j+1, m, n_2) \right] \tag{2}$$

is computed and stored.
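Equation (2) can be read as choosing the best split point j of the input frames l..m between the interpolation pattern and the following narrow standard pattern. A minimal Python sketch of that search follows. The distance functions are passed in as callables because the source does not define them, and the requirement that both sections contain at least one frame is an assumption of this example; `toy_interp` and `toy_center` are made-up stand-ins.

```python
from math import inf
from typing import Callable


def d_hat(l: int, m: int, n1: str, n2: str,
          d_interp: Callable[[int, int, str, str], float],
          d_center: Callable[[int, int, str], float]) -> float:
    """Equation (2): best split of input frames l..m into an interpolation
    section l..j (scored by D_n) and a central section j+1..m (scored by D)."""
    best = inf
    for j in range(l, m):   # assumption: both sections are non-empty
        best = min(best, d_interp(l, j, n1, n2) + d_center(j + 1, m, n2))
    return best


# Toy distances (hypothetical): the gap prefers 2 frames, the centre prefers 3.
toy_interp = lambda l, j, n1, n2: float(abs((j - l + 1) - 2))
toy_center = lambda l, m, n: float(abs((m - l + 1) - 3))

best_cost = d_hat(1, 5, "a", "b", toy_interp, toy_center)   # split at j = 2 gives 0.0
```

Storing the result for every (l, m, n_1, n_2), as the text describes, lets the later recurrence look up the combined cost without repeating the search over j.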

Next, taking

$$S(1, i, n) = D(1, i, n) \tag{3}$$

as the initial value, the following recurrence is solved.

$$S(k, i, n) = \min_{j,\, n'} \left[ S(k-1, j, n') + \hat{D}(j+1, i, n', n) \right] \tag{4}$$

At the same time, the following are computed and stored:

$$B(k, i, n) = \operatorname*{argmin}_{j} \min_{n'} \left[ S(k-1, j, n') + \hat{D}(j+1, i, n', n) \right] \tag{5}$$

$$N(k-1, i, n) = \operatorname*{argmin}_{n'} \min_{j} \left[ S(k-1, j, n') + \hat{D}(j+1, i, n', n) \right] \tag{6}$$

The minimum cumulative distance is given by

$$S_0 = \min_{k,\, n} S(k, I, n) \tag{7}$$

To obtain the recognition result, first set

$$k_0 = \operatorname*{argmin}_{k} \min_{n} S(k, I, n) \tag{8}$$

$$N_{k_0} = \operatorname*{argmin}_{n} S(k_0, I, n) \tag{9}$$

Then, with

$$k = k_0, \quad i = I \tag{10}$$

as initial values, compute

$$N_{k-1} = N(k-1, i, N_k) \tag{11}$$

and update i and k as follows:

$$i \leftarrow B(k, i, N_k), \quad k \leftarrow k - 1 \tag{12}$$

If k ≥ 2, repeat from equation (11). The sequence N_1, N_2, ..., N_{k_0} obtained in this way is the recognition result.
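The recurrence and back-trace of equations (3) to (12) can be sketched in Python as follows. This is an illustrative reading, not the patent's implementation: the function name `recognize`, the upper bound `max_k` on the number of patterns, the dictionary-based tables, and the toy distance functions are all assumptions of the example. `d_center` plays the role of D and `d_hat` the role of the stored minimum distance of equation (2).

```python
from math import inf
from typing import Callable, Dict, List, Sequence, Tuple


def recognize(I: int, vocab: Sequence[str],
              d_center: Callable[[int, int, str], float],
              d_hat: Callable[[int, int, str, str], float],
              max_k: int) -> Tuple[float, List[str]]:
    S: Dict[tuple, float] = {}
    B: Dict[tuple, int] = {}
    N: Dict[tuple, str] = {}
    for i in range(1, I + 1):                      # eq. (3): initial values
        for n in vocab:
            S[(1, i, n)] = d_center(1, i, n)
    for k in range(2, max_k + 1):
        for i in range(1, I + 1):
            for n in vocab:
                best, best_j, best_prev = inf, None, None
                for j in range(1, i):
                    for p in vocab:
                        cost = S[(k - 1, j, p)] + d_hat(j + 1, i, p, n)
                        if cost < best:
                            best, best_j, best_prev = cost, j, p
                S[(k, i, n)] = best                # eq. (4)
                B[(k, i, n)] = best_j              # eq. (5)
                N[(k - 1, i, n)] = best_prev       # eq. (6)
    # eqs. (7)-(9): best pattern count k0 and final pattern N_k0
    s0, k0, last = min(((S[(k, I, n)], k, n)
                        for k in range(1, max_k + 1) for n in vocab),
                       key=lambda t: t[0])
    words = {k0: last}
    k, i = k0, I                                   # eq. (10)
    while k >= 2:
        prev = N[(k - 1, i, words[k])]             # eq. (11)
        i = B[(k, i, words[k])]                    # eq. (12)
        words[k - 1] = prev
        k -= 1
    return s0, [words[q] for q in range(1, k0 + 1)]


# Toy distances (hypothetical): "a" fits frames 1..2, then a gap plus "b" fits frames 3..6.
toy_center = lambda l, m, n: 0.0 if (l, m, n) == (1, 2, "a") else 5.0
toy_hat = lambda l, m, p, n: 0.0 if (l, m, p, n) == (3, 6, "a", "b") else 5.0

result = recognize(6, ["a", "b"], toy_center, toy_hat, max_k=3)   # (0.0, ["a", "b"])
```

The back-trace reads N before updating i, matching the order of equations (11) and (12).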

[Effect of the invention]

As described above, according to the present invention, the standard patterns consist mainly of the central part of each pattern, which is stable against coarticulation, while the pattern ends, which vary readily under coarticulation, are replaced by patterns created by pseudo interpolation. This makes it possible to realize an efficient continuous speech recognition system with a relatively simple configuration.

[Brief explanation of drawings]

FIG. 1 is an explanatory diagram of conventional pattern matching when the input pattern contains coarticulation, FIG. 2 is an explanatory diagram of an interpolation pattern according to the present invention, FIG. 3 is an explanatory diagram of pattern matching using interpolation patterns, and FIG. 4 is a configuration diagram of a system according to an embodiment of the present invention.

In the figures, 1 denotes the standard pattern storage unit, 2 the interpolation pattern storage unit, 3 the input pattern storage unit, 4 the distance calculation unit, 5 the minimum value calculation unit, and 6 the similarity calculation unit.

Claims

[Scope of Claims]

1. A continuous speech recognition method in which the required number of previously stored standard patterns of word or syllable speech are matched against the respective parts of an input pattern representing acoustic features obtained by analyzing unknown input speech in which a plurality of words or syllables are uttered in succession, characterized in that, during the matching, when the ends of adjacent standard patterns are separated at their boundary, an interpolation pattern obtained from the ends of the standard patterns is applied as the standard pattern for the separated part, the similarity with the input pattern is calculated, the standard pattern sequence that maximizes the similarity is determined, and the word or syllable sequence corresponding to the obtained standard pattern sequence is output as the recognition result.

2. The continuous speech recognition method according to claim 1, characterized in that, during the matching, the partial similarities between each standard pattern and all input partial patterns of arbitrary length, and the interpolation-pattern partial similarities between all interpolation patterns obtained from the terminal and initial ends of all standard patterns and all input partial patterns of arbitrary length, are determined in advance, after which the optimal standard pattern sequence is determined by dynamic programming.