JPS6026399A

JPS6026399A - Word recognition equipment

Info

Publication number: JPS6026399A
Application number: JP58134112A
Authority: JP
Inventors: 隆夫渡辺
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-07-22
Filing date: 1983-07-22
Publication date: 1985-02-09

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to an improvement of a spoken word recognition device.

In speech recognition, the pattern matching method has long been widely used: the patterns of the words to be recognized are registered in advance as standard patterns, an unknown pattern input at recognition time is compared with the standard patterns, and the word whose standard pattern has the highest similarity is taken as the recognition result.

This method can be applied not only to a specific speaker but also to unspecified speakers. That is, by preparing a plurality of standard patterns for each word, variation of the patterns from speaker to speaker can be handled. These standard patterns can be obtained by clustering samples from a large number of speakers and taking patterns that represent them.

When such multiple standard patterns are used to recognize several words input by one speaker, the following case can occur: the first input word is most similar to standard pattern R1, and the next input word is most similar to a different standard pattern, even though the two standard patterns come from utterances of completely different types of speakers. Such an inconsistency causes misrecognition, but it can be avoided when the words to be recognized are known to have been uttered by the same speaker.

An object of the present invention is to provide a speaker-independent word recognition device that avoids misrecognition caused by speaker mismatch and thereby improves recognition accuracy.

The present invention comprises: word detecting means for detecting word sections in an input speech signal; an input pattern storage unit for temporarily storing the patterns of the detected word sections for at least two consecutive words; a standard pattern storage unit for storing standard patterns prepared for each sequence of at least two consecutive words; and an identification unit that, each time a word is detected, matches the input pattern read from the input pattern storage unit against the standard patterns read from the standard pattern storage unit and obtains a recognition result.

The principle of recognition in the present invention is described below. In the present invention, recognition is performed in units of word concatenations, each of which groups a plurality of consecutive input words. The word patterns of many speakers prepared for creating the standard patterns are denoted $A_p(W)$, where $p$ denotes the speaker and $W$ the word. From these word patterns, the two-word concatenation pattern $A_p(W_1, W_2) = A_p(W_1) \oplus A_p(W_2)$ is obtained. Here the symbol $\oplus$ indicates that the two word patterns are juxtaposed.
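The juxtaposition of two word patterns can be illustrated with a minimal Python sketch. It assumes a pattern is a time series of feature vectors stored as a list of lists; the name `concatenate_patterns` and the sample values are hypothetical and do not come from the patent.

```python
from typing import List

Pattern = List[List[float]]  # time series of feature vectors (frames x features)


def concatenate_patterns(first: Pattern, second: Pattern) -> Pattern:
    """Juxtapose two word patterns of the same speaker along the time axis."""
    if first and second and len(first[0]) != len(second[0]):
        raise ValueError("both patterns must use the same feature dimension")
    # Copy the frames so the result does not alias the inputs.
    return [list(frame) for frame in first] + [list(frame) for frame in second]


# Hypothetical patterns of one speaker p for words W1 (3 frames) and W2 (2 frames).
a_p_w1 = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
a_p_w2 = [[0.0, 1.0], [0.1, 0.9]]
a_p_w1_w2 = concatenate_patterns(a_p_w1, a_p_w2)
print(len(a_p_w1_w2))  # 5 frames
```

Both inputs must come from the same speaker for the result to correspond to a two-word concatenation pattern in the sense used here.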

For each word concatenation $(W_1, W_2)$, the set of standard patterns of the word concatenation $\{R_k(W_1, W_2),\ k = 1, 2, \ldots\}$ is obtained by clustering the multi-speaker pattern set $\{A_p(W_1, W_2),\ p = 1, 2, \ldots\}$. Any clustering method, such as the k-means method, may be used.
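As a rough illustration of this clustering step, the sketch below runs a plain k-means on each word pair separately. It assumes that every concatenation pattern has already been compressed to a fixed-length vector, that squared Euclidean distance is used, and that the first k samples seed the centroids; none of these choices is specified in the patent, and all names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    """Plain k-means; the first k samples seed the centroids (a simplification)."""
    centroids = [list(s) for s in samples[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda i: squared_distance(s, centroids[i]))
            clusters[nearest].append(s)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_standard_patterns(
    speaker_patterns: Dict[Tuple[str, str], List[Vector]], k: int
) -> Dict[Tuple[str, str], List[Vector]]:
    """Cluster the multi-speaker concatenations of each word pair separately."""
    return {pair: k_means(samples, k) for pair, samples in speaker_patterns.items()}


# Hypothetical data: four speakers of two rough types saying the pair ("one", "two").
training = {("one", "two"): [[0.0, 0.1], [4.0, 4.1], [0.2, 0.0], [4.2, 3.9]]}
standard = build_standard_patterns(training, k=2)
print(standard[("one", "two")])
```

Each resulting centroid plays the role of one standard pattern for that word pair, so speakers of different types end up represented by different centroids.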

Next, the principle of recognition in the present invention is explained with reference to FIG. 1. In FIG. 1, (a) shows the speech waveform, (b) the word patterns, and (c) the word concatenation patterns, each consisting of two concatenated words.

Let the input word patterns be $S_1, \ldots, S_j, \ldots, S_J$. From these, the word concatenation patterns $\bar{S}_1, \ldots, \bar{S}_j, \ldots, \bar{S}_{J-1}$ are generated as follows:

$\bar{S}_1 = S_1 \oplus S_2, \quad \ldots, \quad \bar{S}_j = S_j \oplus S_{j+1}$

Each word concatenation pattern is matched against the set of word concatenation standard patterns. That is,

$D(\bar{S}_j, R_k(v_j, W_2))$  (1)

is computed over $k$ and $W_2$, and the $W_2$ for which $D$ is smallest is the recognition result. Here $D(A, B)$ denotes the distance between two patterns $A$ and $B$, and $v_j$ denotes the recognition result of the $j$-th word.
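The generation of the concatenations and one matching step can be sketched as follows. The sketch assumes fixed-length feature vectors per word, so that concatenation is simple list joining, and uses squared Euclidean distance for D; the function names, the vocabulary, and the numbers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
StandardSet = Dict[Tuple[str, str], List[Vector]]  # (W1, W2) -> [R_1, R_2, ...]


def distance(a: Vector, b: Vector) -> float:
    """D(A, B): squared Euclidean distance, one possible choice."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_concatenations(words: List[Vector]) -> List[Vector]:
    """Build the J-1 concatenations of neighbouring word patterns."""
    return [words[j] + words[j + 1] for j in range(len(words) - 1)]


def match_second_word(concat: Vector, previous: str, standard: StandardSet) -> str:
    """Return the W2 minimising D(concat, R_k(previous, W2)) over k and W2."""
    best_word: Optional[str] = None
    best_dist = float("inf")
    for (w1, w2), patterns in standard.items():
        if w1 != previous:  # only pairs whose first word is the previous result
            continue
        for r_k in patterns:
            d = distance(concat, r_k)
            if d < best_dist:
                best_word, best_dist = w2, d
    if best_word is None:
        raise KeyError(f"no standard pattern starts with {previous!r}")
    return best_word


# Hypothetical standard patterns: two speaker types per word pair.
standard = {
    ("go", "left"): [[0.0, 0.0, 1.0, 1.0], [5.0, 5.0, 6.0, 6.0]],
    ("go", "right"): [[0.0, 0.0, 9.0, 9.0], [5.0, 5.0, 2.0, 2.0]],
}
words = [[5.1, 4.9], [5.8, 6.1]]
concats = make_concatenations(words)
print(match_second_word(concats[0], "go", standard))  # left
```

Because each standard pattern spans both words, a candidate scores well only if one speaker type explains the first and the second word together.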

Because each word concatenation used to create the standard patterns joins words from the same speaker, matching in units of word concatenations implicitly assumes that the words within a concatenation were uttered by the same speaker. Because the word concatenations are generated by shifting through the input one word at a time, this amounts to assuming that all input words come from the same speaker.

According to the present invention, the identity of the speaker across a series of inputs is assumed at recognition time, so a highly accurate speaker-independent word recognition device can be realized.

An embodiment of the present invention is described below with reference to FIG. 2.

FIG. 2 is a block diagram showing an embodiment of the device according to the present invention. For the input speech signal, the word detection unit 1 detects a word section, analyzes the signal within the section, and outputs an input word pattern. Any analysis method may be used, such as a bank of band-pass filters, linear predictive analysis, or the fast Fourier transform, and any word section detection method may be used, such as one based on energy. The pattern takes the form of a time series of the feature vectors obtained by the analysis, but it may instead be a form in which this time series has been compressed in dimension or along the time axis. The input pattern storage unit 2 consists of two shift buffers 21 and 22. When a word is detected, the contents of buffer 21 are moved to buffer 22, and the pattern of the newly detected word is stored in buffer 21. The storage unit 3 stores the word concatenation standard patterns. Each time a word is detected, the identification unit 4 matches the input word concatenation pattern read out together from buffers 21 and 22 against the standard patterns read from the storage unit 3, and outputs a recognition result. Reference numeral 41 denotes a matching unit that matches the input pattern against a standard pattern and calculates their similarity. Any matching method may be used as long as it yields a similarity.
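The behavior of the input pattern storage with its two shift buffers can be mimicked in software with a bounded queue. The class below is only a sketch under that assumption; `InputPatternStore` and its methods are hypothetical names, and the patent describes hardware buffers rather than this data structure.

```python
from collections import deque
from typing import Deque, List, Optional

Pattern = List[List[float]]  # time series of feature vectors


class InputPatternStore:
    """Holds the patterns of the most recent `size` detected words."""

    def __init__(self, size: int = 2) -> None:
        self._size = size
        self._buffers: Deque[Pattern] = deque(maxlen=size)

    def push(self, pattern: Pattern) -> None:
        """Called once per detected word; the oldest pattern is dropped."""
        self._buffers.append(pattern)

    def read_concatenation(self) -> Optional[Pattern]:
        """Return the stored patterns joined in time order, or None until full."""
        if len(self._buffers) < self._size:
            return None
        return [frame for pattern in self._buffers for frame in pattern]


store = InputPatternStore()
store.push([[1.0]])
first_read = store.read_concatenation()   # only one word so far
store.push([[2.0], [3.0]])
second_read = store.read_concatenation()  # words 1 and 2 together
store.push([[4.0]])
third_read = store.read_concatenation()   # words 2 and 3 together
print(first_read, second_read, third_read)
```

Setting `size` to 3 would correspond to a configuration with three shift buffers for three-word concatenations.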

Reference numeral 42 denotes a matching control unit. According to the immediately preceding recognition result held in the recognition result output buffer 43, it specifies which standard patterns are to be read from the storage unit 3, and it also controls the matching unit. For example, if the recognition result held in 43 is $v_j$ when the $j$-th input word is detected, only the standard patterns whose word concatenation $(W_1, W_2)$ has the form $(v_j, W_2)$ are subject to matching; expression (1) is computed and the result is output to 43.
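The chaining performed by the matching control unit can be sketched as a loop in which each recognition result restricts the candidates for the next step. The text does not say how the result for the very first word is obtained, so this sketch simply takes it as an argument; that, the fixed-length vectors, the squared Euclidean distance, and all names and data are assumptions.

```python
from typing import Dict, List, Tuple

Vector = List[float]
StandardSet = Dict[Tuple[str, str], List[Vector]]  # (W1, W2) -> standard patterns


def distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(words: List[Vector], first_result: str, standard: StandardSet) -> List[str]:
    """Chain the matching: each result selects the next candidate pairs."""
    output_buffer = first_result  # plays the role of output buffer 43
    results = [first_result]
    for older, newer in zip(words, words[1:]):
        concat = older + newer  # contents of buffers 22 and 21, in time order
        scored = [
            (distance(concat, r_k), w2)
            for (w1, w2), patterns in standard.items()
            if w1 == output_buffer  # restriction imposed by the control unit
            for r_k in patterns
        ]
        # min() raises ValueError if no standard pattern starts with the result.
        _, output_buffer = min(scored)
        results.append(output_buffer)
    return results


# Hypothetical vocabulary {"a", "b"} with two speaker types (values near 0 or near 10).
standard = {
    ("a", "a"): [[0.0, 0.0], [10.0, 10.0]],
    ("a", "b"): [[0.0, 1.0], [10.0, 11.0]],
    ("b", "a"): [[1.0, 0.0], [11.0, 10.0]],
    ("b", "b"): [[1.0, 1.0], [11.0, 11.0]],
}
words = [[10.1], [10.9], [10.2]]
print(recognize(words, "a", standard))  # ['a', 'b', 'a']
```

Restricting the candidates to pairs that begin with the previous result also shrinks the search at each step from all word pairs to the pairs sharing one first word.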

As explained above, the present invention makes it possible to realize a speaker-independent word recognition device with high recognition accuracy.

The above description covers recognition in units of two-word concatenations, but concatenations of three or more words are equally possible. In that case, the number of shift buffers in the storage unit 2 changes. For example, with three-word concatenations, the storage unit 2 consists of three shift buffers.

[Brief explanation of the drawing]

FIG. 1 is a diagram for explaining the principle of recognition, and FIG. 2 is a block diagram showing an embodiment of the device according to the present invention. In the figures: 1 ... word detection unit; 2, 3 ... storage units; 21, 22 ... shift buffers; 4 ... identification unit; 41 ... matching unit; 42 ... matching control unit; 43 ... output buffer.

Claims

[Claims]

A word recognition device comprising: word detecting means for detecting a word section in an input speech signal; an input pattern storage unit for temporarily storing the patterns of the detected word sections for at least two consecutive words; a standard pattern storage unit for storing standard patterns prepared for each sequence of at least two consecutive words; and an identification unit that, each time a word is detected, obtains a recognition result by matching the input pattern read from the input pattern storage unit against the standard patterns read from the standard pattern storage unit.