JPS595292A

JPS595292A - Word voice recognition method

Info

Publication number: JPS595292A
Application number: JP57112492A
Authority: JP
Inventors: 入間野　孝雄; 秋場　国夫; 金指　久則
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1982-07-01
Filing date: 1982-07-01
Publication date: 1984-01-12
Also published as: JPS6310439B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

The present invention relates to a word speech recognition method that first performs phoneme recognition on input speech and then recognizes the word by matching the recognized phoneme sequence against a word dictionary written in phoneme notation. It provides a word speech recognition method that achieves a higher word recognition rate than conventional methods.

A conventional word recognition method is described with reference to FIG. 1.

As shown in FIG. 1, the input speech is first analyzed, the features of the input word speech are extracted, and the phonemes that make up the input word speech are recognized. The recognized phoneme sequence is then matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary. The likelihood between the two phoneme sequences is calculated by obtaining the recognition probability of each phoneme from a confusion matrix between phonemes (hereinafter abbreviated as C.M.), and the dictionary entry with the maximum likelihood between phoneme sequences is taken as the recognized word.

Table 1 shows an example of the word dictionary used in this word speech recognition method; each word is written according to the phoneme notation shown in Table 2. FIG. 2 shows part of the C.M. In FIG. 2, the vertical axis lists the phonemes in the word dictionary and the horizontal axis lists the recognized phonemes.

The numbers in FIG. 2 give, in percent, the probability that each phoneme in the word dictionary is recognized as each phoneme. For example, FIG. 2 shows that the dictionary phoneme I is recognized as I with a probability of 75%, as U with a probability of 5%, and as A with a probability of 0%, and that it is dropped (deleted) with a probability of 8%, and so on.
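The lookup described for FIG. 2 can be sketched as a nested mapping from dictionary phoneme to recognized phoneme. This is only an illustration: the names `CONFUSION_MATRIX`, `DELETION`, and `recognition_probability` are hypothetical, the row assumes the dictionary phoneme is I, and only the four values quoted in the text are filled in (the rest of the row is not given, so the listed values do not sum to 1).

```python
# Hypothetical marker for "phoneme was dropped"; not a symbol from the patent.
DELETION = "<del>"

# Partial C.M. row for dictionary phoneme "I", using only the percentages
# quoted in the text (75%, 5%, 0%, 8%). The remaining entries are unknown.
CONFUSION_MATRIX = {
    "I": {"I": 0.75, "U": 0.05, "A": 0.00, DELETION: 0.08},
}


def recognition_probability(dictionary_phoneme, recognized_phoneme):
    """Return P(recognized | dictionary), or None if the entry is not listed."""
    return CONFUSION_MATRIX[dictionary_phoneme].get(recognized_phoneme)


print(recognition_probability("I", "U"))       # 0.05
print(recognition_probability("I", DELETION))  # 0.08
```

Rows are keyed by dictionary phoneme and columns by recognized phoneme, matching the vertical and horizontal axes of FIG. 2.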

[Table 1] [Table 2]

For a word containing phonemes with a low phoneme recognition rate, the likelihood between the recognized phoneme sequence of that word's input speech and its dictionary phoneme sequence is low even when the phoneme recognition result is good, and the difference from the likelihoods against the dictionary phoneme sequences of other words' dictionary entries tends to be small.

For such words, an error in the phoneme recognition of the input speech easily produces a situation in which some other dictionary entry has a higher likelihood between phoneme sequences than the correct dictionary entry (for the input speech of a given word, the dictionary entry of that word is called the correct dictionary entry). With the conventional word recognition method, every such case results in word misrecognition, and this has been a cause of reduced word recognition rates.

The present invention substantially remedies the above drawback of the conventional example. One embodiment of the present invention is described below.

In this embodiment, as in the conventional example shown in FIG. 1, phoneme recognition is first performed on the input speech, and the recognized phoneme sequence is matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary to obtain a likelihood. Up to this point the process is the same as in the conventional example. A likelihood weight value determined in advance for each dictionary entry is then added to this likelihood to give a weighted likelihood value, and the dictionary entry with the maximum weighted likelihood value is taken as the recognized word. In other words, the word is recognized using the weighted likelihood value rather than the likelihood itself.
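The decision rule of this embodiment can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original: $D$ is the set of dictionary entries, $L(w)$ is the likelihood between the recognized phoneme sequence and the dictionary phoneme sequence of entry $w$, and $\alpha_w$ is the likelihood weight value determined in advance for $w$.

```latex
\tilde{L}(w) = L(w) + \alpha_w ,
\qquad
\hat{w} = \operatorname*{arg\,max}_{w \in D} \tilde{L}(w)
```

Setting $\alpha_w = 0$ for every entry recovers the conventional rule, which selects the entry with the maximum unweighted likelihood.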

Next, the way the likelihood weight values are determined in this embodiment is described. First, word recognition is performed on speech data using the conventional word speech recognition method.

Then, using only the speech data for which the word recognition result was correct, the average of the likelihood between the recognized phoneme sequence and the dictionary phoneme sequence of the correct dictionary entry is computed for each word.
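The averaging step can be sketched as below. This is a minimal illustration under assumed names: `average_correct_likelihood` and the tuple layout `(spoken_word, recognized_word, likelihood_to_correct_entry)` are hypothetical, and the sample scores are made up purely to show the filtering and are not data from the patent.

```python
from collections import defaultdict


def average_correct_likelihood(samples):
    """Average, per word, the likelihood against the correct dictionary entry.

    Only samples whose conventional recognition result was correct are used;
    misrecognized samples are skipped, as the text describes.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for spoken_word, recognized_word, likelihood in samples:
        if recognized_word != spoken_word:
            continue  # exclude misrecognized (possibly defective) data
        totals[spoken_word] += likelihood
        counts[spoken_word] += 1
    return {word: totals[word] / counts[word] for word in totals}


# Made-up scores for illustration only.
samples = [
    ("MACUBARA", "MACUBARA", 800.0),
    ("MACUBARA", "MACUBARA", 790.0),
    ("MACUBARA", "KAKOG*AWA", 788.0),  # misrecognized, so excluded
    ("KAKOG*AWA", "KAKOG*AWA", 850.0),
]
print(average_correct_likelihood(samples))
# {'MACUBARA': 795.0, 'KAKOG*AWA': 850.0}
```

A word with no correctly recognized samples simply has no entry in the result; how such words should be weighted is not stated in the text.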

The average likelihood obtained here for each word represents how difficult phoneme recognition is for that word. As noted in the description of the conventional example, a word with a low average likelihood is highly likely to show a higher likelihood for some other dictionary entry than for the correct dictionary entry. Only speech data with correct word recognition results was used to compute the average likelihood because speech data that was misrecognized is often defective in itself, and such data needs to be excluded.

The likelihood between phoneme sequences is obtained by taking the logarithm of the recognition probability of each phoneme in the phoneme sequence, summing these values, and normalizing the result so that the full score is 900 points regardless of the number of phonemes. In this embodiment, the likelihood weight value for each word dictionary entry was obtained by the following equation.

Next, an example of a word recognition result obtained with this embodiment is shown in comparison with the conventional example. For one speech sample of the word "Matsubara" (dictionary phoneme sequence MACUBARA), the phoneme recognition result was A P U B AWA. The likelihood with the correct dictionary entry was 788 points, while the likelihood with the dictionary entry KAKOG*AWA (Kakogawa) was 790 points. In this case, the word recognition result of the conventional example was "Kakogawa", a word misrecognition. In this embodiment, the likelihood weight values determined in advance are 7 points for MACUBARA and 2 points for KAKOG*AWA, so the weighted likelihood value is 795 points for MACUBARA and 792 points for KAKOG*AWA, and the word is recognized correctly.
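The arithmetic of this example can be checked with a short sketch. The likelihoods (788, 790) and weights (7, 2) are the values quoted above; the function name `recognize` and the dictionary layout are hypothetical, and the additive form follows the embodiment.

```python
# Likelihoods and weights quoted in the Matsubara example.
likelihoods = {"MACUBARA": 788, "KAKOG*AWA": 790}
weights = {"MACUBARA": 7, "KAKOG*AWA": 2}


def recognize(likelihoods, weights=None):
    """Return the dictionary entry with the highest (weighted) likelihood.

    With no weights this reduces to the conventional rule.
    """
    weights = weights or {}
    return max(likelihoods, key=lambda entry: likelihoods[entry] + weights.get(entry, 0))


print(recognize(likelihoods))           # KAKOG*AWA (conventional: misrecognition)
print(recognize(likelihoods, weights))  # MACUBARA  (795 vs. 792: correct)
```

The 2-point deficit of the correct entry is outweighed by the 5-point difference between the two weights, which is why the result flips.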

As described above, the word speech recognition method of the present invention has the advantage of improving the word recognition rate for words whose recognition rate was low with the conventional method.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram showing the word speech recognition method of the conventional example and of one embodiment of the present invention. FIG. 2 is a diagram showing part of the C.M. used in the conventional example and in one embodiment of the present invention.

Claims

[Claims]

A word speech recognition method in which phoneme recognition is performed on input speech to obtain a recognized phoneme sequence, and a word is recognized by calculating the likelihood between this recognized phoneme sequence and the dictionary phoneme sequence of each dictionary entry in a word dictionary written in phoneme notation, characterized in that a likelihood weight value is determined in advance for each dictionary entry, a weighted likelihood value is calculated by adding this likelihood weight value to, or multiplying it by, the likelihood between the phoneme sequences, and the dictionary entry with the maximum weighted likelihood value is taken as the recognized word.