JPS595292A - Word voice recognition method - Google Patents
Word voice recognition method
- Publication number
- JPS595292A
- Authority
- JP
- Japan
- Prior art keywords
- word
- dictionary
- phoneme
- likelihood
- recognized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a word speech recognition method that first performs phoneme recognition on input speech and then recognizes a word by matching the recognized phoneme sequence against a word dictionary written in phoneme notation. It provides a word speech recognition method that achieves a higher word recognition rate than conventional methods.
A conventional word recognition method is described with reference to FIG. 1.
As shown in FIG. 1, the input speech is first analyzed, features of the input word speech are extracted, and the phonemes making up the input word speech are recognized. The recognized phoneme sequence is matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary. The likelihood between the two phoneme sequences is computed by looking up the recognition probability of each phoneme in a confusion matrix between phonemes (Confusion Matrix, hereinafter abbreviated as C.M.), and the dictionary entry with the maximum likelihood between phoneme sequences is taken as the recognized word.
Table 1 shows an example of the word dictionary used in this word speech recognition method; each word is written according to the phoneme notation shown in Table 2. FIG. 2 shows part of the C.M. In FIG. 2, the rows correspond to phonemes in the word dictionary and the columns to recognized phonemes.
The numbers in FIG. 2 give, in percent, the probability that each phoneme in the word dictionary is recognized as a given phoneme. For example, FIG. 2 shows that the dictionary phoneme I is recognized as I with a probability of 75%, as U with a probability of 5%, and as A with a probability of 0%, and that it is dropped with a probability of 8%, and so on.
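The confusion-matrix lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the two phoneme sequences have already been aligned position by position (the source does not say how alignment is done), it sums log probabilities, and the names `CONFUSION` and `sequence_log_likelihood` are hypothetical. Only the 75%, 5%, 0%, and 8% figures for phoneme I come from the text; every other number is made-up filler.

```python
import math

# Rows: dictionary phoneme -> {recognized phoneme: probability}.
# "-" marks a dropped phoneme. The I-row values 0.75, 0.05, 0.00 and 0.08
# are the ones quoted from FIG. 2; the "E" entry and the whole U row are
# invented so that each row sums to 1.
CONFUSION = {
    "I": {"I": 0.75, "U": 0.05, "A": 0.00, "-": 0.08, "E": 0.12},
    "U": {"U": 0.80, "I": 0.10, "-": 0.10},
}

def sequence_log_likelihood(dictionary_seq, recognized_seq, cm=CONFUSION):
    """Sum of log recognition probabilities over position-aligned phonemes."""
    if len(dictionary_seq) != len(recognized_seq):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for dict_ph, rec_ph in zip(dictionary_seq, recognized_seq):
        p = cm[dict_ph].get(rec_ph, 0.0)
        if p == 0.0:
            # A zero-probability confusion makes the whole match impossible.
            return float("-inf")
        total += math.log(p)
    return total
```

A higher (less negative) value means a better match between the recognized sequence and the dictionary entry.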
Table 1
Table 2
For a word containing phonemes with a low phoneme recognition rate, the likelihood between the recognized phoneme sequence of that word's input speech and its dictionary phoneme sequence is low even when the phoneme recognition result is good, so the margin over the likelihoods obtained with the dictionary phoneme sequences of other words' dictionary entries tends to be small.
For such words, an error in the phoneme recognition of the input speech easily produces a situation in which another dictionary entry has a higher likelihood between phoneme sequences than the correct dictionary entry (for the input speech of a given word, the dictionary entry of that word is called the correct dictionary entry). In the conventional word recognition method every such case becomes a word misrecognition, which lowers the word recognition rate.
The present invention greatly reduces the above drawback of the conventional method. An embodiment of the present invention is described below.
In this embodiment, as in the conventional example shown in FIG. 1, phoneme recognition is first performed on the input speech, and the recognized phoneme sequence is matched against the dictionary phoneme sequence of each dictionary entry in the word dictionary to obtain a likelihood. Up to this point the process is the same as in the conventional example. A likelihood weight value, determined in advance for each dictionary entry, is then added to this likelihood to compute a weighted likelihood value, and the dictionary entry with the maximum weighted likelihood value is taken as the recognized word. In other words, words are recognized using the weighted likelihood value rather than the likelihood itself.
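The difference between the two decision rules can be written compactly. The notation is introduced here only for illustration and does not appear in the source: $r$ is the recognized phoneme sequence, $d_k$ the dictionary phoneme sequence of dictionary entry $k$, $L(r, d_k)$ the likelihood between them, and $w_k$ the likelihood weight value fixed in advance for entry $k$.

```latex
\hat{k}_{\mathrm{conventional}} = \arg\max_{k} \, L(r, d_k),
\qquad
\hat{k}_{\mathrm{weighted}} = \arg\max_{k} \, \bigl( L(r, d_k) + w_k \bigr)
```

The additive form matches this embodiment; the claim also allows the weight to be applied by multiplication.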
Next, the method of determining the likelihood weight values in this embodiment is explained. First, word recognition is performed on speech data using the conventional word speech recognition method.
Then, using only the speech data for which the word recognition result was correct, the average likelihood between the recognized phoneme sequence and the dictionary phoneme sequence of the correct dictionary entry is computed for each word.
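The averaging step above might look like the following sketch. It assumes each speech sample has already been scored by the conventional method and is available as a tuple of the spoken word, the recognized word, and the likelihood against the correct dictionary entry. The function name, the tuple layout, the word labels, and all scores are hypothetical.

```python
from collections import defaultdict

def average_correct_likelihood(samples):
    """Per-word mean likelihood, using only correctly recognized samples.

    samples: iterable of (spoken_word, recognized_word, likelihood), where
    likelihood is the score against the spoken word's own dictionary entry.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for spoken, recognized, likelihood in samples:
        if recognized != spoken:
            continue  # misrecognized samples are excluded from the average
        totals[spoken] += likelihood
        counts[spoken] += 1
    return {word: totals[word] / counts[word] for word in totals}

# Made-up scores for two hypothetical dictionary entries.
samples = [
    ("word_a", "word_a", 800.0),
    ("word_a", "word_a", 790.0),
    ("word_a", "word_b", 700.0),  # misrecognized, ignored
    ("word_b", "word_b", 860.0),
]
averages = average_correct_likelihood(samples)
```

With these made-up inputs, `word_a` ends up with the lower average, which in the description's terms marks it as the harder word to recognize.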
The average likelihood obtained here for each word represents how difficult phoneme recognition is for that word. As noted in the description of the conventional example, for a word with a low average likelihood there is a strong possibility that another dictionary entry shows a higher likelihood than the correct dictionary entry. Only the speech data with correct word recognition results are used to compute the average because misrecognized speech data are often defective in themselves, and such data should be excluded.
The likelihood between phoneme sequences is obtained by taking the logarithm of the recognition probability of each phoneme in the sequence, summing these values, and normalizing the result so that a perfect score is 900 points regardless of the number of phonemes. In this embodiment, the likelihood weight value for each word dictionary entry was obtained by the following equation.
Next, an example of a word recognition result according to this embodiment is compared with the conventional example. For one speech sample of the word "Matsubara" (dictionary phoneme sequence MACUBARA), the phoneme recognition result was APUBAWA. The likelihood with the correct dictionary entry was 788 points, and the likelihood with the dictionary entry KAKOG*AWA (Kakogawa) was 790 points. The conventional method therefore returned "Kakogawa", a word misrecognition. In this embodiment, the predetermined likelihood weight values are 7 points for MACUBARA and 2 points for KAKOG*AWA, so the weighted likelihood values are 795 points for MACUBARA and 792 points for KAKOG*AWA, and the word is recognized correctly.
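The arithmetic of this example can be checked with a short sketch. The likelihoods and weight values are the ones quoted above; the function name `recognize` and the dictionary-based data layout are assumptions made for illustration.

```python
def recognize(likelihoods, weights=None):
    """Return (best_entry, scores); scores are likelihood plus optional weight."""
    weights = weights or {}
    scores = {entry: score + weights.get(entry, 0)
              for entry, score in likelihoods.items()}
    best_entry = max(scores, key=scores.get)
    return best_entry, scores

# Values quoted in the description for one utterance of "Matsubara".
likelihoods = {"MACUBARA": 788, "KAKOG*AWA": 790}
weights = {"MACUBARA": 7, "KAKOG*AWA": 2}

conventional_word, _ = recognize(likelihoods)                     # no weights
weighted_word, weighted_scores = recognize(likelihoods, weights)  # with weights
```

Without weights the higher raw likelihood wins and the wrong entry is chosen; with the weights added, the correct entry overtakes it by 3 points.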
Thus, the word speech recognition method of the present invention has the advantage of improving the word recognition rate for words that the conventional method recognized poorly.
FIG. 1 is a schematic diagram showing the word speech recognition method in the conventional example and in an embodiment of the present invention. FIG. 2 is a diagram showing part of the C.M. used in the conventional example and in an embodiment of the present invention.
Claims (1)
A word speech recognition method in which phoneme recognition is performed on input speech to obtain a recognized phoneme sequence, and a word is recognized by calculating the likelihood between this recognized phoneme sequence and the dictionary phoneme sequence of each dictionary entry in a word dictionary written in phoneme notation, characterized in that a likelihood weight value is determined in advance for each dictionary entry, a weighted likelihood value is calculated by adding this likelihood weight value to, or multiplying it by, the likelihood between the phoneme sequences, and the dictionary entry with the maximum weighted likelihood value is taken as the recognized word.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP57112492A JPS595292A (en) | 1982-07-01 | 1982-07-01 | Word voice recognition method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS595292A (en) | 1984-01-12 |
| JPS6310439B2 (en) | 1988-03-07 |
Family
ID=14587995
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP57112492A Granted JPS595292A (en) | 1982-07-01 | 1982-07-01 | Word voice recognition method |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS595292A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6146995A (en) * | 1984-08-11 | 1986-03-07 | 富士通株式会社 | Voice recognition system |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0523108U (en) * | 1991-09-10 | 1993-03-26 | 三菱電線工業株式会社 | Sampler |
1982
- 1982-07-01 JP JP57112492A patent/JPS595292A/en active Granted
Also Published As
| Publication number | Publication date |
|---|---|
| JPS6310439B2 (en) | 1988-03-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JPH04362699A (en) | Speech recognition method and device | |
| KR102144345B1 (en) | Voice recognition processing device for performing a correction process of the voice recognition result based on the user-defined words and operating method thereof | |
| JPS595292A (en) | Word voice recognition method | |
| JP3444108B2 (en) | Voice recognition device | |
| JP6527000B2 (en) | Pronunciation error detection device, method and program | |
| JPS5968796A (en) | Recognition of word voice | |
| JPH04296799A (en) | voice recognition device | |
| JPH08314490A (en) | Word spotting type speech recognition method and device | |
| JPS5968794A (en) | Recognition of word voice | |
| JPH0158519B2 (en) | ||
| JPS5872996A (en) | Word voice recognition | |
| JPH1097284A (en) | Speech recognition method, speech recognition device, and storage medium | |
| JPS5872995A (en) | Word voice recognition | |
| JPS59143200A (en) | Continuous speech recognition device | |
| JPS617896A (en) | Word voice recognition method | |
| JPS59189398A (en) | Continuous speech recognition method | |
| JPS58159598A (en) | Monosyllabic voice recognition system | |
| JPS63250698A (en) | Voice recognition equipment | |
| JPH0126080B2 (en) | ||
| JPS60147799A (en) | Voice recognition | |
| JPS5968797A (en) | Recognition of word voice | |
| JPS6039692A (en) | Work voice recognition | |
| JPS5978399A (en) | Recognition of word voice | |
| JPS62255999A (en) | Word voice recognition equipment | |
| JPH0313600B2 (en) |