JPH0632014B2

JPH0632014B2 - Word detection method

Info

Publication number: JPH0632014B2
Application number: JP61307047A
Authority: JP
Inventors: 香一郎畑崎
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1986-12-22
Filing date: 1986-12-22
Publication date: 1994-04-27
Anticipated expiration: 2009-04-27
Also published as: JPS63158597A

Description

[Detailed description of the invention]

(Field of Industrial Application) The present invention relates to a word detection device, and more particularly to a word detection device that detects words contained in input speech in speech recognition devices, speech input devices, and the like.

(Prior Art) In speech recognition devices, speech input devices, and the like, one method of detecting words in input speech treats the input speech as a sequence of categories such as syllables, phonemes, or phoneme classes. If a category sequence built from the categories detected in the input speech corresponds to the category sequence of a word stored in a word dictionary, that word is taken as the detection result.

In general, because such categories are short in duration and similar categories exist, it is difficult to detect only the correct categories from the input speech without error. Conventionally, therefore, multiple category candidates are detected for each category segment of the input speech, and the word dictionary is then consulted to find a category candidate sequence that corresponds to a word. Even so, because of sloppy articulation or coarticulation between adjacent categories (for example, between syllables), a category segment may go undetected, or the correct category candidate may not be detected within a segment.

To address this, as in the methods described in Japanese Patent Applications Nos. 61-190258, 61-190259, 61-190260, and 61-190261 ("Word detection method"), category candidates are first detected in the input speech and then selected according to the order of categories in each word of the word dictionary. When a category in the word has not been detected, the category candidates corresponding to the categories before and after it are used as clues to find the category candidate sequence corresponding to the word's category sequence. For each word for which a corresponding category candidate sequence is found, the score of that sequence is computed and used as the score of the word.

On the other hand, when words with good scores are to be selected from a large number of words, obtaining the corresponding category candidate sequence and score for each word individually by the above method requires an enormous amount of computation, especially when the word dictionary contains many words, and is not practical.

Therefore, category candidate sequences with good scores are pursued first, so that words with good scores are found without obtaining the corresponding category candidate sequence for every word. Specifically, the category sequence of a word is traced from its beginning, the category candidate sequence corresponding to the part traced so far is obtained, and the score of that partial category candidate sequence is computed. At each step, the category candidate sequence with the best score is selected and the corresponding category sequence is traced one step further. This process is repeated, and when the end of a word is reached, that word is output as a result. In this way, words corresponding to category candidate sequences with good scores are found first.

Here, the score of a category candidate sequence is usually given as the mean of the scores of the category candidates that make up the sequence.

(Problems to Be Solved by the Invention) As described above, when the score of a category candidate sequence corresponding to the portion from the beginning of a category sequence to some intermediate point is computed as the mean of the scores of its category candidates, there is a drawback: if a candidate corresponding to a category near the beginning of the correct word's category sequence has a much worse score than the other candidates, the correct word often takes a long time to be detected.

FIG. 3 is an explanatory diagram showing an example of syllable candidates extracted from input speech. Suppose, for example, that syllables are used as the categories and that, when the speech "Zaisei" is input, the syllable candidates shown in FIG. 3 are obtained for its syllables. The number attached to each syllable candidate is its score; the smaller the value, the better, that is, the more reliable the candidate. If syllable candidate sequences corresponding to the words "Seigen" (restriction), "Zaigen" (financial resources), "Zeisei" (tax system), and "Zaisei" (public finance) are built from the syllable candidates of FIG. 3, their scores are:

Seigen: (3 + 7 + 8 + 4) / 4 = 5.5
Zaigen: (8 + 7 + 8 + 4) / 4 = 6.75
Zeisei: (12 + 7 + 1 + 1) / 4 = 5.25
Zaisei: (8 + 7 + 1 + 1) / 4 = 4.25

The correct word "Zaisei" thus has the smallest, and therefore best, score. However, if the scores of the syllable candidate sequences corresponding to the partial syllable sequences of length n (n = 1, 2, 3, 4), running from the beginning of each word's syllable sequence to an intermediate point, are likewise computed as the mean of the syllable candidate scores, the results are as follows.

Se: 3 / 1 = 3
Sei: (3 + 7) / 2 = 5
Seige: (3 + 7 + 8) / 3 = 6
Seigen: (3 + 7 + 8 + 4) / 4 = 5.5

Za: 8 / 1 = 8
Zai: (8 + 7) / 2 = 7.5
Zaige: (8 + 7 + 8) / 3 = 7.66
Zaigen: (8 + 7 + 8 + 4) / 4 = 6.75

Ze: 12 / 1 = 12
Zei: (12 + 7) / 2 = 9.5
Zeise: (12 + 7 + 1) / 3 = 6.66
Zeisei: (12 + 7 + 1 + 1) / 4 = 5.25

Za: 8 / 1 = 8
Zai: (8 + 7) / 2 = 7.5
Zaise: (8 + 7 + 1) / 3 = 5.33
Zaisei: (8 + 7 + 1 + 1) / 4 = 4.25

Therefore, if the syllable sequences are traced while always selecting the best-scoring syllable candidate sequence as described above, the syllable sequences of the four words are traced in the following order. The number to the right of each colon (:) is the score of the syllable candidate sequence corresponding to that syllable sequence.
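As a rough check of the arithmetic above, the following Python sketch recomputes the mean score of every prefix. The candidate table is reconstructed from the scores quoted in the text, since FIG. 3 itself is not reproduced here, and the names `CANDIDATES`, `WORDS`, and `prefix_mean_scores` are hypothetical.

```python
# Candidate scores per syllable position, reconstructed from the worked
# example in the text (assumption: FIG. 3 is not available). Lower is better.
CANDIDATES = [
    {"se": 3, "za": 8, "ze": 12},  # position 1
    {"i": 7},                      # position 2
    {"ge": 8, "se": 1},            # position 3
    {"n": 4, "i": 1},              # position 4
]

# Syllable sequences of the four dictionary words.
WORDS = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}


def prefix_mean_scores(syllables):
    """Return the mean candidate score of each prefix of length 1..len."""
    scores = [CANDIDATES[pos][syl] for pos, syl in enumerate(syllables)]
    return [sum(scores[:n]) / n for n in range(1, len(scores) + 1)]


for word, syllables in WORDS.items():
    print(word, [round(s, 2) for s in prefix_mean_scores(syllables)])
```

The printed values are rounded to two decimals, so 23/3 and 20/3 appear as 7.67 and 6.67, where the text truncates them to 7.66 and 6.66.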

First, the syllable candidate sequence corresponding to the first syllable of each of the four words is obtained.

Se: 3
Za: 8
Za: 8
Ze: 12

Next, the process of selecting the syllable candidate sequence with the best (smallest) score among these and tracing it one step further is repeated. The syllable candidate sequences are then generated in the following order.

Sei: 5
Seige: 6
Seigen: 5.5
Zai: 7.5
Zaige: 7.66
Zaigen: 6.75
Zai: 7.5
Zaise: 5.33
Zaisei: 4.25

In this way, the syllable sequence of the word "Zaisei" is obtained only after the syllable sequences of "Seigen" and "Zaigen" have been traced to the end, so the incorrect words "Seigen" (restriction) and "Zaigen" (financial resources) are detected before the correct word "Zaisei" (public finance). This happens because the syllable candidate corresponding to the initial "za" of "Zaisei" has a poor score of 8, so that the "za" of "Zaigen" is not selected until no syllable candidate sequence with a score better than 8 remains.
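The conventional best-first tracing can be sketched in Python as follows. This is only an illustration under stated assumptions: the candidate table is reconstructed from the scores in the text, every syllable of every word is assumed to have a candidate at its position, ties are broken by insertion order, and the names `mean_score` and `detection_order` are hypothetical.

```python
import heapq
from itertools import count

# Reconstructed from the worked example (assumption); lower score is better.
CANDIDATES = [
    {"se": 3, "za": 8, "ze": 12},
    {"i": 7},
    {"ge": 8, "se": 1},
    {"n": 4, "i": 1},
]
WORDS = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}


def mean_score(scores):
    """Conventional score: plain mean of the candidate scores so far."""
    return sum(scores) / len(scores)


def detection_order(words, candidates, score_fn):
    """Trace words best-first; return them in the order their end is reached."""
    tie = count()  # insertion counter, used only to break score ties
    heap = []
    for word, syllables in words.items():
        scores = [candidates[0][syllables[0]]]
        heapq.heappush(heap, (score_fn(scores), next(tie), word, scores))
    order = []
    while heap:
        _, _, word, scores = heapq.heappop(heap)
        syllables = words[word]
        if len(scores) == len(syllables):  # end of the word reached
            order.append(word)
            continue
        k = len(scores)  # index of the next syllable to trace
        longer = scores + [candidates[k][syllables[k]]]
        heapq.heappush(heap, (score_fn(longer), next(tie), word, longer))
    return order


print(detection_order(WORDS, CANDIDATES, mean_score))
```

Under these assumptions the sketch reaches the end of "Seigen" and "Zaigen" before "Zaisei", which matches the behaviour described above.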

An object of the present invention is to eliminate the above drawback and to provide a word detection method that can detect the correct word earlier and with less processing, even when a candidate corresponding to a category near the beginning of the correct word's category sequence has a much worse score than the other candidates.

(Means for Solving the Problems) The word detection method of the present invention extracts, from input speech regarded as a sequence of categories such as syllables, phonemes, or phoneme classes, a plurality of category candidates together with their scores, which serve as a measure of reliability in the detection and evaluation of the candidates, and their position information. It detects words in the input speech by tracing the category sequences of the words stored in a word dictionary and finding the category candidate sequences that correspond to them. The method comprises means for tracing the category sequences of the words in the word dictionary so as always to pursue the category candidate sequence with the best score, where the score of a category candidate sequence consisting of n category candidates is computed as follows. When n is greater than or equal to a predetermined number N, the score is the mean of the scores of the n category candidates. When n is less than N, the score is the mean of the scores of (n + m) category candidates, obtained by adding to the n category candidates m virtual category candidates, where m is given by a preset function of n and N.
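The scoring rule stated above can be written compactly as follows. The symbols are notation introduced here for illustration only: s_i is the score of the i-th category candidate in the sequence, s_v is the score assigned to each virtual category candidate (assumed to be the same for all of them), and f is the preset function that gives m.

```latex
S(c_1,\dots,c_n) =
\begin{cases}
\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} s_i, & n \ge N,\\[2ex]
\dfrac{1}{n+m}\left(\displaystyle\sum_{i=1}^{n} s_i + m\,s_v\right),
\quad m = f(n, N), & n < N.
\end{cases}
```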

(Operation) In the above example, the syllable candidate sequence corresponding to the syllable sequence "Zaisei" has a score of 4.25, which is better than the scores of the other words, yet the syllable candidate corresponding to its first syllable "za" has a poor score of 8. Conversely, the syllable candidate sequence corresponding to the syllable sequence "Seigen" has a score of 5.5, which is worse than that of "Zaisei", but the syllable candidate corresponding to its first syllable "se" has a score of 3, which is better than the score of the first syllable candidate of "Zaisei".

As this shows, a score computed from the category candidate sequence corresponding to a word's entire category sequence evaluates the reliability of that word correctly. In the prior art, however, a score computed only from the category candidate sequence corresponding to part of the word's category sequence is treated as the word's score, so if that part happens to contain a category candidate with a poor score, the word's score becomes poor.

On the other hand, as long as category candidates have been determined for only part of a word's category sequence, the score of the category candidate sequence corresponding to the whole word's category sequence cannot be used.

In the method of the present invention, therefore, for each category in a word's category sequence whose corresponding category candidate has not yet been determined, a virtual category candidate with a certain average score is assumed. Specifically, when the score of a category candidate sequence is computed and its length n is shorter than a predetermined length N, the sequence is judged to correspond to only part of a word's category sequence, and a number of virtual category candidates given by a preset function of n and N is assumed. As a result, even if part of the corresponding category candidate sequence contains a category candidate with a poor score, averaging with the scores of the virtual category candidates keeps the word's score from becoming too poor. The category candidate sequence corresponding to that word can therefore be found quickly.
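A minimal Python sketch of this padded scoring follows. The function name `padded_score` and its parameters are hypothetical; the values in the usage lines (N = 4, m = N - n, virtual score 1) are the ones given in the embodiment later in this document, and other choices of the function m are equally allowed by the text.

```python
def padded_score(scores, N, virtual_score, m_of=lambda n, N: N - n):
    """Score of a candidate sequence, padded with virtual candidates.

    scores        -- scores of the n category candidates found so far
    N             -- predetermined length threshold
    virtual_score -- score assigned to each virtual candidate (assumption:
                     the same value for every virtual candidate)
    m_of          -- preset function giving the number m of virtual
                     candidates from n and N
    """
    n = len(scores)
    if n >= N:
        # Long enough: plain mean of the real candidate scores.
        return sum(scores) / n
    m = m_of(n, N)
    # Too short: average over the n real and m virtual candidates.
    return (sum(scores) + m * virtual_score) / (n + m)


# Usage with the embodiment's settings.
print(padded_score([3], N=4, virtual_score=1))           # first syllable "se"
print(padded_score([8], N=4, virtual_score=1))           # first syllable "za"
print(padded_score([8, 7, 1, 1], N=4, virtual_score=1))  # full "Zaisei"
```

With these settings a poor first score of 8 yields 2.75 rather than 8, so it no longer pushes the word far behind candidates whose first syllable happened to score well.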

(Embodiment) Next, the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention. In the embodiment shown in FIG. 1, the input is assumed to be Japanese speech, and syllables are used as the categories.

The syllable candidate extraction unit 101 detects syllable candidates in the input speech and stores each candidate, together with its score and its position in the input speech, in the syllable candidate storage unit 102.

FIG. 2 is a block diagram showing an example of the syllable candidate extraction unit 101. In FIG. 2, the input speech is first stored in the speech buffer 201. The vowel candidate detection unit 202 then detects vowel candidates in the speech stored in the speech buffer 201 and stores them in the vowel candidate storage unit 203. Vowel candidates are detected by matching each segment of the input speech against the reference speech pattern of each vowel stored in advance in the vowel pattern storage unit 204. Because the speech signal of a vowel is relatively stationary, detection is easy. Each vowel candidate holds at least the vowel name and its position in the input speech.

After vowel candidate detection is finished, the consonant candidate detection unit 205 detects consonant candidates as follows. In Japanese, a syllable is a consonant (C)-vowel (V) pair. In the input speech, therefore, one consonant can be assumed to exist in each segment that lies between two vowels and is no longer than a certain duration (hereinafter a VCV segment) and in each segment that lies within a certain duration of the beginning of the input speech (hereinafter a CV segment). For every VCV segment and CV segment formed from the vowel candidates stored in the vowel candidate storage unit 203, the consonant candidate detection unit 205 performs matching against the VCV and CV reference speech patterns stored in advance in the consonant pattern storage unit 206, and takes the names of the several speech patterns with the highest similarity as consonant candidates. The vowel candidates and consonant candidates determined in this way are combined into syllable candidates, which are stored in the syllable candidate storage unit 102 together with their positions in the input speech.

As an example, suppose that the speech "Zaisei" (public finance) is input. In this case, syllable candidates such as those shown in FIG. 3 are extracted as the syllable recognition result. In FIG. 3, multiple syllable candidates are extracted for each syllable segment, and the number attached to each syllable candidate is its score.

The word storage unit 103 stores the syllable sequences of the words to be detected. Suppose that the word storage unit 103 currently holds the four words "Seigen" (restriction), "Zaigen" (financial resources), "Zeisei" (tax system), and "Zaisei" (public finance).

The syllable candidate sequence generation unit 104 first selects, from the syllable candidates stored in the syllable candidate storage unit 102, those corresponding to the first syllable of each word in the word storage unit 103, and treats each as a syllable candidate sequence of length 1. The score calculation unit 105 then computes the score of each of these sequences. In this embodiment, N = 4, the preset function that gives m from n and N is m = N - n, and each virtual syllable candidate has a score of 1.

For example, the syllable candidate sequence consisting only of the candidate "se" (score 3), which corresponds to the first syllable "se" of the word "Seigen", is scored as the mean over four syllable candidates in total: this candidate and three virtual syllable candidates of score 1. Its score is therefore (3 + 1 + 1 + 1) / 4 = 1.5.

These syllable candidate sequences are stored in the syllable candidate sequence storage unit 106 together with their scores and the corresponding words. As a result, the syllable candidate sequence storage unit 106 holds the following four syllable candidate sequences, where the word in quotation marks is the corresponding word:

Se: 1.5 "Seigen"
Za: 2.75 "Zaigen"
Ze: 3.75 "Zeisei"
Za: 2.75 "Zaisei"

Next, the syllable candidate sequence selection unit 107 takes out, from the syllable candidate sequences in the syllable candidate sequence storage unit 106, the one with the best score, that is, the smallest value, and sends that syllable candidate sequence and its word to the syllable candidate sequence generation unit 104. The syllable candidate sequence generation unit 104 extends the received syllable candidate sequence by one step along the word's syllable sequence, has its score recomputed by the score calculation unit 105, and stores it in the syllable candidate sequence storage unit 106.

In this case, Se: 1.5 "Seigen" is taken out, and Sei: 3 "Seigen" is newly stored in the syllable candidate sequence storage unit 106. As a result, the syllable candidate sequence storage unit 106 holds:

Za: 2.75 "Zaigen"
Ze: 3.75 "Zeisei"
Za: 2.75 "Zaisei"
Sei: 3 "Seigen"

Next, therefore, the syllable candidate sequence with the best score among these, Za: 2.75 "Zaigen", is taken out.

As this process is repeated, the following syllable candidate sequences are generated.

Zai: 4.25 "Zaigen"
Zai: 4.25 "Zaisei"
Seige: 4.75 "Seigen"
Zaige: 6 "Zaigen"
Zaise: 4.25 "Zaisei"
Zaisei: 4.25 "Zaisei"

At this point the syllable candidate sequence has reached the end of the word "Zaisei", so the syllable candidate sequence selection unit 107 outputs this word as the detection result.

In this way, the correct word "Zaisei" is detected first. Moreover, whereas the conventional method detected "Zaisei" only after 13 syllable candidate sequences had been generated, as described above, the method of the present invention detects it after 11 syllable candidate sequences have been generated.
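The search of this embodiment can be sketched end to end in Python as follows. This is an illustration under assumptions, not the patented implementation: the candidate table is reconstructed from the scores in the text, every syllable of every word is assumed to have a candidate at its position, ties are broken by insertion order, and the names `padded_score` and `first_detected` are hypothetical.

```python
import heapq
from itertools import count

# Reconstructed from the worked example (assumption); lower score is better.
CANDIDATES = [
    {"se": 3, "za": 8, "ze": 12},
    {"i": 7},
    {"ge": 8, "se": 1},
    {"n": 4, "i": 1},
]
WORDS = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}
N = 4              # length threshold used in the embodiment
VIRTUAL_SCORE = 1  # score of each virtual candidate in the embodiment


def padded_score(scores):
    """Mean over the real candidates plus m = N - n virtual ones (n < N)."""
    n = len(scores)
    m = max(N - n, 0)
    return (sum(scores) + m * VIRTUAL_SCORE) / (n + m)


def first_detected(words, candidates, score_fn):
    """Best-first tracing; return (word, score) of the first completed word."""
    tie = count()  # insertion counter, used only to break score ties
    heap = []
    for word, syllables in words.items():
        scores = [candidates[0][syllables[0]]]
        heapq.heappush(heap, (score_fn(scores), next(tie), word, scores))
    while heap:
        score, _, word, scores = heapq.heappop(heap)
        syllables = words[word]
        if len(scores) == len(syllables):  # end of the word reached
            return word, score
        k = len(scores)  # index of the next syllable to trace
        longer = scores + [candidates[k][syllables[k]]]
        heapq.heappush(heap, (score_fn(longer), next(tie), word, longer))
    return None


print(first_detected(WORDS, CANDIDATES, padded_score))
```

Passing a plain mean such as `lambda s: sum(s) / len(s)` as `score_fn` reproduces the conventional behaviour, which makes the two scoring rules easy to compare on the same data.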

An embodiment of the present invention has been described above. To keep the explanation simple, this embodiment dealt with the case in which no syllable recognition error occurs at the syllable recognition stage, that is, the case in which at least the correct syllable candidate is extracted for every input syllable. Even when syllable recognition errors occur, however, the correct word can be detected efficiently, as in the above embodiment, by using the methods described in the aforementioned Japanese Patent Applications Nos. 61-190258, 61-190259, 61-190260, and 61-190261 ("Word detection method").

When there are many words to be detected, it is common to represent them in a tree structure, that is, with syllables as nodes, so that each node sequence from the root node to a leaf node represents the syllable sequence of one word. In that case too, applying the method of the present invention to each syllable sequence gives the same result as in the above embodiment.
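The tree representation mentioned above might look like the following Python sketch. The class `Node` and the functions `build_tree` and `count_nodes` are hypothetical names, and the sketch only shows how words with common initial syllables share nodes; it does not implement the search itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One syllable node; `word` is set on the node that ends a word."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None


def build_tree(words: Dict[str, List[str]]) -> Node:
    """Insert every syllable sequence so that common prefixes share nodes."""
    root = Node()
    for word, syllables in words.items():
        node = root
        for syl in syllables:
            node = node.children.setdefault(syl, Node())
        node.word = word
    return root


def count_nodes(node: Node) -> int:
    """Count all nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


WORDS = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}

root = build_tree(WORDS)
print(sorted(root.children))                               # first syllables
print(sorted(root.children["za"].children["i"].children))  # branches after "za", "i"
print(count_nodes(root))
```

Here "Zaigen" and "Zaisei" share the nodes for "za" and "i", so a partial sequence over that common prefix would need to be generated and scored only once.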

(Effects of the Invention) As described above, the present invention makes it possible to realize a word detection method that detects the correct word before other words, even when a candidate corresponding to a category near the beginning of the correct word's category sequence has a much worse score than the other candidates, and that does so efficiently, generating fewer category candidate sequences.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a block diagram showing an example of the syllable candidate extraction unit in the embodiment of FIG. 1, and FIG. 3 is an explanatory diagram showing an example of syllable candidates extracted from input speech.

101: syllable candidate extraction unit
102: syllable candidate storage unit
103: word storage unit
104: syllable candidate sequence generation unit
105: score calculation unit
106: syllable candidate sequence storage unit
107: syllable candidate sequence selection unit
201: speech buffer
202: vowel candidate detection unit
203: vowel candidate storage unit
204: vowel pattern storage unit
205: consonant candidate detection unit
206: consonant pattern storage unit

Claims

[Claims]

1. A word detection method in which a plurality of category candidates, together with scores serving as a measure of reliability in the detection and evaluation of those category candidates and position information, are extracted from input speech that is a sequence of categories such as syllables, phonemes, or phoneme classes, and in which a word in the input speech is detected by finding a category candidate sequence corresponding to the category sequence of a word while tracing the category sequences of words stored in a word dictionary, the method comprising means for tracing the category sequences of the words in the word dictionary so as always to obtain the category candidate sequence with the best score, wherein the score of a category candidate sequence consisting of n category candidates is calculated using the mean of the scores of the n category candidates when n is greater than or equal to a predetermined number N, and, when n is less than N, using the mean of the scores of (n + m) category candidates obtained by adding to the n category candidates m virtual category candidates, m being given by a preset function of n and N.