JPH0632015B2 - Word detector

Info

Publication number: JPH0632015B2
Application number: JP61307048A
Authority: JP
Inventors: 香一郎畑崎
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1986-12-22
Filing date: 1986-12-22
Publication date: 1994-04-27
Anticipated expiration: 2009-04-27
Also published as: JPS63158598A

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a word detection device. More particularly, it relates to a word detection device that detects words contained in input speech in a speech recognition device, a speech input device, or the like.

(Prior Art) Speech recognition devices, speech input devices, and the like use several methods to detect words in input speech. In one of them, the input speech is regarded as a sequence of categories such as syllables, phonemes, or phoneme classes. A category string is formed from the categories detected in the input speech. If that string corresponds to the category string of a word stored in a word dictionary, that word is taken as the detection result.

In general, it is difficult to detect only the correct categories in input speech without error. These categories are short in duration, and similar categories exist. Conventionally, therefore, a plurality of category candidates is detected for each category section of the input speech. The word dictionary is then consulted to find a category candidate string corresponding to a word. Even so, slurred articulation or coarticulation between adjacent categories, for example between syllables, can cause two kinds of failure: a category section may go undetected, or the correct category candidate may not be detected in a given section.

One remedy is the method described in Japanese Patent Application Nos. 61-190258, 61-190259, 61-190260, and 61-190261, each titled "Word Detection Method". In that method, category candidates are first detected in the input speech and then selected according to the order of the categories in each word of the word dictionary. If a category in a word is not detected, the category candidates corresponding to the categories before and after it serve as clues for finding a category candidate string that corresponds to the word's category string. A score is then calculated for each word for which such a candidate string is found.

On the other hand, a word with a good score often has to be selected from a large number of words. Computing the corresponding category candidate string and score separately for each word by the above method then requires an enormous amount of computation, especially when the word dictionary contains many words. This is impractical.

For this reason, a tree-structured word dictionary is usually used, which represents the category strings of many words as a so-called tree. In such a dictionary, each branch between nodes corresponds to a category. Each branch sequence from the root node to a leaf node, that is, each category string, represents the category string of one word. When several words begin with the same category string, that shared prefix appears only once in the tree, so it need not be traversed repeatedly.
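The tree-structured dictionary described above is what is now usually called a trie. The Python sketch below illustrates it under assumptions that are not taken from the patent. Nodes are nested dicts keyed by romanized syllables. The key `"$"` is a hypothetical leaf marker that stores the word. The input is the four example words used later in this document. The sketch counts branches to show that shared prefixes are stored only once.

```python
from typing import Any, Dict, List

# Hypothetical representation: a node maps each category (a romanized
# syllable) to its child node; the key "$" marks a leaf and holds the word.
Trie = Dict[str, Any]


def build_trie(words: Dict[str, List[str]]) -> Trie:
    """Insert each word's category string so that shared prefixes share branches."""
    root: Trie = {}
    for word, categories in words.items():
        node = root
        for cat in categories:
            node = node.setdefault(cat, {})  # reuse the branch if it already exists
        node["$"] = word  # leaf marker
    return root


def count_branches(node: Trie) -> int:
    """Count category branches (edges) below a node, ignoring leaf markers."""
    return sum(1 + count_branches(child) for key, child in node.items() if key != "$")


words = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}
trie = build_trie(words)
print(count_branches(trie))  # 14 branches for 16 syllables: "za-i" is stored once
```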

In addition, each category string traced from the root node to an intermediate node is given a score, by some criterion, as a measure of its reliability in detection evaluation. The search always selects the category string with the best score, that is, the most reliable one, and traces it further. In this way, a word whose category string has a good score is eventually obtained. This method is generally called best-first search.

With this method, a word whose category string has a good score can be found without tracing the category strings of all words. The score of a category string is usually taken to be the average of the scores of the category candidates in the corresponding category candidate string.
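The following is a minimal Python sketch of best-first search using the conventional average score. It rests on three assumptions:

- There is one dict of syllable candidates per syllable section.
- A candidate can be connected only if it lies in the section immediately after the previous one.
- The data are the four words and candidate scores quoted in the FIG. 3 example below. The figure itself may contain further candidates.

`best_first`, `mean_score`, and the data names are hypothetical. Under these assumptions, the plain average lets "seigen" reach a leaf before "zaisei", which previews the drawback discussed next.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

# Hypothetical data: candidates per syllable section with the scores quoted in
# the FIG. 3 example (smaller = more reliable).
CANDIDATES: List[Dict[str, float]] = [
    {"se": 3, "za": 8, "ze": 12},
    {"i": 7},
    {"ge": 8, "se": 1},
    {"n": 4, "i": 1},
]

WORDS = {
    "seigen": ["se", "i", "ge", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zeisei": ["ze", "i", "se", "i"],
    "zaisei": ["za", "i", "se", "i"],
}


def build_trie(words):
    root = {}
    for word, cats in words.items():
        node = root
        for c in cats:
            node = node.setdefault(c, {})
        node["$"] = word  # leaf marker
    return root


def mean_score(scores: List[float]) -> float:
    """Conventional score: plain average of the candidate scores so far."""
    return sum(scores) / len(scores)


def best_first(trie, candidates, score_fn) -> Optional[Tuple[str, float]]:
    """Always expand the stored category string with the smallest score."""
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = []
    for cat, node in trie.items():
        if cat != "$" and cat in candidates[0]:
            s = [candidates[0][cat]]
            heapq.heappush(heap, (score_fn(s), next(tie), node, s))
    while heap:
        score, _, node, s = heapq.heappop(heap)
        if "$" in node:  # reached a leaf: this word is the detection result
            return node["$"], score
        pos = len(s)  # assumption: the next candidate comes from the next section
        if pos >= len(candidates):
            continue
        for cat, child in node.items():
            if cat != "$" and cat in candidates[pos]:
                s2 = s + [candidates[pos][cat]]
                heapq.heappush(heap, (score_fn(s2), next(tie), child, s2))
    return None


print(best_first(build_trie(WORDS), CANDIDATES, mean_score))
```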

(Problems to be Solved by the Invention) Computing the score of a category string as the average of the scores of the category candidates in its candidate string, as described above, has a drawback. Suppose the candidate for a category near the beginning of the correct word's category string scores much worse than the other candidates. That word is then slow to be detected.

FIG. 3 is an explanatory diagram showing an example of syllable candidates extracted from input speech. Suppose, as shown in FIG. 3, that syllables are used as categories. When the utterance "zaisei" is input, syllable candidates are obtained for each of its syllables. The number attached to each syllable candidate is its score. The smaller the value, the better, that is, the more reliable the candidate.

Syllable candidate strings can be formed from the candidates of FIG. 3 for four words: "seigen" (制限, restriction), "zaigen" (財源, revenue source), "zeisei" (税制, tax system), and "zaisei" (財政, finance). Their scores are:

Seigen: (3 + 7 + 8 + 4) / 4 = 5.5
Zaigen: (8 + 7 + 8 + 4) / 4 = 6.75
Zeisei: (12 + 7 + 1 + 1) / 4 = 5.25
Zaisei: (8 + 7 + 1 + 1) / 4 = 4.25

The correct word, "zaisei", has the smallest value and therefore the best score. Now consider the partial syllable strings of length n (n = 1, 2, 3, 4) running from the beginning of each word's syllable string. If each partial string's score is computed as the average of the scores of the syllable candidates in its candidate string, the results are as follows.

Se: 3 / 1 = 3
Sei: (3 + 7) / 2 = 5
Seige: (3 + 7 + 8) / 3 = 6
Seigen: (3 + 7 + 8 + 4) / 4 = 5.5
Za: 8 / 1 = 8
Zai: (8 + 7) / 2 = 7.5
Zaige: (8 + 7 + 8) / 3 = 7.67
Zaigen: (8 + 7 + 8 + 4) / 4 = 6.75
Ze: 12 / 1 = 12
Zei: (12 + 7) / 2 = 9.5
Zeise: (12 + 7 + 1) / 3 = 6.67
Zeisei: (12 + 7 + 1 + 1) / 4 = 5.25
Za: 8 / 1 = 8
Zai: (8 + 7) / 2 = 7.5
Zaise: (8 + 7 + 1) / 3 = 5.33
Zaisei: (8 + 7 + 1 + 1) / 4 = 4.25

Consider the string consisting only of "za", the first syllable of the correct word "zaisei". Its score is no better than the score of any partial syllable string of "seigen" or "zaigen". The category string of "zaisei" is therefore traced only after those of "seigen" and "zaigen" have been traced, and detection of the correct word is delayed accordingly.
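The partial-string averages above can be reproduced with a short Python sketch. `WORD_SCORES` is a hypothetical encoding of the candidate scores each word picks up from FIG. 3. Exact fractions avoid the truncation seen in values such as 7.66 and 6.66.

```python
from fractions import Fraction

# Hypothetical encoding of the FIG. 3 example: for each word, the scores of the
# syllable candidates matched to its syllables, in order (smaller = better).
WORD_SCORES = {
    "seigen": [3, 7, 8, 4],
    "zaigen": [8, 7, 8, 4],
    "zeisei": [12, 7, 1, 1],
    "zaisei": [8, 7, 1, 1],
}


def prefix_means(scores):
    """Plain average over each prefix of length n = 1 .. len(scores)."""
    return [Fraction(sum(scores[:n]), n) for n in range(1, len(scores) + 1)]


table = {w: prefix_means(s) for w, s in WORD_SCORES.items()}
for word, means in table.items():
    print(word, [round(float(m), 2) for m in means])

# The one-syllable prefix "za" of the correct word scores 8, which is no better
# than any prefix score of "seigen" or "zaigen".
za = table["zaisei"][0]
print(all(za >= m for w in ("seigen", "zaigen") for m in table[w]))  # True
```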

An object of the present invention is to eliminate the above drawback. It provides a word detection device that detects the correct word sooner and with less processing, even when the candidate for a category near the beginning of the correct word's category string scores much worse than the other candidates.

(Means for Solving the Problems) The word detection device of the present invention comprises the following:
category candidate extraction means, which extracts a plurality of category candidates from input speech that is a sequence of categories such as syllables, phonemes, or phoneme classes, and stores each candidate together with its score, as a measure of reliability in detection evaluation, and its position information;
a tree-structured word dictionary in which each branch between nodes corresponds to a category and each category string, as a branch sequence from the root node to a leaf node, is the category string of a word to be detected;
category string storage means, which stores at least one pair of a category string, consisting of at least one category contained in the word dictionary, and its corresponding category candidate string;
score calculation means, which calculates the score of each category string in the category string storage means as follows: when the number n of category candidates in the corresponding category candidate string is at least a predetermined number N, the score is the average of the scores of the n category candidates; when n is less than N, m virtual category candidates are added to the n category candidates, where m is a preset function of n and N, and the score is the average of the scores of the resulting (n + m) category candidates;
category selection means, which takes out the category string with the best score among those stored in the category string storage means, together with its corresponding category candidate string, and then either outputs the corresponding word as the detection result, if that category string has reached a leaf node of the word dictionary, or outputs the category string and category candidate string as an undetected result, if it has not; and
category string generation means, which receives the undetected result from the category selection means, traces the word dictionary further from the node at the end of the category string to generate one or more pairs of a longer category string and its corresponding category candidate string, and adds them to the category string storage means.

(Operation) In the above example, the syllable candidate string for the syllable string "zaisei" scores 4.25, better than the other words. Yet the candidate for its first syllable "za" has a poor score of 8. Conversely, the candidate string for "seigen" scores 5.5, which is worse than "zaisei". Yet the candidate for its first syllable "se" scores 3, which is better than the first-syllable candidate of "zaisei".

A score calculated from the whole category candidate string, corresponding to the category string of the entire word, evaluates the word's reliability correctly. The prior art, however, treats a score calculated only from the candidate string for part of the word's category string as the word's score. If that part happens to contain a category candidate with a bad score, the word's score therefore becomes bad.

On the other hand, a corresponding category candidate string may so far have been determined for only part of the word's category string. At that stage, the score of the candidate string for the whole word cannot be used.

The method of the present invention therefore looks at each category in the word's category string for which no category candidate has yet been determined. For each such category, it assumes a virtual category candidate with a certain average score. Concretely, when the score of a category candidate string is computed and its length n is shorter than a predetermined length N, the candidate string is judged to correspond to only part of a word's category string. In that case, m virtual category candidates are assumed, where m depends on n and N. Suppose part of the candidate string contains a candidate with a bad score. That score is averaged together with the scores of the virtual candidates, so the word's score does not become very bad. The category candidate string corresponding to the word can therefore be found quickly.
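The padded score can be written as a small function. In this sketch, the names are hypothetical. `m_of` stands for the preset function m(n, N), and `virtual_score` stands for the "certain average score" of a virtual candidate. The text leaves both open. The default m = N - n is only the choice used in the example that follows in the text.

```python
from typing import Callable, Sequence


def padded_score(
    scores: Sequence[float],
    N: int,
    virtual_score: float,
    m_of: Callable[[int, int], int] = lambda n, N: N - n,
) -> float:
    """Score of a category candidate string, padding short strings with virtual candidates.

    If n = len(scores) >= N, return the plain average of the n scores.
    Otherwise add m = m_of(n, N) virtual candidates, each scoring
    virtual_score, and average over the n + m candidates.
    """
    n = len(scores)
    if n >= N:
        return sum(scores) / n
    m = m_of(n, N)
    return (sum(scores) + m * virtual_score) / (n + m)


# A bad first candidate (8) no longer dominates the score of a one-syllable string.
print(padded_score([8], N=4, virtual_score=1))  # (8 + 1 + 1 + 1) / 4
```

Passing a different `m_of`, for example a constant one, changes how strongly a short string is pulled toward the virtual score.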

For example, let N = 4, let the preset function of n and N be m = N - n, and let the score of a virtual syllable candidate be 1. The scores of the partial category strings of each word in the above example are then as follows.

Se: (3 + 1 + 1 + 1) / 4 = 1.5
Sei: (3 + 7 + 1 + 1) / 4 = 3
Seige: (3 + 7 + 8 + 1) / 4 = 4.75
Seigen: (3 + 7 + 8 + 4) / 4 = 5.5
Za: (8 + 1 + 1 + 1) / 4 = 2.75
Zai: (8 + 7 + 1 + 1) / 4 = 4.25
Zaige: (8 + 7 + 8 + 1) / 4 = 6
Zaigen: (8 + 7 + 8 + 4) / 4 = 6.75
Ze: (12 + 1 + 1 + 1) / 4 = 3.75
Zei: (12 + 7 + 1 + 1) / 4 = 5.25
Zeise: (12 + 7 + 1 + 1) / 4 = 5.25
Zeisei: (12 + 7 + 1 + 1) / 4 = 5.25
Za: (8 + 1 + 1 + 1) / 4 = 2.75
Zai: (8 + 7 + 1 + 1) / 4 = 4.25
Zaise: (8 + 7 + 1 + 1) / 4 = 4.25
Zaisei: (8 + 7 + 1 + 1) / 4 = 4.25

If the syllable strings are followed in order of these scores, they are traced in the order below. The word in quotation marks is the corresponding word.

Se: 1.5 "seigen"
Za: 2.75 "zaigen"
Za: 2.75 "zaisei"
Sei: 3 "seigen"
Zai: 4.25 "zaigen"
Zai: 4.25 "zaisei"
Seige: 4.75 "seigen"
Zaige: 6 "zaigen"
Zaise: 4.25 "zaisei"
Zaisei: 4.25 "zaisei"

In this way, the category string of the correct word "zaisei" is the first to be traced to completion. Fewer categories also need to be traced.
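The following sketch recomputes the padded partial scores above with N = 4, m = N - n, and a virtual score of 1. It also checks a simple sufficient condition, stated for these four words only, for best-first search to complete "zaisei" first: every prefix of "zaisei" must score strictly better than every other complete word. The candidate scores are those quoted in the FIG. 3 example, and `WORD_SCORES` and `padded_prefix_scores` are hypothetical names.

```python
from typing import Dict, List

# Candidate scores for each example word, as quoted in the FIG. 3 example.
WORD_SCORES: Dict[str, List[float]] = {
    "seigen": [3, 7, 8, 4],
    "zaigen": [8, 7, 8, 4],
    "zeisei": [12, 7, 1, 1],
    "zaisei": [8, 7, 1, 1],
}


def padded_prefix_scores(scores, N=4, virtual_score=1.0):
    """Score each prefix of length n, adding m = N - n virtual candidates when n < N."""
    out = []
    for n in range(1, len(scores) + 1):
        m = max(N - n, 0)
        out.append((sum(scores[:n]) + m * virtual_score) / (n + m))
    return out


padded = {w: padded_prefix_scores(s) for w, s in WORD_SCORES.items()}
for w, p in padded.items():
    print(w, p)

# Every prefix of "zaisei" (at most 4.25) beats every other complete word
# (at least 5.25), so best-first search pops "zaisei" as a complete word first.
others = [padded[w][-1] for w in WORD_SCORES if w != "zaisei"]
print(max(padded["zaisei"]) < min(others))  # True
```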

(Embodiment) Next, the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention. In the embodiment shown in FIG. 1, the input is assumed to be Japanese speech, and syllables are used as categories.

The syllable candidate extraction unit 101 detects syllable candidates in the input speech. It stores each candidate, together with its score and its position in the input speech, in the syllable candidate storage unit 102.

FIG. 2 is a block diagram showing an example of the syllable candidate extraction unit 101. In FIG. 2, the input speech is first stored temporarily in the speech buffer 201. The vowel candidate detection unit 202 detects vowel candidates in the speech stored in the speech buffer 201 and stores them in the vowel candidate storage unit 203. Vowel candidates are detected by matching each section of the input speech against a standard speech pattern for each vowel, stored in advance in the vowel pattern storage unit 204. Vowel speech signals are relatively stationary, so they are easy to detect. Each vowel candidate holds at least its vowel name and its position in the input speech.

After vowel candidate detection is complete, the consonant candidate detection unit 205 detects consonant candidates as follows. In Japanese, a syllable is a consonant (C) followed by a vowel (V). Hence one consonant can be said to exist in each of two kinds of section of the input speech. The first kind lies between two vowels and is no longer than a certain duration; it is hereinafter called a VCV section. The second kind lies within a certain duration of the start of the input speech; it is hereinafter called a CV section. The consonant candidate detection unit 205 takes every VCV section and CV section formed from the vowel candidates stored in the vowel candidate storage unit 203. It matches each section against the VCV and CV standard speech patterns stored in advance in the consonant pattern storage unit 206. The names of the several patterns with the highest similarity become the consonant candidates. The vowel candidates and consonant candidates determined in this way are combined into syllable candidates. These are stored in the syllable candidate storage unit 102 together with their positions in the input speech.
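The VCV and CV sections can be enumerated from the vowel candidates alone. The Python sketch below is an illustration under stated assumptions, not the device's implementation:

- Positions are hypothetical frame indices.
- `max_gap` and `max_onset` stand in for the unspecified "certain duration".
- A VCV section is taken to run from the start of one vowel to the end of a later one.

The text does not say how two adjacent vowels with no consonant between them are handled. This sketch simply skips zero-length gaps. Matching each section against the standard patterns is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Vowel:
    name: str   # vowel label, e.g. "a"
    start: int  # first frame of the vowel (hypothetical frame units)
    end: int    # frame just past the vowel


def consonant_sections(
    vowels: List[Vowel], max_gap: int, max_onset: int
) -> Tuple[list, list]:
    """Enumerate the CV and VCV sections in which one consonant is assumed.

    CV section: from frame 0 through a vowel that starts within max_onset frames.
    VCV section: from one vowel through a later vowel whose gap to it is
    positive and at most max_gap frames.
    """
    cv = [(0, v.end, v) for v in vowels if v.start <= max_onset]
    vcv = [
        (a.start, b.end, a, b)
        for a in vowels
        for b in vowels
        if 0 < b.start - a.end <= max_gap
    ]
    return cv, vcv


# Toy vowel candidates loosely shaped like "za-i-se-i" (made-up frame positions).
vowels = [Vowel("a", 5, 15), Vowel("i", 15, 25), Vowel("e", 32, 42), Vowel("i", 42, 52)]
cv, vcv = consonant_sections(vowels, max_gap=10, max_onset=8)
print([(s, e, v.name) for s, e, v in cv])
print([(s, e, a.name, b.name) for s, e, a, b in vcv])
```

With these toy positions, one CV section is found (the "z" before "a") and one VCV section (the "s" between "i" and "e").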

As an example, suppose the utterance "zaisei" (財政, finance) is input. Syllable candidates such as those shown in FIG. 3 are then extracted as the syllable recognition result. In FIG. 3, several syllable candidates are extracted for each syllable section. The number attached to each candidate is its score.

The word dictionary 103 stores the syllable strings of the words to be detected in tree-structured form. Suppose it stores seven words: "keisan" (計算, calculation), "zaigen" (財源, revenue source), "zaisan" (財産, property), "zaisei" (財政, finance), "seigen" (制限, restriction), "seiji" (政治, politics), and "zeisei" (税制, tax system). This is shown in FIG. 4, an explanatory diagram showing an example of the contents of the word dictionary in the embodiment of FIG. 1. The numbers attached to the branches are branch numbers used in the description below.

First, the syllable string generation unit 104 examines the syllable on each branch leaving the root node of the word dictionary 103. If a syllable candidate corresponding to that syllable is stored in the syllable candidate storage unit 102, the unit stores the syllable, as a syllable string of length 1, in the syllable string storage unit 105 together with the corresponding syllable candidate. The score calculation unit 106 then calculates a score for each of these syllable strings and attaches it. In this embodiment, N = 4, m = N - n, and the score of a virtual syllable candidate is 1.

For example, the syllable candidate corresponding to branch se (13) is se [3]. The score of this syllable string averages that candidate's score with the scores of three virtual candidates: (3 + 1 + 1 + 1) / 4 = 1.5.

As a result, the syllable string storage unit 105 now holds the three syllable strings listed below. Each entry gives the score, then the syllable string, then the corresponding syllable candidate string. Numbers in parentheses ( ) are branch numbers, and numbers in brackets [ ] are syllable candidate scores.

2.75: za(5); za[8]
1.5: se(13); se[3]
3.75: ze(18); ze[12]

Next, the syllable string selection unit 107 takes out of the syllable string storage unit 105 the syllable string with the best score, that is, the smallest value. It sends that string and its syllable candidate string to the syllable string generation unit 104. The generation unit extends the received string from its end node to generate longer syllable strings. Specifically, two conditions are checked for each branch that follows the end node in the word dictionary 103: a syllable candidate for that branch must be contained in the syllable candidate storage unit 102, and that candidate must be able to connect to the current syllable candidate string. If both hold, the candidate is appended to the current syllable candidate string. The generated syllable strings and syllable candidate strings are stored in the syllable string storage unit 105, and their scores are calculated by the score calculation unit 106.
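The embodiment's loop can be sketched in Python as below. The syllable string storage unit 105 is kept as a priority queue, and the code mirrors the roles of the generation unit 104, score calculation unit 106, and selection unit 107. Assumptions:

- Only the FIG. 3 candidates quoted in the text are present.
- A candidate can be connected only if it belongs to the next syllable section, with sections indexed 0 to 3.
- The seven FIG. 4 words are written as romanized syllables.

`detect`, `score`, and the data names are hypothetical. On this data, the sequence of strings taken out of storage follows the walkthrough below, and "zaisei" is output with score 4.25.

```python
import heapq
import itertools

# Hypothetical data: only the FIG. 3 candidates quoted in the text, one dict per
# syllable section (smaller score = more reliable).
CANDIDATES = [
    {"se": 3, "za": 8, "ze": 12},
    {"i": 7},
    {"ge": 8, "se": 1},
    {"n": 4, "i": 1},
]

# The seven FIG. 4 dictionary words, split into romanized syllables.
WORDS = {
    "keisan": ["ke", "i", "sa", "n"],
    "zaigen": ["za", "i", "ge", "n"],
    "zaisan": ["za", "i", "sa", "n"],
    "zaisei": ["za", "i", "se", "i"],
    "seigen": ["se", "i", "ge", "n"],
    "seiji": ["se", "i", "ji"],
    "zeisei": ["ze", "i", "se", "i"],
}

N, VIRTUAL = 4, 1.0  # embodiment parameters; m = N - n


def score(cand_scores):
    """Score calculation: pad with m = N - n virtual candidates when n < N."""
    n = len(cand_scores)
    m = max(N - n, 0)
    return (sum(cand_scores) + m * VIRTUAL) / (n + m)


def build_trie(words):
    root = {}
    for word, syls in words.items():
        node = root
        for syl in syls:
            node = node.setdefault(syl, {})
        node["$"] = word  # leaf marker
    return root


def detect(trie, candidates):
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    store = []   # syllable string storage, kept as a min-heap on score
    popped = []  # order in which strings are taken out, for inspection

    def extend(node, syls, cands):
        # Generation step: follow each branch whose syllable has a candidate in
        # the next section (assumed to be the only connectable one).
        pos = len(syls)
        if pos >= len(candidates):
            return
        for syl, child in node.items():
            if syl != "$" and syl in candidates[pos]:
                c = cands + [candidates[pos][syl]]
                heapq.heappush(store, (score(c), next(tie), child, syls + [syl], c))

    extend(trie, [], [])  # length-1 strings from the root's branches
    while store:
        s, _, node, syls, cands = heapq.heappop(store)  # selection step
        popped.append(("-".join(syls), s))
        if "$" in node:  # reached a leaf: output the word
            return node["$"], s, popped
        extend(node, syls, cands)
    return None, None, popped


word, best, order = detect(build_trie(WORDS), CANDIDATES)
print(word, best)
for string, s in order:
    print(f"{s:5.2f}  {string}")
```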

Here the syllable string se(13), which has the best score (1.5), is extended. The syllable string se(13)-i(14), with syllable candidate string se[3]-i[7] and score 3, is stored in the syllable string storage unit 105. The storage unit now holds the following syllable strings:

2.75: za(5); za[8]
3.75: ze(18); ze[12]
3: se(13)-i(14); se[3]-i[7]

Similarly, the contents of the syllable string storage unit 105 change as follows.

The best-scoring syllable string is extended, generating a new syllable string:

3.75: ze(18); ze[12]
3: se(13)-i(14); se[3]-i[7]
4.25: za(5)-i(6); za[8]-i[7]

The best-scoring syllable string is extended, generating a new syllable string:

3.75: ze(18); ze[12]
4.25: za(5)-i(6); za[8]-i[7]
4.75: se(13)-i(14)-ge(15); se[3]-i[7]-ge[8]

The best-scoring syllable string is extended, generating a new syllable string:

4.25: za(5)-i(6); za[8]-i[7]
4.75: se(13)-i(14)-ge(15); se[3]-i[7]-ge[8]
5.25: ze(18)-i(19); ze[12]-i[7]

The best-scoring syllable string is extended, generating two new syllable strings:

4.75: se(13)-i(14)-ge(15); se[3]-i[7]-ge[8]
5.25: ze(18)-i(19); ze[12]-i[7]
6: za(5)-i(6)-ge(7); za[8]-i[7]-ge[8]
4.25: za(5)-i(6)-se(11); za[8]-i[7]-se[1]

The best-scoring syllable string is extended, generating a new syllable string:

4.75: se(13)-i(14)-ge(15); se[3]-i[7]-ge[8]
5.25: ze(18)-i(19); ze[12]-i[7]
6: za(5)-i(6)-ge(7); za[8]-i[7]-ge[8]
4.25: za(5)-i(6)-se(11)-i(12); za[8]-i[7]-se[1]-i[1]

The end of the best-scoring syllable string, za-i-se-i, has now reached a leaf node of the word dictionary 103. The syllable string selection unit 107 therefore outputs the word "zaisei" as the detection result.

Thus, the correct word "zaisei" is detected first. For simplicity, this embodiment assumes that no syllable recognition errors occur, that is, at least the correct syllable candidate is extracted for every input syllable. Syllable recognition errors can nevertheless be handled. The method described in the aforementioned Japanese Patent Application Nos. 61-190258, 61-190259, 61-190260, and 61-190261, "Word Detection Method", then detects the correct word as efficiently as in the above embodiment.

(Effects of the Invention) As described above, the present invention makes it possible to realize a word detection device that detects the correct word before other words. This holds even when the candidate for a category near the beginning of the correct word's category string scores much worse than the other candidates. The device also detects words efficiently because fewer category strings are generated.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 2 is a block diagram showing an example of the syllable candidate extraction unit in the embodiment of FIG. 1. FIG. 3 is an explanatory diagram showing an example of syllable candidates extracted from input speech. FIG. 4 is an explanatory diagram showing an example of the contents of the word dictionary in the embodiment of FIG. 1.
101: syllable candidate extraction unit; 102: syllable candidate storage unit; 103: word dictionary; 104: syllable string generation unit; 105: syllable string storage unit; 106: score calculation unit; 107: syllable string selection unit; 201: speech buffer; 202: vowel candidate detection unit; 203: vowel candidate storage unit; 204: vowel pattern storage unit; 205: consonant candidate detection unit; 206: consonant pattern storage unit.

Claims

[Claims]

1. A word detection device comprising: category candidate extraction means for extracting a plurality of category candidates from input speech that is a sequence of categories such as syllables, phonemes, or phoneme classes, and storing each candidate together with its score, as a measure of reliability in the detection evaluation of the candidate, and its position information; a tree-structured word dictionary in which branches between nodes correspond to categories and each category string, as a branch sequence from the root node to a leaf node, is the category string of a word to be detected; category string storage means for storing at least one pair of a category string, consisting of at least one category contained in the word dictionary, and its corresponding category candidate string; score calculation means for calculating the score of each category string in the category string storage means as the average of the scores of the n category candidates in the corresponding category candidate string when n is at least a predetermined number N, and, when n is less than N, as the average of the scores of the (n + m) category candidates obtained by adding m virtual category candidates, m being a preset function of n and N, to the n category candidates; category selection means for taking out the category string with the best score among those stored in the category string storage means, together with its corresponding category candidate string, and for outputting the word corresponding to that category string as a detection result if the category string has reached a leaf node of the word dictionary, or outputting the category string and the category candidate string as an undetected result if it has not; and category string generation means for receiving the undetected result from the category selection means, tracing the word dictionary further from the node at the end of the category string to generate one or more pairs of a longer category string and its corresponding category candidate string, and adding them to the category string storage means, whereby words are detected.