JPH0475098A

JPH0475098A - Voice recognition device

Info

Publication number: JPH0475098A
Application number: JP2191315A
Authority: JP
Inventors: Akira Tsuruta; 彰鶴田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1990-07-17
Filing date: 1990-07-17
Publication date: 1992-03-10

Abstract

PURPOSE: To automatically remove a meaningless recognition result added to the head of a word. This is done by erasing all recognition results corresponding to the word-initial syllable, re-editing the recognition results, and then instructing a language processing part and a display part to perform processing based on the re-edited recognition results. CONSTITUTION: A meaningless recognition result caused by noise may appear at the position corresponding to the word head of the phrase candidates displayed in a menu display area 11. When the operator judges that it should be removed, the operator instructs removal of the word-head recognition result by pressing a key on a keyboard 8. Under the control of a control part 6, the syllable candidates corresponding to the word-initial syllable are erased from the generated syllable lattice. A phrase candidate string is then generated again from the re-edited syllable lattice and displayed in the menu display area 11 by a display part 7. Consequently, the meaningless recognition result added to the word head of each candidate string is removed automatically.

Description

[Detailed description of the invention] [Industrial application field]

The present invention relates to a speech recognition device that recognizes input speech and creates candidate strings that are considered to be grammatically correct.

[Conventional technology]

When input speech is recognized by a speech recognition device, noise may be picked up before or after the speech. Typical sources are the operator's breath and the operating sound of the device. Meaningless recognition results caused by this noise are then added before or after the input speech, and a correct recognition result sometimes cannot be obtained. Conventionally, when noise has entered before or after the input speech in this way, the resulting meaningless recognition result is removed by operating the "backspace" key.

[Problem to be solved by the invention]

Consider removing meaningless recognition results with the "backspace" key, as described above. A meaningless recognition result added after the input speech lies after the speech section, so it can easily be erased with that key. A meaningless recognition result added before the input speech, however, lies at the beginning of the word, and the key operations needed to move the cursor to it are cumbersome. To avoid these key operations, the recognition result can instead be canceled and the same speech uttered again. Re-uttering does not always yield a correct recognition result, however, so the user may still have to fall back on cumbersome key operations.

Therefore, the object of this invention is to provide a speech recognition device that automatically removes meaningless recognition results added to the beginning of a word. It should do so without cumbersome key operations and without canceling the recognition result and re-uttering the speech.

[Means to solve the problem]

To achieve the above object, the present invention provides a speech recognition device with the following structure. A speech analysis unit extracts feature parameters from the input speech signal. A recognition unit recognizes the input speech in units of phonemes or syllables based on those parameters. A correct candidate is then selected and output from candidate strings generated on the basis of the recognition results. The device comprises:

a recognition result storage unit that stores the recognition results obtained by the phoneme-unit or syllable-unit recognition in the recognition unit;

a language processing unit that performs language processing using the recognition results stored in the recognition result storage unit to generate the candidate strings in word units or phrase units;

a display unit that displays the candidate strings generated by the language processing unit as a menu;

an instruction means that is operated by the operator to instruct removal of a meaningless recognition result when one has been added to the beginning of a word in a candidate string displayed as a menu; and

a recognition result reorganization means. When the instruction means instructs removal of the meaningless recognition result added to the word head, this means erases all recognition results corresponding to the word-initial syllable from the recognition result storage unit, thereby reorganizing the recognition results. It then instructs the language processing unit and the display unit to perform processing based on the reorganized recognition results.
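The claimed division between the storage unit and the reorganization means can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation. The class names, the list-of-segments lattice layout and the callback mechanism are all hypothetical. The callbacks stand in for "instructing the language processing unit and display unit".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical lattice layout: one segment per detected syllable interval,
# each segment a list of (syllable, similarity) candidates.
Segment = List[Tuple[str, int]]
Listener = Callable[[List[Segment]], None]


@dataclass
class RecognitionResultStore:
    """Hypothetical stand-in for the recognition result storage unit."""
    lattice: List[Segment] = field(default_factory=list)


class ReorganizationMeans:
    """Sketch of the recognition result reorganization means.

    Listeners stand in for the language processing unit and the display
    unit, which are told to reprocess the reorganized results.
    """

    def __init__(self, store: RecognitionResultStore) -> None:
        self.store = store
        self.listeners: List[Listener] = []

    def register(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def on_remove_word_head(self) -> None:
        """Handle the operator's single 'remove word head' instruction."""
        if not self.store.lattice:
            return  # nothing stored, nothing to reorganize
        # Erase every candidate for the word-initial syllable interval at once.
        del self.store.lattice[0]
        # Ask each downstream unit to reprocess the reorganized lattice.
        for listener in self.listeners:
            listener(self.store.lattice)
```

A display stub could be registered with `means.register(lambda lat: print(len(lat)))`. Calling `means.on_remove_word_head()` then erases the head segment and prints the number of segments that remain. The callback design keeps the reorganization means independent of how the downstream units are implemented.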

[Operation]

The speech analysis unit extracts feature parameters from the input speech signal. Based on these parameters, the recognition unit recognizes the input speech in units of phonemes or syllables. The resulting recognition results are stored in the recognition result storage unit. Using the stored recognition results, the language processing unit performs language processing to generate candidate strings in word units or phrase units. The display unit displays the generated candidate strings as a menu.

The operator refers to the candidate strings displayed as a menu on the display unit. If the operator judges that a meaningless recognition result has been added to the beginning of the word in each candidate string, the operator operates the instruction means to instruct its removal. The recognition result reorganization means then erases all recognition results corresponding to the word-initial syllable from the recognition result storage unit. It also instructs the language processing unit and the display unit to perform processing based on the reorganized recognition results. The language processing unit therefore generates the candidate strings again from the reorganized recognition results, from which the meaningless word-head results have been removed. The display unit then displays recognition results with no meaningless result attached to the word head.

Therefore, simply by operating the instruction means, the operator causes the meaningless recognition results added to the head of each candidate string to be removed automatically. The display unit then shows a menu of recognition results without them.

[Example]

Hereinafter, the present invention will be explained in detail with reference to the illustrated embodiment. FIG. 1 is a block diagram showing one embodiment of the present invention.

The speech signal input from a microphone 1 is A/D-converted by a speech analysis unit 2, and feature parameters are extracted for each frame. The time series of these feature parameters is segmented into syllables to obtain feature patterns, which are sent to a syllable recognition unit 3. The syllable recognition unit 3 recognizes the input speech in syllable units. It does so based on the similarity between each input feature pattern and syllable standard patterns prepared in advance. The recognition result (that is, a syllable lattice) is stored in a recognition result storage unit 4.

A language processing unit 5 creates phrase candidates from combinations of the syllable candidates in the syllable lattice obtained by the syllable recognition unit 3. It then creates a grammatically correct phrase candidate string using a word dictionary or the like. The phrase candidate string thus obtained is displayed on a display unit 7. A keyboard 8 is used to key in various instructions, such as moving the cursor when selecting the correct phrase candidate from the string displayed on the display unit 7. A control unit 6 controls the speech analysis unit 2, syllable recognition unit 3, language processing unit 5, display unit 7 and keyboard 8 to execute the speech recognition processing described in detail below.

While speech recognition processing is being executed under the control of the control unit 6, noise may be input to the microphone 1. Examples are the operator's breath or the operating sound of the keyboard 8. If this noise causes a meaningless recognition result to be added to the beginning of a word, this invention deletes that result automatically.

The operation of the speech recognition device configured as above is described below for a concrete case: the phrase /shakaiwa/ (社会は, "society" followed by the topic particle wa) is input by voice.
When the operator begins to utter /shakaiwa/ into the microphone 1, the speech analysis unit 2 performs acoustic analysis of the input speech /shakaiwa/ and sequentially outputs feature parameters. The output feature parameters are segmented into syllables to form feature patterns. The syllable recognition unit 3 recognizes syllables using the similarity between these feature patterns and the syllable standard patterns, and obtains recognition results such as those shown in FIG. 2. Speech recognition involves ambiguity because syllable intervals can be detected erroneously and syllables can be misrecognized. The recognition results are therefore output as a syllable lattice and stored in the recognition result storage unit 4.

Suppose now that noise occurred just before the operator uttered /shakaiwa/ into the microphone 1, so that a meaningless recognition result was added to the beginning of the word. In FIG. 2, therefore, recognition results (syllable candidates) for the noise precede the syllable lattice for the syllable string /sha/, /ka/, /i/, /wa/.

Assume that, once input of the speech /shakaiwa/ has finished, execution of language processing by the language processing unit 5 is instructed, for example from the keyboard 8. The language processing unit 5 first refers to the syllable lattice shown in FIG. 2. From it, the unit generates and outputs the first-ranked phrase candidate /kyusakaiwa/. This candidate consists only of the syllable candidates with the highest similarity: /kyu/, /sa/, /ka/, /i/ and /wa/. The unit then successively replaces the syllable candidates making up /kyusakaiwa/ with syllable candidates read from the syllable lattice according to a predetermined rule. This creates one phrase candidate after another, which are output in descending order of cumulative similarity. As a result, a phrase candidate string such as that shown in FIG. 3 is generated.
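The patent does not specify the "predetermined rule" for deriving further candidates. One rule consistent with the description is sketched below in Python as a best-first search over the lattice. It starts from the first-ranked syllables, swaps one syllable at a time for its next-ranked rival, and emits candidates in descending cumulative similarity. The function name, the lattice layout and the integer scores in the usage note are illustrative assumptions, not values taken from FIG. 2.

```python
import heapq
from typing import List, Tuple

# One segment of a syllable lattice: (syllable, similarity) candidates for one
# syllable interval. Scores are illustrative integers, not patent data.
Segment = List[Tuple[str, int]]


def phrase_candidates(lattice: List[Segment], top_k: int) -> List[Tuple[str, int]]:
    """Return up to top_k phrase candidates in descending cumulative similarity.

    Starts from the path of first-ranked syllables and derives new candidates
    by swapping one syllable for the next-ranked alternative in its segment
    (best-first search; one possible "predetermined rule").
    """
    if not lattice or top_k <= 0 or any(not seg for seg in lattice):
        return []
    # Sort each segment best-first so index 0 is the first-ranked syllable.
    ranked = [sorted(seg, key=lambda c: c[1], reverse=True) for seg in lattice]

    def score(idx: Tuple[int, ...]) -> int:
        return sum(ranked[i][j][1] for i, j in enumerate(idx))

    start = (0,) * len(ranked)
    heap = [(-score(start), start)]  # max-heap via negated scores
    seen = {start}
    results: List[Tuple[str, int]] = []
    while heap and len(results) < top_k:
        neg, idx = heapq.heappop(heap)
        text = "".join(ranked[i][j][0] for i, j in enumerate(idx))
        results.append((text, -neg))
        # Neighbours: replace exactly one syllable with its next-ranked rival.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return results
```

As an example, take an invented lattice shaped like FIG. 2: `[[("kyu", 80), ("shu", 75), ("sho", 60)], [("sa", 85), ("sha", 84)], [("ka", 90), ("ga", 70)], [("i", 88), ("ri", 60)], [("wa", 86), ("ha", 85)]]`. The first candidate returned is `("kyusakaiwa", 429)`. The heap and the seen-set avoid enumerating every path, whose number is the product of the segment sizes.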
Next, the language processing unit 5 checks the generated phrase candidate string against the contents of a word dictionary (not shown). It rejects grammatically incorrect phrase candidates that are not in the word dictionary. Then, under the control of the control unit 6, the display unit 7 shows the phrase candidate string generated by the language processing unit 5 as a menu, as shown in FIG. 4. It uses a known technique such as window display and places the menu in the menu display area 11 of the display screen, in descending order of candidate rank. At the same time, the display area 12 above the menu display area 11 shows the sentence "Nihon no shusakaiha". This sentence combines the already recognized phrase "Nihon no" ("Japan's") with the phrase candidate "shusakaiha". That candidate is the one designated by the cursor 13 in the phrase candidate string for the input speech /shakaiwa/ currently being recognized.

In this case, the correct phrase candidate "shakaiha" is not obtained. This is because, as described above, the meaningless recognition result caused by the noise (/shu/ or /kyu/) has been added to the beginning of the word. FIG. 4 shows, however, that removing the noise-induced result "shu" from the phrase candidate "shushakaiha" would give the correct candidate "shakaiha". "shushakaiha" is displayed in the second row from the top of the menu display area 11. Therefore, in this embodiment, removal of the noise-induced meaningless recognition result at the beginning of the word is instructed by a single action, for example pressing one key on the keyboard 8.
Under the control of the control unit 6, the syllable candidates corresponding to the noise (the word-initial syllable) are then erased from the syllable lattice for the input speech /shakaiwa/ stored in the recognition result storage unit 4. This reorganizes the syllable lattice. The reorganized lattice is expanded again to obtain a phrase candidate string such as that shown in FIG. 5. The language processing unit 5 then performs language processing again, and the result is again displayed as a menu in the menu display area 11 of the display screen, as shown in FIG. 6. In other words, the keyboard 8 constitutes the instruction means, and the control unit 6 constitutes the recognition result reorganization means.

Because the noise-induced meaningless recognition result has now been removed, the correct phrase candidate "shakaiha" is displayed in the second row from the top of the menu display area 11. The operator therefore moves the cursor 13 to "shakaiha" by key input from the keyboard 8 and selects it.
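The erase-and-regenerate step can be illustrated with a short Python sketch. The function names and the invented integer scores in `EXAMPLE_LATTICE` are hypothetical, and the dictionary check performed by the language processing unit 5 is omitted. Candidates are regenerated by exhaustive enumeration purely to keep the example short.

```python
from itertools import product
from typing import List, Tuple

# One segment per syllable interval: (syllable, similarity) candidates.
Segment = List[Tuple[str, int]]

# Invented scores shaped like FIG. 2: a noise interval, then /sha/ /ka/ /i/ /wa/.
EXAMPLE_LATTICE: List[Segment] = [
    [("kyu", 80), ("shu", 75), ("sho", 60)],  # noise before the utterance
    [("sa", 85), ("sha", 84)],
    [("ka", 90), ("ga", 70)],
    [("i", 88), ("ri", 60)],
    [("wa", 86), ("ha", 85)],
]


def remove_word_head(lattice: List[Segment]) -> List[Segment]:
    """Erase all candidates for the word-initial syllable interval.

    Returns a new lattice and leaves the input unchanged.
    """
    return [list(seg) for seg in lattice[1:]]


def regenerate(lattice: List[Segment], top_k: int) -> List[str]:
    """Re-expand a lattice into its top_k phrases by cumulative similarity.

    Exhaustive enumeration: fine for a toy lattice, but the number of paths
    is the product of the segment sizes.
    """
    if not lattice:
        return []
    paths = sorted(
        product(*lattice),
        key=lambda path: sum(score for _, score in path),
        reverse=True,  # stable: equal scores keep enumeration order
    )
    return ["".join(syl for syl, _ in path) for path in paths[:top_k]]
```

Before removal, no candidate can equal "shakaiha", because every path starts with a noise syllable. After removal, `regenerate(remove_word_head(EXAMPLE_LATTICE), 4)` returns `["sakaiwa", "sakaiha", "shakaiwa", "shakaiha"]`.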
For example, the kana-kanji conversion key may then be operated in this state. This selects the phrase candidate "shakaiha" from the phrase candidate string displayed in the menu display area 11. The recognized phrase "shakaiha" for the input speech /shakaiwa/ undergoes kana-kanji conversion, and 「日本の社会は」 ("Japan's society", with the topic particle) is displayed in the display area 12.

As described above, in this embodiment the language processing unit 5 uses the syllable recognition results (that is, the syllable lattice) to generate a phrase candidate string arranged in order of candidate rank. The display unit 7 displays this string as a menu in the menu display area 11. A meaningless recognition result, apparently caused by noise, may appear at the word-head position of this string. If the operator judges that it should be removed, removal of the word-head recognition result is instructed by pressing a key on the keyboard 8 or the like. Under the control of the control unit 6, the syllable candidates corresponding to the word-initial syllable in the syllable lattice are then erased. A phrase candidate string is created again from the reorganized syllable lattice, and the display unit 7 displays it as a menu in the menu display area 11.

Therefore, noise such as the operator's breath or the operating sound of the keyboard 8 may cause a meaningless recognition result to be added to the beginning of a word. That result can be removed automatically by a simple instruction consisting of a single action such as a key press. A phrase candidate string free of the effects of noise is thus obtained automatically. There is no need for cumbersome key operations to move the cursor to the beginning of the word, or for canceling the recognition result and re-uttering the speech.

In the above embodiment, the syllable recognition unit 3 recognizes the input speech in syllable units, and the language processing unit 5 obtains a phrase candidate string from the recognition results. However, the present invention is not limited to this; for example, the input speech may be recognized in phoneme units to obtain a word candidate string.

[Effect of the invention]

As is clear from the above, in the speech recognition device of the present invention, the language processing unit generates candidate strings in word units or phrase units. The operator refers to the candidate strings displayed as a menu on the display unit. If a meaningless recognition result has been added to the beginning of a word, the operator operates the instruction means to instruct its removal. The recognition result reorganization means then erases all recognition results corresponding to the word-initial syllable from the recognition result storage unit and reorganizes the recognition results. It then instructs the language processing unit and the display unit to process the reorganized recognition results. A single operation of the instruction means therefore automatically removes the meaningless recognition result added to the beginning of each candidate string.

Consequently, according to the present invention, meaningless recognition results added to the beginning of a word by noise or the like during speech input can be removed automatically. No cumbersome key operations are needed, nor is it necessary to cancel the recognition result and re-utter the speech.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of one embodiment of the speech recognition device of the present invention. FIG. 2 shows an example of a syllable lattice generated by the syllable recognition unit in FIG. 1. FIG. 3 shows an example of a phrase candidate string generated by the language processing unit in FIG. 1. FIG. 4 shows an example of the phrase candidate string for the input speech /shakaiwa/ displayed as a menu by the display unit in FIG. 1. FIG. 5 shows an example of a phrase candidate string generated from the reorganized syllable lattice. FIG. 6 shows an example of a phrase candidate string displayed as a menu based on the phrase candidate string shown in FIG. 5.

1: microphone; 2: speech analysis unit; 3: syllable recognition unit; 4: recognition result storage unit; 5: language processing unit; 6: control unit; 7: display unit; 8: keyboard; 11: menu display area; 12: display area; 13: cursor.

FIG. 2 (syllable lattice, only partly recoverable): input intervals noise, /sha/, /ka/, /i/, /wa/; similarity rank 1: kyu, sa, ka, i, wa; rank 2: shu, sha, ga, ri, ha; rank 3: sho (other entries illegible).

Claims

[Claims]

(1) A speech recognition device in which a recognition unit recognizes input speech in units of phonemes or syllables based on feature parameters extracted from an input speech signal by a speech analysis unit, and a correct candidate is selected and output from candidate strings generated on the basis of the recognition results, the device comprising: a recognition result storage unit that stores the recognition results obtained by the phoneme-unit or syllable-unit recognition in the recognition unit; a language processing unit that performs language processing using the recognition results stored in the recognition result storage unit to generate the candidate strings in word units or phrase units; a display unit that displays the candidate strings generated by the language processing unit as a menu; an instruction means that, when a meaningless recognition result is added to the beginning of a word in a candidate string displayed as a menu by the display unit, is operated by an operator to instruct removal of the meaningless recognition result; and a recognition result reorganization means that, when removal of the meaningless recognition result added to the beginning of the word is instructed by the instruction means, erases all recognition results corresponding to the word-initial syllable from among the recognition results stored in the recognition result storage unit to reorganize the recognition results, and then instructs the language processing unit and the display unit to perform processing based on the reorganized recognition results.