JPH0228159B2

Info

Publication number: JPH0228159B2
Application number: JP57100012A
Authority: JP
Inventors: Yasuhiko Shichino
Original assignee: PFU Ltd
Current assignee: PFU Ltd
Priority date: 1982-06-11
Filing date: 1982-06-11
Publication date: 1990-06-21
Also published as: JPS58216298A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a response confirmation method that reports the recognition status of a spoken word recognition device. In particular, it relates to a method that, when responding to the speaker with an audible tone, varies the frequency and duration of the tone according to the recognition status and adds special effects.

[Prior art and problems]

FIG. 1 shows one example of a response confirmation method for a conventional spoken word recognition device, in which 1 denotes an electronic computer, 2 a spoken word recognition device, 3 a speech synthesizer, 4 an earphone, and 5 a microphone. A word uttered by the speaker is input through the microphone 5 to the spoken word recognition device 2, where it is recognized. The computer 1 controls the speech synthesizer 3 according to the recognition result. The speech synthesizer 3 outputs the recognition result as speech, and this output is conveyed to the speaker through the earphone 4.

FIG. 2 shows another example of a response confirmation method for a conventional spoken word recognition device, in which 6 denotes a display device. The same reference numerals as in FIG. 1 denote the same parts. A word uttered by the speaker is input through the microphone 5 to the spoken word recognition device 2. The recognition result is input to the computer 1, which controls the display device 6 accordingly. The display device 6 displays the recognition result.

Both conventional examples in FIGS. 1 and 2 have the drawback of wasting time even when the input is normal. In addition, the former is complicated and expensive, and the latter requires the speaker to shift his or her gaze to the display device at every input.

[Purpose of the invention]

The present invention eliminates the above drawbacks. Its object is to provide a response confirmation method for a spoken word recognition device that is simple in structure and inexpensive, and that allows responses to be confirmed efficiently.

[Structure of the invention]

To that end, the response confirmation method of the present invention is a response confirmation method for a spoken word recognition device comprising: a spoken word recognition device that compares the standard patterns of registered words stored in a registered pattern memory with the recognition pattern of a spoken word input and generates numerical values indicating the degree of similarity between the recognition pattern and the standard patterns; a response confirmation device that conveys the recognition status of the spoken word recognition device to the speaker by an audible sound signal; and an electronic computer that notifies the response confirmation device of the maximum similarity value obtained by the spoken word recognition device, the difference between the maximum similarity value and the second-highest similarity value, and error information indicating whether an error has occurred. The method is characterized in that the response confirmation device has: a synthesizer that generates an audible sound signal of variable frequency; a timer that passes the audible sound signal generated by the synthesizer only for a set time; an effect adding circuit that adds a new effect to the audible sound signal output from the timer; and a controller that controls one or more of the synthesizer, the timer, and the effect adding circuit according to the maximum similarity value, the difference between the maximum similarity value and the second-highest similarity value, and the error information sent from the computer.

[Embodiments of the invention]

The present invention is described below with reference to the drawings.

FIG. 3 shows one example of the spoken word recognition device used in the present invention, FIG. 4 shows one embodiment of the present invention, and FIG. 5 illustrates the frequency and duration of the audible sound signal used for response confirmation.

In FIG. 3, 11 denotes an automatic volume controller, 12 a spectrum analysis section consisting of a group of bandpass filters, 13 a multiplexer, 14 an A-D converter, 15 a sampling circuit, 16 a central processing unit, 17 a program memory, and 18 a registered pattern memory.

The automatic volume controller 11 holds the level of the speech input constant. Its output is analyzed by the spectrum analysis section 12. The multiple output signals of the spectrum analysis section 12 are selected by the multiplexer 13, and the output of the multiplexer 13 is converted into a digital signal by the A-D converter 14. The output of the A-D converter 14 is sampled in time by the sampling circuit 15. The recognition pattern of the speech input takes the form of, for example, a matrix (not reproduced in this text) in which the columns correspond to sampling times and the rows to frequency bands. The recognition pattern is stored in, for example, a work area of the program memory 17; the program memory 17 of course also holds the program.

The registered pattern memory 18 stores the standard patterns of words. The recognition pattern of the speech input is compared in turn with each standard pattern in the registered pattern memory 18, and its distance to each registered word is obtained. The score of a registered word is the maximum setting value of the distance minus the distance to that word. The larger the score, the more likely the recognition is correct. The registered word with the highest score and the registered word with the second-highest score are then selected. The larger the difference between the first and second scores, the more likely the recognition is correct.
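The scoring step described above can be sketched as follows. This is only an illustrative sketch: the function name `rank_scores`, the dictionary of distances, and the requirement of at least two registered words are assumptions made for the example, and the distance measure itself is not specified in this document.

```python
from typing import Dict, Tuple


def rank_scores(distances: Dict[str, float], max_distance: float) -> Tuple[str, float, float]:
    """Return (best_word, top_score, margin) for one spoken input.

    distances    -- distance from the recognition pattern to each registered word
                    (how the distance is computed is outside this sketch)
    max_distance -- the "maximum setting value" L of the distance
    """
    if len(distances) < 2:
        # Assumption: a margin only makes sense with two or more registered words.
        raise ValueError("need at least two registered words")

    # Score = maximum setting value minus distance; larger means more similar.
    scores = {word: max_distance - d for word, d in distances.items()}

    # Sort by score, highest first, and take first and second place.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_word, top_score = ranked[0]
    second_score = ranked[1][1]

    # The margin between first and second place indicates how reliable the result is.
    return best_word, top_score, top_score - second_score
```

For instance, with distances of 10, 40, and 70 and a maximum setting value of 100, the scores are 90, 60, and 30, so the top score is 90 and the margin is 30.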

FIG. 4 is a block diagram of one embodiment of the present invention. In FIG. 4, 19 denotes a controller, 20 a synthesizer, 21 a timer, 22 an effect adding circuit, and 23 an amplifier.

The controller 19 controls the synthesizer 20, the timer 21, and the effect adding circuit 22 according to data sent from the computer. The synthesizer 20 is a variable frequency oscillator whose oscillation frequency changes according to control commands from the controller 19. The timer 21 passes the output signal of the synthesizer 20 only for the period specified by a control command from the controller 19. The effect adding circuit 22 switches its input signal on and off intermittently according to a control command from the controller 19. The output of the effect adding circuit 22 is amplified by the amplifier 23 and sent to a speaker or earphone.
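The signal chain of FIG. 4 is described as circuitry, but its behavior can be illustrated with a small software simulation. The sketch below is an assumption-laden model, not the disclosed hardware: the sample rate, the function names, the sine waveform, and the 50% on/off duty cycle are all chosen only for illustration.

```python
import math
from typing import List

SAMPLE_RATE = 8000  # samples per second; an assumed value for this simulation


def synthesize(freq_hz: float, n_samples: int) -> List[float]:
    """Synthesizer 20: a variable-frequency oscillator (modeled here as a sine wave)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]


def timer_gate(signal: List[float], duration_s: float) -> List[float]:
    """Timer 21: pass the oscillator output only for the specified period."""
    return signal[: round(duration_s * SAMPLE_RATE)]


def chop(signal: List[float], rate_hz: int) -> List[float]:
    """Effect adding circuit 22: switch the signal on and off rate_hz times per second.

    Integer arithmetic counts half-periods; even half-periods pass the signal,
    odd half-periods mute it (50% duty cycle, an assumption).
    """
    return [
        s if ((2 * rate_hz * i) // SAMPLE_RATE) % 2 == 0 else 0.0
        for i, s in enumerate(signal)
    ]


def amplify(signal: List[float], gain: float) -> List[float]:
    """Amplifier 23: scale the signal before it goes to the speaker or earphone."""
    return [gain * s for s in signal]
```

Chaining the stages as `amplify(chop(timer_gate(synthesize(f, n), d), r), g)` mirrors the order of the blocks in FIG. 4; the chopping stage would be skipped when the controller does not activate the effect adding circuit.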

FIG. 5 illustrates the frequency and duration of the audible sound signal used for response confirmation. The recognition status of the spoken word recognition device of FIG. 3 is sent to an electronic computer (not shown). The computer sends to the controller 19 the maximum score for the input word, the difference between the maximum score and the second-highest score, and error information indicating whether a grammar or software processing error has occurred. The controller 19 controls the synthesizer 20 and the timer 21 according to the graph shown in FIG. 5. In FIG. 5, I₁ denotes the maximum score, I₂ the second-highest score, and L the maximum setting value. As FIG. 5 shows, the larger the maximum score I₁, the higher the frequency of the audible sound signal, and the smaller the difference between the maximum score and the second-highest score, the longer the duration of the audible sound signal. In the illustrated example, the frequency of the audible sound signal is 1000 Hz when the maximum score I₁ is close to the maximum setting value L and 300 Hz when I₁ is close to zero. Likewise, the duration of the audible sound signal is 1 second when the difference between I₁ and I₂ is close to zero and 0.1 second when that difference is close to the maximum setting value L.

If the error information sent from the computer indicates an error, the controller 19 activates the effect adding circuit 22. Once activated, the effect adding circuit 22 switches its input signal on and off. The switching rate is, for example, 20 Hz. Instead of interrupting the audible sound signal with the effect adding circuit 22, vibrato control can be applied to the synthesizer 20.
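The controller's mapping from scores to tone parameters might be sketched as below. Only the endpoint values (300 Hz and 1000 Hz, 1 second and 0.1 second) come from the text; the straight-line interpolation between them, the clamping, and the names `ToneCommand` and `tone_for` are assumptions, since the actual curves of FIG. 5 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ToneCommand:
    frequency_hz: float  # setting for the synthesizer 20
    duration_s: float    # setting for the timer 21
    chopped: bool        # whether the effect adding circuit 22 is activated


def tone_for(top_score: float, second_score: float, max_setting: float, error: bool) -> ToneCommand:
    """Map I1, I2, L and the error flag to tone parameters (linear mapping assumed)."""

    def clamp01(x: float) -> float:
        return max(0.0, min(1.0, x))

    level = clamp01(top_score / max_setting)                     # 0 when I1 = 0, 1 when I1 = L
    margin = clamp01((top_score - second_score) / max_setting)   # 0 when I1 = I2

    # Higher top score -> higher pitch: 300 Hz near zero, 1000 Hz near L.
    frequency_hz = 300.0 + (1000.0 - 300.0) * level

    # Smaller margin -> longer tone: 1 s when the margin is zero, 0.1 s when it is L.
    duration_s = 0.1 + (1.0 - 0.1) * (1.0 - margin)

    return ToneCommand(frequency_hz, duration_s, chopped=error)
```

Under these assumptions a confident result (I₁ near L, I₂ near zero) yields a high, short tone, while an ambiguous result (low I₁, I₂ close to I₁) yields a low, long tone.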

The conventional device of FIG. 2 can also be combined with the device of the present invention to make response confirmation more accurate and effective. In that case the speaker operates as follows.

(a) If the audible tone for response confirmation is high and short, the speaker judges that the input was accepted normally and proceeds to the next input.

(b) If the audible tone for response confirmation is intermittent, the speaker judges that the input was not accepted normally and repeats the same input.

(c) If the audible tone for response confirmation is low and long, the speaker looks at the display device; if the input was accepted normally the speaker proceeds to the next input, and if not, the speaker repeats the same input to correct the error.
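The three operator rules above amount to a small decision table, which could be written out as follows. The names `Action` and `operator_action` are hypothetical, and two choices are assumptions not stated in the text: checking for the intermittent tone first, and treating any tone that is neither intermittent nor high and short as a reason to check the display.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed to the next input"
    REPEAT_INPUT = "repeat the same input"
    CHECK_DISPLAY = "look at the display, then proceed or repeat"


def operator_action(intermittent: bool, high_and_short: bool) -> Action:
    """Choose the speaker's next step from what the confirmation tone sounded like."""
    if intermittent:
        # Rule (b): an interrupted tone signals an error.
        return Action.REPEAT_INPUT
    if high_and_short:
        # Rule (a): a high, short tone signals a confident recognition.
        return Action.PROCEED
    # Rule (c): a low, long tone signals doubt, so the display is consulted.
    return Action.CHECK_DISPLAY
```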

[Effect of the invention]

As is clear from the above description, the present invention makes it possible to inform the speaker of the recognition status of a spoken word input using a simple circuit. Because a normal input is acknowledged efficiently with a short tone, and an abnormal input is fed back to the speaker with a correspondingly longer tone, the efficiency of using the spoken word recognition device is improved. Spoken word recognition devices can usually achieve 95% to 99% recognition, and the invention is most valuable when applied where errors are very rare. If only about 50% recognition were achievable, the invention would be of little value.

[Brief explanation of drawings]

FIGS. 1 and 2 show response confirmation methods of conventional spoken word recognition devices, FIG. 3 shows one example of the spoken word recognition device used in the present invention, FIG. 4 shows one embodiment of the present invention, and FIG. 5 illustrates the frequency and duration of the audible sound signal used for response confirmation.

11: automatic volume controller, 12: spectrum analysis section consisting of a group of bandpass filters, 13: multiplexer, 14: A-D converter, 15: sampling circuit, 16: central processing unit, 17: program memory, 18: registered pattern memory, 19: controller, 20: synthesizer, 21: timer, 22: effect adding circuit, 23: amplifier.

Claims

[Claims]

1. A response confirmation method for a spoken word recognition device comprising: a spoken word recognition device that compares the standard patterns of registered words stored in a registered pattern memory with the recognition pattern of a spoken word input and generates numerical values indicating the degree of similarity between the recognition pattern and the standard patterns; a response confirmation device that conveys the recognition status of the spoken word recognition device to the speaker by an audible sound signal; and an electronic computer that notifies the response confirmation device of the maximum similarity value obtained by the spoken word recognition device, the difference between the maximum similarity value and the second-highest similarity value, and error information indicating whether an error has occurred; the method being characterized in that the response confirmation device has: a synthesizer that generates an audible sound signal of variable frequency; a timer that passes the audible sound signal generated by the synthesizer only for a set time; an effect adding circuit that adds a new effect to the audible sound signal output from the timer; and a controller that controls one or more of the synthesizer, the timer, and the effect adding circuit according to the maximum similarity value, the difference between the maximum similarity value and the second-highest similarity value, and the error information sent from the computer.