JPH0459637B2

Info

Publication number: JPH0459637B2
Application number: JP58088636A
Authority: JP
Inventors: Masayuki Iida; Hiroki Oonishi; Masanori Myatake
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1983-05-19
Filing date: 1983-05-19
Publication date: 1992-09-22
Also published as: JPS59214100A

Description

[Detailed Description of the Invention] (a) Field of Industrial Application: The present invention relates to a speech recognition device that recognizes speech.

(b) Prior Art: FIG. 1 shows the configuration of a conventional speech recognition device. In the figure, 1 is a microphone that converts speech into an electrical speech signal, 2 is a pattern creation circuit that frequency-analyzes the speech signal to obtain a speech pattern consisting of a time series of spectral values, and 3 is a buffer memory that temporarily stores the speech pattern obtained from the pattern creation circuit 2. Reference numeral 4 denotes a registered speech pattern memory: in the speech registration mode, the speech patterns held in the buffer memory 3 are introduced one after another through the mode switch Sm, which is connected to its t side, and are stored as a plurality of registered speech patterns. Reference numeral 5 denotes a pattern recognition circuit: in the speech recognition mode, the speech pattern obtained from the buffer memory 3 through the mode switch Sm, now connected to its n side, is pattern-recognized on the basis of the registered speech patterns in the registered speech pattern memory 4, and the speech of the most similar registered speech pattern is detected.

Thus, in the speech registration mode, with the mode switch Sm connected to the t side, the specific speaker inputs the speech of, for example, 68 monosyllables into the microphone 1 one after another; each is converted into a speech pattern by the pattern creation circuit 2, held in the buffer memory 3, and then stored in the registered speech pattern memory 4. In the speech recognition mode, with the mode switch Sm connected to the n side, when the specific speaker who performed the registration utters, for example, "sa" into the microphone 1, the resulting speech pattern is pattern-recognized by the pattern recognition circuit 5 on the basis of the registered speech patterns in the registered speech pattern memory 4, and the input speech is judged to be "sa" by detecting the registered speech pattern for "sa".
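The conventional flow described above (unconditional registration, then detection of the most similar registered pattern) can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not say how similarity is measured, so the Euclidean distance (`math.dist`, Python 3.8+) over fixed-length pattern vectors, the class name `ConventionalRecognizer`, and the sample values are all hypothetical.

```python
import math


class ConventionalRecognizer:
    """Toy model of the FIG. 1 device (names and distance measure are assumptions)."""

    def __init__(self):
        # Stands in for the registered speech pattern memory 4.
        self.registered = {}

    def register(self, label, pattern):
        # Registration mode (switch Sm on the t side): the pattern is
        # stored unconditionally, whatever its quality.
        self.registered[label] = list(pattern)

    def recognize(self, pattern):
        # Recognition mode (switch Sm on the n side): return the label of
        # the most similar registered pattern (smallest assumed distance).
        if not self.registered:
            raise RuntimeError("no registered speech patterns")
        return min(
            self.registered,
            key=lambda label: math.dist(self.registered[label], pattern),
        )


# Hypothetical three-value patterns standing in for spectral time series.
rec = ConventionalRecognizer()
rec.register("sa", [1.0, 0.0, 0.5])
rec.register("ka", [0.0, 1.0, 0.5])
result = rec.recognize([0.9, 0.1, 0.5])
```

Because `register` stores whatever it is given, a noisy or indistinct registration utterance ends up in the memory unchanged, which is the weakness the description goes on to discuss.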

Such a conventional speech recognition device intended for unspecified speakers has the following drawback. When the speech to be registered is uttered into the microphone 1 in the speech registration mode, which precedes the speech recognition mode, ambient noise may be mixed in or the speaker's pronunciation may be indistinct, so that an accurate registered speech pattern of the specific speaker is not obtained. As a result, even if the speaker repeats the speech input to the microphone 1 in the speech recognition mode, that speech cannot be recognized, and the recognition rate falls.

(c) Object of the Invention: The present invention has been made in view of the above, and provides a speech recognition device that prevents a drop in the recognition rate by obtaining accurate registered speech patterns.

(d) Structure of the Invention: In addition to a registered speech pattern memory that stores the registered speech patterns of a specific speaker, the speech recognition device of the present invention has a standard speech pattern memory that stores standard speech patterns of unspecified speakers. In the speech registration mode, the specific speaker's speech pattern is compared with the standard speech pattern for the same speech held in the standard speech pattern memory, and the specific speaker's speech pattern is stored in the registered speech pattern memory only when the comparison error is within an allowable range.

(e) Embodiment: FIG. 2 shows the configuration of one embodiment of the speech recognition device of the present invention. In the figure, 1 to 5 denote the microphone through the pattern recognition circuit, which are configured as in the conventional device of FIG. 1. Reference numeral 6 denotes a standard speech pattern memory, which stores standard speech patterns for, for example, 68 monosyllables that represent the speech characteristics of unspecified speakers, that is, of a large number of speakers. Reference numeral 7 denotes a comparison circuit: in the speech registration mode it compares the speech pattern of the input speech, obtained from the buffer memory 3 through the mode switch Sm, with the standard speech pattern for the same speech in the standard speech pattern memory 6, and outputs a permission signal when the comparison error is within the allowable range. Reference numeral 8 denotes a gate circuit: it opens only when it receives the permission signal from the comparison circuit 7, whereupon the speech pattern of the input speech obtained from the buffer memory 3 through the mode switch Sm is introduced into the registered speech pattern memory 4 and stored there as a registered speech pattern.
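The roles of the comparison circuit 7 and the gate circuit 8 can be illustrated with a small sketch. It is a hedged approximation, not the circuit itself: the Euclidean distance (`math.dist`, Python 3.8+) as the "comparison error", the function name `gated_register`, the `tolerance` value, and the sample patterns are all assumptions introduced here.

```python
import math


def gated_register(label, pattern, standard_memory, registered_memory, tolerance):
    """Store `pattern` only if it is close enough to the standard pattern.

    Returns True when the pattern was stored (gate opened), else False.
    """
    # Comparison circuit 7: error between the input pattern and the
    # standard pattern for the same speech (assumed Euclidean distance).
    error = math.dist(pattern, standard_memory[label])
    permitted = error <= tolerance  # plays the role of the permission signal
    if permitted:
        # Gate circuit 8 opens: the pattern reaches the registered memory 4.
        registered_memory[label] = list(pattern)
    return permitted


# Hypothetical data: one standard pattern and two candidate utterances.
standard_memory = {"sa": [1.0, 0.0, 0.5]}   # standard speech pattern memory 6
registered_memory = {}                      # registered speech pattern memory 4

rejected = gated_register("sa", [0.2, 0.9, 0.1], standard_memory,
                          registered_memory, tolerance=0.3)
accepted = gated_register("sa", [0.9, 0.1, 0.5], standard_memory,
                          registered_memory, tolerance=0.3)
```

In this toy run the first candidate is far from the standard pattern and is not stored, while the second lies within the assumed tolerance and is stored. How the allowable range is chosen in practice is not stated in the description.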

Next, the operation of this speech recognition device is described. First, when a specific speaker is to use the device, the mode switch Sm is connected to the t side to set the speech registration mode, and the speaker inputs the speech of the 68 syllables into the microphone 1 one after another. For example, a speech pattern for "sa" is created and held in the buffer memory 3. The comparison circuit 7 then compares this "sa" speech pattern in the buffer memory 3 with the "sa" speech pattern in the standard speech pattern memory 6. If the specific speaker has uttered "sa" accurately and an accurate speech pattern has been obtained without adverse effects from ambient noise or the like, the comparison error in the comparison circuit 7 consists only of the error due to the specific speaker's individual characteristics; since at least the features of the speech agree, the error falls within a fixed allowable range, so the comparison circuit 7 opens the gate circuit 8 with the permission signal and the input speech pattern for "sa" is stored in the registered speech pattern memory 4. If, on the other hand, an accurate speech pattern is not obtained because the specific speaker's utterance was indistinct or ambient noise had an adverse effect, the comparison error in the comparison circuit 7 becomes extremely large and exceeds the allowable range. No permission signal is obtained, the gate circuit 8 remains closed, and some display means (not shown) instructs the specific speaker to utter "sa" again. The specific speaker therefore tries uttering "sa" into the microphone 1 again under better conditions, paying attention to the quality of the pronunciation and the influence of ambient noise, and this utterance input is repeated until an accurate speech pattern is obtained. As a result, accurate speech patterns for all 68 syllables of the specific speaker are stored in the registered speech pattern memory 4 as registered speech patterns.
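The registration procedure just described, in which each syllable is re-requested until its pattern passes the comparison, can be sketched as a loop. This is a minimal illustration under assumptions: `capture` is a hypothetical callable standing in for the microphone 1, pattern creation circuit 2, and buffer memory 3; the comparison error is taken to be the Euclidean distance (`math.dist`, Python 3.8+); and the `max_attempts` cap is added here only so the sketch always terminates, whereas the description simply repeats until an accurate pattern is obtained.

```python
import math


def enroll(labels, capture, standard_memory, tolerance, max_attempts=5):
    """Collect one accepted pattern per label, re-prompting on rejection."""
    registered = {}  # stands in for the registered speech pattern memory 4
    for label in labels:
        for attempt in range(1, max_attempts + 1):
            pattern = capture(label, attempt)
            error = math.dist(pattern, standard_memory[label])
            if error <= tolerance:
                # Within the allowable range: store and move to the next syllable.
                registered[label] = list(pattern)
                break
            # Plays the role of the display means that asks for a repeat.
            print(f'Please say "{label}" again.')
        else:
            raise RuntimeError(f'no acceptable pattern for "{label}"')
    return registered


# Hypothetical scripted utterances: the first "sa" attempt is badly off.
scripted = {
    "sa": [[0.2, 0.9, 0.1], [0.9, 0.1, 0.5]],
    "ka": [[0.1, 0.9, 0.5]],
}


def scripted_capture(label, attempt):
    return scripted[label][attempt - 1]


standard_memory = {"sa": [1.0, 0.0, 0.5], "ka": [0.0, 1.0, 0.5]}
registered = enroll(["sa", "ka"], scripted_capture, standard_memory, tolerance=0.3)
```

With the scripted data, the first "sa" attempt is rejected and the speaker is prompted once; the second attempt and the single "ka" attempt are stored.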

In the subsequent speech recognition mode, the mode switch Sm is connected to the n side as in the conventional device. The specific speaker utters arbitrary speech into the microphone 1, it is converted into a speech pattern by the pattern creation circuit 2, and, with this speech pattern held in the buffer memory 3, it is pattern-recognized on the basis of the accurate registered speech patterns of the 68 syllables that were stored in the registered speech pattern memory 4 in the speech registration mode as described above. Accordingly, if the speech pattern in the buffer memory 3 is recognized as matching, for example, the registered speech pattern that accurately represents the features of the speech "sa", the input speech is judged to be "sa" with a very high probability, that is, with a high recognition rate.

(f) Effects of the Invention: As is clear from the above description, when speech is registered, the speech recognition device of the present invention compares the specific speaker's speech pattern to be registered with the speech pattern of unspecified speakers for the same speech, and uses it as a registered speech pattern only when the comparison error is within the allowable range. Accurate registered speech patterns, free from indistinct pronunciation by the specific speaker and from the adverse effects of ambient noise, can therefore be obtained. This raises the accuracy of the pattern recognition processing performed on the basis of these registered speech patterns during speech recognition, and a substantial improvement in the recognition rate can be expected.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a conventional speech recognition device, and FIG. 2 is a block diagram showing the configuration of one embodiment of the speech recognition device of the present invention. Reference numeral 2 denotes the pattern creation circuit, 4 the registered speech pattern memory, 5 the pattern recognition circuit, 6 the standard speech pattern memory, and 7 the comparison circuit.

Claims

[Claims]

1. A speech recognition device that creates, from the speech of a specific speaker, a speech pattern representing the characteristics of that speech, and recognizes this speech pattern on the basis of registered speech patterns of the specific speaker stored in advance, wherein the device comprises, in addition to a registered speech pattern memory for storing the registered speech patterns of the specific speaker, a standard speech pattern memory that stores standard speech patterns representing the speech characteristics of unspecified speakers, and wherein, when a registered speech pattern of the specific speaker is to be stored in the registered speech pattern memory in advance, the speech pattern of the specific speaker is compared with the standard speech pattern of unspecified speakers for the same speech, stored in advance in the standard speech pattern memory, and the speech pattern of the specific speaker is stored in the registered speech pattern memory only when the comparison error is within an allowable range.