JPS60250400A

JPS60250400A - voice recognition device

Info

Publication number: JPS60250400A
Application number: JP59107795A
Authority: JP
Inventors: 武志則松; 藤恵　英樹
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-05-28
Filing date: 1984-05-28
Publication date: 1985-12-11
Also published as: JPH0570159B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of industrial application

The present invention relates to a speech recognition device that is intended for a specific speaker and mainly recognizes registered words.

Configuration of the prior art and its problems

In a speech recognition device that recognizes registered words for a specific speaker, the usual method is to store the time series of feature vectors of each registered word in memory in advance as a standard pattern, perform pattern matching between the input speech pattern and each standard pattern, and take the standard pattern with the highest similarity as the recognition result.
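The conventional scheme described above can be sketched as nearest-template matching. The patent does not name a distance measure or a time-alignment method, so the Euclidean frame distance and the dynamic time warping used here are assumptions, and the names `frame_distance`, `dtw_distance`, and `recognize` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature-vector sequences.

    DTW is an assumption; the source only says "pattern matching".
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(input_pattern, templates):
    """Return (word, distance) for the stored standard pattern closest to the input.

    templates: list of (word, pattern) pairs, one entry per standard pattern.
    """
    return min(
        ((word, dtw_distance(input_pattern, pattern)) for word, pattern in templates),
        key=lambda item: item[1],
    )
```

The smallest distance plays the role of the highest similarity in the text.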

However, even when the same speaker utters the same word, the spectral pattern changes as time passes. When such a speech recognition device is used over a long period, this causes recognition performance to deteriorate.

Object of the invention

The present invention solves the above problem of the prior art. Its object is to provide a speech recognition device that can follow the variation of a speaker's utterances over time by automatically adopting the input speech pattern as a standard pattern when the result of the recognition processing is sufficiently reliable.

Configuration of the invention

The present invention is a speech recognition device comprising: speech analysis means for extracting feature vectors of speech; energy peak detection means for detecting the main peaks on the energy envelope of the speech; storage means for storing the time series of feature vectors of registered speech as standard patterns; recognition means for deriving recognition candidate speech by matching the registered speech patterns against the input speech pattern; and speech length comparison means for comparing the speech lengths of the input speech and the recognition candidate speech. When the results of the recognition means, the energy peak detection means, and the speech length comparison means show that the recognition candidate speech is sufficiently accurate, the device adopts the input speech pattern as a standard pattern and replaces part of the stored standard patterns with it.

Description of the embodiment

FIG. 1 is a block diagram showing a speech recognition device according to an embodiment of the present invention. In FIG. 1, 1 is a speech input section, to which speech from the speaker is input through a microphone or the like. 2 is speech analysis means, which extracts feature vectors from the input speech signal.

3 is energy peak detection means, which detects the main peaks on the energy envelope of the input speech.

4 is recognition means, which performs pattern matching between the standard patterns held in the storage means 7 and the input speech pattern. 5 is time-length comparison means, which compares the time lengths of the input speech and the recognition candidate speech obtained by the recognition means 4, and 6 is pattern replacement means, which replaces a standard pattern with the input speech pattern.

FIG. 2 is a circuit diagram showing the configuration of this embodiment.

10 is a microcomputer that implements the energy peak detection means 3, the recognition means 4, the speech length comparison means 5, the pattern replacement means 6, and the storage means 7 of FIG. 1. It is equivalently composed of: a storage unit 12, which stores the time series of feature parameters of the words to be recognized and the number of peaks of each word obtained by the energy peak detection means 3; an arithmetic control unit 13, which performs pattern matching between the input speech and the standard patterns and makes the decision for standard pattern replacement; an input unit 11; and an output unit 14.

8 is a microphone for speech input, and 9 is an analog-to-digital converter (hereinafter, A/D converter) that converts the speech signal input from the microphone 8 from analog to digital and extracts feature parameters. 15 is a switch that starts the recognition and registration processing, and 16 is a recognition result display that shows the recognition candidate speech.

FIG. 3 is a flowchart of the essential parts for explaining the operation of the microcomputer of this embodiment.

The operation of this embodiment, configured as above, is described in detail below for the case where three standard patterns are prepared for each registered word.

In the speech recognition device of this embodiment, pressing the switch 15 first puts the device into a state of waiting for speech input. When speech is input, step 17 performs the speech signal input processing: the speech signal, converted from analog to digital by the A/D converter 9, is input to the microcomputer 10, the arithmetic control unit 13 extracts feature vectors, and the time series of these feature vectors is stored in the storage unit 12. Next, step 18 detects the main peaks on the energy envelope of the input speech and stores the number of peaks in the storage unit 12. To pick out, as far as possible, the high-energy peaks of the vowel portions, step 18 regards the sum of the low-frequency components of the feature vector (for example, 1 kHz and below) as the energy value, and first detects the dips and peaks on this energy envelope.

When the energy difference between a dip and the following peak is at or above a threshold, it is regarded as one peak, and the number of peaks over the whole energy envelope is counted in this way. When peak detection is finished, step 19 performs pattern matching between the standard patterns held in the storage unit 12 and the input speech pattern, and selects the first and second candidate words in order of increasing matching distance.
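The peak counting of step 18 can be sketched as follows. The sketch assumes that each feature vector is a list of band energies ordered from low to high frequency, so that the first `low_bins` entries cover the low-frequency range; the patent does not specify the feature representation. It also assumes that the envelope must fall by the threshold again before another peak can be counted, which the source does not state. `low_band_energy` and `count_major_peaks` are hypothetical names.

```python
def low_band_energy(frames, low_bins):
    """Energy envelope: per-frame sum of the low-frequency components.

    frames: list of feature vectors (assumed to be band energies, low to high).
    low_bins: number of leading components treated as "low frequency".
    """
    return [sum(frame[:low_bins]) for frame in frames]


def count_major_peaks(envelope, threshold):
    """Count peaks that rise at least `threshold` above the preceding dip."""
    if not envelope:
        return 0
    count = 0
    dip = envelope[0]      # lowest value seen since the last counted peak ended
    peak = envelope[0]     # highest value seen inside the current peak
    in_peak = False
    for e in envelope:
        if not in_peak:
            dip = min(dip, e)
            if e - dip >= threshold:   # dip-to-peak rise is large enough
                count += 1
                in_peak = True
                peak = e
        else:
            peak = max(peak, e)
            if peak - e >= threshold:  # assumed: a comparable fall ends the peak
                in_peak = False
                dip = e
    return count
```

Small ripples on top of a vowel peak do not increase the count, because they neither fall nor rise by the threshold.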

Step 20 checks whether the selected first and second candidate words are the same word. If they are, the process proceeds to step 21, and if the difference in distance between the first and second candidates and the input speech is at or below a threshold, step 22 shows the recognition candidate word on the recognition result display 16 as the recognition result. If the conditions of steps 20 and 21 are not satisfied, the pattern matching result is judged to be inaccurate and the process returns to step 17.
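A minimal sketch of the checks in steps 20 and 21 follows. It reads the "distance difference" as the gap between the two matching distances, which is one possible interpretation of the source wording. The `Candidate` type and the names `top_two`, `accept_recognition`, and `max_gap` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    word: str            # registered word this standard pattern belongs to
    template_index: int  # which of the word's standard patterns matched
    distance: float      # matching distance to the input speech


def top_two(candidates: List[Candidate]) -> Tuple[Candidate, Candidate]:
    """Return the first and second candidates in order of increasing distance."""
    ranked = sorted(candidates, key=lambda c: c.distance)
    return ranked[0], ranked[1]


def accept_recognition(first: Candidate, second: Candidate, max_gap: float) -> Optional[str]:
    """Return the recognized word, or None if the match is judged unreliable."""
    if first.word != second.word:                         # step 20
        return None
    if abs(second.distance - first.distance) > max_gap:   # step 21 (assumed reading)
        return None
    return first.word                                     # shown in step 22
```

Returning `None` corresponds to going back to step 17 and waiting for new input.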

Once a recognition result has been obtained, the following processing decides whether the input speech really is the same as the recognition result, so that the current input speech pattern can be adopted as a standard pattern.

First, step 23 compares the number of energy peaks of the input speech with that of the first candidate. If they are the same, step 24 compares the number of peaks of the input speech with that of the second candidate. If the numbers of peaks agree, the energy pattern of the input speech is regarded as very similar to that of the recognition result, and the process proceeds to the next step.

Step 25 compares the speech lengths of the input speech and the first candidate. It computes the speech length ratio D of the input speech and the first candidate and checks whether D lies in the range 1 - a < D < 1 + a (0 < a < 1), where a is the threshold for the speech length comparison. If this condition is satisfied, step 26 compares the speech lengths of the input speech and the second candidate in the same way as step 25. When all of these conditions are satisfied, the input speech is judged to be the same as the recognized word, and step 27 replaces a standard pattern. In this replacement, the remaining standard pattern, other than the two standard patterns of the first and second candidates obtained by pattern matching, is replaced with the input speech pattern and stored again in the storage unit 12.
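The length check of steps 25 and 26 and the replacement of step 27 can be sketched as below. The sketch assumes that D is the input length divided by the candidate length (the source does not say which way the ratio is taken) and that lengths are measured in frames. `length_ratio_ok` and `replace_remaining` are hypothetical names.

```python
def length_ratio_ok(input_len, candidate_len, a):
    """Check 1 - a < D < 1 + a, with D = input_len / candidate_len (assumed direction)."""
    if not 0 < a < 1:
        raise ValueError("threshold a must satisfy 0 < a < 1")
    d = input_len / candidate_len
    return 1 - a < d < 1 + a


def replace_remaining(patterns, first_idx, second_idx, new_pattern):
    """Return a copy of a word's standard patterns with the non-candidate one replaced.

    patterns: the three standard patterns stored for the recognized word.
    first_idx, second_idx: indices of the first and second candidate patterns.
    """
    remaining = [i for i in range(len(patterns)) if i not in (first_idx, second_idx)]
    if len(remaining) != 1:
        raise ValueError("expected exactly one pattern outside the top two")
    updated = list(patterns)
    updated[remaining[0]] = new_pattern
    return updated
```

With three patterns per word, the two best-matching patterns are kept and the least similar one is refreshed, so a single unusual utterance never overwrites the patterns that currently match best.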

In this way the standard patterns are successively replaced with new patterns. If any of the conditions of steps 23, 24, 25, and 26 is not satisfied, the process returns to step 17 and waits for speech input.

With the configuration of the above embodiment, the pattern matching result, the energy peak information, and the speech length comparison result are used together, and the standard patterns are replaced with new patterns whenever the recognition result is judged to be accurate. This allows the device to follow the variation of the speaker's utterances over time.
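The sequence of conditions in the flowchart can be summarized in one function that reports the first step whose condition fails. This is only a sketch: the dictionary keys, the thresholds `max_gap` and `a`, and the name `first_failed_step` are hypothetical, and the readings of the distance difference and of the length ratio are assumptions.

```python
def first_failed_step(inp, first, second, max_gap, a):
    """Return the flowchart step number whose condition fails, or None if all pass.

    inp:    {"peaks": int, "frames": int} for the input speech
    first:  {"word", "distance", "peaks", "frames"} for the first candidate
    second: same keys for the second candidate
    """
    checks = [
        (20, first["word"] == second["word"]),
        (21, abs(first["distance"] - second["distance"]) <= max_gap),
        (23, inp["peaks"] == first["peaks"]),
        (24, inp["peaks"] == second["peaks"]),
        (25, 1 - a < inp["frames"] / first["frames"] < 1 + a),
        (26, 1 - a < inp["frames"] / second["frames"] < 1 + a),
    ]
    for step, ok in checks:
        if not ok:
            return step
    return None  # all conditions hold: step 27 may replace a standard pattern
```

A failure at step 20 or 21 means no recognition result is shown, while a failure at steps 23 to 26 means the result is shown but the standard patterns are left unchanged.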

Effects of the invention

The present invention has energy peak detection means for detecting the peaks on the energy envelope of the input speech, and speech length comparison means for comparing the time lengths of the input speech and the standard patterns of the recognition result. When, as a result of pattern matching, the first and second candidates are the same word, the difference in their distances to the input speech is small, the numbers of energy peaks all agree, and each speech length falls within a fixed range, the recognition result is judged to be reliable and part of the standard patterns is replaced with the input speech pattern. The invention can thus provide a speech recognition device that updates its standard patterns in step with the variation of the speaker's utterances over time.

Furthermore, because the standard patterns are updated automatically, there is no need to redo the registration each time, and a speech recognition device that copes well with long-term use can be provided.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention, FIG. 2 is a circuit diagram showing the configuration of the device, and FIG. 3 is a flowchart of the essential parts for explaining its operation.

2: speech analysis means; 3: energy peak detection means; 4: recognition means; 5: time-length comparison means; 6: pattern replacement means; 7: storage means; 8: microphone; 9: A/D converter; 10: microcomputer.

Name of agent: Patent attorney Toshio Nakao (中尾 敏男) and one other

FIG. 1, FIG. 2

Claims

[Claims]

A speech recognition device comprising:
speech analysis means for extracting feature vectors from input speech;
energy peak detection means for deriving an energy envelope from the feature vectors obtained by the speech analysis means and detecting the number of main peaks on the envelope;
storage means for storing, as standard patterns, a plurality of time series of feature vectors for each registered speech;
recognition means for deriving recognition candidate speech by matching the standard patterns of the registered speech stored in the storage means against the input speech;
speech length comparison means for comparing the speech lengths of the input speech and the recognition candidate speech; and
pattern replacement means for replacing one of the standard patterns, other than those of the top two recognition candidates, with the input speech pattern when, as a result of the pattern matching by the recognition means, the top two candidates are the same word and sufficiently similar to the input speech, the numbers of peaks obtained by the energy peak detection means agree between the input speech and the recognition candidate speech, and, as a result of the speech length comparison means, the speech lengths of the two are within a fixed range.