JPH07219596A

JPH07219596A - Voice recognizer

Info

Publication number: JPH07219596A
Application number: JP6008256A
Authority: JP
Inventors: Kazuaki Obara; 和昭小原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-01-28
Filing date: 1994-01-28
Publication date: 1995-08-18

Abstract

(57) [Abstract]
[Purpose] To provide a speech recognition apparatus whose recognition rate degrades little even in a very noisy environment, so that it can be used in a wide variety of environments.
[Constitution] A speech recognition apparatus comprising: means 102 for dividing speech into a plurality of frequency bands having different center frequencies; means 103 for obtaining the degree of correlation contained in each signal waveform obtained by the division; means 104 for detecting the power component of each signal waveform obtained by the dividing means 102; means 105 for normalizing the synchronous frequency component contained in each divided signal waveform using the power component; and means 106 for adding the normalized synchronous signal components.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition apparatus, and more particularly to a speech recognition apparatus that can be used under loud noise without loss of recognition performance.

[0002]

[Prior Art] A configuration such as that shown in FIG. 4 has generally been used for conventional speech recognition apparatuses. In FIG. 4, 401 is a microphone for inputting speech, 402 is a feature extractor that extracts feature quantities from the speech input through the microphone 401, and 403 is an identification unit that recognizes the input speech using the feature quantities extracted by the feature extractor 402. The operation of this conventional apparatus is as follows. Speech input through the microphone 401 is passed to the feature extractor 402, which extracts its feature quantities. Known techniques such as filter analysis, DFT (Discrete Fourier Transform) analysis, and LPC (Linear Predictive Coding) analysis are used for this extraction. The feature quantities obtained by the feature extractor 402 are passed to the identification unit 403, which recognizes the input speech using one of various known speech pattern identification methods such as DTW (Dynamic Time Warping), HMM (Hidden Markov Model), or NN (Neural Network).
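For readers unfamiliar with DTW, the following is a minimal sketch of the textbook dynamic time warping distance between two one-dimensional feature sequences. It is not taken from the patent: the function name `dtw_distance`, the absolute-difference local cost, and the toy templates are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of numbers.

    Assumption: the local cost is the absolute difference, and the allowed
    steps are the three basic ones (insertion, deletion, match).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical templates: recognition picks the template with the smallest distance.
templates = {"up": [1.0, 2.0, 3.0], "down": [3.0, 2.0, 1.0]}
observed = [1.0, 2.0, 2.0, 3.0]  # "up" spoken more slowly
best = min(templates, key=lambda word: dtw_distance(observed, templates[word]))
```

Because the warping path may repeat frames, the slower utterance still matches the "up" template exactly, which is the property that makes DTW usable as the identification step.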

[0003]

[Problem to Be Solved by the Invention] In a conventional speech recognition apparatus configured as described above, the feature extractor 402, which relies on known filter analysis, DFT (Discrete Fourier Transform) analysis, LPC (Linear Predictive Coding) analysis, or the like, can no longer analyze the features of speech accurately under loud noise, and recognition performance deteriorates.

[0004] In view of this problem, an object of the present invention is to provide a speech recognition apparatus that achieves a high recognition rate even under loud noise.

[0005]

[Means for Solving the Problem] The present invention is a speech recognition apparatus comprising: means for dividing speech into a plurality of frequency bands having different center frequencies; means for obtaining the degree of correlation of each signal waveform of the divided speech signal; means for detecting the power component of each signal waveform of the divided speech signal; means for normalizing the degree of correlation of each signal waveform using the power component; means for adding the normalized degrees of correlation; and an identification unit that identifies the input speech using the added signal.

[0006]

[Operation] With the configuration described above, the speech recognition apparatus can be used in very noisy environments without a drop in recognition rate, which greatly widens the range of environments in which it can be used.

[0007]

[Embodiments] FIG. 1 is a block diagram of a speech recognition apparatus according to a first embodiment of the present invention. 101 is a microphone for inputting speech; 102 is a filter bank for dividing the speech into a plurality of frequency bands having different center frequencies; 103 is an autocorrelator that obtains the autocorrelation of each signal waveform of the speech signal frequency-analyzed by the filters; 104 is a power detection circuit that obtains the power component of each signal waveform frequency-analyzed by the filters; 105 is a divider that normalizes the output of the autocorrelator 103 using the power of each filter output obtained by the power detector 104; 106 is an adder that adds together the autocorrelation values of the filters normalized by the divider 105; and 107 is an identification unit that performs speech recognition using the speech feature quantity obtained by the adder 106.

[0008] The operation of the speech recognition apparatus configured as described above is as follows. In FIG. 1, speech input through the microphone 101 is passed to the filter bank 102 and divided into a plurality of frequency bands having different center frequencies. The center frequencies of this band-pass filter bank are set by dividing the speech signal band at logarithmically equal intervals, so that the low-frequency components, which carry the important components of speech information, are analyzed finely and the high-frequency components coarsely, giving an efficient frequency analysis of the speech. The speech signals analyzed by the filter bank 102 are passed to the autocorrelator 103 and the power detector 104, which detect, respectively, the periodicity of the band component contained in each filter output and the power contained in each filter output. The outputs are then passed to the divider 105, where the output of the autocorrelator 103 is normalized by the power of the corresponding filter output. The normalized autocorrelator outputs are passed to the adder 106, which adds together the normalized autocorrelation values of the filters. The identification unit 107 recognizes the input speech from the summed autocorrelation values using DTW (Dynamic Time Warping), a known speech pattern identification method. Other speech pattern identification methods such as HMM (Hidden Markov Model) or NN (Neural Network) can be used in place of DTW.
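The chain of autocorrelator 103, power detector 104, divider 105, and adder 106 can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: the text does not say which lags are evaluated or how frames are formed, so the lag list, the use of the lag-0 autocorrelation as the band power, and all function names are hypothetical. Synthetic sinusoids stand in for the filter bank outputs.

```python
import math


def autocorr(x, lag):
    """Raw autocorrelation of x at a non-negative integer lag (role of block 103)."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def normalized_autocorr(x, lags):
    """Autocorrelation divided by band power (roles of blocks 104 and 105).

    Assumption: the band power is taken as the lag-0 autocorrelation.
    """
    power = autocorr(x, 0)
    if power == 0.0:
        return [0.0 for _ in lags]
    return [autocorr(x, lag) / power for lag in lags]


def summed_feature(bands, lags):
    """Add the normalized autocorrelation values across bands (role of block 106)."""
    feature = [0.0] * len(lags)
    for band in bands:
        for i, value in enumerate(normalized_autocorr(band, lags)):
            feature[i] += value
    return feature


fs = 8000  # assumed sampling rate in Hz


def tone(freq, amp, n=800):
    """Synthetic sinusoid standing in for one filter output."""
    return [amp * math.sin(2 * math.pi * freq * k / fs) for k in range(n)]


# Two bands with very different levels; normalization gives them equal weight.
bands = [tone(200, 1.0), tone(400, 0.01)]
lags = [0, 20, 40]  # in samples; 40 samples is one period of 200 Hz at 8 kHz
feature = summed_feature(bands, lags)
```

Dividing by the band power makes each band's contribution independent of its level, so a weak but strongly periodic band counts as much as a loud one. This is one plausible reading of why the normalization helps under noise.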

[0009] FIG. 2 compares the recognition rate of the speech recognition apparatus of the present invention described above with that of a conventional recognition apparatus in noisy environments. The vertical axis of FIG. 2 is the speech recognition rate, and the horizontal axis is the S/N ratio, which indicates the level of noise in the environment where the speech is uttered. The conventional apparatus shown is a speech recognition apparatus using DFT (Discrete Fourier Transform), a widely used speech analysis technique. As the figure shows, the recognition rate of the conventional method drops sharply as the noise increases (as the S/N ratio decreases), whereas the recognition rate of the speech recognition apparatus of the present invention falls only slightly as the noise increases, and good recognition performance is maintained.
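The patent does not describe how the noisy test conditions of FIG. 2 were produced. As a hedged illustration of what a given S/N ratio means, the sketch below scales a noise signal so that the mixture has a requested S/N ratio in decibels; the function name `mix_at_snr` and the synthetic signals are assumptions.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that 10*log10(P_speech / P_noise) equals snr_db.

    Assumption: both sequences have the same length and non-zero power.
    """
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


# Synthetic stand-ins for a speech signal and environmental noise.
speech = [math.sin(0.1 * k) for k in range(1000)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mixed = mix_at_snr(speech, noise, 10.0)  # 10 dB S/N
```

Lower `snr_db` values correspond to moving toward the noisier end of the horizontal axis in FIG. 2.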

[0010] As described above, this embodiment yields a speech recognition apparatus whose recognition rate degrades little even in very noisy environments. Because it allows speech recognition to be used where loud noise is present, its practical value is very high.

[0011] FIG. 3 is a block diagram of a speech recognition apparatus according to a second embodiment of the present invention. 301 is a microphone for inputting speech; 302 is a filter bank for dividing the speech into a plurality of frequency bands having different center frequencies; 303 is a delay circuit that delays each filter output signal by a delay amount determined by the reciprocal of that filter's center frequency; 304 is a multiplier that multiplies the signal delayed by the delay circuit 303 by the undelayed filter output signal; 305 is an integrator that integrates the output of the multiplier 304 over an integration time set for each filter; 306 is a squaring circuit that computes the square of the filter output signal; 307 is an integrator that integrates the output of the squaring circuit 306 over an integration time set for each filter; 308 is a divider that divides the output of the integrator 305 by the output of the integrator 307, thereby normalizing the integrated output of the multiplier 304 by the power of each filter output; 309 is an adder that adds the outputs of the dividers 308 connected to the respective filter outputs; and 310 is an identification unit that performs speech recognition using the speech feature quantity output by the adder 309.

[0012] The operation of the speech recognition apparatus configured as described above is as follows. In FIG. 3, speech input through the microphone 301 is divided into a plurality of frequency bands by the filters 302, which have different center frequencies. As in the first embodiment, the center frequencies of the band-pass filters 302 are set by dividing the speech signal band at logarithmically equal intervals, so that the low-frequency components, which carry important components of speech information such as the low-order formants, are analyzed finely and the high-frequency components coarsely, giving an efficient frequency analysis of the speech. Each speech signal analyzed by the filter bank 302 is passed to the delay circuit 303, which delays the filter output signal by a delay amount determined by the reciprocal of the filter's center frequency (for example, the delay is set to 5 ms when the filter center frequency is 200 Hz). The delayed signal is passed to the multiplier 304, which forms its product with the undelayed filter output signal, and the product is passed to the integrator 305, which has an integration time set for each filter. The output of the integrator 305 is large when the correlation between the delayed signal (the output of the delay circuit 303) and the undelayed signal is high, and small when the correlation is low. The divider 308 divides the output of the integrator 305 by the output of the integrator 307, which integrates the output of the squaring circuit 306 (the power of the filter output). This operation yields the center frequency component contained in the filter output as a quantity normalized by the power contained in the filter output. The outputs of the dividers 308-1 to 308-N are added by the adder 309 and passed to the identification unit 310. The identification unit 310 recognizes the input speech from the speech feature quantity using one of various known speech pattern identification methods such as DTW (Dynamic Time Warping), HMM (Hidden Markov Model), or NN (Neural Network).
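The delay, multiply, integrate, and divide chain of blocks 303 to 309 can be sketched per channel as below. This is a simplified illustration, not the circuit of FIG. 3: the integrators are modeled as plain sums over one frame (the patent sets an integration time per filter but gives no values), the sampling rate is assumed, the function name is hypothetical, and synthetic signals stand in for the filter outputs.

```python
import math
import random


def delay_correlation_channel(x, fs, center_hz):
    """One channel of the second embodiment, as a discrete-time sketch.

    Assumption: both integrators (305 and 307) sum over the whole frame.
    """
    delay = round(fs / center_hz)  # 303: delay of 1/center_hz, in samples
    product_sum = 0.0              # 305: integral of the multiplier 304 output
    power_sum = 0.0                # 307: integral of the squaring circuit 306 output
    for n in range(delay, len(x)):
        product_sum += x[n] * x[n - delay]
        power_sum += x[n] * x[n]
    # 308: normalize the correlation by the power of the filter output
    return product_sum / power_sum if power_sum > 0.0 else 0.0


fs = 8000  # assumed sampling rate; 200 Hz then gives a 40-sample (5 ms) delay
tone_200 = [math.sin(2 * math.pi * 200 * k / fs) for k in range(1600)]
tone_400 = [0.01 * math.sin(2 * math.pi * 400 * k / fs) for k in range(1600)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

periodic_value = delay_correlation_channel(tone_200, fs, 200.0)  # close to 1
noise_value = delay_correlation_channel(noise, fs, 200.0)        # close to 0

# 309: add the normalized outputs of all channels.
feature = sum(delay_correlation_channel(band, fs, fc)
              for band, fc in zip([tone_200, tone_400], [200.0, 400.0]))
```

A band that carries a component at its own center frequency repeats after one period, so the delayed and undelayed signals line up and the normalized value approaches 1, while uncorrelated noise averages toward 0. Only one lag per channel is evaluated, which is why this structure is simpler than a full autocorrelator.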

[0013] As described above, this embodiment does not need to compute the autocorrelation of each signal waveform of the speech signal frequency-analyzed by the filters, and obtains the degree of correlation of the filter outputs with a simpler configuration. It yields a speech recognition apparatus whose recognition rate degrades little even in very noisy environments, so that speech recognition can be used where loud noise is present, and its practical value is very high.

[0014]

[Effects of the Invention] As described above, the present invention yields a speech recognition apparatus whose recognition rate degrades little even in very noisy environments, allowing speech recognition to be used in a wide variety of environments, which is of great practical value.

[Brief description of drawings]

[FIG. 1] Block diagram of the speech recognition apparatus according to the first embodiment of the present invention.

[FIG. 2] Diagram illustrating the performance of the speech recognition apparatus according to the first embodiment of the present invention.

[FIG. 3] Block diagram of the speech recognition apparatus according to the second embodiment of the present invention.

[FIG. 4] Block diagram of a conventional speech recognition apparatus.

[Explanation of symbols]

101 Input microphone
102 Filter bank
103 Autocorrelator
104 Power detection circuit
105 Divider
106 Adder
107 Identification unit
301 Microphone
302 Filter bank
303 Delay circuit
304 Multiplier
305 Integrator
306 Squaring circuit
307 Integrator
308 Divider
309 Adder
310 Identification unit

Claims

[Claims]

1. A speech recognition apparatus comprising: means for dividing speech into a plurality of frequency bands by filters having different center frequencies; means for obtaining the degree of correlation in the time direction contained in each filter output waveform obtained by the division; means for detecting the power component of each signal waveform obtained by the means for dividing into the plurality of frequency bands; means for normalizing the degree of correlation of each divided signal waveform using the power component; means for adding the normalized signals of the plurality of frequency bands; and an identification unit that identifies the input speech using the added signal components.

2. The speech recognition apparatus according to claim 1, wherein the means for obtaining the degree of correlation contained in each signal waveform divided by the filters and the means for detecting the power component of each signal waveform of the divided speech signal obtain them using the autocorrelation of the signal waveforms divided by the means for dividing into a plurality of frequency bands.

3. The speech recognition apparatus according to claim 1, wherein the means for obtaining the degree of correlation contained in each signal waveform divided by the filters obtains it using means for multiplying each speech signal divided by the plurality of filters by a signal obtained by delaying that speech signal by a delay amount determined by the center frequency of the respective filter, and means for integrating the output of the multiplying means.

4. The speech recognition apparatus according to claim 1, wherein the center frequencies of the filters that divide the speech into a plurality of frequency bands are set by dividing the speech signal band at logarithmically equal intervals.
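As an illustration of the logarithmically equal spacing recited in claim 4, the sketch below places center frequencies so that adjacent bands have a constant frequency ratio. The band edges, the number of bands, and the function name `log_spaced_centers` are assumptions; the patent gives no numerical values other than the 200 Hz example in paragraph [0012].

```python
def log_spaced_centers(f_low, f_high, n_bands):
    """Center frequencies spaced at equal intervals on a logarithmic axis.

    Assumption: f_low and f_high are themselves the first and last centers.
    """
    if n_bands < 2 or f_low <= 0.0 or f_high <= f_low:
        raise ValueError("need n_bands >= 2 and 0 < f_low < f_high")
    ratio = (f_high / f_low) ** (1.0 / (n_bands - 1))  # constant ratio between neighbors
    return [f_low * ratio ** k for k in range(n_bands)]


centers = log_spaced_centers(100.0, 3200.0, 6)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

With these example values the centers double from band to band, so the spacing in hertz is narrow at low frequencies and wide at high frequencies, matching the fine low-band and coarse high-band analysis described in the embodiments.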