JPH06149290A

JPH06149290A - Speech recognizing device

Info

Publication number: JPH06149290A
Application number: JP4293412A
Authority: JP
Inventors: Shinichi Tsurufuji; 真一鶴藤
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1992-10-30
Filing date: 1992-10-30
Publication date: 1994-05-27

Abstract

PURPOSE: To improve the recognition rate of a speech recognition device in a noisy environment by absorbing the distortion in the feature parameters of the signal input to the speech input microphone, independently of the position of the noise source in the cabin and of the position of the speech input microphone, and thereby removing noise efficiently.

CONSTITUTION: The speech recognition device is equipped with a microphone 1 to which speech and acoustic noise are input indirectly through a space; an acoustic signal input terminal 3 to which the acoustic noise is input directly from an audio device; an acoustic characteristic correcting means (sound field correction part 8) which corrects the characteristics of the direct signal; a 1st feature extraction part 2 which extracts a 1st feature parameter by analyzing the input signal from the microphone 1; a 2nd feature extraction part 4 which extracts a 2nd feature parameter by analyzing the corrected input signal from the sound field correction part 8; a noise removal part 5 which generates a 3rd feature parameter by subtracting the 2nd feature parameter from the 1st feature parameter; a speech pattern generation part 6 which generates a speech pattern from the 3rd feature parameter; and a recognizing process part 7 which recognizes the speech pattern.

Description

Detailed Description of the Invention

[0001]

[Field of industrial application] The present invention relates to a speech recognition device, and in particular to a speech recognition device that eliminates the influence of sound generated by audio equipment such as a television or radio (sound that acts as noise for speech recognition).

[0002]

[Prior art] When speech recognition is performed, the speaker first speaks toward a microphone so that the speech is input to it. The speaker's environment is not necessarily quiet, however; some noise is usually present. Because noise is therefore usually superimposed on the input speech, properly eliminating its influence is essential to achieving speech recognition with a high recognition rate.

[0003] Various methods for removing such noise have accordingly been proposed in recent years. One conventional method, exemplified by the speech recognition device described in Japanese Unexamined Patent Application Publication No. S54-91007, uses a first microphone for speech input and a second microphone for noise input. It subtracts the noise signal N input from the second microphone from the noise-superimposed speech signal N + V (speech signal V plus noise signal N) input from the first microphone, in an attempt to obtain a noise-free speech signal V.

[0004] With this conventional technique it is difficult to decide where to install the noise input microphone. If the noise input microphone is placed near the speech input microphone, the speech is picked up relatively strongly by the noise input microphone as well. Conversely, if the two microphones are placed far apart, the noise that enters the speech input microphone together with the speech and the noise that enters the noise input microphone differ in their characteristics (phase, frequency, volume, and so on), so taking their difference does not remove the noise effectively.

[0005] Depending on the environment in which speech recognition is performed, for example one in which audio equipment such as a television or radio emits music or speech from its loudspeakers, that sound becomes noise for speech recognition. When the noise source is clearly identified, as with a television or radio, the acoustic signal it produces can be taken directly from the output terminal of the equipment, so the noise input microphone described above can be omitted.

[0006] However, even when the acoustic signal N of the music or speech produced by such audio equipment is obtained directly from the equipment, this direct acoustic signal N often differs in its characteristics from the indirect acoustic signal Nz that enters the speech input microphone together with the speech. In the cabin of an automobile, for example, the music (acoustic noise) emitted from the car stereo loudspeakers is affected by the acoustic characteristics of the cabin space. It therefore reaches the microphone as a signal Nz whose phase, frequency, volume, and other signal characteristics have changed, and which may differ in characteristics from the music (acoustic noise) signal N obtained from the output terminal of the car stereo.

[0007] Consequently, when the direct acoustic signal N from the output terminal of the audio equipment differs in signal characteristics from the indirect acoustic signal Nz, that is, the sound emitted from the loudspeaker, distorted by the spatial environment, and picked up by the microphone, subtracting the direct acoustic signal N from the composite signal Nz + V of the indirect acoustic signal Nz and the speech signal V input to the microphone yields a difference signal V + Nz - N that is not the true speech signal V. Noise removal of this kind therefore could not, in the end, achieve speech recognition with a high recognition rate.
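The residual can be written out explicitly under one added assumption that is not stated in the source: that the cabin behaves as a linear filter with frequency response $H(\omega)$, so that $N_z(\omega) = H(\omega)\,N(\omega)$. The symbols $X$ (microphone signal) and $H$ are introduced here only for illustration.

```latex
\begin{align}
  X(\omega) &= V(\omega) + N_z(\omega) = V(\omega) + H(\omega)\,N(\omega), \\
  X(\omega) - N(\omega) &= V(\omega) + \bigl(H(\omega) - 1\bigr)\,N(\omega).
\end{align}
```

Under this model the leftover term $(H(\omega) - 1)\,N(\omega)$ vanishes only where $H(\omega) = 1$ or $N(\omega) = 0$, which is the situation the paragraph describes: the uncorrected difference is not the true speech signal $V$.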

[0008]

[Problem to be solved by the invention] The present invention has been made in view of the above. When the direct acoustic signal N from the output terminal of the audio equipment differs in acoustic characteristics from the indirect acoustic signal Nz, which is emitted from the loudspeaker, distorted by the spatial environment, and detected by the microphone, the invention corrects the difference in these acoustic characteristics (phase, frequency, volume, and so on) so that N = Nz. It thereby obtains the true speech signal V and realizes a highly accurate speech recognition device.

[0009]

[Means for solving the problem] The speech recognition device of the present invention comprises: a microphone to which speech uttered for speech recognition is input and to which the acoustic noise is input indirectly through the space; an acoustic signal input terminal to which the acoustic noise signal is input directly from the audio device without passing through the space; acoustic characteristic correction means for correcting the characteristics of the direct acoustic noise signal input from the acoustic signal input terminal so as to reduce the difference in characteristics between the indirect acoustic noise signal obtained from the microphone and the direct acoustic noise signal obtained from the acoustic signal input terminal; first feature extraction means for analyzing the input signal from the microphone and extracting a first feature parameter; second feature extraction means for analyzing the corrected input signal from the acoustic characteristic correction means and extracting a second feature parameter; noise removal means for subtracting the second feature parameter from the first feature parameter to generate a third feature parameter; speech pattern creation means for creating a speech pattern from the third feature parameter; and recognition processing means for performing recognition processing on the speech pattern created by the speech pattern creation means.

[0010]

[Operation] In the speech recognition device of the present invention, the acoustic characteristic correction means corrects the signal characteristics of the direct acoustic signal N from the output terminal of the audio equipment so that they match the acoustic characteristics of the indirect acoustic signal Nz detected by the microphone. In other words, the direct acoustic signal N is itself given the characteristic distortion caused by the spatial environment, so that its characteristics (phase, frequency, volume, and so on) become equal to those of the indirect acoustic signal Nz. Speech recognition can then be performed on the true speech signal V, obtained by subtracting the signal characteristics of the corrected direct acoustic signal Nz (the direct acoustic signal N after correction) from the signal characteristics of the composite signal Nz + V of the indirect acoustic signal Nz and the speech signal V detected by the microphone.

[0011]

[Embodiment] An embodiment in which the speech recognition device according to the present invention is used in the cabin of an automobile equipped with an audio device is described below.

[0012] FIG. 1 shows a configuration example of the speech recognition device according to the present invention. In FIG. 1, 1 is a microphone for inputting speech, and 2 is a first feature extraction unit that analyzes the input speech and extracts a first feature parameter. 3 is a noise input terminal that directly receives the output (acoustic noise) from, for example, the output terminal of the audio device; when the audio device has four loudspeakers, there are four noise input terminals.

[0013] 4 is a second feature extraction unit. After the direct signal obtained from the noise input terminal 3 has been corrected by the sound field correction unit 8 described later, this unit analyzes the noise and extracts its second feature parameter.

[0014] The first and second feature extraction units 2 and 4 use the same analysis processing, for example an FFT processor.

[0015] 5 is a noise removal unit that subtracts the second feature parameter from the first feature parameter to obtain a third feature parameter.

[0016] 6 is a pattern creation unit that creates a speech pattern from the third feature parameter obtained from the noise removal unit 5. 7 is a recognition processing unit that performs recognition processing on the speech pattern obtained from the pattern creation unit 6. A plurality of speech patterns to be recognized are registered in it in advance; it compares the unknown speech pattern with them and identifies, that is, recognizes, the speech by finding the most similar registered speech pattern.
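The matching step performed by the recognition processing unit 7 can be illustrated with a minimal sketch. The source does not specify the similarity measure or the pattern format, so the Euclidean distance, the fixed-length feature vectors, and the names `recognize` and `registered_patterns` below are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(pattern, registered):
    """Return the label of the registered pattern most similar to `pattern`."""
    return min(
        registered,
        key=lambda label: euclidean_distance(pattern, registered[label]),
    )


# Hypothetical registered speech patterns (labels and values are made up).
registered_patterns = {
    "radio on": [0.9, 0.1, 0.4],
    "radio off": [0.2, 0.8, 0.5],
}

result = recognize([0.85, 0.2, 0.35], registered_patterns)
print(result)  # prints "radio on"
```

A real speech pattern would be a time sequence of feature vectors rather than a single vector; the sketch only shows the "find the closest registered pattern" idea.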

[0017] 9 is a stationary noise feature parameter storage unit that stores, as required, the feature parameter of stationary noise (in an automobile cabin, running noise such as engine sound); it is used in the noise removal processing of the noise removal unit 5. That is, when the second feature parameter is subtracted from the first feature parameter to obtain the third feature parameter, the stationary noise feature parameter held in the storage unit 9 is subtracted as well.

[0018] The sound field correction unit, which is the most characteristic part of the speech recognition device of the present invention, is now described in detail. The sound field correction unit 8 is the acoustic characteristic correction means: it corrects the characteristics of the direct acoustic noise signal input from the noise input terminal 3 so as to reduce the difference in characteristics (phase, frequency, volume, and so on) between the indirect acoustic noise signal obtained from the microphone 1 and the direct acoustic noise signal obtained from the noise input terminal 3.

[0019] FIG. 2 shows the configuration of the sound field correction unit 8 serving as this acoustic characteristic correction means. In the figure, 80, 80, ... are A/D converters that convert the input acoustic noise signals from analog to digital.

[0020] 81 is a gain correction data table that holds gain correction data, and 82 is a gain correction unit that corrects the gain using this data. 83 is a frequency characteristic correction data table that holds frequency characteristic correction data, and 84 is a frequency characteristic correction unit that corrects the frequency characteristics using this data. 85 is a phase correction data table that holds phase correction data, and 86 is a phase correction unit that corrects the phase using this data.

[0021] Further, 87 is an adder that sums the four outputs of the phase correction units 86, one for each of the four loudspeakers, and 88 is a D/A converter that converts the summed digital signal into an analog signal.

[0022] In the sound field correction unit 8 configured in this way, before speech recognition is performed, the difference in the characteristics of the acoustic data arising between the speech microphone 1 and the noise input terminal 3 is measured and the characteristic correction data are created.

[0023] Specifically, the audio device is made to generate, for example, a pulse sound. The microphone 1 receives the pulse sound as output from the loudspeaker and affected by the spatial environment, such as the automobile cabin, while the noise input terminal 3 receives it directly from the output terminal of the audio device in the form of an electrical signal. The gain correction unit 82 then adjusts the volume, the frequency characteristic correction unit 84 adjusts the frequency characteristics, and the phase correction unit 86 adjusts the phase. The values in the gain correction data table 81, the frequency characteristic correction data table 83, and the phase correction data table 85 are set so that, as a result of these adjustments, the difference computed in the noise removal unit 5 between the feature parameters obtained from the microphone 1 and from the noise input terminal 3 (the third feature parameter) is minimized.

[0024] That is, the gain correction unit 82, the frequency characteristic correction unit 84, and the phase correction unit 86 adjust their respective parameters while the difference result of the noise removal unit 5 is fed back to them. When the third feature parameter vanishes or reaches its minimum, the correction data for each parameter (a constant or an appropriate function) are stored in the gain correction data table 81, the frequency characteristic correction data table 83, and the phase correction data table 85.
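The calibration procedure can be sketched as a search for the correction values that minimize the residual. This is only an illustration under stated assumptions: the source feeds back the third feature parameter from the noise removal unit 5, whereas the sketch measures the residual directly on time-domain samples; it adjusts only a scalar gain and an integer-sample delay (standing in for the phase correction) and omits the frequency characteristic; and the exhaustive grid search and the names `residual_energy` and `calibrate` are hypothetical.

```python
def residual_energy(mic, direct, gain, delay):
    """Energy left after subtracting the gain/delay-corrected direct signal."""
    corrected = ([0.0] * delay + [gain * s for s in direct])[:len(mic)]
    return sum((m - c) ** 2 for m, c in zip(mic, corrected))


def calibrate(mic, direct, gains, delays):
    """Return the (gain, delay) pair that gives the smallest residual."""
    return min(
        ((g, d) for g in gains for d in delays),
        key=lambda gd: residual_energy(mic, direct, gd[0], gd[1]),
    )


# Test pulse as seen at the output terminal, and the same pulse as picked up
# by the microphone: attenuated to half amplitude and two samples late.
direct_pulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_pulse = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0]

best_gain, best_delay = calibrate(
    mic_pulse,
    direct_pulse,
    gains=[0.25, 0.5, 0.75, 1.0],
    delays=[0, 1, 2, 3],
)
print(best_gain, best_delay)  # prints "0.5 2"
```

The values found this way play the role of the entries written to the correction data tables; a practical system would search per frequency band and per loudspeaker channel.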

[0025] The recognition processing operation is described below with reference to FIGS. 1 and 2.

[0026] First, the first feature extraction unit 2 analyzes the speech input from the microphone 1. The first feature extraction unit 2 converts the input analog speech signal into a digital signal and then extracts the first feature parameter by applying a Fourier transform to the digital signal. The first feature parameter obtained here is sent on to the noise removal unit 5.
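As a rough illustration of this analysis step, the sketch below computes the magnitude spectrum of one frame of digitized samples. The source says only that a Fourier transform is applied; treating the per-bin magnitude of a single frame as the "feature parameter", and using a naive DFT instead of an FFT so that the example needs no external library, are assumptions of this sketch.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes of one frame of samples (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc))
    return spectrum


# An 8-sample frame holding exactly one cycle of a cosine: all of the
# energy should land in bin 1, with magnitude N/2 = 4.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
features = magnitude_spectrum(frame)
print([round(v, 6) for v in features])
```

Because both feature extraction units are said to use the same analysis, the same function would be applied to the microphone signal and to the corrected noise signal.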

[0027] While speech is being input from the microphone 1, the acoustic noise of the car stereo is input directly from its output terminal to the noise input terminal 3. When there are four loudspeakers, for example, the acoustic noise is input through four input terminals.

[0028] The four channels of acoustic noise signals input to these four input terminals 3 are sent to the sound field correction unit 8. For each of the four channels, the sound field correction unit 8 corrects the differences in phase, frequency characteristics, and volume that arise between that channel's acoustic noise signal and the acoustic noise signal input from the microphone 1.

[0029] That is, in the sound field correction unit 8, as shown in FIG. 2, the A/D converters 80 first convert each input noise signal into digital values.

[0030] The gain correction unit 82 then corrects the gain of each of the four outputs from the A/D converters 80, using the gain correction data recorded in the gain correction data table 81.

[0031] Next, the frequency characteristic correction unit 84 corrects the frequency characteristics of each of the four outputs from the gain correction unit 82, using the frequency characteristic correction data stored in the frequency characteristic correction data table 83.

[0032] The phase correction unit 86 then corrects the phase of each of the four outputs from the frequency characteristic correction unit 84, using the phase correction data recorded in the phase correction data table 85.

[0033] When these corrections are complete, the adder 87 sums the four outputs of the final-stage phase correction unit 86 into a single acoustic noise signal, and the D/A converter 88 converts it into an analog signal.
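The per-channel chain of gain correction, frequency characteristic correction, phase correction, and final summation can be sketched as follows. The concrete models are assumptions for illustration only: the frequency characteristic correction is represented by a short FIR filter, the phase correction by an integer-sample delay, and the correction tables by a dictionary with the hypothetical keys `gain`, `fir_taps`, and `delay`. A/D and D/A conversion are outside the sketch.

```python
def apply_gain(samples, gain):
    """Gain correction: scale every sample."""
    return [gain * s for s in samples]


def apply_fir(samples, taps):
    """Frequency characteristic correction, modeled as a short FIR filter."""
    return [
        sum(taps[k] * samples[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(samples))
    ]


def apply_delay(samples, delay):
    """Phase correction, modeled as an integer-sample delay."""
    return ([0.0] * delay + list(samples))[:len(samples)]


def sound_field_correct(channels, tables):
    """Correct each channel in turn, then add the channels sample by sample."""
    corrected = []
    for samples, table in zip(channels, tables):
        x = apply_gain(samples, table["gain"])
        x = apply_fir(x, table["fir_taps"])
        x = apply_delay(x, table["delay"])
        corrected.append(x)
    return [sum(values) for values in zip(*corrected)]


# Four loudspeaker channels, each carrying the same unit impulse, with
# hypothetical table entries: equal gain, flat response, different delays.
channels = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
tables = [{"gain": 0.5, "fir_taps": [1.0], "delay": d} for d in range(4)]

mixed = sound_field_correct(channels, tables)
print(mixed)  # prints "[0.5, 0.5, 0.5, 0.5]"
```

Each channel has its own table entry because each loudspeaker reaches the microphone along a different path; the single summed signal is what the second feature extraction unit then analyzes.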

[0034] This completes the processing of the sound field correction unit 8. The acoustic noise signal corrected by the sound field correction unit 8 as described above is analyzed by the second feature extraction unit 4 in the same way as in the first feature extraction unit 2, and the second feature parameter is extracted.

[0035] The first feature parameter created from the input of the microphone 1 and the second feature parameter created from the input of the noise input terminal 3 are sent to the noise removal unit 5.

[0036] The noise removal unit 5 basically subtracts the second feature parameter, which consists of acoustic noise, from the first feature parameter, which consists of acoustic noise and speech, to generate a third feature parameter that contains no acoustic noise.

[0037] The pattern creation unit 6 then detects the speech interval in the third feature parameter and creates a speech pattern.

[0038] The recognition processing unit 7 performs recognition by comparing the speech pattern created in this way with the speech patterns registered in the recognition processing unit 7.

[0039] Thus, according to the present invention, it is possible to correct the drawback caused by the position of the noise source, the position of the speech input microphone, and the like: namely, the difference in characteristics (phase, frequency, volume, and so on) between the feature parameter of the actual noise itself and the feature parameter of the noise that enters the microphone after being affected by the spatial environment while propagating through the space. In particular, in an environment where the acoustic noise changes constantly, such as the cabin of an automobile equipped with a car stereo, the noise is reliably removed and the degradation of recognition performance due to noise is prevented.

[0040] Although this embodiment describes the use of the speech recognition device according to the present invention in the cabin of an automobile equipped with a car stereo, the invention can be applied in any other space or environment in which the acoustic data of the noise becomes known by separate means before it reaches the speech recognition device through the space in which recognition is performed.

[0041] To improve recognition accuracy in a speech recognition device operating in an environment such as that described above, it is often also necessary to take into account noise such as engine sound, running sound, and sounds produced by natural phenomena (stationary noise). The removal of such stationary noise in the present speech recognition device is described below.

[0042] Stationary noise is basically removed in the noise removal unit 5 by subtracting, from the third feature parameter obtained there and in accordance with its variation, the stationary noise feature parameter held in the stationary noise parameter storage unit 9. Here, the stationary noise parameter means the first feature parameter created when no acoustic noise is input to the noise input terminal 3 and neither speech nor acoustic noise is input to the microphone 1.

[0043] Specifically, because the audio device is the noise source in this embodiment, it suffices to extract the first feature parameter in advance, while the audio device is outputting nothing, and to store it in the stationary noise parameter storage unit 9.

[0044] At recognition time, the input from the microphone 1 and the input from the noise input terminal 3 are then processed as already described, and the first and second feature parameters are extracted.

[0045] The noise removal unit 5 subtracts the second feature parameter from the first feature parameter and further subtracts the stationary noise parameter held in the stationary noise parameter storage unit 9 to create the third feature parameter. The pattern creation unit 6 detects the speech interval in this third feature parameter, from which the stationary noise parameter has been subtracted as described above, and creates a speech pattern. The recognition processing unit 7 performs recognition by comparing the speech pattern created in this way with the speech patterns registered in the recognition processing unit 7.
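The subtraction performed by the noise removal unit 5 can be sketched as an element-wise operation on feature vectors. The representation of a feature parameter as a list of per-bin values and the function name `remove_noise` are assumptions; the numbers are made up for illustration.

```python
def remove_noise(first, second, stationary=None):
    """third = first - second - stationary, computed element by element."""
    if stationary is None:
        stationary = [0.0] * len(first)
    return [f - s - n for f, s, n in zip(first, second, stationary)]


first = [5.0, 3.0, 4.0]       # microphone: speech + audio noise + stationary noise
second = [2.0, 1.0, 0.5]      # corrected direct signal from the audio device
stationary = [1.0, 1.0, 0.5]  # stored stationary noise parameter

third = remove_noise(first, second, stationary)
print(third)  # prints "[2.0, 1.0, 3.0]"
```

The source does not say how a negative difference should be handled; this sketch leaves such values as they are.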

[0046] Because stationary noise caused by engine noise and the like may change gradually, it is preferable, in order to remove it effectively, to update the stationary noise parameter in the stationary noise parameter storage unit 9 as necessary. This update processing is described next.

[0047] First, a threshold is set such that, if the value of the third feature parameter is at or below it, it can be concluded that neither ambient noise from the car stereo nor speech is being input to the speech recognition device (the input is stationary noise only).

[0048] During recognition, the noise removal unit 5 checks the value of the third feature parameter at fixed intervals (for example, every 10 msec). If the value is at or below the threshold, it regards the microphone input as stationary noise and updates the stationary noise parameter without performing speech input detection.

[0049] In this update, for example, the stationary noise parameter and the third feature parameter are averaged with weights that favor the stationary noise parameter, and the result is registered as the new stationary noise parameter.

[0050] In practice, the weight of the stationary noise parameter is made large (for example, a ratio of 15:1) in the weighted average so that the value of the stationary noise parameter does not change abruptly.
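The thresholded update with a 15:1 weighted average can be sketched as below. Several details are assumptions, since the source does not spell them out: the scalar "value" of the third feature parameter is taken to be the sum of absolute bin values, the `third` vector passed in is taken to be the residual before the stored stationary noise parameter is subtracted, and the function name and argument names are hypothetical.

```python
def update_stationary(stationary, third, threshold, old_weight=15, new_weight=1):
    """Return the updated stationary noise parameter.

    If the residual `third` is above the threshold, speech or audio noise is
    presumed present and the stored parameter is returned unchanged.
    """
    level = sum(abs(x) for x in third)  # assumed scalar measure of `third`
    if level > threshold:
        return list(stationary)
    total = old_weight + new_weight
    return [
        (old_weight * s + new_weight * t) / total
        for s, t in zip(stationary, third)
    ]


stored = [16.0, 32.0]

# Quiet frame: level 16 is at or below the threshold of 20, so the stored
# value moves one sixteenth of the way toward the new observation.
quiet = update_stationary(stored, [0.0, 16.0], threshold=20.0)
print(quiet)  # prints "[15.0, 31.0]"

# Loud frame: level 100 exceeds the threshold, so nothing changes.
loud = update_stationary(stored, [100.0, 0.0], threshold=20.0)
print(loud)  # prints "[16.0, 32.0]"
```

Called once per check interval (10 msec in the example given in the text), the heavy weighting toward the stored value makes the estimate follow slow drifts in engine noise while ignoring isolated frames.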

[0051] Thus, according to the present invention, stationary noise such as engine noise in an automobile can also be dealt with effectively, which further prevents the degradation of recognition performance due to noise.

[0052] In this embodiment the frequency characteristics are modified by means of FFT and IFFT, but other means may be used. The sound field correction may also be performed after feature extraction. Furthermore, although in this embodiment the input to the noise input terminal is taken directly from the loudspeaker, it is also possible to install a microphone positioned so that the desired speech does not enter it and to process its signal instead. These processes can be implemented either as circuits or on a DSP.

[0053]

[Effect of the invention] The speech recognition device of the present invention gives the direct acoustic signal N from the output terminal of the audio equipment the same characteristic distortion that the spatial environment causes, so that its characteristics (phase, frequency, volume, and so on) can be made equal to those of the indirect acoustic signal Nz input to the microphone. Speech recognition can therefore be performed on the true speech signal V, obtained by subtracting the signal characteristics of the corrected direct acoustic signal Nz (the direct acoustic signal N after correction) from the signal characteristics of the composite signal Nz + V of the indirect acoustic signal Nz and the speech signal V detected by the microphone. Speech recognition with a high recognition rate can thus be achieved even in a noisy environment.

[Brief description of drawings]

FIG. 1 is a basic configuration diagram of the speech recognition device of the present invention.

FIG. 2 is a block diagram of the sound field correction means 8 of the device of the present invention.

[Explanation of symbols]

1 Microphone
2 First feature extraction unit
3 Noise input terminal
4 Second feature extraction unit
5 Noise removal unit
6 Pattern creation unit
7 Recognition processing unit
8 Sound field correction unit
82 Gain correction unit
84 Frequency characteristic correction unit
86 Phase correction unit

Claims

[Claims]

1. A speech recognition device for performing speech recognition in a spatial environment in which sound from an audio device acts as noise for speech recognition, comprising: a microphone to which speech uttered for speech recognition is input and to which the acoustic noise is input indirectly through the space; an acoustic signal input terminal to which the acoustic noise signal is input directly from the audio device without passing through the space; acoustic characteristic correction means for correcting the characteristics of the direct acoustic noise signal input from the acoustic signal input terminal so as to reduce the difference in characteristics between the indirect acoustic noise signal obtained from the microphone and the direct acoustic noise signal obtained from the acoustic signal input terminal; first feature extraction means for analyzing the input signal from the microphone and extracting a first feature parameter; second feature extraction means for analyzing the corrected input signal from the acoustic characteristic correction means and extracting a second feature parameter; noise removal means for subtracting the second feature parameter from the first feature parameter to generate a third feature parameter; speech pattern creation means for creating a speech pattern from the third feature parameter; and recognition processing means for performing recognition processing on the speech pattern created by the speech pattern creation means.