JPH0443399A

JPH0443399A - On-vehicle voice recognition device

Info

Publication number: JPH0443399A
Application number: JP2152321A
Authority: JP
Inventors: Shoji Kuriki; 章次栗木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-06-11
Filing date: 1990-06-11
Publication date: 1992-02-13

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Technical Field] The present invention relates to a vehicle-mounted voice recognition device.

[Prior Art] Techniques for controlling in-vehicle equipment by voice are becoming common. For example, in Japanese Unexamined Patent Publication No. S59-109096, the output of the equipment is stopped while the switch of the microphone that feeds voice data to the voice recognition device is ON, or for a fixed time after the microphone switch is turned ON, in order to eliminate malfunctions and improve the recognition rate. In this case the target equipment is, for example, a radio, a cassette deck, air conditioning equipment, or car telephone dialing. The voice uttered by the driver (user) is recognized by the voice recognition device, and the equipment associated in advance with the recognition result operates.

Voice recognition devices use either a speaker-dependent method or a speaker-independent method. In the speaker-dependent method the dictionary is registered by the user, whereas in the speaker-independent method it is built into the device.

In general, a speaker-independent dictionary is created from a speech database in which the voices of many people have been recorded.

The speech database is recorded in an environment such as a studio, and the microphone, typically a vocal microphone, is usually placed close to the mouth. This is done to record speech under conditions close to typical use (for example, with a headset microphone) and to obtain high speech quality. When a voice recognition device is used in a car, however, a headset is unlikely to be worn. The microphone is therefore used at a distance from the mouth (20 to 30 cm), and the frequency characteristics of a microphone change as it moves away from the mouth.

FIG. 4 shows the frequency characteristic close to the mouth, and FIG. 5 shows the frequency characteristic when the microphone is away from the mouth. As these figures make clear, the difference arises mainly at frequencies of several kHz and above: when the microphone moves away from the mouth, the gain in the high-frequency range becomes higher than in the low-frequency range.

In addition, because the device is used inside a vehicle cabin, the shape of the cabin adds a further change in frequency characteristics. In a studio, sound reflected from walls and other surfaces is kept out of the microphone, but a vehicle cabin contains a lot of glass and is small, so reflected sound easily enters the microphone.

Differences in high-frequency characteristics therefore tend to arise depending on the shape of the cabin. As noted above, the dictionary used in the speaker-independent method is built from studio recordings, so its frequency characteristics differ from those of actual speech in the cabin, and the input pattern differs from the expected pattern. In particular, the "F", "P", and "H" sounds do not readily form local peaks, so their patterns differ, which has lowered the recognition rate.

[Object] The present invention has been made in view of the above circumstances. In particular, it aims to obtain the expected input pattern and improve the recognition rate by correcting the frequency characteristics of the uttered voice.

[Configuration] To achieve the above object, the present invention provides (1) a vehicle-mounted voice recognition device having a microphone, a microphone amplifier, an AGC, a band-pass filter, a detector, a low-pass filter, an A/D converter, a correction table, a multiplier, a speech interval detection unit, a feature extraction unit, a speech pattern memory, a recognition unit, a speaker-independent dictionary, a recognition result output unit, and a load drive unit, characterized in that, in order to approximate the frequency characteristics of input speech uttered in the vehicle to the recording characteristics of the speech database used to create the speaker-independent dictionary, the channel power data is multiplied by constants recorded in the correction table, and recognition is performed using the resulting power data of each channel. Further, (2) in (1) above, several correction tables are provided and are switched by a switch according to vehicle type; or (3) in (1) above, the correction table is provided on a card for each vehicle type.
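The correction in (1) can be illustrated with a minimal Python sketch. It assumes the constants are simply the ratio of studio to in-car average channel power; the description only says the values are determined experimentally, so this ratio is one plausible way to obtain an "inverse characteristic", not the patent's stated procedure. The function names and the average power values are hypothetical, chosen so that the resulting constants match the 0.5 and 0.25 that the embodiment gives for channels 14 and 15.

```python
from typing import List, Sequence


def inverse_gains(studio_power: Sequence[float],
                  car_power: Sequence[float]) -> List[float]:
    """Per-channel constants mapping in-car channel power onto the studio reference.

    Assumption: the constant for a channel is studio power / in-car power.
    """
    if len(studio_power) != len(car_power):
        raise ValueError("channel counts differ")
    if any(c <= 0 for c in car_power):
        raise ValueError("in-car channel power must be positive")
    return [s / c for s, c in zip(studio_power, car_power)]


def apply_correction(frame: Sequence[float],
                     gains: Sequence[float]) -> List[float]:
    """Multiply each channel's power by its correction constant."""
    if len(frame) != len(gains):
        raise ValueError("channel counts differ")
    return [p * g for p, g in zip(frame, gains)]


# Hypothetical long-term average channel powers (15 channels, arbitrary units).
STUDIO_MEAN = [8.0] * 15
# In the car the two highest channels are assumed to read 2x and 4x too high.
CAR_MEAN = [8.0] * 13 + [16.0, 32.0]

GAINS = inverse_gains(STUDIO_MEAN, CAR_MEAN)
CORRECTED = apply_correction(CAR_MEAN, GAINS)

print(GAINS[12:])  # [1.0, 0.5, 0.25]
```

With these assumed averages, only the two highest channels receive a constant other than 1.0, and the corrected in-car averages equal the studio averages.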

The present invention is described below with reference to embodiments.

FIG. 1 is a block diagram illustrating one embodiment of the present invention. In the figure, 1 is a microphone; 2 is a microphone amplifier; 3 is an AGC (automatic gain control); 4-1 to 4-n are power detectors for the respective channel bands, each consisting of a band-pass filter (BPF), a detector (DET), a low-pass filter (LPF), and so on; 5 is an A/D converter; 6-1 to 6-n are channel power data; 7-1 and 7-2 are multipliers; 8 is a correction table; 11 is a speech interval detection unit; 12 is a feature extraction unit; 13 is a speech input pattern unit; 14 is a recognition unit; 15 is a speaker-independent dictionary; 16 is a recognition result output unit; 17 is a load drive unit; and 18a to 18d are loads, where 18a is a radio, 18b a cassette deck, 18c air conditioning equipment, and 18d a car telephone.

Speech input from microphone 1 is amplified by microphone amplifier 2 and fed to AGC 3, which normalizes differences in loudness between speakers. Depending on the recognition method, the AGC may be unnecessary. The speech signal is input to the power detectors 4-1 to 4-n of the n (= 15) channels, each consisting of a band-pass filter (BPF), a detector (DET), a low-pass filter (LPF), and so on, and the power in each channel band is detected. This power voltage is converted to a digital value for each channel by A/D converter 5.

The power (digital value) of each channel carries the frequency characteristics of the in-cabin input, so the frequency characteristics are corrected by multiplying this power by the inverse characteristic calculated in advance relative to the studio recording characteristic. A simple multiplication cannot give an exact correction, but applied once the signal has been reduced to channel power it is sufficiently effective. The frequency-correction multipliers 7-1 and 7-2 need only be used for the high-frequency range. The correction values to be multiplied are recorded in correction table 8 and are determined experimentally from the microphone characteristics and the in-vehicle characteristics. In the illustrated case, the speech band is divided into 15 channels and only the two highest channels are corrected: the power of channel 14 is multiplied by 0.5 and the power of channel 15 by 0.25.

From the channel power data corrected in this way, speech interval detection unit 11 detects the speech interval, and feature extraction unit 12 detects the speech features and creates speech input pattern 13. This pattern is compared with the dictionary patterns of speaker-independent dictionary 15; recognition unit 14 takes the most similar word as the answer, and the recognition result is output from recognition result output unit 16. Load drive unit 17 then controls the equipment corresponding to the result, for example tuning radio 18a, fast-forwarding cassette deck 18b, or voice dialing on car telephone 18d.
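The final steps of this embodiment, choosing the most similar dictionary word and driving the matching load, can be sketched in Python as follows. The description does not state the similarity measure, so Euclidean distance is an assumption here, and the short fixed-length vectors stand in for real speech input patterns, which would be built from sequences of 15-channel power frames. The dictionary words, pattern values, and function names are hypothetical; the loads are those listed for FIG. 1.

```python
import math
from typing import Dict, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length patterns (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("pattern lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(pattern: Sequence[float],
              dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary word whose pattern is most similar to the input."""
    return min(dictionary, key=lambda word: distance(pattern, dictionary[word]))


# Hypothetical speaker-independent dictionary: word -> toy reference pattern.
DICTIONARY = {
    "radio": [1.0, 0.0, 0.0],
    "fast forward": [0.0, 1.0, 0.0],
    "dial": [0.0, 0.0, 1.0],
}

# Hypothetical mapping from recognized word to the load that is driven.
ACTIONS = {
    "radio": "18a radio: tune",
    "fast forward": "18b cassette deck: fast-forward",
    "dial": "18d car telephone: voice dial",
}


def drive_load(word: str) -> str:
    """Look up the action associated in advance with the recognized word."""
    return ACTIONS[word]


word = recognize([0.9, 0.1, 0.2], DICTIONARY)
print(word, "->", drive_load(word))
```

The sketch always returns the closest word. A practical recognizer would probably also reject inputs that are far from every dictionary pattern, which the description does not discuss.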

In the embodiment shown in FIG. 1, however, the values in correction table 8 are fixed, so the frequency characteristics cannot be corrected for a change of vehicle type. The change in frequency characteristics between vehicle types is due mainly to the shape of the cabin.

FIG. 2 is a block diagram illustrating an embodiment of a vehicle-mounted voice recognition device that addresses this situation. In the figure, 8a, 8b, and 8c are correction tables, and parts that function in the same way as in the embodiment of FIG. 1 carry the same reference numerals as in FIG. 1. In this embodiment, correction values are determined for representative vehicle body shapes and held in separate correction tables. For example, vehicles are divided into one-box cars, two-box cars, sedans, trucks, and so on, and representative correction values for each vehicle type are recorded in the corresponding correction table. In use, the user selects the vehicle type and enters it with switches 9a to 9c. The power is then corrected using the values of the selected correction table.
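The switch-selected tables of FIG. 2 can be sketched in Python as a lookup keyed by vehicle type. The assignment of vehicle types to tables 8a to 8c and every constant below are hypothetical, since the description gives no per-vehicle values. The sketch only assumes, as in FIG. 1, a 15-channel frame in which the two highest channels are corrected.

```python
from enum import Enum
from typing import Dict, List, Sequence

N_CHANNELS = 15


class VehicleType(Enum):
    """Positions of the selection switch (hypothetical assignment)."""
    ONE_BOX = "one-box car"
    TWO_BOX = "two-box car"
    SEDAN = "sedan"


def make_table(ch14: float, ch15: float) -> List[float]:
    """Build a 15-entry table that corrects only the two highest channels."""
    return [1.0] * (N_CHANNELS - 2) + [ch14, ch15]


# Hypothetical contents of correction tables 8a, 8b, and 8c.
TABLES: Dict[VehicleType, List[float]] = {
    VehicleType.ONE_BOX: make_table(0.5, 0.25),
    VehicleType.TWO_BOX: make_table(0.75, 0.5),
    VehicleType.SEDAN: make_table(1.0, 0.5),
}


def correct(frame: Sequence[float], vehicle: VehicleType) -> List[float]:
    """Correct one frame of channel power with the table the user selected."""
    if len(frame) != N_CHANNELS:
        raise ValueError("expected 15 channel powers")
    table = TABLES[vehicle]
    return [p * g for p, g in zip(frame, table)]


frame = [4.0] * N_CHANNELS
print(correct(frame, VehicleType.SEDAN)[-2:])  # [4.0, 2.0]
```

Keeping the tables as data separate from the multiplication step means that supporting another vehicle type only requires adding a table, which is also what makes it possible to supply the table from outside the device.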

FIG. 3 is a block diagram illustrating another embodiment of the present invention. Whereas the embodiment of FIG. 2 has the user select the vehicle type with a switch, in this embodiment a card 10 on which correction table 8 is recorded is prepared for each vehicle type, for example a one-box car card or a sedan card, and the correction values are changed by having the user insert the card that matches the vehicle.

[Effect] As is clear from the above description, in the vehicle-mounted voice recognition device according to the present invention, the expected input pattern can be obtained by approximating the characteristics of in-vehicle input speech to those of the speech database used to create the speaker-independent dictionary, which improves the recognition rate of speaker-independent recognition. Furthermore, because the frequency characteristics can be corrected more finely for each vehicle type, the expected input pattern can be obtained and the recognition rate improves. In addition, by preparing a card for each vehicle type and having the user fit it, the device can be used without the user having to select the vehicle type, which is more convenient.

[Brief Description of the Drawings]

FIGS. 1 to 3 are block diagrams each illustrating an embodiment of the vehicle-mounted voice recognition device according to the present invention, and FIGS. 4 and 5 are microphone frequency characteristic diagrams.

1: microphone; 2: microphone amplifier; 3: AGC; 4-1 to 4-n: power detectors for the respective channel bands; 5: A/D converter; 6-1 to 6-n: channel power data; 7-1, 7-2: multipliers; 8, 8a to 8c: correction tables; 9a to 9c: switches; 10: card; 11: speech interval detection unit; 12: feature extraction unit; 13: speech input pattern unit; 14: recognition unit; 15: speaker-independent dictionary; 16: recognition result output unit; 17: load drive unit; 18a: radio; 18b: cassette deck; 18c: air conditioning equipment; 18d: car telephone.

Claims

[Claims]

1. A vehicle-mounted voice recognition device having a channel power data unit that obtains power data for each channel of input speech, a correction table, a multiplier, a speech interval detection unit, a feature extraction unit, a speech pattern memory, a recognition unit, a speaker-independent dictionary, a recognition result output unit, and a load drive unit, characterized in that constants are recorded in the correction table, the channel power data is multiplied by the constants so that the frequency characteristics of input speech uttered in the vehicle approximate the recording characteristics of the speech database used when the speaker-independent dictionary was created, and recognition is performed using the resulting power data of each channel.