JPS58169199A

JPS58169199A - Voice recognition system employing vowel information

Info

Publication number: JPS58169199A
Application number: JP57052785A
Authority: JP
Inventors: 裕二木島; 奈良　泰弘; 繁佐々木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-03-31
Filing date: 1982-03-31
Publication date: 1983-10-05

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(1) Technical field of the invention

The present invention relates to a speech recognition system, and in particular to a speech recognition system using vowel information in which the difference between the user's vowel speech data and the vowel speech data of the dictionary speaker is obtained, and this difference information is used to correct the dictionary contents so that they are adapted to the user's voice.

(2) Prior art and problems

In conventional speech recognition devices, the dictionary file was registered in advance with the voice of a specific speaker, that is, the person who uses the device, in order to raise the recognition rate. However, this speaker-dependent approach has the drawback that when the number of words to be recognized is large, the user must register all of them, so creating the dictionary is very laborious.

Therefore, if speech recognition could be performed using a dictionary registered with another person's voice, there would be no need to create a dictionary for each user, which would be very convenient.

At present, however, when speech recognition is performed using a dictionary registered with another person's voice, individual differences in voice generally yield a considerably lower recognition rate than when a dictionary registered with one's own voice is used.

To raise the recognition rate when using a dictionary registered with another person's voice, some adaptation processing is therefore required: either the input speech is brought closer to the dictionary speaker's voice, or the dictionary contents are brought closer to the user's voice.

However, when emphasis is placed on shortening recognition time, the approach of processing the input speech to bring it closer to the dictionary speaker's voice is problematic.

That is, processing the input speech slows recognition by the time that processing takes, which makes, for example, real-time response difficult.

In contrast, with the approach of correcting the dictionary contents to bring them closer to the user's voice, the correction can be made in advance, before use, so it does not matter if this correction takes time.

However, when the number of dictionary entries is large, each dictionary entry is time-compressed because of constraints on dictionary memory size and on computation time during recognition, and as a result fine-grained correction becomes difficult.

For example, one method of adapting to individual differences is to obtain the average spectrum of the dictionary speaker and that of the input speaker, and to adapt using their difference spectrum. If this is to be done phoneme by phoneme, it works when the time compression has been performed in phoneme units, but otherwise it becomes difficult.

(3) Object of the invention

The object of the present invention is to remedy these problems by providing a speech recognition system using vowel information in which, when the speech recognition dictionary is adapted to the input speaker (that is, the user), the recognition rate is raised by adapting separately for each type of vowel, and in which, by storing the amount of each vowel at the time of time compression, adaptation can be performed easily even on a dictionary that has already been compressed, without a large increase in storage.

(4) Structure of the invention

To achieve this object, the speech recognition system using vowel information of the present invention comprises a word dictionary, means for holding the vowel information of the creator of the word dictionary, and word dictionary correction means for correcting the word dictionary using vowel information created in advance from the user's voice and the vowel information of the word dictionary creator. It is characterized in that the word dictionary is adapted to the user's voice by correcting the vowel information of the word dictionary according to the user's voice.

(5) Embodiments of the invention

An embodiment of the present invention is described with reference to FIGS. 1 to 3.

First, the present invention is described in detail with reference to FIGS. 1 and 2.

In the present invention, the vowel spectra of the dictionary speaker (that is, the word dictionary creator) and of the input speaker are each obtained. For example, for a given vowel, as shown in FIG. 1, the speech spectrum of input speaker A and the speech spectrum of dictionary speaker B are obtained, and from these the difference A - B between the two spectra is obtained as shown in FIG. 1(b).

Then, as shown in FIG. 2, for information compression the spectrum is divided into, for example, 8 channels in the frequency direction, and the difference A - B is obtained for each channel. In this way the difference between the two speakers is obtained for each vowel, and correcting the word dictionary with these differences yields a word dictionary adapted to input speaker A.
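The per-vowel, per-channel difference described above can be sketched as follows. This is only an illustrative reading of the text: the function names, the use of plain averaging inside each channel, and the toy 16-bin spectra are assumptions, not details given in the patent.

```python
from typing import Dict, List

NUM_CHANNELS = 8  # the text gives 8 frequency channels as an example


def to_channels(spectrum: List[float], num_channels: int = NUM_CHANNELS) -> List[float]:
    """Average a fine-grained spectrum into equal-width frequency channels.

    Averaging is an assumption; the text only says the spectrum is divided
    into channels in the frequency direction.
    """
    if len(spectrum) % num_channels != 0:
        raise ValueError("spectrum length must be a multiple of num_channels")
    width = len(spectrum) // num_channels
    return [sum(spectrum[i * width:(i + 1) * width]) / width
            for i in range(num_channels)]


def vowel_difference(user: Dict[str, List[float]],
                     dictionary_speaker: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return the per-channel difference A - B for each vowel both speakers provide."""
    diffs = {}
    for vowel in user.keys() & dictionary_speaker.keys():
        a = to_channels(user[vowel])
        b = to_channels(dictionary_speaker[vowel])
        diffs[vowel] = [x - y for x, y in zip(a, b)]
    return diffs


# Toy 16-bin spectra for one vowel (made-up numbers, not from the patent).
speaker_a = {"a": [float(i) for i in range(16)]}
speaker_b = {"a": [1.0] * 16}
delta = vowel_difference(speaker_a, speaker_b)
```

Here `delta["a"]` holds eight values, one per channel, which is the compact form that the dictionary correction would consume.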

An embodiment of the present invention is described with reference to FIG. 3.

In FIG. 3, 1 is a microphone, 2 is a vowel spectrum collection unit, 3 is a vowel difference spectrum calculation unit, 4 is a word dictionary, 5 is a dictionary-speaker vowel spectrum holding unit, 6 is a word dictionary correction unit, 7 is a corrected word dictionary, 8 is a microphone, and 9 is a recognition unit.

The vowel spectrum collection unit 2 collects the vowel spectra of input speaker A and produces curve A of FIG. 1(a). The vowels may be input by uttering the vowels themselves, or they may be extracted from standard words; this example shows the case in which the vowels themselves are uttered.

The vowel difference spectrum calculation unit 3 obtains, as shown in FIG. 2, the difference between input speaker A and dictionary speaker B for each channel of each vowel. The vowel spectra of input speaker A in FIG. 1(a) are supplied by the vowel spectrum collection unit 2, and the vowel spectra of dictionary speaker B are supplied by the dictionary-speaker vowel spectrum holding unit 5.

The word dictionary 4 was created by dictionary speaker B and files, for each word, the features needed for speech recognition. The vowel spectra of dictionary speaker B are held in the dictionary-speaker vowel spectrum holding unit 5.

These vowel spectra, like those of the input speaker, may be obtained either by uttering the vowels themselves or from standard words; in this example, spectra obtained by uttering the vowels themselves are held.

The word dictionary correction unit 6 corrects the vowel information of each word filed in the word dictionary 4 on the basis of the per-channel differences for each vowel, as shown in FIG. 2, obtained by the vowel difference spectrum calculation unit 3. The corrected result is stored in the corrected word dictionary 7.
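As a rough sketch of what the word dictionary correction unit 6 might do, the code below shifts every vowel frame of a dictionary word by that vowel's difference A - B. The patent does not specify the dictionary layout or the arithmetic, so the frame representation (a vowel label plus a channel spectrum), the additive correction, and all names here are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical frame layout: (vowel label or None for non-vowel frames, channel spectrum).
Frame = Tuple[Optional[str], List[float]]


def correct_word(frames: List[Frame], delta: Dict[str, List[float]]) -> List[Frame]:
    """Add the per-channel difference A - B to each vowel frame; copy other frames unchanged."""
    corrected = []
    for label, spectrum in frames:
        if label in delta:
            spectrum = [s + d for s, d in zip(spectrum, delta[label])]
        corrected.append((label, list(spectrum)))
    return corrected


def correct_dictionary(word_dictionary: Dict[str, List[Frame]],
                       delta: Dict[str, List[float]]) -> Dict[str, List[Frame]]:
    """Build a corrected dictionary (item 7) without modifying the original (item 4)."""
    return {word: correct_word(frames, delta)
            for word, frames in word_dictionary.items()}


# Two channels instead of eight to keep the toy data short (made-up numbers).
word_dictionary = {"aka": [("a", [1.0, 2.0]), (None, [0.5, 0.5]), ("a", [1.0, 1.0])]}
delta = {"a": [0.5, -1.0]}
corrected_dictionary = correct_dictionary(word_dictionary, delta)
```

Returning a new dictionary mirrors the figure, where the word dictionary 4 and the corrected word dictionary 7 are separate items.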

The microphone 8 is the speech input unit used when the input speaker performs speech recognition. The unknown input entered through the microphone 8 is matched against the corrected word dictionary 7 by the recognition unit 9.

Next, the operation of FIG. 3 is briefly described.

① When the word dictionary 4 is created, the vowel information of its dictionary speaker B is held in the dictionary-speaker vowel spectrum holding unit 5.

② When speech recognition is to be performed using this word dictionary 4, the input speaker A, who is the user, first inputs each vowel through the microphone 1. The spectrum of each input vowel is held in the vowel spectrum collection unit 2. From these spectra and the vowel spectra of dictionary speaker B held in the dictionary-speaker vowel spectrum holding unit 5, the vowel difference spectrum calculation unit 3 computes the per-channel differences for each vowel, as shown in FIG. 2. On the basis of these differences, the word dictionary correction unit 6 reads out the word dictionary 4, corrects it, and creates the corrected word dictionary 7.

③ Once the corrected word dictionary 7 has been created in this way, when input speaker A enters an unknown input through the microphone 8, the recognition unit 9 matches this unknown input against the corrected word dictionary 7, and the matching result is output as the recognition result.

Even when the word dictionary is a time-compressed one, speaker adaptation can be performed easily by storing the time ratio of each vowel in each time segment before compression and, using these ratios as weights, taking the weighted sum of the vowels' difference spectra as the correction amount. An embodiment of this case is described with reference to FIGS. 4 and 5.

For example, as in FIG. 4, suppose the word /AKAI/ is divided into 8 segments in the time direction, and each segment is represented by the sum of its power-weighted spectra. The first segment, the second segment, and so on contain only the sound /A/ and pose no problem, but the seventh segment contains a mixture of /A/ and /I/ and so cannot be corrected simply. Instead, the sum of power is computed separately for each of the five vowels contained in each segment; these sums are used as weights to multiply the difference spectra of the five vowels, and the segment is corrected with the spectrum obtained by adding the results together.
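The bookkeeping at compression time might look like the sketch below: while a segment is collapsed into one power-weighted spectrum, the power contributed by each vowel is accumulated separately. The frame layout (vowel label, power, spectrum), the function name, and the toy values are assumptions; the patent only states that per-vowel power sums are computed for each segment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical frame layout: (vowel label or None, frame power, channel spectrum).
Frame = Tuple[Optional[str], float, List[float]]


def compress_segment(frames: List[Frame]) -> Tuple[List[float], Dict[str, float]]:
    """Collapse one time segment into a power-weighted spectrum sum.

    Also returns the summed power of each vowel in the segment, which is
    the extra information kept for later speaker adaptation.
    """
    num_channels = len(frames[0][2])
    compressed = [0.0] * num_channels
    vowel_power: Dict[str, float] = {}
    for label, power, spectrum in frames:
        for ch in range(num_channels):
            compressed[ch] += power * spectrum[ch]
        if label is not None:
            vowel_power[label] = vowel_power.get(label, 0.0) + power
    return compressed, vowel_power


# A segment mixing /A/ and /I/, like the seventh segment in the example
# (two channels and made-up numbers for brevity).
segment_7 = [("a", 1.0, [2.0, 4.0]), ("i", 2.0, [1.0, 3.0]), ("i", 1.0, [1.0, 1.0])]
compressed_7, weights_7 = compress_segment(segment_7)
```

Under this reading, storing `weights_7` costs only a few numbers per segment, which fits the stated aim of avoiding a large increase in storage.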

That is, in the case of FIG. 4, a table such as that of FIG. 5 is created showing, for each segment, the weight of each vowel (expressed as large, medium, and small for convenience, although concrete numerical values are actually entered), and this table is stored in the word dictionary. To correct the seventh segment, for example, /A/ is given a small weight and /I/ a medium weight, and the correction is made by applying these weights to the spectral differences obtained as in FIG. 3.
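A minimal sketch of applying such a stored weight table at adaptation time follows. The numeric weights stand in for the "small" and "medium" entries of FIG. 5 and are invented for illustration, as are the function name, the additive form of the correction, and the two-channel toy spectra.

```python
from typing import Dict, List


def correct_segment(compressed: List[float],
                    vowel_weights: Dict[str, float],
                    delta: Dict[str, List[float]]) -> List[float]:
    """Add the weighted sum of the vowel difference spectra to a compressed segment."""
    corrected = list(compressed)
    for vowel, weight in vowel_weights.items():
        if vowel not in delta:
            continue  # no difference spectrum available for this vowel
        for ch, d in enumerate(delta[vowel]):
            corrected[ch] += weight * d
    return corrected


# Hypothetical table row for the seventh segment of /AKAI/:
# /A/ has a small weight and /I/ a medium one (values are made up).
weights_segment_7 = {"a": 0.25, "i": 0.5}

# Per-channel differences A - B for each vowel (made-up values, two channels).
delta = {"a": [1.0, -1.0], "i": [0.5, 2.0]}

compressed_segment_7 = [5.0, 11.0]
adapted_segment_7 = correct_segment(compressed_segment_7, weights_segment_7, delta)
```

If the weights are the per-vowel power sums and the segment is a power-weighted sum of frame spectra, this gives the same result as shifting every vowel frame before compression, which is presumably why the original frames need not be kept.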

If the power sum of each vowel in each time segment of each dictionary word is stored in this way, the correction can be made easily even in such cases.

(6) Effects of the invention

According to the present invention, the word dictionary can easily be adapted to the input speaker by fitting the vowel information, which carries great weight in speech recognition, to that speaker. Speech recognition with a high recognition rate can therefore be performed with a word dictionary created by another person, using a very simple technique.

[Brief description of the drawings]

FIGS. 1 to 3 are configuration diagrams of one embodiment of the present invention, and FIGS. 4 and 5 are explanatory diagrams of another embodiment of the present invention. In the figures, 1 is a microphone, 2 is a vowel spectrum collection unit, 3 is a vowel difference spectrum calculation unit, 4 is a word dictionary, 5 is a dictionary-speaker vowel spectrum holding unit, 6 is a word dictionary correction unit, 7 is a corrected word dictionary, 8 is a microphone, and 9 is a recognition unit.

Patent applicant: Fujitsu Ltd. Agent: Patent Attorney 山谷晧栄

Claims

[Claims]

(1) A speech recognition system using vowel information, comprising a word dictionary, means for holding the vowel information of the creator of the word dictionary, and word dictionary correction means for correcting the word dictionary using vowel information created in advance from the user's voice and the vowel information of the word dictionary creator, wherein the word dictionary is adapted to the user's voice by correcting the vowel information of the word dictionary according to the user's voice.

(2) The speech recognition system using vowel information according to claim 1, wherein data on each vowel contained in each time-compressed segment is stored for each word, and the vowel information is corrected on that basis.