JPS6073697A

JPS6073697A - Preparation of phoneme dictionary

Info

Publication number: JPS6073697A
Application number: JP58181912A
Authority: JP
Inventors: 晋太木村; 奈良　泰弘; 裕二木島; 小林　敦仁
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-09-30
Filing date: 1983-09-30
Publication date: 1985-04-25

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
(1) Technical Field of the Invention
The present invention relates to a method for creating a phoneme dictionary used in a speech recognition device.

(2) Background of the Technology
In general, when speech data is recognized by a speech recognition device, reference data against which the speech data can be compared is required.

A store holding a large amount of such reference data is generally known as a phoneme dictionary.

(3) Prior Art and Its Problems
Conventionally, a phoneme dictionary of this type is created as follows. A group of speech data for dictionary creation is prepared, for example utterances of 100 words. The features of each speech datum are extracted. Then, for each speech datum, the subdivided phoneme symbols and their feature data are written into the dictionary one after another.

However, such a conventional phoneme dictionary is created independently of actual speech recognition. If the amount of speech data used to create it is relatively small, recognition performed with the dictionary tends to produce errors, and speech recognition tends to be inaccurate.

One way to solve this problem would be to increase the amount of speech data for dictionary creation, which would improve the reliability of speech recognition. With this approach, however, the phoneme dictionary itself grows larger. Recognition processing also takes longer, which conflicts with the demand for faster speech recognition.

(4) Object of the Invention
The present invention has been made in view of the above. Its object is to provide a method for creating a phoneme dictionary that allows accurate speech recognition without unnecessarily increasing the amount of speech data for dictionary creation.

(5) Constitution of the Invention
The basic constitution of the present invention has four steps:

- A primary phoneme dictionary is first created from a group of speech data for dictionary creation.
- Each speech datum used to build the primary phoneme dictionary is then recognized with that dictionary, excluding the entries created from the speech datum being recognized.
- Recognition error rules are extracted from the recognition results.
- The error rules are added to the primary phoneme dictionary to form a secondary phoneme dictionary.
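The constitution above amounts to a leave-one-out procedure and can be summarized as follows. The notation is introduced here for illustration only and does not appear in the patent:

- $W$ is the set of registered words and $x_w$ is the feature time series of word $w$.
- $D(w)$ is the set of dictionary entries created from $x_w$.
- $p_w$ is the phoneme labeling of $w$ obtained by phoneme division.
- $\mathrm{Recognize}$ is the recognizer of the speech recognition device.
- $\mathrm{Rules}$ stands for whatever procedure turns label/recognition mismatches into error rules.

```latex
\begin{aligned}
D_1 &= \bigcup_{w \in W} D(w) \\
\hat{p}_w &= \mathrm{Recognize}\bigl(x_w;\ D_1 \setminus D(w)\bigr), \qquad w \in W \\
R &= \mathrm{Rules}\bigl(\{(p_w, \hat{p}_w) : w \in W\}\bigr) \\
D_2 &= D_1 \cup R
\end{aligned}
```

Excluding $D(w)$ is what makes $\hat{p}_w$ informative. If the word's own entries were available, they would usually match exactly, and few errors would surface.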

(6) Embodiment of the Invention
The present invention is described in detail below on the basis of the embodiment shown in the accompanying drawings.

FIG. 1 is a block diagram showing an embodiment of an apparatus for carrying out the method for creating a phoneme dictionary according to the present invention.

In the figure, the speech registration unit A is made up of the following elements:

- 1 is a microphone that picks up the speech data.
- 2 is an input unit. It A/D-converts the speech data detected by the microphone 1 and slices it into segments a few msec wide, converting it into a time series called frame data.
- 3 is a feature extraction unit. It extracts the features of the speech data by computing the speech power spectrum and the like from the frame data of the input unit 2, converting the frame data time series into a feature data time series.
- 4 is a display.
- 5 is a registered-word phoneme string file. As shown in FIG. 2, it stores in advance a predetermined number (for example, 100) of words to be registered, as phoneme strings such as SAPORO.
- 6 is a word speech feature file. As shown in FIG. 3, it stores the feature data time series from the feature extraction unit 3 together with the corresponding word phoneme string from the registered-word phoneme string file 5.
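The front end of the speech registration unit A (input unit 2 and feature extraction unit 3) can be sketched as follows. This is a minimal sketch under several assumptions:

- The signal is already digitized; A/D conversion is not shown.
- Frames do not overlap.
- At an assumed 8 kHz sampling rate, an 80-sample frame is 10 msec wide.
- A plain DFT power spectrum stands in for the "power spectrum and the like" of the text.

The function names are hypothetical.

```python
import cmath
import math


def slice_frames(samples, frame_len, hop):
    """Slice a digitized signal into fixed-width frames (the 'frame data' time series).

    frame_len and hop are in samples; at an assumed 8 kHz rate,
    frame_len=80 corresponds to a 10 msec frame.
    """
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 for k = 0 .. N/2 (no windowing)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def extract_features(samples, frame_len=80, hop=80):
    """Convert the frame time series into a feature time series (one spectrum per frame)."""
    return [power_spectrum(frame) for frame in slice_frames(samples, frame_len, hop)]


# Usage: a sinusoid at DFT bin 5, three frames long.
features = extract_features([math.sin(2 * math.pi * 5 * t / 80) for t in range(240)])
```

A naive DFT keeps the sketch dependency-free. A practical front end would use an FFT and usually a window function.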

Reference numeral 7 denotes a phoneme division unit. It divides the feature data time series into phonemes according to the word's phoneme string in the word speech feature file 6. For example, as shown in FIG. 4, it assigns one character of the word's phoneme string (for example, OSAKA) to each interval between division points of the power P of the feature data time series.

Reference numeral 8 denotes a phoneme labeling unit. As shown in FIG. 4, it attaches a phoneme symbol to each divided phoneme obtained by the phoneme division unit 7. It then outputs:

- the phoneme symbol;
- the feature data corresponding to that symbol;
- the word's phoneme string;
- phoneme division information, which indicates which range of the feature data each divided phoneme occupies.

Reference numeral 9 denotes a primary phoneme dictionary. As shown in FIG. 5, it stores the feature data together with the phoneme symbols and the word phoneme strings. The phoneme division unit 7, the phoneme labeling unit 8, and the primary phoneme dictionary 9 together constitute a primary phoneme dictionary creation unit B. The phoneme division information is stored in the word speech feature file 6, as shown in FIG. 3.
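The division and labeling steps (units 7 and 8) can be illustrated with a short sketch. The patent does not say how the division points are chosen. Purely for illustration, this example assumes that the len(word) - 1 boundaries are placed at the interior frames with the lowest power P. Each segment's (start, end) range plays the role of the phoneme division information. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhonemeSegment:
    symbol: str  # one character of the word's phoneme string
    start: int   # first frame of the segment (inclusive)
    end: int     # frame after the last frame of the segment (exclusive)
    word: str    # phoneme string of the whole word, e.g. "OSAKA"


def divide_phonemes(power, word):
    """Split a frame sequence into len(word) segments, one per phoneme character.

    Hypothetical heuristic: the len(word) - 1 boundaries are the interior
    frames with the lowest power.
    """
    n = len(word)
    if not 1 <= n <= len(power):
        raise ValueError("need at least one frame per phoneme")
    interior = range(1, len(power))
    cuts = sorted(sorted(interior, key=lambda i: power[i])[: n - 1])
    edges = [0] + cuts + [len(power)]
    return [PhonemeSegment(sym, edges[i], edges[i + 1], word) for i, sym in enumerate(word)]


def label_features(segments, features):
    """Pair each phoneme symbol with its feature frames, word, and division range."""
    return [(seg.symbol, features[seg.start:seg.end], seg.word, (seg.start, seg.end))
            for seg in segments]


# Usage: eight frames with power dips at frames 2 and 5, word "OSA".
segments = divide_phonemes([5, 5, 1, 5, 5, 0, 5, 5], "OSA")
entries = label_features(segments, list(range(8)))
```

Real systems typically use more robust boundary detection or forced alignment. The low-power heuristic is only a placeholder for the patent's unspecified division points.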

Reference symbol C denotes a secondary phoneme dictionary creation unit. It consists of a phoneme recognition unit 10, an error rule extraction unit 11, a phoneme label addition unit 12, and a secondary phoneme dictionary 13.

The phoneme recognition unit 10 uses the primary phoneme dictionary 9 to recognize the feature data time series of each word phoneme string in the word speech feature file 6. Its recognition method is the same as the one used in the speech recognition device, for example a dissimilarity calculation method or a similarity calculation method. However, when a given word is being recognized, the primary phoneme dictionary entries created from that word's own feature data are not used.

As shown in FIG. 6, the error rule extraction unit 11 compares the phoneme division information from the word speech feature file 6 with the phoneme recognition result from the phoneme recognition unit 10. From this comparison it extracts rules describing where recognition errors occur.

The phoneme label addition unit 12 then adds the error rules obtained from the error rule extraction unit 11 to the primary phoneme dictionary 9, creating the secondary phoneme dictionary 13, as shown in FIGS. 6 and 7.
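The work of the phoneme recognition unit 10 and the error rule extraction unit 11 can be sketched as leave-one-word-out nearest-neighbour recognition followed by confusion counting. This is a minimal sketch under the following assumptions:

- Each dictionary entry is a dict holding a phoneme symbol, one fixed-length feature vector (for example, a segment-averaged spectrum), and the word it came from.
- Euclidean distance stands in for the "dissimilarity calculation".
- The labels produced by phoneme division serve as ground truth.
- An "error rule" is represented as a count of (correct, recognized) phoneme pairs. The patent does not specify the rule format.

```python
import math
from collections import Counter


def dissimilarity(a, b):
    """Euclidean distance, standing in for the patent's 'dissimilarity calculation'."""
    return math.dist(a, b)


def recognize(vector, dictionary, exclude_word):
    """Nearest-neighbour phoneme recognition that skips entries built from exclude_word."""
    candidates = [e for e in dictionary if e["word"] != exclude_word]
    best = min(candidates, key=lambda e: dissimilarity(vector, e["feature"]))
    return best["symbol"]


def extract_error_rules(dictionary):
    """Recognize every entry without its own word's entries and count confusions.

    Returns a Counter mapping (correct symbol, recognized symbol) -> occurrences.
    """
    rules = Counter()
    for entry in dictionary:
        got = recognize(entry["feature"], dictionary, exclude_word=entry["word"])
        if got != entry["symbol"]:
            rules[(entry["symbol"], got)] += 1
    return rules


# Usage: two hypothetical words whose [A] and [I] feature vectors lie close together.
primary = [
    {"symbol": "A", "feature": (0.0, 0.0), "word": "AKA"},
    {"symbol": "K", "feature": (5.0, 5.0), "word": "AKA"},
    {"symbol": "I", "feature": (1.0, 0.0), "word": "IKI"},
    {"symbol": "K", "feature": (5.0, 5.2), "word": "IKI"},
]
rules = extract_error_rules(primary)
```

Recognizing each word without its own entries is what exposes the confusion between [A] and [I] here. With its own entries present, every vector would match itself exactly.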

To create a phoneme dictionary with the above apparatus, the contents of the registered-word phoneme string file 5 are shown one after another on the display 4. The speaker utters each displayed word into the microphone 1.

Each uttered word is first stored in the word speech feature file 6, together with the registered word, as speech data for dictionary creation. Speech registration is complete at this stage.

Next, the contents of the word speech feature file 6 pass through the phoneme division unit 7 and the phoneme labeling unit 8 and are stored in the primary phoneme dictionary 9, which creates the primary phoneme dictionary 9.

The phoneme recognition unit 10 then performs phoneme recognition on the speech data using the primary phoneme dictionary 9. The dictionary entries derived from the speech data being recognized are not used, so the primary phoneme dictionary 9 contains no entry that matches the speech data exactly. Phoneme recognition of the speech data is therefore prone to a certain amount of error. Errors arise in particular for speech data containing error-prone phonemes, such as the example shown in FIG. 6.

The error rule extraction unit 11 extracts rules describing these errors. An example is the rule that [A] is easily misrecognized as [I], and [I] as [E]. The phoneme label addition unit 12 stores these rules, together with the primary phoneme dictionary 9, in the secondary phoneme dictionary 13. The secondary phoneme dictionary 13 is created at this stage.
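The storage step of the phoneme label addition unit 12 can be sketched as a small data structure. The patent does not describe the format of FIG. 7, so this sketch makes three assumptions:

- The error rules arrive as counts of (correct, recognized) phoneme pairs.
- Only rules seen at least a hypothetical min_count times are kept.
- Each kept rule is stored under the recognized symbol, as a phoneme that symbol may really have been.

The function name is hypothetical.

```python
from collections import Counter


def build_secondary_dictionary(primary_entries, error_rules, min_count=1):
    """Combine the primary entries with learned error labels.

    error_rules maps (correct, recognized) -> count. A rule observed at
    least min_count times is recorded under the recognized symbol, so that
    confusions[recognized] is the set of phonemes it is often mistaken for.
    """
    confusions = {}
    for (correct, recognized), count in error_rules.items():
        if count >= min_count:
            confusions.setdefault(recognized, set()).add(correct)
    return {"entries": list(primary_entries), "confusions": confusions}


# Usage: [A] -> [I] seen three times, [I] -> [E] only once (dropped at min_count=2).
secondary = build_secondary_dictionary(
    [{"symbol": "A", "feature": (0.0, 0.0), "word": "AKA"}],
    Counter({("A", "I"): 3, ("I", "E"): 1}),
    min_count=2,
)
```

The threshold is a design choice, not something the text prescribes. It keeps one-off accidents from being stored as rules.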

The secondary phoneme dictionary 13 created in this way has automatically learned the phoneme error rules. When it is used as the phoneme dictionary of a speech recognition device, recognition can take the error tendencies of the phonemes into account, and speech recognition becomes correspondingly more accurate.
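The text does not specify how a recognizer consults the learned error rules. One plausible use is a weighted Levenshtein edit distance between the recognized phoneme string and each registered word. It is shown here only as an illustration, not as the patent's method. A substitution covered by an error rule costs less (a hypothetical 0.5) than an arbitrary substitution (1.0). The confusions mapping and the function names are assumptions.

```python
def weighted_edit_distance(recognized, word, confusions, confusion_cost=0.5):
    """Levenshtein distance in which substitutions matching a learned error rule are cheaper.

    confusions maps a recognized symbol to the set of true symbols it is
    often mistaken for, e.g. {"I": {"A"}}.
    """
    m, n = len(recognized), len(word)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)  # deletions only
    for j in range(1, n + 1):
        d[0][j] = float(j)  # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r, w = recognized[i - 1], word[j - 1]
            if r == w:
                sub = 0.0
            elif w in confusions.get(r, ()):
                sub = confusion_cost  # a known error tendency: penalize less
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[m][n]


def recognize_word(recognized, lexicon, confusions):
    """Return the registered word closest to the recognized phoneme string."""
    return min(lexicon, key=lambda w: weighted_edit_distance(recognized, w, confusions))


# Usage: without rules KORI and KARI tie; with "[A] is recognized as [I]", KARI wins.
plain = recognize_word("KIRI", ["KORI", "KARI"], {})
with_rules = recognize_word("KIRI", ["KORI", "KARI"], {"I": {"A"}})
```

With the rule that [A] tends to be recognized as [I], the output KIRI is matched to KARI instead of tying with KORI.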

The specific apparatus for carrying out the method for creating a phoneme dictionary according to the present invention is not limited to the one shown in the above embodiment. Its design may be modified as appropriate.

(7) Effects of the Invention
As described in detail above, the method for creating a phoneme dictionary according to the present invention produces a phoneme dictionary that has learned phoneme error rules. The error tendencies of phonemes can therefore be known during speech recognition, which makes speech recognition correspondingly more accurate.

In addition, the invention does not require more speech data for dictionary creation. It therefore avoids unnecessary growth of the phoneme dictionary and does not conflict with the demand for faster speech recognition processing.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of an apparatus for carrying out the method for creating a phoneme dictionary according to the present invention.
FIG. 2 is an explanatory diagram showing the contents of the registered-word phoneme string file.
FIG. 3 is an explanatory diagram showing the contents of the word speech feature file.
FIG. 4 is an explanatory diagram showing the operation of the phoneme division unit and the phoneme labeling unit.
FIG. 5 is an explanatory diagram showing the contents of the primary phoneme dictionary.
FIG. 6 is an explanatory diagram showing the operation of the error rule extraction unit.
FIG. 7 is an explanatory diagram showing the contents of the secondary phoneme dictionary.

1 ... microphone
2 ... input unit
3 ... feature extraction unit
4 ... display
5 ... registered-word phoneme string file
6 ... word speech feature file
7 ... phoneme division unit
8 ... phoneme labeling unit
9 ... primary phoneme dictionary
10 ... phoneme recognition unit
11 ... error rule extraction unit
12 ... phoneme label addition unit
13 ... secondary phoneme dictionary

Claims


A method for creating a phoneme dictionary used in a speech recognition device, characterized in that:

- a primary phoneme dictionary is first created from a group of speech data for dictionary creation;
- each speech datum used to create the primary phoneme dictionary is then recognized using the primary phoneme dictionary, excluding the entries based on the speech datum being recognized;
- recognition error rules are extracted from the recognition results; and
- the error rules are added to the primary phoneme dictionary to form a secondary phoneme dictionary.