JPH0719158B2

JPH0719158B2 - Speaker adaptation method of phoneme transformation rule

Info

Publication number: JPH0719158B2
Application number: JP60244436A
Authority: JP
Inventors: 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1985-10-31
Filing date: 1985-10-31
Publication date: 1995-03-06
Anticipated expiration: 2010-03-06
Also published as: JPS62103698A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Outline]

The invention concerns a phoneme recognition device that, to cope with phoneme transformation in continuous speech, creates phoneme transition networks for at least sentences, words, or morphemes using phoneme transformation rules given in advance. The speaker-specific phoneme transformation rule dictionaries of a large number of speakers are connected to the phoneme network creation unit, and only the learning data are registered in the word dictionary. When an unknown speaker's utterances of the learning data are recognized, the frequency of use of the phoneme transformation rules used at that time is counted separately for each of the many speakers' dictionaries. The dictionaries of the specific speakers with a high frequency of use are then integrated to form a phoneme transformation rule dictionary specific to the learning speaker.

[Industrial application field]

The present invention relates to a speech recognition device, and more particularly to a speaker adaptation method for phoneme transformation rules that learns the rules specific to each speaker without requiring a huge amount of speech data to be registered for rule learning.

In phonetics, a phoneme transformation rule describes how the phonemes that make up an utterance are transformed when a given speaker pronounces them. Some rules are common to everyone, and some are specific to an individual speaker. No method for automatically learning the speaker-specific rules has been developed so far, so the designer of the speech recognition device extracts them by visual inspection from the learning (registered) speech data of each unknown speaker.

Building a single phoneme transformation rule dictionary that covers unspecified speakers is also conceivable, but it is not practical: the number of rules is large and the processing time becomes enormous. In addition, some rules that help one speaker are harmful for another, so recognition performance is usually inferior to that obtained with a personal rule set.

At present, it takes a long time before a high-performance, high-speed speech recognition device becomes usable for a given speaker. This inconveniences the user and imposes a heavy workload on the designer, and it is a major obstacle to the spread of speech recognition devices. A method that efficiently generates phoneme transformation rules for an unknown speaker has therefore come to be required.

[Prior art and problems to be solved by the invention]

FIG. 3 shows an example of a conventional, general speech recognition device.

First, the phoneme transformation rules stored in the common phoneme transformation rule dictionary and in the specific phoneme transformation rule dictionary are described.

As stated above, a phoneme transformation rule describes how the sounds that make up an utterance are transformed when a given speaker pronounces them. There are two kinds: rules common to all speakers, and rules specific to an individual speaker.

The rules common to all speakers are widely known from the phonetics of each language (Japanese, English, and so on), whereas the rules specific to a speaker have to be learned from speech data actually uttered by that speaker.

* Examples of rules common to speakers:

i The vowels /i/ and /u/ may be devoiced when they occur between voiceless consonants.

ii Word-medial /G/ may be nasalized (realized as a velar nasal).

iii A vowel adjacent to a nasal consonant may be nasalized.

iv A consonant followed by the vowel /i/ is palatalized.

* Examples of rules specific to a speaker:

v The vowel /u/ following a consonant articulated at the lips is rounded.

vi /G/ is not nasalized.

vii Nasals are not palatalized.

And so on.

Within the speech recognition device, the speaker-specific phoneme transformation rules take precedence over the rules common to all speakers. In the examples above, for instance, rule vi takes precedence and rule ii is ignored.
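The precedence described above can be pictured with a minimal sketch. The patent does not specify how rules are stored, so the dictionary keys, the value strings, and the function name `effective_rules` are all hypothetical; the only point illustrated is that a speaker-specific entry replaces a conflicting common entry.

```python
def effective_rules(common, specific):
    """Merge two rule tables; speaker-specific entries override common ones."""
    merged = dict(common)    # copy so the common dictionary is left untouched
    merged.update(specific)  # a specific rule wins wherever both define a key
    return merged


# Hypothetical encodings of common rules ii and iii and specific rule vi.
COMMON = {
    "medial_G_nasalization": "may apply",
    "nasal_adjacent_vowel_nasalization": "may apply",
}
SPECIFIC = {"medial_G_nasalization": "never applies"}

rules = effective_rules(COMMON, SPECIFIC)
```

Here `rules["medial_G_nasalization"]` comes from the speaker-specific table, which corresponds to rule vi suppressing rule ii.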

Next, the method of learning the phoneme transformation rules is described.

Before it can recognize speech, the speech recognition device must learn the phoneme transformation rules specific to the person speaking. Conventionally, the learning speech data uttered by an unknown speaker were examined by eye, the rules specific to that speaker were extracted, and the specific phoneme transformation rule dictionary was created by hand.

Next, the recognition procedure in such a speech recognition device is described.

Once the phoneme transformation rules are in place, the device proceeds as follows. Taking a device that recognizes prefecture names as an example, each character string (that is, each prefecture name) is read from the word dictionary, which stores all the prefecture names, and a phoneme transition network such as that shown in FIG. 4 is created on the basis of the common phoneme transformation rule dictionary and the specific phoneme transformation rule dictionary.

The figure uses an ordinary word as its example: it shows that applying the phoneme transformation rules to the character string 'ASITAGA' (明日が, "tomorrow" followed by the particle ga) yields the phoneme transition network illustrated.

The phoneme transition network creation unit of FIG. 3 creates a phoneme transition network for every character string contained in the word dictionary by applying the phoneme transformation rules described above.
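As a rough illustration of network creation, the sketch below applies two of the common rules (i and ii) to 'ASITAGA'. FIG. 4 is not reproduced in this text, so the resulting network is only an assumption about its general shape. The rule functions, the `VOICELESS` set, the lowercase marker for a devoiced vowel, and the symbol `NG` for the nasalized variant are all hypothetical choices made for this example.

```python
from itertools import product

VOICELESS = {"K", "S", "T", "H", "P"}  # assumed set of voiceless consonants


def devoice_high_vowel(phonemes, i):
    """Rule i: /I/ or /U/ between voiceless consonants may be devoiced."""
    inside = 0 < i < len(phonemes) - 1
    if (inside and phonemes[i] in {"I", "U"}
            and phonemes[i - 1] in VOICELESS
            and phonemes[i + 1] in VOICELESS):
        return phonemes[i].lower()  # lowercase marks the devoiced variant
    return None


def nasalize_medial_g(phonemes, i):
    """Rule ii: word-medial /G/ may be nasalized."""
    if phonemes[i] == "G" and i > 0:
        return "NG"
    return None


def build_network(phonemes, rules):
    """Return one slot per phoneme, each listing the allowed realizations."""
    network = []
    for i, canonical in enumerate(phonemes):
        slot = [canonical]
        for rule in rules:
            variant = rule(phonemes, i)
            if variant is not None and variant not in slot:
                slot.append(variant)
        network.append(slot)
    return network


def all_paths(network):
    """Enumerate every phoneme sequence the network accepts."""
    return list(product(*network))


network = build_network(list("ASITAGA"), [devoice_high_vowel, nasalize_medial_g])
paths = all_paths(network)
```

With these two rules the third slot offers a voiced and a devoiced /I/ and the sixth slot offers /G/ and its nasalized variant, so the network has four paths. A real device would use a general graph rather than this slot-per-phoneme simplification.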

The phoneme transition networks created in this way are sent to the phoneme transition network verification unit. The speech of a specific speaker, input from the microphone, is decomposed into phonemes by the speech analysis unit, and the verification unit checks which path of the phoneme transition networks it matches. If the input is accepted, the character string of the word corresponding to the matching network is read from the word dictionary and output as the recognition result.
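The verification step can be sketched as a lookup over per-word networks. This is a simplified illustration that assumes each network is a list of slots, where each slot lists the realizations allowed at that position; the patent does not fix a representation, and the names `accepts`, `recognize`, and `NETWORKS` are hypothetical.

```python
def accepts(network, observed):
    """True if the observed phoneme sequence follows some path of the network."""
    return len(observed) == len(network) and all(
        phoneme in slot for phoneme, slot in zip(observed, network)
    )


def recognize(observed, networks):
    """Return the word whose network accepts the input, or None if none does."""
    for word, network in networks.items():
        if accepts(network, observed):
            return word
    return None


# Hypothetical networks; lowercase "i" stands for a devoiced vowel and "NG"
# for a nasalized /G/.
NETWORKS = {
    "ASITAGA": [["A"], ["S"], ["I", "i"], ["T"], ["A"], ["G", "NG"], ["A"]],
    "ASITA": [["A"], ["S"], ["I", "i"], ["T"], ["A"]],
}

result = recognize(["A", "S", "i", "T", "A", "NG", "A"], NETWORKS)
```

An input that matches no path returns `None`, which corresponds to the input not being accepted.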

In the conventional method, therefore, the speaker-specific phoneme transformation rule dictionary was built by hand: the speech data of each speaker were examined and the rules specific to that speaker were extracted by visual inspection. The effort needed to create the specific phoneme transformation rule dictionary was enormous, and this was a major obstacle to the spread of speech recognition devices.

In view of these drawbacks, the object of the present invention is to provide a method of automatically learning the phoneme transformation rules specific to each speaker, and thereby to make a practical speech recognition device possible.

[Means for solving problems]

FIG. 1 is a block diagram showing the principle of the speaker adaptation method for phoneme transformation rules according to the present invention.

In the present invention, the phoneme transformation rule dictionaries of a large number of speakers (N speakers) are all used together. An unknown speaker is given a plurality of rule-learning speech data items whose content is known (stored in the word dictionary) and is asked to utter them. While the phoneme transition network verification unit recognizes these utterances, the frequency of use of the phoneme transformation rules is counted separately for each specific speaker's dictionary. The several to a dozen or so specific-speaker dictionaries with the highest frequency of use are then integrated to form the phoneme transformation rule dictionary for the unknown speaker.

[Action]

That is, the present invention applies to a phoneme recognition device that, to cope with phoneme transformation in continuous speech, creates phoneme transition networks for at least sentences, words, or morphemes using phoneme transformation rules given in advance. The speaker-specific phoneme transformation rule dictionaries of a large number of speakers are connected to the phoneme network creation unit, and only the learning data are registered in the word dictionary. When an unknown speaker's utterances of the learning data are recognized, the frequency of use of the phoneme transformation rules used at that time is counted for each of the many speakers' dictionaries, and the dictionaries of the specific speakers with a high frequency of use are integrated to form a phoneme transformation rule dictionary specific to the learning speaker. As a result, speaker-specific phoneme transformation rules can be learned automatically from a relatively small amount of registered speech data, and a practical speech recognition device can be provided.

[Example]

An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 2 is a block diagram showing one embodiment of the present invention. The dictionary usage frequency counter, the dictionary selection switch control unit, and the dictionary selection switch (SW) are the functional blocks needed to implement the invention. The same reference numerals denote the same items throughout the drawings.

The speaker adaptation method for phoneme transformation rules according to the present invention is described below using FIG. 2, with reference to FIG. 1.

First, to learn the phoneme transformation rules specific to a speaker, the speaker-specific phoneme transformation rule dictionaries of the many speakers are connected to the phoneme transition network creation unit, only the words of the learning data are registered in the word dictionary, and the learning data are recognized.

When learning is carried out with words, recognition is performed one word at a time. The phoneme transition network for each word has already been generated by the phoneme transition network creation unit. The word is uttered into the microphone, analyzed into phonemes by the speech analysis unit, and input to the phoneme transition network verification unit, where it is accepted along one of the paths of the network.

If the utterance is not accepted, a new rule is needed. Either a phoneme transformation rule is registered manually, as in the conventional method, or that learning data item is ignored and processing moves on to recognition of the next item.

In the present invention, a dictionary usage frequency counter is provided for each speaker's specific phoneme transformation rule dictionary. When a phoneme transformation rule involved in creating the accepted path of the phoneme transition network was taken from a particular speaker's dictionary, the counter corresponding to that dictionary is incremented.

When the same phoneme transformation rule is present in the specific dictionaries of several speakers, the counters of all the speakers' dictionaries that contain the rule are incremented.
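The counting behaviour can be sketched as follows. This is a minimal illustration under assumed data structures: each speaker's dictionary is modelled as a set of rule identifiers, and the identifiers, speaker names, and function name `count_dictionary_usage` are hypothetical.

```python
from collections import Counter


def count_dictionary_usage(used_rules, dictionaries, counters):
    """Increment the counter of every dictionary that contains a used rule.

    used_rules   -- rules involved in creating the accepted network path
    dictionaries -- mapping from speaker name to that speaker's set of rules
    counters     -- Counter keyed by speaker name, updated in place
    """
    for rule in used_rules:
        for speaker, rules in dictionaries.items():
            if rule in rules:
                counters[speaker] += 1  # a shared rule credits every owner
    return counters


# Hypothetical rule identifiers loosely named after rules v to vii.
DICTIONARIES = {
    "speaker_1": {"G_not_nasalized", "U_rounded_after_labial"},
    "speaker_2": {"G_not_nasalized"},
    "speaker_3": {"nasal_not_palatalized"},
}

counters = Counter()
count_dictionary_usage({"G_not_nasalized"}, DICTIONARIES, counters)
count_dictionary_usage({"U_rounded_after_labial"}, DICTIONARIES, counters)
```

The first learning word credits both speakers whose dictionaries contain the shared rule, and the second credits only `speaker_1`.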

When all the learning data have been processed, the count values of the dictionary usage frequency counters provided for all the speakers' phoneme transformation rule dictionaries are consulted. The M speaker-specific dictionaries with the largest counts are integrated by operating the dictionary selection switch (SW) under a control signal from the dictionary selection switch control unit, and the result serves as the specific phoneme transformation rule dictionary of the learning speaker. The value of M may be set in advance by the device designer, or specified by the operator with a keyboard, a switch, or the like.
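The selection and integration step might look like the sketch below. It assumes that each dictionary is a set of rule identifiers and that integration is a plain set union. The text does not say how conflicting rules from different dictionaries would be reconciled, so that question is left out here; the function name and data are hypothetical.

```python
def integrate_top_dictionaries(counters, dictionaries, m):
    """Pick the m most frequently used dictionaries and merge their rules."""
    ranked = sorted(dictionaries, key=lambda s: counters.get(s, 0), reverse=True)
    selected = ranked[:m]
    merged = set()
    for speaker in selected:
        merged |= dictionaries[speaker]  # assumed integration: set union
    return selected, merged


# Hypothetical counts and dictionaries.
COUNTS = {"speaker_1": 5, "speaker_2": 3, "speaker_3": 0}
DICTIONARIES = {
    "speaker_1": {"G_not_nasalized", "U_rounded_after_labial"},
    "speaker_2": {"G_not_nasalized", "nasal_not_palatalized"},
    "speaker_3": {"some_other_rule"},
}

selected, adapted = integrate_top_dictionaries(COUNTS, DICTIONARIES, m=2)
```

With `m=2` the adapted dictionary contains the rules of the two most used dictionaries only. In the device described here, this choice is realized by the dictionary selection switch rather than by copying rules.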

Once the learning that creates the phoneme transformation rules for the unknown speaker is complete, normal speech recognition is carried out using the phoneme transformation rule dictionary integrated as described above.

The characteristic feature of the present invention is thus as follows. When learning data whose content is known in advance are uttered by an unknown speaker and recognized, the device counts how often each of the many specific-speaker phoneme transformation rule dictionaries is used, then selects and integrates the frequently used dictionaries to form the phoneme transformation rule dictionary of that unknown speaker.

[Effect of the invention]

As described in detail above, the speaker adaptation method for phoneme transformation rules according to the present invention applies to a speech recognition device that, to cope with phoneme transformation in continuous speech, creates phoneme transition networks for at least sentences, words, or morphemes using phoneme transformation rules given in advance. The speaker-specific phoneme transformation rule dictionaries of a large number of speakers are connected to the phoneme network creation unit, and only the learning data are registered in the word dictionary. When an unknown speaker's utterances of the learning data are recognized, the frequency of use of the phoneme transformation rules used at that time is counted for each of the many speakers' dictionaries, and the dictionaries of the specific speakers with a high frequency of use are integrated to form a phoneme transformation rule dictionary specific to the learning speaker. Speaker-specific phoneme transformation rules can therefore be learned automatically from a relatively small amount of registered speech data, and a practical speech recognition device can be provided.

[Brief description of drawings]

FIG. 1 is a block diagram showing the principle of the speaker adaptation method for phoneme transformation rules according to the present invention; FIG. 2 is a block diagram showing one embodiment of the present invention; FIG. 3 shows an example of a general speech recognition device; and FIG. 4 shows an example of a phoneme transition network. The reference numerals in the drawings denote, respectively, the microphone, the speech analysis unit, the phoneme transition network verification unit, the phoneme transition network creation unit, the specific phoneme transformation rule dictionaries, the common phoneme transformation rule dictionary, the word dictionary, the dictionary selection switch (SW), the dictionary selection switch control unit, and the dictionary usage frequency counter.

Claims

[Claims]

1. A speaker adaptation method for phoneme transformation rules in a speech recognition device that, to deal with phoneme transformation in continuous speech, creates a phoneme transition network for at least sentences, words, or morphemes using phoneme transformation rules given in advance, wherein: all of the phoneme transformation rule dictionaries for a large number of specific speakers are used to recognize a plurality of rule-learning speech data items, uttered by an unknown speaker, whose utterance content is known; the frequency of use of the phoneme transformation rules used at that time is counted by a dictionary usage frequency counter for each specific speaker's phoneme transformation rule dictionary; and the several to a dozen or so specific-speaker phoneme transformation rule dictionaries with the highest frequency of use are integrated to form the phoneme transformation rule dictionary for the unknown speaker.