JPS62209598A

JPS62209598A - Word voice recognition processing system

Info

Publication number: JPS62209598A
Application number: JP61052807A
Authority: JP
Inventors: 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-03-11
Filing date: 1986-03-11
Publication date: 1987-09-14

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] An improved processing method for speech recognition of a large number of words. A first word recognition unit selects one or more candidate words by comparing the input speech pattern with the word speech patterns in a dictionary. A second word recognition unit, which uses a dictionary of word phonetic symbol networks, executes recognition processing with its dictionary reference range limited to those candidate words only. With this method, the recognition processing of the second word recognition unit, which has a relatively high recognition rate, can be executed at high speed because the dictionary reference range is limited.

[Industrial application field]

The present invention relates to a processing method for recognizing the speech of a large number of words. Methods of operating document creation devices, and devices that use many kinds of commands, by voice are about to be put into practical use. For such devices to become practical, an economical configuration is needed that can recognize a relatively large variety of spoken words at relatively high speed in real time.

[Conventional technology]

Two known methods are in practical use for recognizing spoken words.

In the first method, a word dictionary is prepared in advance that pairs speech pattern information for every required word (for example, spectral information of the word's speech at suitable time intervals) with the word that the speech represents.

In speech recognition processing, the input speech pattern is compared with the information in the word dictionary, a distance indicating the degree of agreement between the two patterns is computed for each registered word, and the word with the smallest distance is taken as the recognition result.

Matching by dynamic programming, which corrects for the expansion and contraction of the time axis of an utterance when the two patterns are compared, and similar techniques are known as improvements to this method.
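The first method with dynamic-programming matching can be sketched roughly as follows. This is a minimal illustration, not the circuit the publication refers to: the names `frame_distance`, `dtw_distance`, and `recognize`, the city-block local distance, the unconstrained step pattern, and the toy one-dimensional frames are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. spectral values) per time step


def frame_distance(a: Frame, b: Frame) -> float:
    """City-block distance between two frames (an assumed local distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(pattern: List[Frame], template: List[Frame]) -> float:
    """Accumulated distance along the best time alignment of two patterns."""
    inf = float("inf")
    n, m = len(pattern), len(template)
    # cost[i][j] = best accumulated distance aligning the first i and j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = frame_distance(pattern[i - 1], template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeated against same template frame
                cost[i][j - 1],      # template frame repeated against same input frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def recognize(pattern: List[Frame], dictionary: Dict[str, List[Frame]]) -> str:
    """Return the registered word whose template is closest to the input."""
    return min(dictionary, key=lambda word: dtw_distance(pattern, dictionary[word]))


# Toy dictionary with one-dimensional frames (made-up values).
toy_dictionary = {
    "yes": [[1.0], [3.0], [1.0]],
    "no": [[5.0], [5.0], [2.0]],
}
# A slower utterance of "yes": each frame is held for two time steps.
slow_yes = [[1.0], [1.0], [3.0], [3.0], [1.0], [1.0]]
```

In this toy case the slower utterance still aligns with the "yes" template, because the dynamic programming step is allowed to repeat template frames; that is the time-axis correction described above.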

In the second method, a word dictionary is prepared whose dictionary information is a phonetic symbol network, built from the phonetic symbols of each word using phonetic knowledge.

In speech recognition processing, the input speech pattern is compared with the information in the word dictionary: for each phonetic symbol string that passes through some branch of the phonetic symbol network, the distance between the input speech pattern and the speech patterns corresponding to its phonetic symbols is computed, and the word giving the smallest distance is taken as the result.

Here, as is well known, a phonetic symbol network is the following. When, for example, 「明日が」 is treated as one word, the network starts from its pronunciation written in romaji, "ASITAGA", and represents the forms that appear in actual pronunciation, derived from so-called phonetic knowledge, as a network conceptually structured, for example, as follows.

As a result, in the example above, four kinds of phonetic symbol strings can be processed systematically as comparison targets for the word 「明日が」.

In this example, the phonetic knowledge used to construct the phonetic symbol network is that, in Japanese, SI becomes SHI; that the I between the voiceless sounds SH and T may be devoiced; that a silent pause is inserted before T; and that G may be nasalized and pronounced NG.
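The four phonetic rules listed for this example can be sketched as a small network and expanded into its phonetic symbol strings. This is only an illustration: the slot-per-position layout, the name `expand`, and the notations `i_` (devoiced I) and `_` (silent pause) are hypothetical, and a real phonetic symbol network may be a more general graph than this linear sequence of alternatives.

```python
from itertools import product
from typing import List

# Each slot lists the alternative symbols allowed at that position.
# "i_" (devoiced I) and "_" (silent pause) are hypothetical notations.
ASITAGA_NETWORK: List[List[str]] = [
    ["A"],
    ["SH"],        # SI is pronounced SHI
    ["I", "i_"],   # the I between SH and T may be devoiced
    ["_"],         # a silent pause is inserted before T
    ["T"],
    ["A"],
    ["G", "NG"],   # G may be nasalized and pronounced NG
    ["A"],
]


def expand(network: List[List[str]]) -> List[str]:
    """Enumerate every phonetic symbol string that passes through the network."""
    return [" ".join(path) for path in product(*network)]


strings = expand(ASITAGA_NETWORK)
for s in strings:
    print(s)
```

With two optional alternations (devoicing of I, nasalization of G), the expansion gives 2 x 2 = 4 strings, which agrees with the four kinds of phonetic symbol strings mentioned in the text.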

Because this comparison processing for recognition is relatively complex and requires so-called artificial-intelligence techniques, the island-driven method, the LEFT-TO-RIGHT method, matching by dynamic programming, and other methods are known as ways of executing it efficiently.

[Problem that the invention seeks to solve]

For the first method, large-scale integrated circuits for recognizing on the order of several hundred spoken words have been developed and high-speed processing is possible, so when the number of words to be recognized is large, comparable high-speed processing may be achievable by using these circuits in parallel.

However, because the first method relies only on comparison with the individually registered word speech patterns, an utterance that deviates from those patterns, for example through devoicing, cannot be recognized unless that variant is registered separately. In addition, as the vocabulary grows it contains many different words with very similar pronunciations, and such words are easily misrecognized. For these reasons, a high recognition rate cannot be expected.

As is well known, the second method can be expected to achieve a high recognition rate, because phonetic knowledge is used to create a dictionary that covers the variations of an utterance and is also used in the matching processing against the input speech pattern.

However, because the processing is complex, the processing time increases rapidly as the number of words to be recognized grows and the amount of dictionary information to be referenced increases.

For the reasons given above, there has conventionally been no suitable method for recognizing a relatively large variety of spoken words in real time with a high recognition rate.

[Means for solving problems]

FIG. 1 is a block diagram showing the configuration of the present invention.

In the figure, 1 is the first word recognition unit, 2 is the first word dictionary, 3 is the second word recognition unit, 4 is the second word dictionary, and 5 is the word dictionary restriction unit.

[Operation] The first word dictionary 2 is a dictionary in which the speech pattern of each word is registered, and the first word recognition unit 1 uses the first word dictionary 2 to execute speech recognition processing by the first method described above on the input speech pattern from signal line 6.

However, instead of deciding on one word as the recognition result, it selects a predetermined number of candidate words (more than one) as preprocessing for the second word recognition unit 3, and notifies the word dictionary restriction unit 5 of them.
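Selecting a predetermined number of candidates instead of a single best word can be sketched as a top-N selection over the distances computed by the first stage. The function name `select_candidates`, the example words, and the distance values are made up for illustration; the publication does not specify how the selection is implemented.

```python
import heapq
from typing import Dict, List


def select_candidates(distances: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the smallest distances, best first."""
    return heapq.nsmallest(n, distances, key=distances.__getitem__)


# Hypothetical first-stage distances for one input utterance.
first_stage_distances = {
    "ashita": 2.1,
    "ashida": 2.4,
    "kinou": 9.0,
    "ashiba": 3.0,
}
candidates = select_candidates(first_stage_distances, n=2)
print(candidates)
```

The list of candidates is what would be passed on to the word dictionary restriction unit.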

The second word dictionary 4 is a dictionary in which the phonetic symbol network information of each word is registered. The second word recognition unit 3 uses the second word dictionary 4 to execute speech recognition processing by the second method described above on the input speech pattern from signal line 6, and outputs the word it decides on as the recognition result for the input speech.

At that time, the dictionary information referenced by the second word recognition unit 3 is designated by the word dictionary restriction unit 5, and is limited to the dictionary information of the candidate words that the word dictionary restriction unit 5 has already received from the first word recognition unit 1.

With the above method, the rough recognition processing that selects multiple candidate words is performed by the first word recognition unit, which is capable of high-speed processing. This narrows the range of the dictionary referenced in the high-recognition-rate processing of the second word recognition unit to a relatively small number of words, so the high-recognition-rate processing can be executed relatively quickly.
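The overall flow described here (rough first stage, restriction of the reference range, precise second stage) can be sketched as follows. The two scoring functions are deliberately crude stand-ins, not the pattern matching or network matching of the publication, and all names (`two_stage_recognize`, `toy_coarse`, `toy_fine`, `fine_calls`) and the vocabulary are assumptions for the example.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]  # (input pattern, word) -> distance


def two_stage_recognize(pattern: str, vocabulary: Sequence[str],
                        coarse: Scorer, fine: Scorer, n: int) -> str:
    # First word recognition unit: fast, rough ranking over the whole vocabulary.
    ranked = sorted(vocabulary, key=lambda word: coarse(pattern, word))
    # Word dictionary restriction unit: keep only the n best candidates.
    allowed = ranked[:n]
    # Second word recognition unit: expensive matching on the restricted range.
    return min(allowed, key=lambda word: fine(pattern, word))


fine_calls: List[str] = []  # records which words the second stage looked at


def toy_coarse(pattern: str, word: str) -> float:
    """Stand-in for speech pattern matching: compares lengths only."""
    return abs(len(pattern) - len(word))


def toy_fine(pattern: str, word: str) -> float:
    """Stand-in for network matching: counts mismatched positions."""
    fine_calls.append(word)
    mismatches = sum(a != b for a, b in zip(pattern, word))
    return mismatches + abs(len(pattern) - len(word))


vocabulary = ["ashita", "ashida", "kyou", "ototoi", "asa"]
result = two_stage_recognize("ashita", vocabulary, toy_coarse, toy_fine, n=3)
print(result, fine_calls)
```

In this sketch the expensive scorer runs on three words rather than five. The saving grows with vocabulary size, at the cost that a word dropped by the rough first stage can no longer be recovered by the second.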

[Example]

FIG. 2 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention.

The first word recognition unit 11 uses the first word dictionary 12 to execute speech recognition processing by the first method described above on the input speech pattern from signal line 16.

The first word recognition unit 11 is configured, for example, by operating in parallel the large-scale integrated circuits mentioned above, which are capable of high-speed processing.

The second word recognition unit 13 uses the second word dictionary 14 to execute speech recognition processing by the second method described above on the input speech pattern from signal line 16.

For this purpose, the second word dictionary 14 is configured as a dictionary in which phonetic symbol network information of the kind described above is registered for each required word.

The word dictionary synthesis unit 17 uses the phonetic symbol network information of the second word dictionary 14 and the speech pattern information 18 for each phonetic symbol to synthesize, for each word, the speech patterns corresponding to, for example, all of its phonetic symbol strings, and stores them together with the word name to generate the first word dictionary 12.
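The synthesis step can be sketched as follows, under the assumption that a word's speech pattern is obtained simply by concatenating the stored pattern of each phonetic symbol along a path through the network. The publication does not state how patterns are combined, so the concatenation, the function name `synthesize_entries`, the linear slot layout of the network, and the toy frame values are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Frame = List[float]


def synthesize_entries(word: str,
                       network: List[List[str]],
                       symbol_patterns: Dict[str, List[Frame]],
                       ) -> List[Tuple[str, List[Frame]]]:
    """Build one (word name, speech pattern) entry per phonetic symbol string."""
    entries = []
    for path in product(*network):
        # Assumed synthesis: concatenate the per-symbol patterns in path order.
        frames = [frame for symbol in path for frame in symbol_patterns[symbol]]
        entries.append((word, frames))
    return entries


# Toy data: a three-slot network and one-dimensional per-symbol patterns.
toy_network = [["A"], ["G", "NG"], ["A"]]
toy_symbol_patterns = {
    "A": [[1.0], [1.0]],
    "G": [[4.0]],
    "NG": [[6.0]],
}
first_dictionary = synthesize_entries("aga", toy_network, toy_symbol_patterns)
for name, frames in first_dictionary:
    print(name, frames)
```

Each pronunciation variant becomes its own template under the same word name, so the first stage can match variants without any of them having been recorded from actual speech.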

In this way, the first word dictionary 12 does not have to be built from actual speech, and a word dictionary holding a large number of words can be constructed in a short time by fully automatic processing alone.

The first word recognition unit 11 processes the speech pattern input from signal line 16, selects a predetermined number of candidate words (more than one), and notifies the word dictionary restriction unit 15 of them.

The second word recognition unit 13 receives the input speech pattern from signal line 16 in parallel with the first word recognition unit 11, and executes recognition processing using the second word dictionary 14 and the speech pattern information 18.

The dictionary information referenced at that time is limited by the word dictionary restriction unit 15 to the dictionary information of the candidate words that it has already received from the first word recognition unit 11.

[Effect of the invention]

As is clear from the above description, the present invention has the remarkable industrial effect that, when the speech of a large number of words is to be recognized, a method that performs high-recognition-rate processing at high speed can be realized economically.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of the present invention, and FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention. In the figures, 1 and 11 are the first word recognition unit, 2 and 12 are the first word dictionary, 3 and 13 are the second word recognition unit, 4 and 14 are the second word dictionary, 5 and 15 are the word dictionary restriction unit, 6 and 16 are the signal line, 17 is the word dictionary synthesis unit, and 18 is the speech pattern information.

Claims

[Claims] A word speech recognition processing method for determining, from the input speech pattern of an input speech signal, the word that the signal represents, characterized by comprising: a first word dictionary (2) having the speech patterns of the required words as dictionary information; a second word dictionary (4) having the phonetic symbol networks of the required words as dictionary information; a first word recognition unit (1) that selects one or more candidate words by speech recognition processing of the input speech pattern using the first word dictionary (2); means (5) for limiting the reference range of the second word dictionary (4) to only the dictionary information of the candidate words; and a second word recognition unit (3) that executes speech recognition processing of the input speech pattern using the limited dictionary information.