JPS6331793B2

Info

Publication number: JPS6331793B2
Application number: JP57218973A
Authority: JP
Inventors: Hiroyuki Iwahashi; Yoshiki Nishioka
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1982-12-13
Filing date: 1982-12-13
Publication date: 1988-06-27
Also published as: JPS59107400A

Description

DETAILED DESCRIPTION OF THE INVENTION <Technical Field> The present invention relates to a speech recognition method that recognizes speech based on the result of matching an input speech pattern against standard patterns registered in advance.

<Prior Art> In speech recognition, and particularly in speaker-dependent speech recognition, one or more standard patterns are stored for each word from several utterances made at registration. That is, as shown in FIG. 1, M standard patterns are stored in storage unit 1 for one word, and each of the N words has M standard patterns.

As the number of words increases as in FIG. 1, the standard pattern groups in memory 1 come to contain patterns that resemble standard patterns of other groups. Consequently, when input speech is matched against such a similar standard pattern, it is recognized as the word to which that pattern belongs, and misrecognition occurs frequently. Conventionally, there has been no means of preventing this kind of misrecognition.

<Object of the Invention> An object of the present invention is to prevent misrecognition, which is the drawback of the conventional method, and to suppress a decrease in the recognition rate.

<Embodiment> In the present invention, as shown in FIG. 1, memory 1 stores M standard patterns for each of N words. In FIG. 1, P(i,j) denotes the j-th standard pattern of the i-th word group. For the standard pattern memory 1 of FIG. 1, a flag memory 2 is provided as shown in FIG. 2. In flag memory 2, location F(i,j) corresponds to standard pattern location P(i,j). As explained later, F(i,j) is set to "1" if the standard pattern stored at P(i,j) is unsuitable for speech recognition, and to "0" if it is suitable. By providing memories 1 and 2 shown in FIGS. 1 and 2, the present invention allows the speaker's speech to be recognized reliably without misrecognition.
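The two memories described above can be pictured as a pair of N-by-M tables. The following is only a minimal sketch: the class name `PatternStore`, its fields, and the representation of a pattern as a list of numbers are assumptions for illustration and do not appear in the patent. Indices are 0-based here, whereas the text counts from 1.

```python
from dataclasses import dataclass, field


@dataclass
class PatternStore:
    """Hypothetical container pairing memory 1 (patterns) with memory 2 (flags)."""
    n_words: int      # N in the text
    m_patterns: int   # M in the text
    patterns: list = field(init=False)  # patterns[i][j] plays the role of P(i,j)
    flags: list = field(init=False)     # flags[i][j] plays the role of F(i,j)

    def __post_init__(self):
        # Every slot starts empty; every flag starts at 0 ("suitable").
        self.patterns = [[None] * self.m_patterns for _ in range(self.n_words)]
        self.flags = [[0] * self.m_patterns for _ in range(self.n_words)]

    def register(self, i, j, features):
        """Store a feature vector at P(i,j) and reset its flag to 0."""
        self.patterns[i][j] = list(features)
        self.flags[i][j] = 0


store = PatternStore(n_words=3, m_patterns=2)
store.register(0, 1, [1.0, 2.0])
```

Keeping the flag table the same shape as the pattern table means a flag can be looked up with the same (i, j) pair as its pattern.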

FIG. 3 is a flowchart showing the flow of the speech recognition method of the present invention. The invention is explained below with reference to this figure.

First, in S1, the speaker provides voice input to register M standard patterns for each word. Well-known techniques may be used to create and register a standard pattern. For example, the input speech is divided by passing it through a number of band-pass filters, feature data of the speech is extracted, and this information is registered (recorded) as a standard pattern at a predetermined location in memory 1, for example P(i,j). When the speaker has finished registering M standard patterns for each of the N words, the process moves to S2. S1 uses the conventional technique as is.
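The text leaves feature extraction to well-known techniques and does not specify the filters. As a rough illustration only, the sketch below stands in for a band-pass filter bank by grouping DFT bins of one frame into equal-width bands and summing their power. The function name `band_energies`, the frame length, and the band layout are assumptions, not details from the patent.

```python
import cmath
import math


def band_energies(frame, n_bands):
    """Return one energy value per band for a single frame of samples.

    A naive DFT replaces the analog band-pass filters mentioned in the text;
    this is a stand-in technique chosen for brevity.
    """
    n = len(frame)
    half = n // 2  # keep only the non-mirrored half of the spectrum
    power = []
    for k in range(half):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    width = half // n_bands  # bins left over after the last full band are ignored
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


# A pure tone at DFT bin 5 of a 32-sample frame lands in the second of 4 bands.
frame = [math.sin(2 * math.pi * 5 * t / 32) for t in range(32)]
energies = band_energies(frame, n_bands=4)
```

A vector such as `energies`, or a sequence of them over successive frames, would then be what is stored at P(i,j).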

Next, in S2, the standard patterns registered in memory 1 are matched against one another. An arbitrary pattern is extracted from standard pattern memory 1 and treated as the speaker's voice input, and speech recognition is executed in the usual way. If the matching result places the extracted standard pattern in its own group, "0" is set in the location of flag memory 2 corresponding to that pattern. To explain in more detail, the first standard pattern P(1,1) of the first word group is read from memory 1 of FIG. 1, and this pattern P(1,1) is used as the speaker's voice input and matched against all other standard patterns.

In this case, matching against standard pattern P(1,1) itself is excluded. If the matching shows that P(1,1) matches not the first group but some other group, P(1,1) is regarded as unsuitable as a standard pattern for recognition, and F(1,1) of flag memory 2, which corresponds to P(1,1), is set to "1". In other words, when a word of the group that produced the match is spoken, it should normally be recognized as a word of that group, but because its match with standard pattern P(1,1) is the strongest, it would be misrecognized as a word of the first group. To prevent this misrecognition, if matching does not place the extracted standard pattern P(1,1) in its own group (the first word), flag memory F(1,1) is set to "1". Conversely, if it is placed in its own group, F(1,1) is set to "0".

The above processing is carried out in order for P(1,2), P(1,3), P(1,4), ..., P(N,M).

That is, an arbitrary standard pattern P(i,j) (i = 1, ..., N; j = 1, 2, ..., M) is extracted and matched against all standard patterns other than P(i,j) itself. If the result is not the i-th word group, flag memory F(i,j) is set to "1"; if it is the i-th word group, F(i,j) is set to "0". When matching among all standard patterns from P(1,1) to P(N,M) has finished, the flag memory locations corresponding to unsuitable standard patterns hold "1", as shown in FIG. 4. In FIG. 4, F(2,1) and F(2,j) are "1", so the corresponding patterns P(2,1) and P(2,j) in standard pattern memory 1 are unsuitable as standard patterns.
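The S2 cross-check can be sketched as a leave-one-out nearest-neighbour test. This is an illustration under assumptions: patterns are taken to be fixed-length numeric vectors, "best match" is taken to mean smallest Euclidean distance, and the name `cross_check` is hypothetical. The patent does not specify the matching measure. Indices are 0-based in the code.

```python
import math


def cross_check(patterns):
    """Return flags[i][j]: 1 if P(i,j) best matches another word's pattern, else 0."""
    flags = [[0] * len(group) for group in patterns]
    for i, group in enumerate(patterns):
        for j, probe in enumerate(group):
            best_word, best_dist = None, math.inf
            for k, other_group in enumerate(patterns):
                for l, candidate in enumerate(other_group):
                    if (k, l) == (i, j):
                        continue  # matching a pattern against itself is excluded
                    d = math.dist(probe, candidate)
                    if d < best_dist:
                        best_word, best_dist = k, d
            flags[i][j] = 0 if best_word == i else 1
    return flags


# Word 0 clusters near the origin; the third pattern of word 1 strays toward it.
patterns = [
    [[0, 0], [0, 1], [1, 0]],
    [[10, 10], [10, 11], [0, 3]],
]
flags = cross_check(patterns)  # [[0, 0, 0], [0, 0, 1]]
```

Each of the N×M patterns is compared with every other pattern, so the cost grows with the square of the number of stored patterns. Note also that the check needs at least two patterns per word to be meaningful, since a word's only pattern has no same-group partner to match.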

When matching among the standard patterns has finished as described above, the process moves to S3. Flag memory 2 is scanned in order, and if any flag is "1", the speaker is informed that the standard pattern group contains an unsuitable pattern. If a flag F(i,j) holds "1", the speaker decides whether to re-register the standard pattern at that location. If re-registration is to be performed, the process moves to the re-registration processing of S5. In re-registration, the speaker is instructed to re-register only the locations whose flag is "1". That is, the speaker speaks the word corresponding to each "1" location of flag memory 2, and the information obtained by extracting features in the same way as in the initial registration is registered as an adjusted pattern. When all locations of standard pattern memory 1 that correspond to "1" locations of flag memory 2 have been re-registered, the process returns to S2 and the matching among standard patterns is repeated. Once the unsuitable standard patterns have been eliminated from the standard pattern memory in this way and every F(i,j) in flag memory 2 is "0", the process moves to the normal speech recognition processing of S6. That is, when every F(i,j) is "0", no unsuitable standard pattern remains, and performing recognition with this standard pattern memory 1 suppresses a decrease in the recognition rate.
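The S2 to S5 loop can be sketched as follows. Everything beyond the flow itself is an assumption for illustration: the function names, the callbacks `reacquire` (stands in for the speaker uttering the word again) and `wants_retry` (stands in for the speaker's decision), the Euclidean nearest-neighbour match, and the `max_rounds` cap, which the patent does not mention.

```python
import math


def flag_unsuitable(patterns):
    """S2: flag P(i,j) with 1 when its nearest other pattern belongs to another word."""
    flags = []
    for i, group in enumerate(patterns):
        row = []
        for j, probe in enumerate(group):
            others = [(math.dist(probe, cand), k)
                      for k, g in enumerate(patterns)
                      for l, cand in enumerate(g) if (k, l) != (i, j)]
            nearest_word = min(others)[1]
            row.append(0 if nearest_word == i else 1)
        flags.append(row)
    return flags


def enroll_until_clean(patterns, reacquire,
                       wants_retry=lambda flagged: True, max_rounds=5):
    """Repeat S2-S5 until no flag is 1, the speaker declines, or rounds run out."""
    for _ in range(max_rounds):
        flags = flag_unsuitable(patterns)                    # S2
        flagged = [(i, j) for i, row in enumerate(flags)
                   for j, f in enumerate(row) if f == 1]     # S3: scan the flags
        if not flagged or not wants_retry(flagged):          # speaker's decision
            return flags                                     # proceed to S6
        for i, j in flagged:                                 # S5: flagged slots only
            patterns[i][j] = reacquire(i, j)
    return flag_unsuitable(patterns)


patterns = [
    [[0, 0], [0, 1], [1, 0]],
    [[10, 10], [10, 11], [0, 3]],   # the last pattern sits too close to word 0
]
final_flags = enroll_until_clean(patterns, reacquire=lambda i, j: [11, 10])
```

Returning the flags rather than discarding them lets the recognition step skip any pattern that is still marked "1" when the speaker declines to re-register.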

If the speaker does not request re-registration in S4, the process moves to the speech recognition processing of S6. In that case, however, the standard patterns in memory 1 whose corresponding flag in flag memory 2 is "1" are not used for recognition. That is, speech recognition is performed by matching the input speech only against the standard patterns in memory 1 whose flag is "0", so misrecognition is prevented and an improved recognition rate can be expected.
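Recognition that skips flagged patterns can be sketched as below. The function name `recognize`, the vector representation of patterns, and the use of Euclidean distance as the matching score are assumptions for illustration, and indices are 0-based.

```python
import math


def recognize(utterance, patterns, flags):
    """Return the index of the word whose unflagged pattern is nearest to the utterance."""
    best_word, best_dist = None, math.inf
    for i, group in enumerate(patterns):
        for j, template in enumerate(group):
            if flags[i][j] == 1:
                continue  # patterns marked unsuitable take no part in matching
            d = math.dist(utterance, template)
            if d < best_dist:
                best_word, best_dist = i, d
    return best_word


patterns = [
    [[0, 0], [0, 1], [1, 0]],
    [[10, 10], [10, 11], [0, 3]],
]
flags = [[0, 0, 0], [0, 0, 1]]       # P(2,3) in the text's 1-based numbering is flagged
no_flags = [[0, 0, 0], [0, 0, 0]]

with_exclusion = recognize([0, 2.6], patterns, flags)        # word 0
without_exclusion = recognize([0, 2.6], patterns, no_flags)  # word 1
```

In this toy data the same input is assigned to different words depending on whether the flagged pattern takes part, which is the effect the paragraph describes.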

<Effects of the Invention> According to the speech recognition method of the present invention, an arbitrary pattern is extracted from the group of already registered standard patterns, the extracted pattern is matched against the other standard patterns, whether the pattern is unsuitable is determined from the matching result, and the fact that a standard pattern is unsuitable is stored as the result of that determination. During speech recognition, the standard patterns stored as unsuitable are excluded and recognition is performed using the remaining, suitable standard patterns, so the recognition rate can be greatly improved.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the storage state of the standard pattern memory according to the present invention. FIG. 2 is a diagram showing the flag memory provided in correspondence with the standard pattern memory of FIG. 1. FIG. 3 is a flowchart showing the procedure of the recognition method of the present invention. FIG. 4 is a diagram showing the flag memory resulting from matching among standard patterns according to the present invention. 1: standard pattern memory, 2: flag memory.

Claims

[Claims] 1. A speech recognition method in which input speech is analyzed and the extracted data is matched against standard patterns registered in advance to recognize the input speech, the method comprising: a step of registering M standard patterns per word for N words; a step of selecting an arbitrary pattern from the group of registered standard patterns; a step of matching the selected standard pattern, used as voice input data, against all other standard patterns except the selected standard pattern itself; and a step of storing the selected standard pattern as unusable if, as a result of the matching, the selected standard pattern does not match the word group to which it belongs; wherein speech recognition is performed by matching the input speech against the standard patterns other than those stored as unusable.