JPH02250096A

JPH02250096A - Speech recognition system

Info

Publication number: JPH02250096A
Application number: JP1070931A
Authority: JP
Inventors: Hiromi Shibuya; 渋谷　浩洋; Munekazu Maeda; 宗万前田; Yasutomo Onishi; 大西　康友
Original assignee: Matsushita Refrigeration Co
Current assignee: Panasonic Holdings Corp
Priority date: 1989-03-23
Filing date: 1989-03-23
Publication date: 1990-10-05

Abstract

PURPOSE: To improve the recognition rate by having another standard pattern selecting means perform pattern selection while one standard pattern selecting means is in its pattern selection process period. CONSTITUTION: A selecting means 11 selects a standard pattern selecting means A9 until the standard pattern selecting means A9 enters its pattern selection process period, and then selects a standard pattern selecting means B10. When the standard pattern selecting means B10 enters its pattern selection process period, the standard pattern selecting means A9 is selected again, and this series of operations is repeated thereafter. Consequently, the period in which speech recognition is impossible is eliminated.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech recognition system that recognizes spoken words input by specific and unspecified speakers and performs various processes based on that speech. In particular, it relates to recognition for unspecified speakers.

Description of the Related Art

Conventionally, a speech recognition system for vending machines, such as vending machines for cup beverages (hereinafter simply "cup vending machines"), operates as shown in FIG. 4. First, the speech that the user inputs through microphone 1 is analyzed by speech analysis means 2 to extract a speech pattern. The analysis uses BPF (band-pass filter) analysis with a bank of band-pass filters: the analysis result is sampled along the time axis and the frequency axis, and the intensity is processed digitally.

Standard pattern storage means 3 stores, as standard patterns, the speech patterns of a plurality of discrete words uttered by many unspecified speakers, extracted by the same method. The words stored as standard patterns here are the names of the flavors sold by the cup vending machine (product names of beverages such as coffee and juice) and several response words (yes, no, hot, iced, and so on).
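The band-pass analysis described above can be sketched in software as follows. This is a minimal illustration, not the circuit of the patent: the Goertzel recurrence is swapped in for the band-pass filter bank, and the function names, center frequencies, frame length, and number of quantization levels are all assumptions chosen for the example.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate):
    """Power of one frame near freq_hz (Goertzel recurrence, stand-in for one band-pass filter)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def speech_pattern(samples, sample_rate, centers_hz, frame_len, levels=16):
    """Sample along the time axis (frames) and frequency axis (bands), then quantize intensity."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    powers = [[goertzel_power(f, c, sample_rate) for c in centers_hz] for f in frames]
    peak = max((p for row in powers for p in row), default=0.0)
    if peak <= 0.0:
        return [[0] * len(centers_hz) for _ in powers]
    # Scale each band intensity to an integer level in 0 .. levels - 1.
    return [[min(levels - 1, int(levels * p / peak)) for p in row] for row in powers]


# Hypothetical input: a pure 1 kHz tone sampled at 8 kHz, analyzed in three bands.
tone = [math.sin(2.0 * math.pi * 1000 * n / 8000) for n in range(320)]
pattern = speech_pattern(tone, 8000, [500, 1000, 2000], frame_len=80)
```

Each row of `pattern` is one time frame and each column one frequency band, which is the kind of time-by-frequency intensity grid that the standard patterns would also be stored in.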

Standard pattern selection means 4 then selects, from among the standard patterns, the standard pattern closest to the input pattern by the DP (Dynamic Programming) matching method, and thereby recognizes the speech. Dynamic programming is a mathematical programming technique proposed by Bellman of the United States in 1957 and is applied to the optimization of multistage decision processes. At each stage a decision (control) is made that transforms the state, and the function that evaluates how good or bad the process is up to reaching the goal is maximized or minimized.

When the speech recognition system serves a specific speaker, the speech patterns of the recognition words uttered by that speaker are registered in standard pattern storage means 3. When it serves unspecified speakers, several representative patterns chosen from the speech patterns of the recognition words uttered by many unspecified speakers are registered.

Utterance guidance means 5 consists of speech synthesis means and, as directed by control means 6 described later, prompts the user by voice to speak. The flavor names are displayed on the front panel of the cup vending machine, and the user selects one preferred flavor name from among them and says it. Control means 6 instructs utterance guidance means 5 to output guidance speech as the processing requires, recognizes the word uttered by the user from the standard pattern selected by standard pattern selection means 4, and controls the subsequent operation of the cup vending machine according to the recognition result. Reference numeral 7 denotes coin receiving means, which accepts coins and pays out change, and 8 denotes beverage dispensing means, which pours the selected flavor into a cup and delivers it.
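To make the DP matching step concrete, here is a minimal sketch of dynamic-programming alignment between an input pattern and each stored standard pattern. The patent does not give the local distance, the path constraints, or the rejection rule, so the city-block frame distance, the three-way step pattern, the length normalization, and the `reject_threshold` parameter are all assumptions for illustration.

```python
def frame_distance(a, b):
    """City-block distance between two frames of band intensities (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_distance(pattern, template):
    """Accumulated distance of the best time alignment, normalized by total length."""
    n, m = len(pattern), len(template)
    if n == 0 or m == 0:
        return float("inf")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], template[j - 1])
            # Each stage keeps only the cheapest way of reaching cell (i, j).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def select_standard_pattern(pattern, standard_patterns, reject_threshold=None):
    """Return (word, distance) of the closest standard pattern, or (None, distance) on reject."""
    best_word, best = None, float("inf")
    for word, template in standard_patterns.items():
        d = dp_distance(pattern, template)
        if d < best:
            best_word, best = word, d
    if reject_threshold is not None and best > reject_threshold:
        return None, best
    return best_word, best


# Hypothetical two-band standard patterns and a slowed-down utterance of "coffee".
standard_patterns = {
    "coffee": [[0, 5], [5, 0], [5, 0]],
    "juice": [[5, 5], [0, 0]],
}
word, distance = select_standard_pattern([[0, 5], [0, 5], [5, 0]], standard_patterns)
```

Because the alignment may repeat or skip frames, the input above still matches "coffee" exactly even though its first frame is held for twice as long as in the stored pattern.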

Next, FIG. 5 shows the period during which the conventional speech recognition system for vending machines can recognize speech. Here, t1 is the end-of-speech confirmation period and t2 is the pattern selection processing period. FIG. 5 shows that speech recognition is impossible during the periods t1 and t2.

Problems to be Solved by the Invention

However, the method described above has the drawback that the recognition rate decreases, because the speaker's voice cannot be recognized during the periods t1 and t2.

The present invention solves the above conventional problem. Its object is to provide a speech recognition system with a high recognition rate by eliminating the period t2 during which speech recognition is impossible.

Means for Solving the Problems

To achieve this object, the speech recognition system of the present invention comprises: standard pattern storage means that stores a group of standard patterns of a plurality of discrete spoken words; speech analysis means that analyzes a speaker's voice and extracts a speech pattern; a plurality of standard pattern selection means, each of which selects from the standard pattern group the standard pattern closest to the speech pattern extracted by the speech analysis means; selection means that selects one of the plurality of standard pattern selection means; and utterance guidance means that guides the speaker to utter a word.

Operation

With this configuration, because a plurality of standard pattern selection means are provided, while one standard pattern selection means is in its pattern selection processing period, another standard pattern selection means can handle the next input and perform pattern selection. This eliminates the period t2 during which speech recognition is impossible and realizes a speech recognition system with a high recognition rate.

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

In this embodiment, a speech recognition system for unspecified speakers is applied to a cup vending machine. Components that have the same configuration as in the conventional example are given the same reference numerals, and their description is omitted. FIG. 1 is a functional block diagram of the speech recognition system in the embodiment of the present invention. Reference numerals 9 and 10 denote standard pattern selection means A and standard pattern selection means B, respectively. Each selects, from among the standard patterns, the standard pattern closest to the input pattern by the DP (Dynamic Programming) matching method and thereby recognizes the speech. Reference numeral 11 denotes selection means, which selects one of the standard pattern selection means 9 and 10.

FIG. 2 shows the period during which the speech recognition system for vending machines in the embodiment of the present invention can recognize speech.

Here, t1 is the end-of-speech confirmation period, and t2 and t21 are pattern selection processing periods.

As shown in FIG. 2, selection means 11 selects standard pattern selection means A9 until standard pattern selection means A9 enters its pattern selection processing period, and thereafter selects standard pattern selection means B10. When standard pattern selection means B10 enters its pattern selection processing period, selection means 11 selects standard pattern selection means A9 again, and this series of operations is repeated thereafter. Suppose, for example, that the speaker says "um (speech input period), coffee (speech input period)". In the conventional speech recognition system for vending machines, the word "coffee" could not be recognized because it was uttered during the pattern selection processing period in which "um" was being processed, which corresponds to the pattern selection processing period of standard pattern selection means A9. In the embodiment of the present invention, it is recognized by standard pattern selection means B10.
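The switching behavior of selection means 11 can be illustrated with a small timing simulation. This is a sketch under simplifying assumptions, not the patented circuit: utterances are given as `(word, start, end)` tuples in seconds, the end-of-speech confirmation period t1 is ignored, a selector is treated as busy for a fixed `t2` after each utterance ends, and the `Selector` class and `recognized_utterances` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Selector:
    """One standard pattern selection means; busy_until marks the end of its matching period."""
    name: str
    busy_until: float = 0.0


def recognized_utterances(utterances, selectors, t2):
    """Return (word, selector name) for every utterance that some selector could accept."""
    current = 0  # index of the selector currently chosen by the selection means
    accepted = []
    for word, start, end in utterances:
        selector = selectors[current]
        if start < selector.busy_until:
            continue  # the chosen selector is still matching, so this utterance is lost
        selector.busy_until = end + t2  # pattern selection runs for t2 after the utterance
        accepted.append((word, selector.name))
        # The selection means switches as soon as this selector starts pattern selection.
        current = (current + 1) % len(selectors)
    return accepted


# "um" followed shortly by "coffee", with an assumed matching time of 0.3 s.
utterances = [("um", 0.0, 0.4), ("coffee", 0.5, 1.0)]
conventional = recognized_utterances(utterances, [Selector("A")], t2=0.3)
embodiment = recognized_utterances(utterances, [Selector("A"), Selector("B")], t2=0.3)
```

With a single selector, "coffee" starts at 0.5 s while the selector is still busy until 0.7 s and is dropped; with two selectors, the same utterance is handed to B.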

The vending operation of the speech recognition system for a cup vending machine configured as described above is explained with reference to the flowchart of FIG. 3. In FIG. 3, step 201 first determines whether a coin has been inserted into coin receiving means 7; if a coin has been inserted, the process proceeds to step 202. In step 202, utterance guidance means 5 prompts the customer with "Welcome, what would you like?" and waits for the customer to say a flavor name. In step 203, standard pattern selection means A9 or standard pattern selection means B10 selects, from the standard patterns stored in standard pattern storage means 3, the standard pattern closest to the input speech pattern, and thereby recognizes the flavor name. Step 204 determines whether the recognition result of step 203 is acceptable. If it is rejected, the process proceeds to step 205, where utterance guidance means 5 prompts "Please answer again", and then returns to step 203.

If the result is not rejected, the process proceeds to step 206.

In step 206, the subsequent operation branches according to the flavor recognized in step 203. In this embodiment, coffee is assumed to have been recognized; the operation when another flavor name is recognized is the same as for coffee, so its description is omitted. Next, in step 207, utterance guidance means 5 confirms with "Coffee, right?" and waits for the customer's reply. In step 208, the reply of yes or no is recognized in the same way as the flavor name.

Step 209 determines whether the recognition result of step 208 is acceptable. If it is rejected, the process returns to step 207; otherwise it proceeds to step 210. In step 210, if the reply recognized in step 208 is yes, the process proceeds to step 211; if it is no, the process returns to step 205.

In step 211, control means 6 uses beverage dispensing means 8 to pour the coffee into a cup and deliver it. In step 212, if change is due, coin receiving means 7 pays it out. Finally, in step 213, utterance guidance means 5 says "Thank you very much", and the series of operations ends.
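The flow of steps 201 to 213 can be sketched as a small dialogue loop. This is an illustrative reading of the flowchart, not code from the patent: recognition results are supplied as a prepared sequence in which `None` stands for a reject, the `price` and `inserted` amounts are hypothetical, and the function simply returns a log of what the machine would say and do. It assumes the sequence contains enough results to finish the sale.

```python
def run_sale(recognitions, price, inserted):
    """Walk through steps 201-213 and return a log of prompts and actions."""
    log = []
    if inserted <= 0:                                   # step 201: no coin, nothing happens
        return log
    answers = iter(recognitions)
    log.append("say: Welcome, what would you like?")    # step 202
    while True:
        flavor = next(answers)                          # step 203: recognize flavor name
        if flavor is None:                              # step 204: rejected
            log.append("say: Please answer again.")     # step 205
            continue
        while True:                                     # step 206: branch on flavor
            log.append(f"say: {flavor}, right?")        # step 207: confirm
            reply = next(answers)                       # step 208: recognize yes / no
            if reply is not None:                       # step 209: reject goes back to 207
                break
        if reply == "yes":                              # step 210
            break
        log.append("say: Please answer again.")         # "no" goes back to step 205
    log.append(f"dispense: {flavor}")                   # step 211
    if inserted > price:
        log.append(f"change: {inserted - price}")       # step 212
    log.append("say: Thank you very much.")             # step 213
    return log


# Hypothetical session: one reject, a misrecognized "coffee" that the customer
# declines, then "juice" confirmed with "yes".
log = run_sale([None, "coffee", None, "no", "juice", "yes"], price=100, inserted=150)
```

Every `None` in the sequence costs the customer one extra prompt, which is why fewer rejections translate directly into a smoother dialogue.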

As described above, according to this embodiment, a plurality of standard pattern selection means are provided, so that when one standard pattern selection means enters its pattern selection processing period, another standard pattern selection means enters the period in which it takes in speech analysis results. Speech recognition therefore becomes possible even during the period in which it was conventionally impossible. For example, when a speaker hesitates while choosing a flavor and says "um, (flavor name)", the probability that the flavor name is recognized increases. As a result, the recognition rate of the speech recognition system improves, the number of rejections decreases, and the speaker can carry on the dialogue smoothly, which is a significant benefit.

Effects of the Invention

As described above, the speech recognition system of the present invention is provided with: standard pattern storage means that stores a group of standard patterns of a plurality of discrete spoken words; speech analysis means that analyzes a speaker's voice and extracts a speech pattern; a plurality of standard pattern selection means, each of which selects from the standard pattern group the standard pattern closest to the speech pattern extracted by the speech analysis means; selection means that selects one of the plurality of standard pattern selection means; and utterance guidance means that guides the speaker to utter a word. Consequently, while one standard pattern selection means is in its pattern selection processing period, another standard pattern selection means enters the period in which it takes in speech analysis results. Speech recognition thus becomes possible even during the period in which it was conventionally impossible, and a speech recognition system with a high recognition rate can be realized.

[Brief explanation of drawings]

FIG. 1 is a functional block diagram of a speech recognition system according to an embodiment of the present invention. FIG. 2 is an explanatory diagram of the period during which the speech recognition system of the embodiment can recognize speech. FIG. 3 is a flowchart showing an example of the operation of the speech recognition system of the embodiment. FIG. 4 is a functional block diagram of a conventional speech recognition system. FIG. 5 is an explanatory diagram of the period during which the conventional speech recognition system can recognize speech.

2: speech analysis means; 3: standard pattern storage means; 5: utterance guidance means; 9: standard pattern selection means A; 10: standard pattern selection means B; 11: selection means.

Claims

A speech recognition system comprising: standard pattern storage means that stores a group of standard patterns of a plurality of discrete spoken words; speech analysis means that analyzes a speaker's voice and extracts a speech pattern; a plurality of standard pattern selection means, each of which selects from the standard pattern group the standard pattern closest to the speech pattern extracted by the speech analysis means; selection means that selects one of the plurality of standard pattern selection means; and utterance guidance means that guides the speaker to utter a word.