JPH041919B2

Info

Publication number: JPH041919B2
Application number: JP58091741A
Authority: JP
Inventors: Akihiko Takeuchi
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-05-25
Filing date: 1983-05-25
Publication date: 1992-01-14
Also published as: JPS59216199A

Description

[Detailed description of the invention]

(a) Technical Field of the Invention

The present invention relates to a device for switching between speech recognition means intended for unspecified speakers and speech recognition means intended for specific speakers in a speech processing system.

(b) Technical Background

With the development and spread of data processing technology in recent years, speech recognition and synthesis technology, which as an input/output means for data processing systems was at first limited to voice-controlled sorting devices and telephone guidance services, has come into wide use. Supported by advances in semiconductor technology, particularly integration and circuit design, implementations that are complex and sophisticated or that require large-capacity storage have become smaller and cheaper. As a result, data input and output by spoken Japanese has become widely used, because it suits interactive operation and can be used easily without special training on the part of the operator.

(c) Prior Art and Problems

Conventionally, a speech processing device identifies speech input using either speech recognition means intended for a large number of unspecified speakers calling in through a telephone exchange or the like, or speech recognition means that works from registered speech patterns of specific speakers. It converts the input into digital speech data and sends it to a host computer (HOST). In accordance with the response data produced by the data processing function of the HOST, the device sends a spoken response to the caller through its built-in speech synthesis unit. The speech processing device is thus a peripheral device for data input and output.

Recognition means intended for unspecified speakers is simple to use, because no speech patterns need to be registered before use. However, its recognition dictionary must allow for the variation among a large number of unspecified speakers, which makes it difficult either to increase the number of words to be recognized or to change the kinds of words. Consequently, to handle large sets such as the product names specified in orders, the items have been coded with digits or the like and entered as codes.

Recognition means intended for specific speakers, on the other hand, requires speech patterns to be registered before use, for example one for each word, which takes effort. Because each individual's characteristics are captured at registration, however, variation between speakers need not be considered, so the words to be recognized can be increased or changed comparatively easily.

In a system that uses telephone lines or the like, many people use the speech processing device. If the system had ample storage capacity, registration-based speech recognition could be applied to every user, giving more people a wider range of services. If it is extended to everyone, however, the storage capacity required for the dictionaries becomes enormous and can no longer be accommodated. Moreover, not every user uses the device frequently, so speech patterns that are rarely needed are registered and the utilization rate of the device falls.

(d) Purpose of the Invention

The purpose of the present invention is to eliminate the above drawbacks. For users whose usage over a certain period, for example one month or six months, is high, the identification code that a user speaks when beginning voice input to the speech processing device is registered in advance in an identification code table. Each time a user accesses the device, the spoken code is checked against the table. Registered users are handled by the registration-based speech recognition means. Unregistered users are handled, with some inconvenience, by the recognition means for unspecified speakers, for example by entering product names as codes. By combining the advantages of both methods, the invention aims to provide means that achieves the highest possible utilization efficiency of the device with a limited storage capacity.

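The table lookup described above can be illustrated with a short Python sketch. It is not code from the patent: the function name `choose_recognizer`, the two labels, and the sample identification codes are hypothetical, and the spoken code is assumed to have already been recognized by the unspecified-speaker means.

```python
# Illustrative sketch only; all names and sample codes are assumptions,
# not taken from the patent.

UNSPECIFIED = "unspecified-speaker recognition"  # first recognition means
SPECIFIC = "specific-speaker recognition"        # second, registration-based means


def choose_recognizer(spoken_code: str, code_table: set[str]) -> str:
    """Return which recognition means should handle the rest of the call.

    The code is looked up in the identification code table: a match selects
    the registration-based means, no match keeps the unspecified-speaker means.
    """
    if spoken_code in code_table:
        return SPECIFIC
    return UNSPECIFIED


# Hypothetical identification code table holding the frequent users.
code_table = {"1001", "1002", "1007"}

print(choose_recognizer("1002", code_table))  # registered user
print(choose_recognizer("2000", code_table))  # unregistered user
```

Using a set for the table keeps each lookup cheap regardless of how many codes are registered, although any keyed store would serve the same purpose in this sketch.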
(e) Structure of the Invention

This purpose is achieved by providing an unspecified/specific switching device for speech recognition, in a speech processing device that identifies speech input and responds with speech output, comprising: first speech recognition means intended for speech by unspecified speakers; second, registration-based speech recognition means intended for speech by specific speakers; switching means that switches the input between the two means under the control of a control unit; and the control unit, which has in its storage area an identification code table in which identification code data of the speakers who use the second recognition means are registered. When a speaker accesses the device, the control unit has the speaker input an identification code by voice and has the switching means select the first recognition means. It checks the identification code data thus obtained against the identification code table. If no match is obtained, it has the first recognition means continue to recognize the speaker's speech. If a match is obtained, it has the switching means select the second recognition means, notifies the speaker to that effect, and has the second recognition means perform the speech recognition.

(f) Embodiment of the Invention

An embodiment of the present invention is described below with reference to the drawing. The figure is a block diagram of the unspecified/specific switching device for speech recognition in an embodiment of the present invention.

In the figure, 1 is the speech processing device; 2 is a network control unit (NCU); 10a is the control unit and 10b its storage unit; 11a is the unspecified-speaker speech recognition unit and 11b its dictionary for unspecified speakers; 12a is the specific-speaker speech recognition unit and 12b the dictionary unit that stores its registered speech patterns; 13a is the speech synthesis unit and 13b its voice message file unit; 14 is the switching unit that selects the speech signals to be input and output under the control of the control unit 10a; 15 is a display; and 16 is an input unit such as a keyboard (KB) or light pen. In addition, 10ba is the control program, 10bb is the control data, and 10bc is the recognition code table.

In this configuration, when an external user places a call from a telephone or the like, the speech signal arrives through the exchange and the NCU 2. The control unit 10a, operating under the control program 10ba and control data 10bb stored in its storage unit 10b, has the switching unit 14 select the recognition unit 11a or 12a and feeds the speech signal to it. The recognition unit 11a or 12a accesses its dictionary unit 11b or 12b, recognizes the speech signal, and outputs speech data as a digital signal. The control unit 10a sends this speech data to the host computer (HOST), which performs the data processing. The control unit 10a then receives the processed data from the HOST and passes it to the speech synthesis unit 13a, which accesses the file unit 13b and converts the processed data into a voice message. The control unit 10a has the switching unit 14 select and connect the speech synthesis unit 13a, and the voice message derived from the processed data is returned to the caller through the NCU 2.

The operation described so far is no different from the speech processing operation of a conventional speech processing system. In the present invention, however, to raise the operating efficiency of the speech processing device 1, the control unit 10a consults the recognition code table 10bc when controlling which of the recognition units 11a and 12a, for unspecified and specific speakers respectively, the switching unit 14 connects.

The recognition code table 10bc is built from the past usage of the speech processing device 1. Usage is accumulated over a fixed period, for example for each user code, and a number of users are registered in the table 10bc, starting with those who have used the device most often. The operator of the device 1 may set the registrations with the input unit 16 while viewing the user data on the screen of the display 15, or corrections such as deletions and additions may be processed automatically every month. The table 10bc is organized as shown in Table 1.

[Table 1]
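The automatic registration by usage count described in the embodiment can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `build_code_table`, the access-log format (one user code per call), and the sample values are hypothetical, and the layout of Table 1 itself is not modeled.

```python
from collections import Counter


def build_code_table(access_log: list[str], max_entries: int) -> list[str]:
    """Return the user codes with the most accesses, most frequent first.

    access_log holds one user code per call made during the accumulation
    period; max_entries is the number of users the table can hold.
    """
    usage = Counter(access_log)
    return [code for code, _count in usage.most_common(max_entries)]


# Hypothetical access log for one accumulation period.
log = ["1001", "1002", "1001", "1003", "1001", "1002", "1004"]

table = build_code_table(log, max_entries=2)
print(table)  # ['1001', '1002']
```

Rebuilding the table from a fresh log at the end of each period would cover both the deletions and the additions mentioned in the text, since users who drop out of the top ranks are simply not carried over.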

Claims

[Claims]

1. An unspecified/specific switching device for speech recognition, in a speech processing device that identifies speech input and responds with speech output, comprising: first speech recognition means intended for speech by unspecified speakers; second, registration-based speech recognition means intended for speech by specific speakers; switching means for switching the input between the two means under the control of a control unit; and the control unit, which has in its storage area an identification code table for registering identification code data of speakers who use the second recognition means; characterized in that, when a speaker accesses the device, the control unit has the speaker input an identification code by voice and has the switching means select the first recognition means, collates the identification code data thus obtained with the identification code table, and, when no match is obtained, has the first recognition means continue to recognize the speaker's speech, and, when a match is obtained, has the switching means select the second recognition means, notifies the speaker to that effect, and has the second recognition means execute the speech recognition.