JPH043099A

JPH043099A - Voice recognition processor

Info

Publication number: JPH043099A
Application number: JP2104016A
Authority: JP
Inventors: Shinji Hayashi; 林　進二
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-04-19
Filing date: 1990-04-19
Publication date: 1992-01-08

Abstract

PURPOSE: To spare the subscriber from having to register the recognition words for specific speaker recognition, by specifying at registration time which of the previously registered voice pattern groups the subscriber's voice patterns belong to, and by performing specific speaker recognition with that group when the subscriber is later identified. CONSTITUTION: The voice recognition data 10 from the subscriber is associated with the most similar voice recognition data 13 within the voice recognition data 12 of the mother subscriber group, each entry of which consists of previously registered voice patterns of a plurality of recognition words. When the subscriber uses the voice recognition service, specific speaker recognition is performed with the associated voice recognition data 13 of the mother subscriber group. The subscriber can thus use specific speaker recognition, which recognizes many words with high recognition ability, without registering the recognition words.

Description

Technical Field

The present invention relates to a voice recognition processing device, and more particularly to voice recognition processing of speech from a telephone connected to a telephone line.

Prior Art

Conventionally, voice recognition processing of this kind has used either speaker-independent recognition with a severely limited vocabulary, or specific speaker recognition in which every word to be recognized is registered in advance for each subscriber.

In such conventional voice recognition processing, speaker-independent recognition has the drawback that the number of recognizable words is very small.

Specific speaker recognition, in which every recognition word is registered in advance for each subscriber, has the drawback of imposing on every subscriber (that is, every user of the voice recognition processing) the tedious task of registering all the words to be recognized.

Object of the Invention

The present invention was made to eliminate these drawbacks of the conventional approaches. Its object is to provide a voice recognition processing device that lets a subscriber use specific speaker recognition, which can distinguish many words with high discrimination ability, without the subscriber having to register the recognition words.

Structure of the Invention

The voice recognition processing device according to the present invention is characterized by comprising: storage means for storing a plurality of voice pattern groups, each consisting of previously registered voice patterns of a plurality of recognition words; specifying means for specifying, when a subscriber is registered, the voice pattern group to which the subscriber's voice pattern belongs from among the plurality of voice pattern groups stored in the storage means; and recognition processing means for performing specific speaker recognition, when the subscriber is identified, using the voice pattern group specified by the specifying means.

Embodiment

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, and FIG. 2 is a conceptual diagram showing the voice recognition data according to that embodiment. In these figures, a subscriber performs subscriber identification registration using a voice registration telephone 8 and a voice registration support personal computer 9. When the voice registration telephone 8 and the voice registration support personal computer 9 are connected via the telephone network 6, the voice recognition processing device 1 compares the subscriber's voice recognition data 10, obtained from a small number of recognition words input from the voice registration telephone 8 and the voice registration support personal computer 9, with the voice recognition data 12 of each of the plurality of recognition words of the mother subscriber group, which is registered in advance in a memory (not shown). The voice recognition unit 2, the voice transmission unit 3, and the terminal control unit 4 then carry out the subscriber identification registration.

Through this subscriber identification registration, the subscriber's voice recognition data 10 from the voice registration telephone 8 and the voice registration support personal computer 9 is associated with the closest voice recognition data 13 within the voice recognition data 12 of the mother subscriber group.
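The association step can be illustrated with a minimal sketch. The patent does not state how a voice pattern is represented or how closeness is measured, so the sketch assumes each voice pattern is a fixed-length feature vector and sums Euclidean distances over the few enrollment words. The names `nearest_group`, `groups`, and `enrollment`, and all numeric values, are hypothetical.

```python
import math

def nearest_group(enrollment, groups):
    """Return the id of the pattern group closest to the enrollment utterances.

    enrollment: {word: vector} for the few words the subscriber spoke (data 10).
    groups: {group_id: {word: vector}} for the mother subscriber group (data 12).
    Closeness is an assumed metric: Euclidean distance summed over the words.
    """
    def total_distance(group_words):
        return sum(math.dist(vec, group_words[word])
                   for word, vec in enrollment.items())
    return min(groups, key=lambda gid: total_distance(groups[gid]))

# Two illustrative pattern groups covering the same vocabulary.
groups = {
    1: {"yes": [0.0, 0.0], "no": [1.0, 1.0], "Tokyo": [2.0, 0.0]},
    2: {"yes": [5.0, 5.0], "no": [6.0, 6.0], "Tokyo": [7.0, 5.0]},
}
# The subscriber speaks only a subset of the vocabulary at registration.
enrollment = {"yes": [4.8, 5.1], "no": [6.2, 5.9]}
selected = nearest_group(enrollment, groups)  # id of the associated group (data 13)
```

Because the selected group already holds patterns for the whole vocabulary, words the subscriber never spoke at registration (here "Tokyo") become available for later recognition.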

At this point, the subscriber identification registration produces three items: the subscriber-created voice recognition data 10 obtained by the registration process, the voice recognition data 13 of the mother subscriber group associated with that data 10, and the subscriber-registered voice recognition data 11 registered individually by the subscriber.

Thereafter, when the subscriber receives the voice recognition service from telephone 7-1 or 7-2, the voice recognition processing device 1 performs specific speaker recognition using the subscriber-created voice recognition data 10, the voice recognition data 13 of the mother subscriber group associated with that data 10, and the subscriber-registered voice recognition data 11.

In this specific speaker recognition, a call originating from a specific telephone is regarded as being made by a specific subscriber. The voice recognition processing device 1 therefore first checks whether the call comes from a specific telephone, and then checks whether the utterance from the subscriber matches the registered subscriber-registered voice recognition data 11. These two checks identify the subscriber.
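The two checks can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the telephone number format, the record layout, the feature-vector representation, and the distance threshold are all hypothetical.

```python
import math

def identify_subscriber(caller_number, spoken, subscribers, threshold=1.0):
    """Identify a subscriber by telephone, then by the registered identifying word.

    subscribers: {telephone number: record}; each record holds the pattern of
    the subscriber's identifying word (data 11). Returns the subscriber id,
    or None if either check fails.
    """
    record = subscribers.get(caller_number)
    if record is None:                      # check 1: not a registered telephone
        return None
    if math.dist(spoken, record["id_pattern"]) > threshold:
        return None                         # check 2: identifying word does not match
    return record["subscriber_id"]

# Hypothetical registration record for one subscriber.
subscribers = {
    "03-0000-0001": {"subscriber_id": "A", "id_pattern": [1.0, 1.0], "group_id": 2},
}
caller = identify_subscriber("03-0000-0001", [1.1, 0.9], subscribers)
```

Checking the telephone first narrows the comparison to a single stored identifying pattern, so the second check is a verification rather than a search over all subscribers.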

Once the subscriber has been identified by these checks, specific speaker recognition processing is performed using the voice recognition data 13 of the mother subscriber group associated with that subscriber's voice recognition data 10.

FIG. 3 is a diagram showing an example of the voice recognition data 12 of the mother subscriber group in FIG. 2. As shown, for each of the mother subscriber group patterns 1 to n, the voice recognition data 12 holds registered voice patterns for the recognition words "yes", "no", ..., "Tokyo", "Osaka", ..., "department manager", "section manager", and so on.

The operation of this embodiment will now be described with reference to FIGS. 1 to 3.

When a subscriber performs subscriber identification registration using the voice registration telephone 8 and the voice registration support personal computer 9, and these are connected via the telephone network 6, the voice recognition processing device 1 asks the subscriber to speak some of the recognition words registered in the voice recognition data 12 of the mother subscriber group, for example "yes", "no", "Tokyo", and "department manager", and collects the voice patterns of those words.

The voice recognition processing device 1 then performs the subsequent recognition processing, that is, the association with one of the mother subscriber group patterns 1 to n, by comparing the collected voice patterns with those group patterns.

In general, the subscriber is asked to say each recognition word three times. An average voice pattern is taken from the first and second utterances, and the third utterance is used to judge whether that average voice pattern can be recognized.
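The three-utterance procedure can be sketched as below. The patent does not say how the average is formed or how recognizability is judged; the sketch assumes equal-length feature vectors, an element-wise mean, and a Euclidean distance threshold. The function names and the threshold value are hypothetical.

```python
import math

def average_pattern(first, second):
    """Element-wise mean of the first and second utterances."""
    return [(a + b) / 2 for a, b in zip(first, second)]

def is_recognizable(template, third, threshold):
    """Accept the averaged template if the third utterance lies close to it."""
    return math.dist(template, third) <= threshold

# Build the template from utterances 1 and 2, then verify it with utterance 3.
template = average_pattern([1.0, 2.0], [3.0, 4.0])
accepted = is_recognizable(template, [2.1, 3.0], threshold=0.5)
```

If the third utterance fails the check, a practical system would presumably ask the subscriber to repeat the word, although the patent does not describe that case.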

At this time, separately from the voice pattern collection described above, recognition words for identifying the subscriber, for example "ore" (an informal Japanese word for "I") or the subscriber's own name, are registered as the subscriber-registered voice recognition data 11.

Thereafter, when the subscriber receives the voice recognition service from telephone 7-1 or 7-2, the voice recognition processing device 1 first checks whether the call comes from a specific telephone, and then identifies the subscriber by checking whether the recognition word arriving via telephone 7-1 or 7-2 and the telephone network 6 matches the subscriber-registered voice recognition data 11.

After the subscriber has been identified by these checks, when speech to be recognized arrives via telephone 7-1 or 7-2 and the telephone network 6, the voice recognition processing device 1 performs a comparison search of the input voice pattern against the registered voice patterns and the voice patterns of the associated mother subscriber group pattern (one of patterns 1 to n), and thereby performs specific speaker recognition.
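The comparison search can be sketched as a nearest-pattern lookup over both sets of stored patterns. As before, the feature-vector representation and the Euclidean distance are assumptions, and the names `recognize`, `registered`, and `group_patterns` are hypothetical.

```python
import math

def recognize(input_pattern, registered, group_patterns):
    """Return the word whose stored pattern is nearest to the input pattern.

    registered: {word: vector} collected from the subscriber at registration.
    group_patterns: {word: vector} of the associated mother subscriber group
    pattern (data 13), covering the full vocabulary.
    """
    candidates = list(registered.items()) + list(group_patterns.items())
    word, _ = min(candidates, key=lambda item: math.dist(input_pattern, item[1]))
    return word

registered = {"yes": [4.8, 5.1]}
group_patterns = {"yes": [5.0, 5.0], "no": [6.0, 6.0], "Osaka": [9.0, 1.0]}
result = recognize([8.7, 1.2], registered, group_patterns)
```

Here "Osaka" was never spoken by the subscriber at registration, yet it can be matched because the associated group pattern supplies it.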

In this way, at subscriber identification registration, the voice recognition data 10 from the subscriber is associated with the closest voice recognition data 13 within the voice recognition data 12 of the mother subscriber group, each entry of which consists of previously registered voice patterns of a plurality of recognition words. When the subscriber uses the voice recognition service, specific speaker recognition is performed using the associated voice recognition data 13 of the mother subscriber group. The subscriber can therefore use specific speaker recognition, which can distinguish many words with high discrimination ability, without registering the recognition words.

Effects of the Invention

As described above, according to the present invention, the voice pattern group to which a subscriber's voice pattern belongs is specified at subscriber registration from among a plurality of voice pattern groups, each consisting of previously registered voice patterns of a plurality of recognition words, and specific speaker recognition is performed with the specified voice pattern group when the subscriber is identified. This has the effect that the subscriber can use specific speaker recognition, which can distinguish many words with high discrimination ability, without registering the recognition words.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, FIG. 2 is a conceptual diagram showing the voice recognition data according to that embodiment, and FIG. 3 is a diagram showing an example of the voice recognition data of the mother subscriber group in FIG. 2.

Explanation of reference numerals for main parts

1: voice recognition processing device
2: voice recognition unit
3: voice transmission unit
4: terminal control unit
8: voice registration telephone
9: voice registration support personal computer
10: subscriber-created voice recognition data
11: subscriber-registered voice recognition data
12: voice recognition data of the mother subscriber group
13: associated voice recognition data of the mother subscriber group

Claims

(1) A voice recognition processing device characterized by comprising: storage means for storing a plurality of voice pattern groups, each consisting of previously registered voice patterns of a plurality of recognition words; specifying means for specifying, when a subscriber is registered, the voice pattern group to which the voice pattern of the subscriber belongs from among the plurality of voice pattern groups stored in the storage means; and recognition processing means for performing specific speaker recognition, when the subscriber is identified, using the voice pattern group specified by the specifying means.