JPS63260253A - Voice response method

Info

Publication number: JPS63260253A
Application number: JP62093022A
Authority: JP
Inventors: Akio Komatsu; 小松　昭男; Eiji Ohira; 栄二大平; Yoshiaki Asakawa; 浅川　吉章
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-04-17
Filing date: 1987-04-17
Publication date: 1988-10-27

Abstract

PURPOSE: To identify the language used by a user through natural conversation by having a voice conversation system output its message simultaneously in all of the languages the system supports. CONSTITUTION: In the case of an international telephone exchange, the conversation system is started by a call to the international operator switchboard and, on receiving the user's request, outputs its first message. At this point, a greeting in every language supported by the conversation system is synthesized as speech, and the greetings are output simultaneously. The user replies to the greeting in natural conversational form and in his or her own language. The speech recognition section 13 can therefore easily identify the user's language through natural conversation.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a voice response method in a voice conversation system that targets multiple languages, and in particular to a voice response method suited to easily identifying the language to be used by the system.

[Conventional technology]

In a voice conversation system that targets many languages, such as an international telephone exchange system, how to identify the language to be used in the conversation is an important problem. For example, if the system asks an English-speaking user "What is the other party's telephone number?" in Japanese, a smooth conversation cannot be expected, and the naturalness of spoken conversation is severely impaired.

To address this, the prior art has proposed approaches such as the one described in Japanese Patent Laid-Open No. 61-90562, in which the system outputs silence for a fixed period and identifies the language to be used by recognizing the user's utterance during that period.

[Problem that the invention seeks to solve]

The above prior art does not give sufficient consideration to the naturalness of conversation in a voice conversation system. The user may be puzzled by the silence output by the system, and natural conversation may become difficult. In addition, it is difficult to predict how the user will respond to silence, which can degrade recognition performance for the user's response.

An object of the present invention is to solve the above problems in voice conversation systems that target many languages, and to provide a voice response method that can easily identify the language the system should use while letting the conversation proceed naturally.

[Means for solving problems]

The above object is achieved by outputting each message from the system in a voice conversation system simultaneously in the many languages that the system targets.

[Effect]

Even when a message from the system is output simultaneously in many languages, the user can easily pick out the message in his or her native language, which he or she is accustomed to hearing. This is related to the cocktail party effect, whereby familiar sounds (such as one's own name) can easily be picked out even in a complex acoustic environment. Therefore, even when greetings in Japanese, English, German, and so on are heard at the same time, a Japanese listener can naturally pick out the Japanese greeting; the user naturally returns the greeting, and a natural conversation results.

In addition, because the content of the user's response to the message from the system can easily be predicted, the accuracy of the speech recognition used to identify the language the system is to use can be made extremely high (in the above example, the user's response can be predicted to be one of a limited set of greetings).

[Example]

An embodiment of the present invention is described below with reference to FIG. 2. FIG. 2 shows the system configuration of a voice conversation system using the present invention. The main control of the voice conversation system is performed by the voice conversation processing unit 23. The interface between the system and the user takes the form of natural conversation, provided by the speech synthesis unit 21, which outputs messages from the system, and the speech recognition unit 22, which recognizes and understands the user's spoken responses.

The voice response unit 21 (the speech synthesis unit 21) outputs messages from the system using the voice response method of the present invention. A message whose content corresponds to the information the system is to output is synthesized as speech in every language targeted by the voice conversation system, and the resulting message utterances in the respective languages are output simultaneously.
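
The simultaneous output described above can be pictured as sample-wise mixing of the per-language synthesized waveforms. The following is a minimal sketch under stated assumptions, not the method disclosed in the patent: the sine tones stand in for real synthesized speech, and the names `SAMPLE_RATE`, `tone`, and `mix_simultaneous` as well as the 8 kHz rate are hypothetical choices made only for illustration.

```python
import math

SAMPLE_RATE = 8000  # assumed telephone-band sampling rate (not from the patent)


def tone(freq_hz, duration_ms, amplitude=0.5):
    """Stand-in for one synthesized prompt: a sine tone as a list of floats."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def mix_simultaneous(waveforms):
    """Overlay waveforms so that they all start at the same instant.

    Shorter waveforms are treated as zero-padded, and the sum is divided
    by the number of streams so the result stays within [-1.0, 1.0].
    """
    if not waveforms:
        return []
    mixed = [0.0] * max(len(w) for w in waveforms)
    for w in waveforms:
        for i, sample in enumerate(w):
            mixed[i] += sample
    scale = 1.0 / len(waveforms)
    return [sample * scale for sample in mixed]


# One placeholder waveform per target language (hypothetical durations).
prompts = {
    "ja": tone(220.0, 300),
    "en": tone(330.0, 250),
    "de": tone(440.0, 200),
}
mixed = mix_simultaneous(list(prompts.values()))
```

Dividing by the number of streams is the simplest way to avoid clipping; a real system might instead balance loudness per language, which the patent does not specify.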

The speech recognition unit 22 recognizes and understands the user's spoken response to the system's message, and can readily be implemented with conventional technology. The voice conversation processing unit 23 handles the conversation according to the application of the voice conversation system. For an international telephone exchange, for example, it carries out tasks such as understanding and confirming the other party's telephone number and the billing method in the form of natural conversation.

FIG. 1 shows the operation of a voice conversation system to which the voice response method of the present invention is applied. A conversation generally begins with a request from the user to the voice conversation system. In the case of an international telephone exchange, the conversation system is started by a call to the international operator switchboard.

On receiving the user's request, the conversation system outputs its first message. At this point, greetings in the languages targeted by the conversation system (FIG. 1 assumes three: Japanese, English, and German) are synthesized as speech and output simultaneously. That is, "おはようございます", "good morning", and "guten Morgen" are output at the same time. The acoustic environment that results when greetings in several languages are output simultaneously is complex, but the user can pick out the greeting in his or her own familiar language with great ease.

The languages can additionally be distinguished by rendering them in different voices, differing for example in timbre, gender, or age. For instance, the Japanese greeting is synthesized in a male voice, the English greeting in a female voice, and the German greeting in a child's voice, and the three are output simultaneously. This makes it even easier for the user to pick out the greeting in his or her own language.
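
As an illustration of pairing each language with its own voice, the sketch below builds a prompt plan for the three-language example. It is only a sketch: the `Prompt` record, the `build_prompt_plan` helper, and the voice profile names are hypothetical, and only the greetings and the male, female, and child assignment come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    language: str
    text: str
    voice: str


# Morning greetings from the example in the text.
GREETINGS = {"ja": "おはようございます", "en": "good morning", "de": "guten Morgen"}

# Hypothetical voice profile names, in the order used by the example.
VOICES = ["male", "female", "child"]


def build_prompt_plan(languages):
    """Assign a different voice to each language so no two prompts share one."""
    if len(languages) > len(VOICES):
        raise ValueError("not enough distinct voices for the requested languages")
    return [Prompt(lang, GREETINGS[lang], voice)
            for lang, voice in zip(languages, VOICES)]


plan = build_prompt_plan(["ja", "en", "de"])
for prompt in plan:
    print(prompt.language, prompt.voice, prompt.text)
```

Refusing to build a plan when voices would have to be reused keeps the distinguishing cue intact; how a larger set of languages should be handled is not addressed in the source.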

In response to such a greeting from the voice conversation system, the user replies, in natural conversational form, with a greeting in his or her own language. Because the content of the user's response can be easily predicted in this way, only a very limited set of standard recognition speech 14 needs to be prepared for the speech recognition 13. High recognition accuracy can therefore be expected, and the language spoken by the user can be identified easily. FIG. 1 shows an example of a morning greeting, but the naturalness of the conversation can be improved by changing the contents of the synthesis speech patterns 12 and the standard recognition speech 14 according to the time of day, such as afternoon or evening.
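
The restricted-vocabulary idea can be sketched as matching the user's reply against one expected greeting per language, chosen by time of day. This is a simplified illustration under several assumptions: it compares text transcripts with Python's `difflib.SequenceMatcher` rather than matching acoustic reference patterns as the patent's recognizer would, and the afternoon and evening greetings, the hour boundaries, the 0.6 threshold, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

# Expected replies per time of day. Only the morning row appears in the text;
# the other rows are illustrative additions.
GREETINGS = {
    "morning": {"ja": "おはようございます", "en": "good morning", "de": "guten morgen"},
    "afternoon": {"ja": "こんにちは", "en": "good afternoon", "de": "guten tag"},
    "evening": {"ja": "こんばんは", "en": "good evening", "de": "guten abend"},
}


def greeting_period(hour):
    """Map an hour (0-23) to a greeting period; the boundaries are assumptions."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def identify_language(transcript, hour, threshold=0.6):
    """Return the language whose expected greeting best matches the reply.

    Returns None when no candidate reaches the similarity threshold.
    """
    candidates = GREETINGS[greeting_period(hour)]
    heard = transcript.strip().lower()
    best_lang, best_score = None, 0.0
    for lang, greeting in candidates.items():
        score = SequenceMatcher(None, heard, greeting).ratio()
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang if best_score >= threshold else None


print(identify_language("Good morning", 9))
print(identify_language("こんばんは", 20))
```

Because only one candidate per language is active at a time, the decision reduces to a choice among a handful of templates, which is the source of the high accuracy the text anticipates.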

By recognizing the user's response, the system identifies the language the user is speaking and uses that same language for the rest of the conversation, so that the system and the user can converse naturally.

[Effect of the invention]

According to the present invention, in a voice conversation system that targets multiple languages, the language used by the user can be identified reliably and through natural conversation, so that the conversation between the system and the user can proceed very smoothly.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing the operation of a voice conversation system using the voice response method according to the present invention, and FIG. 2 is a system configuration diagram of the voice conversation system. 11: speech synthesis; 12: speech patterns for synthesis; 13: speech recognition; 14: standard speech for recognition; 15: voice conversation processing; 21: speech synthesis unit; 22: speech recognition unit; 23: voice conversation processing unit.

Claims

[Claims]

1. A voice response method for a voice conversation system that targets multiple languages, characterized in that a message from the system is uttered simultaneously in a plurality of languages, the language is identified from the user's response to that message, and the system thereafter responds to the user in the identified language.