JPH07129594A

JPH07129594A - Automatic interpreter system

Info

Publication number: JPH07129594A
Application number: JP5272476A
Authority: JP
Inventors: Masaie Amano; 真家天野; Kimito Takeda; 公人武田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-10-29
Filing date: 1993-10-29
Publication date: 1995-05-19

Abstract

(57) [Abstract]
[Purpose] To provide an automatic interpreting system that can cope with the various situations that arise during use, such as speech recognition errors and translation errors.
[Constitution] The system comprises: a plurality of input/output means, each having a voice input unit 1 for inputting utterances, an instruction input unit 2 for inputting instruction information, a voice output unit 3, and a display unit 4, and each assigned an attribute indicating a mutually different language; voice recognition means 10 that recognizes an utterance from the voice input unit of one input/output means on the basis of that attribute and encodes it; bidirectional automatic translation means 12 that translates the code into a code corresponding to the language of another attribute; voice generation means 11 that converts the code into voice; and dialogue means 13 that, when the recognition result or translation result of the utterance cannot be determined, conducts a dialogue with the operator of the one input/output means (the speaker) to confirm the result and, while that dialogue is in progress, conducts a dialogue about it with the operator of the other input/output means (the listener).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an automatic interpreting system that enables speakers of different languages to converse with each other, each in his or her own language.

[0002]

[Prior Art] Conventionally, the only systems that let speakers of different languages converse were devices such as portable translators. These have a simple bilingual dictionary covering two or more languages, a keyboard, and a liquid crystal display of about one line, and they basically look up the words entered from the keyboard one after another, in the order entered, and display the results. Although called translators, these devices perform no grammatical or semantic analysis of the language, so they do not output correct translations and are nothing more than dictionary lookup tools.

[0003] With such a device, entering 「かれ レストラン いく」 (kare resutoran iku, "he restaurant go"), for example, merely yields the words HE RESTAURANT GO. As is readily apparent, such a device is of no practical use for complex sentences. Even for simple sentences, the meaning of function words such as particles cannot be described in the dictionary, so the translation may come out with exactly the opposite meaning. For example, if the sentence 「太郎を花子は好きだ。」 ("Hanako likes Taro.") is entered as 「Taro Hanako 好き」, the output is "Taro Hanako like", which gives no indication of who likes whom and may even be misread as saying that Taro likes Hanako.
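
The word-by-word behaviour described above can be illustrated with a minimal sketch. The dictionary entries and the function name `word_for_word` are hypothetical and chosen only to reproduce the two examples in the text; they are not taken from any actual device.

```python
# Hypothetical toy dictionary; entries are illustrative only.
TOY_DICTIONARY = {
    "かれ": "HE",
    "レストラン": "RESTAURANT",
    "いく": "GO",
    "太郎": "Taro",
    "花子": "Hanako",
    "好き": "like",
}

def word_for_word(tokens):
    """Look up each token in input order; tokens with no entry (e.g. particles) are dropped."""
    return " ".join(TOY_DICTIONARY[t] for t in tokens if t in TOY_DICTIONARY)

print(word_for_word(["かれ", "レストラン", "いく"]))  # prints: HE RESTAURANT GO

# The particles を and は decide who likes whom, but they have no dictionary entry.
hanako_likes_taro = word_for_word(["太郎", "を", "花子", "は", "好き"])
taro_likes_hanako = word_for_word(["太郎", "は", "花子", "を", "好き"])
print(hanako_likes_taro == taro_likes_hanako)  # prints: True
```

Because both sentences collapse to the same output, the reader of the output cannot recover the original meaning, which is the failure the paragraph describes.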

[0004] Meanwhile, as full-fledged automatic translation systems, machine translation systems that perform grammatical and semantic analysis of the source language and include a generation process for the target language have been put to practical use in the field of document translation.

[0005] However, these systems are used exclusively for document translation and are not designed as devices for conversation. Typically, they are not configured to translate two or more languages at the same time; even if they have a bidirectional translation function, the translation in one direction must first be completed before the translation program for the reverse direction is invoked.

[0006] Moreover, conventional translators take the source text from a keyboard and cannot accept voice input. Voice-based interpreting systems are also being studied, but at present they merely recognize the uttered speech, pass the result straight through a machine translation unit, and convert the translated result into speech in a voice generation unit. No consideration has been given to practical problems such as how to deal with speech recognition errors and translation errors when they occur.

[0007]

[Problems to Be Solved by the Invention] Thus, a complete interpreting machine that lets speakers of two languages converse freely by voice has not yet been realized. The present invention has been made in view of the above circumstances, and its object is to provide an automatic interpreting system that can cope with the various situations that arise during use, such as speech recognition errors and translation errors.

[0008]

[Means for Solving the Problems] The automatic interpreting system according to the present invention is characterized by comprising: a plurality of input/output means, each having a voice input unit for inputting an utterance, an instruction input unit for inputting instruction information, a voice output unit that converts a given voice signal into sound and outputs it, and a display unit that displays given information, and each assigned an attribute indicating a mutually different language; voice recognition means that recognizes an utterance input from the voice input unit of one of the input/output means on the basis of the attribute assigned to that input/output means and generates a corresponding code or code string; bidirectional automatic translation means that translates the code or code string into a code or code string corresponding to the language indicated by the attribute assigned to each of the other input/output means; voice generation means that converts that code or code string into a voice signal to be supplied to the voice output unit; and dialogue means that, when the voice recognition means cannot determine a recognition result for the input utterance, or when the bidirectional automatic translation means cannot determine a translation result for the code or code string generated by the voice recognition means, performs confirmation processing for the recognition or translation result using at least one of the voice input unit and the instruction input unit, and at least one of the voice output unit and the display unit, of the one input/output means, and, while the confirmation processing is in progress, outputs information about the confirmation processing to at least one of the display unit and the voice input unit of the other input/output means.

[0009] Preferably, the utterance input from the voice input unit is supplied to the voice recognition means and is also output as-is from the voice output unit of the other input/output means.

[0010] The voice recognition means may also be configured to recognize the utterance using information, supplied from the instruction input unit when the utterance is input from the voice input unit, that specifies at least one character type in the utterance.

[0011] Furthermore, it is preferable that the speaker use at least one of the instruction input unit and the voice input unit to inform the automatic interpreting system that his or her utterance has ended. The system may also be configured so that the speaker inputs an utterance using both the instruction input unit and the voice input unit.

[0012]

[Operation] As a result, according to the present invention (claim 1), the language to be handled by each of the two or more input/output means is determined in advance. When an utterance is input from the voice input unit of any input/output means, the voice recognition means recognizes it as speech in the language indicated by the attribute, the bidirectional automatic translation means translates the recognition result into the other language, the voice generation means converts the translation result into a voice signal, and the voice output unit corresponding to the target language converts this signal into sound and outputs it. If speech recognition of the utterance fails, or if recognition is treated as successful but the translation stage fails, the portion that could not be recognized or translated is corrected through the dialogue unit in dialogue with the speaker, who is the operator of the one input/output means. While the dialogue unit is conversing with the speaker for this correction, it also addresses the listener or listeners, who are the operators of the other input/output means: it tells them to wait a moment, keeps them informed of the situation step by step, or acts as a conversation partner that answers their questions, thereby preventing the listeners from being left in silence.

[0013] In addition to voice, the dialogue means can also conduct the dialogue with characters, symbols, and the like using the display unit and the instruction input unit. This prevents the situation in which voice-only correction information itself fails speech recognition and has to be corrected again.

[0014] According to the present invention (claim 2), the utterance input from the voice input unit of the one input/output means is output as-is from the voice output unit of the other input/output means. The listener, who is the operator of the other input/output means, can therefore monitor the speaking situation of the conversation partner, who is the operator of the one input/output means, through the partner's natural voice and background sounds.

[0015] That is, if the interpreted response is delivered only as synthetic speech from the voice generation unit, no information about the conversation partner is conveyed; in an extreme case, a female speaker's voice may even be synthesized as a male voice. With the above configuration, however, the partner's natural voice conveys information such as gender, age, the focus of the conversation as indicated by intonation, and emotion. The listener can also learn about the partner's circumstances, such as whether the partner is alone or is consulting someone while speaking, and about the background sounds where the partner is, just as in an ordinary telephone call.

[0016] The conversation between the conversation partner and the automatic interpreting system can also be monitored while the dialogue unit is operating. When the listener's waiting time becomes long, monitoring the conversation between the speaker and the response unit can therefore help the listener grasp the situation, even without understanding its meaning.

[0017] According to the present invention (claim 3), when inputting an utterance, the speaker supplies the system, via the instruction input unit, with information specifying at least one character type in the utterance, and the voice recognition means uses this information to recognize the utterance.

[0018] The utterance is thus recognized using linguistic information that is difficult to convey by voice alone, such as the distinction between uppercase and lowercase in English or between common nouns and proper nouns in Japanese, which improves speech recognition performance.

[0019]

[Embodiments] Embodiments are described below with reference to the drawings. FIG. 1 is a schematic block diagram of an automatic interpreting system according to one embodiment of the present invention. This system interprets between two speakers, speaker A and speaker B, who speak different languages, and can be applied to systems such as automatic translation telephones and simultaneous interpreting machines.

[0020] As shown in the figure, the automatic interpreting system comprises a first input/output unit for speaker A, consisting of a display unit 1, an input unit 2, a microphone 3, and a loudspeaker 4; a second input/output unit for speaker B, consisting of a display unit 5, an input unit 6, a microphone 7, and a loudspeaker 8; a control unit 9; a voice processing unit 20; and an automatic interpreting unit 30. The voice processing unit 20 has a voice recognition unit 10 and a voice generation unit 11. The automatic interpreting unit 30 has a bidirectional automatic translation unit 12 and a dialogue unit 13, which consists of a natural language understanding unit 14 and a natural language generation unit 15.

[0021] The language to be used is determined in advance for each of the first and second input/output units. Here, an attribute indicating the language is assigned to each input/output unit.
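
One way to picture the language attribute is as a field on each input/output unit, from which the translation direction follows. The sketch below is only an illustration under that assumption; the class `IOUnit`, the function `route`, and the language codes "ja" and "en" are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IOUnit:
    name: str
    language: str  # the language attribute (the codes used here are an assumption)

def route(units, speaking):
    """Return the source language and, for every other unit, its target language."""
    targets = [(u, u.language) for u in units if u is not speaking]
    return speaking.language, targets

unit_a = IOUnit("first input/output unit (speaker A)", "ja")
unit_b = IOUnit("second input/output unit (speaker B)", "en")

source, targets = route([unit_a, unit_b], unit_a)
print(source, [lang for _, lang in targets])  # prints: ja ['en']
```

With the attribute fixed per unit, the same routine yields the reverse direction when speaker B talks, so no separate setup is needed for each translation direction.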

[0022] The display units 1 and 5 display visible information such as characters and symbols in order to convey information to the users of the system, and consist of a liquid crystal panel or the like. The input units 2 and 6 are used to input utterances and instruction information for the confirmation processing described later by means other than voice, such as characters and symbols, and consist of, for example, a keyboard, a mouse, or a touch panel.

[0023] The microphones 3 and 7 pick up the speakers' spoken utterances. The loudspeakers 4 and 8 output the sound produced by the voice generation unit 11. The control unit 9 controls the operation of the entire system, managing and controlling everything including the direction of translation between the two languages, the direction of the various information flows, and the addresses of information.

[0024] The voice recognition unit 10 recognizes the speech input from the microphones 3 and 7 on the basis of the attribute. The voice generation unit 11 converts the output of the automatic translation unit 12 and the output of the dialogue unit 13 into speech.

[0025] The bidirectional automatic translation unit 12 translates the speaker's utterance received from the voice recognition unit 10 into the conversation partner's language. When confirmation with the speaker or renewed voice input is needed, for example because the recognition or translation result is ambiguous, the dialogue unit 13 uses the natural language understanding unit 14 and the natural language generation unit 15 to let the system converse with the speaker and, during that dialogue, to let the system converse with the listener as well. The input and output for these dialogues travel along the same paths as those of the bidirectional automatic translation unit 12.

[0026] When the speaker's utterance is an input for resolving an ambiguity, the natural language understanding unit 14 performs syntactic and semantic interpretation of the sentence entered in natural language and interprets the speaker's instruction.

[0027] When the recognition result from the voice recognition unit 10 or the translation result from the bidirectional automatic translation unit 12 is ambiguous, the natural language generation unit 15 composes a sentence informing the speaker of the ambiguity.

[0028] For example, when the speech recognition result is ambiguous, it composes: "The speech recognition result is ambiguous. Please answer with the number of the correct one of the following two. 1. かたしはがくせいです 2. わたしはがくせいです". The composed sentence is output to the display unit on the speaker's side and is also sent to the voice generation unit 11, converted into speech, and conveyed to the speaker.

[0029] On learning of the ambiguity from the message output from the loudspeaker or shown on the display unit, the speaker enters a sentence in natural language to resolve it, for example 「２番が正解です」 ("Number 2 is correct"). This input is interpreted by the natural language understanding unit 14, and the second sentence, 「わたしはがくせいです」 ("I am a student"), is selected.
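
The exchange above, a numbered inquiry followed by a natural-language answer, can be sketched as follows. This is a deliberately simplified stand-in: the text describes syntactic and semantic interpretation, whereas the hypothetical `interpret_reply` below only extracts the first number from the reply. Both function names are assumptions made for illustration.

```python
import re
import unicodedata

def build_prompt(candidates):
    """Generation side: number the candidates and ask the speaker to choose one."""
    lines = ["The speech recognition result is ambiguous. "
             "Please answer with the number of the correct one."]
    lines += [f"{i}. {c}" for i, c in enumerate(candidates, start=1)]
    return "\n".join(lines)

def interpret_reply(reply, candidates):
    """Understanding side (simplified): pick the candidate whose number appears in the reply.

    Returns None when the reply contains no valid candidate number.
    """
    text = unicodedata.normalize("NFKC", reply)  # folds full-width digits such as ２ to 2
    match = re.search(r"\d+", text)
    if match is None:
        return None
    index = int(match.group())
    return candidates[index - 1] if 1 <= index <= len(candidates) else None

candidates = ["かたしはがくせいです", "わたしはがくせいです"]
print(build_prompt(candidates))
print(interpret_reply("２番が正解です", candidates))  # prints: わたしはがくせいです
```

Returning None for an unusable reply leaves it to the caller to repeat the inquiry, which matches the repeated confirmation described later in the text.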

[0030] FIG. 2 is a flowchart showing the operation of the automatic interpreting system, which is described below with reference to FIG. 2. It is assumed here that speaker A is the speaker and speaker B is the listener, and that speaker A uses Japanese and speaker B uses English.

[0031] a) When recognition and interpretation each succeed on the first attempt, processing proceeds as follows. Speaker A's utterance is first input through the microphone 3 (step 1).

[0032] The input spoken utterance (for example, 「わたしはがくせいです」) is sent by the control unit 9 to the voice recognition unit 10, where it is encoded on the basis of the attribute (step 2).

[0033] The encoded utterance (that is, the code or code string) is sent via the control unit 9 to the automatic translation unit 12, where it is translated into a code or code string corresponding to a sentence in the language used by the other party, speaker B (for example, "I am a student.") (step 4).

[0034] The translated result is passed back through the control unit 9 to the voice generation unit 11 and converted into speech (step 7). The control unit 9 then sends the synthesized utterance to the loudspeaker 8 of the other party B, where it is output as sound (step 8).
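
The error-free path through steps 1, 2, 4, 7, and 8 can be sketched as a simple chain of calls. All three stage functions below are hypothetical stand-ins that work on text rather than audio; they only show the order in which the control unit hands data from one unit to the next.

```python
def recognize(audio, language):
    """Step 2 stand-in: treat the 'audio' as already-transcribed text and return it as the code string."""
    return audio

def translate(text, source, target):
    """Step 4 stand-in: a one-entry lookup table instead of a real translation engine."""
    table = {("ja", "en", "わたしはがくせいです"): "I am a student."}
    return table[(source, target, text)]

def synthesize(text, language):
    """Step 7 stand-in: tag the text instead of producing a waveform."""
    return f"<speech lang={language}>{text}</speech>"

def interpret_once(audio, source, target):
    """Run steps 2, 4, and 7 in order on the input from step 1, with no error handling."""
    code = recognize(audio, source)
    translated = translate(code, source, target)
    return synthesize(translated, target)  # step 8 would play this on the listener's loudspeaker

print(interpret_once("わたしはがくせいです", "ja", "en"))
# prints: <speech lang=en>I am a student.</speech>
```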

[0035] b) Next, the case is described in which it is judged that no proper recognition result was obtained from the voice recognition unit 10, for example because the recognition result could not be determined owing to a recognition failure or to ambiguity in the recognition result.

[0036] Speaker A's utterance is input through the microphone 3 (step 1). The input spoken utterance is sent by the control unit 9 to the voice recognition unit 10 and encoded (step 2).

[0037] If it is judged that no proper recognition result was obtained (step 3), the control unit 9 receives the speech recognition result together with supplementary information about it from the voice recognition unit 10 and performs the corresponding confirmation processing (step 6).

[0038] Suppose, for example, that the recognition result for the utterance 「わたしはがくせいです」 is ambiguous, as in FIG. 3. FIG. 3 shows the situation in which the recognizer could not decide whether the 「わ」 in 「わたし」 was 「わ」 (wa) or 「か」 (ka), and output both.

[0039] The control unit 9 receives from the voice recognition unit 10 the two recognition results of FIG. 3 together with supplementary information about them, for example "ambiguity present", and passes this information to the dialogue unit 13. The natural language generation unit 15 composes an inquiry asking speaker A whether 「わ」 or 「か」 is correct; the voice generation unit 11 converts the inquiry into speech, and it is put to the speaker by voice through the loudspeaker 4 and, where necessary, also shown on the display unit 1 in character-code form. In response, speaker A utters 「わたしはがくせいです」 again, for instance taking care over the pronunciation of 「わ」, and recognition is attempted once more (steps 1 to 3).

[0040] This processing loop is repeated until the utterance is recognized correctly. In this embodiment, moreover, if one of the recognition results shown on the display unit 1 in response to the inquiry is correct, as in the example above, the speaker can confirm it by selecting it (for example, the second recognition result in FIG. 3, 「わたしはがくせいです」) by voice as described earlier or from the keyboard or the like, which makes it possible to skip the repeated recognition.
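
The confirmation loop, with its two ways out (a clean re-utterance or a direct selection), might be sketched as below. The function `confirm_recognition`, the `ask` callback, and the `max_rounds` safety limit are assumptions added for illustration; the text itself only says the loop repeats until recognition succeeds.

```python
def confirm_recognition(candidates, ask, max_rounds=5):
    """Repeat the inquiry until exactly one recognition hypothesis remains.

    `ask(candidates)` presents the hypotheses to the speaker and returns either
    a 1-based choice (int) or a fresh list of hypotheses from a re-utterance.
    Returns the confirmed text and the number of inquiry rounds used.
    """
    rounds = 0
    while len(candidates) != 1:
        if rounds == max_rounds:
            raise RuntimeError("utterance could not be confirmed")
        rounds += 1
        answer = ask(candidates)
        if isinstance(answer, list):
            candidates = answer                    # re-utterance was recognized again
        elif isinstance(answer, int) and 1 <= answer <= len(candidates):
            candidates = [candidates[answer - 1]]  # selection skips re-recognition
    return candidates[0], rounds

# Scripted speaker: the first re-utterance is still ambiguous, then option 2 is chosen.
ambiguous = ["かたしはがくせいです", "わたしはがくせいです"]
replies = iter([list(ambiguous), 2])
confirmed, rounds = confirm_recognition(ambiguous, lambda c: next(replies))
print(confirmed, rounds)  # prints: わたしはがくせいです 2
```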

[0041] While the confirmation processing is in progress, the dialogue unit 13 also performs standby processing for listener B, who has to wait (step 6). For example, it issues guidance such as "The utterance from speaker A will be delayed. Please wait a moment." or "We are confirming the utterance from speaker A. Please wait a moment." Avoiding silence in this way keeps listener B from becoming irritated or anxious. The guidance can be delivered via the voice generation unit 11 using the loudspeaker 8, the display unit 5, or both. FIG. 4 illustrates this.
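
Running the standby guidance alongside the confirmation dialogue could be sketched with a background thread, as below. This is only one possible arrangement; the function `run_with_standby`, the repeat interval, and the wording of the follow-up message are assumptions, and the text does not say how the two dialogues are scheduled.

```python
import threading

def run_with_standby(confirm, notify_listener, interval=0.05):
    """Run `confirm()` (the dialogue with the speaker) while keeping the listener informed.

    `notify_listener(message)` is called once immediately and then every
    `interval` seconds until the confirmation finishes.
    """
    done = threading.Event()

    def standby():
        notify_listener("We are confirming the utterance from speaker A. Please wait a moment.")
        while not done.wait(interval):  # wait() returns True once the confirmation is done
            notify_listener("Still confirming. Please wait a moment.")

    worker = threading.Thread(target=standby)
    worker.start()
    try:
        return confirm()
    finally:
        done.set()
        worker.join()

messages = []
result = run_with_standby(lambda: "わたしはがくせいです", messages.append)
print(result, len(messages) >= 1)  # prints: わたしはがくせいです True
```

In a real system `notify_listener` would route to the listener's loudspeaker, display, or both, as the paragraph describes.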

[0042] In addition, the system can also act as a conversation partner for the listener, for example by reporting the situation step by step or by answering the listener's questions. The utterance correctly encoded in this way is then sent via the control unit 9 to the automatic translation unit 12, where it is translated into a sentence in the language used by the other party, speaker B (step 4).

[0043] The translated result is passed back through the control unit 9 to the voice generation unit 11 and converted into speech (step 7). The control unit 9 then sends the synthesized utterance to the loudspeaker 8 of the other party B, where it is output as sound (step 8).
[0044] c) Next, the case is described in which the voice recognition unit 10 fails to detect a recognition error and it is judged that no proper translation result was obtained from the automatic translation unit 12, for example because the translation result could not be determined owing to a translation failure or to ambiguity in the translation result.

[0045] Speaker A's utterance is input through the microphone 3 (step 1). The input spoken utterance is sent by the control unit 9 to the voice recognition unit 10, where it is encoded (step 2).

[0046] The encoded utterance is sent via the control unit 9 to the automatic translation unit 12, where it is translated into a sentence in the language used by the other party, speaker B (step 4). If it is judged that no proper translation result was obtained (step 5), the control unit 9 receives the translation result together with supplementary information about it from the automatic translation unit 12 and performs the corresponding confirmation processing (step 6).

[0047] FIG. 5 shows an example in which the voice recognition unit 10 could not detect a recognition error. Suppose the voice recognition unit 10 judges the recognition result 「かたしはがくせいです」 to be correct; the result is then sent to the control unit 9. The automatic translation unit 12, on receiving this result from the control unit 9, cannot find 「かたし」 in its dictionary and treats it as an unknown word. The translation is consequently incomplete. Various cases are conceivable, such as complete or partial failure to translate, depending on the nature of the recognition error and on the design philosophy of the automatic translation unit.

[0048] Here, as an example, assume that a result like that of FIG. 5 is returned from the automatic translation unit 12 to the control unit 9. The control unit 9 sends this result to the dialogue unit 13, has it generate an appropriate inquiry asking speaker A for a correction, and conducts a confirmation dialogue with speaker A in the same way as for the speech recognition error described above.
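
The unknown-word case can be sketched as a lexicon check that flags the translation as incomplete and feeds an inquiry to the speaker. The toy lexicon, the pre-segmented token list, and the names `check_translation_input` and `correction_inquiry` are all assumptions for illustration; the text does not specify how the translation unit reports the failure.

```python
KNOWN_WORDS = {"わたし", "は", "がくせい", "です"}  # toy lexicon, illustrative only

def check_translation_input(tokens, known_words=KNOWN_WORDS):
    """Return (ok, unknown): ok is False when any token is missing from the lexicon."""
    unknown = [t for t in tokens if t not in known_words]
    return (not unknown), unknown

def correction_inquiry(unknown):
    """Compose the inquiry that would be put to the speaker for correction."""
    return ("The following could not be translated: "
            + ", ".join(unknown)
            + ". Please say or type it again.")

# The recognizer wrongly accepted かたし, so the translator meets an unknown word.
ok, unknown = check_translation_input(["かたし", "は", "がくせい", "です"])
if not ok:
    print(correction_inquiry(unknown))
# prints: The following could not be translated: かたし. Please say or type it again.
```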

[0049] As in the case of the speech recognition error, a standby dialogue is conducted with listener B at the same time. The result, once correctly translated in this way, is passed back through the control unit 9 to the voice generation unit 11 and converted into speech (step 7).

[0050] The control unit 9 sends the synthesized utterance to the loudspeaker 8 of the other party B, where it is output as sound (step 8). Thus, the automatic interpreting system of this embodiment is provided with dialogue means that, when there is some problem with the recognition or translation result, conducts a confirmation dialogue with the speaker and at the same time conducts a dialogue with the waiting listener to convey information about the confirmation. This makes it possible to deal with the various problems that actually arise in automatic translation with voice input, and greatly improves the practicality of the system.

[0051] In the above, only voice is used for utterances. By also accepting input from a keyboard or the like in addition to voice, items that are hard to enter by voice alone, such as symbols, and items for which voice input has failed repeatedly can be entered easily.

[0052] The automatic interpretation system of this embodiment can be used by two conversation partners A and B facing each other in the same place, or, with the control unit 9 connected to a public communication line, for conversation with a party at a remote location. When conversing over a public communication line, the output from the loudspeaker is synthesized voice, so information may be lost: the other party's age, sex, emotion, and intonation; circumstances such as whether the other party is alone or is consulting someone while talking; and the background sound at the other party's location. As a result, the listener may be unable to tell what is happening on the other side. This otherwise missing information can be obtained by configuring the control unit 9 to pass the real voice from the microphone 3 of speaker A directly to listener B. FIG. 6 shows, in simplified form, the path taken when the speaker's utterance is translated and the path of the real voice, and FIG. 7 shows the relative timing at which the speaker's utterance arrives at the listener. In each figure, t1 is the start time of the utterance. The real voice is sent from the control unit 9 immediately to the other party over the public communication line. The delay is about 1 second when a communication satellite is used; that is, t2 is approximately t1 + 1 sec. The time t3, in turn, is made up of the time the speaker takes to utter, the time taken for voice recognition, the time taken for automatic translation, and the time taken for the query dialogue when an error occurs. This amount of time is hard to pin down, but assuming no dialogue time for error correction and a short utterance of about 10 words, actual measurement gives a maximum of about 30 seconds; that is, t3 ≤ t1 + 30 sec.
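The timing relations above can be worked through with a small calculation. The sketch below reads t2 as the arrival time of the real voice and t3 as the arrival time of the translated voice, which is an interpretation of FIG. 7 rather than something stated explicitly; the function name and the component durations in the usage note are hypothetical. Following the text, the line delay is not added to t3.

```python
SATELLITE_DELAY_SEC = 1.0        # approximate line delay given in the text
MAX_TRANSLATED_DELAY_SEC = 30.0  # measured upper bound for a ~10-word utterance


def arrival_times(t1, speak, recognize, translate, dialogue=0.0,
                  line_delay=SATELLITE_DELAY_SEC):
    """Return (t2, t3) in seconds for an utterance starting at t1.

    t2: real voice, delayed only by the communication line.
    t3: translated voice, delayed by speaking, recognition, translation,
        and any confirmation dialogue (components as listed in the text).
    """
    t2 = t1 + line_delay
    t3 = t1 + speak + recognize + translate + dialogue
    return t2, t3
```

For example, with assumed durations of 5 s speaking, 10 s recognition, and 12 s translation, `arrival_times(0.0, 5.0, 10.0, 12.0)` gives t2 = 1.0 and t3 = 27.0, so the listener hears the real voice about 26 seconds before the translation, within the 30-second bound.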

[0053] In remote communication there is also the problem that the time delay makes it hard to time turns in the conversation. In international telephone calls, particularly over satellite links, this timing problem is still commonly experienced today. In a system that performs automatic interpretation, the delay can become very large because of voice recognition processing, machine translation processing, and so on. In this embodiment, the dialogue means can be used, for example, to give the listener a message during the delay.

[0054] With voice input, it is difficult to recognize the end of an utterance automatically. One option is to regard the utterance as finished when there has been no voice input for a certain period, and then start translation. Alternatively, as with a transceiver, a fixed spoken cue such as "どうぞ" ("go ahead") can be given to the control unit 9. Informing the system of the end of the utterance through a non-voice input means such as a keyboard is more reliable and therefore preferable.
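The three ways of detecting the end of an utterance described above (silence timeout, spoken cue, key press) can be combined in one check. The following is a minimal sketch under stated assumptions: the `InputEvent` type, the 2-second timeout, the romanized cue `"douzo"`, and the `"ENTER"` key name are all hypothetical, and events are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputEvent:
    time: float     # seconds since the utterance started
    kind: str       # "voice" or "key"
    text: str = ""  # recognized word, or key name


def utterance_end(events: List[InputEvent], now: float,
                  silence_timeout: float = 2.0,
                  cue: str = "douzo",
                  end_key: str = "ENTER") -> Optional[str]:
    """Return why the utterance is considered finished, or None if it is not."""
    last_voice = None
    for ev in events:
        if ev.kind == "key" and ev.text == end_key:
            return "key"        # explicit non-voice signal: most reliable
        if ev.kind == "voice":
            if ev.text == cue:
                return "cue"    # transceiver-style spoken cue
            last_voice = ev.time
    if last_voice is not None and now - last_voice >= silence_timeout:
        return "timeout"        # no voice input for a fixed period
    return None
```

The key press is checked first because the text treats it as the most dependable signal; the timeout is only a fallback and may cut off a speaker who pauses.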

[0055] In speech it is also difficult to distinguish uppercase from lowercase letters, or common nouns from proper nouns. In English, "japan" must be translated as "漆器" (lacquerware) and "Japan" as "日本" (Japan), but making this distinction from voice alone is extremely difficult. Likewise in Japanese, "近藤" (Kondo, a surname) and "混同" (kondo, "confusion") are hard to tell apart. It is preferable to arrange for such information to be entered from the keyboard as well. The same applies when other languages are mixed into a sentence (for example, 'How about "gomennasai"').
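One way to picture the keyboard-supplied hint is as an extra argument that selects among senses the recognizer cannot separate by sound. The sketch below is illustrative only: the `LEXICON` table, the hint labels `"common"` and `"proper"`, and the romanized glosses are assumptions made for this example, not part of the described system.

```python
from typing import Optional

# Toy lexicon: one spoken form, several senses keyed by a character-type hint.
LEXICON = {
    "japan": {
        "common": "shikki (lacquerware)",  # lowercase "japan"
        "proper": "Nihon (Japan)",         # capitalized "Japan"
    },
}


def translate_word(spoken: str, hint: Optional[str] = None) -> Optional[str]:
    """Translate a spoken word, using a keyboard hint to pick the sense.

    Returns None when the word is unknown or is ambiguous without a hint,
    which is the situation in which a confirmation dialogue would be needed.
    """
    senses = LEXICON.get(spoken.lower())
    if senses is None:
        return None
    if hint is None:
        # Only unambiguous entries can be resolved from voice alone.
        return next(iter(senses.values())) if len(senses) == 1 else None
    return senses.get(hint)
```

Calling `translate_word("japan")` without a hint returns `None`, while supplying `"proper"` or `"common"` resolves the word directly.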

[0056] If the voice generation unit 11 is configured so that the generated voice reflects attributes of the speaker, such as age, sex, intonation (irony, emotions, and so on), and accent, the listener can obtain information about the other party from the translated voice, which is effective.

[0057] Similarly, if the bidirectional automatic translation unit 12 translates in a way that reflects dialect, for example regional vocabulary, place of origin, and educational background, the listener can obtain information about the other party from the content of the translated text, which is effective.

[0058] For example, Japanese words corresponding to English "I" include "私" (watashi), "僕" (boku), and "俺" (ore), among others. The bidirectional automatic translation unit 12 may use "私" at the start of the conversation, analyze the vocabulary and other features used as the conversation proceeds, and successively switch to a more appropriate word.
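The idea of starting with a neutral pronoun and switching as evidence accumulates can be sketched as a small stateful selector. Everything specific here is invented for illustration: the `MARKERS` table mapping English words to a pronoun, the evidence threshold, and the class name are hypothetical, and a real system would need a linguistically grounded analysis.

```python
from collections import Counter

# Hypothetical marker words and the register they are taken to suggest.
MARKERS = {"dude": "ore", "gonna": "ore", "hey": "boku", "yeah": "boku"}


class PronounSelector:
    """Choose the Japanese rendering of "I" from the conversation so far."""

    def __init__(self, min_evidence: int = 2):
        self.counts = Counter()
        self.min_evidence = min_evidence

    def observe(self, utterance: str) -> None:
        # Count marker words seen in the speaker's utterances.
        for word in utterance.lower().split():
            if word in MARKERS:
                self.counts[MARKERS[word]] += 1

    def pronoun(self) -> str:
        # Default to "watashi" until enough evidence points elsewhere.
        if not self.counts:
            return "watashi"
        best, n = self.counts.most_common(1)[0]
        return best if n >= self.min_evidence else "watashi"
```

A new selector returns `"watashi"`; after observing an utterance containing two of the assumed casual markers, it switches to the corresponding pronoun for subsequent translations.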

[0059] Next, an automatic interpretation system according to another embodiment of the present invention will be described. FIG. 8 is a schematic configuration diagram showing the automatic interpretation system of this embodiment. This system extends the automatic interpretation system of FIG. 1 to interpret among N speakers who speak different languages. The N speakers may include people who speak the same language.

[0060] As shown in the figure, the automatic interpretation system has N sets of input/output units, each consisting of a display unit, an input unit, a microphone, and a speaker (loudspeaker). For example, when speaker A is the one talking, speakers B to N are the listeners.

[0061] In this case, the system of FIG. 1 need only be modified so that the voice processing unit 20 can process up to N languages and the automatic interpretation unit 30 has up to N × (N−1) translation functions, one for each direction from one language to another.
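The N × (N−1) figure is the number of ordered pairs of distinct languages, and it is an upper bound because some speakers may share a language. A short sketch makes the count concrete; the function name and the language codes are assumptions for illustration.

```python
from itertools import permutations


def translation_directions(languages):
    """List every (source, target) pair among the distinct languages given."""
    distinct = dict.fromkeys(languages)  # drop duplicates, keep first-seen order
    return list(permutations(distinct, 2))
```

Three speakers with three different languages need 3 × 2 = 6 directions, whereas three speakers of whom two share a language need only 2.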

[0062] The operation of this N-person automatic interpretation system is self-evident from the description of the system of FIG. 1, so a detailed description is omitted. The present invention is not limited to the embodiments described above and can be implemented with various modifications without departing from its gist.

[0063]

[Effects of the Invention] The automatic interpretation system according to the present invention includes dialogue means that, when some problem arises in voice recognition or translation, conducts a confirmation dialogue with the speaker and, at the same time, conducts a dialogue with the waiting listener to convey information about that confirmation. This makes it possible to cope with the various problems that actually arise in automatic translation from voice input, and dramatically improves the practicality of the system.

[Brief description of drawings]

FIG. 1 is a schematic configuration diagram showing an automatic interpretation system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the embodiment.

FIG. 3 is a diagram showing an output example of a recognition error in the embodiment.

FIG. 4 is a diagram showing an output example of a query from the system to the speaker in the embodiment.

FIG. 5 is a diagram showing an output example of an error in the automatic translation unit in the embodiment.

FIG. 6 is a diagram explaining the paths of a voice utterance in the embodiment.

FIG. 7 is a diagram showing the relative timing between the time of an utterance and the time it arrives at the listener in the embodiment.

FIG. 8 is a schematic configuration diagram showing an automatic interpretation system according to another embodiment of the present invention.

[Explanation of symbols]

1, 5 ... display unit; 2, 6 ... input unit; 3, 7 ... microphone; 4, 8 ... speaker (loudspeaker); 9 ... control unit; 10 ... voice recognition unit; 11 ... voice generation unit; 12 ... bidirectional automatic translation unit; 13 ... dialogue unit; 14 ... natural language understanding unit; 15 ... natural language generation unit; 20 ... voice processing unit; 30 ... automatic interpretation unit

Claims

[Claims]

1. An automatic interpretation system comprising: a plurality of input/output means, each having a voice input unit for inputting an utterance, an instruction input unit for inputting instruction information, a voice output unit for converting a given voice signal into voice and outputting it, and a display unit for displaying given information, the input/output means being assigned attributes indicating mutually different types of languages; voice recognition means for performing voice recognition on an utterance input from the voice input unit of one of the plurality of input/output means, based on the attribute assigned to that one input/output means, and generating a corresponding code or code string; bidirectional automatic translation means for translating the code or code string into a code or code string corresponding to the type of language indicated by the attribute assigned to each of the other input/output means; voice generation means for converting the code or code string into a voice signal to be given to the voice output unit; and dialogue means which, when the voice recognition means cannot determine a recognition result for the input utterance, or when the bidirectional automatic translation means cannot determine a translation result for the code or code string generated by the voice recognition means, performs a confirmation process for confirming the result of the recognition or the translation using at least one of the voice input unit and the instruction input unit, and at least one of the voice output unit and the display unit, of the one input/output means, and which, while the confirmation process is being performed, outputs information regarding the confirmation process to at least one of the display unit and the voice output unit of the other input/output means.

2. The automatic interpretation system according to claim 1, wherein the utterance input from the voice input unit is given to the voice recognition means and is also output as it is from the voice output unit of the other input/output means.

3. The automatic interpretation system according to claim 2, wherein the voice recognition means performs voice recognition of the utterance using information, given from the instruction input unit when the utterance is input from the voice input unit, that specifies at least a character type in the utterance.