JPH11242751A

JPH11242751A - Animation control apparatus and method, and text-to-speech apparatus

Info

Publication number: JPH11242751A
Application number: JP4250198A
Authority: JP
Inventors: Kazue Kaneko; 和恵金子
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1998-02-24
Filing date: 1998-02-24
Publication date: 1999-09-07

Abstract

(57) [Abstract] [Problem] To make it possible, when text is read aloud, to automatically generate an animation of a face whose expression matches the content of the text being read. [Solution] A syntax analysis unit 103 analyzes a sentence to be read aloud that is input from a text input unit 101, and a semantic analysis unit 108 and a mood information extraction unit 109 then determine a mood indicating the atmosphere of the sentence. Meanwhile, a reading information generation unit 104 generates reading information for the sentence. An animation generation unit 110 generates an animation of the facial expression and mouth movements by referring to an animation dictionary 112, based on the mood information from the mood information extraction unit 109 and the reading information from the reading information generation unit 104. An animation display unit 111 then displays the generated animation in synchronization with the speech output of the sentence by a speech output unit 106.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an animation control device and method. In particular, it relates to an animation control device and method suited to controlling a face animation while an apparatus reads text aloud, and to a text-to-speech apparatus equipped with the animation control device.

[0002]

[Prior Art] In general, speech synthesis systems with a face animation are often used as the user interface of guidance systems that conduct question-and-answer exchanges with a user. In this type of system, the shape of the lips is changed throughout the utterance, for a duration matched to the length of the sentence being spoken, so that the face appears to be talking. Some systems also switch expressions, such as a smiling face or a sad face, according to the type of response. However, in a guidance system that switches expressions in this way, the expression to generate is decided in advance for each prepared response sentence and held as expression information; when a response sentence is uttered, the expression information for that sentence is supplied as control information.

[0003]

[Problem to Be Solved by the Invention] However, when arbitrary text is read aloud, it may be possible to keep switching the lip shape over the period in which the text is read, but the information needed to switch expressions and the like cannot be obtained.

[0004] That is, in a question-and-answer guidance system, the system prepares the content of its responses in advance, so the expression can be controlled to match. When arbitrary text is read aloud, however, there is no means of knowing its content and no function for selecting an expression that suits it, so the face animation either speaks by moving its lips with a blank expression or speaks while changing expressions at random.

[0005] The present invention has been made in view of the above problem, and its object is to provide an animation control device and method that enable animation control matched to the content of the text being read when text is read aloud.

[0006]

[Means for Solving the Problem] To achieve the above object, an animation control device according to the present invention has, for example, the following configuration. That is, it comprises determination means for analyzing a sentence to be read aloud and determining a mood indicating the atmosphere of the sentence, and display control means for controlling animation display based on the mood determined by the determination means.

[0007] Preferably, the display control means further controls the animation display based on reading information corresponding to the sentence to be read aloud.

[0008] An animation control method according to the present invention for achieving the above object has, for example, the following steps. That is, it comprises a determination step of analyzing a sentence to be read aloud and determining a mood indicating the atmosphere of the sentence, and a display control step of controlling animation display based on the mood determined in the determination step.

[0009] The present invention also provides a text-to-speech apparatus equipped with the above animation control device.

[0010]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

[0011] [First Embodiment] FIG. 1 is a block diagram showing the schematic configuration of a text-to-speech apparatus according to the first embodiment. In FIG. 1, reference numeral 10 denotes a CPU, which implements various kinds of control by executing a control program stored in a ROM 11 or a RAM 12. The ROM 11 stores a boot program describing the processing performed when the apparatus starts up, along with various data. The RAM 12 provides a work area for the CPU 10 to execute its various kinds of control. Reference numeral 13 denotes a display, which presents various displays under the control of the CPU 10. In the present embodiment, the speaker's face is displayed in step with the utterance of the synthesized speech. Reference numeral 14 denotes an input device comprising a keyboard and a mouse. Reference numeral 15 denotes an external storage device, which stores various dictionaries (a syntax analysis dictionary 102, a semantic analysis dictionary 107, and an animation dictionary 112) and a control program 15a. The control program 15a is loaded from the external storage device 15 into the RAM 12 when it is executed. Reference numeral 16 denotes a speaker, which outputs synthesized speech and the like.

[0012] FIG. 2 is a block diagram showing the functional configuration of the text-to-speech apparatus according to the first embodiment. In the figure, reference numeral 101 denotes a text input unit, through which the text to be read aloud is entered using the input device 14, such as a keyboard. The text may of course be text stored in advance in the external storage device 15 or the like. Reference numeral 102 denotes the syntax analysis dictionary, and 103 denotes a syntax analysis unit. The syntax analysis unit 103 parses the text input from the text input unit 101 while referring to the syntax analysis dictionary 102, and provides the result to both a semantic analysis unit 108 and a reading information generation unit 104.

[0013] Reference numeral 104 denotes the reading information generation unit, which generates reading information for utterance based on the supplied syntax analysis result and provides it to a speech synthesis unit 105 and an animation generation unit 110. The speech synthesis unit 105 performs speech synthesis based on the provided reading information and generates speech information. Reference numeral 106 denotes a speech output unit, which outputs speech through the speaker 16 based on the speech information.

[0014] Reference numeral 108 denotes the semantic analysis unit, which, based on the analysis result provided by the syntax analysis unit 103, analyzes the speaker's mood conveyed by the text and provides the result to a mood information extraction unit 109. The mood information extraction unit 109 generates mood information based on the analysis result of the semantic analysis unit 108. In this example, "hearsay", "puzzlement", and "conviction" are prepared as mood information, and the mood information extraction unit 109 selects one of them based on the result of the semantic analysis and provides it to the animation generation unit 110.

[0015] Reference numeral 110 denotes the animation generation unit, which generates lip animation information based on the reading information provided by the reading information generation unit 104, and generates facial-expression animation information based on the mood information provided by the mood information extraction unit 109. To synchronize the mouth movements with the uttered speech, the animation generation unit 110 also receives the speech information generated by the speech synthesis unit 105 and generates the lip animation information based on it. Reference numeral 111 denotes an animation display unit, which displays the face animation on the display 13 based on the animation information generated by the animation generation unit 110.

[0016] FIG. 3 shows an example of the structure of the animation dictionary 112 according to the present embodiment. The animation dictionary 112 comprises an expression dictionary section that stores facial-expression animation information and a lip animation dictionary section that stores lip animation information. The expression dictionary section stores mood information paired with expression data. The lip animation dictionary section stores reading information paired with lip animation information.
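The two-section dictionary described in this paragraph can be pictured as a pair of lookup tables. The following Python sketch is only an illustration under stated assumptions: the class name `AnimationDictionary`, the string labels used for expression and lip data, the vowel-keyed lip entries, and the `neutral` and `closed` fallbacks are all hypothetical. The three mood-to-expression pairs are the ones this description gives for FIGS. 5 to 7.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnimationDictionary:
    """Hypothetical in-memory stand-in for the animation dictionary 112."""

    # Expression dictionary section: mood information -> expression data.
    expressions: Dict[str, str] = field(default_factory=dict)
    # Lip animation dictionary section: reading unit -> lip animation data.
    lips: Dict[str, str] = field(default_factory=dict)

    def expression_for(self, mood: str, default: str = "neutral") -> str:
        """Look up the expression paired with a mood (fallback is an assumption)."""
        return self.expressions.get(mood, default)

    def lips_for(self, reading: List[str], default: str = "closed") -> List[str]:
        """Look up one lip shape per reading unit (fallback is an assumption)."""
        return [self.lips.get(unit, default) for unit in reading]


ANIMATION_DICTIONARY = AnimationDictionary(
    # Pairs taken from the examples of FIGS. 5 to 7 in this description.
    expressions={
        "hearsay": "avert_gaze",
        "surprise": "puzzled",
        "conviction": "smile",
    },
    # Illustrative entries only; the actual lip data format is not specified.
    lips={"a": "open_wide", "i": "spread", "u": "rounded"},
)
```

Keeping the two sections separate mirrors the description: the expression lookup depends only on the mood, while the lip lookup depends only on the reading information.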

[0017] The operation of the text-to-speech apparatus of the present embodiment, configured as described above, will now be described. FIG. 4 is a flowchart explaining the procedure of the text reading process according to the present embodiment.

[0018] First, in step S201, a text to be read aloud is input through the text input unit 101. In step S202, it is determined whether any unprocessed sentence remains; if so, the process proceeds to step S203.

[0019] In step S203, the syntax analysis unit 103 parses the sentence input in step S201. In step S204, reading information for the input sentence is generated based on the result of the syntax analysis. In step S205, the semantic analysis unit 108 performs semantic analysis based on the result of the syntax analysis, and in step S206 the mood information extraction unit 109 extracts mood information based on the result of that semantic analysis.

[0020] For example, for the sentence in the example of FIG. 5, "According to the Meteorological Agency, the rainy season reportedly began last week." (気象庁によると、先週、梅雨入りしたそうです。), semantic analysis yields the mood information "hearsay" from the underlined expressions "according to the Meteorological Agency" (気象庁によると) and "reportedly" (そうです). Similarly, for the sentence in the example of FIG. 6, "This week, unfortunately, a low-pressure system has settled in, and we have ended up with rainy weather throughout." (今週は、残念ながら、低気圧が居座り、ずっと雨の天気になってしまいました。), semantic analysis yields the mood information "surprise" from the underlined expressions "unfortunately" (残念ながら) and "have ended up" (ってしまいました). Likewise, for the sentence in the example of FIG. 7, "But a high-pressure system is extending from the continent, so the weather will surely be fine on Saturday." (でも、大陸から高気圧が張り出していますから、土曜日にはきっと良いお天気になるでしょう), semantic analysis yields the mood information "conviction" from the underlined expressions "surely" (きっと) and "will" (でしょう).
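The cue expressions named in these three examples suggest one simple way to picture the mood decision. The sketch below is a deliberately simplified stand-in, not the semantic analysis of this description: it replaces the semantic analysis unit 108 and the semantic analysis dictionary 107 with plain substring matching, and the table `MOOD_CUES`, the function `extract_mood`, and the "most cues wins" rule are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Cue expressions taken from the examples of FIGS. 5 to 7.
# Treating them as plain substrings is an assumption of this sketch.
MOOD_CUES: Dict[str, Tuple[str, ...]] = {
    "hearsay": ("によると", "そうです"),
    "surprise": ("残念ながら", "てしまいました"),
    "conviction": ("きっと", "でしょう"),
}


def extract_mood(sentence: str) -> Optional[str]:
    """Return the mood whose cues occur most often, or None if no cue matches."""
    best_mood: Optional[str] = None
    best_hits = 0
    for mood, cues in MOOD_CUES.items():
        hits = sum(1 for cue in cues if cue in sentence)
        if hits > best_hits:
            best_mood, best_hits = mood, hits
    return best_mood
```

A real implementation would work on the parse produced in step S203 rather than on raw text, so that a cue is only counted where it actually carries the modal meaning.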

[0021] Next, in step S207, an animation of the face (expression and lips) is generated from the reading information and the mood information using the animation dictionary 112, and in step S208 synthesized speech is generated from the reading information. In step S209 the synthesized speech is output, and in step S210 the animation is displayed in step with the output of the synthesized speech. Steps S209 and S210 are synchronized so that the sound and the lip shapes do not drift apart.
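The description does not say how steps S209 and S210 are kept in sync. One plausible approach, shown here purely as a sketch, is to derive lip keyframes from per-unit durations and then pick the lip shape from the current audio playback position. The assumption that the speech synthesizer reports a duration in milliseconds for each reading unit, and the names `build_lip_timeline` and `shape_at`, are hypothetical.

```python
from typing import Dict, List, Tuple


def build_lip_timeline(
    units: List[Tuple[str, int]],
    lip_shapes: Dict[str, str],
    default: str = "closed",
) -> List[Tuple[int, str]]:
    """Turn (reading unit, duration in ms) pairs into (start time in ms, lip shape) keyframes."""
    timeline: List[Tuple[int, str]] = []
    start_ms = 0
    for reading, duration_ms in units:
        timeline.append((start_ms, lip_shapes.get(reading, default)))
        start_ms += duration_ms
    # Close the mouth once the last unit has finished sounding.
    timeline.append((start_ms, default))
    return timeline


def shape_at(timeline: List[Tuple[int, str]], playback_ms: int) -> str:
    """Return the lip shape of the latest keyframe at or before the playback position."""
    current = timeline[0][1]
    for start_ms, shape in timeline:
        if start_ms > playback_ms:
            break
        current = shape
    return current
```

Driving the display from the audio playback position, rather than from a separate timer, is what keeps the lips from drifting when the audio output stalls or runs late.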

[0022] In the example of FIG. 5, for instance, the expression data "avert the gaze" is obtained from the animation dictionary 112 for the mood information "hearsay", so the face moves its lips in accordance with the uttered speech while showing the change of expression of averting its gaze. When the output of the synthesized speech and the animation display have finished in this way, the process returns to step S202 and the next unprocessed sentence is processed. When no unprocessed sentence remains for the input text, the process returns from step S202 to step S201 and waits for the next text input.
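The per-sentence loop of FIG. 4 (steps S202 to S210) can be outlined as follows. This is a minimal sketch under assumptions: splitting the text at the Japanese full stop is a simplification, and the two callables passed in, `extract_mood` for steps S203 to S206 and `present` for steps S207 to S210, are hypothetical hooks rather than interfaces defined by this description.

```python
import re
from typing import Callable, List, Optional


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at the Japanese full stop, keeping the stop."""
    return [s.strip() for s in re.findall(r"[^。]+。?", text) if s.strip()]


def read_aloud(
    text: str,
    extract_mood: Callable[[str], Optional[str]],
    present: Callable[[str, Optional[str]], None],
) -> List[str]:
    """Process every sentence of the input text in order and return them."""
    processed: List[str] = []
    for sentence in split_sentences(text):  # S202: any unprocessed sentence left?
        mood = extract_mood(sentence)       # S203 to S206: parse and determine the mood
        present(sentence, mood)             # S207 to S210: animate and speak in sync
        processed.append(sentence)
    return processed                        # back to S201: wait for the next text
```

Passing the analysis and presentation stages in as callables keeps the loop itself independent of how the mood is determined or how the output is rendered.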

[0023] In the example of FIG. 6, because the mood information is "surprise", the expression data "puzzlement" is obtained from the animation dictionary 112 and an expression showing puzzlement is drawn. In the example of FIG. 7, because the mood information is "conviction", the expression "smile" is obtained from the animation dictionary 112 and an expression showing a smile is drawn.

[0024] As described above, according to the first embodiment, the text to be read aloud is semantically analyzed to extract the speaker's mood information, and the gaze position, expression, and so on of the face animation during utterance are controlled based on that mood information. This makes it possible to provide a text-to-speech apparatus that can switch to an appropriate expression according to the content of the sentence being read.

[0025] [Second Embodiment] In the first embodiment, one expression is selected for the mood obtained from the semantic analysis. However, a plurality of expressions may be registered for one mood so that a different expression is selected when the speaker is changed. For example, when the mood is "conviction", the apparatus may be configured to select a "smile" if the speaker is female and a "sincere" expression if the speaker is male.
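Registering several expressions per mood amounts to widening the lookup key from the mood alone to a (mood, speaker) pair. The sketch below illustrates this with the "conviction" example from this paragraph. The table name, the function `select_expression`, the speaker-independent fallback entry keyed by `None`, and the `neutral` default are assumptions.

```python
from typing import Dict, Optional, Tuple

# (mood, speaker) -> expression. The two speaker-specific entries follow the
# example in this paragraph; the None entry is a hypothetical fallback.
EXPRESSIONS_BY_SPEAKER: Dict[Tuple[str, Optional[str]], str] = {
    ("conviction", "female"): "smile",
    ("conviction", "male"): "sincere",
    ("conviction", None): "smile",
}


def select_expression(
    mood: str, speaker: Optional[str] = None, default: str = "neutral"
) -> str:
    """Prefer a speaker-specific expression, then a speaker-independent one."""
    specific = EXPRESSIONS_BY_SPEAKER.get((mood, speaker))
    if specific is not None:
        return specific
    return EXPRESSIONS_BY_SPEAKER.get((mood, None), default)
```

With the fallback entry in place, the single-expression behavior of the first embodiment is the special case in which no speaker is given.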

[0026] In addition to mood information, honorific and politeness expressions may also be analyzed, so that for formal sentence endings such as "desu" or "masu" (です・ます) an expression is selected from a set of "serious" expressions, and for casual sentence endings such as "yo", "wa", or "ne" (よ・わ・ね) an expression is selected from a set of "casual" expressions.
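The register check described here can be approximated by looking at how the sentence ends. The following sketch is an assumption-laden simplification: it inspects only the final characters after stripping trailing punctuation, whereas the description envisions an analysis of politeness expressions. The names `FORMAL_ENDINGS`, `CASUAL_ENDINGS`, and `expression_set_for` are hypothetical.

```python
from typing import Optional

# Sentence endings named in this paragraph.
FORMAL_ENDINGS = ("です", "ます")
CASUAL_ENDINGS = ("よ", "わ", "ね")
# Punctuation and spaces ignored at the end of a sentence (an assumption).
TRAILING_PUNCTUATION = "。．.！!？? \u3000"


def expression_set_for(sentence: str) -> Optional[str]:
    """Return 'serious', 'casual', or None depending on the sentence ending."""
    core = sentence.rstrip(TRAILING_PUNCTUATION)
    if core.endswith(FORMAL_ENDINGS):
        return "serious"
    if core.endswith(CASUAL_ENDINGS):
        return "casual"
    return None
```

The returned label would then select which set of expressions the mood lookup draws from.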

[0027] The display is not limited to expressions of the eyes, mouth, and so on; for a mood of "doubt", for example, a question mark may be placed above the head of the animated character.

[0028] When, instead of analyzing a sentence to be read aloud to extract mood information, the system generates a sentence from a semantic representation and outputs it as speech, the mood information may be extracted directly from the semantic representation.

[0029] The present invention may be applied to a system made up of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the invention is also achieved by supplying a system or apparatus with a recording medium on which the program code of software implementing the functions of the above embodiments is recorded, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium.

[0030] In this case, the program code read from the recording medium itself implements the functions of the above embodiments, and the recording medium on which the program code is recorded constitutes the present invention.

[0031] Examples of recording media that can be used to supply the program code include a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, non-volatile memory card, and ROM. Needless to say, the invention covers not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read, but also the case where an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and that processing implements the functions of the above embodiments.

[0032] Furthermore, needless to say, the invention also covers the case where the program code read from the recording medium is written into memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the above embodiments.

[0033]

[Effects of the Invention] As described above, according to the present invention, animation control matched to the content of the text being read becomes possible when text is read aloud. As a result, a face animation with an expression that suits the content of the sentence being read can be generated automatically.

[0034]

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of the text-to-speech apparatus according to the first embodiment.

FIG. 2 is a block diagram showing the functional configuration of the text-to-speech apparatus according to the first embodiment.

FIG. 3 is a diagram showing an example of the structure of the animation dictionary 112 according to the present embodiment.

FIG. 4 is a flowchart explaining the procedure of the text reading process according to the present embodiment.

FIG. 5 is a diagram explaining an example of a sentence to be read aloud and the corresponding face animation control according to the present embodiment.

FIG. 6 is a diagram explaining an example of a sentence to be read aloud and the corresponding face animation control according to the present embodiment.

FIG. 7 is a diagram explaining an example of a sentence to be read aloud and the corresponding face animation control according to the present embodiment.

Claims

[Claims]

1. An animation control device comprising: determination means for analyzing a sentence to be read aloud and determining a mood indicating an atmosphere of the sentence; and display control means for controlling an animation display based on the mood determined by the determination means.

2. The animation control device according to claim 1, wherein said display control means further controls animation display based on reading information corresponding to the sentence to be read.

3. The animation control device according to claim 1, further comprising: an animation dictionary that stores mood data representing a mood of a speaker and expression data representing various facial expressions in association with each other; and acquisition means for acquiring expression data from the animation dictionary based on the mood determined by the determination means, wherein the display control means controls the animation display of a face based on the expression data acquired by the acquisition means.

4. The animation control device according to claim 3, wherein the animation dictionary further stores reading information and motion data representing various mouth movements in association with each other, the device further comprising: generation means for generating reading information by analyzing the sentence to be read aloud; and second acquisition means for acquiring the corresponding mouth motion data from the animation dictionary based on the reading information generated by the generation means, wherein the display control means controls the animation display of the face based on the expression data acquired by the acquisition means and the motion data acquired by the second acquisition means.

5. The animation control device according to claim 1, wherein the expression data includes data indicating an eye movement.

6. The animation control device according to claim 1, wherein the expression data includes a graphic to be added to the animation of the face.

7. A text-to-speech apparatus comprising: the animation control device according to claim 1; and an output unit that outputs a voice based on the reading information generated by the generation unit.

8. The text-to-speech apparatus according to claim 7, wherein the display means controls the animation in synchronization with the audio output by the output means.

9. An animation control method comprising: a determination step of analyzing a sentence to be read aloud and determining a mood indicating an atmosphere of the sentence; and a display control step of controlling an animation display based on the mood determined in the determination step.

10. The animation control method according to claim 9, wherein the display control step further controls an animation display based on reading information corresponding to the sentence to be read aloud.

11. The animation control method according to claim 9, further comprising an acquisition step of acquiring, based on the mood determined in the determination step, expression data from an animation dictionary that stores mood data representing a mood of a speaker and expression data representing various facial expressions in association with each other, wherein the display control step controls the animation display of a face based on the expression data acquired in the acquisition step.

12. The animation control method according to claim 11, wherein the animation dictionary further stores reading information and motion data representing various mouth movements in association with each other, the method further comprising: a generation step of generating reading information by analyzing the sentence to be read aloud; and a second acquisition step of acquiring the corresponding mouth motion data from the animation dictionary based on the reading information generated in the generation step, wherein the display control step controls the animation display of the face based on the expression data acquired in the acquisition step and the motion data acquired in the second acquisition step.

13. The animation control method according to claim 9, wherein the expression data includes data indicating an eye movement.

14. The animation control method according to claim 9, wherein the expression data includes a figure to be added to the animation of the face.

15. A computer-readable memory storing a control program for controlling an animation display, the control program comprising: code for a determination step of analyzing a sentence to be read aloud and determining a mood indicating an atmosphere of the sentence; and code for a display control step of controlling the animation display based on the mood determined in the determination step.