JPH11231885A

JPH11231885A - Speech synthesizing device

Info

Publication number: JPH11231885A
Application number: JP10037421A
Authority: JP
Inventors: Osamu Ishikawa; 修石川; Hiroyuki Fujimoto; 博之藤本
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1998-02-19
Filing date: 1998-02-19
Publication date: 1999-08-27

Abstract

PROBLEM TO BE SOLVED: To help a listener understand what is being read aloud by adding reading speed, loudness, and emotional tone to the synthesized speech. SOLUTION: In a speech synthesizer that linguistically analyzes a text into words, generates a prosody for the readings of the words, and generates a speech waveform for the text to synthesize speech, there are provided an important part designation unit 11 that designates an important part in the linguistically analyzed text, an important part prosody unit 12 that generates, for the important part designated by the important part designation unit 11, a prosody different from that of the other parts of the text, and an important part generation unit 13 that generates a speech waveform for the important part.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech synthesizer that synthesizes a text sentence into speech and reads it aloud, and in particular to a device that can help a listener understand the content being read by adding reading speed, loudness, and emotional tone to the synthesized speech.

[0002]

2. Description of the Related Art In a conventional speech synthesizer, a text sentence is linguistically analyzed, the unsegmented plain text is parsed into words of various parts of speech, and the result is converted into a phonetic symbol string consisting of phoneme symbols and prosodic symbols. Prosody processing, such as setting the duration of each phoneme, the pitch pattern, the pause length, and the intonation, is then performed on this phonetic symbol string. Synthesized speech is produced by concatenating speech waveform segments, using, for example, the phoneme as the synthesis unit.

[0003] Such speech synthesizers are used in equipment of all kinds, for example for communication with a service station that transmits data.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION In recent years, the number of service stations transmitting a wide variety of data has been increasing. On the receiving side, in an automobile for example, the information is displayed on a screen while driving, but the driver cannot watch the screen at all times and therefore relies mainly on synthesized speech to take it in.

[0005] In a conventional speech synthesizer, accent, intonation, and the like are added to the read-out speech, as described above, so that it is not hard to listen to. However, particularly in a noisy automobile, the listener cannot tell whether a piece of information is important without listening very carefully. In addition, emotional expression is produced only when the text contains an exclamation mark "!" or a question mark "?"; there is no other emotional expression, so the read-out speech is generally lacking in emotional expression.

[0006] SUMMARY OF THE INVENTION In view of the above problems, an object of the present invention is to provide a speech synthesizer that can draw the listener's attention and read text aloud with rich emotional expression.

[0007]

MEANS FOR SOLVING THE PROBLEMS To solve the above problems, the present invention provides a speech synthesizer in which a text is linguistically analyzed into words, a prosody is generated for the readings of the words, and a speech waveform is generated for the text to synthesize speech, the speech synthesizer comprising: an important part designation unit that designates an important part in the linguistically analyzed text; an important part prosody unit that forms, for the important part designated by the important part designation unit, a prosody that differs from that of the parts of the text other than the important part; and an important part generation unit that generates a speech waveform for the important part. With this means, subjects and predicates, numbers, and proper nouns, as important parts, are read aloud more slowly or more loudly than the rest of the text, so speech that is easy for the listener to catch can be synthesized.

[0008] The speech synthesizer further comprises an emotion word dictionary unit that stores words rich in emotional expression, in addition to the normal word dictionary unit used for linguistic analysis of the text, and an emotion waveform dictionary unit that stores speech waveforms whose pitch has been changed relative to the speech waveforms of the words stored in the normal word dictionary unit. The important part designation unit designates an important part in the text using the emotion word dictionary unit, the important part prosody unit forms a prosody of emotional expression by changing the pitch of the important part designated by the important part designation unit, and the important part speech generation unit generates the speech waveform of the important part using the emotion waveform dictionary unit. With this means, bright emotional expressions are synthesized with a high-pitched voice and dark emotional expressions with a low-pitched voice, so the listener can hear an emotionally rich reading.

[0009]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a diagram showing a receiver that has a speech synthesizer according to the present invention and receives input text for speech synthesis from a service station. As shown in the figure, the receiver is provided with an antenna 200 that receives an audio signal, a receiving unit 201 that processes the received signal into audio, and a speaker 203 that converts the audio signal into sound. In addition to the audio signal, the receiver receives a wide variety of data from the service station. It includes a speech synthesizer 100 to which the data extracted by the receiving unit 201 is input as a text sentence, and the synthesized speech of the speech synthesizer 100 is output from the speaker 203 via an adding unit 202.

[0010] FIG. 2 is a diagram illustrating the speech synthesizer 100 of FIG. 1. The speech synthesizer 100 shown in this figure includes a text analysis processing unit 1 to which a text sentence is input, and the text analysis processing unit 1 has a word dictionary unit 2. The word dictionary unit 2 stores a large number of words classified by part of speech, such as nouns, verbs, particles, and numerals, together with the reading, accent, and so on of each word. The nouns include many proper nouns. Here, the reading of a word is expressed in morae, each corresponding to, for example, one katakana character (hiragana may also be used), and one mora consists of phonemes, namely a consonant and a vowel.

[0011] Using the word dictionary unit 2, the text analysis processing unit 1 segments the input unsegmented text sentence into words and converts it into a phonetic symbol string in which each word is given a reading and an accent. The speech synthesizer 100 further includes a prosody control processing unit 3 that receives the phonetic symbol string from the text analysis processing unit 1. The prosody control processing unit 3 turns the phonetic symbol string into a phoneme sequence and forms, for the phoneme sequence, a prosody sequence comprising the duration of each phoneme, the pitch pattern, the pause length, the accent, the intonation, the amplitude, and so on.
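The dictionary-based conversion into a phonetic symbol string can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say which segmentation method the text analysis processing unit 1 uses, so greedy longest match is assumed here, and the dictionary contents, the `Entry` fields, and the names `analyze` and `phonetic_string` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    reading: str  # katakana reading of the word
    pos: str      # part of speech
    accent: int   # accent position (illustrative value only)


# Hypothetical miniature stand-in for the word dictionary unit 2.
WORD_DICT = {
    "要点": Entry("ヨウテン", "noun", 3),
    "は": Entry("ハ", "particle", 0),
    "重要": Entry("ジュウヨウ", "noun", 0),
    "です": Entry("デス", "auxiliary", 1),
}


def analyze(text):
    """Segment unsegmented text by greedy longest match (an assumption)."""
    tokens = []
    max_len = max(len(surface) for surface in WORD_DICT)
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            surface = text[i:i + n]
            if surface in WORD_DICT:
                tokens.append((surface, WORD_DICT[surface]))
                i += n
                break
        else:
            i += 1  # skip characters (e.g. punctuation) missing from the toy dictionary
    return tokens


def phonetic_string(tokens):
    """Join the readings of the segmented words into one katakana string."""
    return "".join(entry.reading for _, entry in tokens)
```

Calling `phonetic_string(analyze("要点は重要です。"))` yields the katakana reading of the sentence, and the part-of-speech tags carried by each `Entry` are what a later stage could use to pick out important parts.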

[0012] The speech synthesizer 100 further includes a speech generation processing unit 4, and the speech generation processing unit 4 includes a waveform dictionary unit 5. The waveform dictionary unit 5 holds a large number of speech waveforms of roughly phoneme length, stored as speech waveform segments together with information such as the pitch, amplitude, and duration of each phoneme. Based on the phoneme sequence and the prosody sequence, the speech generation processing unit 4 selects speech waveform segments from the waveform dictionary unit 5 and concatenates them to form synthesized speech. The waveform dictionary unit 5 is provided with a normal waveform dictionary unit 5A and important part waveform dictionary units 5B and 5C. The important part waveform dictionary unit 5B stores speech waveform segments whose duration is longer than those of the normal waveform dictionary unit 5A. The important part waveform dictionary unit 5C stores speech waveform segments whose amplitude is larger than those of the normal waveform dictionary unit 5A.
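As a rough illustration of selecting a stored segment from the prosody sequence, the sketch below picks, for a given phoneme, the stored segment closest to a target pitch and duration. The selection criterion is not given in the text, so the distance measure used here is an assumption, and the `Segment` fields, the sample values, and `select_segment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    phoneme: str
    pitch_hz: float
    amplitude: float
    duration_ms: float


# Hypothetical contents of a waveform dictionary; the sample data itself is omitted.
WAVEFORM_DICT = [
    Segment("O", 110.0, 0.5, 70.0),
    Segment("O", 130.0, 0.5, 90.0),
    Segment("U", 120.0, 0.5, 80.0),
]


def select_segment(phoneme, target_pitch_hz, target_duration_ms):
    """Return the stored segment nearest to the requested prosody.

    The cost (sum of absolute pitch and duration differences) is an
    assumed stand-in for whatever criterion a real system would use.
    """
    candidates = [s for s in WAVEFORM_DICT if s.phoneme == phoneme]
    if not candidates:
        raise KeyError(phoneme)
    return min(
        candidates,
        key=lambda s: abs(s.pitch_hz - target_pitch_hz)
        + abs(s.duration_ms - target_duration_ms),
    )
```

Keeping pitch, amplitude, and duration alongside each segment, as the paragraph describes, is what makes this kind of lookup possible without analysing the waveform at synthesis time.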

[0013] The text analysis processing unit 1 further includes an important part designation unit 11 that designates important parts in the input text. The important part designation unit 11 designates, for example, the "subject/predicate", "numbers", or "proper nouns" in the text as important parts. The prosody control processing unit 3 includes an important part prosody unit 12 that adds a distinctive prosody to the important parts designated by the important part designation unit 11. As the prosody designation, the important part prosody unit 12 specifies, for example, a prosody in which only the important parts are read "slowly" or "loudly".

[0014] The speech synthesizer 100 includes a display operation unit 14 through which the important part designation unit 11 and the important part prosody unit 12 are operated from outside. FIG. 3 is a diagram illustrating the display and operation of the display operation unit 14 of FIG. 1. As shown in the figure, the display operation unit 14 displays buttons for selecting "subject/predicate", "number", and "proper noun" as operation buttons for designating the important part, and "slowly" and "loudly" buttons as operation buttons for designating the prosody. In addition to these buttons, a "service station designation" button is provided.

[0015] First, concrete examples of the important part designation unit 11 and the prosody control processing unit 12 will be described. The following describes the operation of the important part designation unit 11 when the "subject/predicate" button is pressed on the display of the display operation unit 14 in FIG. 3 to designate the important part. FIG. 4 is a diagram showing examples that illustrate the operation of the important part designation unit 11 of FIG. 2. As shown on the left side of FIG. 4(a), a text in mixed kanji and kana, for example 『なんだかんだであれ、要点は重要です。』 ("Whatever the case, the main point is important."), is input to the text analysis processing unit 1. The text processing unit 1 converts this input text into a phonetic symbol string of katakana morae.

[0016] The text analysis processing unit 1 searches this phonetic symbol string for the subject and the predicate, and the important part designation unit 11 designates the noun part of the subject, excluding the particle, and the stem part of the predicate, as in the underlined portions of 『ナンダカンダデアレ、ヨウテンハジュウヨウデス。』 shown on the right side of FIG. 4(a). Similarly, suppose that the "number" button has been pressed on the display of the display operation unit 14 in FIG. 3 to designate the important part. As shown on the left side of FIG. 4(b), a text in mixed kanji and kana, for example 『時速１００ｋｍ／ｈです。』 ("The speed is 100 km/h."), is input to the text analysis processing unit 1. The text processing unit 1 converts this input text into a phonetic symbol string, the string is searched for numbers, and the important part designation unit 11 designates the number, as in the underlined portion of 『ジソクヒャクキロメートルパーアワーデス。』 shown on the right side of FIG. 4(b).

[0017] Similarly, suppose that the "proper noun" button has been pressed on the display of the display operation unit 14 in FIG. 3 to designate the important part. As shown on the left side of FIG. 4(c), a text in mixed kanji and kana, for example 『まもなく神戸する。』 ("Kobe is coming up shortly."), is input to the text analysis processing unit 1. The text processing unit 1 converts this input text into a phonetic symbol string, the string is searched for proper nouns, and the important part designation unit 11 designates the proper noun, as in the underlined portion of 『マモナクコウベデス。』 shown on the right side of FIG. 4(c).

[0018] Any two of the important parts shown in (a), (b), and (c) of this figure may be designated in combination, or all three may be designated at the same time. The prosody control processing unit 12 designates the "slowly" or "loudly" prosody for the important parts designated as above. In this case, the "slowly" and "loudly" prosodies may be designated at the same time.
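The combination of selectable important part categories with one or both prosody designations can be sketched as follows. This is an illustrative outline only: the part-of-speech tag names, the category keys, and the function `designate` are hypothetical, and the tokens are assumed to have been tagged already by the text analysis stage.

```python
# Hypothetical mapping from the selectable categories to part-of-speech tags.
CATEGORY_TAGS = {
    "subject_predicate": {"subject_noun", "predicate_stem"},
    "number": {"numeral"},
    "proper_noun": {"proper_noun"},
}


def designate(tokens, categories, prosody):
    """Attach the chosen prosody designations to tokens in the chosen categories.

    tokens:     list of (katakana reading, part-of-speech tag)
    categories: any combination of the keys of CATEGORY_TAGS
    prosody:    any combination of "slow" and "loud"
    """
    wanted = set()
    for category in categories:
        wanted |= CATEGORY_TAGS[category]
    return [
        (reading, set(prosody) if tag in wanted else set())
        for reading, tag in tokens
    ]


tokens = [("マモナク", "adverb"), ("コウベ", "proper_noun"), ("デス", "auxiliary")]
marked = designate(tokens, ["proper_noun"], ["slow", "loud"])
```

Here only the proper noun receives the designations, and because `prosody` is a collection, "slow" and "loud" can be applied together, as the paragraph above allows.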

[0019] Next, when the service station sends the receiver 100 input text to which a label designating the important part and a label designating the prosody to be added to it are attached, the "service station designation" button is pressed on the display of the display operation unit 14 in FIG. 3. FIG. 5 is a diagram showing an example that illustrates the operation of the important part designation unit 11 of FIG. 2 when the "service station designation" button is pressed. As shown on the left side of FIG. 5(a), a text such as 『なんだかんだであれ、「＜Big or Slow ＞要点」は「＜Big or Slow ＞重要」です。』 is input to the text analysis processing unit 1. In this way, the important part is designated by the label 「」, and the prosody of the designated important part is designated by Big (meaning "loudly") or Slow (meaning "slowly") inside the label ＜＞. The important part designation unit 11 interprets this label, and the noun part of the subject, excluding the particle, and the stem part of the predicate are extracted and designated as important parts, as in the underlined portions of 『ナンダカンダデアレ、ヨウテンハジュウヨウデス。』 shown on the right side of FIG. 5(a). The important part prosody unit 12 interprets the label, and the "slowly" or "loudly" prosody is extracted and designated for the important parts designated in this way.
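Interpreting the labels attached by the service station might look like the sketch below. It assumes that each label carries exactly one keyword, either Big or Slow, and that both full-width and ASCII angle brackets may appear; neither point is spelled out in the text. The regular expression and the function name `parse_labeled_text` are hypothetical.

```python
import re

# 「<keyword>word」 with either full-width or ASCII angle brackets (assumed syntax).
LABEL = re.compile(r"「[＜<]\s*(Big|Slow)\s*[＞>]([^」]+)」")


def parse_labeled_text(text):
    """Strip the labels and return (plain_text, spans).

    Each span is (start, end, prosody), indexing into plain_text.
    """
    pieces, spans = [], []
    length = 0  # length of the plain text assembled so far
    pos = 0     # read position in the labeled text
    for match in LABEL.finditer(text):
        before = text[pos:match.start()]
        pieces.append(before)
        length += len(before)
        word = match.group(2)
        spans.append((length, length + len(word), match.group(1).lower()))
        pieces.append(word)
        length += len(word)
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces), spans


plain, spans = parse_labeled_text("なんだかんだであれ、「＜Slow＞要点」は「＜Big＞重要」です。")
```

The plain text can then go through the usual text analysis, while the spans tell the designation and prosody stages which stretch to treat as important and how.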

[0020] FIG. 6 is a diagram illustrating the important part speech generation unit 13 in the speech generation processing unit 4 of FIG. 2. The speech generation processing unit 4 receives the phoneme sequence and the prosody sequence from the prosody control processing unit 3, reads the waveforms of the phonemes making up the phoneme sequence from the normal waveform dictionary unit 5 in accordance with the prosody sequence (phoneme pitch, amplitude, duration, and so on), and concatenates them. For example, for 『ＹＯＵＴＥＮ』, obtained by decomposing the morae of the designated important part 『ヨウテン（要点）』 ("main point") into consonant and vowel phonemes, the speech generation processing unit 4 generates the speech waveform shown in FIG. 6(a) using the normal waveform dictionary unit 5A.

[0021] When the "slowly" prosody designation is input from the important part prosody unit 12 to the important part speech generation unit 13, the speech generation processing unit 4 uses the important part waveform dictionary unit 5B to form a speech waveform stretched in time, as with the morae 『ＹＯＵＴＥＮ』 shown in FIG. 6(b). In this way, the important part is read aloud more slowly than the other parts, which draws the listener's attention. It also becomes easier to hear, especially for elderly listeners.

[0022] When the "loudly" prosody designation is input from the important part prosody unit 12 to the important part speech generation unit 13, the speech generation processing unit 4 uses the important part waveform dictionary unit 5C to form a speech waveform with a large amplitude, as with the morae 『ＹＯＵＴＥＮ』 shown in FIG. 6(c). In this way, the important part is read aloud more loudly than the other parts, which draws the listener's attention. It also becomes easier to hear, especially for elderly listeners.
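The choice among the waveform dictionary units 5A, 5B, and 5C during generation can be sketched as follows. This is a simplified model: each dictionary entry is reduced to a (duration, amplitude) pair instead of real waveform samples, the numeric values and the factor of 2 are invented, and the names `select_dictionary` and `generate` are hypothetical.

```python
# phoneme -> (duration in ms, peak amplitude); all values are invented.
DICT_5A = {
    "Y": (40, 0.5), "O": (80, 0.5), "U": (80, 0.5), "T": (30, 0.5),
    "E": (80, 0.5), "N": (60, 0.5), "H": (40, 0.5), "A": (80, 0.5),
}
# 5B holds longer segments, 5C holds larger-amplitude segments (factor assumed).
DICT_5B = {p: (d * 2, a) for p, (d, a) in DICT_5A.items()}
DICT_5C = {p: (d, a * 2) for p, (d, a) in DICT_5A.items()}


def select_dictionary(important, mode):
    """Pick the dictionary for one unit of the utterance."""
    if not important:
        return DICT_5A
    if mode == "slow":
        return DICT_5B
    if mode == "loud":
        return DICT_5C
    raise ValueError(f"unknown prosody designation: {mode}")


def generate(units, mode):
    """Concatenate segments for units given as (phoneme string, is_important)."""
    out = []
    for phonemes, important in units:
        dictionary = select_dictionary(important, mode)
        out.extend((p, *dictionary[p]) for p in phonemes)
    return out


utterance = [("YOUTEN", True), ("HA", False)]
slow = generate(utterance, "slow")
loud = generate(utterance, "loud")
```

Only the segments of the important part come from 5B or 5C, so in this model the rest of the utterance keeps its normal duration and amplitude, which is what makes the important part stand out.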

[0023] It also becomes possible to let the user hear the synthesized speech with the nuance intended by the service station that provides the information. FIG. 7 is a diagram illustrating a modification of FIG. 2. This figure differs from FIG. 1 in the word dictionary 2 and the waveform dictionary unit 5. As the word dictionary 2, a normal word dictionary 2A and an emotion word dictionary 2B are provided. As the waveform dictionary unit 5, a normal waveform dictionary unit 5A and emotion waveform dictionary units 5D and 5E are provided.

[0024] FIG. 8 is a diagram illustrating the normal word dictionary 2A and the emotion word dictionary 2B provided in the speech synthesizer 100 of FIG. 7. As shown in FIG. 7, a normal word dictionary 2A and an emotion word dictionary 2B are provided as the word dictionary 2. As shown in FIG. 8(a), ordinary words such as 感動 (being deeply moved), 東京 (Tokyo), 慰留 (persuading someone to stay), 結婚 (marriage), 退職 (retirement), and 鎮魂 (repose of souls) are registered in the normal word dictionary 2A. In the emotion word dictionary 2B, words are registered with an identification of whether they are assigned (1) the bright reading mode or (2) the dark reading mode. For example, words such as 嬉しい (happy), 楽しい (fun), and 結婚 (marriage) are registered for the bright reading mode (1), and words such as 悲しい (sad) and 退職 (retirement) for the dark reading mode (2). In this way, emotional reading can be realized without changing the normal word dictionary 2A, by adding the emotion word dictionary 2B, which gives extended information to the words that serve as keywords for emotional reading.

[0025] The emotion waveform dictionary unit 5D stores speech waveforms of words whose pitch has been raised, for the words corresponding to (1) the bright reading mode. The emotion waveform dictionary unit 5E stores speech waveforms of words whose pitch has been lowered, for the words corresponding to (2) the dark reading mode. Alternatively, as shown in FIG. 8(b), the normal word dictionary 2 may be extended so that, for example, the word 結婚 (marriage) is tagged with the symbol (1) for the bright reading mode and the word 鎮魂 (repose of souls) with the symbol (2) for the dark reading mode. Compared with the former approach, this suppresses the increase in the size of the word dictionary.
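The two dictionary layouts described for FIG. 8 can be contrasted in a small sketch. The word lists follow the examples given in the text, while the Python representation, the constant names, and the two lookup functions are hypothetical.

```python
BRIGHT, DARK = 1, 2  # reading modes (1) and (2)

# Layout of FIG. 8(a): the normal dictionary is untouched and a separate
# emotion word dictionary carries the reading mode.
NORMAL_2A = {"感動", "東京", "慰留", "結婚", "退職", "鎮魂"}
EMOTION_2B = {
    "嬉しい": BRIGHT, "楽しい": BRIGHT, "結婚": BRIGHT,
    "悲しい": DARK, "退職": DARK,
}

# Layout of FIG. 8(b): the normal dictionary itself is extended with a mode
# symbol; None means the word has no emotion tag.
EXTENDED_2 = {"東京": None, "結婚": BRIGHT, "鎮魂": DARK}


def mode_from_separate(word):
    """Look up the reading mode in the separate emotion dictionary."""
    return EMOTION_2B.get(word)


def mode_from_extended(word):
    """Look up the reading mode stored alongside the normal entry."""
    return EXTENDED_2.get(word)
```

In the first layout a word such as 結婚 appears in both dictionaries, whereas the second stores it once with an extra field, which is the size saving the paragraph refers to.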

[0026] FIG. 9 is a diagram illustrating an example of the display and operation of the display operation unit 14 when emotional reading is performed. The display operation shown in this figure is used to select whether or not emotion is to be expressed. When the "Yes" button is pressed on the display operation unit 14, the important part designation unit 11 uses the emotion word dictionary 2B to determine whether the text sentence input to the text analysis processing unit 1 contains a word belonging to the bright reading mode (1) or a word belonging to the dark reading mode (2).

[0027] FIG. 10 is a diagram illustrating the reading pitch when a word of the bright reading mode (1) is included. As shown in the figure, the input text 『鈴木さんが結婚しました。』 ("Suzuki-san got married.") contains the word 結婚 (marriage), which corresponds to the bright reading mode (1), so the important part designation unit 11 designates the underlined portion of the phonetic symbol string 『スズキサンガケッコンシマシタ。』 as the important part. The important part prosody unit 12 forms a prosody sequence that raises the pitch of the important part relative to the other parts of the text, as shown in the figure. The important part speech generation unit 13 generates speech with a raised pitch for the important part using the emotion waveform dictionary unit 5D. Raising the pitch in this way produces a bright reading.

[0028] FIG. 11 is a diagram illustrating the pitch at which a sentence is read when a word of the dark reading mode (2) is included. As shown in the figure, the input text 『鈴木さんが退職しました。』 ("Suzuki-san has retired.") contains the word 退職 (retirement), which corresponds to the dark reading mode (2), so the important part designation unit 11 designates the underlined portion of the phonetic symbol string 『スズキサンガタイショクシマシタ。』 as the important part. The important part prosody unit 12 forms a prosody sequence that lowers the pitch of the important part relative to the rest of the text, as shown in the figure. The important part speech generation unit 13 generates speech with a lowered pitch for the important part using the emotion waveform dictionary unit 5E. Lowering the pitch in this way produces a dark reading.
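The pitch adjustment for bright and dark reading modes can be sketched as a per-word pitch target. This is only a toy model: the text describes raising or lowering the pitch but gives no amounts, so the scale factors, the base pitch, the word list keyed by katakana reading, and the function `pitch_contour` are all assumptions.

```python
BRIGHT, DARK = 1, 2

# Assumed scale factors; the source gives no numeric values.
PITCH_SCALE = {BRIGHT: 1.25, DARK: 0.75}

# Hypothetical emotion word list keyed by katakana reading.
EMOTION_WORDS = {"ケッコン": BRIGHT, "タイショク": DARK}


def pitch_contour(readings, base_pitch_hz=120.0):
    """Return (reading, target pitch) pairs, shifting only the emotion words."""
    contour = []
    for reading in readings:
        mode = EMOTION_WORDS.get(reading)
        scale = PITCH_SCALE.get(mode, 1.0)  # words without a mode keep the base pitch
        contour.append((reading, base_pitch_hz * scale))
    return contour


bright = pitch_contour(["スズキサンガ", "ケッコン", "シマシタ"])
dark = pitch_contour(["スズキサンガ", "タイショク", "シマシタ"])
```

Only the designated word moves away from the base pitch, mirroring the way the important part prosody unit 12 changes the important part relative to the rest of the text.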

[0029] Accordingly, speech rich in emotional expression can be synthesized for the listener. Next, when the service station sends the receiver 100 input text to which a label designating the emotion-related important part and a label designating the prosody to be added to it are attached, the "service station designation" button is pressed on the display of the display operation unit 14 in FIG. 9.

[0030] FIG. 12 is a diagram showing an example that illustrates the operation of the important part designation unit 11 of FIG. 2 when the "service station designation" button is pressed in the display operation of FIG. 9. As shown on the left side of FIG. 12(a), a text such as 『鈴木さんが「<High>結婚」しました。』 is input to the text analysis processing unit 1. In this way, the important part is designated by the label 「」, and the emotional prosody of the designated important part is designated by High (meaning high pitch) inside the label ＜＞. The important part designation unit 11 interprets this label and designates the important part, as in the underlined portion of 『スズキサンガケッコンシマシタ。』 shown on the right side of FIG. 12(a). The important part prosody unit 12 interprets the label, and a high-pitched prosody is extracted and designated for the important part designated in this way.

[0031] Likewise, as shown on the left side of FIG. 12(b), a text such as 『鈴木さんが「<Low>退職」しました。』 is input to the text analysis processing unit 1. As before, the important part is designated by the label 「」, and the emotional prosody of the designated important part is designated by Low (meaning low pitch) inside the label ＜＞. The important part designation unit 11 interprets this label and designates the important part, as in the underlined portion of 『スズキサンガタイショクシマシタ。』 shown on the right side of FIG. 12(b). The important part prosody unit 12 interprets the label, and a low-pitched prosody is extracted and designated for the important part designated in this way.
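On the sending side, a service station would need to attach these labels before transmission. The sketch below shows one way that could be done. The text describes only the labeled result, not how the station produces it, so the function `annotate`, its arguments, and the choice to label the first occurrence of the word are all hypothetical.

```python
ALLOWED_LABELS = {"High", "Low"}


def annotate(text, word, label):
    """Wrap the first occurrence of word in 「<label>word」.

    Raises ValueError for an unknown label or a word missing from the text.
    """
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label}")
    if word not in text:
        raise ValueError(f"word not found: {word}")
    return text.replace(word, f"「<{label}>{word}」", 1)


bright_text = annotate("鈴木さんが結婚しました。", "結婚", "High")
dark_text = annotate("鈴木さんが退職しました。", "退職", "Low")
```

Because the label travels with the text, the receiver does not need its own emotion word dictionary to decide which word to emphasise, which fits the stated aim of reproducing the nuance the service station intends.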

[0032] In this way, the user can hear the synthesized speech with the nuance intended by the service station that provides the information.

[0033]

EFFECTS OF THE INVENTION As described above, according to the present invention, subjects and predicates, numbers, and proper nouns, as important parts, are read aloud more slowly or more loudly than the rest of the text, so speech that is easy for the listener to catch can be synthesized. Furthermore, bright emotional expressions are synthesized with a high-pitched voice and dark emotional expressions with a low-pitched voice, so the listener can hear an emotionally rich reading.

[Brief description of the drawings]

FIG. 1 is a diagram showing a receiver that has a speech synthesizer according to the present invention and receives input text for speech synthesis from a service station.

FIG. 2 is a diagram illustrating the speech synthesizer 100 of FIG. 1.

FIG. 3 is a diagram illustrating the display and operation of the display operation unit 14 of FIG. 1.

FIG. 4 is a diagram showing examples that illustrate the operation of the important part designation unit 11 of FIG. 2.

FIG. 5 is a diagram showing an example that illustrates the operation of the important part designation unit 11 of FIG. 2 when the "service station designation" button is pressed.

FIG. 6 is a diagram illustrating the important part speech generation unit 13 in the speech generation processing unit 4 of FIG. 2.

FIG. 7 is a diagram illustrating a modification of FIG. 2.

FIG. 8 is a diagram illustrating the normal word dictionary 2A and the emotion word dictionary 2B provided in the speech synthesizer 100 of FIG. 7.

FIG. 9 is a diagram illustrating an example of the display and operation of the display operation unit 14 when emotional reading is performed.

FIG. 10 is a diagram illustrating the reading pitch when a word of the bright reading mode (1) is included.

FIG. 11 is a diagram illustrating the pitch at which a sentence is read when a word of the dark reading mode (2) is included.

FIG. 12 is a diagram showing an example that illustrates the operation of the important part designation unit 11 of FIG. 2 when the "service station designation" button is pressed in the display operation of FIG. 9.

[Explanation of symbols]

1 ... Text analysis processing unit
2, 2A, 2B ... Word dictionary unit
3 ... Prosody control processing unit
4 ... Speech generation processing unit
5, 5A, 5B, 5C, 5D, 5E ... Waveform dictionary unit
11 ... Important part designating unit
12 ... Important part prosody unit
13 ... Important part speech generation unit
100 ... Speech synthesis unit
200 ... Antenna
201 ... Receiving unit
202 ... Adding unit
203 ... Speaker

Claims

[Claims]

1. A speech synthesizer that linguistically analyzes a text into words, generates a prosody for reading the words, and generates a speech waveform for the text to synthesize speech, the speech synthesizer comprising: an important part designating unit that designates an important part in the linguistically analyzed text; an important part prosody unit that forms a prosody that makes the important part designated by the important part designating unit different from the parts of the text other than the important part; and an important part speech generation unit that generates a speech waveform for the important part.

2. The speech synthesizer according to claim 1, wherein the important part designating unit designates a subject and a predicate in the text as important parts.

3. The speech synthesizer according to claim 1, wherein the important part designating unit designates a number in a text as an important part.

4. The speech synthesizer according to claim 1, wherein the important part designating unit designates a proper noun in a text as an important part.

5. The speech synthesizer according to claim 1, wherein the important part prosody unit forms a prosody in which the important part designated by the important part designating unit is read out slowly.

6. The speech synthesizer according to claim 1, wherein the important part prosody unit forms a prosody in which the important part designated by the important part designating unit is read out in a loud voice.

7. The speech synthesizer according to claim 1, wherein, when the text transmitted from the service station includes an important part, a label for identifying the important part, and a label specifying a prosody to be added to the important part, the important part designating unit searches the text for the important part and designates it, and the important part prosody unit searches for the prosody for the important part and forms the retrieved prosody.

8. The speech synthesizer according to claim 1, further comprising: an emotion word dictionary unit that stores words rich in emotional expression, in addition to a normal word dictionary used for linguistic analysis of the text; and an emotion waveform dictionary unit that stores speech waveforms whose pitch is changed relative to the speech waveforms of the words stored in the normal word dictionary, wherein the important part designating unit designates an important part in the text using the emotion word dictionary unit, the important part prosody unit changes the pitch of the important part designated by the important part designating unit to form a prosody of emotional expression, and the important part speech generation unit generates the speech waveform of the important part using the emotion waveform dictionary unit.

9. The speech synthesizer according to claim 8, wherein the emotion word dictionary unit attaches, to each word stored therein, a label that distinguishes a bright emotional expression from a dark emotional expression.

10. The speech synthesizer according to claim 8, wherein the important part prosody unit raises the pitch for a prosody of a bright emotional expression and lowers the pitch for a prosody of a dark emotional expression.

11. The speech synthesizer according to claim 1, wherein the emotion word dictionary unit is integrated with the normal word dictionary unit and labels, among the words used in the normal word dictionary unit, those that are rich in emotional expression.

12. The speech synthesizer according to claim 8, wherein, when the text transmitted from the service station includes an important part rich in emotional expression, a label for identifying that important part, and a label specifying a prosody to be added to it, the important part designating unit searches the text for the important part rich in emotional expression and designates it, and the important part prosody unit searches for the prosody for the important part rich in emotional expression and forms the retrieved prosody.
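
For illustration only, the flow of claims 1, 3, 5 and 6 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation disclosed in this publication: the names `Segment`, `designate_important_parts` and `apply_important_prosody`, the digit-matching rule, and the rate and volume factors (0.8 and 1.3) are all hypothetical, and waveform generation is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    important: bool
    rate: float = 1.0    # relative speaking rate (1.0 = normal); hypothetical scale
    volume: float = 1.0  # relative loudness (1.0 = normal); hypothetical scale


# Hypothetical rule: treat numbers as important parts (cf. claim 3).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def designate_important_parts(text):
    """Split text into segments and flag each number as an important part."""
    segments, pos = [], 0
    for match in NUMBER.finditer(text):
        if match.start() > pos:
            segments.append(Segment(text[pos:match.start()], False))
        segments.append(Segment(match.group(), True))
        pos = match.end()
    if pos < len(text):
        segments.append(Segment(text[pos:], False))
    return segments


def apply_important_prosody(segments, slow_rate=0.8, loud_volume=1.3):
    """Read important parts more slowly and more loudly (cf. claims 5 and 6)."""
    for seg in segments:
        if seg.important:
            seg.rate = slow_rate
            seg.volume = loud_volume
    return segments


segments = apply_important_prosody(
    designate_important_parts("Traffic jam of 5 km ahead")
)
for seg in segments:
    print(repr(seg.text), seg.important, seg.rate, seg.volume)
```

A real system would pass the per-segment prosody on to the waveform generation stage; here the segments are only printed so the differing prosody of the important part can be inspected.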