JPH0439698A

JPH0439698A - speech synthesizer

Info

Publication number: JPH0439698A
Application number: JP2148231A
Authority: JP
Inventors: Kiyo Hara; 紀代原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-06-05
Filing date: 1990-06-05
Publication date: 1992-02-10

Abstract

PURPOSE: To obtain varied and highly natural synthesized speech by controlling synthesis parameters according to the part of speech of the word that carries the accent in the sentence to be synthesized. CONSTITUTION: The speech synthesizer has a text analyzing means 2 for analyzing input text, a synthesis parameter forming means 6 for forming synthesis parameters in accordance with the result of the means 2, and a speech synthesizing means 9 for synthesizing speech from the synthesis parameters. The means 6 controls the synthesis parameters based upon the part-of-speech information of the word carrying the accent and upon amplitude information. Consequently, varied and highly natural synthesized speech can be obtained.

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a speech synthesizer.

Prior Art

A conventional speech synthesizer is described, for example, in Furui, "Digital Speech Processing", p. 146 (Tokai University Press, 1985). FIG. 1 is a block diagram showing the configuration of this conventional speech synthesizer and of Embodiment 1 of the present invention.

1 is a character string input terminal, to which a sentence written in mixed kanji and kana is input. 2 is a text analysis unit, which uses dictionary 3 to divide the input sentence into words and assigns each word its reading, accent type, part of speech, and so on. 4 is a prosody control unit, which determines by rule the accent type of each phrase (bunsetsu) from the accent types of its words and attached words, together with the pause positions and the intonation of the whole sentence. 5 is a phoneme control unit, which takes the readings obtained by the text analysis unit and applies vowel devoicing, nasalization, and handling of cases where the kana spelling and the pronunciation differ (for example, the 「は」 in 「私は」 is pronounced "wa"), yielding a phonetic transcription.

6 is a synthesis parameter creation unit, which derives the sequence of synthesis parameters needed for synthesis from the prosodic information and the phonetic transcription. The synthesis parameters include the fundamental frequency, which determines the pitch of the sound; the amplitude, which determines its loudness; vocal tract descriptive parameters (PARCOR coefficients, formant frequencies, and the like), which determine the state of the vocal tract; and a voiced/unvoiced flag, which determines the state of the vocal cords. They are stored in parameter table 7.

Here, in both the conventional example and the embodiments, the Fujisaki model is used to assign the fundamental frequency, and a cascade/parallel formant synthesizer is used as the synthesizer. The Fujisaki model is explained, for example, in Fujisaki et al., Journal of the Acoustical Society of Japan, vol. 27, no. 9, pp. 445-456 (1971). The cascade/parallel formant synthesizer is explained, for example, in Allen et al., "From Text to Speech: The MITalk System", Chapter 12 (Cambridge University Press, 1987).

8 is a parameter interpolation unit, which interpolates the per-phoneme parameters held in parameter table 7 to obtain a synthesis parameter sequence at fixed time intervals. 9 is a speech synthesis unit, which synthesizes actual speech from the synthesis parameter sequence obtained by unit 8 and delivers the speech waveform at synthesized speech output terminal 10.
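The role of the parameter interpolation unit 8 can be illustrated with a minimal Python sketch. The description does not state which interpolation rule is used, so linear interpolation between phoneme centres is an assumption made here for illustration, and the names `PhonemeTarget` and `interpolate` as well as the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhonemeTarget:
    """One row of a hypothetical parameter table: a target per phoneme."""
    duration_ms: float   # phoneme duration (assumed positive)
    f0_hz: float         # fundamental frequency target
    amplitude: float     # amplitude target


def interpolate(targets, frame_ms=10.0):
    """Return one (f0_hz, amplitude) pair per frame of frame_ms.

    Assumption: each target sits at the centre of its phoneme and values
    change linearly between neighbouring centres; before the first centre
    and after the last one the nearest target is held.
    """
    if not targets:
        return []
    centres, start = [], 0.0
    for tg in targets:
        centres.append(start + tg.duration_ms / 2.0)
        start += tg.duration_ms
    frames = []
    for k in range(int(start // frame_ms)):
        frames.append(_value_at(k * frame_ms, centres, targets))
    return frames


def _value_at(t, centres, targets):
    """Linear interpolation of (f0, amplitude) at time t in milliseconds."""
    if t <= centres[0]:
        return (targets[0].f0_hz, targets[0].amplitude)
    if t >= centres[-1]:
        return (targets[-1].f0_hz, targets[-1].amplitude)
    for i in range(len(centres) - 1):
        c0, c1 = centres[i], centres[i + 1]
        if c0 <= t <= c1:
            w = (t - c0) / (c1 - c0)
            a, b = targets[i], targets[i + 1]
            return (a.f0_hz + w * (b.f0_hz - a.f0_hz),
                    a.amplitude + w * (b.amplitude - a.amplitude))


# Two hypothetical 40 ms phonemes, resampled to 10 ms frames.
table = [PhonemeTarget(40, 200, 40), PhonemeTarget(40, 240, 80)]
for frame in interpolate(table):
    print(frame)
```

A real synthesizer would interpolate every parameter in the table (formants, voicing flag, and so on); only two are shown to keep the sketch short.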

Problems to be Solved by the Invention

Rule-based speech synthesizers are coming into use in a variety of fields, such as read-back checking of word-processor (WP) text and public announcement broadcasts. In terms of intelligibility, that is, whether the content of the synthesized speech can be understood, they can be said to have reached a practical level. In terms of quality, such as naturalness, however, the level must be said to be still low.

One cause of this low naturalness, in other words of the very mechanical-sounding voice, is thought to be that the synthesized speech is very monotonous. Because speech is always synthesized with the same fixed prosody, it takes on a mechanical quality with little variation. The present invention has been made in view of these problems of the prior art and aims to improve synthesis quality by introducing various kinds of variation into the synthesis rules.

For example, consider the accent nucleus (the syllable position that carries the accent) of the second phrase in the two sentences 「それは、牛です。」 ("Sore wa, ushi desu": "That is a cow.") and 「それは、男だ。」 ("Sore wa, otoko da": "That is a man."). Both 「うしで'す」 and 「おとこ'だ」 are four-mora phrases accented on the third mora, so a conventional rule-based synthesizer utters them with exactly the same intonation. In the former sentence, however, the word carrying the accent nucleus is the attached word 「です」, whereas in the latter the accent nucleus lies in the independent word 「男」, and the two are not actually spoken with the same intonation.

The invention according to claim 1 has been made in view of this problem of the prior art. Its object is to provide varied and highly natural synthesized speech by controlling the synthesis parameters according to the part of speech of the word in which the accent nucleus of the sentence to be synthesized lies.

In addition, a rule-based synthesizer performs synthesis on the assumption that the signal is stationary within one synthesis frame (10 msec to 30 msec, fixed for a given synthesizer), whereas real speech contains considerable fluctuation. For example, when the sound 「あ」 is synthesized at a pitch of 250 Hz with the pitch held fixed at 250 Hz, the result sounds like a buzzer and naturalness is lost. Adding fluctuation improves naturalness, but adding a constant amount of fluctuation still yields unnatural synthesized speech.

The invention according to claim 2 has been made in view of this problem of the prior art. It has a fluctuation imparting means, and its object is to provide varied and highly natural synthesized speech by controlling the magnitude of the fluctuation component according to the amplitude among the synthesis parameters.

Furthermore, the speaking rate and the reference fundamental frequency, which serve as the basis for creating the synthesis parameters for the duration and pitch (intonation) of each phoneme, are variable. Because setting a reference value for every sentence or phrase is cumbersome, however, the same value is used throughout a sentence, paragraph, or the like.

The invention according to claim 3 has been made in view of this problem of the prior art. Its object is to provide varied and highly natural synthesized speech by automatically varying reference values, such as the speaking rate, when the synthesis parameters are created.

Means for Solving the Problems

(1) A speech synthesizer is configured that has text input means for inputting a character string or symbol string (hereinafter referred to as text), text analysis means for analyzing the text input from the input means, dictionary information storage means referred to by the text analysis means, synthesis parameter creation means for creating synthesis parameters according to the result of the text analysis means, and speech synthesis means for synthesizing speech from the synthesis parameters, wherein the parameter creation means controls the synthesis parameters according to the part-of-speech information, obtained from the text analysis means, of the word having the accent nucleus.

(2) A speech synthesizer is configured that has text input means for inputting a character string or symbol string (hereinafter referred to as text), text analysis means for analyzing the text input from the input means, dictionary information storage means referred to by the text analysis means, synthesis parameter creation means for creating synthesis parameters according to the result of the text analysis means, fluctuation imparting means for imparting fluctuation to the synthesis parameters, and speech synthesis means for synthesizing speech from the synthesis parameters, wherein the fluctuation imparting means controls the fluctuation component according to the amplitude information obtained by the synthesis parameter creation means.

(3) A speech synthesizer is configured that has text input means for inputting a character string or symbol string (hereinafter referred to as text), text analysis means for analyzing the text input from the input means, dictionary information storage means referred to by the text analysis means, synthesis parameter creation means for creating synthesis parameters according to the result of the text analysis means, and speech synthesis means for synthesizing speech from the synthesis parameters, wherein the synthesis parameter creation means includes means for selecting a reference value.

Operation

With the above configurations, the synthesis parameters and the fluctuation component are controlled by the part-of-speech information of the word having the accent nucleus and by the amplitude information, and the reference value used in creating the synthesis parameters is also controlled, so that varied and highly natural synthesized speech is provided.

Embodiment 1

FIG. 1 is a block diagram showing the configuration of a speech synthesizer in an embodiment of the invention according to claim 1. Because it is common to the conventional example, the description of the individual blocks is omitted. The details of each process are explained using a concrete example.

Consider the case where the sentence 「これこそ音声合成です。」 ("This is what speech synthesis is.") is input. The text analysis unit 2 divides the input sentence as follows and obtains the accent and reading information.

(Input sentence) 「これこそ音声合成です。」
(Word segmentation) これ／こそ／音声／合成／です。

Word    Reading     Accent   Part of speech
これ    コレ        0        pronoun
こそ    コソ        D        adverbial particle
音声    オンセー    1        noun
合成    ゴーセー    0        noun
です    デス        b        auxiliary verb

Here, the accent types D and b given to 「こそ」 and 「です」 are those listed in the commentary appendix of the NHK accent dictionary (Japan Broadcast Publishing, 1985); they indicate the combining accent type that applies when the word joins an independent word to form a phrase. The accent type of each other word indicates the syllable position that carries the accent.

Next, the prosody control unit 4 determines the pause positions, the accent type of each phrase, and the intonation of the whole sentence. The result is 「これこ'そ（pause）おんせーご'ーせーです」. That is, the first phrase is a 4-mora phrase of type 3 whose accent nucleus lies in an attached word, and the second phrase is a 10-mora phrase of type 5 whose accent nucleus lies in an independent word. The phoneme control unit 5 then obtains the actual phonetic transcription. According to the prosodic and phonemic information obtained in this way, the parameter creation unit 6 obtains the actual synthesis parameters. The parameter interpolation unit 8 interpolates the parameter values obtained for each phoneme to produce a parameter sequence at 10 msec intervals, and the speech synthesis unit 9 synthesizes speech using a formant synthesizer.

The fundamental frequency (pitch) in the parameter creation unit is obtained with the Fujisaki model, which is expressed by the following equations.

$$\ln F_0(t) = \ln F_{min} + \sum_i A_{pi}\, G_{pi}(t - T_{0i}) + \sum_j A_{aj} \left[ G_{aj}(t - t_{1j}) - G_{aj}(t - t_{2j}) \right]$$

$$G_{pi}(t) = 3.0 \times 3.0\, t \exp(-3.0\, t) \quad \text{for } t \ge 0$$

$$G_{aj}(t) = \min\left[ 1 - (1 + 20t) \exp(-20t),\ 0.9 \right] \quad \text{for } t \ge 0$$

A_pi: amplitude of the phrase component; 0.43 (sentence-initial), -0.50 (sentence-final)
A_aj: amplitude of the accent component; 0.40 (accented, kifuku type), 0.20 (flat, heiban type)
i: phrase component number
j: accent component number
T_0i: onset position of the phrase component
t_1j: onset position of the accent component
t_2j: offset position of the accent component
F_min: reference value (= 80)

Here, the amplitude of the accent component is standardized by whether the phrase is of the accented (kifuku) type or the flat (heiban) type. In the example, types 3 and 5 are both accented types, so 0.40 would be used for both.

In this embodiment, the amplitude of the accent component is controlled by the part of speech of the word in which the accent nucleus lies. That is, because the accent nucleus of the first phrase lies in an attached word, its amplitude is set to 0.30.

Although this embodiment uses phonemes (C, V) as the synthesis unit and a formant synthesis method, the invention is not limited to these.

As described above, according to this embodiment, controlling the accent amplitude according to the part-of-speech information of the word in which the accent nucleus lies makes it possible to provide varied and highly natural synthesized speech.

Embodiment 2

FIG. 2 is a block diagram showing the configuration of a speech synthesizer in an embodiment of the invention according to claim 2. Elements common to Embodiment 1 are given the same numbers.

1 is a character string input terminal, to which a sentence written in mixed kanji and kana is input. 2 is a text analysis unit, which uses dictionary 3 to divide the input sentence into words and assigns each word its reading, accent type, part of speech, and so on. 4 is a prosody control unit, which determines by rule the accent type of each phrase from the accent types of its words and attached words, together with the pause positions and the intonation of the whole sentence. 5 is a phoneme control unit, which takes the readings obtained by the text analysis unit and applies vowel devoicing, nasalization, and handling of cases where the kana spelling and the pronunciation differ (for example, the 「は」 in 「私は」 is pronounced "wa"), yielding a phonetic transcription. 6 is a synthesis parameter creation unit, which derives the sequence of synthesis parameters needed for synthesis from the prosodic information and the phonetic transcription. The synthesis parameters are stored in parameter table 7. 8 is a parameter interpolation unit, which interpolates the per-phoneme parameters obtained from table 7 to obtain a synthesis parameter sequence at fixed time intervals. 11 is a fluctuation component imparting unit; in this embodiment it uses random numbers and imparts fluctuation to the fundamental frequency only. 9 is a speech synthesis unit, which synthesizes actual speech from the synthesis parameter sequence obtained by unit 8 and delivers the speech waveform at synthesized speech output terminal 10.

Next, the fluctuation component imparting unit 11 is described. Suppose that the parameter interpolation unit 8 yields the following parameter values as the fundamental frequency at 10 msec intervals.

Here the sampling frequency of the synthesized speech is taken to be 10 kHz.

Frame number                  1      2      3
Fundamental frequency (Hz)    250    250    248
Pitch period (x 1/10 msec)    40     40     40

The sample numbers of the pitch pulses used in the speech synthesis unit 9 are then as follows.

Frame No.     1, 2, 3
Sample No.    0, 40, 80, 120, 160, 200, 240, 280

In real speech, however, even at a fundamental frequency of 250 Hz the pitch pulses do not occur at such regular intervals. They fluctuate, and this fluctuation improves naturalness. If a fluctuation of, for example, ±2 is applied to this pitch pulse train using random numbers, the result is as follows.

Frame No.     1, 2, 3
Sample No.    0, 39, 80, 122, 161, 200, 238, 280

The width of the fluctuation applied is determined in relation to the magnitude of the amplitude parameter. Where the amplitude is small the fluctuation width is made large, and where the amplitude is large the fluctuation width is made small. In this embodiment, with the amplitude denoted amp, the fluctuation width is given by the following expression.

$$f(\text{fluctuation}) = 5 - \frac{amp}{20}$$

That is, when amp = 40, fluctuation is applied with a random number in the range ±3, and when amp = 80, a fluctuation of ±1 is applied.

Controlling the fluctuation width according to the amplitude information in this way reduces the monotony of rule-synthesized speech and makes it possible to provide varied and highly natural synthesized speech.

Embodiment 3

FIG. 3 is a block diagram showing the configuration of a speech synthesizer in an embodiment of the invention according to claim 3. Elements common to Embodiment 1 are given the same numbers.

1 is a character string input terminal, to which a sentence written in mixed kanji and kana is input. 2 is a text analysis unit, which uses dictionary 3 to divide the input sentence into words and assigns each word its reading, accent type, part of speech, and so on. 4 is a prosody control unit, which determines by rule the accent type of each phrase from the accent types of its words and attached words, together with the pause positions and the intonation of the whole sentence. 5 is a phoneme control unit, which takes the readings obtained by the text analysis unit and applies vowel devoicing, nasalization, and handling of cases where the kana spelling and the pronunciation differ (for example, the 「は」 in 「私は」 is pronounced "wa"), yielding a phonetic transcription. 12 is a reference value selection means, which determines the reference value that the following parameter creation unit uses to create the fundamental frequency parameter. 6 is a synthesis parameter creation unit, which derives the sequence of synthesis parameters needed for synthesis from the prosodic information and the phonetic transcription. The synthesis parameters are stored in parameter table 7. 8 is a parameter interpolation unit, which interpolates the per-phoneme parameters obtained from table 7 to obtain a synthesis parameter sequence at fixed time intervals. 9 is a speech synthesis unit, which synthesizes actual speech from the synthesis parameter sequence obtained by unit 8 and delivers the speech waveform at synthesized speech output terminal 10.

The fundamental frequency (pitch) in the parameter creation unit is obtained with the Fujisaki model, which is expressed by the following equations.

$$\ln F_0(t) = \ln F_{min} + \sum_i A_{pi}\, G_{pi}(t - T_{0i}) + \sum_j A_{aj} \left[ G_{aj}(t - t_{1j}) - G_{aj}(t - t_{2j}) \right]$$

$$G_{pi}(t) = 3.0 \times 3.0\, t \exp(-3.0\, t) \quad \text{for } t \ge 0$$

$$G_{aj}(t) = \min\left[ 1 - (1 + 20t) \exp(-20t),\ 0.9 \right] \quad \text{for } t \ge 0$$

A_pi: amplitude of the phrase component; 0.43 (sentence-initial), -0.50 (sentence-final)
A_aj: amplitude of the accent component; 0.40 (accented, kifuku type), 0.20 (flat, heiban type)
i: phrase component number
j: accent component number
T_0i: onset position of the phrase component
t_1j: onset position of the accent component
t_2j: offset position of the accent component
F_min: reference value

Embodiment 1 used 80 (Hz) as the reference value. With a fixed value, however, when sentences with the same number of morae and the same accent type follow one another, the fundamental frequency parameters created for them are almost identical and the synthesized speech becomes monotonous. In the present invention, therefore, the reference value selection means 12 obtains a random number (ran) and sets the reference value F_min by the following expression.

$$F_{min} = 80 + ran$$

Varying the reference value in this way reduces the monotony of the synthesized speech. Although the fundamental frequency is used here as the controlled parameter and F_min as the controlled reference value, this does not restrict the present invention in any way.

As described above, according to this embodiment, voice quality information can be registered dynamically at synthesis time, and varied and effective synthesized speech can be provided.

Effects of the Invention

According to the present invention, the monotony of synthesized speech is reduced, and varied, highly natural, and effective synthesized speech can be provided.
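The three controls described in Embodiments 1 to 3 can be sketched together in Python, as a rough illustration rather than the patented implementation. The constants 3.0, 20, 0.9, the amplitudes 0.43, 0.40, 0.30, 0.20, the base value 80, and the width rule 5 - amp/20 come from the description. Everything else is an assumption: the function names, the ±5 Hz range of the random number (the description gives no range for ran), the rounding and clamping of the fluctuation width, the accent response written in the usual Fujisaki form with (1 + 20t), and the timing values in the demo.

```python
import math
import random

ALPHA = 3.0     # phrase-control constant used in the description
BETA = 20.0     # accent-control constant used in the description
CEILING = 0.9   # ceiling of the accent response


def phrase_response(t):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return ALPHA * ALPHA * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def accent_response(t):
    """Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), 0.9) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), CEILING)


def accent_amplitude(accented, nucleus_in_attached_word):
    """Embodiment 1: 0.20 for a flat phrase, 0.40 for an accented one,
    lowered to 0.30 when the accent nucleus lies in an attached word."""
    if not accented:
        return 0.20
    return 0.30 if nucleus_in_attached_word else 0.40


def select_fmin(rng, spread=5.0):
    """Embodiment 3: Fmin = 80 + ran. The +/- spread is an assumption."""
    return 80.0 + rng.uniform(-spread, spread)


def f0(t, fmin, phrases, accents):
    """F0 in Hz at time t (seconds).

    phrases: list of (Ap, T0); accents: list of (Aa, t1, t2).
    """
    log_f0 = math.log(fmin)
    log_f0 += sum(ap * phrase_response(t - t0) for ap, t0 in phrases)
    log_f0 += sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                  for aa, t1, t2 in accents)
    return math.exp(log_f0)


def jitter_width(amp):
    """Embodiment 2: width = 5 - amp/20, rounded and clamped at 0 (assumed)."""
    return max(0, round(5 - amp / 20))


def jitter_pulses(pulses, amp, rng):
    """Shift each pitch-pulse sample number by a random integer within
    +/- jitter_width(amp); positions are kept non-negative."""
    width = jitter_width(amp)
    return [max(0, p + rng.randint(-width, width)) for p in pulses]


if __name__ == "__main__":
    rng = random.Random(0)
    fmin = select_fmin(rng)
    # Hypothetical timing (seconds) for a two-phrase sentence: the first
    # nucleus is in an attached word, the second in an independent word.
    phrases = [(0.43, 0.0)]
    accents = [(accent_amplitude(True, True), 0.10, 0.45),
               (accent_amplitude(True, False), 0.80, 1.30)]
    for k in range(0, 150, 25):
        t = k / 100.0
        print(f"{t:.2f} s  {f0(t, fmin, phrases, accents):.1f} Hz")
    print(jitter_pulses([0, 40, 80, 120, 160, 200, 240, 280], 40, rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible for testing; a synthesizer would presumably seed it differently on every utterance so that repeated sentences do not sound identical.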

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesizer according to the first embodiment of the present invention and of the conventional example. FIG. 2 is a block diagram showing the configuration of a speech synthesizer according to the second embodiment of the present invention. FIG. 3 is a block diagram showing the configuration of a speech synthesizer according to the third embodiment of the present invention.

1 ... character string input terminal, 2 ... text analysis unit, 3 ... dictionary, 4 ... prosody processing unit, 5 ... phoneme processing unit, 6 ... synthesis parameter creation unit, 7 ... parameter table, 8 ... parameter interpolation unit, 9 ... speech synthesis unit, 10 ... synthesized speech output terminal, 11 ... fluctuation imparting unit, 12 ... reference value selection unit.

Claims

[Claims]

(1) A speech synthesizer comprising: text input means for inputting a character string or a symbol string (hereinafter referred to as text); text analysis means for analyzing the text input from said input means; dictionary information storage means referred to by said text analysis means; synthesis parameter creation means for creating synthesis parameters according to the result of said text analysis means; and speech synthesis means for synthesizing speech from said synthesis parameters, characterized in that said parameter creation means controls the synthesis parameters according to part-of-speech information, obtained from said text analysis means, of the word having the accent nucleus.

(2) A speech synthesizer comprising: text input means for inputting a character string or a symbol string (hereinafter referred to as text); text analysis means for analyzing the text input from said input means; dictionary information storage means referred to by said text analysis means; synthesis parameter creation means for creating synthesis parameters according to the result of said text analysis means; fluctuation imparting means for imparting fluctuation to said synthesis parameters; and speech synthesis means for synthesizing speech from said synthesis parameters, characterized in that said fluctuation imparting means controls the fluctuation component according to amplitude information obtained by said synthesis parameter creation means.

(3) A speech synthesizer comprising: text input means for inputting a character string or a symbol string (hereinafter referred to as text); text analysis means for analyzing the text input from said input means; dictionary information storage means referred to by said text analysis means; synthesis parameter creation means for creating synthesis parameters according to the result of said text analysis means; and speech synthesis means for synthesizing speech from said synthesis parameters, characterized in that said synthesis parameter creation means includes means for selecting a reference value.