JPH055119B2


Info

Publication number: JPH055119B2
Application number: JP59181220A
Authority: JP
Inventors: Toshiro Shibanuma
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-08-30
Filing date: 1984-08-30
Publication date: 1993-01-21
Also published as: JPS6159400A

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a speech synthesis device with two storage units. A phoneme parameter storage unit stores parameter time series for phonemes and similar units. A word/phrase parameter storage unit stores parameter time series for words, phrases and similar units. The device creates the parameter time series for a given reading sequence by combining parameter time series present in the collection formed by these two storage units.

[Prior art and problems]

It is well known to synthesize speech from character strings using a PARCOR speech synthesizer or the like. In conventional speech synthesis devices, the PARCOR coefficients corresponding to each character of the reading sequence were retrieved from the phoneme parameter storage unit and joined to create the PARCOR coefficients for the entire reading sequence. Simply concatenating the per-phoneme time series of PARCOR coefficients produces unnatural speech, so the time series must be joined with interpolation. Even with such interpolation, however, natural speech could not be obtained.
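The description does not say how the conventional devices interpolated across joins. For illustration only, here is a minimal sketch that assumes each frame is a vector of PARCOR (reflection) coefficients and that the join is a linear cross-fade over a fixed number of frames. `crossfade_join` and `overlap` are hypothetical names.

```python
from typing import List

# Hypothetical frame type: one frame of PARCOR coefficients k_1..k_p.
Frame = List[float]


def crossfade_join(left: List[Frame], right: List[Frame], overlap: int) -> List[Frame]:
    """Concatenate two coefficient time series (illustrative assumption:
    linear interpolation). The last `overlap` frames of `left` are blended
    with the first `overlap` frames of `right`."""
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both series")
    joined = left[: len(left) - overlap]
    for i in range(overlap):
        # The weight moves from `left` toward `right` across the overlap region.
        w = (i + 1) / (overlap + 1)
        a = left[len(left) - overlap + i]
        b = right[i]
        joined.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    joined.extend(right[overlap:])
    return joined
```

Each interpolated coefficient is a convex combination of two values in (−1, 1), so it also lies in (−1, 1) and the lattice synthesis filter stays stable. The point of the passage above is that smoothing of this kind still did not make concatenated phonemes sound natural.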

[Purpose of the invention]

The present invention is based on the above considerations. Its object is to provide a speech synthesis device that can synthesize speech extremely close to natural utterance.

[Means to achieve the purpose]

To this end, the speech synthesis device of the present invention comprises three elements:

- a phoneme parameter storage unit that stores parameter time series for phonemes and similar units from which arbitrary words can be synthesized;
- a word/phrase parameter storage unit that stores parameter time series for words, phrase units (bunsetsu), or longer units;
- parameter time series creation means that, when asked to set the parameter time series for a reading sequence, creates it from parameter time series present in the collection formed by the two storage units.

The collection may contain no parameter time series for the entire reading sequence while the reading sequence can still be expressed by several different combinations of parameter time series present in the collection. In that case, the creation means determines which of these combinations improves sound quality the most. It then creates the parameter time series for the reading sequence using the combination so determined.
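The description does not specify how "the combination that improves sound quality the most" is judged. The following hedged sketch enumerates every way to cover a reading sequence with stored units and picks the best one under a caller-supplied score. It assumes one character equals one reading (for example, one kana), and it models the collection as a set of reading strings. `segmentations`, `best_combination`, and the scoring rule are hypothetical.

```python
from typing import Callable, Iterator, Set, Tuple

# A combination is an ordered tuple of stored units that together spell the reading.
Combination = Tuple[str, ...]


def segmentations(reading: str, collection: Set[str]) -> Iterator[Combination]:
    """Yield every way to split `reading` into units found in `collection`.
    Assumption: one character corresponds to one reading."""
    if not reading:
        yield ()
        return
    for end in range(1, len(reading) + 1):
        head = reading[:end]
        if head in collection:
            for rest in segmentations(reading[end:], collection):
                yield (head,) + rest


def best_combination(reading: str, collection: Set[str],
                     score: Callable[[Combination], float]) -> Combination:
    """Return the highest-scoring combination (hypothetical quality criterion).
    Raise if no combination covers the reading."""
    candidates = list(segmentations(reading, collection))
    if not candidates:
        raise ValueError("reading cannot be covered by the stored units")
    return max(candidates, key=score)
```

Passing `score=lambda c: -len(c)` prefers combinations with fewer elements, which the description later names as one possible criterion. Exhaustive enumeration grows quickly with reading length, so it only suits short phrase units.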

[Embodiments of the invention]

The present invention is described below with reference to the drawings.

FIG. 1 shows the configuration of one embodiment of the present invention. FIG. 2 shows the processing performed by the parameter combination determination unit of FIG. 1.

In FIG. 1, the reference numerals denote the following units:

1: text storage unit
2: text analysis unit
3: prosody setting unit
4: parameter conversion unit
5: parameter combination determination unit
6: phoneme parameter storage unit
7: word/phrase parameter storage unit

Text storage unit 1 stores mixed kanji-kana sentences in coded form. Text analysis unit 2 has a word dictionary, a grammar dictionary and similar resources. It uses them to convert the character string retrieved from text storage unit 1 into a word string. A word string is a sequence of word information. For each word it records the reading, the grammatical information (part of speech), the number of morae, the accent information, and so on. The word string output from text analysis unit 2 is sent to prosody setting unit 3 and parameter conversion unit 4. Prosody setting unit 3 sets breath-group boundaries in the word string and creates a pitch pattern for each breath-group section. The pitch pattern for a breath-group section has several peaks. Prosody setting unit 3 divides the pitch pattern at each peak and notifies parameter conversion unit 4 of the phrase-unit (bunsetsu) boundaries that correspond to these divisions.

Parameter conversion unit 4 divides the reading sequence sent from text analysis unit 2 at the notified phrase-unit boundaries. It sends each resulting phrase-unit reading sequence to parameter combination determination unit 5. Parameter combination determination unit 5 refers to the collection formed by phoneme parameter storage unit 6 and word/phrase parameter storage unit 7. It determines the optimal combination of parameter time series for the phrase-unit reading sequence received from parameter conversion unit 4, retrieves the parameter time series fixed by this determination from the collection, and sends them to parameter conversion unit 4. Parameter conversion unit 4 joins the parameter time series received from parameter combination determination unit 5 to create the parameter time series for the phrase-unit reading sequence. This parameter time series and the corresponding pitch pattern are sent to speech synthesis unit 8, which is, for example, of the PARCOR type.
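The data flow around parameter conversion unit 4 can be sketched as follows. The unit splits the reading sequence at the boundaries notified by prosody setting unit 3, hands each phrase unit to a combination-determination step, and concatenates the returned frames. This is a minimal sketch under assumptions that are not in the description: boundaries are character offsets, `determine` is a hypothetical callback standing in for unit 5, and `store` is a hypothetical mapping from units to frame lists.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical frame type: one vector of synthesis parameters.
Frame = List[float]


def split_at_boundaries(reading: str, boundaries: Sequence[int]) -> List[str]:
    """Cut `reading` at the given character offsets (assumed phrase-unit boundaries)."""
    cuts = [0, *sorted(boundaries), len(reading)]
    return [reading[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


def convert(reading: str, boundaries: Sequence[int],
            determine: Callable[[str], List[str]],
            store: Dict[str, List[Frame]]) -> List[List[Frame]]:
    """Return one concatenated parameter time series per phrase unit."""
    result = []
    for phrase in split_at_boundaries(reading, boundaries):
        frames: List[Frame] = []
        for unit in determine(phrase):   # units chosen by the hypothetical determine step
            frames.extend(store[unit])   # plain concatenation of the stored frames
        result.append(frames)
    return result
```

Keeping the determination step as a callback separates the question of which units to use from the mechanics of joining them. This mirrors how the description divides work between units 4 and 5.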

FIG. 2 shows the processing of the parameter combination determination unit. Parameter combination determination unit 5 performs the following processing.

A variable A indicating the reading position is set to n, where n is the number of readings in the reading sequence. The unit checks whether word/phrase parameter storage unit 7 holds a parameter time series for the 1st through n-th readings. If it does, that time series is sent to parameter conversion unit 4.

If it does not, A is set to n−1 and the unit checks whether word/phrase parameter storage unit 7 holds a parameter time series for the 1st through (n−1)-th readings. If one exists, it is sent to parameter conversion unit 4. If not, A is decremented by one, and this processing is repeated in turn.

When A reaches 1, the parameter time series for the first reading is retrieved from phoneme parameter storage unit 6 and sent to parameter conversion unit 4.

Suppose the parameter time series for the 1st through (n−j₁)-th readings (j₁ = 0, 1, …, n−1) has been sent to parameter conversion unit 4. The same processing is then applied to the remaining readings.

The unit checks whether the end of the phrase unit has been reached, that is, whether no readings remain. If readings remain, the processing is repeated.
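The procedure of FIG. 2 amounts to a greedy longest-prefix match. The following is a minimal Python sketch that assumes each character is one reading and that the two storage units are given as sets of reading strings. `greedy_combination` and its parameters are hypothetical names, and the function returns the chosen units instead of sending time series to unit 4.

```python
from typing import List, Set


def greedy_combination(reading: str, phrase_store: Set[str],
                       phoneme_store: Set[str]) -> List[str]:
    """Greedy longest-prefix covering, following the FIG. 2 description
    (sketch; one character is assumed to be one reading)."""
    units: List[str] = []
    pos = 0
    while pos < len(reading):              # stop when no readings remain
        rest = reading[pos:]
        a = len(rest)                      # variable A starts at n
        # Decrement A until the 1st..A-th readings are in the phrase store
        # or A reaches 1.
        while a > 1 and rest[:a] not in phrase_store:
            a -= 1
        head = rest[:a]
        # At A == 1 the single reading comes from the phoneme store.
        if a == 1 and head not in phoneme_store:
            raise KeyError(f"no parameters for reading {head!r}")
        units.append(head)
        pos += a                           # continue with the remaining readings
    return units
```

With the stored units of the 「おんせいごうせい」 example (single kana in the phoneme store; おん, せい, ごう, ごうせい in the phrase store), this returns おん, せい, ごうせい.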

Next, the parameter combination determination of the present invention is explained with a concrete example. Take the reading sequence 「おんせいごうせい」 (onseigousei). Suppose speech parameters are stored for the following units:

- single readings: 「お」「ん」「せ」「い」「ご」「う」「せ」「い」 (o, n, se, i, go, u, se, i);
- two-reading units: 「おん」「せい」「ごう」「せい」 (on, sei, gou, sei);
- a four-reading unit: 「ごうせい」 (gousei).

Then the combination 「おん」 + 「せい」 + 「ごうせい」 (on + sei + gousei) is selected. This follows from the procedure of FIG. 2. No prefix longer than 「おん」 is stored, so 「おん」 is taken first. Of the remainder 「せいごうせい」, the longest stored prefix is 「せい」. The final remainder 「ごうせい」 is stored as a whole.

Instead of determining the combination of parameter time series as in FIG. 2, the combination with the fewest elements may also be selected.
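Selecting the combination with the fewest elements is a shortest-segmentation problem. It can be solved with dynamic programming in place of the greedy scan. This is a minimal sketch under the same assumptions as the other examples: one character per reading, multi-reading units taken from the word/phrase store, and single readings taken from the phoneme store. `fewest_units` is a hypothetical name.

```python
from typing import List, Optional, Set


def fewest_units(reading: str, phrase_store: Set[str],
                 phoneme_store: Set[str]) -> List[str]:
    """Cover `reading` with the minimum number of stored units (dynamic programming).
    Assumption: multi-reading units come from phrase_store and single readings
    from phoneme_store."""
    n = len(reading)
    # best[i] holds a fewest-unit covering of reading[i:], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[n] = []                                     # the empty suffix needs no units
    for i in range(n - 1, -1, -1):                   # fill from the end backwards
        for j in range(i + 1, n + 1):
            unit = reading[i:j]
            store = phoneme_store if j - i == 1 else phrase_store
            if unit in store and best[j] is not None:
                candidate = [unit] + best[j]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    if best[0] is None:
        raise KeyError("reading cannot be covered by the stored units")
    return best[0]
```

The two rules can disagree. Take the reading "abcd" with stored phrases "ab" and "bcd". The FIG. 2 procedure takes the longest prefix "ab" and yields ab + c + d, while this version yields a + bcd.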

[Effect of the invention]

As is clear from the above description, the present invention makes it possible to convert any sentence into speech close to natural speech.

[Brief explanation of drawings]

FIG. 1 shows the configuration of one embodiment of the present invention. FIG. 2 shows the processing performed by the parameter combination determination unit of FIG. 1.

1: text storage unit
2: text analysis unit
3: prosody setting unit
4: parameter conversion unit
5: parameter combination determination unit
6: phoneme parameter storage unit
7: word/phrase parameter storage unit

Claims

[Scope of Claims] 1. A speech synthesis device comprising:

- a phoneme parameter storage unit that stores parameter time series for phonemes and similar units from which arbitrary words can be synthesized;
- a word/phrase parameter storage unit that stores parameter time series for words, phrase units (bunsetsu), or longer units;
- parameter time series creation means that, when requested to set a parameter time series for a reading sequence, creates the parameter time series for the reading sequence using parameter time series present in the collection formed by the phoneme parameter storage unit and the word/phrase parameter storage unit.

The device is characterized in that, when the collection contains no parameter time series for the entire reading sequence and the parameter time series for the reading sequence can be expressed by a plurality of combinations of parameter time series present in the collection, the parameter time series creation means determines which of these combinations improves sound quality the most and creates the parameter time series for the reading sequence using the combination determined by this determination result.