JPH0363760B2

Info

Publication number: JPH0363760B2
Application number: JP57050595A
Authority: JP
Inventors: Katsunobu Fushikida; Yukio Mitome
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-03-29
Filing date: 1982-03-29
Publication date: 1991-10-02
Also published as: JPS58168095A

Description

[Detailed Description of the Invention] The present invention relates to a speech synthesis device.

Conventionally, a speech synthesis method is known in which speech is synthesized using formant (pole) parameters as the spectral envelope parameters of speech, and using pitch data, voiced/unvoiced data, and amplitude data as sound source parameters. A one-chip speech synthesis LSI is also known that uses this method and is integrated together with a synthesis data memory that stores synthesis data such as spectral envelope parameter values and sound source parameter values.

However, the conventional method has the drawback that the quality of the synthesized speech is insufficient. One reason is that it is a relatively coarse model (the so-called all-pole model) that approximates the spectral envelope characteristics of speech with the characteristics of a plurality of poles. Another is that it has no sound source waveform for plosives.

An object of the present invention is to provide a speech synthesis device that is relatively small in scale, can be implemented as an LSI, and can generate synthesized speech of relatively good quality.

The present invention is a speech synthesis device of the type that synthesizes a speech waveform using synthesis data such as pole-zero parameters, pitch period data, and voiced/unvoiced data. It comprises: a synthesis filter that includes pole circuits and zero circuits and in which the connection configuration of those circuits is variable; a sound source waveform generation circuit that generates a plurality of selectable types of sound source waveform; means for selecting the connection configuration of the synthesis filter and the sound source waveform generation circuit in accordance with the synthesis data; a synthesis data memory that stores the synthesis data; and means for selecting or editing the synthesis data in accordance with a speech output command given as input, so as to generate the desired synthesized speech waveform.

A feature of the present invention is that the configuration of the plurality of pole circuits and zero circuits is variable in accordance with phoneme data and the like, so that real-time processing is possible with a relatively small circuit scale and even at a relatively low computation speed. In addition, a plurality of sound source waveform types are selectable: a voiced sound source generation circuit, a noise generation circuit, a plosive source generation circuit that generates a plosive source waveform, and means for supplying an external sound source waveform such as prediction residual waveform data. This yields a speech synthesis device capable of generating synthesized speech of relatively good quality. Furthermore, the present invention can be used as a so-called analysis-synthesis type speech synthesizer, which stores and uses synthesis data obtained by analyzing natural speech waveforms. It can also be used as a so-called synthesis-by-rule type speech synthesizer, which automatically generates the synthesis data from a character sequence given as input.

Next, the present invention will be described in detail with reference to the drawings.

First, FIG. 1 is a block diagram showing one embodiment of the speech synthesis device of the present invention. Speech output command data, such as a character sequence or sentence number data, is input to the control circuit 6 via the speech output data input terminal 1. In accordance with the speech output command data, the control circuit 6 generates address data for the corresponding synthesis data and outputs it to the synthesis data storage circuit 2 via the address data transmission line 3. In accordance with the address data, the synthesis data storage circuit 2 outputs synthesis data, such as spectral envelope parameter values and sound source data, to the control circuit 6 via the synthesis data transmission line 4.

Next, the control circuit 6 edits the synthesis data and outputs it to the interpolation circuit 9 via the synthesis data transmission line 7. It also generates configuration data for the speech synthesis circuit in accordance with the synthesis data and outputs it to the configuration control circuit 12 via the configuration data transmission line 8. The interpolation circuit 9 interpolates the synthesis data. It outputs the interpolated values of the spectral envelope parameter values, as control parameter values for the pole circuits and zero circuits, to the synthesis filter section 21 via the synthesis filter control data transmission line 11, and outputs the interpolated values of the sound source data to the sound source waveform generation section 15.

Meanwhile, in accordance with the configuration data, the configuration control circuit 12 outputs configuration data for the pole circuits and zero circuits to the synthesis filter section 21 via the synthesis filter configuration data transmission line 14. In accordance with the same configuration data, it also generates sound source waveform selection data and outputs it to the sound source waveform generation section 15 via the sound source waveform selection data transmission line 13.

In accordance with the sound source waveform selection data, the sound source waveform generation section 15 selects one of the following: the external sound source generation circuit 16, which generates a sound source waveform from the sound source waveform data input from the synthesis data storage circuit 2 via the sound source waveform data transmission line 5; the triangular wave sound source generation circuit 17, which generates a triangular wave as the sound source waveform; the noise waveform generation circuit 18, which generates a noise waveform as the sound source waveform; and the plosive source generation circuit 19, which generates a plosive source. It generates the sound source waveform in accordance with the interpolated values of the sound source data and outputs it to the synthesis filter section 21. The synthesis filter section 21 connects the pole circuits 22, 24, 25, and 27, the zero circuits 23 and 26, and the adder circuit 28 in accordance with the configuration data, generates a synthesized waveform from the interpolated spectral envelope parameter values and the sound source waveform, and outputs it via the synthesized waveform output terminal 29.
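The source selection performed in the sound source waveform generation section 15 can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function names, the pulse shape parameters (`period`, `width`), the uniform noise, and the single-impulse stand-in for the plosive source are all hypothetical, since the specification does not give the internal details of circuits 16 to 19.

```python
import random


def triangular_source(length, period, width, amplitude):
    """Repeat one triangular pulse of `width` samples every `period` samples."""
    half = width / 2.0
    out = []
    for n in range(length):
        k = n % period  # position inside the current pitch period
        if k < width:
            out.append(amplitude * (1.0 - abs(k - half) / half))
        else:
            out.append(0.0)
    return out


def noise_source(length, amplitude, seed=0):
    """Uniform white noise (assumed); the seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(length)]


def plosive_source(length, amplitude):
    """A single impulse followed by silence (a crude, assumed burst model)."""
    if length <= 0:
        return []
    return [amplitude] + [0.0] * (length - 1)


def external_source(length, stored):
    """Play back stored waveform data, e.g. a prediction residual."""
    return list(stored[:length])


def generate_source(kind, length, amplitude=1.0, period=80, width=20, stored=()):
    """Dispatch on the (hypothetical) source selection value."""
    if kind == "external":
        return external_source(length, stored)
    if kind == "triangular":
        return triangular_source(length, period, width, amplitude)
    if kind == "noise":
        return noise_source(length, amplitude)
    if kind == "plosive":
        return plosive_source(length, amplitude)
    raise ValueError(f"unknown source kind: {kind}")
```

In this sketch the `kind` argument plays the role of the sound source waveform selection data, and `amplitude` and `period` stand in for the interpolated sound source data.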

Next, first and second embodiments of the sound source generation section and the synthesis filter section will be described.

FIG. 2a is a block diagram showing, as the first embodiment, an example configuration for synthesizing nasal sounds.

In the sound source generation section, the triangular wave sound source generation circuit 201 is selected. It generates a triangular waveform every pitch period in accordance with the sound source data supplied via the sound source data transmission line 202, and outputs it to the synthesis filter section 203. In the synthesis filter section 203, the pole circuits 205, 207, and 208 and the zero circuit 206 are connected in cascade. Each pole circuit and zero circuit is controlled in accordance with the synthesis filter control data supplied via the synthesis filter control data transmission line 204. The section generates a synthesized waveform from the triangular waveform and outputs it via the synthesized waveform output terminal 209.
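The cascade arrangement of FIG. 2a can be sketched in Python as follows. This is a minimal illustration, not the circuit of the specification: it assumes each pole circuit is a second-order resonator and each zero circuit a second-order anti-resonator, assumes the section order 205, 206, 207, 208, and uses made-up frequency and bandwidth values in `NASAL_CONFIG`. All function names are hypothetical.

```python
import math


def section_coeffs(freq_hz, bandwidth_hz, fs):
    """Two coefficients of a second-order section from centre frequency and bandwidth."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * freq_hz / fs
    return 2.0 * r * math.cos(theta), -r * r


def pole_section(x, freq_hz, bandwidth_hz, fs):
    """Resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    b1, b2 = section_coeffs(freq_hz, bandwidth_hz, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def zero_section(x, freq_hz, bandwidth_hz, fs):
    """Anti-resonator: y[n] = x[n] - b1*x[n-1] - b2*x[n-2]."""
    b1, b2 = section_coeffs(freq_hz, bandwidth_hz, fs)
    x1 = x2 = 0.0
    out = []
    for s in x:
        out.append(s - b1 * x1 - b2 * x2)
        x2, x1 = x1, s
    return out


def cascade(x, sections, fs):
    """Pass the signal through each (kind, freq_hz, bandwidth_hz) section in turn."""
    for kind, freq_hz, bandwidth_hz in sections:
        if kind == "pole":
            x = pole_section(x, freq_hz, bandwidth_hz, fs)
        elif kind == "zero":
            x = zero_section(x, freq_hz, bandwidth_hz, fs)
        else:
            raise ValueError(f"unknown section kind: {kind}")
    return x


# Hypothetical values standing in for circuits 205, 206, 207, 208.
NASAL_CONFIG = [
    ("pole", 250.0, 100.0),
    ("zero", 1000.0, 200.0),
    ("pole", 2000.0, 200.0),
    ("pole", 3000.0, 300.0),
]
```

Because the zero section is the exact inverse of a pole section with the same parameters, changing one entry of the configuration list from `"pole"` to `"zero"` reuses the same two coefficients per section, which is one way to read the claim that the configuration can vary without extra hardware.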

FIG. 2b is a block diagram showing, as the second embodiment, an example configuration for synthesizing voiced fricatives (for example, the consonant part of the Japanese "za"-row sounds).

In the sound source generation section 210, the noise source generation circuit 212 and the triangular wave sound source generation circuit 213 are selected. They generate noise and a triangular wave, respectively, in accordance with the sound source data supplied via the sound source data transmission line 211, and output them to the synthesis filter section 214. In the synthesis filter section 214, the cascade-connected pole circuits 217, 218, and 219 are connected in parallel with the pole circuit 216. The pole circuit 216 is excited by the triangular wave, and the pole circuits 217, 218, and 219 are excited by the noise. The two outputs are summed by the adder circuit 220 and output as the synthesized waveform via the synthesized waveform output terminal 221.
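The parallel arrangement of FIG. 2b can be sketched the same way. The Python below is an illustration under assumptions: each pole circuit is modelled as a second-order resonator, and the frequency and bandwidth values in `VOICED_POLE` and `NOISE_POLES` are invented for the example. The function names are hypothetical.

```python
import math


def pole_section(x, freq_hz, bandwidth_hz, fs):
    """Second-order resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    b2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voiced_fricative(triangular, noise, voiced_pole, noise_poles, fs):
    """Branch A: one pole section driven by the triangular wave (circuit 216).
    Branch B: cascaded pole sections driven by noise (circuits 217 to 219).
    The adder (circuit 220) sums the two branch outputs sample by sample."""
    branch_a = pole_section(triangular, voiced_pole[0], voiced_pole[1], fs)
    branch_b = noise
    for freq_hz, bandwidth_hz in noise_poles:
        branch_b = pole_section(branch_b, freq_hz, bandwidth_hz, fs)
    return [a + b for a, b in zip(branch_a, branch_b)]


# Hypothetical (frequency, bandwidth) pairs in Hz.
VOICED_POLE = (200.0, 100.0)
NOISE_POLES = [(2500.0, 300.0), (3200.0, 300.0), (3700.0, 400.0)]
```

With the noise input set to silence, the output reduces to the voiced branch alone, which makes the role of the adder easy to check.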

In the above embodiments, both FIG. 2a and FIG. 2b are built from a total of four pole or zero circuits, so the amount of computation required in the synthesis filter section is almost the same. As is clear from this, the present invention makes it possible to realize a variety of circuit configurations without increasing the amount of computation, and to generate synthesized speech of relatively high quality.
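The statement that both configurations need almost the same amount of computation can be checked with a small count. The Python sketch below assumes that every pole or zero circuit is a second-order section costing two multiplications and two additions per sample, and that an adder circuit costs one addition; these per-section costs are assumptions, not figures from the specification.

```python
# Number of circuits of each kind in the two configurations of FIG. 2.
CASCADE_NASAL = {"pole": 3, "zero": 1, "adders": 0}       # FIG. 2a
PARALLEL_FRICATIVE = {"pole": 4, "zero": 0, "adders": 1}  # FIG. 2b


def cost_per_sample(config, mults_per_section=2, adds_per_section=2):
    """Count multiplications and additions per output sample (assumed costs)."""
    sections = config["pole"] + config["zero"]
    return {
        "multiplies": sections * mults_per_section,
        "additions": sections * adds_per_section + config["adders"],
    }
```

Under these assumptions both configurations need eight multiplications per sample, and the parallel one needs a single extra addition for the adder circuit.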

As is clear from the above, according to the present invention, making the configuration of the synthesis filter composed of pole circuits and zero circuits variable allows a variety of circuit configurations to be realized with a relatively small amount of computation, and selectively using a plurality of sound source circuits, including a plosive source and an external sound source, allows relatively high-quality synthesized speech to be obtained.

Furthermore, the present invention can also be implemented on a single chip using large-scale integration (LSI) technology.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention. FIGS. 2a and 2b are block diagrams showing specific first and second embodiments of the sound source waveform generation section 15 and the synthesis filter section 21 in FIG. 1.

1 is a speech output data input terminal, 2 is a synthesis data storage circuit, 3 is an address data transmission line, 4 is a synthesis data transmission line, 5 is a sound source waveform data transmission line, 6 is a control circuit, 7 is a synthesis data transmission line, 8 is a configuration data transmission line, 9 is an interpolation circuit, 10 is a sound source data transmission line, 11 is a synthesis filter control data transmission line, 12 is a configuration control circuit, 13 is a sound source waveform selection data transmission line, 14 is a synthesis filter configuration data transmission line, 15 is a sound source waveform generation section, 16 is an external sound source generation circuit, 17 is a triangular wave sound source generation circuit, 18 is a noise source generation circuit, 19 is a plosive source generation circuit, 20 is a sound source waveform transmission line, 21 is a synthesis filter section, 22, 24, 25, and 27 are pole circuits, 23 and 26 are zero circuits, 28 is an adder circuit, and 29 is a synthesized waveform output terminal.

201 and 213 are triangular wave sound source generation circuits, 202 and 211 are sound source data transmission lines, 203 and 214 are synthesis filter sections, 204 and 215 are synthesis filter control data transmission lines, 205, 207, 208, 216, 217, 218, and 219 are pole circuits, 206 is a zero circuit, 209 and 221 are synthesized waveform output terminals, 212 is a noise source generation circuit, and 220 is an adder circuit.

Claims

[Claims]

1. A speech synthesis device of the type that synthesizes a speech waveform using synthesis data such as pole-zero parameters, pitch period data, and voiced/unvoiced data, comprising: a synthesis filter that includes pole circuits and zero circuits and in which the connection configuration of those circuits is variable; a sound source waveform generation circuit that generates a plurality of selectable types of sound source waveform; means for selecting the connection configuration of the synthesis filter and the sound source waveform generation circuit in accordance with the synthesis data; a synthesis data memory that stores the synthesis data; and means for selecting or editing the synthesis data in accordance with a speech output command given as input, so as to generate a desired synthesized speech waveform.