JPH0365560B2

Info

Publication number: JPH0365560B2
Application number: JP58047625A
Authority: JP
Priority date: 1983-03-22
Filing date: 1983-03-22
Publication date: 1991-10-14
Also published as: JPS59172689A

Description

DETAILED DESCRIPTION OF THE INVENTION

(A) Technical Field of the Invention

The present invention relates to a speech analysis and synthesis device, and more particularly to a speech analysis and synthesis device that obtains modified prediction coefficients, corresponding to linear prediction coefficients, by compressing the power spectrum to its 1/n-th power, and that, instead of a filter whose coefficients are the linear prediction coefficients, passes the signal n times in cascade through a filter whose coefficients are the modified prediction coefficients.

(B) Technical Background and Problems

Linear prediction coefficients have conventionally been extracted as parameters for speech synthesis, speech recognition and similar tasks. In such synthesis and recognition, spectral envelope information of the input speech signal is derived from the linear prediction coefficients, for example by treating the prediction coefficients themselves as a time function, Fourier transforming them, and computing the inverse of the resulting spectrum. The spectral envelope information may in turn be used to obtain formant frequencies and the like.
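The conventional route from prediction coefficients to a spectral envelope (treat the coefficients as a time sequence, Fourier transform them, take the reciprocal of the squared spectrum) can be sketched as follows. This is a minimal Python illustration, not the patent's implementation; it assumes the convention A(Z) = 1 + Σα_iZ^-i used later in this document, so the coefficient list is [1, α_1, ..., α_p]. The function name `envelope_from_lpc` and the 64-bin frequency grid are illustrative choices.

```python
import cmath
import math


def envelope_from_lpc(alpha, n_bins=64):
    """Return P^(w) = 1 / |A(e^{jw})|^2 on n_bins frequencies in [0, 2*pi).

    alpha = [1, alpha_1, ..., alpha_p] is treated as a short time sequence:
    its Fourier transform is A(e^{jw}) (unit 5), the squared magnitude is
    taken (unit 6) and its reciprocal gives the envelope (unit 7).
    """
    envelope = []
    for k in range(n_bins):
        w = 2.0 * math.pi * k / n_bins
        a_w = sum(a * cmath.exp(-1j * w * i) for i, a in enumerate(alpha))
        envelope.append(1.0 / abs(a_w) ** 2)
    return envelope


# A single resonance at w = 0: A(Z) = 1 - 0.9 Z^-1
env = envelope_from_lpc([1.0, -0.9])
print(round(env[0], 3), round(env[32], 3))  # peak at w = 0, minimum at w = pi
```

The squared magnitude is inverted directly rather than through a separate FFT library so the example has no dependencies; an FFT would give the same bins faster.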

However, the conventional method of extracting spectral envelope information described above has problems, such as the obtained spectral envelope information being affected by the pitch frequency of the input speech. To solve this problem, the present inventors previously proposed, in Japanese Patent Application Nos. Sho 56-188060, Sho 56-188061 and Sho 57-50431, among others, a system that compresses the power spectrum extracted from the input speech and then extracts "modified" prediction coefficients α′. That is, whereas this type of speech analysis and synthesis device generally adopts the configuration of Fig. 1, in which linear prediction coefficients α are obtained and, for example, spectral envelope information P̂(w) is extracted from them, those applications disclosed an improvement, shown in Fig. 2, in which the "modified" prediction coefficients α′ are obtained first, modified spectral envelope information P̂′(w) is obtained from them, and the result is then expanded to give the spectral envelope information P̂(w).

In Fig. 1, 1 is a Fourier transform processing unit that Fourier transforms the discrete input speech signal S(n); 2 is a square-value extraction unit that extracts the power spectrum P(w) of the input speech; 3 is an inverse Fourier transform processing unit that applies an inverse Fourier transform to the power spectrum P(w) to calculate the autocorrelation coefficients R(n); 4 is a linear prediction coefficient calculation unit that calculates the linear prediction coefficients α(n) from the autocorrelation coefficients R(n); 5 is a Fourier transform processing unit that Fourier transforms the linear prediction coefficients α(n), regarded as a time function; 6 is a square-value extraction unit; and 7 is a reciprocal processing unit. Together, the Fourier transform processing unit 5, the square-value extraction unit 6 and the reciprocal processing unit 7 can be regarded as extracting the spectral envelope information P̂(w) of the input speech signal from the linear prediction coefficients α(n). In Fig. 2, reference numerals 1 to 7 and S(n), P(w) and P̂(w) correspond to those in Fig. 1; 8 denotes the conversion processing unit provided in Fig. 2, and 9 denotes the inverse conversion processing unit.
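Units 1 to 4 of Fig. 1 (Fourier transform, squared magnitude, inverse Fourier transform, linear prediction) can be sketched in Python as below. The patent does not say how unit 4 solves for α(n); the Levinson-Durbin recursion used here is a common choice and is an assumption of this sketch, as are the naive O(N²) DFT and the zero-padding to at least twice the frame length, which makes the circular autocorrelation returned by the inverse transform equal the ordinary one.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def autocorrelation_via_spectrum(s, n_fft):
    """Units 1-3: S(n) -> DFT -> P(w) = |.|^2 -> inverse DFT -> R(n)."""
    padded = list(s) + [0.0] * (n_fft - len(s))  # n_fft >= 2 * len(s) avoids wrap-around
    power = [abs(v) ** 2 for v in dft(padded)]
    return [v.real for v in dft(power, inverse=True)]


def levinson_durbin(r, order):
    """Unit 4 (assumed method): A(Z) = 1 + sum_i alpha_i Z^-i from R(0..order)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err              # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k          # prediction-error power
    return a, err


frame = [math.sin(0.3 * t) + 0.2 * math.cos(1.7 * t) for t in range(32)]
r = autocorrelation_via_spectrum(frame, 64)
alpha, err = levinson_durbin(r, 4)  # alpha = [1, alpha_1, ..., alpha_4]
```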

In the configuration of Fig. 2, the square-value extraction unit 2 yields the power spectrum P(w) of the input speech, and a conversion processing unit 8 is inserted that applies to P(w) a transformation such as

$$P'(w) = [P(w)]^{1/n} \qquad (1)$$

That is, the input speech signal S(n) is Fourier transformed and the magnitude squared to give the power spectrum P(w); after the transformation of equation (1) is applied, the modified autocorrelation coefficients R′(n), the modified prediction coefficients α′(n) and the modified spectral envelope information P̂′(w) are obtained, and the inverse of transformation (1) is then performed in the inverse conversion processing unit 9. In other words, transformation (1) is applied in the frequency domain, after the Fourier transform of S(n) and before the inverse Fourier transform in unit 3, and the spectral envelope information P̂(w) is extracted by the inverse transformation

$$\hat{P}(w) = [\hat{P}'(w)]^{n} \qquad (2)$$
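The Fig. 2 variant differs from Fig. 1 only in the conversion unit 8, which applies equation (1) before the inverse transform, and the inverse conversion unit 9, which applies equation (2) to the modified envelope. The self-contained sketch below assumes a naive DFT, a Levinson-Durbin solver for unit 4, and scaling of the envelope by the prediction-error power (Fig. 1 shows only the reciprocal 1/|A|²); `modified_analysis` is a hypothetical name.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def levinson_durbin(r, order):
    """Return ([1, a_1, ..., a_p], error power) for A(Z) = 1 + sum_i a_i Z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def modified_analysis(s, order, n, n_fft=64):
    """Fig. 2 sketch: return (alpha', P^'(w), P^(w)) for frame s and exponent n."""
    padded = list(s) + [0.0] * (n_fft - len(s))
    power = [abs(v) ** 2 for v in dft(padded)]                  # units 1-2: P(w)
    compressed = [p ** (1.0 / n) for p in power]                # unit 8: eq. (1)
    r_mod = [v.real for v in dft(compressed, inverse=True)]     # unit 3: R'(n)
    alpha_mod, err = levinson_durbin(r_mod, order)              # unit 4: alpha'(n)
    a_spec = dft(alpha_mod + [0.0] * (n_fft - len(alpha_mod)))  # unit 5
    env_mod = [err / abs(v) ** 2 for v in a_spec]               # units 6-7, gain assumed
    env = [e ** n for e in env_mod]                             # unit 9: eq. (2)
    return alpha_mod, env_mod, env


frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(24)]
alpha_mod, env_mod, env = modified_analysis(frame, order=4, n=3)
```

With n = 1 both conversion units become identities and the function reduces to the Fig. 1 pipeline.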

As shown in Fig. 2, extracting the modified prediction coefficients α′ offers many advantages. In speech analysis and synthesis processing, however, it becomes necessary, for example, to use the spectral envelope information P̂(w) obtained from the inverse conversion processing unit 9 of Fig. 2 to regenerate linear prediction coefficients α in a form free of the above compression (since they are obtained from the result of the compression and expansion, they may also be called linear prediction coefficients α″), and then to use a filter whose coefficients are those linear prediction coefficients.

(C) Object and Structure of the Invention

In view of the above, the object of the present invention is to improve on the practice of regenerating linear prediction coefficients α″ from the modified prediction coefficients α′ already obtained in order to construct a filter, by instead constructing a filter that uses the modified prediction coefficients α′ as its coefficients (referred to in this specification as a modified filter) and performing the desired processing with it. To this end, the speech analysis and synthesis device of the present invention comprises at least means for determining a power spectrum from input speech, a conversion processing unit that compresses the determined power spectrum by raising it to the 1/n-th power, and a prediction coefficient calculation unit that calculates prediction coefficients from the output of the conversion processing unit. The prediction coefficient calculation unit obtains modified prediction coefficients corresponding to the 1/n-th power of the power spectrum, and these modified prediction coefficients are output and used for speech analysis and synthesis. The device is characterized by a filter that uses the modified prediction coefficients α′ as its coefficients and has the term Σα′_iZ^-i, the signal being passed through this filter n times in cascade to obtain the filter output. The invention is explained below with reference to the drawings.
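The characterizing feature, a filter built directly from α′ through which the signal is passed n times in cascade, can be sketched as below. The sketch assumes the synthesis form 1/(1 + Σα′_iZ^-i) that appears in the Fig. 4 embodiment and a coefficient list [1, α′_1, ..., α′_p]; the function names are hypothetical.

```python
def all_pole(x, alpha):
    """One pass through 1 / (1 + sum_i alpha[i] Z^-i); alpha[0] is taken to be 1."""
    y = []
    for k, xk in enumerate(x):
        acc = xk
        for i in range(1, len(alpha)):
            if k - i >= 0:
                acc -= alpha[i] * y[k - i]
        y.append(acc)
    return y


def modified_filter(x, alpha_mod, n):
    """Pass x through the filter built from alpha' n times in cascade."""
    y = list(x)
    for _ in range(n):
        y = all_pole(y, alpha_mod)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
# 1 / (1 - 0.5 Z^-1)^2 has impulse response (k + 1) * 0.5**k
print(modified_filter(impulse, [1.0, -0.5], n=2))  # [1.0, 1.0, 0.75, 0.5, 0.3125]
```

The same `all_pole` routine is reused for every pass, which mirrors the option, mentioned for Fig. 4, of routing one filter's output back to its input instead of building n filters.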

(D) Embodiments of the Invention

Fig. 3 shows an example of a conventional configuration that performs speech synthesis using the pitch period, power and voiced/unvoiced information obtained from the residual signal; Fig. 4 shows an embodiment of the present invention corresponding to Fig. 3; Fig. 5 shows an example of a conventional configuration used in a waveform coding scheme; and Fig. 6 shows an embodiment of the present invention corresponding to Fig. 5.

In Fig. 3, 10 is the noise component, 11 is the pitch-period component, 12 is the power component, 13 is a filter constructed using the linear prediction coefficients α (or α″) as its coefficients, and S(n) is the synthesized speech.
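A bare-bones version of the Fig. 3 arrangement: a pitch pulse train (voiced) or white noise (unvoiced) is scaled by a gain standing in for the power component 12 and passed through filter 13, assumed to be 1/(1 + Σα_iZ^-i). The hard pulse/noise switch, the plain multiplicative gain and the name `synthesize_frame` are simplifying assumptions; practical vocoders normalize excitation energy more carefully.

```python
import random


def synthesize_frame(alpha, n_samples, voiced, pitch_period, gain, seed=0):
    """Drive 1 / (1 + sum_i alpha[i] Z^-i) with pulse or noise excitation."""
    rng = random.Random(seed)
    if voiced:   # pitch-period component 11: one pulse every pitch_period samples
        excitation = [1.0 if k % pitch_period == 0 else 0.0 for k in range(n_samples)]
    else:        # noise component 10
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    y = []
    for k, e in enumerate(excitation):
        acc = gain * e   # power component 12 modeled as a simple gain
        for i in range(1, len(alpha)):
            if k - i >= 0:
                acc -= alpha[i] * y[k - i]
        y.append(acc)
    return y


print(synthesize_frame([1.0, -0.5], 6, voiced=True, pitch_period=4, gain=2.0))
# [2.0, 1.0, 0.5, 0.25, 2.125, 1.0625]
```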

In the conventional configuration, the linear prediction coefficients α (or α″), obtained with the compression of equation (1) above undone, are used as the coefficients of filter 13 in Fig. 3; Z denotes the unit time delay. However, as stated at the beginning of this specification, when the modified prediction coefficients α′ have already been obtained with the configuration of Fig. 2, generating linear prediction coefficients α″ anew for this purpose is undesirable and cumbersome.

The present invention focuses on the fact that the modified prediction coefficients α′, the modified spectral envelope information P̂′(w), the linear prediction coefficients α″ and the spectral envelope information P̂(w) are related as follows:

$$\hat{P}'(w) \leftrightarrow \alpha', \qquad [\hat{P}'(w)]^{n} = \hat{P}(w), \qquad \hat{P}(w) \leftrightarrow \alpha''$$

The configuration is accordingly based on regarding

$$\frac{1}{1 + \sum_i \alpha''_i Z^{-i}} \;\leftrightarrow\; \left( \frac{1}{1 + \sum_i \alpha'_i Z^{-i}} \right)^{n}$$

as corresponding to each other. Fig. 4 shows this configuration; reference numerals 10, 11, 12 and S(n) correspond to those in Fig. 3, and 14 denotes one of the modified filters according to the present invention. Although Fig. 4 shows n modified filters 14 connected in cascade, it goes without saying that, for example, only the modified filter 14-1 may be used, with its output returned to its input, so that the signal passes through it n times in cascade.
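To see what the right-hand side of the correspondence above amounts to, note that n passes through 1/(1 + Σα′_iZ^-i) equal one pass through an all-pole filter whose denominator is the polynomial (1 + Σα′_iZ^-i)^n, of order np (whereas α″ has order p, so the two sides correspond rather than coincide). The sketch below checks the cascade identity numerically; the function names and the example α′ are illustrative.

```python
def all_pole(x, denom):
    """One pass through 1 / D(Z), D(Z) = denom[0] + denom[1] Z^-1 + ..., denom[0] == 1."""
    y = []
    for k, xk in enumerate(x):
        acc = xk
        for i in range(1, len(denom)):
            if k - i >= 0:
                acc -= denom[i] * y[k - i]
        y.append(acc)
    return y


def poly_mul(a, b):
    """Product of two polynomials in Z^-1 (coefficient lists, lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cascade(x, alpha_mod, n):
    """n passes through 1 / (1 + sum_i alpha'_i Z^-i), as in Fig. 4."""
    for _ in range(n):
        x = all_pole(x, alpha_mod)
    return x


def expanded_denominator(alpha_mod, n):
    """Coefficients of (1 + sum_i alpha'_i Z^-i)^n, a polynomial of order n * p."""
    d = [1.0]
    for _ in range(n):
        d = poly_mul(d, alpha_mod)
    return d


alpha_mod = [1.0, -0.6, 0.2]          # illustrative alpha', p = 2, stable poles
x = [1.0] + [0.0] * 19                # unit impulse
d = expanded_denominator(alpha_mod, 3)
diff = max(abs(u - v) for u, v in zip(cascade(x, alpha_mod, 3), all_pole(x, d)))
print(len(d) - 1, diff < 1e-9)        # 6 True
```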

Fig. 5 shows an example of a conventional configuration used in a waveform coding scheme, in which χ is the speech signal, ε is the residual signal, 15 is a transmitting-side filter constructed using the linear prediction coefficients α (or α″) as its coefficients, and 16 is a receiving-side filter likewise constructed using the linear prediction coefficients α (or α″) as its coefficients.
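In Fig. 5 the transmitting-side filter 15 turns χ into the residual ε and the receiving-side filter 16 undoes it. Assuming, consistent with the correspondence given for Fig. 6, that filter 15 is 1 + Σα_iZ^-i and filter 16 is its reciprocal, a minimal sketch (quantization and transmission of ε omitted; names hypothetical) is:

```python
def transmit_filter(x, alpha):
    """Filter 15: eps[k] = x[k] + sum_i alpha[i] * x[k - i] (FIR, alpha[0] == 1)."""
    return [x[k] + sum(alpha[i] * x[k - i] for i in range(1, len(alpha)) if k - i >= 0)
            for k in range(len(x))]


def receive_filter(eps, alpha):
    """Filter 16: x[k] = eps[k] - sum_i alpha[i] * x[k - i] (all-pole inverse of 15)."""
    x = []
    for k, e in enumerate(eps):
        x.append(e - sum(alpha[i] * x[k - i] for i in range(1, len(alpha)) if k - i >= 0))
    return x


chi = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1]
alpha = [1.0, -0.9, 0.4]
eps = transmit_filter(chi, alpha)
restored = receive_filter(eps, alpha)
print(max(abs(a - b) for a, b in zip(chi, restored)) < 1e-12)  # True
```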

In the embodiment of the present invention shown in Fig. 6, a modified filter 19 constructed using the above modified prediction coefficients α′ as its coefficients is used as the transmitting-side filter 17 and as the receiving-side filter 18. Filters 15 and 16 and filters 17 and 18 are related by the correspondence

$$\left(1 + \sum_i \alpha''_i Z^{-i}\right) \;\leftrightarrow\; \left(1 + \sum_i \alpha'_i Z^{-i}\right)^{n}$$

In the case of Fig. 6 as well, it is not necessary to connect n modified filters 19 in cascade; it suffices that the signal passes through the modified filter 19 n times.
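For Fig. 6, the same pair of operations is applied n times with α′: the transmitting side passes χ n times through 1 + Σα′_iZ^-i, and the receiving side passes ε n times through the reciprocal filter, which restores χ in the absence of quantization. This is a sketch under those assumptions; the function names and example values are hypothetical.

```python
def fir_pass(x, alpha_mod):
    """One pass through 1 + sum_i alpha'[i] Z^-i (alpha'[0] == 1)."""
    return [x[k] + sum(alpha_mod[i] * x[k - i]
                       for i in range(1, len(alpha_mod)) if k - i >= 0)
            for k in range(len(x))]


def all_pole_pass(x, alpha_mod):
    """One pass through 1 / (1 + sum_i alpha'[i] Z^-i)."""
    y = []
    for k, xk in enumerate(x):
        y.append(xk - sum(alpha_mod[i] * y[k - i]
                          for i in range(1, len(alpha_mod)) if k - i >= 0))
    return y


def transmit(chi, alpha_mod, n):
    """Transmitting-side filter 17: the modified filter applied n times."""
    for _ in range(n):
        chi = fir_pass(chi, alpha_mod)
    return chi


def receive(eps, alpha_mod, n):
    """Receiving-side filter 18: the inverse modified filter applied n times."""
    for _ in range(n):
        eps = all_pole_pass(eps, alpha_mod)
    return eps


chi = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.0, -0.8]
alpha_mod = [1.0, -0.5, 0.2]   # illustrative alpha'
eps = transmit(chi, alpha_mod, n=3)
print(max(abs(a - b) for a, b in zip(chi, receive(eps, alpha_mod, n=3))) < 1e-9)  # True
```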

(E) Effects of the Invention

As explained above, the present invention makes it possible to use a filter that takes the modified prediction coefficients α′ directly as its coefficients.

[Brief explanation of drawings]

Figs. 1 and 2 are explanatory diagrams illustrating the problem underlying the present invention; Fig. 3 shows an example of a conventional configuration that performs speech synthesis using the pitch period, power and voiced/unvoiced information obtained from the residual signal; Fig. 4 shows an embodiment of the present invention corresponding to Fig. 3; Fig. 5 shows an example of a conventional configuration used in a waveform coding scheme; and Fig. 6 shows an embodiment of the present invention corresponding to Fig. 5. In the drawings, α or α″ denotes linear prediction coefficients, α′ denotes modified prediction coefficients, 13, 15 and 16 denote filters, and 14 and 19 denote modified filters.

Claims

[Claims] 1. A speech analysis and synthesis device comprising at least means for determining a power spectrum from input speech; a conversion processing unit that compresses the determined power spectrum by raising it to the 1/n-th power; and a prediction coefficient calculation unit that calculates prediction coefficients from the output of the conversion processing unit, the prediction coefficient calculation unit obtaining modified prediction coefficients corresponding to the 1/n-th power of the power spectrum, and the modified prediction coefficients being output and used for speech analysis and synthesis, characterized in that the device comprises a filter that uses the modified prediction coefficients α′ as its coefficients and has the term Σα′_iZ^-i, and that a signal is passed through the filter n times in cascade to obtain the filter output.