JPH041920B2 - Pitch extractor - Google Patents

Info

Publication number
JPH041920B2
Authority
JP
Japan
Prior art keywords
pitch
unvoiced
voiced
search range
period
Prior art date
Legal status
Expired - Lifetime
Application number
JP57063153A
Other languages
Japanese (ja)
Other versions
JPS58179898A (en)
Inventor
Satoru Taguchi
Masanori Kobayashi
Takayuki Ishikawa
Current Assignee
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP6315382A
Publication of JPS58179898A
Publication of JPH041920B2
Granted

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Working-Up Tar And Pitch (AREA)

Description

[Detailed description of the invention]

The present invention relates to a pitch extraction device that extracts pitch from autocorrelation coefficients obtained by analyzing a speech waveform at a frame period on the order of the pitch period, and in particular to a pitch extraction device that can greatly reduce pitch extraction errors in continuous voiced portions, which are perceptually important.
The voiced portion of a speech waveform is a periodically repeating waveform, and the way its period (the pitch period) varies is known to be an important parameter in speech analysis-synthesis, recognition, and the like. For example, in a speech analysis-synthesis system, the pitch extracted by the analysis section strongly affects the quality of the speech produced by the synthesis section.
Various methods of extracting the pitch period of a speech waveform from different analysis parameters are known, such as computing autocorrelation coefficients for each frame whose length is on the order of the pitch period and extracting the pitch from them.
Pitch extraction based on autocorrelation coefficients is widely used because the coefficients can be obtained by time-domain processing and because the result is relatively insensitive to the phase between the analyzed waveform and the frame. However, such methods often mistake an integer multiple of the pitch period, or N1/N2 times the pitch period, for the pitch period (where N1 and N2 are integers and N1 < N2).
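The integer-multiple error can be seen on a synthetic waveform. The following is a minimal sketch, not taken from the patent: the test signal, the normalization, and the names `harmonic_signal`, `autocorr`, and `best_lag` are assumptions made for illustration. For a waveform with a period of 40 samples, the normalized autocorrelation is essentially 1 at lag 40 and also at lag 80, so a peak search over a wide lag range has no basis for preferring the true period, whereas a search confined to lags near 40 returns it.

```python
import math


def harmonic_signal(period, length):
    """Synthetic 'voiced' waveform: a fundamental plus its second harmonic."""
    return [
        math.sin(2 * math.pi * i / period) + 0.5 * math.sin(4 * math.pi * i / period)
        for i in range(length)
    ]


def autocorr(x, lag):
    """Normalized autocorrelation of x at the given lag (assumed form)."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    den = math.sqrt(
        sum(x[i] ** 2 for i in range(n)) * sum(x[i + lag] ** 2 for i in range(n))
    )
    return num / den if den > 0 else 0.0


def best_lag(x, lo, hi):
    """Lag in [lo, hi] with the largest autocorrelation."""
    return max(range(lo, hi + 1), key=lambda lag: autocorr(x, lag))


x = harmonic_signal(40, 240)          # 30 msec at 8 kHz, period of 40 samples
print(autocorr(x, 40), autocorr(x, 80))  # both are essentially 1.0
print(best_lag(x, 30, 50))               # search confined to lags near the period
```

Over a wide range such as lags 21 to 120, the peaks at 40, 80, and 120 differ only by rounding error, so which one the search returns is effectively arbitrary.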
The problem in which a period equal to an integer multiple of the true pitch period is mistakenly detected as the pitch period (hereinafter, "integer-multiple pitch period error") is described in detail in Japanese Patent Application Laid-Open No. 54-139307, "Pitch Extraction Device," proposed by the present inventor.
Many autocorrelation-based pitch extraction methods limit the pitch search range to the neighborhood of the pitch period in order to mitigate integer-multiple pitch period errors and similar errors. In the conventional method of determining the pitch search range, speech is divided into voiced and unvoiced sections; the pitch search range for a voiced section that follows a relatively long run of unvoiced sections is set to an initial value, and the pitch search range for a continuous voiced section is set near the past pitch period. This conventional method is described in detail in Japanese Patent Application Laid-Open No. 56-42296, "Pitch Extraction Device," also proposed by the present inventor.
In the conventional method, however, a voiced section that follows an unvoiced section of at least a predetermined duration is given the initial pitch search range, whereas across a shorter unvoiced section the previous pitch period is held and processing continues as if the section were voiced. In practice, the pitch period often changes across even a short unvoiced section, so with this either-or treatment pitch extraction after a short unvoiced section was inaccurate.
An object of the present invention is to provide a pitch extraction device capable of accurate pitch search even when short unvoiced sections are interspersed in a generally voiced section.
To this end, the present invention has means for widening the pitch search range in accordance with the length of a short unvoiced section interspersed in a generally voiced section.
Next, an embodiment of the present invention will be described with reference to the drawing.
The figure is a block diagram for explaining an embodiment of the present invention. An input speech signal is supplied to an A/D converter 2 via a speech input terminal 1. The A/D converter 2 samples the input speech signal at, for example, 8 kHz and quantizes each sample. The sampled and quantized speech signal is written into a buffer memory 3. Every frame period, for example every 20 msec, the buffer memory 3 outputs one analysis window of speech samples, for example 30 msec, to a voiced/unvoiced discriminator 4 and an autocorrelation coefficient measuring device 5.
The voiced/unvoiced discriminator 4 determines whether the speech waveform supplied from the buffer memory 3 every frame period is voiced or unvoiced, and outputs the result to a pitch searcher 6.
The autocorrelation coefficient measuring device 5 measures the autocorrelation coefficient sequence of the speech waveform supplied from the buffer memory 3 every frame period, over the range of lags corresponding to the distribution range of pitch periods, for example 2.6 msec to 15 msec. The autocorrelation coefficient ρ at lag τ measured by the autocorrelation coefficient measuring device 5 is given by [Equation], where x(i) is the i-th speech sample in the analysis frame and N is a natural number, for example the number of samples corresponding to 15 msec (120 at 8 kHz sampling). The measured autocorrelation coefficient sequence is supplied to the pitch searcher 6.
The pitch searcher 6 searches for the maximum value of the autocorrelation coefficient sequence supplied from the autocorrelation coefficient measuring device 5 within the pitch search range, and obtains the lag τP corresponding to that maximum. Information indicating the pitch search range is supplied from a pitch search range measuring device 10. The pitch searcher 6 outputs τP unchanged if the decision supplied from the voiced/unvoiced discriminator 4 is voiced, and outputs "0" in place of τP if the decision is unvoiced.
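The measurement and search steps described above can be sketched as follows. The source does not reproduce the autocorrelation expression, so the normalized form used here is an assumption, as are the function names `autocorr_sequence` and `search_pitch`. Only the 8 kHz sampling rate, the 30 msec window (240 samples), N = 120, and the lag range of 2.6 msec to 15 msec (rounded here to 21 to 120 samples) come from the text.

```python
import math

FS_HZ = 8000        # sampling rate given in the text
WINDOW = 240        # 30 msec analysis window at 8 kHz
N = 120             # samples corresponding to 15 msec at 8 kHz
MIN_LAG = 21        # about 2.6 msec at 8 kHz (rounding is an assumption)
MAX_LAG = 120       # 15 msec at 8 kHz


def autocorr_sequence(x, min_lag=MIN_LAG, max_lag=MAX_LAG, n=N):
    """Return {lag: rho} for one analysis window x.

    Assumed form: sum of x(i) * x(i + lag) over n samples, normalized by
    the energies of the two segments. x must hold at least n + max_lag samples.
    """
    seq = {}
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[i] * x[i + lag] for i in range(n))
        e0 = sum(x[i] ** 2 for i in range(n))
        e1 = sum(x[i + lag] ** 2 for i in range(n))
        den = math.sqrt(e0 * e1)
        seq[lag] = num / den if den > 0 else 0.0
    return seq


def search_pitch(seq, lo, hi, voiced):
    """Pitch searcher: lag of the maximum within [lo, hi], or 0 if unvoiced."""
    if not voiced:
        return 0
    candidates = {lag: rho for lag, rho in seq.items() if lo <= lag <= hi}
    if not candidates:
        return 0
    return max(candidates, key=candidates.get)


# One window of a synthetic waveform with a period of 50 samples (6.25 msec).
frame = [
    math.sin(2 * math.pi * i / 50) + 0.5 * math.sin(4 * math.pi * i / 50)
    for i in range(WINDOW)
]
seq = autocorr_sequence(frame)
print(search_pitch(seq, 21, 90, voiced=True))   # lag in samples
print(search_pitch(seq, 21, 90, voiced=False))  # unvoiced frames yield 0
```

With N = 120 and a maximum lag of 120 samples, the last index touched is 239, which matches the 240-sample window. This fit is the reason the sum is taken over N samples in this sketch.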
The output of the pitch searcher 6 is supplied to a continuous voiced detector 7, a continuous unvoiced detector 8, the pitch search range measuring device 10, and a pitch output terminal 11. When the data supplied from the pitch searcher 6 is non-zero continuously for a relatively long time, for example 100 msec, that is, when a continuous voiced portion is detected, the continuous voiced detector 7 outputs a trigger and sets a flip-flop 9 to "1". When the data supplied from the pitch searcher 6 is "0" continuously for a relatively long time, for example 300 msec, that is, when a continuous unvoiced portion is detected, the continuous unvoiced detector 8 outputs a trigger and resets the flip-flop 9 to "0".
The pitch search range measuring device 10 is a stored-program processor. When the signal supplied from the flip-flop 9 is "0", it sets an initial pitch search range, for example 2.625 msec to 11.25 msec, and outputs the setting to the pitch searcher 6. When the signal supplied from the flip-flop 9 is "1", it applies leaky integration to the past pitch periods supplied from the pitch searcher 6 to compute an average pitch period, sets the pitch search range in the neighborhood of that average, and outputs the setting to the pitch searcher 6. When the signal supplied from the flip-flop 9 changes from "0" to "1", that is, at the transition from continuous unvoiced to continuous voiced speech, the pitch search range measuring device 10 computes the average of the pitch periods of the past several frames supplied from the pitch searcher 6, for example the average over 100 msec, sets the pitch search range in the neighborhood of that average, and outputs the setting to the pitch searcher 6.
The pitch search range measuring device 10 performs its characteristic processing when, while the output of the flip-flop 9 is "1", that is, within a section regarded as continuous voiced speech, the voiced/unvoiced discriminator 4 makes unvoiced decisions for several frames and then makes a voiced decision. When the voiced/unvoiced discriminator 4 makes an unvoiced decision while the output of the flip-flop 9 is "1", the output of the pitch searcher 6 becomes "0", so the pitch search range measuring device 10 does not simply continue the leaky integration of the pitch period. Instead, it holds the leaky-integrated average pitch period obtained just before the pitch period information supplied from the pitch searcher 6 became "0". It then widens the pitch search range around the held average pitch period in accordance with the section in which the pitch period is "0", that is, the unvoiced section. Let TA be the average pitch period immediately before the speech became unvoiced, A (A > 1.0) the coefficient that determines the upper limit of the pitch search range, and B (B < 1.0) the coefficient that determines the lower limit; then:

[Table]
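Because the table is not reproduced, the exact widening rule is unknown. The sketch below is one possible reading of the behavior of detectors 7 and 8, the flip-flop 9, and the pitch search range measuring device 10, working in samples at 8 kHz with 20 msec frames. Everything not stated in the text is an assumption: the class name `RangeController`, the values A = 1.1 and B = 0.9, the leak coefficient 0.5, and in particular the rule that the limits are TA·A^k and TA·B^k with k equal to one plus the number of consecutive unvoiced frames.

```python
from dataclasses import dataclass, field

MIN_LAG, MAX_LAG = 21, 120   # pitch period distribution range in samples
INIT_RANGE = (21, 90)        # 2.625 msec to 11.25 msec at 8 kHz


@dataclass
class RangeController:
    a: float = 1.1               # hypothetical A (> 1.0), upper-limit coefficient
    b: float = 0.9               # hypothetical B (< 1.0), lower-limit coefficient
    leak: float = 0.5            # hypothetical leaky-integrator coefficient
    set_frames: int = 5          # 100 msec of voiced frames sets flip-flop 9
    reset_frames: int = 15       # 300 msec of unvoiced frames resets it
    flag: int = 0                # state of flip-flop 9
    avg: float = 0.0             # average pitch period T_A in samples
    voiced_run: int = 0
    unvoiced_run: int = 0
    recent: list = field(default_factory=list)

    def update(self, pitch):
        """Take the pitch searcher output for one frame (0 means unvoiced)
        and return the (low, high) search range for the next frame."""
        if pitch > 0:
            self.voiced_run += 1
            self.unvoiced_run = 0
            self.recent = (self.recent + [pitch])[-self.set_frames:]
            if self.flag == 0 and self.voiced_run >= self.set_frames:
                # Transition 0 -> 1: plain average over the last 100 msec.
                self.flag = 1
                self.avg = sum(self.recent) / len(self.recent)
            elif self.flag == 1:
                # Continuous voiced: leaky integration of the pitch period.
                self.avg = self.leak * self.avg + (1 - self.leak) * pitch
        else:
            # Unvoiced frame: hold avg, count the length of the gap.
            self.unvoiced_run += 1
            self.voiced_run = 0
            if self.unvoiced_run >= self.reset_frames:
                self.flag = 0
        return self.search_range()

    def search_range(self):
        if self.flag == 0:
            return INIT_RANGE
        k = self.unvoiced_run + 1    # assumed rule: widen once per unvoiced frame
        low = max(MIN_LAG, round(self.avg * self.b ** k))
        high = min(MAX_LAG, round(self.avg * self.a ** k))
        return (low, high)


ctrl = RangeController()
for p in [50, 50, 50, 50, 50, 0, 0, 52]:
    print(p, ctrl.update(p))
```

In this reading, a short unvoiced gap leaves the flip-flop at "1" and the held average untouched, while each unvoiced frame widens the range a little more. A gap of 300 msec falls back to the initial range, which reproduces the conventional behavior for long unvoiced sections.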

Claims (1)

[Claims]
1. A pitch extraction device that discriminates the voiced and unvoiced portions of an input speech signal and extracts a pitch period from the autocorrelation coefficient sequence of a voiced section, characterized by comprising: means for determining the length of a short unvoiced section interspersed in the voiced section; and means for widening the pitch search range in accordance with the length of that unvoiced section.
JP6315382A 1982-04-15 1982-04-15 Pitch extractor Granted JPS58179898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6315382A JPS58179898A (en) 1982-04-15 1982-04-15 Pitch extractor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6315382A JPS58179898A (en) 1982-04-15 1982-04-15 Pitch extractor

Publications (2)

Publication Number Publication Date
JPS58179898A JPS58179898A (en) 1983-10-21
JPH041920B2 JPH041920B2 (en) 1992-01-14

Family

ID=13221001

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6315382A Granted JPS58179898A (en) 1982-04-15 1982-04-15 Pitch extractor

Country Status (1)

Country Link
JP (1) JPS58179898A (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5642296A (en) * 1979-09-17 1981-04-20 Nippon Electric Co Pitch extractor

Also Published As

Publication number Publication date
JPS58179898A (en) 1983-10-21
