JPS6330640B2

Info

Publication number: JPS6330640B2
Application number: JP57002673A
Authority: JP
Inventors: Takeo Murata; Nobuhisa Kadowaki
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology (AIST)
Priority date: 1982-01-13
Filing date: 1982-01-13
Publication date: 1988-06-20
Also published as: JPS58120300A

Description

[Detailed description of the invention]

The present invention relates to a plosive sound extraction device, that is, a device that extracts plosive sounds from speech.

Conventionally, no method has been known for reliably extracting plosive sounds from the features of speech picked up by a microphone in the ordinary way.

An object of the present invention is to provide a plosive sound extraction device that uses a microphone to detect not only the sound pressure of the voice accompanying an utterance but also the oral airflow, and that processes the resulting signal to extract plosive sounds reliably, simply, and in real time.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the plosive sound detection device of the present invention.

In the figure, reference numeral 1 denotes a microphone that detects both the oral airflow and the sound pressure of the voice accompanying an utterance. For example, a condenser microphone with its windscreen removed is placed in front of the speaker's mouth so as to receive the oral airflow.

Reference numeral 2 denotes a low-pass filter that passes only the low-frequency components of the output signal of the microphone 1; it is, for example, an active filter with a cutoff frequency of 200 Hz and an attenuation slope of about 12 dB/octave. Reference numeral 3 denotes a high-pass filter that passes only the high-frequency components of the output signal of the microphone 1; it is, for example, an active filter with a cutoff frequency of 300 Hz and an attenuation slope of about 12 dB/octave. Reference numeral 4 denotes a power detection circuit that detects the power of the output signal of the high-pass filter 3; it is built, for example, from a squaring circuit, although a full-wave or half-wave rectifier circuit can be substituted. Reference numeral 5 denotes an integrating circuit that integrates the output signal of the power detection circuit 4; a first-order lag circuit, for example, is used. Reference numeral 6 denotes a processing circuit that reduces the output of the low-pass filter when the output of the integrating circuit 5 is large; it is built, for example, from a voltage-controlled variable-gain amplifier or a divider circuit. In the extreme case, a gate circuit can also be used.
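The signal chain of FIG. 1 can be sketched in software as follows. This is a minimal Python sketch under stated assumptions, not the patent's circuit: the patent describes analog active filters, whereas the code uses digital second-order (12 dB/octave) sections in the widely used audio-EQ "cookbook" form with a Butterworth Q. The function names, the lag time constant `tau`, and the suppression factor `k` are hypothetical values chosen only for illustration. The 200 Hz and 300 Hz cutoffs are the example values given in the text.

```python
import math

def biquad_coeffs(kind, fc, fs, q=1 / math.sqrt(2)):
    """Normalized coefficients of a second-order (12 dB/octave) section."""
    w0 = 2 * math.pi * fc / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "low":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "high":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a0 = 1 + alpha
    return [v / a0 for v in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]

def biquad(x, coeffs):
    """Run a direct-form I biquad over the sample list x."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        out.append(yn)
    return out

def first_order_lag(x, tau, fs):
    """Discrete first-order lag standing in for integrating circuit 5."""
    k = 1 - math.exp(-1 / (tau * fs))
    y, out = 0.0, []
    for xn in x:
        y += k * (xn - y)
        out.append(y)
    return out

def plosive_chain(mic, fs, tau=0.02, k=50.0):
    """Return the stage signals, keyed by the letters used in FIG. 2.

    tau (seconds) and k are assumed values, not taken from the patent.
    """
    c = biquad(mic, biquad_coeffs("low", 200.0, fs))    # low-pass filter 2
    d = biquad(mic, biquad_coeffs("high", 300.0, fs))   # high-pass filter 3
    e = [v * v for v in d]                              # power detection 4 (squaring)
    f = first_order_lag(e, tau, fs)                     # integrating circuit 5
    # Processing circuit 6, variable-gain form: gain falls as f rises.
    g = [cv / (1 + k * fv) for cv, fv in zip(c, f)]
    return {"c": c, "d": d, "e": e, "f": f, "g": g}
```

Calling `plosive_chain(samples, fs)` on a list of microphone samples returns the intermediate signals under the same letters that FIG. 2 uses for them (c to g), which makes it easy to compare each stage with the waveforms discussed in the text.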

FIGS. 2b to 2g and FIGS. 3j to 3q show the signal waveforms at each part of the plosive sound extraction device shown in FIG. 1; FIG. 2 shows the case where a plosive is uttered, and FIG. 3 the case where a voiceless fricative is uttered. FIGS. 2a and 3i are the sound pressure waveforms of the voice, detected by a microphone in the ordinary way, when a plosive and a voiceless fricative are uttered, respectively. FIGS. 2h and 3q show the oral airflow velocity waveforms in front of the mouth when a plosive and a voiceless fricative are uttered, respectively.

In FIGS. 2 and 3, a and i are representative speech waveforms of a plosive and a voiceless fricative, respectively; b and j are the output signals of the microphone 1 in FIG. 1; c and k are the output signals of the low-pass filter 2; d and l are the output signals of the high-pass filter 3; e and m are the output signals of the power detection circuit 4; f and n are the output signals of the integrating circuit 5; and g and p are the output signals of the processing circuit 6.

In the configuration described above, when a plosive is uttered, for example the sound |pu| containing |p|, the microphone 1 placed in front of the mouth outputs a waveform like that of FIG. 2b. This output is the superposition of two components: the output produced when the oral airflow h displaces the microphone diaphragm, and the output produced by the sound pressure a arising from the air vibration at the moment of the burst.

Experiments confirmed that the main component of the output due to the oral airflow generally lies at lower frequencies than the main component of the output due to the airborne sound. The waveform c output by the low-pass filter 2 can therefore be regarded as arising from the oral airflow during the plosive utterance.

In general, the oral airflow velocity during a plosive forms a large pulse with a sharp rise, in contrast to the slow, small oral airflow of non-plosive sounds. The output of the low-pass filter 2 is correspondingly pulse-shaped, typically like waveform c, and is clearly distinguishable from ordinary non-plosive utterances, for which the low-pass filter 2 produces almost no output.

However, among non-plosives, voiceless fricatives, |Φ| in particular, produce a fast, fluctuating oral airflow, so the output waveform of the low-pass filter 2 becomes hard to distinguish from that of a plosive. FIG. 3 shows the case where a voiceless fricative, specifically |Φu|, is uttered: in response to the oral airflow velocity waveform q, the output of the low-pass filter 2 reaches a considerable magnitude, as in waveform k, and is hard to distinguish from the plosive waveform of FIG. 2c. Accurate extraction of plosives from the output of the low-pass filter 2 alone is therefore difficult.

Experiments, however, revealed the following two points. First, for a plosive the output of the low-pass filter 2 appears immediately after the burst begins, whereas for a voiceless fricative, |Φ| in particular, the low-pass filter output appears well after the onset of frication, in most cases near the middle of the fricative interval. Second, for a plosive the output of the high-pass filter 3 appears only during a short period following the burst, as in FIG. 2d, whereas for a voiceless fricative, |Φ| in particular, it persists over the entire fricative interval, as in FIG. 3l. This component consists of the output due to the sound pressure of the frication noise together with the output produced when the microphone diaphragm is displaced by the oral airflow, and it reaches a considerable magnitude.
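The two timing observations above can be expressed as simple measurements on the stage signals. The following Python sketch is illustrative only: the function names, the 10 % onset threshold, and the toy envelopes are assumptions made for this example and do not come from the patent or from measured speech.

```python
def onset_index(power, rel_threshold=0.1):
    """First sample where the high-band power reaches a fraction of its maximum."""
    limit = rel_threshold * max(power)
    return next(i for i, p in enumerate(power) if p >= limit)

def peak_index(lowpass):
    """Sample at which the low-pass output magnitude is largest."""
    return max(range(len(lowpass)), key=lambda i: abs(lowpass[i]))

def lp_peak_delay(lowpass, power, fs):
    """Seconds from the onset of high-band power to the low-pass peak."""
    return (peak_index(lowpass) - onset_index(power)) / fs

def active_duration(power, fs, rel_threshold=0.1):
    """Seconds during which the high-band power stays above the threshold."""
    limit = rel_threshold * max(power)
    return sum(1 for p in power if p >= limit) / fs

# Toy envelopes (assumed shapes, 1 kHz sample rate).
fs = 1000
# Plosive-like: short high-band burst, low-pass pulse right after onset.
plosive_power = [1.0 if 100 <= i < 120 else 0.0 for i in range(400)]
plosive_lp = [1.0 if i == 105 else 0.0 for i in range(400)]
# Fricative-like: high-band power over the whole interval, low-pass peak mid-interval.
fricative_power = [1.0 if 100 <= i < 300 else 0.0 for i in range(400)]
fricative_lp = [1.0 if i == 200 else 0.0 for i in range(400)]

print(lp_peak_delay(plosive_lp, plosive_power, fs))
print(lp_peak_delay(fricative_lp, fricative_power, fs))
print(active_duration(plosive_power, fs))
print(active_duration(fricative_power, fs))
```

With these toy inputs the plosive-like case shows a short delay and a short high-band duration, and the fricative-like case shows a long delay and a long duration, which is the qualitative contrast the text describes.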

Accordingly, when a plosive and a voiceless fricative are uttered, the output of the power detection circuit 4 is as shown in FIG. 2e and FIG. 3m, respectively, and the output of the integrating circuit 5 is as shown in FIG. 2f and FIG. 3n, respectively. In other words, at the moment when the low-pass filter 2 produces a large output, the output of the integrating circuit 5 is far larger for a voiceless fricative than for a plosive. Therefore, if the processing circuit 6 processes the output of the low-pass filter under control of the output of the integrating circuit 5, so that the output of the low-pass filter 2 is reduced when the output of the integrating circuit 5 is large, the output of the processing circuit 6 for a plosive and for a voiceless fricative, |Φ| in particular, is as shown in FIG. 2g and FIG. 3p, respectively. As the figures show, the processing circuit 6 delivers a far larger pulse for a plosive than for a voiceless fricative, so a real-time plosive sound extraction device that accurately distinguishes plosives from voiceless fricatives can be realized.
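The description names three possible forms for the processing circuit 6: a voltage-controlled variable-gain amplifier, a divider, and, in the extreme case, a gate. The Python sketch below shows one plausible reading of each form. The gain law, the constants `k`, `floor`, and `threshold`, and the function names are assumptions for illustration; the patent does not specify them.

```python
def variable_gain(lp, integ, k=10.0):
    """Variable-gain form: the gain 1 / (1 + k * f) falls as the control f rises."""
    return [c / (1.0 + k * f) for c, f in zip(lp, integ)]

def divider(lp, integ, floor=0.05):
    """Divider form: low-pass output divided by the integrator output.

    The floor is an assumed safeguard against division by values near zero.
    """
    return [c / max(f, floor) for c, f in zip(lp, integ)]

def gate(lp, integ, threshold=0.1):
    """Gate form: pass the low-pass output only while the integrator output is small."""
    return [c if f < threshold else 0.0 for c, f in zip(lp, integ)]

# Two toy samples with the same low-pass amplitude:
# index 0 is plosive-like (small integrator output),
# index 1 is fricative-like (large integrator output).
lp = [1.0, 1.0]
integ = [0.01, 1.0]

for circuit in (variable_gain, divider, gate):
    print(circuit.__name__, circuit(lp, integ))
```

In all three forms the same low-pass amplitude yields a larger output when the integrator output is small than when it is large. The gate is the limiting case in which the fricative-like sample is removed entirely.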

As is clear from the above description, the plosive sound detection device according to the present invention integrates the power of the relatively high-frequency components of the microphone output, uses the integrated output to control the relatively low-frequency components of the microphone output, and extracts plosive sounds from the low-frequency components controlled in this way, which makes reliable extraction of plosive sounds possible.

Further, according to the present invention, a single microphone suffices as the sensor and the circuitry is simple, so the device can be built simply and plosive sounds can be extracted in real time.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the plosive sound detection device according to the present invention, and FIGS. 2 and 3 are waveform diagrams of the signals at each part of FIG. 1.

1...Microphone, 2...Low-pass filter, 3...High-pass filter, 4...Power detector, 5...Integrating circuit, 6...Processing circuit.

Claims

[Claims]

1. A plosive sound extraction device comprising: a microphone arranged in front of the mouth so as to receive the oral airflow; a low-pass filter that passes only the low-frequency components of the output signal of the microphone; a high-pass filter that passes only the high-frequency components of the output signal of the microphone; a power detection circuit that detects the power of the output signal of the high-pass filter; an integrating circuit that integrates the output signal of the power detection circuit; and a processing circuit that reduces the output of the low-pass filter when the output of the integrating circuit becomes large.