JPH0223878B2 - - Google Patents
Info
- Publication number
- JPH0223878B2 (JP58072420A, JP7242083A)
- Authority
- JP
- Japan
- Prior art keywords
- phoneme
- line
- plane
- oral cavity
- energy curve
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a voice detection method for detecting the human voice.
With the development of computers, voice input devices have been proposed. Such a device stores the features of speech waves in a voice storage section in advance and recognizes an input utterance by comparing the features of the newly input speech wave with the stored values.
However, conventional speech recognition identifies phonemes from a speech wave in which the nasal and oral outputs are superimposed, and it is therefore not fully satisfactory in terms of recognition rate and recognition time.
When a voiced sound is uttered, the vocal cord vibration wave is divided by the velum into a component directed to the nasal cavity and a component directed to the oral cavity. Each component is modified by the transmission characteristics that follow from the differing shapes of the two cavities and is then output from the nostrils and the lips, respectively.
During this process the shape of the nasal cavity does not change, whereas the velum moves, the shape of the oral cavity changes with the movement of the teeth, tongue and so on, and the lips open and close. The vibration wave generated at the vocal cords and divided between the oral and nasal cavities is therefore modified in different ways by the different passage shapes leading to the nostrils and to the lips, so that the output waveforms at the nostrils and at the lips are clearly different.
When an unvoiced sound is uttered, the nasal output is extremely small and only the oral output is present, so the two output waveforms are again clearly different.
The present invention was made by focusing on the fact that the speech waves emitted from the oral cavity and the nasal cavity exhibit clearly different waveforms, and its object is to make the recognition of phonemes extremely easy.
To achieve this object, the speech wave detection method of the present invention detects the voice output from the nasal cavity and, at the same time, the voice output from the oral cavity, and prepares the following three planes:
- a D-S plane for discrimination, whose first feature is the rise time difference between the nasal output energy curve and the oral output energy curve, and whose second feature is the ratio of the slopes of the normalized nasal output energy curve and the normalized oral output energy curve at the rise of the oral output energy curve;
- an O-Δ plane for discrimination, whose first feature is the ratio of the nasal output energy to the pharyngeal passage energy at the rise of the pharyngeal passage energy curve, and whose second feature is the maximum slope of the time-variation curve of that ratio during the utterance;
- an N-D plane for discrimination, whose first feature is the noise-rejected differential zero-crossing count, obtained by removing the differential zero crossings of silent intervals from the oral output speech signal, and whose second feature is the rise time difference between this noise-rejected differential zero-crossing count and the nasal output energy curve.
Based on the distribution of the uttered speech wave on each of these planes, the phoneme groups or phonemes are classified by the following sequence of steps:
(step 1) on the O-Δ plane, (/N/) is separated;
(step 2) on the N-D plane, the phonemes and phoneme groups (/sa-row phonemes/), (/affricates/), (/fricatives/), (/za-row phonemes/) and (/ma-row phonemes/, /na-row phonemes/, /ga-row phonemes/, /da-row phonemes/, /ba-row phonemes/) are separated;
(step 3) on the O-Δ plane, the phoneme group (/ya-row phonemes/, /ra-row phonemes/, /wa-row phonemes/) is separated;
(step 4) on the D-S plane, the phoneme group (/ka-row phonemes/, /plosive phonemes of the ta-row/, /pa-row phonemes/) is separated;
(step 5) on the N-D plane, (/a-row phonemes/) and (/ha-row phonemes/) are distinguished.
In the speech wave detection method of the invention configured in this way, the D-S, O-Δ and N-D planes are first determined by preliminary experiments based on the utterances of a large number of unspecified speakers. The utterance of any given speaker is then classified into a phoneme group or phoneme on the basis of these planes, and the speech is thereby recognized.
According to the invention configured as above, speech is recognized on the basis of two clearly different waveforms, the nasal output and the oral output. Recognition therefore becomes highly accurate and easy, and in computer processing the recognition rate, the recognition time and the storage capacity are markedly improved.
An embodiment of the invention is described below with reference to the accompanying drawings.
As shown in FIG. 1, an arm 4 extends from a head arm 2, which is fixed to a person's head 1, to the front of the face. The arm is position-adjustable and supports microphones 3m and 3n at its tip. Microphone 3n faces the nostrils to detect the nasal output of the voice, and microphone 3m faces the mouth to detect the oral output. It is desirable that either one of the microphones 3n and 3m be position-adjustable. It is also desirable to provide a shielding plate 5 between microphones 3m and 3n to separate the nasal output from the oral output, so that the two outputs enter the respective microphones without mixing.
As shown in FIG. 2, the signals from microphones 3m and 3n are fed through a switch 6, such as a multiplexer, to an AD converter 7, converted into digital signals, and input to a computer 8. The computer 8 performs recognition processing of the uttered voice on the basis of the outputs of the two microphones 3m and 3n.
This recognition processing can be implemented in various ways; one example is as follows.
In this example, the parameters D, S, (En/Eo)p, Δ(En/Eo), CR and DT defined below are calculated from the energy curves NEm and NEn of the voice outputs detected by microphones 3m and 3n. From these parameters the D-S plane, the (En/Eo)p-Δ(En/Eo) plane (O-Δ plane) and the CR-DT plane (N-D plane) are obtained, and the uttered sound is identified on the basis of these planes.
D: rise time difference between NEn and NEm
S: slope ratio of NEn and NEm at the rise
En: energy passing through the nasal cavity
Eo: energy passing through the pharyngeal cavity
(En/Eo)p: value of the En/Eo curve at its starting point
Δ(En/Eo): maximum slope of the En/Eo curve
CR: noise-rejected differential zero-crossing count
DT: rise time difference between the noise-rejected differential zero-crossing count and NEn
The construction of each plane and the phoneme identification based on it are described next.
(i) D-S plane
FIGS. 3a to 3l show the energy curves NEm and NEn obtained by detecting the sounds /a/, /ka/, /sa/, /ta/, /na/, /ha/, /ma/, /ya/, /ra/, /wa/, /pa/ and /N/ with microphones 3m and 3n. Each curve is the time variation of the observed speech wave energy normalized by its maximum value.
In these energy curves, for /a/ and /ha/, NEn and NEm rise simultaneously and vary in the same way during the utterance. For /ka/, /ta/ and /pa/, a peak caused by the plosive airflow appears at the rise of NEm, and NEm rises earlier than NEn. For /sa/, a small value appears in NEm during the /s/ interval (arrow). For /na/ and /ma/, NEn rises earlier than NEm, and NEn begins to decrease at the moment NEm begins to increase. For /ya/, /ra/ and /wa/, NEn and NEm rise almost simultaneously, but the slope at the rise is larger for NEn than for NEm. For /N/, the oral output is extremely small, and the energy curve of the room noise appears in NEm.
The delay time D and the slope ratio S, defined by Eqs. (1) and (2) below, are calculated as parameters representing these characteristics of the energy curve of each phoneme.
$$D = t_{op} - t_{np} \tag{1}$$
$$S = \frac{\mathrm{NE}_n(t_{n3}) - \mathrm{NE}_n(t_{np})}{\mathrm{NE}_m(t_{n3}) - \mathrm{NE}_m(t_{np})} \tag{2}$$
Here t_op and t_np are the times at which NEn and NEm, respectively, first exceed the 5% point of their maximum values, and t_n3 is the time a chosen interval, for example 19.2 msec, after t_np. Eq. (1) gives the rise time difference between NEn and NEm, and Eq. (2) gives the ratio of the slopes of NEn and NEm at the rise of NEm.
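As a rough illustration of Eqs. (1) and (2), the sketch below computes D and S from two normalized energy curves sampled at a fixed frame interval. The function names, the `frame_ms` parameter and the nearest-frame handling of the 19.2 msec window are assumptions made for this example; the text does not specify a sampling scheme.

```python
def first_exceed(curve, level):
    """Return the index of the first sample above `level`, or None."""
    for i, value in enumerate(curve):
        if value > level:
            return i
    return None


def ds_features(ne_n, ne_m, frame_ms, window_ms=19.2, onset_fraction=0.05):
    """Compute (D, S) of Eqs. (1) and (2) from the curves NEn and NEm.

    ne_n, ne_m : equally spaced samples of the two curves (same length)
    frame_ms   : spacing between samples in msec (assumed, not from the text)
    """
    i_op = first_exceed(ne_n, onset_fraction * max(ne_n))  # t_op
    i_np = first_exceed(ne_m, onset_fraction * max(ne_m))  # t_np
    if i_op is None or i_np is None:
        raise ValueError("an energy curve never exceeds its onset level")

    # t_n3: the sample closest to window_ms after t_np, clipped to the data.
    step = max(1, round(window_ms / frame_ms))
    i_n3 = min(i_np + step, len(ne_m) - 1)

    d = (i_op - i_np) * frame_ms                           # Eq. (1), msec
    rise_n = ne_n[i_n3] - ne_n[i_np]
    rise_m = ne_m[i_n3] - ne_m[i_np]
    s = rise_n / rise_m if rise_m != 0 else float("inf")   # Eq. (2)
    return d, s
```

With these definitions D is positive when NEm rises before NEn and negative when NEn rises first, which matches the signs described for the plosive and nasal syllables.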
FIG. 4 plots the two parameters, calculated for the ten monosyllables of FIG. 3 other than /sa/ and /N/, on the D-S plane, each utterance being marked with its initial letter (for example, T for /ta/). The speech samples were uttered in isolation by ten male speakers. /sa/ was excluded because the NEm values vary widely in the /s/ interval, which makes the detection of t_np unstable; /N/ was excluded because the oral output is so small that t_np cannot be determined. In the figure, samples with S > 3.0 are plotted at S = 3.0.
According to this figure, the delay time D of /a/ and /ha/ is small and the slope ratio S is distributed around 1.0. For /ka/, /ta/ and /pa/, D > 0 and S is small. For /na/ and /ma/, D < 0 and S < 0. For /ya/, /ra/ and /wa/, D is small and S is larger than for the other phoneme groups. Among these phonemes there are speech samples in which the rise of NEm lags slightly behind that of NEn (D < 0). In these samples the overall shapes of NEn and NEm were almost the same as in FIG. 3, but the first peak of the NEn curve came early, so that the slope of NEn was calculated near the top of that peak; this is considered to be why the slope ratio S was somewhat smaller (arrow in FIG. 4).
As described above, determining D and S makes it possible to classify the phonemes into several groups.
(ii) O-Δ plane
As shown in FIG. 1, let the energy passing through the pharyngeal cavity 9 be Ep(t), the output energies of the nasal cavity 10 and the oral cavity 11 be Eo(t) and En(t), respectively, and the energies observed by microphones 3n and 3m be Eo(in)(t) and En(in)(t). First, the time-variation curve of Eo(t)/Ep(t) is estimated from the observed values Eo(in)(t) and En(in)(t).
If energy is assumed to be lossless within the vocal tract, Eq. (3) holds:
$$E_p(t) = E_o(t) + E_n(t) \tag{3}$$
If C_o and C_n denote the fractions of the radiated energy that enter the respective microphones, then
$$E_o^{(in)}(t) = C_o E_o(t), \qquad E_n^{(in)}(t) = C_n E_n(t) \tag{4}$$
and from Eqs. (3) and (4)
$$C_o E_p(t) = E_o^{(in)}(t) + (C_o/C_n)\,E_n^{(in)}(t) \tag{5}$$
is obtained.
The remaining problem is how to determine Co/Cn. One way is to place a single microphone at the opening at one end of a cylinder and cover the mouth and nose with the opening at the other end to detect Ep, to detect En and Eo by the arrangement shown in FIG. 1, and to calculate the Co and Cn for which Ep = En + Eo. Because Cn and Co vary with the positions of microphones 3m and 3n, the microphones should be kept fixed, and Ep, En and Eo should be compared using averages over several measurements.
Using the Co/Cn obtained in this way, Eq. (6) follows:
$$\frac{E_o}{E_p} = \frac{E_o(t)}{E_p(t)} = \frac{E_o^{(in)}(t)}{E_o^{(in)}(t) + (C_o/C_n)\,E_n^{(in)}(t)} \tag{6}$$
FIG. 5 shows the time-variation curves of Eo/Ep given by Eq. (6) for the utterances /a/ ... described above.
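Eq. (6) needs only the two observed energy tracks and the calibration ratio Co/Cn. A minimal sketch, assuming the tracks are already available as equally spaced lists of energies; the function and argument names are hypothetical, and returning 0.0 for a silent frame is a choice made here, not something the text prescribes.

```python
def ratio_curve(eo_in, en_in, co_over_cn):
    """Eq. (6): Eo/Ep at each time step from the two observed energies.

    eo_in, en_in : observed energies Eo(in)(t) and En(in)(t) as in Eq. (4)
    co_over_cn   : calibration ratio Co/Cn
    """
    ratios = []
    for eo, en in zip(eo_in, en_in):
        scaled_ep = eo + co_over_cn * en      # Co * Ep(t), Eq. (5)
        # A frame with no energy has no defined ratio; 0.0 is used here.
        ratios.append(eo / scaled_ep if scaled_ep > 0 else 0.0)
    return ratios
```

Because Co cancels between numerator and denominator, only the ratio Co/Cn has to be calibrated, not the two coefficients individually.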
The following quantities are defined as parameters representing the characteristics of the Eo/Ep curve:
$$(E_o/E_p)_p = \frac{E_o(t_p)}{E_p(t_p)} \tag{7}$$
$$\Delta(E_o/E_p) = \operatorname{Max}\left[\frac{E_o(t)}{E_p(t)} - \frac{E_o(t')}{E_p(t')}\right] \tag{8}$$
Here Max denotes the maximum value of the expression in brackets, t_p is the time at which Ep first exceeds the 15% point of its maximum value, t is an arbitrary time from t_p onward, and t' is the time a chosen interval, for example 19.2 msec, after t. Eq. (7) gives the value at the left end of the Eo/Ep curve, and Eq. (8) gives the maximum slope of the curve. FIG. 6 plots these two parameters on the O-Δ plane for the 12 monosyllables of FIG. 5 uttered in isolation by the ten male speakers, each utterance being marked with its initial letter. (To keep the figure legible, only five speech samples are plotted for some phonemes; the other five are distributed similarly.)
According to this figure, the 12 monosyllables can be classified into four groups: (/a/, /ka/, /sa/, /ta/, /ha/, /pa/), (/na/, /ma/), (/ya/, /ra/, /wa/) and (/N/).
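The two O-Δ coordinates can be sketched as follows, assuming the Eo/Ep ratio curve and a curve proportional to Ep are available at a fixed frame interval. The function name, `frame_ms` and the rounding of the 19.2 msec window to whole frames are assumptions for illustration only.

```python
def o_delta_features(ratio, ep, frame_ms, window_ms=19.2, onset_fraction=0.15):
    """Return ((Eo/Ep)p, delta(Eo/Ep)) following Eqs. (7) and (8).

    ratio : samples of Eo(t)/Ep(t)
    ep    : samples of Ep(t); any constant multiple (e.g. Co*Ep) works,
            because only the 15% point of its own maximum is used
    """
    level = onset_fraction * max(ep)
    t_p = next((i for i, v in enumerate(ep) if v > level), None)
    if t_p is None:
        raise ValueError("Ep never exceeds its onset level")

    step = max(1, round(window_ms / frame_ms))   # t' = t + window, in frames
    start_value = ratio[t_p]                     # Eq. (7)

    # Eq. (8): largest drop of the ratio over one window, for t >= t_p.
    drops = [ratio[t] - ratio[t + step] for t in range(t_p, len(ratio) - step)]
    max_drop = max(drops) if drops else 0.0
    return start_value, max_drop
```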
/ʃa/, /za/, /ga/, /da/ and /ba/ are not handled on this plane, because the distribution of /ʃa/ is the same as that of /sa/, and the distributions of /za/, /ga/, /da/ and /ba/, while similar to that of /na/, are broad and difficult to classify.
(iii) N-D plane
This plane serves to identify fricatives such as /s/ and /z/ and affricates such as /tʃi/ and /tsu/. One of its parameters, the noise-rejected differential zero-crossing rate, is described first.
For an oral output speech waveform such as that of /su/ shown in FIG. 7, let {x_i} be the oral output speech signal at sample point i. The noise-rejected differential zero-crossing count CR is defined by (9):
If $(x_{i+1} - x_i)(x_i - x_{i-1}) < 0$ and ($|x_{i+1}| >$ threshold or $|x_i| >$ threshold or $|x_{i-1}| >$ threshold), one noise-rejected differential zero crossing is counted at sample point i; CR is the total of these crossings within a given interval. (9)
FIGS. 8a to 8f show CR as a function of time for the utterances /u/, /su/, /zu/, /tsu/, /hu/ and /nu/.
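Definition (9) translates almost directly into a loop over the samples of one analysis interval. The sketch below is illustrative only; the function name is hypothetical and the threshold value has to be chosen by the user, since the text does not state one.

```python
def nr_dzc_count(x, threshold):
    """Count noise-rejected differential zero crossings in the samples x.

    A crossing is counted at sample i when the first difference changes
    sign there and at least one of x[i-1], x[i], x[i+1] exceeds the
    threshold in magnitude, so that low-level noise is ignored.
    """
    count = 0
    for i in range(1, len(x) - 1):
        sign_change = (x[i + 1] - x[i]) * (x[i] - x[i - 1]) < 0
        above_noise = max(abs(x[i - 1]), abs(x[i]), abs(x[i + 1])) > threshold
        if sign_change and above_noise:
            count += 1
    return count
```

Applying the function to successive intervals of the oral signal gives a CR track over time of the kind plotted in FIG. 8.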
As the other parameter, the rise time difference DT (delay time) between CR and NEn is defined by Eq. (10):
$$\text{Delay time} = t_{op} - z_p; \quad \text{if Delay time} < 0,\ \text{Delay time} = t_{op} - t_{np} \tag{10}$$
Here t_op and t_np are the rise times of NEn and NEm, respectively, and z_p, the rise of the noise-rejected differential zero-crossing count, is the time at which the count first exceeds 9. FIGS. 8a to 8f also show the normalized time-variation curves NEn and NEm of the nasal and oral output energies.
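A minimal sketch of Eq. (10), assuming the zero-crossing count has been evaluated once per frame and that t_op and t_np are already known in msec. The names and the per-frame layout are assumptions, and returning `None` when the count never exceeds 9 is a choice made here because the text does not cover that case.

```python
def delay_time(t_op, t_np, cr_track, frame_ms, min_count=9):
    """Delay time DT of Eq. (10), in msec.

    t_op, t_np : rise times of NEn and NEm in msec
    cr_track   : zero-crossing count per frame (hypothetical layout)
    """
    z_p = None
    for i, count in enumerate(cr_track):
        if count > min_count:          # first frame whose count exceeds 9
            z_p = i * frame_ms
            break
    if z_p is None:
        return None                    # not covered by the text

    dt = t_op - z_p
    if dt < 0:                         # fallback branch of Eq. (10)
        dt = t_op - t_np
    return dt
```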
FIG. 9 plots each monosyllable, grouped by following vowel and marked with the initial letter of the utterance, on the plane (N-D plane) whose vertical axis is the noise-rejected differential zero-crossing count CR (N.R.-D.Z.C.R.) and whose horizontal axis is the rise time difference (delay time) between CR and NEn.
This figure confirms that the monosyllables can be classified into /s/, /ʃ/, /z/, /h/, (/n/, /m/, /g/, /d/, /b/) and vowels.
/ka/, /ta/, /pa/, /ya/, /ra/, /wa/ and /N/ are not handled in this figure. The distributions of /ka/, /ta/ and /pa/ lie between those of /a/ and /ha/, and those of /ya/, /ra/ and /wa/ coincide with that of /a/, which is inconvenient for classification; /N/ cannot be plotted on the N-D plane because it has no oral output.
The construction of each plane has been described above. An example of an algorithm that recognizes phonemes using these planes is given next.
This example algorithm identifies the phonemes /a/, /ka/, /sa/, /ta/, /na/, /ha/, /ma/, /ya/, /ra/, /wa/, /pa/, /ʃa/, /za/, /ga/, /da/, /ba/ and /N/, and follows the identification flowchart shown in FIG. 10.
In step 1, /N/ is separated and identified on the O-Δ plane. The distribution of /N/ on the O-Δ plane is very distinctive, so it is appropriate to separate it from the other phonemes first.
In step 2, the four groups /sa/, /ʃa/, /za/ and (/ma/, /na/, /ga/, /da/, /ba/) are separated and identified on the N-D plane.
In step 3, the group (/ya/, /ra/, /wa/) is separated and identified, again on the O-Δ plane.
In step 4, (/a/, /ha/) and (/ka/, /ta/, /pa/) are separated on the D-S plane.
In step 5, /a/ and /ha/ are distinguished, again on the N-D plane.
Through these five stages the 17 phonemes are classified into 9 groups. The five-stage structure is adopted because, on each plane, the phonemes that are most clearly separated from the other phoneme groups are separated and identified first.
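The five stages form a simple decision cascade, which can be sketched as below. The text gives no numerical region boundaries, so the region tests are passed in by the caller as predicate functions; the `regions` dictionary, its keys and the function name are all hypothetical and stand in for boundaries that would come from the preliminary experiments.

```python
STEP2_GROUPS = ("/sa/", "/ʃa/", "/za/", "(/ma/, /na/, /ga/, /da/, /ba/)")


def classify(point_o_delta, point_n_d, point_d_s, regions):
    """Assign an utterance to one of the 9 groups by the five-step cascade.

    point_* : (first feature, second feature) of the utterance on each plane
    regions : dict mapping a group label to a predicate on a plane point
              (hypothetical; the actual boundaries are not given in the text)
    """
    # step 1: /N/ stands apart on the O-delta plane, so it is removed first.
    if regions["/N/"](point_o_delta):
        return "/N/"
    # step 2: fricatives and the nasal / voiced-stop group on the N-D plane.
    for label in STEP2_GROUPS:
        if regions[label](point_n_d):
            return label
    # step 3: back to the O-delta plane for (/ya/, /ra/, /wa/).
    if regions["(/ya/, /ra/, /wa/)"](point_o_delta):
        return "(/ya/, /ra/, /wa/)"
    # step 4: the D-S plane splits (/ka/, /ta/, /pa/) from (/a/, /ha/).
    if regions["(/ka/, /ta/, /pa/)"](point_d_s):
        return "(/ka/, /ta/, /pa/)"
    # step 5: the N-D plane distinguishes /ha/ from /a/.
    return "/ha/" if regions["/ha/"](point_n_d) else "/a/"
```

Because each stage only has to isolate the group that is best separated on its plane, the individual predicates can stay simple, for example a single threshold or a rectangle.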
The above concerns the identification of the vowel /a/ and of syllables whose following vowel is /a/, that is, the /a/ vowel series (a, ka, sa, ta, ...). If the vowel /a/ is replaced by the vowel /e/ or /o/, and the following vowel /a/ by /e/ or /o/, the phoneme groups or phonemes of the /e/ and /o/ vowel series can be classified in the same way. Likewise, if the vowel /a/ is replaced by /i/ or /u/, the following vowel /a/ by /i/ or /u/, and the affricates /tʃi/ and /tsu/ are separated in step 2, the phoneme groups or phonemes of the /i/ and /u/ vowel series can be classified. That these affricates can be classified is confirmed by FIG. 9 (in the figure, T denotes an affricate: /tʃi/ in FIG. 9b and /tsu/ in FIG. 9c).
Within each of the phoneme groups obtained by this classification, the individual phonemes are identified by a well-known conventional recognition technique, for example one based on the centroid, peak and valley frequencies of the spectrum and their variation over time, and the final decision is made.
In the embodiment above, a shielding plate was provided to separate the nasal output from the oral output. When no shielding plate is provided, the D-S, O-Δ and N-D planes may be constructed after applying a correction such as the one shown in FIG. 11: on each of the energy curves NEm and NEn, the 35% point and the 15% point of the maximum value are joined by a straight line, and the intersection of this line with the time axis is taken as t_op (t_np).
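The correction for the unshielded case amounts to extrapolating the rising edge back to zero energy. A small sketch, assuming an equally sampled energy curve; the function names, `frame_ms` and the linear interpolation between samples are assumptions made for this example.

```python
def crossing_time(curve, level, frame_ms):
    """Time (msec) at which the curve first rises above `level`,
    linearly interpolated between neighbouring samples."""
    if curve[0] > level:
        return 0.0
    for i in range(1, len(curve)):
        lo, hi = curve[i - 1], curve[i]
        if hi > level:                 # lo <= level here, so hi > lo
            return (i - 1 + (level - lo) / (hi - lo)) * frame_ms
    return None


def extrapolated_onset(curve, frame_ms, low=0.15, high=0.35):
    """Intersection with the time axis of the straight line through the
    15% and 35% points of the curve's maximum (used as t_op or t_np)."""
    peak = max(curve)
    t_low = crossing_time(curve, low * peak, frame_ms)
    t_high = crossing_time(curve, high * peak, frame_ms)
    if t_low is None or t_high is None:
        return None
    # The line falls from low*peak to zero over this many msec before t_low.
    return t_low - low * (t_high - t_low) / (high - low)
```

For a curve that rises linearly from zero, the extrapolated onset coincides with the point where the ramp begins, regardless of any low-level leakage that a fixed 5% threshold might pick up.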
In the speech recognition described above, identification can be made more accurate by additionally providing a detector, such as a camera, that detects mouth movements and combining it with the detection method of the invention.
FIG. 1 is an explanatory diagram showing an example of the invention.
FIG. 2 is a control block diagram using the invention.
FIGS. 3a to 3l are graphs showing energy against time.
FIG. 4 is a graph showing the D-S plane.
FIGS. 5a to 5l are graphs showing the time-variation curves of Eo/Ep.
FIG. 6 is a graph showing the O-Δ plane.
FIG. 7 is the oral output speech waveform of /su/.
FIGS. 8a to 8f are graphs of the normalized time-variation curves of the nasal and oral output energies and of the time-variation curve of the noise-rejected differential zero-crossing count of the oral output.
FIGS. 9a to 9e are graphs showing the N-D plane.
FIG. 10 is a flowchart showing an example of speech recognition.
FIG. 11 is a graph showing an example of the correction.
3m, 3n: microphones; 4: support arm.
Claims (1)
[Scope of Claims]
1. A speech wave detection method characterized by detecting the voice output from the nasal cavity and, at the same time, the voice output from the oral cavity;
preparing a D-S plane for discrimination, whose first feature is the rise time difference between the nasal output energy curve and the oral output energy curve, and whose second feature is the ratio of the slopes of the normalized nasal output energy curve and the normalized oral output energy curve at the rise of the oral output energy curve,
an O-Δ plane for discrimination, whose first feature is the ratio of the nasal output energy to the pharyngeal passage energy at the rise of the pharyngeal passage energy curve, and whose second feature is the maximum slope of the time-variation curve of that ratio during the utterance, and
an N-D plane for discrimination, whose first feature is the noise-rejected differential zero-crossing count, obtained by removing the differential zero crossings of silent intervals from the oral output speech signal, and whose second feature is the rise time difference between this noise-rejected differential zero-crossing count and the nasal output energy curve;
and, based on the distribution of the uttered speech wave on each of these planes, classifying the phoneme groups or phonemes by a sequence of steps consisting of:
(step 1) separating (/N/) on the O-Δ plane;
(step 2) separating, on the N-D plane, the phonemes and phoneme groups (/sa-row phonemes/), (/affricates/), (/fricatives/), (/za-row phonemes/) and (/ma-row phonemes/, /na-row phonemes/, /ga-row phonemes/, /da-row phonemes/, /ba-row phonemes/);
(step 3) separating, on the O-Δ plane, the phoneme group (/ya-row phonemes/, /ra-row phonemes/, /wa-row phonemes/);
(step 4) separating, on the D-S plane, the phoneme group (/ka-row phonemes/, /plosive phonemes of the ta-row/, /pa-row phonemes/); and
(step 5) distinguishing, on the N-D plane, (/a-row phonemes/) from (/ha-row phonemes/).
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP7242083A JPS59197100A (en) | 1983-04-23 | 1983-04-23 | Voice wave detector |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP7242083A JPS59197100A (en) | 1983-04-23 | 1983-04-23 | Voice wave detector |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS59197100A JPS59197100A (en) | 1984-11-08 |
| JPH0223878B2 (en) | 1990-05-25 |
Family
ID=13488770
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP7242083A (Granted), JPS59197100A (en) | 1983-04-23 | 1983-04-23 | Voice wave detector |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS59197100A (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| ATE45831T1 (en) * | 1983-05-18 | 1989-09-15 | Speech Systems Inc | Voice recognition system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5011505A (en) * | 1973-05-09 | 1975-02-06 | | |
-
1983
- 1983-04-23 JP JP7242083A patent/JPS59197100A/en active Granted
Also Published As
| Publication number | Publication date |
|---|---|
| JPS59197100A (en) | 1984-11-08 |