JPS6043697A

JPS6043697A - Consonant and vowel boundary detection device

Info

Publication number: JPS6043697A
Application number: JP58152034A
Authority: JP
Inventors: 三船　義照
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-08-19
Filing date: 1983-08-19
Publication date: 1985-03-08
Also published as: JPH0534678B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Field of the Invention

The present invention relates to a boundary detection device that detects the boundary between a consonant and a vowel in monosyllable recognition.

Configuration of the Conventional Example and Its Problems

In a conventional monosyllable recognition device, the boundary between a consonant and a vowel was detected by computing the inter-frame distance D = dis(x_ti, x_t(i-1)) frame by frame, starting from the feature vector at the beginning of the word and proceeding toward the end of the word. The frame at which the inter-frame distance D reached or exceeded a fixed threshold was taken as the consonant-vowel boundary. This configuration is somewhat faster, but because a non-stationary transition exists near the consonant-vowel boundary, the inter-frame distance D fluctuates easily. Detection accuracy was therefore a serious problem and caused a marked drop in the recognition rate.

A method with somewhat better detection accuracy first completes detection of the speech section (word beginning and word end). It then computes, frame by frame, the normalized correlation coefficient with the vowel standard patterns, and detects the stationary part of the speech section as a run of consecutive frames whose coefficient is at or above a fixed threshold. It computes the average vector of this stationary part, then computes the normalized correlation coefficient frame by frame toward the beginning of the word, and takes the frame at which the value falls to or below a fixed threshold as the consonant-vowel boundary. Even with this method, however, the consonant part of a monosyllable strongly affects the onset transition of the vowel part. As a result, the appropriate threshold for the normalized correlation coefficient differs among the fricatives /s/, /c/, /h/, the plosives /p/, /t/, /k/ and /b/, /d/, /g/, and the nasals /m/, /n/, /ŋ/, and it is also affected by the speaker, so the method had problems in practical use.
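
The first conventional approach described above can be sketched roughly as follows. This is only an illustration: the source names the distance function `dis` without defining it, so a Euclidean distance is assumed here, and the function names `euclidean` and `first_jump_frame` are hypothetical.

```python
import math

def euclidean(a, b):
    """Distance between two feature vectors (assumed metric; the source only calls it 'dis')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def first_jump_frame(frames, threshold):
    """Scan from the word beginning toward the word end and return the first
    frame index i whose distance to frame i-1 reaches the threshold, else None."""
    for i in range(1, len(frames)):
        if euclidean(frames[i], frames[i - 1]) >= threshold:
            return i
    return None
```

Because the decision rests on a single frame-to-frame difference, any irregular frame inside the transition can trigger or delay the detection, which is the weakness the text attributes to this method.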

Object of the Invention

The present invention solves the above problems of the prior art. Its object is to provide a consonant-vowel boundary detection device that computes the normalized correlation coefficient only for a plurality of top-ranked channels, taken in order from the channel giving the maximum value of the feature vector of the vowel stationary part toward the channel giving the minimum value, and thereby traces the formant frequencies of the vowel part. This removes the effects of fluctuations in the transition part, of fluctuations in the vowel onset transition that depend on the consonant type, of the consonant part on the vowel section, and of the speaker, so that consonant-vowel boundary detection accuracy is improved stably and real-time processing becomes possible.

Structure of the Invention

The present invention detects the beginning of a word from the power sequence of the input speech. For each frame (x_ti) it computes the distance to the vowels (/A/, /I/, /U/, /E/, /O/) stored in advance as standard patterns, and when frames whose distance is smaller than a fixed threshold θ_v continue for at least a fixed length, it takes that frame section [i_sss to i_ess] as the vowel stationary section of the input speech. It computes the average vector X_av of the frames around the center of the vowel stationary section (N: number of frames averaged; j: center frame of the vowel stationary part of the input speech). With M the number of channels of the average vector X_av, it detects the top L channels (L < M), counted from the channel with the maximum value. For these L channels it computes the normalized correlation coefficient C_0 between the average vector X_av and each frame m preceding the stationary part (m < i_sss). It detects the frame i_co1 at which C_0 falls to or below a pre-stored threshold θ_co1 (θ_co1 < 1), and further the frame i_co2 at which C_0 falls to or below a pre-stored threshold θ_co2 (θ_co2 < θ_co1 < 1). The center frame i_co = (i_co1 + i_co2)/2 of frames i_co1 and i_co2 is detected as the consonant-vowel boundary, and processing ends when the end of the word is detected from the power sequence.

In this way the device absorbs fluctuations in the consonant-vowel transition, fluctuations in the vowel onset transition that depend on the consonant type, and speaker-dependent fluctuations of the vowel stationary section. It thereby stably improves boundary detection accuracy and shortens processing time, improves the recognition rate of monosyllabic speech recognition, achieves real-time processing, and is intended for practical use.
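
The channel selection and the restricted correlation can be illustrated with a short sketch. The source text does not reproduce the formula for the normalized correlation coefficient, so the code below assumes the common form of an inner product divided by the product of the vector norms, evaluated only on the selected channels. The names `top_channels` and `normalized_correlation` are hypothetical.

```python
import math

def top_channels(avg_vec, L):
    """Indices of the L channels with the largest values in the average vector."""
    order = sorted(range(len(avg_vec)), key=lambda k: avg_vec[k], reverse=True)
    return sorted(order[:L])

def normalized_correlation(x, y, channels):
    """Assumed form: sum(x_k * y_k) / sqrt(sum(x_k^2) * sum(y_k^2)) over the given channels."""
    num = sum(x[k] * y[k] for k in channels)
    den = math.sqrt(sum(x[k] ** 2 for k in channels) * sum(y[k] ** 2 for k in channels))
    return num / den if den > 0 else 0.0
```

Restricting the sum to the peak channels means that only the spectral regions where the vowel is strongest contribute to the comparison, which matches the stated aim of tracing the vowel formants while ignoring channels dominated by the consonant.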

Description of the Embodiments

FIG. 1 is a block diagram of a consonant-vowel boundary detection device according to one embodiment of the present invention.

The input section consists of an A/D converter 1; a speech section detection means 2 that detects the beginning and end of the input speech signal, for example by comparing changes in the power sequence with a fixed threshold; a feature sequence conversion means 3 that converts the speech time series, at fixed time intervals, into a feature sequence {x_ti} such as a filter-bank output sequence or a sequence of LPC coefficients; and a feature sequence storage unit 4 that stores the feature sequence {x_ti} for a fixed interval. Reference numeral 5 denotes a stationary part detection means, which uses a distance calculation means 5' to compute the distance between each feature vector (x_ti) of the input sequence and the vowel standard patterns 5'', and takes the section in which the distance is at or below a fixed threshold as the vowel stationary section.
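
A minimal sketch of the stationary part detection described above is given below. It assumes a Euclidean distance and takes the smallest distance over all vowel standard patterns; the source specifies neither choice. The names `distance` and `find_stationary_section` and the parameter `min_len` (the minimum run length) are hypothetical.

```python
import math

def distance(a, b):
    """Assumed Euclidean distance between a frame and a vowel standard pattern."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_stationary_section(frames, vowel_templates, theta_v, min_len):
    """Return (start, end), inclusive, of the first run of at least min_len frames
    whose distance to the nearest vowel template is at or below theta_v, else None."""
    start = None
    for i, frame in enumerate(frames):
        near = min(distance(frame, t) for t in vowel_templates) <= theta_v
        if near and start is None:
            start = i                      # a candidate run begins here
        elif not near and start is not None:
            if i - start >= min_len:
                return (start, i - 1)      # run was long enough
            start = None                   # run too short, discard it
    if start is not None and len(frames) - start >= min_len:
        return (start, len(frames) - 1)
    return None
```

The returned pair plays the role of the stationary section start 15 and stationary section end 16 signals.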

Reference numeral 6 denotes a vowel stationary section average vector calculation means, which computes the vowel stationary part average vector (X_av) 6' from several frames near the center frame of the vowel stationary section detected by the stationary part detection means 5. Reference numeral 7 denotes an average vector peak channel detection means, which detects the L channels having peak values among all channels of the vowel stationary part average vector (X_av) 6'.

The consonant-vowel boundary detection section consists of a normalized correlation coefficient calculation means 8 and a boundary frame detection means 9. The calculation means 8 computes the normalized correlation coefficient between a feature vector (x_tm) preceding the vowel stationary section, stored in the feature sequence storage unit 4, and the vowel stationary part average vector (X_av) 6', using only the channels detected by the average vector peak channel detection means 7. The boundary frame detection means 9 has a comparison means 9' that detects the frames i_co1 and i_co2 at which the normalized correlation coefficient falls to or below the fixed thresholds θ_co1 9'' and θ_co2 9''', respectively. It computes their arithmetic mean ((i_co1 + i_co2)/2) and outputs it as the boundary frame signal 9''''.

Reference numeral 10 denotes an overall control means. After receiving the word beginning from the speech section detection means 2 as speech start 11, it outputs an instruction (not shown) to store the vector sequence {x_ti} 3' from the feature sequence conversion means 3 in the feature sequence storage unit 4. It receives the stationary section start 15 and stationary section end 16 signals from the stationary part detection means 5, and outputs a stationary frame instruction 13 to the feature sequence storage unit 4 so that the vowel stationary section average vector calculation means 6 computes the vowel stationary part average vector (X_av) 6'. The average vector peak channel detection means 7 then detects the peak channels of (X_av) 6'. For these L channels, the feature vectors (x_tm) preceding the stationary section start 15 are supplied by the comparison frame instruction 14, and the normalized correlation coefficient calculation means 8 obtains the normalized correlation coefficient with (X_av) 6'. Finally, the overall control means receives the boundary frame signal 9'''' detected by the boundary frame detection means 9 and outputs the consonant-vowel boundary frame 17.
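
The averaging step performed by means 6 might look like the following sketch. The source says only that several frames near the center frame are averaged, so the window placement used here (centered on the middle of the section and clipped to it) is an assumption, as are the names `center_average` and `n_frames`.

```python
def center_average(frames, start, end, n_frames):
    """Average up to n_frames feature vectors around the middle of the
    stationary section [start, end] (inclusive), clipped to that section."""
    center = (start + end) // 2
    half = n_frames // 2
    lo = max(start, center - half)
    hi = min(end, lo + n_frames - 1)
    window = frames[lo:hi + 1]
    dim = len(window[0])
    # Channel-wise mean over the selected frames.
    return [sum(f[k] for f in window) / len(window) for k in range(dim)]
```

Averaging only the central frames keeps the frames at the edges of the stationary section, which may still carry traces of the transition, out of the reference vector.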

The boundary detection operation of this embodiment, configured as described above, is explained below with reference to FIG. 2. In the figure, 18 shows the original waveform of the input monosyllabic speech (C+V). 3'' shows the vector sequence {x_ti} 3' output by the feature sequence conversion means 3. The speech start 11 at the word beginning and the speech end 12 at the word end are marked on the feature vector sequence 3'' of the input signal, and 19 shows the vowel stationary section lying between the stationary section start 15 and the stationary section end 16 detected by the stationary part detection means 5. 6'' shows the vowel stationary average vector (X_av) 6', obtained as the mean of several frames near the center of the vowel stationary section 19. The L peak channels detected by the average vector peak channel detection means 7 are also indicated, by cross-hatching.

20 shows how the normalized correlation coefficient between the vowel stationary average vector (X_av) 6' and the frames preceding the stationary section start is computed over the L channels. 21 shows how the normalized correlation coefficient changes toward the beginning of the word, and how the boundary frame signal 9'''' is output according to the thresholds θ_co1 9'' and θ_co2 9'''.

The input speech is converted into a digital sequence by the A/D converter 1, and converted by the feature sequence conversion means 3 into the feature sequence 3' at fixed time intervals. When the speech section detection means 2 detects the speech start 11 from the power sequence, the feature sequence storage unit 4 begins storing the feature sequence over a fixed interval (3'').
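
Power-based detection of the speech start and end can be sketched as below. The source states only that changes in the power sequence are compared with a fixed threshold, so the mean-square frame power and the simple first-and-last-active-frame rule are assumptions, and `frame_powers` and `speech_start_end` are hypothetical names.

```python
def frame_powers(samples, frame_len):
    """Mean-square power of each non-overlapping frame of frame_len samples."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def speech_start_end(powers, threshold):
    """Return (first, last) frame indices whose power reaches the threshold, else None."""
    active = [i for i, p in enumerate(powers) if p >= threshold]
    if not active:
        return None
    return (active[0], active[-1])
```

In the device described here the start is reported as soon as it is found, so that storage of the feature sequence can begin while the end of the word is still being awaited.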

At the same time, the stationary part detection means 5 uses the distance calculation means 5' to compute the distance between each frame and the vowel standard patterns 5'', and detects the section in which the distance is at or below a fixed threshold as the vowel stationary section, signalled by the stationary section start 15 and the stationary section end 16. The vowel stationary part average vector calculation means 6 computes the vowel stationary part average vector (X_av) 6' from the feature vectors of the vowel stationary section held in the feature sequence storage unit 4. Only for the L peak channels of (X_av) 6' detected by the average vector peak detection means 7, the normalized correlation coefficient calculation means 8 obtains the normalized correlation coefficient between the average vector and the feature vectors (x_tm: m < i_sss) of the frames preceding the stationary section start 15 (20, 21). The frames i_co1 and i_co2 at which the value first falls to or below the fixed thresholds θ_co1 9'' and θ_co2 9''' are detected, and their arithmetic mean is detected as the boundary frame signal 9'''' and output as the consonant-vowel boundary frame 17. Detection of the word end is carried out by the speech section detection means 2 in parallel with the above processing. It is treated as the speech end 12, and the device then waits for the next utterance. This parallel processing makes real-time processing possible.

With the above configuration, the device absorbs fluctuations in the transition part at the consonant-vowel boundary, fluctuations in the vowel onset transition that depend on the consonant type, and speaker-dependent variation, so that consonant-vowel boundary detection accuracy is improved stably and real-time processing can be achieved.

FIGS. 3 to 8 show examples in which the device of the embodiment was actually applied to syllables of the "o" row (vowel /O/) with representative consonant types. In each figure the horizontal axis shows the time elapsed from the beginning of the word (ms), with one frame equal to 6 msec. The first black line marks the beginning of the word, the second the segment boundary, and the third the center of the vowel stationary section. Part (a) of each figure shows the power of the monosyllabic speech over time, and part (b) shows the feature vector sequence over time (here, the running frequency spectrum). In every monosyllable the consonant-vowel boundary is detected well.

FIGS. 3 and 4 show the representative plosives /K/ and /G/, FIG. 5 the representative fricative /S/, FIG. 6 the representative nasal /N/, FIG. 7 the representative flap /R/, and FIG. 8 the representative aspirate /H/. These results show that the device is not affected by variation arising from any consonant type.

Effects of the Invention

The present invention relates to a consonant-vowel boundary detection device that works as follows. It detects the beginning of a word from the power sequence. After the word beginning is detected, it computes the distance between the feature vector sequence and the vowel standard patterns, and takes the section [i_sss to i_ess] in which frames at or below a fixed threshold continue for at least a fixed length as the vowel stationary section. It obtains the vowel stationary part average vector (X_av) of several frames at the center of the vowel stationary section and detects the L channels of this vector that have peak values. For these L channels only, it finds the frames m preceding the vowel stationary section (m < i_sss) at which the normalized correlation coefficient C_0 with the average vector (X_av) first falls to or below the threshold θ_co1 (< 1) and the threshold θ_co2 (< θ_co1 < 1), and uses the arithmetic mean of those two frames as the consonant-vowel boundary frame. Detection of the word end is processed in parallel so that the device is ready for the next utterance.

With this configuration the device absorbs fluctuations in the transition part at the consonant-vowel boundary and fluctuations in the vowel onset transition arising from the consonant type, and also absorbs speaker-dependent fluctuation of the vowel stationary section. Consonant-vowel boundary detection accuracy is thereby improved stably, real-time processing becomes possible, and practical use is achieved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a consonant-vowel boundary detection device according to one embodiment of the present invention. FIG. 2 is an explanatory diagram illustrating the operation of the device shown in FIG. 1. FIGS. 3 to 8 are characteristic diagrams showing the results obtained when the monosyllables /KO/, /GO/, /SO/, /NO/, /RO/ and /HO/, whose consonants represent the typical consonant classes for "o"-row syllables, were input to the device.

1: A/D converter
2: speech section detection means
3: feature sequence conversion means
4: feature sequence storage unit
5': distance calculation means
5'': vowel standard patterns
5: stationary part detection means
6: vowel stationary section average vector calculation means
7: average vector peak detection means
8: normalized correlation coefficient calculation means
9: boundary frame detection means

Claims

[Claims]

A consonant-vowel boundary detection device in which input speech is expressed as a time-series pattern of feature vectors x_ti (x_t1, x_t2, ..., x_tN), comprising: a speech section detection means that extracts a speech section from the input signal; a stationary part detection means that detects, as a vowel stationary section, a section of the feature vector sequence in which frames whose distance from a vowel standard pattern is at or below a fixed threshold continue for at least a fixed section length; and a normalized correlation coefficient calculation means that calculates a normalized correlation coefficient between the average vector of the vowel stationary section and frames preceding the stationary section; wherein, proceeding sequentially from the stationary section toward the beginning of the word, the device compares with a fixed threshold the value obtained by the normalized correlation coefficient calculation means for a plurality of top-ranked channels, taken from the channel having the maximum value of the vowel stationary section average vector toward the channel having the minimum value, and takes the frame at which the value first falls to or below the threshold as the consonant-vowel boundary frame.