JPH0383100A

JPH0383100A - Detector for voice section

Info

Publication number: JPH0383100A
Application number: JP1220027A
Authority: JP
Inventors: Masashi Tokuda; 正志徳田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-08-25
Filing date: 1989-08-25
Publication date: 1991-04-09

Abstract

(57) [Summary] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to a device for detecting voice sections contained in an electrical signal obtained via a transducer.

Prior Art

In some cases an audio signal is extracted from an electrical signal and input to a data processing device such as a microcomputer for processing.

In general, the audio extraction methods in use include a level detection method, which detects the energy of the audio signal (for example, its amplitude) and compares it with a predetermined threshold; a method that detects and evaluates the number of zero crossings or the polarity bit sequence of the audio signal; and a method that combines the two.

In the level detection method, a comparator compares the electrical signal, in which the presence or absence of an audio signal is to be determined, with a predetermined threshold. When the level of the electrical signal is higher than the threshold level, the signal is judged to be an audio signal and that portion is output. The threshold used is a power threshold, that is, the decision level applied when power is computed from the audio waveform during encoding and decoding to decide between sound and silence.
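As a rough illustration of the level detection method described above, the following Python sketch compares per-frame power with a fixed threshold. The function names, the frame length, and the use of mean squared amplitude as the power measure are assumptions made for this example and are not taken from the source.

```python
def frame_power(samples, frame_len):
    """Mean squared amplitude of each full frame of frame_len samples."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def level_detect(samples, frame_len, power_threshold):
    """True for each frame whose power exceeds the fixed threshold."""
    return [p > power_threshold for p in frame_power(samples, frame_len)]


quiet = [1, -1, 1, -1]
loud = [100, -100, 100, -100]
flags = level_detect(quiet + loud + quiet, frame_len=4, power_threshold=50.0)
print(flags)  # [False, True, False]
```

Because the threshold is a fixed number, the same call would flag every frame once the background level alone pushed the power above 50.0, which is the weakness discussed in the text.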

Problems to Be Solved by the Invention

Although this level detection method is simple, if the threshold is set too high, voice sections may go undetected, whereas if the threshold is too low, false detections occur when background noise is large, and low-energy portions such as consonants may fail to be detected.

In addition, because the threshold used for level detection in conventional voice extraction methods was fixed, this combined with the problem described above: when the background sound level rose, the background sound was also mistakenly recognized as speech, and it was difficult to set an appropriate threshold level.

A voice section extraction method of this kind is required to malfunction rarely because of noise, to avoid dropping the beginnings and endings of words, and, because it is processed by a general-purpose microcomputer, to need little software (processing time).

An object of the present invention is to solve the problems described above and to provide an audio signal extraction method that can reliably recognize voice sections even in a signal with large background noise.

Means for Solving the Problems

The present invention is characterized by comprising means for dividing a signal from which an audio signal is to be detected into a plurality of blocks, a circuit for calculating the sum of absolute differences in energy level between the signals in each block, and a circuit for calculating, from that sum of absolute values, a threshold for determining the level of the audio signal.

Operation

In the present invention, the signal from which an audio signal is to be detected is divided into a plurality of blocks, the sum of absolute differences in energy level between the signals in a block is used as a feature quantity, and the threshold for determining the level of the audio signal is determined from that feature quantity. The feature quantity and the threshold setting are calculated as follows.

Feature quantity

The audio signal is divided into blocks, and the sum of absolute differences between adjacent audio signals within the same block is taken as the feature quantity F for voice section extraction.
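The displayed formula for F is not legible in this text. One form consistent with the verbal definition, offered only as a sketch, is given below. The symbols x (sample amplitude) and N (number of samples per block) are assumed here and are not taken from the source, and the sum is assumed to run only over pairs inside block t.

```latex
F(t) = \sum_{i=1}^{N-1} \bigl| x_{tN+i} - x_{tN+i-1} \bigr|
```

Under this reading a block of N samples contributes N - 1 absolute differences. Whether the difference across the block boundary is also counted cannot be determined from the text.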

t: block index (0, 1, 2, 3, ...).
Audio signal amplitude at time t.
Number of audio signals in the block: 80 (10 msec at 8 kHz sampling), a length of at least one pitch period, chosen so that F takes a stable value.

Threshold setting

To prevent false detection caused by background noise, the threshold TH is set adaptively according to the background noise level.

Background noise that is CNT NOISE blocks long is input, and the threshold is set from its average feature quantity F' by the following equation.

TH = G(F')

As an example, CNT NOISE (the length of the background noise input, in blocks) was set to 10 blocks.

G(X) = 1.25 * X + 500   (X ≥ 550)
G(X) = 1200             (X < 550)
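The feature quantity and the threshold function G above can be sketched in Python as follows. This is a minimal illustration, not the patented circuit. The function names are hypothetical, the in-block pairing of adjacent samples is one reading of the definition, and the `+ 500` term assumes that the symbol in the first branch of G is a plus sign.

```python
def block_features(samples, block_len=80):
    """Sum of |x[i] - x[i-1]| over adjacent samples inside each full block."""
    feats = []
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        feats.append(sum(abs(b - a) for a, b in zip(block, block[1:])))
    return feats


def g(x):
    """Piecewise threshold function G as read from the text."""
    return 1.25 * x + 500 if x >= 550 else 1200


def noise_threshold(noise_samples, block_len=80, cnt_noise=10):
    """TH = G(F'), with F' the mean feature over the first cnt_noise blocks."""
    feats = block_features(noise_samples, block_len)[:cnt_noise]
    if not feats:
        raise ValueError("need at least one full block of background noise")
    return g(sum(feats) / len(feats))


# Alternating +/-5 noise: each 80-sample block has 79 differences of 10.
noise = [5, -5] * 400  # 800 samples = 10 blocks
print(block_features(noise)[0])  # 790
print(noise_threshold(noise))    # 1487.5
```

With this reading the two branches nearly meet at X = 550 (1187.5 against 1200), so the constant branch acts as a floor on the threshold when the background is quiet.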

Embodiment

In FIG. 1, the audio data has been sampled at appropriate time intervals.

1 is an immediately preceding data register that holds the audio data from the previous sampling time; 2 is a circuit that calculates the absolute value of the difference between the audio data input at the current sampling time and the audio data of the previous sampling time supplied from the immediately preceding data register; 3 is an adder circuit that adds the absolute difference to the sum of absolute differences; 4 is a circuit that calculates the threshold from the sum of absolute differences; 5 is a comparator that compares constant 1 with the sum of absolute differences; 6 is a register that holds the threshold; and 7 is a comparator that extracts audio data from the signal obtained from the calculation circuit 3, using as its threshold the value obtained from the threshold register 6. Each bit is connected to a power supply line or a ground line. Even when there is no background sound at all, a certain fixed value is set as the threshold; that fixed value is constant 2. Constant 1 is the value used to decide whether constant 2 is adopted as the threshold.

8 is a counter that records how many items of audio data have been read, and 9 is a controller that controls the overall processing flow and issues the signals that set values in each register.

11 is a counter for word-beginning detection that counts the number of blocks elapsed since a block whose sum exceeded TH; 12 is a counter for word-beginning detection that counts the number of blocks whose sum exceeds TH; and 13 is a counter for burst error detection that counts the number of blocks since the word beginning was found. Unless the content of counter 13 exceeds a certain value, the span is not accepted as a voice section.

14 is a counter for finding the word ending; it counts the number of blocks that are smaller than TH.

FIG. 2 is a flowchart of the processing that calculates the threshold for the comparator 7 used to detect voice sections.

In step S1, one item of audio data is read. In step S2, the absolute difference calculation circuit 2 calculates the absolute value of the difference between the immediately preceding data and the audio data read in step S1. In step S3, the calculation circuit 3 adds the absolute difference calculated in step S2 to the sum of absolute differences. In step S4, it is determined whether the data counter 8 has reached the predetermined value n; if it has, the process jumps to step S6.

When the data counter is less than n, the process advances to step S5, where the data counter 8 is incremented by 1, the audio data value obtained in the current sampling is set in the immediately preceding data register 1, and the process returns to step S1.

In step S6, the sum of absolute differences is divided by the number of blocks to obtain the average. (One block consists of a fixed number a of audio samples, and a fixed number b of such blocks are gathered to form the sum of absolute differences. In our case, a = 128 and b = 8.)
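Steps S1 to S6 can be sketched as a simple loop in Python. This is an illustrative reading only. It assumes that n equals a × b samples, that the immediately preceding data register initially holds the first sample, and that exactly n samples are read before averaging; none of these details is stated explicitly in the text, and the function name is hypothetical.

```python
def average_feature(samples, a=128, b=8):
    """Accumulate |current - previous| over n = a*b samples (S1-S5),
    then divide the sum by the number of blocks b (S6)."""
    n = a * b
    if len(samples) < n:
        raise ValueError("need at least a*b samples")
    previous = samples[0]   # assumed initial content of register 1
    abs_diff_sum = 0        # running sum kept by adder circuit 3
    counter = 0             # data counter 8
    while counter < n:      # S4: stop once n samples have been read
        current = samples[counter]               # S1: read one sample
        abs_diff_sum += abs(current - previous)  # S2 and S3
        previous = current                       # S5: update register 1
        counter += 1                             # S5: increment counter 8
    return abs_diff_sum / b                      # S6: average per block


samples = [0, 3] * 512           # 1024 samples = 8 blocks of 128
print(average_feature(samples))  # 383.625
```

In this example the first difference is 0 because of the assumed register initialization, and the remaining 1023 differences are 3 each, giving 3069 / 8.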

In step S7, the comparator 5 determines whether the average calculated in step S6 is larger than a certain value α (constant 1). When α is larger than the average calculated in step S6, the process goes to step S8; otherwise it jumps to step S9. α is larger than the average when the background noise is large, and the multiplexer 15 is then switched to the threshold calculation circuit 4 side.

α is smaller than the average when the background noise is small, and the multiplexer 15 is then switched to the constant 2 side.

In step S8, the threshold calculation circuit 4 calculates the threshold from the calculation formula and sets it in the threshold register 6. As an example, the formula threshold = 1.25 × average value + 500/16 was used.

The threshold obtained in this way is applied to the comparator 7 and compared with the data signal supplied from the sum calculation circuit 3. The comparator 7 outputs data signals that exceed the threshold as audio data signals. In this way audio data is extracted from the input data signal.

After the threshold has been calculated, voice sections are determined as follows.

(Word beginning) The absolute difference between adjacent data is calculated, and the total S of the absolute differences for one block is calculated. If, within T HD (a constant) consecutive blocks, there are T HTH (a constant) or more blocks whose S is greater than the threshold TH, a word beginning is declared. In practice, the word beginning is set to a point a certain fixed time before the point at which it was found.
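The word-beginning rule can be sketched in Python as a sliding window over the per-block sums. This is a hedged illustration: the function and parameter names are hypothetical, `window` and `min_hits` stand in for the constants written T HD and T HTH in the text, and expressing the look-back in blocks is an assumption.

```python
def find_word_start(sums, th, window, min_hits, lookback=0):
    """Return the block index taken as the word beginning, or None.

    sums     : per-block sums S of absolute differences
    window   : number of consecutive blocks examined
    min_hits : blocks in the window that must exceed th
    lookback : blocks to step back from the detection point (assumed unit)
    """
    for end in range(window, len(sums) + 1):
        hits = sum(1 for s in sums[end - window:end] if s > th)
        if hits >= min_hits:
            detected_at = end - 1  # last block of the qualifying window
            return max(0, detected_at - lookback)
    return None


sums = [100, 120, 900, 110, 950, 980, 990, 130]
print(find_word_start(sums, th=500, window=4, min_hits=3, lookback=3))  # 2
```

Here the first window with three blocks above 500 ends at block 5, and stepping back three blocks places the word beginning at block 2, the first loud block.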

(Word-ending detection) If no block within a span of T HEND blocks has a feature quantity greater than the threshold TH, the start of that span is taken as the word ending.
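The word-ending rule can be sketched in the same style. The names are hypothetical, and `hangover` stands in for the constant written T HEND in the text.

```python
def find_word_end(sums, th, start, hangover):
    """Return the first block index at or after start that begins a run of
    hangover blocks with no sum above th, or None if there is no such run."""
    for i in range(start, len(sums) - hangover + 1):
        if all(s <= th for s in sums[i:i + hangover]):
            return i
    return None


sums = [900, 950, 100, 980, 120, 110, 130, 900]
print(find_word_end(sums, th=500, start=0, hangover=3))  # 4
```

The single quiet block at index 2 does not end the word, because the run of quiet blocks must last for the whole span before the ending is declared.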

(Burst detection) If the interval between the word beginning and the word ending is shorter than a certain time, it is treated as a burst error and is not regarded as a voice section.

This is done in order to reject sudden noises such as tongue clicks.
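Burst rejection amounts to a minimum-duration filter on candidate sections. A minimal Python sketch, with a hypothetical function name and the duration expressed in blocks as an assumption:

```python
def reject_bursts(segments, min_blocks):
    """Keep only (start, end) block-index pairs spanning at least min_blocks."""
    return [(s, e) for (s, e) in segments if e - s >= min_blocks]


candidates = [(2, 4), (10, 45), (50, 51)]
print(reject_bursts(candidates, min_blocks=5))  # [(10, 45)]
```

Short candidates such as a click lasting one or two blocks are dropped, while the longer candidate is kept as a voice section.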

In the circuit shown in FIG. 1, the immediately preceding data register 1 and the calculation circuits 2, 3, and 4 can be shared between voice section extraction and threshold calculation by using them in a time-division manner for the audio data signal and the data signal.

Effects of the Invention

When a voice section is extracted, the present invention allows the threshold used to decide whether a span is a voice section to be changed and set automatically in view of the background sound at that time, so that voice sections can be extracted efficiently even in noisy environments.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an example of a device used in the voice section extraction method of the present invention, FIG. 2 is a flowchart showing the operation of the device of FIG. 1, and FIG. 3 is a waveform diagram showing an example of a voice section. 1: immediately preceding data register; 2: absolute difference calculation circuit; 3: adder circuit; 4: threshold calculation circuit; 5: comparator; 6: threshold register; 7: comparator; 8: data counter; 9: control circuit.

Claims

(1) A voice section detection device comprising: means for dividing a signal from which an audio signal is to be detected into a plurality of blocks; a circuit for calculating the sum of absolute differences in energy level between the signals in each block; and a circuit for calculating, from the sum of absolute values, a threshold for determining the level of the audio signal.