JPH01285994A

JPH01285994A - Speech section detector

Info

Publication number: JPH01285994A
Application number: JP63116694A
Authority: JP
Inventors: Nobuo Sugi; 杉　伸夫
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-05-13
Filing date: 1988-05-13
Publication date: 1989-11-16

Abstract

PURPOSE: To set an optimum threshold and detect a speech section with high accuracy by using the idle time before a command is given to monitor the average noise power, and by setting the threshold according to the maximum value of that average power. CONSTITUTION: A threshold calculating part 2 obtains the noise average power for every prescribed period from the signal power detected by a feature detecting part 1, in particular from the signal power of the noise in periods when no speech signal is present. In this way the average noise level around the speech input part is monitored over a long time, using the idle time before the command to start recognition processing is given from the control part. When the command is given, the maximum value of the noise average level is determined from the level data stored in the memory part, and the threshold for detecting the speech section is calculated from it. Speech section detection is thereby carried out with an optimum threshold regardless of the length of the period over which the noise power changes.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Field of Application) The present invention relates to a speech section detection device that can adaptively and optimally set, according to the noise level, the threshold used to detect the speech section of input speech supplied to speech recognition and similar processing.

(Prior Art) In a speech recognition system, an important task is to detect the speech section of the input speech with high accuracy, cut out only the input speech information in that section, and supply it to recognition processing.

Conventionally, this kind of speech section detection is performed by discriminating, against a predetermined threshold, the speech power of the input speech obtained by feature extraction at every predetermined period (for example, 5 to 20 msec). Specifically, when the speech power stays above a certain threshold continuously for a fixed time or longer, the point at which it first exceeded the threshold is detected as the start of the speech. After that, when the speech power stays below the threshold continuously for a fixed time or longer, the point at which it first fell below the threshold is detected as the end of the speech. In some cases separate thresholds are set for start detection and end detection, and the speech section is detected from the relationship between these thresholds and the speech power.
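The conventional start/end rule described above can be sketched as follows. This is only an illustrative sketch: the function name `detect_speech_section`, the single shared threshold, and the use of one `min_frames` value for both start and end are assumptions, not details given in the source.

```python
from typing import List, Optional, Tuple


def detect_speech_section(
    power: List[float], threshold: float, min_frames: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, or None if no start is found.

    start: first frame of the first run of at least min_frames frames
           whose power is above the threshold.
    end:   first frame of the first later run of at least min_frames
           frames whose power is below the threshold. If no such run
           exists, the end is taken as len(power) (an assumption).
    """
    start = None
    run_begin = 0
    run_len = 0
    for i, p in enumerate(power):
        if start is None:
            # Looking for the start: count consecutive frames above threshold.
            if p > threshold:
                if run_len == 0:
                    run_begin = i
                run_len += 1
                if run_len >= min_frames:
                    start = run_begin
                    run_len = 0
            else:
                run_len = 0
        else:
            # Looking for the end: count consecutive frames below threshold.
            if p < threshold:
                if run_len == 0:
                    run_begin = i
                run_len += 1
                if run_len >= min_frames:
                    return start, run_begin
            else:
                run_len = 0
    return None if start is None else (start, len(power))
```

A brief burst shorter than `min_frames` does not trigger a start, and a brief dip shorter than `min_frames` does not end the section, which is the purpose of the "fixed time or longer" condition.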

The threshold for this kind of speech section detection is usually set on the basis of the noise power in the noise section that precedes the speech section. In general, when the control unit gives the command to start recognition processing, the noise power of the several frames immediately after input signal capture begins is averaged, and the threshold is set according to this noise average power.

However, the environment surrounding the speech input unit varies widely, and the noise present in it also varies: the period of its power change may be long or short. For this reason, the conventional method of setting the threshold from the average noise power of the several frames immediately after the start of recognition can set a reasonably correct threshold for noise whose power changes over a short period of about one or two frames, but it becomes difficult to set the threshold correctly when noise with a long period, extending over 10 to 20 frames for example, is present.

That is, if the average noise power happens to be measured at a moment when the noise power is low, the threshold is set lower than it should be, and the detected speech section includes part of the noise section. Conversely, if it is measured at a moment when the noise power is high, the threshold is set higher than it should be, and part of the speech section is lost from the detected section.
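The dependence on sampling time can be illustrated with a small numerical sketch. The sinusoidal noise model, the 20-frame period, the 4-frame window, and the helper name `short_window_average` are all assumptions chosen for illustration; they are not taken from the source.

```python
import math
from typing import List


def short_window_average(noise: List[float], start: int, n_frames: int) -> float:
    """Average noise power over n_frames frames beginning at frame `start`."""
    window = noise[start:start + n_frames]
    return sum(window) / len(window)


# Synthetic noise power that swings between 1 and 3 with a 20-frame period.
noise = [2.0 + math.sin(2 * math.pi * i / 20) for i in range(100)]

# The same 4-frame average taken near a crest and near a trough of the cycle.
near_crest = short_window_average(noise, 3, 4)
near_trough = short_window_average(noise, 13, 4)
```

With these assumed values, the average taken near the crest is much larger than the one taken near the trough, even though the noise source is the same. A threshold derived from either value alone would be too high or too low for the rest of the cycle.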

These problems can be resolved by measuring the noise average power over a reasonably long period. However, there is a limit to how long a period can be set aside for noise average power measurement after the control unit has given the command to start recognition processing. Moreover, setting aside such a period lengthens the waiting time before speech input can begin and delays the response.

(Problem to Be Solved by the Invention) As described above, in an environment containing noise whose power changes over a long period, it has conventionally been difficult to measure the noise average power accurately and set the optimum threshold for speech section detection, and therefore difficult to improve the accuracy of speech section detection.

The present invention has been made in view of these circumstances. Its object is to provide a speech section detection device that can determine the noise average power accurately, regardless of the length of the period of the noise power change, and can set the optimum threshold for speech section detection.

[Structure of the Invention] (Means for Solving the Problems) The present invention is a speech section detection device that detects the noise level before the start of speech input, calculates a threshold according to the detected noise level, and uses this threshold to discriminate the input speech level and detect the speech section of the input speech. Until an instruction to detect a speech section is given, the device determines the noise average level at every predetermined period and stores a predetermined number of these noise average levels in a storage unit, successively updating the stored levels so that the storage unit always holds the latest ones. When speech section detection starts, the device finds the maximum value of the noise average levels held in the storage unit and optimally sets the threshold for speech section detection on the basis of this maximum value.

Furthermore, when the noise average level detected at each predetermined period is stored in the storage unit and that level becomes markedly large, the device suspends storage of the level in the storage unit. If the markedly large noise average level does not continue for a certain time or longer, the suspended noise average level is discarded.

(Operation) According to the present invention, the idle time before the control unit gives the command to start recognition processing is used to monitor the average noise level around the speech input unit over a long time. When the command is given, the maximum value is found among the latest predetermined number of noise average levels held in the storage unit, and the threshold for speech section detection is calculated according to this maximum. An optimum threshold can therefore be set, and speech section detection performed, regardless of the length of the period of the noise power change.

In addition, when a markedly large noise average power is detected, its storage in the storage unit is suspended. If the markedly large noise average power does not continue for a certain time or longer, the suspended value is discarded; it is stored in the storage unit only if it does continue for that time. An optimum threshold can therefore be set without being influenced by large noise that occurs only temporarily.

Consequently, a threshold suited to the environment around the speech input unit can be set, and speech section detection can be performed with high accuracy.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of a speech recognition device that incorporates a speech section detection device according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a feature extraction unit that acoustically analyzes the input signal at a predetermined period (for example, frames of 2 to 20 msec) and detects various feature parameters of the input signal. One of the feature parameters detected by the feature extraction unit 1 is the signal power. The threshold calculation unit 2 obtains the noise average power for each predetermined period from the signal power detected by the feature extraction unit 1, in particular from the signal power of the noise in periods when no speech signal is being input. When the start of speech recognition processing is instructed, it determines the threshold for speech section detection on the basis of this noise average power.

The threshold obtained by the threshold calculation unit 2 is given to the speech section detection unit 3, which detects the speech section in the input speech analyzed by the feature extraction unit 1 and extracts the feature parameter sequence within the detected section. The matching unit 4 then compares the feature parameter sequence of the input speech extracted in this way (the feature pattern of the input speech) with each standard pattern of the recognition target speech registered in the standard pattern memory 5, and obtains, for example, their similarity. This comparison with the standard patterns is performed, for example, by calculating a multiple similarity value for each pattern.

The similarities obtained for the standard patterns in this way are compared with one another in the determination unit 6, and recognition is completed by taking the category of the standard pattern with the maximum similarity as the recognition result (recognition candidate).

FIG. 2 shows an example configuration of the threshold calculation unit 2, which calculates the threshold used for speech section detection in the speech recognition device configured as above, and FIG. 3 schematically shows its processing concept (functional blocks).

Until the control unit (host) gives the command to start recognition processing, the threshold calculation unit 2 takes in, through a switch SW, the feature information of the input signal supplied by the feature extraction unit 1. Once the command is given, the feature information of the input signal from the feature extraction unit 1 is bypassed to the speech section detection unit 3 through the switch SW. In other words, the unit uses the idle time before the command to start recognition processing is given to take in the noise during that time and successively obtain the noise average power that serves as the basis for threshold calculation, and it sets the threshold from this noise average power when the command is input.

The threshold calculation unit 2 sequentially stores the noise power taken in at the predetermined period in a power storage register 11 (step a). When a predetermined number of data items (noise powers P1, P2, ..., Pk) have been stored in the power storage register 11 (step b), the noise average power m is calculated from these data (step c). As the processing concept in FIG. 3 shows, the noise average power m is calculated by reading the noise powers P1, P2, ..., Pk out of the power storage register 11 in sequence, obtaining their sum (cumulative value) with an adder 12, and dividing this sum by the number of added items k in a divider 13.

The noise average power m obtained in this way is stored sequentially in an average value storage register 14, and the power storage register 11 is initialized in preparation for calculating the noise average power in the next period. This storing of the noise average power m in the average value storage register 14 is repeated until the command to start recognition processing is given (step d). The average value storage register 14 holds the noise average powers m obtained at each predetermined period for a predetermined number of periods (nine), and the register 14 is updated each time a new noise average power is obtained. As a result, the average value storage register 14 always holds the latest nine noise average powers m.
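Steps a through d above can be sketched as a small class. This is a minimal illustration rather than the circuit of FIG. 3: the class name `NoiseAverageMonitor`, its attribute names, and the use of `collections.deque` as the rolling store are assumptions; only the averaging of k powers and the retention of the latest nine averages come from the text.

```python
from collections import deque


class NoiseAverageMonitor:
    """Rolling store of per-period noise average powers (hypothetical sketch)."""

    def __init__(self, frames_per_period: int, history_size: int = 9):
        self.frames_per_period = frames_per_period  # k in the text
        self.power_register = []  # plays the role of power storage register 11
        # Plays the role of average value storage register 14; a full deque
        # drops its oldest entry automatically when a new one is appended.
        self.average_register = deque(maxlen=history_size)

    def push(self, noise_power: float) -> None:
        """Take in one noise power value during the idle time."""
        self.power_register.append(noise_power)  # step a
        if len(self.power_register) == self.frames_per_period:  # step b
            # Step c: sum (adder 12) divided by the count (divider 13).
            m = sum(self.power_register) / self.frames_per_period
            self.average_register.append(m)
            # Initialize the power register for the next period.
            self.power_register.clear()
```

Calling `push` once per frame until the start command arrives corresponds to the repetition in step d; at that point `average_register` holds at most the latest `history_size` averages.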

When the command to start recognition processing is given, the maximum value of the noise average level is extracted first (step e). To extract this maximum, the noise average powers m held in the average value storage register 14 are read out one by one to a comparator 15 and compared with the reference noise average power held in a register 16; the larger of the two is stored in the register 16 as the new reference value. The register 16 is initialized to 0.

This successive comparison of the noise average powers m yields the maximum of the noise average powers held in the average value storage register 14. The maximum obtained in this way is given to a threshold calculating section 17, which sets the threshold for speech section detection (step f).
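Steps e and f can be sketched as below. The successive comparison against a reference initialized to 0 follows the text; the mapping from the maximum to a threshold is not specified in the source, so the linear form in `speech_threshold` and its `scale` and `offset` parameters are purely hypothetical placeholders.

```python
from typing import Iterable


def max_noise_average(averages: Iterable[float]) -> float:
    """Step e: successive comparison against a running reference value."""
    reference = 0.0  # register 16 is initialized to 0
    for m in averages:
        if m > reference:  # comparator 15 keeps the larger value
            reference = m
    return reference


def speech_threshold(max_average: float, scale: float = 2.0, offset: float = 0.0) -> float:
    """Step f: derive a threshold from the maximum noise average power.

    The linear mapping here is an assumption for illustration only;
    the source does not state the formula used by section 17.
    """
    return scale * max_average + offset
```

Because the reference starts at 0, an empty register yields 0, which suits non-negative power values.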

With the threshold calculation unit 2 configured in this way, the noise average power is obtained over a long time using the idle time of the speech input, the average value storage register 14 always holds the noise average powers for the latest several periods, and the threshold is set using the maximum of the noise average powers held in the register 14. Even when the period of the noise power change is long, an appropriate threshold for that noise power can therefore be set from the long-term measurement of the noise average power, and highly accurate speech section detection becomes possible.

In other words, unlike the conventional method, which measures the noise average power only for a short time immediately after the command to start recognition processing is given, this device can determine the noise average power accurately and set an optimum threshold regardless of the period of the noise power change.

Some noise rises in value only temporarily. If such temporarily powerful noise is detected as it is and included in the calculation of the noise average power, the noise average power may become undesirably high. In such cases it is preferable to remove the noise components whose power rises only temporarily before calculating the noise average power.

FIG. 4 shows a function, provided in the threshold calculation unit 2 described above, for rejecting such temporary peak noise, and FIG. 5 is its functional block diagram.

This peak noise rejection function detects the noise power at the predetermined period as described above (step A) and determines whether its value is higher than a predetermined noise level (step B). Specifically, a comparator 21 determines whether the detected noise power is M times or more the noise power detected up to that point.

The reference for this comparison is obtained by multiplying the noise power detected one period earlier by M in a multiplier 22, storing the result in a register 23, and supplying it to the comparator 21.

When an extremely large noise power is detected, it is not stored directly in the power storage register 11 but is held temporarily in a separate power storage register 24 (step C). Noise power detection is then repeated (step D), and this time it is determined whether the detected noise power has become extremely small (step E). This determination is made by a comparator 25, which checks whether the detected noise power is (1/M) or less of the previously detected noise power. The reference for this comparison is obtained by multiplying the noise power detected one period earlier by (1/M), that is, dividing it by M, in a divider 26, storing the result in a register 27, and supplying it to the comparator 25.

If the detected noise power has not become small, its value is stored sequentially in the power storage register 24 (step F). This processing is repeated, and when a predetermined number of noise powers have been stored in the power storage register 24 (step G), the data held in the register 24 are transferred to the power storage register 11 (step H).

Conversely, if the detected noise power becomes extremely small before the predetermined number of noise power data items have been stored in the power storage register 24, the high-power noise detected as described above is judged to be temporary, and all the noise power data held in the power storage register 24 are discarded (step I).
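Steps A through I can be sketched as a small state machine. This is an illustrative sketch under stated assumptions: the class name `PeakNoiseFilter`, the parameter names `ratio_m` and `hold_count`, and the choice to forward the low value that ends a peak as ordinary noise are not specified in the source. Noise powers are assumed to be strictly positive.

```python
from typing import List, Optional


class PeakNoiseFilter:
    """Holds back sudden loud noise until it proves persistent (sketch)."""

    def __init__(self, ratio_m: float, hold_count: int):
        self.ratio_m = ratio_m        # M in the text
        self.hold_count = hold_count  # the "predetermined number" of step G
        self.previous: Optional[float] = None  # power one period earlier
        self.held: List[float] = []   # plays the role of register 24
        self.holding = False

    def push(self, power: float) -> List[float]:
        """Step A/D: take one power value; return values bound for register 11."""
        out: List[float] = []
        prev = self.previous
        self.previous = power
        if not self.holding:
            if prev is not None and power >= self.ratio_m * prev:  # step B
                self.holding = True
                self.held = [power]  # step C: hold instead of storing
            else:
                out.append(power)
        elif power <= prev / self.ratio_m:  # step E: power dropped sharply
            self.held = []  # step I: the peak was temporary, discard it
            self.holding = False
            out.append(power)  # assumption: treat the low value as normal noise
        else:
            self.held.append(power)  # step F
        if self.holding and len(self.held) >= self.hold_count:  # step G
            out.extend(self.held)  # step H: transfer to register 11
            self.held = []
            self.holding = False
        return out
```

In use, every value returned by `push` would be passed on to the averaging stage, so a short burst never reaches the noise average, while a sustained rise in noise level is accepted after `hold_count` periods.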

Through the above procedure, noise components that rise only temporarily are rejected. A device equipped with this peak noise rejection function can therefore detect the representative noise average power without being affected by temporary, extreme noise changes, and can effectively set an appropriate threshold for speech section detection.

The present invention is not limited to the embodiment described above. For example, the peak noise rejection function can also be applied when the noise average power is stored in the average value storage register 14. Doing so makes it possible to cope with noise power changes over even longer periods, and to perform effective speech section detection that takes only the stationary noise components into account. The present invention can also be implemented with various other modifications without departing from its gist.

[Effects of the Invention] As described above, unlike a device that sets the threshold for speech section detection by measuring the noise average power for a short time immediately after the command to start recognition processing is given, the present invention makes effective use of the idle time before the command is given to monitor the average noise power, and sets the threshold according to its maximum value. It therefore offers substantial practical benefits, such as allowing an optimum threshold to be set and speech section detection to be performed with high accuracy regardless of the period of the noise power change.

[Brief Description of the Drawings]

The drawings show an embodiment of the present invention. FIG. 1 is a schematic configuration diagram of a speech recognition device incorporating a speech section detection device according to an embodiment of the present invention; FIG. 2 shows the processing procedure in the threshold calculation unit of the embodiment; FIG. 3 is a functional block diagram of the threshold calculation unit; FIG. 4 shows an example configuration of the peak noise rejection function incorporated in the threshold calculation unit; and FIG. 5 is a block diagram of the peak noise rejection function.

2 ... threshold calculation unit, 3 ... speech section detection unit, 11 ... power storage register, 12 ... adder, 13 ... divider, 14 ... average value storage register, 15 ... comparator, 16 ... register, 17 ... threshold calculating section, 21 ... comparator, 22 ... multiplier, 23 ... register, 24 ... power storage register, 25 ... comparator, 26 ... divider, 27 ... register.

Agent for the applicant: Patent attorney Takehiko Suzue

Claims

[Claims]

(1) A speech section detection device that calculates a threshold according to the noise level before the start of speech input, discriminates the input speech level using this threshold, and detects the speech section of the input speech, the device comprising: means for determining a noise average level at every predetermined period until an instruction to detect a speech section is given; a storage unit that stores a predetermined number of the noise average levels determined by this means while updating them; and means for setting the threshold on the basis of the maximum value of the noise average levels stored in the storage unit.

(2) The speech section detection device according to claim 1, having a function of suspending storage of the noise average level in the storage unit when the noise average level detected at each predetermined period becomes markedly large, and of discarding the suspended noise average level if the markedly large noise average level does not continue for a certain time or longer.