JPH0371720B2

JPH0371720B2 -

Info

Publication number: JPH0371720B2
Application number: JP59099115A
Authority: JP
Inventors: Tadaharu Kato; Takao Nishitani
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1984-05-17
Filing date: 1984-05-17
Publication date: 1991-11-14
Also published as: JPS60242500A

Description

【発明の詳細な説明】（産業上の利用分野）本発明は、音声信号の有無を判定する音声検出
方法及び回路に関する。（従来技術とその問題点）音声検出回路は主にDSI（Digital Speech
Interpolationの略称である。）装置に組み込ま
れ、DSI装置への入力チヤンネルに音声信号が存
在するか否かを判定するために用いられる。尚、DSI装置に関しては、例えば1976年３月発
行の文献、「コムサツトテクニカルレビユー
（COMSAT TECHNICAL REVIEW）」vol.6No.
１の第127〜158頁に掲載されているエス・ジエ
ー・キヤムパネラ（S.J.Campanella）による論
文、デイジタルスピーチインターポレーシヨ
ン（Digital Speech Interpolation）」に詳述さ
れているので参照されたい。従来、ハードウエア規模が簡単で検出論理が明
瞭である方法としてレベル検出法が知られている
が、この方法は入力信号の信号エネルギー（電力
及び振幅）を検出後、閾値と比較することにより
音声信号の有無を判定するものである。またレベ
ル検出法を用いた音声検出器の中で、入力信号の
振幅と予め定められた閾値とを比較する固定閾値
型音声検出器が、最もハードウエア規模が簡単で
かつ、確実な音声検出器として知られている。次に図面を参照しながら、この固定閾値型音声
検出器の原理を説明する。第１図は固定閾値型音声検出器の原理を示すブ
ロツク図であり、信号入力端子１、振幅閾値入力
端子２、振幅比較回路３、累積回路４、累積回路
への入力信号として＋１、−１を与える増加・減
少制御線５および６、音声検出用フリツプフロツ
プ７、音声検出用フリツプフロツプセツト・リセ
ツト制御線８および９、音声検出結果出力端子１
０からなつている。なお、この場合累積回路は可
逆カウンタ（アツプダウン・カウンタ）で置換で
きる。図において、端子１より入力される入力信号は
標本化周期ごとに振幅比較回路３において、端子
２より入力される予められた振幅閾値（THa）
と比較される。その結果、入力信号振幅が振幅閾
値よりも大きいと、累積回路増加制御線５を使つ
て累積回路４の内容が１だけ増加される。また、
逆に入力信号振幅が振幅閾値よりも小さいと、累
積回路減少制御線６を使つて累積回路４の内容が
１だけ減少される。但し、累積回路の内容は負の
値にならないようになつている。音声信号が到来し、振幅閾値を超える入力が多
くなると、累積回路の内容は順次増加する。もち
ろん、その間に振幅閾値以下の入力が加わると、
累積回路の内容は１だけ減少する。このようにし
て、累積回路の内容が予め設定された持続時間の
閾値（THt）に達すると、音声検出用フリツプ
フロツプのセツト制御線８を使つて音声検出用フ
リツプフロツプ７がセツトされ、音声が検出され
たことになり、端子１０よりその結果が出力され
る。また、音声が検出されなくなると、例えば、そ
れは累積回路４の内容が０になることで示される
が、その時、音声検出用フリツプフロツプ７は音
声検出用フリツプフロツプのリセツト制御線９を
使つてリセツトされ、端子１０よりその結果が出
力されるが、一般にはある一定時間の後にリセツ
トされる。これは、ハングオーバーと称され、通
話中の単語や句の間での切断に耳が敏感であるこ
とから設けられており、その時間長は100〜
250ms程度である。さらに理解を深めるために第１図で示される固
定閾値型音声検出器に第２図のａの１１で示され
る信号が入力した場合を例にとつて説明を加え
る。第２図では、入力信号１１、振幅閾値１２、累
積回路の内容１３、持続時間の閾値１４および音
声検出結果出力１５を示している。まず、入力信号１１が端子１から入力される
と、標本化周期Tsごとに振幅比較回路３により
振幅閾値１２と比較される。第２図から判るよう
に時刻t_a1になつて始めて入力信号の振幅の方が
振幅閾値よりも大きくなるので累積回路の内容１
３は時刻t_a1で始めて１になり（第２図ｂ）、以
後、時刻t_a2まで１ずつ増加されていく。その結
果、時刻t_b1になつて累積回路の内容１３が持続
時間の閾値１４よりも大きくなるので、音声が検
出されたことになり、出力１５は１になる。とこ
ろで、時刻t_a3になると、入力信号１１の振幅が
振幅閾値１２よりも小さくなるで、累積回路の内
容１３は１ずつ減少していき、時刻t_b2になり、
持続時間の閾値１４よりも小さくなるので音声信
号が無くなつたと判定され前述の理由でハングオ
ーバーが付加されハングオーバー終了後、出力１
５は０になる。第２図のｃにおけるT_Hがハング
オーバー時間を示している。以上説明してきた様な固定閾値型音声検出器で
は確かにハードウエア規模は簡単ではあるが、一
度閾値が設定されると閾値以上のレベルであれば
雑音でも検出してしまうという欠点があつた。（発明の目的）本発明の目的は、入力信号中に含まれる雑音電
力に応じて変動する閾値を有し、雑音の誤検出の
発生頻度を低下させかつ音声検出能力を向上させ
た可変閾値型音声検出器を提供することにある。（発明の構成）本発明によれば、標本時刻毎に入力される音声
信号と該音声信号の無音区間雑音レベルに応じて
変動する第１及び第２の閾値との大小判定結果を
累積し、該累積値と第３の閾値とを比較し入力信
号の信号レベルを判定し、該判定結果が高信号レ
ベルから低信号レベルに変わつた場合に予め定め
られた時間を該判定結果に有音区間として付加し
て出力を得る音声検出方法において、前記第３の閾値として高いレベルの予め定めら
れた閾値と雑音レベルに応じて決定される低いレ
ベルの閾値とを用意し、前記音声検出出力が音声
信号有りを知らせる時には低いレベルの閾値を用
い、前記音声検出出力が音声信号無しを知らせる
時には高いレベルの閾値を用いて音声検出を行な
い付加された有音区間でも雑音に応じて決定され
る低いレベルの閾値を用いることを特徴とする音
声検出方法が得られる。また本発明によれば、標本時刻毎に入力される
入力信号の無音区間雑音レベルを計算する雑音電
力計算回路と、該雑音電力計算回路の出力に応じ
て変動する第１及び第２の閾値を発生する第１の
閾値発生回路と、前記入力信号と前記第１及び第
２の閾値との大小判定を行ないその結果を出力す
るレベル検出回路と該レベル検出回路の出力を累
積する累積回路と、前記第１の閾値発生回路より
出力される前記第２の閾値を入力し該第２の閾値
のレベルに応じて予め定められた複数個の低いレ
ベルの閾値の中から１つを選択し出力する選択回
路と、該選択回路から出力される閾値と予め定め
られた高いレベルの閾値とを入力し、後述する出
力保持回路の出力が音声信号有りを知らせる時に
は前記選択回路から出力される低いレベルの閾値
を選択し、前記出力保持回路の出力が音声信号無
しを知らせる時には前記予め定められた高いレベ
ルの閾値を選択し第３の閾値として出力する第２
の閾値発生回路と、前記累積回路の出力と前記第２の閾値発生回路
から出力される第３の閾値とを比較することによ
り音声信号の有無を判定する判定回路と、該判定
回路の出力が音声信号有りから無しに変わつた場
合に予め定められた時間を前記判定回路の出力に
有音区間として付加する出力保持回路とから少な
くとも構成され、振幅閾値（第１及び第２の閾
値）と有音区間で使用する第３の閾値を雑音レベ
ルに応じて変動させることを特徴とした音声検出
回路が得られる。（実施例）本発明は上述の構成をとり、振幅閾値（第１及
び第２の閾値）と有音区間で使用する第３の閾値
（TH3L）とを雑音レベルに応じて適応的に変え
ることにより雑音による誤検出を増加させること
なく、音声検出能力を向上させている。本発明を図面を参照しながら詳細に説明する。
第３図は本発明の一実施例であり、入力端子２
０、偶数ビツト反転回路２１、符号変換回路２
２、整流回路２３、電力計算回路２４、第１の閾
値発生回路２５、レベル検出回路２６、累積回路
２７、比較回路２８、第２の閾値発生回路２９、
可逆カウンタ３０、カウンタ設定回路３１、判定
回路３２、出力端子３３及び選択回路３４から構
成されている。例えば、国際電信電話諮問委員会、（CCITT；
Ｃomite´ Ｃonsultatif Ｉnternational Ｔ´
elegraphaique et Ｔ´ele´phonique）からの勧
告案G.711に基づき非線形符号化され、８ビツト
のＡ−Law符号（オレンジブツクVol.−２，
pp409〜410参照のこと。）となつた入力信号が入
力端子２０から入力する場合を例にとつて説明を
加える。通常、電話回線を伝送されるＡ−Law
符号信号はMSB（Ｍost Ｓignificant Ｂitの略
称である。）側からみて、偶数ビツト目が反転さ
れているので、偶数ビツト反転回路２１により入
力信号は偶数ビツトが反転され伝送される前のも
との信号に戻される。もとに戻つたＡ−Law符
号信号は符号変換回路２２で、第４図で示すよう
に、正のＡ−Law符号信号に対してはMSBだけ、
負のＡ−Law符号信号に対しては全ビツト反転
され８ビツトの２の補数（Tw´os complement）
符号信号に変換され、整流回路２３へ入力され
る。整流回路２３では、この入力信号を絶対値信
号（大きさのみを表わす信号）に変換し、一方は
電力計算回路２４へ、もう一方はレベル検出回路
２６へ送り出す。電力計算回路２４では、入力信号に含まれる雑
音を取り出し、雑音の実効値を計算する。具体的には、音声が検出されない時（例えば、
後述する比較回路２８の出力が０の時）はすべて
の入力信号を雑音とみなすとともに、音声が検出
された時（例えば後述する比較回路２８の出力が
１の時）であつても予め定めれたレベル以下の信
号は雑音であるとみなし、この雑音を低域通過フ
イルタに入力することにより雑音の実効値を計算
し、その結果を第１の閾値発生回路２５に送出す
る。従つて、雑音の実効値を計算する際に除外さ
れる音声信号とは、後述の比較回路の出力が１で
ありかつ、予め定められたレベル以上の信号レベ
ルを有する信号である。第１の閾値発生回路２５
では、電力計算回路２４からの出力を定数倍する
ことにより、レベル検出回路２６で使用される第
１の閾値（TH1）と、第１の閾値より6dB高い
所に第２の閾値（TH2）とを設定し、レベル検
出回路２６へ送出する。レベル検出回路２６では、整流回路２３の出力
と、第１の閾値発生回路２５より送出される第１
閾値及び第２の閾値とを比較し、整流回路の出力
が第２の閾値より大きい場所には入力信号が音声
信号である確率が高いので＋２、第１の閾値と第
２の閾値との間に位置する場合には入力信号が音
声信号である確率と雑音である確率とがほぼ等し
いかあるいは前者が少し高い程であるので＋１、
第１の閾値より小さい場合には入力信号が雑音で
ある確率が高いので−１を出力する。累積回路２
７ではレベル検出回路２６の出力を累積しており
その累積値を比較回路２８へ送出する。比較回路
２８では、後述する第２の閾値発生回路２９から
出力される第３の閾値（TH3）と前記累積値と
を比較し、後者が前者よりも大きい場合には入力
信号が音声信号であると判定し、＋１を、また、
前者が後者よりも大きい場合には入力は雑音であ
ると判定し０を出力する。選択回路３４では、前述の第１の閾値発生回路
より出力される第２の閾値（TH2）のレベルに
応じて、予め定められた複数個の低いレベルの閾
値の中から１つを選択し出力する。第２の閾値発生回路２９では、前記選択回路か
ら出力される低いレベルの閾値（TH3L）とあら
かじめ定められた高いレベルの閾値（TH3H）
とを入力し、後述する判定回路３２の出力が０の
場合（無音時）には前記予め定められた高いレベ
ルの閾値を選択し、また、後述する判定回路３２
の出力が１の場合（有音時）には前記選択回路よ
り出力される低いレベルの閾値を選択し、前記比
較回路２８で使用する第３の閾値として出力す
る。可逆カウンタ３０では、前記比較回路２８の出
力を入力し、該入力信号が１の時にはカウンタの
内容を１だけ増加させ、０の時にはカウンタの内
容を１だけ減少させ前記比較回路の出力を累積し
ている。また、カウンタ設定回路３１は前記比較
回路の出力を監視しておりその出力が１から０に
変化する時点を検出し、その時点で、前記可逆カ
ウンタ３０の内容を予め定められた値に設定す
る。判定回路３２では、前記可逆カウンタ３０の
内容が予め定められた値（通常は０を用いる。）
より大きい場合に音声信号が検出されたとして１
を出力端子３３を介して外部に出力する。もちろ
ん小さい場合には０を出力するがこうすることに
より前述のハングオーバーも付加されることにな
る。第３図における電力計算回路２４及び第１の閾
値発生回路２５としては第５図の回路が使用で
き、絶対値信号入力端子５０、雑音判定レベル入
力端子５１、比較回路出力信号入力端子５２、比
較器５３、論理和回路５４、乗算器５５，５６，
５７，５８、被乗数入力端子５９，６０，６１，
６２，６３、被乗数選択器６４、加算器６５、リ
ミツター６６，６７、メモリー６８、第１の閾値
出力端子６９および第２の閾値出力端子７０から
構成されている。絶対値入力信号は入力端子５０
より入力され、一方は乗算器５５へ、もう一方は
比較器５３へ送られる。比較器５３では、前記入
力信号と入力端子５１より入力される雑音判定レ
ベルと比較され、前者が後者よりも大きい場合に
０、小さい場合に＋１を出力し、論理和回路５４
では、比較器５３の出力信号と、比較回路２８か
らの出力信号を反転した信号との論理和がとら
れ、少なくともどちらか一方が＋１のときに＋１
が出力され、乗算器５６の制御信号及び、被乗数
選択器６４の選択制御信号となる。前記被乗数選
択器６４では、前記選択制御信号が＋１の時には
被乗数入力端子５９より入力される被乗数が選択
され、また、０の時には被乗数入力端子６０より
入力される被乗数（現在は０を用いている。）が
選択され乗算器５５の被乗数となる。また、乗算器５５では、絶対値入力信号と前述
のようにして選択された被乗数との積がとられ加
算器６５へ送られる。一方、乗算器５６では被乗
数入力端子６１より入力される被乗数とメモリー
６８の内容との積がとられ加算器６５へ送られ
る。但し、論理和回路５４の出力が０の時はこの乗
算は行なわずメモリー６８の内容がそのまま出力
される。そして、加算器６５で前述の乗算器５５
の出力と乗算器５６の出力との加算が行なわれそ
の結果がリミツター６６を介してメモリー６８に
備えられる。また、それと同時にリミツター６６
の出力は乗算器５７により、被乗数入力端子６２
より入力される被乗数との積がとられ、雑音の実
効値（σ）と等しくなり、リミツター６７を介し
て第１の閾値（TH1）として、出力端子６９よ
り出力される。また、リミツター６７の出力は、乗算器５８で
被乗数入力端子６３より入力される被乗数（現在
は２を用いている。）との積がとられ、第２の閾
値（TH2）として、出力端子７０より出力され
る。ここで、リミツター６６，６７を用いているの
はメモリー６８の内容及び閾値（TH1）の可変
領域を制限することにより閾値調整速度を敏速に
かつ、音声検出器の受信感度および感動レベル範
囲を制限し、雑音に対する免疫性を保証するため
である。尚、電力計算回路２４は前述の様に絶対値信号
を一次の低域通過フイルタに通すことにより、雑
音のレベルを算出していたが、それは振幅分布が
Gauss分布であり分散がσ²である雑音をその絶対
値をとつて一次の低域通過フイルタに通して得ら
れる電力Ｐが次式で表す様に近似的に標準偏差
（実効値とも云いσで表す５に比例した値となる
ためである。ここで∫^∞ ₀ｘ exp｛−x²／（2σ²）｝dx （x²＝ｙとおくと、2xdx＝dy） ∫^∞ ₀１／２exp｛−ｙ／（2σ²）｝dy ＝σ² ゆえに(1)式は次のようになる。従つて、前記処理を施す事により、一次の低域
通過フイルタの出力で雑音振幅の標準偏差σにほ
ぼ比例した値が得られ、その値を定数倍すれば雑
音振幅の標準偏差（実効値σ）が得られることに
なる。第３図で用いられる可逆カウンタ３０、カウン
タ設定回路３１及び判定回路３２としては、第６
図の回路が使用でき、入力端子７１、１サンプル
遅延回路７２、論理積回路７３、カウンタ設定値
入力端子７４、可逆カウンタ７５、比較回路７
６、閾値入力端子７７及び出力端子７８から構成
されており、破線で囲まれた３０，３１，３２は
それぞれ第３図で示す可逆カウンタ、カウンタ設
定回路、判定回路を示している。入力端子７１よ
り入力された入力信号は一方は可逆カウンタ７５
へ、もう一方は１サンプル遅延回路７２及び論理
積回路７３へ送られる。論理積回路７３では現入力信号を反転した信号
と１サンプル時刻前の入力信号との積がとられ、
その結果を可逆カウンタ７５へ送出する。可逆カ
ウンタ７５では入力信号が１の時にはカウンタの
内容を１だけ増加させ、また、０の時にはカウン
タの内容を１だけ減少させるとともに、前記論理
積回路の出力が１の時、すなわち、入力信号が１
から０に変化する時に、カウンタの内容を強制的
にカウンタ設定値入力端子７４から入力される予
め定められた値に設定される。比較器７６では閾
値入力端子７７より入力される閾値（実際は０を
使用している。）と前記可逆カウンタ７５から出
力されるカウンタの内容とを比較し、カウンタの
内容が大きい場合に１を出力端子７８を介して外
部に出力する。また第３図の如く第２の閾値発生回路２９を設
け比較回路２８で使用される第３の閾値（TH3）
として、判定回路３２の出力を選択信号とし、該
選択信号が１の時には低いレベルの第３の閾値
（TH3L）を、０の時には高いレベルの第３の閾
値（TH3H）を選択し使用しているが、これは
比較回路２８の出力にヒステリシスを設けること
により音声検出器の過剰なON−OFFを避けるた
めであり、以下に述べる様に語中脱落や語尾切断
が減少する。例えば、第７図ａで示す音声信号（波形１３０
および１３１を含む。）が入力された場合を考え
る。波形１３０が到来し、レベル検出回路２６に
おいて第１および第２の閾値と比較されその結
果、第７図ｂで示す波形１３２の様な出力が累積
回路２７の出力として得られたとすると、比較回
路２８では、累積回路２７の出力波形１３２と変
化波形１３４で示す第３の閾値とが比較される。第７図ｂで示す様に、時刻T₁までは第３の閾
値の変化波形１３４の方が累積回路２７の出力波
形１３２よりも大きいので音声信号は検出されな
いが時刻T₁で後者の方が大きくなるので比較回
路２８の出力波形１３５は１となり、出力保持用
可逆カウンタ３０の内容波形１３７も増加し始め
る。従つて出力端子３３の出力波形１３８も１と
なり音声信号が検出されたことになる。ところで、時刻T₁までは出力端子３３の出力
が０であるので第３の閾値としては高いレベルの
第３の閾値（TH₃H）が選択されていたが、時刻
T₁以後は出力端子３３の出力が１となるので、
第７図ｂの変化波形１３４で示す様に低いレベル
の第３の閾値（TH₃L）が選択される。その後、第７図ａに示す入力波形１３０の振幅
が小さくなり、時刻T₂で累積回路２７の出力波
形１３２が第３の閾値の変化波形１３４より小さ
くなると、比較回路２８の出力波形１３５は第７
図ｃに示す様に０になる。しかしながら、可逆カ
ウンタ３０の内容波形１３７は第７図ｃで示す様
に、すぐには０にならずデータ設定回路３１によ
りハングオーバー用のデータがセツトされるので
第７図ｄに示した端子３３の出力波形にはハング
オーバーが付加される。そこで、ハングオーバーが付加されている時に
入力端子２０に第７図ａで示す波形１３１が到来
すると、第７図ｂで示す様に時刻T₃で累積回路
２７の出力波形１３３が第２の閾値発生回路２９
の出力する第３の閾値を示す変化波形１３４より
大きくなるので、比較回路の２８の出力波形１３
６は、第７図ｃで示す様に１となり、出力保持用
可逆カウンタ３０の内容１３７は再び増加してゆ
く。そして、第７図ｂで示す様に累積回路２７の出
力波形１３３が時刻T₄で第３の閾値を示す変化
波形１３４よりも小さくなると比較回路２８の出
力波形１３６は第７図ｃで示す様に０になり、前
述の様に可逆カウンタ３０にハングオーバー用の
データがセツトされハングオーバーが付加される
ことになる。そして、時刻T₅になるとハングオーバーが終
了するので、出力端子３３の出力波形１３８（第
７図ｄ）が０となり、第３の閾値は再び高いレベ
ルの第３の閾値が選択されることになる。このように、第３の閾値を出力端子３３の出力
によつて高いレベルの第３の閾値（TH3H）と
低いレベルの第３の閾値（TH3L）とを使い分け
ることにより、有音区間（出力端子３３の出力が
１の時）における低いレベルの音声信号（第７図
ａで示す波形１３１）をも検出できる様になり、
語中脱落及び語尾欠落が低減される。また本発明ではTH3L選択回路３４は第８図の
グラフで示す様に、第１の閾値発生回路２５より
出力される第２の閾値（TH2）を入力し、TH2
に応じて予め定められた値を選択しTH3Lとして
第２の閾値発生回路２９へ出力する。それ故、閾値TH2が入力信号に含まれる雑音
レベルに応じて決定される値であるので、TH3L
も入力信号に含まれる雑音レベルに応じて決定さ
れる値となる。尚、第８図で示す様に第２の閾値（TH2）が
小さい程TH3Lとして小さな値を用いている。こ
れは、ハングオーバー付加時において、雑音レベ
ルが小さい程累積値の時間平均値が小さな値とな
るため累積値と第３の閾値（TH3L）との距離を
一定に保つためである。そして、雑音レベルに依存せずこの距離を一定
に保つことにより一定の検出能力が保証されるの
で動作時間率も雑音レベルに依らず一定の値が得
られる。それ故、第３の閾値を有音区間と無音区間とで
使いわけ、有音区間で使用する第３の閾値
（TH3L）を第８図で示す様に雑音レベルに応じ
て可変とすれば、ハングオーバー付加時に生じる
雑音による誤検出の低減が計れ、また、語中脱落
や語尾欠落も低減でき、雑音レベルに依存せず一
定の良好な動作時間率を有する音声検出器が実現
できる。また、本実施例では有音区間で用いる第３の閾
値（TH3L）の選択回路３４の出力信号として、
閾値TH2を用いたが、閾値TH₁や電力計算回路
２４の出力に変更しても、雑音レベルに応じて変
動するTH3Lが得られ、本発明と同じ効果が得ら
れるので本発明に含まれる。尚、本発明で用いる選択回路３４としては、
ROM（Ｒead Ｏnly Ｍemory）だけで構成で
きるので、簡単なハードウエアの追加だけで良
い。さらに、無音区間で用いるTH3HもTH3Lの１
部と考えれば、選択回路３４と第２の閾値発生回
路２９とを１つ選択回路（例えばROM等の使用
により）で実現できるので、ハードウエアの追加
もほとんどいらない。（発明の効果）以上の様に本発明の可変閾値型音声検出回路で
は、PCM符号で信号処理を行なつていることに
より、ハードウエア規模が増大しないこと、雑音
レベルに応じて第１及び第２の閾値を変動させて
いるため雑音に対する免疫性が強いこと、前記閾
値の最大値・最小値を規定することにより受信感
度や感動レベル範囲を任意に設定できること、及
び有音区間と無音区間とで異なるレベルの第３の
閾値を使用していることにより、雑音レベルに依
存せず一定の良好な音声検出特性を実現できる。 DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a voice detection method and circuit for determining the presence or absence of a voice signal. (Prior art and its problems) Voice detection circuits are mainly DSI (Digital Speech
It is an abbreviation for Interpolation. ) is incorporated into the device and is used to determine whether an audio signal is present on the input channel to the DSI device. Regarding DSI equipment, for example, see the document "COMSAT TECHNICAL REVIEW" vol.6 No. published in March 1976.
1, pp. 127-158, "Digital Speech Interpolation", please refer to the paper by SJ Campanella. Conventionally, the level detection method is known as a method with simple hardware scale and clear detection logic, but this method detects the signal energy (power and amplitude) of the input signal and then compares it with a threshold. This is to determine the presence or absence of a signal. Furthermore, among voice detectors that use the level detection method, fixed-threshold voice detectors that compare the amplitude of the input signal with a predetermined threshold are the simplest and most reliable voice detectors in terms of hardware scale. known as. Next, the principle of this fixed threshold type voice detector will be explained with reference to the drawings. FIG. 1 is a block diagram showing the principle of a fixed-threshold voice detector, in which there is a signal input terminal 1, an amplitude threshold input terminal 2, an amplitude comparison circuit 3, an accumulation circuit 4, and input signals of +1 and -1 to the accumulation circuit. increase/decrease control lines 5 and 6, flip-flop 7 for voice detection, flip-flop set/reset control lines 8 and 9 for voice detection, voice detection result output terminal 1
Starting from 0. In this case, the accumulation circuit can be replaced with a reversible counter (up-down counter). In the figure, the input signal input from terminal 1 is sent to the amplitude comparator circuit 3 every sampling period, and the input signal is input from terminal 2 to a predetermined amplitude threshold value (THa).
compared to As a result, if the input signal amplitude is greater than the amplitude threshold, the contents of the accumulator circuit 4 are incremented by one using the accumulator increment control line 5. Also,
Conversely, if the input signal amplitude is less than the amplitude threshold, the contents of the accumulator circuit 4 are decremented by one using the accumulator decrement control line 6. However, the contents of the accumulator circuit are designed not to take a negative value. As audio signals arrive and more inputs exceed the amplitude threshold, the contents of the accumulator circuit increase sequentially. Of course, if an input below the amplitude threshold is added during that time,
The contents of the accumulator circuit are decreased by one. In this way, when the contents of the accumulator circuit reach the preset duration threshold (THt), the voice detection flip-flop 7 is set using the voice detection flip-flop set control line 8, and the voice is detected. The result is output from the terminal 10. Also, when no voice is detected, which is indicated by the content of the accumulator circuit 4 becoming 0, for example, the voice detection flip-flop 7 is reset using the voice detection flip-flop reset control line 9; The result is output from terminal 10, but is generally reset after a certain period of time. This is called a hangover, and is created because the ear is sensitive to disconnections between words or phrases during a call.
It is about 250ms. For further understanding, an explanation will be given by taking as an example a case where a signal indicated by 11 in a of FIG. 2 is input to the fixed threshold type voice detector shown in FIG. 1. FIG. 2 shows an input signal 11, an amplitude threshold 12, an accumulation circuit content 13, a duration threshold 14, and a voice detection result output 15. First, when the input signal 11 is input from the terminal 1, it is compared with the amplitude threshold value 12 by the amplitude comparison circuit 3 every sampling period Ts. As can be seen from Figure 2, the amplitude of the input signal becomes larger than the amplitude threshold only at time t _a1 , so the contents of the accumulation circuit 1
3 starts to become 1 at time t _a1 (Fig. 2b), and thereafter increases by 1 until time t _a2 . As a result, at time _tb1 , the content 13 of the accumulator circuit becomes larger than the duration threshold 14, which means that voice has been detected, and the output 15 becomes 1. By the way, at time t _a3 , the amplitude of the input signal 11 becomes smaller than the amplitude threshold 12, so the content 13 of the accumulation circuit decreases by 1 until time t _b2 ,
Since the duration becomes smaller than the threshold value 14, it is determined that the audio signal has disappeared, and a hangover is added for the above-mentioned reason, and after the hangover ends, the output is 1.
5 becomes 0. T _H in c of FIG. 2 indicates the hangover time. Although the fixed-threshold sound detector described above does have a simple hardware scale, it has the drawback that once the threshold is set, it will detect even noise as long as it is at a level above the threshold. (Object of the Invention) The object of the present invention is to provide a variable threshold type that has a threshold that changes depending on the noise power contained in the input signal, reduces the frequency of false detection of noise, and improves the voice detection ability. The purpose of the present invention is to provide a voice detector. (Structure of the Invention) According to the present invention, the magnitude determination results between the audio signal input at each sample time and the first and second thresholds that vary depending on the silent section noise level of the audio signal are accumulated, The signal level of the input signal is determined by comparing the cumulative value with a third threshold value, and when the determination result changes from a high signal level to a low signal level, a predetermined time period is added to the determination result. In the voice detection method, a predetermined high-level threshold and a low-level threshold determined according to the noise level are prepared as the third threshold, and the voice detection output is determined as a voice. A low level threshold is used to notify the presence of a signal, and a high level threshold is used to perform voice detection when the voice detection output indicates no voice signal, and even in the added sound section, the low level is determined depending on the noise. A voice detection method is obtained which is characterized by using a threshold value of . Further, according to the present invention, there is provided a noise power calculation circuit that calculates a silent section noise level of an input signal inputted at each sample time, and a first and second threshold value that fluctuates according to the output of the noise power calculation circuit. a first threshold generation circuit that generates a first threshold, a level detection circuit that determines the magnitude of the input signal and the first and second thresholds and outputs the result, and an accumulation circuit that accumulates the output of the level detection circuit; The second threshold output from the first threshold generation circuit is input, and one of a plurality of predetermined low-level thresholds is selected and output according to the level of the second threshold. A selection circuit, a threshold value outputted from the selection circuit, and a predetermined high level threshold value are inputted, and when the output of an output holding circuit, which will be described later, indicates the presence of an audio signal, the low level outputted from the selection circuit is inputted. a second threshold value, and when the output of the output holding circuit indicates that there is no audio signal, the predetermined high level threshold value is selected and outputted as a third threshold value;
a threshold generation circuit; a determination circuit that determines the presence or absence of an audio signal by comparing the output of the accumulation circuit with a third threshold output from the second threshold generation circuit; an output holding circuit that adds a predetermined time as a sound interval to the output of the determination circuit when the sound signal changes from presence to absence; A voice detection circuit characterized in that the third threshold value used in the sound interval is varied according to the noise level is obtained. (Embodiment) The present invention has the above-described configuration, and adaptively changes the amplitude thresholds (first and second thresholds) and the third threshold (TH3L) used in the sound section according to the noise level. This improves voice detection ability without increasing false detections due to noise. The present invention will be explained in detail with reference to the drawings.
FIG. 3 shows an embodiment of the present invention, in which the input terminal 2
0, even bit inversion circuit 21, code conversion circuit 2
2. Rectifier circuit 23, power calculation circuit 24, first threshold generation circuit 25, level detection circuit 26, accumulation circuit 27, comparison circuit 28, second threshold generation circuit 29,
It is composed of a reversible counter 30, a counter setting circuit 31, a determination circuit 32, an output terminal 33, and a selection circuit 34. For example, the International Telegraph and Telephone Consultative Committee, (CCITT;
Comite´ C onsultatif I nternational T ´
elegraphaique et T´ele´phonique ), and is non-linearly encoded based on recommendation G.711 from
See pp409-410. ) is input from the input terminal 20 as an example. A-Law, which is usually transmitted over telephone lines.
Since the even-numbered bits of the code signal are inverted when viewed from the MSB (abbreviation for Most Significant Bit ) side, the even-numbered bits of the input signal are inverted by the even-numbered bit inversion circuit 21, and the input signal is converted into a signal before being transmitted. The original signal is returned. The returned A-Law code signal is sent to the code conversion circuit 22, and as shown in FIG. 4, for the positive A-Law code signal, only the MSB,
For negative A-Law code signals, all bits are inverted and an 8-bit two's complement number is used.
It is converted into a code signal and input to the rectifier circuit 23. The rectifier circuit 23 converts this input signal into an absolute value signal (signal representing only the magnitude), one of which is sent to the power calculation circuit 24 and the other to the level detection circuit 26. The power calculation circuit 24 extracts the noise contained in the input signal and calculates the effective value of the noise. Specifically, when no audio is detected (e.g.
(When the output of the comparison circuit 28 described below is 0), all input signals are considered as noise, and even when voice is detected (for example, when the output of the comparison circuit 28 described later is 1), the predetermined A signal below this level is considered to be noise, and this noise is input to a low-pass filter to calculate the effective value of the noise, and the result is sent to the first threshold generation circuit 25. Therefore, the audio signals that are excluded when calculating the effective value of noise are those signals in which the output of a comparison circuit, which will be described later, is 1 and which has a signal level equal to or higher than a predetermined level. First threshold generation circuit 25
Now, by multiplying the output from the power calculation circuit 24 by a constant, the first threshold (TH1) used in the level detection circuit 26 and the second threshold (TH2) are set at a location 6 dB higher than the first threshold. is set and sent to the level detection circuit 26. The level detection circuit 26 uses the output of the rectification circuit 23 and the first
The threshold value and the second threshold value are compared, and since there is a high probability that the input signal is an audio signal where the output of the rectifier circuit is larger than the second threshold value, +2 is added between the first threshold value and the second threshold value. , the probability that the input signal is a voice signal and the probability that it is noise are approximately equal, or the former is slightly higher, so
If it is smaller than the first threshold, there is a high probability that the input signal is noise, so -1 is output. Accumulation circuit 2
7, the output of the level detection circuit 26 is accumulated and the accumulated value is sent to the comparison circuit 28. The comparison circuit 28 compares a third threshold (TH3) output from a second threshold generation circuit 29, which will be described later, with the cumulative value, and if the latter is larger than the former, the input signal is an audio signal. It is determined that +1 is given, and
If the former is larger than the latter, the input is determined to be noise and 0 is output. The selection circuit 34 selects and outputs one of a plurality of predetermined low-level thresholds according to the level of the second threshold (TH2) output from the first threshold generation circuit described above. do. The second threshold generation circuit 29 generates a low level threshold (TH3L) output from the selection circuit and a predetermined high level threshold (TH3H).
is input, and when the output of the determination circuit 32 described later is 0 (when there is no sound), the predetermined high level threshold is selected, and the determination circuit 32 described later
When the output is 1 (when there is a sound), the low level threshold outputted from the selection circuit is selected and outputted as the third threshold used by the comparison circuit 28. The reversible counter 30 inputs the output of the comparison circuit 28, and when the input signal is 1, the contents of the counter are increased by 1, and when the input signal is 0, the contents of the counter are decreased by 1, and the output of the comparison circuit is accumulated. ing. Further, the counter setting circuit 31 monitors the output of the comparison circuit, detects the point in time when the output changes from 1 to 0, and sets the contents of the reversible counter 30 to a predetermined value at that point. . In the determination circuit 32, the contents of the reversible counter 30 are set to a predetermined value (usually 0 is used).
1 if an audio signal is detected.
is output to the outside via the output terminal 33. Of course, if it is small, 0 is output, but doing so will also result in the above-mentioned hangover. The circuit shown in FIG. 5 can be used as the power calculation circuit 24 and the first threshold generation circuit 25 in FIG. unit 53, OR circuit 54, multipliers 55, 56,
57, 58, multiplicand input terminals 59, 60, 61,
62, 63, a multiplicand selector 64, an adder 65, limiters 66, 67, a memory 68, a first threshold output terminal 69, and a second threshold output terminal 70. Absolute value input signal is input terminal 50
One is sent to the multiplier 55 and the other is sent to the comparator 53. The comparator 53 compares the input signal with the noise judgment level input from the input terminal 51, and outputs 0 if the former is larger than the latter, and +1 if it is smaller.
Then, the output signal of the comparator 53 and the inverted signal of the output signal from the comparison circuit 28 are logically summed, and when at least one of them is +1, the output signal is +1.
is output, and serves as a control signal for the multiplier 56 and a selection control signal for the multiplicand selector 64. In the multiplicand selector 64, when the selection control signal is +1, the multiplicand input from the multiplicand input terminal 59 is selected, and when it is 0, the multiplicand input from the multiplicand input terminal 60 is selected (currently 0 is used). ) is selected and becomes the multiplicand of the multiplier 55. Further, in the multiplier 55, the product of the absolute value input signal and the multiplicand selected as described above is calculated and sent to the adder 65. On the other hand, multiplier 56 multiplies the multiplicand input from multiplicand input terminal 61 and the contents of memory 68 and sends the product to adder 65 . However, when the output of the OR circuit 54 is 0, this multiplication is not performed and the contents of the memory 68 are output as they are. Then, in the adder 65, the multiplier 55 described above
The output of the multiplier 56 is added to the output of the multiplier 56, and the result is stored in the memory 68 via the limiter 66. At the same time, limiter 66
The output of is sent to the multiplicand input terminal 62 by the multiplier 57.
The multiplicand inputted from the multiplicand is multiplied by the multiplicand, which becomes equal to the effective value (σ) of the noise, and is outputted from the output terminal 69 via the limiter 67 as the first threshold value (TH1). Further, the output of the limiter 67 is multiplied by the multiplicand (currently 2 is used) inputted from the multiplicand input terminal 63 in a multiplier 58, and the product is determined as a second threshold value (TH2) at the output terminal 70. It is output from Here, the limiters 66 and 67 are used to quickly adjust the threshold value by limiting the contents of the memory 68 and the variable range of the threshold value (TH1), and to limit the reception sensitivity and emotional level range of the audio detector. This is to ensure immunity against noise. Note that the power calculation circuit 24 calculates the noise level by passing the absolute value signal through a first-order low-pass filter as described above, but this is because the amplitude distribution is
The power P obtained by taking the absolute value of noise with a Gaussian distribution and a variance of σ ² and passing it through a first-order low-pass filter is approximately equal to the standard deviation (also called the effective value, with σ This is because the value is proportional to 5, which is expressed as 5. Here, ∫ ^∞ ₀ x exp{−x ² /(2σ ² )}dx (If x ² =y, 2xdx=dy) ∫ ^∞ ₀ 1/2exp{−y/(2σ ² )}dy = σ ² Therefore, equation (1) becomes as follows. Therefore, by performing the above processing, a value approximately proportional to the standard deviation σ of the noise amplitude can be obtained at the output of the first-order low-pass filter, and if this value is multiplied by a constant, the standard deviation of the noise amplitude (the effective value σ ) will be obtained. The reversible counter 30, counter setting circuit 31, and determination circuit 32 used in FIG.
The circuit shown in the figure can be used, including an input terminal 71, a one-sample delay circuit 72, an AND circuit 73, a counter setting value input terminal 74, a reversible counter 75, and a comparison circuit 7.
6, a threshold input terminal 77 and an output terminal 78, and 30, 31, and 32 surrounded by broken lines indicate a reversible counter, a counter setting circuit, and a determination circuit shown in FIG. 3, respectively. One side of the input signal input from the input terminal 71 is sent to the reversible counter 75.
and the other one is sent to a one-sample delay circuit 72 and an AND circuit 73. The AND circuit 73 multiplies the signal obtained by inverting the current input signal and the input signal one sample time ago.
The result is sent to the reversible counter 75. The reversible counter 75 increases the contents of the counter by 1 when the input signal is 1, and decreases the contents of the counter by 1 when the input signal is 0. 1
When the value changes from 0 to 0, the contents of the counter are forcibly set to a predetermined value input from the counter setting value input terminal 74. The comparator 76 compares the threshold value input from the threshold input terminal 77 (actually, 0 is used) with the content of the counter output from the reversible counter 75, and outputs 1 if the content of the counter is larger. It is output to the outside via the terminal 78. Further, as shown in FIG. 3, a second threshold generation circuit 29 is provided to generate a third threshold (TH3) used in the comparison circuit 28.
As such, the output of the determination circuit 32 is used as a selection signal, and when the selection signal is 1, a low level third threshold (TH3L) is selected, and when it is 0, a high level third threshold (TH3H) is selected and used. However, this is to avoid excessive ON/OFF of the speech detector by providing hysteresis in the output of the comparator circuit 28, and as will be described below, drop-offs and endings of words are reduced. For example, the audio signal (waveform 130
and 131. ) is input. If a waveform 130 arrives and is compared with the first and second threshold values in the level detection circuit 26, and as a result, an output like the waveform 132 shown in FIG. 7b is obtained as the output of the accumulation circuit 27, the comparison circuit At step 28, the output waveform 132 of the accumulation circuit 27 is compared with a third threshold value indicated by the changing waveform 134. As shown in FIG. 7b, until time T ₁ , the third threshold change waveform 134 is larger than the output waveform 132 of the accumulator 27, so no audio signal is detected, but at time T ₁ , the latter is larger. Since the value increases, the output waveform 135 of the comparison circuit 28 becomes 1, and the content waveform 137 of the reversible output holding counter 30 also begins to increase. Therefore, the output waveform 138 of the output terminal 33 also becomes 1, indicating that an audio signal has been detected. By the way, since the output of the output terminal 33 is 0 until time T ₁ , a high-level third threshold (TH ₃ H) is selected as the third threshold;
After T ₁ , the output of the output terminal 33 becomes 1, so
A third threshold (TH ₃ L) at a lower level is selected as shown by the changing waveform 134 in FIG. 7b. Thereafter, the amplitude of the input waveform 130 shown in _FIG . 7
It becomes 0 as shown in Figure c. However, as shown in FIG. 7c, the content waveform 137 of the reversible counter 30 does not immediately become 0, and data for hangover is set by the data setting circuit 31. A hangover is added to the output waveform. Therefore, when the waveform 131 shown in FIG. 7a arrives at the input terminal 20 when a hangover is added, the output waveform 133 of the accumulation circuit 27 reaches the second threshold value at time T ₃ as shown in FIG. 7b. Generation circuit 29
28 output waveform 13 of the comparator circuit is larger than the change waveform 134 indicating the third threshold outputted by
6 becomes 1 as shown in FIG. 7c, and the content 137 of the output holding reversible counter 30 increases again. Then, as shown in FIG. 7b, when the output waveform 133 of the accumulator circuit 27 becomes smaller than the changing waveform 134 indicating the third threshold at time _T4 , the output waveform 136 of the comparator circuit 28 becomes as shown in FIG. 7c. becomes 0, data for hangover is set in the reversible counter 30 as described above, and hangover is added. Then, at time _T5 , the hangover ends, so the output waveform 138 of the output terminal 33 (FIG. 7d) becomes 0, and the third threshold, which is at a high level, is selected again. Become. In this way, by selectively using the third threshold value as a high level third threshold value (TH3H) and a low level third threshold value (TH3L) depending on the output of the output terminal 33, the sound interval (output terminal When the output of 33 is 1), it becomes possible to detect low level audio signals (waveform 131 shown in FIG. 7a).
Word dropouts and word endings are reduced. Further, in the present invention, the TH3L selection circuit 34 inputs the second threshold value (TH2) output from the first threshold value generation circuit 25, as shown in the graph of FIG.
A predetermined value is selected according to the threshold value and outputted to the second threshold generation circuit 29 as TH3L. Therefore, since the threshold TH2 is a value determined according to the noise level included in the input signal, TH3L
is also a value determined according to the noise level included in the input signal. Note that, as shown in FIG. 8, the smaller the second threshold value (TH2), the smaller the value used as TH3L. This is because when a hangover is added, the smaller the noise level is, the smaller the time average value of the cumulative value becomes, so the distance between the cumulative value and the third threshold (TH3L) is kept constant. By keeping this distance constant regardless of the noise level, a constant detection ability is guaranteed, and therefore a constant operating time rate can be obtained regardless of the noise level. Therefore, if the third threshold value is used differently in the sound section and the silent section, and the third threshold value (TH3L) used in the sound section is made variable according to the noise level as shown in Fig. 8, then It is possible to reduce erroneous detection due to noise that occurs when a hangover is added, and also to reduce dropouts in the middle of words and dropouts at the end of words, and it is possible to realize a speech detector that has a constant and good operating time rate independent of the noise level. In addition, in this embodiment, as the output signal of the selection circuit 34 of the third threshold value (TH3L) used in the sound section,
Although the threshold value TH2 is used, even if the threshold value _TH1 or the output of the power calculation circuit 24 is changed, a TH3L that varies depending on the noise level can be obtained, and the same effect as the present invention can be obtained, and therefore it is included in the present invention. Note that the selection circuit 34 used in the present invention includes:
Since it can be configured using only ROM ( Read Only Memory ), only a simple addition of hardware is required. Furthermore, TH3H used in the silent section is also 1 of TH3L.
Considering the selection circuit 34 and the second threshold generation circuit 29 as a single selection circuit (for example, by using a ROM or the like), almost no additional hardware is required. (Effects of the Invention) As described above, in the variable threshold type voice detection circuit of the present invention, by performing signal processing using PCM codes, the hardware size does not increase, and the first and second Since the threshold value of 2 is varied, immunity to noise is strong. By specifying the maximum and minimum values of the threshold value, the reception sensitivity and emotional level range can be set arbitrarily. By using the third threshold values at different levels, it is possible to achieve constant good voice detection characteristics independent of the noise level.

[Brief explanation of drawings]

第１図は従来の音声検出器を示すブロツク図、
第２図は第１図の各部の波形を示す図、第３図は
本発明の音声検出器を示すブロツク図、第４図は
符号変換方法を示す図、第５図、第６図は第３図
の１部の構成要素の詳細を示すブロツク図、第７
図は本発明の音声検出回路の動作を説明するため
の図、第８図は本発明における第２の閾値と第３
の閾値との関係を示す図である。図において、２０は入力端子、２１は偶数ビツ
ト反転回路、２２は符号変換回路、２３は整流回
路、２４は電力計算回路、２５は第１の閾値発生
回路、２６はレベル検出回路、２７は累積回路、
２８は比較回路、２９は第２の閾値発生回路、３
０は可逆カウンタ、３１はカウンタ設定回路、３
２は判定回路、３３は出力端子、３４は選択回路
である。 Figure 1 is a block diagram showing a conventional voice detector.
2 is a diagram showing the waveforms of each part in FIG. 1, FIG. 3 is a block diagram showing the speech detector of the present invention, FIG. 4 is a diagram showing the code conversion method, and FIGS. Block diagram showing details of a part of the components in Fig. 3, No. 7
The figure is a diagram for explaining the operation of the voice detection circuit of the present invention, and Figure 8 shows the second and third threshold values in the present invention.
FIG. In the figure, 20 is an input terminal, 21 is an even bit inversion circuit, 22 is a code conversion circuit, 23 is a rectifier circuit, 24 is a power calculation circuit, 25 is a first threshold generation circuit, 26 is a level detection circuit, and 27 is an accumulation circuit. circuit,
28 is a comparison circuit, 29 is a second threshold generation circuit, 3
0 is a reversible counter, 31 is a counter setting circuit, 3
2 is a determination circuit, 33 is an output terminal, and 34 is a selection circuit.

Claims

[Claims] 1. A first signal that varies depending on the audio signal input at each sample time and the noise level of the silent section of the audio signal.
and a second threshold value, the result of the magnitude determination is digitized and accumulated, and the cumulative value is compared with the third threshold value to determine the presence or absence of an audio signal, and when the determination result changes from the presence of an audio signal to the absence of an audio signal. In a voice detection method for obtaining a voice detection output by adding a predetermined time to the determination result as a sound interval, the third threshold is determined according to a high level predetermined threshold and a noise level. A low level threshold is prepared, and the low level threshold is used when the voice detection output indicates the presence of a voice signal, and the high level threshold is used when the voice detection output indicates the absence of a voice signal. A voice detection method characterized in that a low level threshold determined according to noise is used even in an added voiced section. 2. A noise power calculation circuit that calculates the noise level of the silent period specified by the output of the below-mentioned determination circuit of the input signal input at each sample time, and a first threshold generation circuit that generates a second threshold, a level detection circuit that determines the magnitude of the input signal and the first and second thresholds, converts the result into a numerical value, and outputs the result; and an output of the level detection circuit. an accumulation circuit that accumulates the threshold value, and the second threshold value outputted from the first threshold value generation circuit, and selects one from among a plurality of predetermined low-level threshold values according to the second threshold level. A selection circuit that selects and outputs one, a threshold output from the selection circuit and a predetermined high level threshold are input, and when the output of an output holding circuit, which will be described later, indicates the presence of an audio signal, the selection circuit outputs an audio signal. a second threshold generation circuit that selects a low level threshold to be set, and selects the predetermined high level threshold and outputs it as a third threshold when the output of the output holding circuit indicates that there is no audio signal; a determination circuit that determines the presence or absence of an audio signal by comparing the output of the accumulation circuit and a third threshold output from the second threshold generation circuit; and an output of the determination circuit that changes from presence of audio signal to absence of audio signal. an output holding circuit that adds a predetermined time as a sound interval to the output of the determination circuit when the change occurs; A voice detection circuit characterized in that a threshold value of No. 3 is varied according to a noise level.