JPH0146079B2


Info

Publication number: JPH0146079B2
Application number: JP57229278A
Authority: JP
Inventors: Yasuo Sato; Takayuki Fujimoto; Tadayasu Sugita
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-29
Filing date: 1982-12-29
Publication date: 1989-10-05
Also published as: JPS59123896A

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a speech recognition device, and in particular to one that analyzes the frequency content of input speech with a bank of bandpass filters and recognizes monosyllables, words, and the like. The device reduces the number of parameters in the feature parameter time series to be matched, and the amount of analysis hardware, without lowering the speech recognition rate.

(B) Technical Background and Problems

One known speech recognition method performs wideband analysis of speech frequencies with a multi-channel bank of bandpass filters. The output of each filter is converted to band-specific spectral power by rectification and integration, and these values are logarithmically transformed to obtain band-specific log spectral power. To normalize the spectrum, the band-specific log spectral powers are shifted so that the average over all channels is zero. All of the normalized band-specific log spectral powers are then used as the feature parameter time series for matching, and this series is matched against standard feature parameter time series registered in a dictionary in advance, for example by dynamic programming (DP) matching, to recognize monosyllables, words, and the like.
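The conventional normalization described above can be sketched for a single analysis frame as follows. This is a minimal illustration, not the circuit of any particular device: the function name `normalize_conventional`, the base-10 logarithm, and the sample power values are assumptions introduced here.

```python
import math

def normalize_conventional(band_powers):
    """Shift the log spectrum of one frame so its mean over all channels is zero."""
    # Band-specific log spectral power (base 10 is an assumption).
    log_powers = [math.log10(p) for p in band_powers]
    mean = sum(log_powers) / len(log_powers)
    # Every channel is kept, so the feature count equals the channel count.
    return [lp - mean for lp in log_powers]

# Hypothetical band powers for one frame, and the same frame spoken 100x louder.
frame = [4.0, 16.0, 1.0, 0.25]
louder = [p * 100.0 for p in frame]

features = normalize_conventional(frame)
features_louder = normalize_conventional(louder)
print(features)
print(features_louder)
```

Because a uniform gain adds the same constant to every log power, subtracting the mean removes it, so the two printed feature vectors agree up to rounding.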

In this method, raising the recognition rate requires more bandpass filters, that is, more channels. Increasing the number of channels, however, not only requires more hardware for analyzing the speech frequencies but also increases the number of elements in each feature parameter vector. More memory is then needed for matching, more storage is needed for the standard feature parameter time series held in the dictionary, and the computation time for matching also grows.

Conversely, reducing the number of channels reduces the required memory and related resources, but the speech recognition rate deteriorates.

Through extensive experiments and research carried out before completing the present invention, the inventors found the following property of speech recognition. Analysis of speech frequencies gives better results when it covers a wide band that includes the high-frequency region. In the high-frequency region in particular, however, what matters is the relative amount of speech energy in the power spectrum at each sampling instant. Whether the peak of that power spectrum lies at, say, 5 kHz or 7 kHz is not very important for recognition. This is probably because the human ear has difficulty perceiving small differences in frequency in the high-frequency band.

The inventors therefore normalized the parameters obtained from a bank of bandpass filters that included the high-frequency band, then removed the parameters of several high-band channels, and tested whether the recognition rate changed. The recognition rate did not fall compared with using the parameters of all channels, including the high band, as the feature parameter time series. On the other hand, when the high-frequency band was excluded from the normalization from the outset, the recognition rate did fall.

(C) Object and Configuration of the Invention

In view of the above, the present invention improves on the conventional method. Its object is to reduce the number of feature parameters to be matched without lowering the speech recognition rate, thereby reducing memory and related resources, and also to reduce the hardware needed for spectrum analysis. Put another way, the object is to raise the recognition rate further for the same number of feature parameters as before.

To this end, the speech recognition device of the present invention recognizes speech by matching feature parameter time series obtained by frequency analysis of the speech, and comprises: a spectrum analysis unit that performs spectrum analysis over a wide speech frequency band; and a spectrum normalization unit that has weighting means for weighting the high-frequency band portion of the analysis result according to its bandwidth and average value calculation means for calculating the average of the analysis results including the weighted result, and that normalizes the spectrum, excluding the high-frequency band portion, on the basis of that average. The device is characterized in that the parameter time series of the normalized spectrum with the high-frequency band portion removed is used as the feature parameter time series for matching. An embodiment is described below with reference to the drawing.

(D) Embodiment of the Invention

The figure shows the configuration of one embodiment of the present invention.

In the figure, 1 is a speech input unit, 2 a parameter extraction unit, 3 a spectrum analysis unit, 4-1 to 4-n bandpass filters, 5-1 to 5-n rectifiers, 6-1 to 6-n analog-to-digital converters, 7 a spectrum normalization unit, 8-1 to 8-n logarithmic conversion units, 9 a constant storage unit, 10 a multiplier, 11 an average value calculation unit, 12-1 to 12-(n-1) subtracters, 13 a speech recognition unit, and 14 a dictionary.

The analog speech signal of a monosyllable or word entered through speech input unit 1 is fed to parameter extraction unit 2. Parameter extraction unit 2 analyzes the frequency content of the analog speech signal and extracts the feature parameter time series of the input speech to be recognized. For this purpose it contains spectrum analysis unit 3, which performs spectrum analysis over a wide speech frequency band, and spectrum normalization unit 7, which normalizes the output of spectrum analysis unit 3 with a weight applied to the high-frequency band portion and outputs the normalized spectrum, excluding the high-frequency band portion, as the matching feature parameters $P_1, P_2, \dots, P_{n-1}$.

Spectrum analysis unit 3 has a plurality (n) of bandpass filters 4-1 to 4-n, one per band. In the figure, the passband frequency rises from bandpass filter 4-1 at the top toward the bottom. Bandpass filters 4-1 to 4-n are arranged, for example, so that the 3 dB attenuation points of adjacent filters coincide, and together they cover a wide band of, for example, 180 Hz to 7.8 kHz. Bandpass filters 4-1 to 4-(n-1) have bandwidths of, for example, about 170 Hz to 620 Hz, whereas bandpass filter 4-n, which covers the highest band, is given a wide passband of, for example, 3 kHz.

The speech signal from speech input unit 1 is filtered into bands by bandpass filters 4-1 to 4-n, and each band is fed to the corresponding rectifier 5-1 to 5-n. Each rectifier rectifies and smooths its input with a rectification and integration time constant of, for example, 10 ms. The outputs of rectifiers 5-1 to 5-n are fed to analog-to-digital converters 6-1 to 6-n, which yield the band-specific spectral power as digital values. The converted values are output to spectrum normalization unit 7.
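As a rough software analogue of one rectifier stage, the sketch below full-wave rectifies a band-limited signal and smooths it with a one-pole low-pass filter whose time constant is 10 ms. The full-wave choice, the one-pole filter, the 16 kHz sampling rate, and the 1 kHz test tone are all assumptions made for illustration; the text specifies only the time constant.

```python
import math

def rectify_and_smooth(samples, sample_rate_hz, time_constant_s=0.010):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    # Per-sample smoothing coefficient for the given time constant.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    level = 0.0
    envelope = []
    for x in samples:
        level += alpha * (abs(x) - level)  # move toward the rectified input
        envelope.append(level)
    return envelope

sample_rate_hz = 16000  # assumed sampling rate
# 100 ms of a unit-amplitude 1 kHz tone standing in for one filter output.
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate_hz) for n in range(1600)]
envelope = rectify_and_smooth(tone, sample_rate_hz)
print(round(envelope[-1], 3))
```

After several time constants the envelope settles near the mean of the rectified sine, 2/π, with a small residual ripple.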

The band-specific spectral power fed to spectrum normalization unit 7 is logarithmically transformed by logarithmic conversion units 8-1 to 8-n, so that the output value is proportional to loudness as perceived by humans, giving the band-specific log spectral power. The following conversion is then applied to the band-specific log spectral power so that the same feature parameters result whether the input speech is loud or soft.

First, multiplier 10 multiplies the output of logarithmic conversion unit 8-n, that is, the log spectral power of the highest band, by a weighting constant stored in advance in constant storage unit 9. As described above, bandpass filter 4-n has a wider bandwidth than the other bandpass filters 4-1 to 4-(n-1), so this single channel carries the weight of several channels. If the one channel of the highest band corresponds to the bandwidth of three lower-band channels, the weighting constant "3" is stored in constant storage unit 9, and multiplier 10 triples the output of logarithmic conversion unit 8-n.

Average value calculation unit 11 calculates the average of the band-specific log spectral powers with this weighting taken into account. For example, if the outputs of logarithmic conversion units 8-1 to 8-n are $P'_1, P'_2, \dots, P'_{n-1}, P'_n$ and the weighting constant is $\omega$, the average value $\bar{P}$ is computed over $P'_1, \dots, P'_{n-1}$ and the weighted term $\omega P'_n$.
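The expression for the average value does not survive in this text. One reconstruction consistent with the surrounding description, assuming the weighted sum is divided by the sum of the weights (n - 1 unit weights plus $\omega$), would be:

```latex
\bar{P} = \frac{\displaystyle\sum_{i=1}^{n-1} P'_i + \omega P'_n}{(n-1) + \omega}
```

Under this assumption, a 17-channel configuration with $\omega = 3$ divides by 19, which matches the divisor of a 19-channel average. The divisor is not confirmed by the text and should be read as an assumption.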

Subtracters 12-1 to 12-(n-1) are provided for logarithmic conversion units 8-1 to 8-(n-1). No subtracter is provided for logarithmic conversion unit 8-n: the band-specific log spectral power $P'_n$ is used only to calculate the average and is discarded afterwards. Subtracters 12-1 to 12-(n-1) subtract the output $\bar{P}$ of average value calculation unit 11 from each band-specific log spectral power $P'_1, P'_2, \dots, P'_{n-1}$. The outputs $P_i$ of subtracters 12-1 to 12-(n-1) are therefore as follows.

$$P_i = P'_i - \bar{P} \qquad (i = 1, 2, \dots, n-1)$$

The outputs $P_i$ of subtracters 12-1 to 12-(n-1) are sent to speech recognition unit 13 as the feature parameters for matching.
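The normalization performed by spectrum normalization unit 7 can be sketched for one frame as below. The function name `normalize_drop_high_band`, the base-10 logarithm, the sample values, and in particular the divisor of the weighted mean (sum of weights) are assumptions for illustration, since the text does not state the divisor.

```python
import math

def normalize_drop_high_band(band_powers, weight):
    """Return n-1 features: low-band log powers minus a weighted mean.

    The last entry of band_powers is the wide highest band. It contributes
    to the mean with the given weight and is then discarded.
    """
    log_powers = [math.log10(p) for p in band_powers]
    *low_bands, high_band = log_powers
    # Assumption: the weighted sum is divided by the sum of the weights.
    mean = (sum(low_bands) + weight * high_band) / (len(low_bands) + weight)
    return [lp - mean for lp in low_bands]

# Hypothetical frame: three narrow bands plus one wide high band.
frame = [4.0, 16.0, 1.0, 0.25]
features = normalize_drop_high_band(frame, weight=3.0)
print(features)  # three values; the high band itself is not output
```

The high band still influences every output through the mean, which is the point of the scheme: its energy is taken into account even though no feature is emitted for it.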

Speech recognition unit 13 recognizes the input speech by matching the feature parameter time series, each element of which is a set of (n-1) feature parameters, against the standard feature parameter time series registered in advance in dictionary 14, for example by DP matching. Put simply, the time axis is normalized, and the squared distance $(P_i - P'_i)^2$ between the $m$ input feature parameters $P_i$ and the standard feature parameters $P'_i$ at corresponding time points is summed from $i = 1$ to $i = m$ (here $P'_i$ denotes the dictionary value, not the log spectral power above). This sum is accumulated over the whole time series, and the monosyllable or word whose standard feature parameters give the smallest total is taken as the recognition result.
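The matching step can be illustrated with a basic dynamic time warping routine. The text names DP matching only as an example and gives no path constraints, so the symmetric three-way step used here, the helper names, and the two-entry dictionary with labels "ka" and "sa" are all hypothetical.

```python
def frame_distance(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dp_distance(input_seq, template_seq):
    """Accumulated distance along the best time alignment (basic DTW)."""
    inf = float("inf")
    rows, cols = len(input_seq), len(template_seq)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(input_seq[i - 1], template_seq[j - 1])
            # Assumed local path: step in input, in template, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]

def recognize(input_seq, dictionary):
    """Return the label whose template has the smallest accumulated distance."""
    return min(dictionary, key=lambda label: dp_distance(input_seq, dictionary[label]))

# Hypothetical templates with two features per frame.
dictionary = {
    "ka": [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]],
    "sa": [[1.0, 1.0], [0.0, 0.0]],
}
spoken = [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]
result = recognize(spoken, dictionary)
print(result)
```

The cost of each `frame_distance` call grows with the number of features per frame, which is why dropping the high-band channels reduces both template storage and matching time.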

To test the effect of the present invention, the inventors carried out the following experiment. First, 19 bandpass filters, channels 1 to 19, were prepared to cover the band from 180 Hz to 7.8 kHz. For channels 17, 18, and 19 in particular, the center frequencies were 5145 Hz, 5910 Hz, and 7020 Hz, the lower cutoff frequencies 4800 Hz, 5514 Hz, and 6334 Hz, the upper cutoff frequencies 5514 Hz, 6334 Hz, and 7800 Hz, and the bandwidths 714 Hz, 820 Hz, and 1466 Hz, respectively. Speech recognition was then performed by the conventional method, using as feature parameters the 19 normalized spectral powers obtained by normalizing the band-specific log spectral powers of all channels.

Next, the same bandpass filters as above were used for channels 1 to 16, and the filters for channels 17 to 19 were replaced by a single bandpass filter with a lower cutoff of 4800 Hz and an upper cutoff of 7.8 kHz. Spectrum analysis was performed with these 17 channels as described in the embodiment: the output of channel 17 was weighted with a weighting constant of "3", the average was calculated, and that average was subtracted from the 16 band-specific log spectra other than channel 17 to give the feature parameters. Recognition was performed by matching these 16 feature parameters against newly prepared standard feature parameters, each a set of 16 feature parameters. The recognition rate was comparable to that obtained with the 19 feature parameters above.
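The figures quoted for the experiment can be checked with a few lines of arithmetic. The variable names are introduced here for illustration, and the divisor of the weighted average is an assumption (sum of weights), not something the text states.

```python
# Band edges (Hz) for channels 17-19 as quoted in the experiment.
edges_hz = {17: (4800, 5514), 18: (5514, 6334), 19: (6334, 7800)}

# Bandwidth of each channel is upper cutoff minus lower cutoff.
bandwidths_hz = {ch: hi - lo for ch, (lo, hi) in edges_hz.items()}

# The single replacement filter spans the three original bands.
merged_band_hz = (
    min(lo for lo, _ in edges_hz.values()),
    max(hi for _, hi in edges_hz.values()),
)
merged_bandwidth_hz = merged_band_hz[1] - merged_band_hz[0]

features_before = 19        # conventional scheme: one feature per channel
features_after = 17 - 1     # 17 channels analyzed, merged high band dropped
weight = 3                  # weighting constant used in the experiment
assumed_divisor = features_after + weight  # assumption: sum of weights

print(bandwidths_hz, merged_bandwidth_hz, features_after, assumed_divisor)
```

The merged filter is 3000 Hz wide and stands in for three channels, which is consistent with the weighting constant of 3, while the feature count per frame drops from 19 to 16.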

Earlier experiments had already shown that when spectrum analysis is performed only for channels 1 to 16 from the outset, the recognition rate falls, because no information from the high-frequency region is taken into account.

The experiment was repeated with different frequency bands, and the same effect was obtained.

(E) Effects of the Invention

As described above, the present invention uses simple means to reduce the number of feature parameters to be matched and stored without lowering the speech recognition rate, which saves memory, arithmetic circuitry, and the like. Consolidating the high-frequency band portion also reduces the hardware needed for spectrum analysis. Furthermore, widening the analyzed frequency band makes it possible to raise the speech recognition rate.

[Brief Description of the Drawings]

The figure shows the configuration of one embodiment of the present invention. In the figure, 1 is a speech input unit, 2 a parameter extraction unit, 3 a spectrum analysis unit, 4-1 to 4-n bandpass filters, 5-1 to 5-n rectifiers, 6-1 to 6-n analog-to-digital converters, 7 a spectrum normalization unit, 8-1 to 8-n logarithmic conversion units, 9 a constant storage unit, 10 a multiplier, 11 an average value calculation unit, 12-1 to 12-(n-1) subtracters, 13 a speech recognition unit, and 14 a dictionary.

Claims

[Scope of Claims]

1. A speech recognition device that recognizes speech by matching feature parameter time series obtained by frequency analysis of the speech, comprising: a spectrum analysis unit that performs spectrum analysis over a wide speech frequency band; and a spectrum normalization unit that has weighting means for weighting the high-frequency band portion of the analysis result of the spectrum analysis unit according to its bandwidth and average value calculation means for calculating the average of the analysis results including the weighted result, and that normalizes the spectrum, excluding the high-frequency band portion, on the basis of that average; wherein the parameter time series of the normalized spectrum with the high-frequency band portion removed is used as the feature parameter time series for matching.

2. The speech recognition device according to claim 1, wherein the spectrum analysis unit uses multi-channel bandpass filters and converts the output of each filter into band-specific spectral power, and the spectrum normalization unit logarithmically transforms the band-specific spectral power to obtain band-specific log spectral power, weights the one channel of the highest frequency band to obtain a weighted average over all channels, and outputs, for every channel except that highest-band channel, the band-specific log spectral power minus the weighted average.