JPH04295895A

JPH04295895A - voice recognition device

Info

Publication number: JPH04295895A
Application number: JP3061623A
Authority: JP
Inventors: Takeshi Norimatsu; 武志則松
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-03-26
Filing date: 1991-03-26
Publication date: 1992-10-20

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that derives a recognition result by pattern matching between the feature pattern of input speech and the feature patterns of pre-registered speech to be recognized.

[0002]

[Prior Art] In conventional word speech recognition devices, the similarity between two speech patterns is generally obtained by time-normalized matching between time series of spectral parameters. However, because this method does not take energy information into account, the similarity can be high even between patterns whose energy shapes differ. To solve this, a method has been proposed that uses a distance measure incorporating energy distance as well as spectral distance, so that differences in energy shape are reflected in the distance and the recognition rate improves (for example, Japanese Examined Patent Publication No. H2-41760).

[0003]

[Problem to be Solved by the Invention] However, although the above speech recognition device can control the range of the time-normalized matching path to some extent by using the energy distance, it does not guarantee that accurate pattern matching following the state of energy change is performed. As a result, the similarity can become high even between patterns with different energy shapes, causing misrecognition.

[0004] Furthermore, in recognizing words with similar phonemes, it is important to accurately compare the differences between corresponding steady parts and between corresponding transient parts of the speech. The steady and transient parts of speech can be considered to coincide approximately with the steady and transient parts of the energy change. Because steady parts make up a large proportion of the speech, conventional methods can distinguish their differences; transient parts, however, make up only a small proportion, so comparing the two patterns requires accurate matching. For example, for the personal names "SATO" and "KATO", the difference lies in the word-initial consonants "S" and "K", which roughly correspond to a transient part of the speech, and recognition performance cannot be expected to improve unless these are matched accurately. With conventional speech recognition devices, however, it is difficult to match steady and transient parts accurately, and differences in the transient parts are not readily reflected in the final similarity, which has been a cause of misrecognition, particularly between similar words.

[0005] The present invention solves the above conventional problems, and its object is to provide a speech recognition device that can greatly reduce, with simple control, misrecognition between words with different energy shapes and between words with similar phonemes.

[0006]

[Means for Solving the Problems] To solve the above problems, the speech recognition device of the present invention comprises: a speech analysis unit that extracts time series of spectral parameters and normalized logarithmic energy parameters of speech; an energy change degree determination unit that determines an energy change degree according to the transition pattern of the energy change between adjacent frames, namely rising, falling, or no change; and a pattern matching unit that restricts the matching path by arbitrarily defined correspondence rules based on the energy change degrees of the input pattern and the standard pattern.

[0007] Further, in the speech recognition device of the present invention, the pattern matching unit is configured to limit the matching range according to arbitrarily defined rules for association with the standard pattern, in accordance with the energy change degree between adjacent frames of the input speech pattern, and to execute time-normalized matching between the two patterns while increasing the weight of the matching path in portions where the energy change between adjacent frames is large.

[0008]

[Operation] With the above configuration, the present invention can perform matching in accordance with the state of energy change of the speech pattern, so that steady parts and transient parts of the speech are accurately associated with each other, providing a speech recognition device that can greatly reduce misrecognition between words with different energy shapes.

[0009] Further, by configuring the pattern matching unit to limit the matching range according to arbitrarily defined rules for association with the standard pattern, in accordance with the energy change degree between adjacent frames of the input speech pattern, and to execute time-normalized matching between the two patterns while increasing the weight of the matching path in portions where the energy change between adjacent frames is large, weight is added to the transient parts of the energy change in addition to the above operation. Differences in the transient parts of the speech are thereby reflected in the similarity between the two patterns, providing a speech recognition device that can suppress misrecognition, particularly between similar words.

[0010]

[Embodiment] A speech recognition device according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the speech recognition device according to one embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a speech analysis unit that extracts feature parameters of speech; it consists of a spectral parameter extraction unit 11, which extracts a time series of feature parameters of the speech spectrum from the input speech signal at fixed time intervals, and an energy parameter extraction unit 12, which extracts normalized values of the logarithmic energy of the speech signal. Reference numeral 2 denotes an energy change degree determination unit that calculates the energy change degree; it consists of an energy difference sign calculation unit 21, which judges the state of energy change from the energy difference value, and an energy change degree calculation unit 22, which calculates the degree of energy change from the transitions of the energy difference sign values at each point in time. Reference numeral 3 denotes an input pattern memory that stores the time series of feature parameters of the input speech, and 4 denotes a standard pattern memory that stores the time series of feature parameters of the speech to be recognized. Reference numeral 5 denotes a pattern matching unit that calculates the similarity between the input pattern and a standard pattern; it consists of a table 51 describing restrictions on the association between the input pattern and the standard pattern according to the energy change state, a control unit 52 that executes pattern matching while limiting the matching range according to the restrictions of table 51, and a weighting unit 53 that increases the weight of the matching path in portions where the energy change is large. Reference numeral 6 denotes a similarity comparison unit that selects, from the similarities between the input pattern and each standard pattern, the standard pattern with the maximum similarity.

[0011] Next, the operation of the speech recognition device in this embodiment is described in detail with reference to FIG. 1. A speech signal a input through a microphone or the like is converted by the spectral parameter extraction unit 11 into feature parameters such as linear prediction coefficients at fixed time intervals, for example by linear predictive analysis. The speech signal a is also sent to the energy parameter extraction unit 12, where a logarithmic energy value is calculated for each fixed time interval. This logarithmic energy value is further converted into a value normalized between the maximum and minimum energy values within the speech interval. This normalization is performed because the magnitude of the energy varies from utterance to utterance and from word to word, and it allows values such as energy changes to be handled uniformly. The time series of spectral parameters and energy parameters calculated by the spectral parameter extraction unit 11 and the energy parameter extraction unit 12 are stored in the input pattern memory 3.
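The energy normalization performed by the energy parameter extraction unit 12 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `normalized_log_energy`, the base-10 logarithm, the small offset `eps`, and the choice of scaling to the range 0 to 1 are all assumptions, since the text only states that the logarithmic energy is normalized between its maximum and minimum within the speech interval.

```python
import math


def normalized_log_energy(frames, eps=1e-10):
    """Return one normalized log-energy value per frame.

    frames: list of frames, each a list of samples (hypothetical input layout).
    eps:    small offset to avoid log(0) on silent frames (an assumption).
    """
    if not frames:
        return []
    # Log energy of each frame: log10 of the sum of squared samples.
    log_e = [math.log10(sum(x * x for x in frame) + eps) for frame in frames]
    lo, hi = min(log_e), max(log_e)
    if hi == lo:
        # Flat energy contour: no range to normalize over.
        return [0.0] * len(log_e)
    # Min-max scaling over the utterance (assumed target range 0..1).
    return [(e - lo) / (hi - lo) for e in log_e]
```

Scaling every utterance to the same range is what lets a single threshold on the frame-to-frame difference be used regardless of how loudly a word was spoken.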

[0012] Using the time series of normalized logarithmic energy values calculated by the energy parameter extraction unit 12, the energy difference sign calculation unit 21 determines the energy change pattern (rising, falling, or no change) for each frame. Here, let i denote an arbitrary frame of the input speech pattern, and let Ei denote the normalized logarithmic energy value of frame i.

[0013] First, the difference ΔEi between the energy values of frame i and the preceding frame (i-1) is obtained by the following equation.

[0014]

[Math 1] $\Delta E_i = E_i - E_{i-1}$

[0015] The sign of the difference ΔEi obtained from (Math 1), that is, + (plus) or - (minus), is determined; the value 1 is assigned to + and the value -1 to -. If the absolute value of the difference is smaller than an arbitrarily chosen constant a, that is, if the condition of the following expression is satisfied, the energy change is regarded as small and the value 0 is assigned.

[0016]

[Math 2] $|\Delta E_i| < a$
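The sign assignment described in paragraphs 0013 to 0016 can be sketched as follows. The function name `energy_change_signs` and the default threshold value are hypothetical; the text only says that the constant a is chosen arbitrarily. The first frame has no predecessor, so this sketch returns one value fewer than the number of frames, which is also an assumption about boundary handling.

```python
def energy_change_signs(energy, a=0.05):
    """Map a normalized log-energy series to change patterns in {-1, 0, 1}.

    energy: list of normalized log-energy values, one per frame.
    a:      threshold below which a change counts as "no change"
            (the default 0.05 is an illustrative value, not from the source).
    Returns a list whose k-th entry describes the change from frame k to k+1.
    """
    signs = []
    for i in range(1, len(energy)):
        delta = energy[i] - energy[i - 1]  # difference with the preceding frame
        if abs(delta) < a:
            signs.append(0)    # change too small: treated as no change
        elif delta > 0:
            signs.append(1)    # rising
        else:
            signs.append(-1)   # falling
    return signs
```

For example, `energy_change_signs([0.0, 0.5, 0.52, 0.1], a=0.05)` classifies the three transitions as rising, no change, and falling.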

[0017] Let Fi denote the change pattern (-1, 0, or 1) of the energy difference obtained here for an arbitrary frame i. Using this time series of Fi, the energy change degree calculation unit 22 calculates the energy change degree Si at each frame by the following equation.

[0018]

[Math 3]

[0019] This value is defined as a quantity representing the tendency of the energy change between an arbitrary frame i and the frames immediately before and after it. The time series of the change degree Si calculated in this way is stored in the input pattern memory 3.
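The equation for Si is not reproduced in this record, so the following sketch rests on a loudly stated assumption: that Si is the sum of the change pattern into frame i and the change pattern out of frame i, that is, Si = Fi + F(i+1). This is consistent with the text (Si summarizes the frame and its two neighbours, and the value 2 later denotes a clearly rising energy), but it is a guess and not the patent's confirmed formula. The name `energy_change_degree` and the treatment of the last frame are likewise hypothetical.

```python
def energy_change_degree(signs):
    """Compute a change degree S per frame from change patterns F.

    ASSUMPTION: S_i = F_i + F_(i+1); the source equation (Math 3) is missing.
    signs: list of change patterns F in {-1, 0, 1}.
    The last frame has no successor, so its missing term is taken as 0.
    """
    n = len(signs)
    degrees = []
    for i in range(n):
        following = signs[i + 1] if i + 1 < n else 0
        degrees.append(signs[i] + following)
    return degrees
```

Under this assumption S takes integer values from -2 to 2: 2 for two consecutive rises, -2 for two consecutive falls, and values near 0 for steady or turning portions.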

[0020] The spectral parameters and the time series of the energy change degree S of each standard pattern to be recognized are assumed to be stored in the standard pattern memory 4 in advance.

[0021] Next, the operation of the pattern matching unit 5 is described. Here, let j denote an arbitrary frame of the standard pattern being processed, and let Sj denote the energy change degree at frame j. Pattern matching between the input pattern and the standard pattern is executed by time-normalized matching based on dynamic programming (referred to as DP matching) subject to slope constraints such as those shown in FIG. 2. In FIG. 2, the numerical values on the arrows represent the weights applied when the matching path advances in the direction of each arrow. The recurrence formula is as follows.

[0022]

[Math 4]

[0023] Here, dij is the vector distance between the spectral feature patterns of frame i of the input pattern and frame j of the standard pattern, and gij is the corresponding cumulative distance.

[0024] Suppose the matching calculation is now being executed for frame i of the input pattern and frame j of the standard pattern. The control unit 52 first determines, from the value of the energy change degree Si of frame i of the input pattern, whether to calculate the recurrence formula (Math 4), in accordance with table 51, which defines association restrictions such as those shown in (Table 1).

[0025]

[Table 1]

[0026] For example, when Si is 2, according to (Table 1) the matching calculation (Math 4) is executed if Sj is 1 or 2 and is not executed otherwise. In this way, the pattern matching calculation is performed for each standard pattern according to the restrictions of table 51, and the similarity is calculated. FIG. 3 shows an example of the result of a matching calculation by the pattern matching unit 5 between two patterns with similar energy shapes. The hatched area in FIG. 3 represents the range in which the matching calculation can be executed. As is clear from FIG. 3, the corresponding transient parts and the corresponding steady parts of the energy changes of the input pattern and the standard pattern are optimally matched with each other. As a result, optimal matching between the transient parts and between the steady parts of the speech is achieved.

[0027] Further, when the pattern matching unit 5 executes pattern matching between two patterns with different energy shapes, the matching path cannot reach the end point, and at that stage the standard pattern can be excluded from the recognition candidates.
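The table-gated DP matching of paragraphs 0021 to 0027 can be sketched as follows. Several parts are assumptions, because the recurrence (Math 4), FIG. 2, and Table 1 are not reproduced in this record: the recurrence used here is the common symmetric form with step weights 1, 2, 1, swapped in for illustration; in the `ALLOWED` table only the row for Si = 2 (Sj must be 1 or 2) comes from the text, and every other row is hypothetical; the names `frame_distance` and `constrained_dp_distance` are invented for this sketch.

```python
import math

# Permitted Sj values for each Si. Only the Si = 2 row is stated in the text;
# the remaining rows are hypothetical placeholders.
ALLOWED = {
    2: {1, 2},
    1: {0, 1, 2},
    0: {-1, 0, 1},
    -1: {-2, -1, 0},
    -2: {-2, -1},
}


def frame_distance(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def constrained_dp_distance(inp, ref, s_in, s_ref, allowed=ALLOWED):
    """Return the normalized cumulative distance, or math.inf if no path exists."""
    n, m = len(inp), len(ref)
    g = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if s_ref[j] not in allowed.get(s_in[i], set()):
                continue  # table forbids pairing these frames: skip the cell
            d = frame_distance(inp[i], ref[j])
            if i == 0 and j == 0:
                g[i][j] = 2 * d
                continue
            best = math.inf
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # advance input only
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # advance reference only
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # advance both
            g[i][j] = best
    total = g[n - 1][m - 1]
    # n + m is the total path weight for these symmetric step weights.
    return total / (n + m) if total < math.inf else math.inf
```

A result of `math.inf` corresponds to the case in paragraph 0027 where the matching path cannot reach the end point, so the standard pattern can be dropped from the candidates without comparing its score.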

[0028] Between words with similar phonemes, the spectra often differ only in the transient parts of the speech. Reflecting this difference in the transient parts more strongly in the similarity obtained from pattern matching makes the two easier to distinguish.

[0029] Therefore, the weighting unit 53 changes the slope-constraint weights shown in FIG. 2 according to the values of the energy change degree S of the input pattern and the standard pattern, so that transient parts are weighted more heavily. One possible weighting is the method shown in FIG. 4. This is an example in which, when the energy change degree S is 2, that is, when the energy is clearly rising, twice the weight of FIG. 2 is applied. In FIG. 4, the numerical values on the arrows represent the weights applied when the matching path advances in the direction of each arrow; (a) shows the case Si = 2 and Sj = 2, (b) the case Si = 2 and Sj ≠ 2, (c) the case Si ≠ 2 and Sj = 2, and (d) the case Si ≠ 2 and Sj ≠ 2. While the slope-constraint weights are changed in this way according to the value of S, the control unit 52 executes the DP matching calculation according to the restrictions of table 51. As a result, differences in the transient parts of the energy are reflected more strongly in the overall similarity.
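The four weighting cases (a) to (d) can be sketched as a small function. The actual weight values of FIG. 2 and FIG. 4 are not reproduced in this record, so everything numeric here is an assumption: base weights of 1 for a step along either axis and 2 for a diagonal step, and a doubling of the weight along whichever axis has S = 2. The function name `step_weights` and its dictionary keys are hypothetical.

```python
def step_weights(s_i, s_j, emphasized=2, factor=2):
    """Return DP step weights for the cell (i, j).

    ASSUMPTION: base weights are 1 per axis step and 2 for a diagonal step,
    and an axis whose frame has S == emphasized gets its weight multiplied
    by `factor`. The real values are given by FIG. 2 and FIG. 4, which are
    not available here.
    """
    w_i = factor if s_i == emphasized else 1  # cost multiplier along the input axis
    w_j = factor if s_j == emphasized else 1  # cost multiplier along the reference axis
    return {
        "input_step": w_i,
        "reference_step": w_j,
        "diagonal_step": w_i + w_j,
    }
```

Under these assumptions case (a) yields exactly twice the base weights, which matches the statement that twice the weight of FIG. 2 is applied when S is 2; cases (b) and (c) are one plausible interpolation between (a) and (d).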

[0030] Finally, the similarity comparison unit 6 outputs, as the recognition candidate, the standard pattern that gives the maximum similarity among the similarities obtained for the standard patterns.
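The final selection by the similarity comparison unit 6 amounts to taking the maximum over the surviving candidates. A minimal sketch follows; the function name `best_candidate`, the dictionary layout, and the example scores are hypothetical.

```python
def best_candidate(similarities):
    """Return the label of the standard pattern with the highest similarity.

    similarities: dict mapping a standard-pattern label to its similarity.
    Patterns already excluded during matching are simply absent from the dict.
    Returns None when no candidate remains.
    """
    if not similarities:
        return None
    return max(similarities, key=similarities.get)
```

If the matcher produces distances rather than similarities, as in a cumulative-distance DP, the same selection would use `min` instead of `max`.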

[0031] As described above, according to this embodiment, the energy change degree determination unit 2 determines the degree of energy change as a numerical value from the tendency of the energy change between the frames before and after each frame, and time-normalized matching is executed according to the matching association rules of table 51, which are predetermined on the basis of this value. This achieves optimal matching between transient parts and between steady parts of the energy change, and reduces misrecognition between words with different energy shapes.

[0032] Further, by providing the pattern matching unit 5 with the weighting unit 53, which emphasizes the transient parts of the energy change, differences in the transient parts of the speech are strongly reflected in the overall similarity, and misrecognition between words with similar phonemes can be suppressed as far as possible.

[0033]

[Effects of the Invention] As described above, according to the present invention, the degree of energy change, such as rising or falling, is determined numerically from the tendency of the energy change before and after each frame of the speech pattern, and pattern matching is controlled, according to that degree of change, under restrictions defined in advance so that portions with similar change tendencies are matched with each other. This enables accurate matching between transient parts and between steady parts of the energy change, and greatly improves recognition performance.

[0034] Further, by weighting portions with large energy changes during pattern matching, differences in the transient parts of the speech are emphasized, providing a speech recognition device that can reduce misrecognition, particularly between words with similar phonemes.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing the slope constraints of pattern matching.

FIG. 3 is an explanatory diagram showing an example of the result of pattern matching in this embodiment.

FIG. 4 is an explanatory diagram showing an example of weighted slope constraints.

[Explanation of symbols]

1 Speech analysis unit
2 Energy change degree determination unit
3 Input pattern memory
4 Standard pattern memory
5 Pattern matching unit
6 Similarity comparison unit
11 Spectral parameter extraction unit
12 Energy parameter extraction unit
21 Energy difference sign calculation unit
22 Energy change degree calculation unit
51 Table
52 Control unit
53 Weighting unit

Claims

[Claims]

1. A speech recognition device comprising: a speech analysis unit that extracts a time series of speech feature spectra and normalized logarithmic energy values from a speech signal; an energy change degree determination unit that determines the degree of energy change between adjacent frames; and a pattern matching unit that executes time-normalized matching between two patterns while limiting the matching range according to rules for association with standard patterns.

2. The speech recognition device according to claim 1, wherein the energy change degree determination unit determines, for each frame, one of three energy change patterns (rising, falling, or no change) from the energy difference value with respect to an adjacent frame, and numerically determines the degree of energy change from the transition state of the change patterns.

3. The speech recognition device according to claim 1, wherein the pattern matching unit limits the matching range according to arbitrarily defined rules for association with a standard pattern, in accordance with the degree of energy change between adjacent frames of the input speech pattern, and executes time-normalized matching between the two patterns while increasing the weight of the matching path in portions where the degree of energy change between adjacent frames is large.

4. The speech recognition apparatus according to claim 2, wherein the energy change degree determination unit determines that there is no energy change when the absolute value of the energy difference value with an adjacent frame is smaller than a certain value.