JPH10320370A

JPH10320370A - Pattern recognition method by integrating multiple discriminant functions

Info

Publication number: JPH10320370A
Application number: JP9127563A
Authority: JP
Inventors: Shuko Ueda; 修功上田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1997-05-16
Filing date: 1997-05-16
Publication date: 1998-12-04

Abstract

(57) [Abstract] (Amended) [Problem] To perform pattern recognition by integrating a plurality of discriminant functions. [Solution] M kinds of discriminant functions prepared in advance are each trained individually on N sets of training data, each set being a pair of a feature vector and the class to which it belongs, to form M kinds of level 0 discriminant functions. Next, one set of training data at a time is held out of the N sets. Each time a set is held out, the M kinds of discriminant functions are newly trained on the remaining N−1 sets to form M kinds of level 1 discriminant functions for each class. The KM-dimensional vector of outputs of the M kinds of level 1 discriminant functions for the held-out set, paired with the class label of the held-out data, forms one item of level 1 data, for N items in total. Finally, a new discriminant function is formed as a linear sum of the outputs of the trained discriminant functions.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a pattern recognition method that integrates a plurality of discriminant functions and thereby realizes nonparametric pattern recognition, that is, recognition that does not assume a probability distribution for the data.

[0002]

[Description of the Related Art] First, consider the problem of classifying a feature vector $x$ into one of K classes. A function $f^{(k)}(x)$, $k = 1, \dots, K$, is associated with each class, and the class for which $f^{(k)}(x)$ is largest is taken as the class of $x$. This approach is called the discriminant function method, and the functions it uses are called discriminant functions.

[0003] That is, class determination by discriminant functions can be written as

$$C(x) = \arg\max_{k \in \{1, \dots, K\}} f^{(k)}(x)$$

where $C(x)$ denotes the class of the feature vector $x$. Thus, one classifier consists of K discriminant functions.

[0004] In the discriminant function method, the discriminant functions are first given a parametric form. Their parameters are then estimated from training data supplied in advance, consisting of feature vectors and their class labels, so that the training data are recognized as correctly as possible. This parameter estimation is called the learning process of the discriminant functions.

[0005] The simplest discriminant function is the linear discriminant function, which is linear in the feature vector $x$:

$$f^{(k)}(x) = \Theta^{T} x \qquad (1)$$

where $\Theta$ is an unknown parameter vector and $T$ denotes vector transposition.
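A minimal Python sketch of Eq. (1) combined with the largest-value decision rule of [0002] follows. It assumes one parameter vector $\theta_k$ per class (Eq. (1) writes a single $\Theta$) and appends a constant 1.0 to $x$ as a bias input; `linear_scores`, `classify` and all numbers are hypothetical.

```python
from typing import List, Sequence


def linear_scores(thetas: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Discriminant values f^(k)(x) = theta_k^T x, one parameter vector per class."""
    return [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]


def classify(scores: Sequence[float]) -> int:
    """Discriminant function method: return the (0-based) index of the largest value."""
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical 3-class, 2-D example; the trailing 1.0 in x is a bias input.
thetas = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [-1.0, -1.0, 0.5]]
x = [0.2, 0.9, 1.0]
print(classify(linear_scores(thetas, x)))  # → 1
```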

[0006] In pattern recognition with discriminant functions, recognition performance depends on the model of the discriminant function, that is, on what kind of function is used as the discriminant function.

[0007] For example, for a classification problem whose class boundaries are linearly separable, a linear discriminant function, which produces linear boundaries, is appropriate. For complex class boundaries, a discriminant function with more degrees of freedom, matched to the degree of complexity, is desirable.

[0008]

[Problems to Be Solved by the Invention] Conventionally, however, a single optimal discriminant function model has been chosen according to some criterion. Such a single model does select a good discriminant function for classification problems whose class boundaries are of similar complexity. When simple and complex class boundaries are mixed in one classification problem, however, a discriminant function of complexity intermediate between the two must be chosen. This situation easily arises in applications with many classes, such as character recognition.

[0009] The present invention has been made in view of the above problem. Its object is to provide a pattern recognition method, based on the integration of a plurality of discriminant functions, that solves the model-selection problem arising in discriminant-function pattern recognition when a single model must be chosen.

[0010]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is a pattern recognition method using discriminant functions in which, for a pattern recognition problem of classifying a feature vector obtained as an observation of a pattern into one of K classes, K discriminant functions are prepared and the class with the largest discriminant function value is taken as the class of the data. The method comprises: a level 0 learning step of individually training each of M kinds of discriminant functions prepared in advance, using N sets of training data each consisting of a feature vector and the class to which it belongs, to form M kinds of level 0 discriminant functions; a level 1 data generating step of sequentially holding out one set of training data from the N sets and, each time a set is held out, newly training the M kinds of discriminant functions on the remaining N−1 sets to form M kinds of level 1 discriminant functions for each class, and then forming a total of N sets of level 1 data, each consisting of the KM-dimensional vector of output values of the M kinds of level 1 discriminant functions for the held-out set paired with the class label of the held-out data; and a discriminant function integrating step of using the level 1 data to form a new discriminant function as a linear sum of the outputs of the discriminant functions trained in the level 0 learning step. Pattern recognition is performed with the discriminant function formed in the discriminant function integrating step.

[0011] The invention according to claim 2 is the invention of claim 1 in which, in the discriminant function integrating step, the linear weights are determined so as to minimize the average, over the level 1 data, of a loss function defined as a function of the degree of classification error.

[0012]

[Embodiments of the Invention] The pattern recognition method of the present invention, which integrates a plurality of discriminant functions, is described below.

[0013] To achieve the above object, the invention of claim 1 considers an integrated discriminant function expressed as a linear combination of M kinds of discriminant functions. Let $f_{\mathrm{ens}}^{(k)}(x)$ denote the output of the integrated discriminant function of the k-th class for a feature vector $x$. The integrated discriminant function of each class is then defined, as shown below, as a linear combination of the MK outputs for $x$ of M kinds of given discriminant functions, each trained individually on the same training data $D = \{(x_i, C(x_i));\ i = 1, \dots, N\}$ (these are called level 0 discriminant functions).

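[0014]

$$f_{\mathrm{ens}}^{(k)}(x) = \sum_{m=1}^{M} \sum_{j=1}^{K} \alpha_{m,j}^{(k)} f_m^{(j)}(x; D) = \alpha^{(k)T} \hat{f}(x; D) \qquad (2)$$

where $\alpha^{(k)}$ is the MK-dimensional weight vector of class $k$ and

$$\hat{f}(x; D) = \left(f_1^{(1)}(x; D), \dots, f_1^{(K)}(x; D), \dots, f_M^{(1)}(x; D), \dots, f_M^{(K)}(x; D)\right)^{T}$$

is the MK-dimensional vector of the outputs of the MK level 0 discriminant functions for $x$. The classification rule of the ensemble classifier is given below.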

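[0015]

$$C(x) = \arg\max_{k} f_{\mathrm{ens}}^{(k)}(x) \qquad (3)$$

Writing Eq. (2) in matrix form gives

$$y(x) = W^{T} \hat{f}(x; D) \qquad (4)$$

where $y(x) = \left(f_{\mathrm{ens}}^{(1)}(x), \dots, f_{\mathrm{ens}}^{(K)}(x)\right)^{T}$ and $W = (\alpha^{(1)}, \dots, \alpha^{(K)})$ is the $KM \times K$ matrix of linear weights.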
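The integrated rule amounts to a matrix-vector product followed by an argmax. A minimal sketch, assuming the outputs of the M level 0 functions are concatenated model by model (all K outputs of model 1, then model 2, and so on) and that W is stored as its K columns $\alpha^{(k)}$; the function names, weights and outputs are hypothetical:

```python
from typing import List, Sequence


def stack_outputs(level0_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Build f_hat(x; D): concatenate the K outputs of each of the M level 0 functions."""
    return [v for outputs in level0_outputs for v in outputs]


def ensemble_scores(W: Sequence[Sequence[float]], f_hat: Sequence[float]) -> List[float]:
    """y(x) = W^T f_hat; W is stored as its K columns alpha^(k), each of length M*K."""
    return [sum(a * f for a, f in zip(alpha_k, f_hat)) for alpha_k in W]


def ensemble_classify(W: Sequence[Sequence[float]],
                      level0_outputs: Sequence[Sequence[float]]) -> int:
    """Pick the class with the largest integrated discriminant value (0-based)."""
    y = ensemble_scores(W, stack_outputs(level0_outputs))
    return max(range(len(y)), key=lambda k: y[k])


# Hypothetical M = 2 models, K = 2 classes, disagreeing on one input x.
outputs = [[0.9, 0.1],   # model 1: f_1^(1)(x), f_1^(2)(x)
           [0.4, 0.6]]   # model 2: f_2^(1)(x), f_2^(2)(x)
W_trust_model2 = [[0.0, 0.0, 1.0, 0.0],   # alpha^(1)
                  [0.0, 0.0, 0.0, 1.0]]   # alpha^(2)
W_average = [[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.5, 0.0, 0.5]]
print(ensemble_classify(W_trust_model2, outputs))  # → 1
print(ensemble_classify(W_average, outputs))       # → 0
```

Because each $\alpha^{(k)}$ has MK entries, the score for class k may also draw on the other classes' outputs of every model, which plain averaging or a single weight per model cannot express. The two toy weight matrices show that the learned W, not the level 0 outputs alone, decides the class.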

[0016] Next, the means of determining the linear weights $W$ according to claim 2 is described. First, the linear weights $W$, which enter Eq. (3) through $f_{\mathrm{ens}}^{(k)}$, define a linear map from the discriminant function space ($\hat{f}$ space) to K dimensions, $W: \mathbb{R}^{KM} \to \mathbb{R}^{K}$. Examining Eq. (4) closely, $f_{\mathrm{ens}}^{(k)}$ can be regarded as a linear discriminant function on the discriminant function space $\hat{f}$ with $\alpha^{(k)}$ as its parameter. That is, the linear weights $W$ in Eq. (4) correspond to the parameters of a linear discriminant function that classifies the N points $\hat{f}(x_1), \dots, \hat{f}(x_N)$ in the function space as faithfully as possible into their respective class labels $C(x_1), \dots, C(x_N)$. In other words, determining the optimal combination weights reduces to designing an optimal linear discriminant function in the discriminant function space.

[0017] Now, given linear weights $W$, let $g_k(\hat{f}(x); W)$ denote the loss incurred when a sample $x$ of the k-th class is misclassified by Eqs. (2) and (3), and let $F$ be a random variable following the distribution $p(\hat{f})$. The expected loss can then be written as

[Equation 5]

where $P_k$ is the prior probability of the k-th class. Hence the linear weights $W$ that are optimal in the sense of minimizing expected loss are obtained by solving the problem of minimizing $L$. In practice, however, the distribution of the random variable $F$ is unknown, so this expectation is approximated by an expectation under the empirical distribution of the training data:

[Equation 6]

Here, $\mathbf{1}(u)$ is the indicator function, which returns 1 when $u$ is true and 0 otherwise.
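The empirical approximation itself is not reproduced in this text. One plausible form, assuming the empirical distribution places mass $1/N$ on each training point (so that the priors $P_k$ are implicitly estimated by class frequencies), is:

$$\hat{L}(W) = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} g_k\big(\hat{f}(x_i; D); W\big)\, \mathbf{1}\big(C(x_i) = k\big)$$

Because the indicator keeps only the true class of each sample, this is simply the average loss of each training point under its own class.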

[0018] However, because $D$ is used in Eq. (5) both to estimate $\hat{f}$ and to estimate the linear weights $W$, the resulting ensemble classifier overfits the particular training data $D$. This problem is therefore addressed using the "level 1 data" proposed in the known technique of stacked generalization (Wolpert, D. H., "Stacked generalization," Neural Networks, vol. 5, no. 2, pp. 241-259, 1992).

[0019] That is, the level 1 data are obtained by applying leave-one-out cross-validation to the original training data. Specifically, $f_m^{(k)}$ is trained on $D_{(-i)} \equiv D - \{(x_i, C(x_i))\}$, the original data with the i-th sample point $x_i$ removed, and the output of $f_m^{(k)}$ for the removed point $x_i$ is written $f_m^{(k)}(x_i; D_{(-i)})$. The level 1 data are then

$$D' = \{(\hat{f}_i, C(\hat{f}_i));\ i = 1, \dots, N\}$$

where

$$\hat{f}_i = \left(f_1^{(1)}(x_i; D_{(-i)}), \dots, f_1^{(K)}(x_i; D_{(-i)}), \dots, f_M^{(1)}(x_i; D_{(-i)}), \dots, f_M^{(K)}(x_i; D_{(-i)})\right)^{T}$$

Clearly, $C(\hat{f}_i) \equiv C(x_i)$. Consequently, the estimate of the linear weights $W$ is obtained from $D'$ by solving the following minimization problem.

[0020]

[Equation 9]

Clearly, replacing $\hat{f}_i$ by $x_i$ in Eq. (6) yields the ordinary problem of designing a discriminant function in feature vector space. Therefore, as the loss function one can apply, for example, the known smoothed 0-1 loss function based on a misclassification measure (Juang, B.-H. and Katagiri, S., "Discriminative learning for minimum error classification," IEEE Trans. Signal Processing, vol. 40, no. 12, 1992):

[Equation 10]

Here, $\xi$ is a positive constant that controls the slope of the sigmoid function, and $d_k$ is a measure of the degree of misclassification when a sample of class $k$ is misclassified, defined by the following equation.

[0021]

[Equation 11]

Eq. (8) is explained in detail in the above-cited paper, "Discriminative learning for minimum error classification."

[0022] In the present problem, replacing $d_k(x)$ by $d_k(\hat{f}; W)$ and $f^{(k)}(x)$ by $\alpha^{(k)T} \hat{f}$ in Eq. (8) gives the misclassification measure for an $\hat{f}$ with $C(\hat{f}) = k$:

[Equation 12]

where $\eta$ is a positive constant.

[0023] When $\hat{f}$ is correctly classified, $d_k(\hat{f}; W) < 0$; when it is misclassified, $d_k(\hat{f}; W) > 0$. Furthermore, when $\hat{f}$ is correctly classified, the loss approaches 0 as $|d_k(\hat{f}; W)|$ grows, whereas when it is misclassified, the loss approaches 1 as $d_k(\hat{f}; W)$ grows. That is, the value of the loss function depends not only on whether the classification is correct but also on the degree of correctness or error.

[0024] Clearly, near $d_k = 0$ a similar loss is assigned whether the classification is correct or not. This suppresses overfitting and, like regularization, improves robustness to unseen data.
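The behaviour described in [0023] and [0024] can be checked numerically. The sketch below assumes a log-sum-exp form of the misclassification measure, a variant that, like the Juang and Katagiri measure, approaches "largest rival score minus own score" as `eta` grows and that stays well defined when the combined scores $\alpha^{(j)T}\hat{f}$ are negative. The patent's exact Eq. (9) is not reproduced in this text, so treat this form, the defaults for `eta` and `xi`, and the function names as assumptions.

```python
import math
from typing import Sequence


def misclassification_measure(scores: Sequence[float], k: int, eta: float = 4.0) -> float:
    """Assumed form: d_k = -s_k + (1/eta) * log(mean over j != k of exp(eta * s_j)).
    Negative when class k clearly wins, positive when a rival class scores higher."""
    rivals = [s for j, s in enumerate(scores) if j != k]
    top = max(rivals)  # factor out the largest rival score for numerical stability
    soft_max_rival = top + math.log(sum(math.exp(eta * (s - top)) for s in rivals) / len(rivals)) / eta
    return -scores[k] + soft_max_rival


def smoothed_01_loss(d: float, xi: float = 1.0) -> float:
    """Sigmoid of xi * d: near 0 if clearly correct, near 1 if clearly wrong, 0.5 at d = 0."""
    z = xi * d
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) when z is very negative
    return e / (1.0 + e)


print(misclassification_measure([2.0, 0.0, 0.0], k=0))  # → -2.0
print(smoothed_01_loss(0.0))                            # → 0.5
```

A confident correct decision gives a loss near 0, a confident error a loss near 1, and a borderline case about 0.5 whatever the outcome.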

[0025] Once the loss function has been obtained explicitly from Eqs. (8) and (9), the objective function (empirical loss function) $J$ of Eq. (6) is obtained as a function of the linear weights $W$. Because $J$ is nonlinear in $W$, no closed-form solution exists, and $W$ is estimated by an iterative method. For example, $W$ can be estimated sequentially with the known probabilistic descent method (Amari, S., "A theory of adaptive pattern classifiers," IEEE Trans. Electronic Computers, vol. EC-16, pp. 299-307, 1967):

[0026]

[Equation 13]

Here, $U$ is a positive-definite matrix of order K (in practice, the identity matrix suffices), and $\varepsilon(t)$ is the learning rate. When $\varepsilon(t)$ satisfies the following conditions, convergence of the algorithm to a local optimum is theoretically guaranteed.

[0027]

[Equation 14]

As explained above, the present invention uses a discriminant function obtained by linearly combining a plurality of models rather than a single-model discriminant function. Therefore, even for classification problems in which class boundaries of differing complexity are mixed, class boundaries of appropriate complexity are generated adaptively and automatically, and good class boundaries are obtained.

[0028]

[Examples] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the functional configuration of an apparatus for carrying out the pattern recognition method by integration of a plurality of discriminant functions according to an embodiment of the present invention.

[0029] In the level 0 learning step, the unknown parameters of the M kinds of discriminant functions given in advance are estimated from the externally supplied training data, forming the level 0 discriminant functions. Any known method suited to the chosen discriminant function can be used for this. As an example, an embodiment is described below in which a three-layer neural network, a well-known nonlinear discriminant function, is adopted as the discriminant function (the number of input units is then the dimensionality of the feature vector, and the number of output units is the number of classes K).

[0030] As a model selection method for neural networks, a known method using a regularization parameter can be used. The regularization parameter is real-valued, and the larger its value, the fewer the effective degrees of freedom of the network, so it can be used for model selection. Accordingly, M values of the regularization parameter are set, a neural network is trained on the training data for each value, and M kinds of discriminant functions are formed. The known error backpropagation method can be used to train the networks.
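A minimal sketch of this level 0 learning step follows. It assumes multinomial logistic regression with L2 weight decay in place of the three-layer network (a deliberate simplification that omits the hidden layer and backpropagation through it); the regularization value plays the same role as λ in the embodiment, with larger values shrinking the weights and giving simpler boundaries. The function names, learning rate and epoch count are illustrative assumptions; the default λ list is taken from the experiment in [0040].

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label 0..K-1)


def train_softmax(data: Sequence[Sample], num_classes: int, lam: float,
                  lr: float = 0.1, epochs: int = 300) -> List[List[float]]:
    """Hypothetical stand-in for one regularized network: multinomial logistic regression
    fitted by full-batch gradient descent on mean cross-entropy + (lam / 2) * ||W||^2."""
    dim = len(data[0][0]) + 1                      # +1 for a constant bias input
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[lam * w for w in row] for row in W]  # weight decay (bias included, for brevity)
        for x, c in data:
            xb = list(x) + [1.0]
            s = [sum(w * v for w, v in zip(row, xb)) for row in W]
            top = max(s)
            e = [math.exp(v - top) for v in s]     # numerically stable softmax
            z = sum(e)
            for k in range(num_classes):
                err = e[k] / z - (1.0 if k == c else 0.0)
                for j in range(dim):
                    grad[k][j] += err * xb[j] / len(data)
        W = [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(W, grad)]
    return W


def level0_models(data: Sequence[Sample], num_classes: int,
                  lambdas: Sequence[float] = (5.0, 1.0, 0.2, 0.04)) -> List[List[List[float]]]:
    """Level 0 learning step: one model per regularization value (M = len(lambdas))."""
    return [train_softmax(data, num_classes, lam) for lam in lambdas]


def weight_norm(W: List[List[float]]) -> float:
    return math.sqrt(sum(w * w for row in W for w in row))


data = [([-1.0], 0), ([-0.5], 0), ([0.5], 1), ([1.0], 1)]
models = level0_models(data, num_classes=2)
print([round(weight_norm(W), 3) for W in models])  # weights grow as lambda shrinks
```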

[0031] In the level 1 data generation step, Steps 1 and 2 below are performed for each of $i = 1, 2, \dots, N$.

[0032] (Step 1) Newly train the M kinds of neural networks using $D_{(-i)} = D - \{(x_i, C(x_i))\}$, the training data $D$ with the i-th pair $(x_i, C(x_i))$ removed.

[0033] (Step 2) Input the $x_i$ removed in Step 1 to each of the M kinds of neural networks trained in Step 1. Writing the output (a K-dimensional vector) of the m-th network as $\left(f_m^{(1)}(x_i; D_{(-i)}), \dots, f_m^{(K)}(x_i; D_{(-i)})\right)$, let $\hat{f}_i$ be the KM-dimensional vector formed from these outputs for $m = 1, \dots, M$, and take $(\hat{f}_i, C(\hat{f}_i))$ as the i-th pair of the level 1 data.

[0034] The above procedure yields level 1 data $D' = \{(\hat{f}_i, C(\hat{f}_i));\ i = 1, \dots, N\}$ consisting of N pairs in total.
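The level 1 data generation of [0031] to [0034] is, in essence, leave-one-out stacking. The minimal sketch below assumes the M kinds of level 0 functions are k-nearest-neighbour scorers with different neighbourhood sizes, a hypothetical stand-in for the M regularized neural networks chosen only to keep the example short (a small neighbourhood, like a small λ, gives a more complex boundary). `train_knn`, `make_level1_data` and the toy data are illustrative, not part of the patent.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]                   # (x_i, C(x_i)), labels 0..K-1
Scorer = Callable[[Sequence[float]], List[float]]  # returns the K discriminant outputs


def train_knn(data: Sequence[Sample], n_neighbors: int, num_classes: int) -> Scorer:
    """Hypothetical level 0 learner: f^(j)(x) = share of the n_neighbors nearest points in class j."""
    def score(x: Sequence[float]) -> List[float]:
        nearest = sorted(data, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:n_neighbors]
        counts = [0.0] * num_classes
        for _, label in nearest:
            counts[label] += 1.0
        return [c / len(nearest) for c in counts]
    return score


def make_level1_data(data: Sequence[Sample], complexities: Sequence[int],
                     num_classes: int) -> List[Tuple[List[float], int]]:
    """Leave-one-out construction of D' = {(f_hat_i, C(x_i)); i = 1..N}."""
    level1 = []
    for i, (x_i, c_i) in enumerate(data):
        d_minus_i = list(data[:i]) + list(data[i + 1:])  # D_(-i): drop the i-th pair
        f_hat_i: List[float] = []
        for m in complexities:                          # one entry per kind of function (M total)
            f_m = train_knn(d_minus_i, m, num_classes)  # retrain on the N - 1 remaining pairs
            f_hat_i.extend(f_m(x_i))                    # append the K outputs for held-out x_i
        level1.append((f_hat_i, c_i))
    return level1


data = [([0.0], 0), ([0.1], 0), ([1.0], 1), ([1.1], 1)]
level1 = make_level1_data(data, complexities=[1, 3], num_classes=2)
print(level1[0][0][:2])  # → [1.0, 0.0]
```

For recognition, the level 0 functions themselves are still trained once on all of D; only the combination weights are fitted on the held-out outputs, which is what prevents the double use of D noted in [0018].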

[0035] In the discriminant function integrating step, the linear weights are first obtained from the level 1 data by the following Steps 1 and 2.

[0036] (Step 1) Set the initial value of the linear weights, $W(0) = (\alpha^{(1)}, \dots, \alpha^{(K)})$, appropriately, and set $t \leftarrow 0$.

[0037] (Step 2) Until a suitable convergence condition is satisfied, execute

[Equation 15]

and take the converged value of $W$ as the linear weights.

[0038] Next, using the linear weights obtained above and the M kinds of trained neural networks obtained in the level 0 learning step, the integrated discriminant function is obtained by the linear combination shown in Eq. (4).
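The iterative weight estimation of [0035] to [0037] can be sketched as follows. Because the update and learning-rate equations are not reproduced in this text, the sketch assumes a plain per-sample gradient step with $U$ taken as the identity (which [0026] says suffices in practice) and the schedule $\varepsilon(t) = \varepsilon_0/(1+t)$, a standard choice whose sum diverges while the sum of its squares converges (the usual Robbins-Monro conditions). It also assumes a log-sum-exp form of the misclassification measure, which is not necessarily the patent's Eq. (9), and level 1 vectors ordered model by model (all K outputs of model 1, then model 2, and so on). The initial $W(0)$, all parameter values and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Level1Data = Sequence[Tuple[Sequence[float], int]]  # (f_hat_i, class label 0..K-1)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _measure_and_grad(scores: Sequence[float], k: int, eta: float) -> Tuple[float, List[float]]:
    """Assumed log-sum-exp misclassification measure d_k and its derivative w.r.t. each score."""
    rivals = [j for j in range(len(scores)) if j != k]
    top = max(scores[j] for j in rivals)
    ex = {j: math.exp(eta * (scores[j] - top)) for j in rivals}
    z = sum(ex.values())
    d = -scores[k] + top + math.log(z / len(rivals)) / eta
    dd_ds = [-1.0 if j == k else ex[j] / z for j in range(len(scores))]
    return d, dd_ds


def _scores(W: Sequence[Sequence[float]], f_hat: Sequence[float]) -> List[float]:
    return [sum(a * f for a, f in zip(alpha, f_hat)) for alpha in W]


def predict(W: Sequence[Sequence[float]], f_hat: Sequence[float]) -> int:
    s = _scores(W, f_hat)
    return max(range(len(s)), key=lambda j: s[j])


def empirical_loss(W, level1: Level1Data, xi: float = 2.0, eta: float = 4.0) -> float:
    """J(W): mean smoothed 0-1 loss over the level 1 data."""
    return sum(_sigmoid(xi * _measure_and_grad(_scores(W, f), k, eta)[0])
               for f, k in level1) / len(level1)


def train_weights(level1: Level1Data, num_classes: int, xi: float = 2.0, eta: float = 4.0,
                  eps0: float = 1.0, epochs: int = 200, seed: int = 0) -> List[List[float]]:
    """Per-sample descent on J with U = identity and eps(t) = eps0 / (1 + t)."""
    dim = len(level1[0][0])  # M * K
    # Hypothetical W(0): alpha^(k) adds up every model's output for class k.
    W = [[1.0 if c % num_classes == k else 0.0 for c in range(dim)] for k in range(num_classes)]
    rng = random.Random(seed)
    order = list(range(len(level1)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            f_hat, k = level1[i]
            d, dd_ds = _measure_and_grad(_scores(W, f_hat), k, eta)
            loss = _sigmoid(xi * d)
            dloss_dd = xi * loss * (1.0 - loss)
            eps = eps0 / (1.0 + t)
            for j in range(num_classes):
                # gradient of the loss w.r.t. alpha^(j) is dloss_dd * dd_ds[j] * f_hat
                step = eps * dloss_dd * dd_ds[j]
                W[j] = [a - step * f for a, f in zip(W[j], f_hat)]
            t += 1
    return W


# Hypothetical level 1 data (M = 2, K = 2): model 1 is mildly right, model 2 confidently wrong.
level1 = [([0.6, 0.4, 0.0, 1.0], 0),
          ([0.4, 0.6, 1.0, 0.0], 1)]
W = train_weights(level1, num_classes=2)
print([predict(W, f) for f, _ in level1])
```

In this toy case the initial weights, which simply add both models' votes, misclassify both points; the learned weights reduce the empirical loss by discounting the misleading model, which is the kind of selective use of each level 0 function that [0043] reports.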

[0039] FIGS. 2 to 7 demonstrate the effectiveness of the present invention experimentally. In the experiment, training data (50 samples per class) and test data (300 samples per class) were generated artificially from the two-dimensional, four-class Gaussian distribution

[Equation 16]

The true classification boundary (Bayes boundary) computed from this distribution is shown superimposed in FIG. 2.

[0040] Neural networks with H = 20 hidden units were trained with the regularization parameter set to λ = 5.0, 1.0, 0.2, and 0.04, and the integrated discriminant function was constructed by the above procedure from the resulting level 1 data. FIGS. 3, 4, 5, and 6 show the class boundaries obtained from a single neural network with λ = 5.0, 1.0, 0.2, and 0.04, respectively, and FIG. 7 shows the class boundaries obtained from the integrated neural network.

[0041] As FIGS. 2 to 6 show, when the regularization parameter is too large (λ = 5), the class boundaries are too simple to allow flexible recognition; conversely, when it is too small (λ = 0.04), the class boundaries become complex and specialized to the training data.

[0042] In fact, the classification error rates (%) of the individual networks on the training data for λ = 5.0, 1.0, 0.2, and 0.04 were 58.0, 28.5, 22.0, and 19.5, respectively, and on the test data 59.7, 28.3, 23.3, and 23.6. In preliminary experiments with a single neural network, the generalization error (test error) was smallest at λ = 0.2.

[0043] With integration, on the other hand, the classification error rates on the training and test data were 20.0% and 22.4%, respectively. This test result (22.4%) is better than the 23.3% of the best single neural network with H = 20 (λ = 0.2), so the desired classifier has been constructed. Moreover, FIG. 7 shows that the resulting class boundary resembles that of λ = 1.0 between classes 1 and 3, that of λ = 0.2 between classes 2 and 3, and that of λ = 0.04 between classes 2 and 4. This result shows that the integrated discriminant function does not merely average the individual classifiers but can form the best integrated classifier by exploiting the strengths of each.

[0044]

[Effects of the Invention] As explained above, the present invention uses a discriminant function obtained by linearly combining a plurality of models instead of a single-model discriminant function. Therefore, even for classification problems in which class boundaries of differing complexity are mixed, class boundaries of appropriate complexity are generated adaptively and automatically, and good class boundaries are obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an apparatus for carrying out the pattern recognition method by integration of a plurality of discriminant functions according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the effectiveness of the present invention experimentally, showing the true class boundaries.

FIG. 3 is a diagram illustrating the effectiveness of the present invention experimentally, showing the class boundaries obtained from a single neural network with regularization parameter λ = 5.0.

FIG. 4 is a diagram illustrating the effectiveness of the present invention experimentally, showing the class boundaries obtained from a single neural network with regularization parameter λ = 1.0.

FIG. 5 is a diagram illustrating the effectiveness of the present invention experimentally, showing the class boundaries obtained from a single neural network with regularization parameter λ = 0.2.

FIG. 6 is a diagram illustrating the effectiveness of the present invention experimentally, showing the class boundaries obtained from a single neural network with regularization parameter λ = 0.04.

FIG. 7 is a diagram illustrating the effectiveness of the present invention experimentally, showing the class boundaries obtained from the integrated neural network.

[Explanation of symbols]

1 Level 0 learning step
3 Level 1 learning step
5 Integration step
7 Training data
9 Level 1 data

Claims

[Claims]

1. A pattern recognition method using discriminant functions in which, for a pattern recognition problem of classifying a feature vector obtained as an observation of a pattern into one of K classes, K discriminant functions are prepared and the class with the largest discriminant function value is taken as the class to which the data belong, the method comprising: a level 0 learning step of individually training each of M kinds of discriminant functions prepared in advance, using N sets of training data each consisting of a pair of a feature vector and the class to which that feature vector belongs, to form M kinds of level 0 discriminant functions; a level 1 data generating step of sequentially extracting one set of training data at a time from the N sets and, each time a set is extracted, newly training the M kinds of discriminant functions using the remaining N−1 sets to form M kinds of level 1 discriminant functions for each class, and then forming a total of N sets of level 1 data, each consisting of a pair of a KM-dimensional vector of the output values of the M kinds of level 1 discriminant functions for the extracted set and the class label of the extracted data; and a discriminant function integrating step of forming, using the level 1 data, a new discriminant function as a linear sum of the outputs of the discriminant functions trained in the level 0 learning step; wherein pattern recognition is performed using the discriminant function formed in the discriminant function integrating step.

2. The pattern recognition method according to claim 1, wherein, in the discriminant function integrating step, the linear weights are determined so as to minimize the average, over the level 1 data, of a loss function defined as a function of the degree of classification error.