Feature Selection and Classification by a Modified Model with Latent Structure

The general classification or discrimination problem concerns predicting the unknown nature of an object of interest. This unknown nature is called the class; it is denoted by $\omega$ and takes values in a finite set $\Omega = \{\omega_1, \omega_2, \dots, \omega_C\}$. An object is described by a $D$-dimensional vector of features $x = (x_1, x_2, \dots, x_D)^T \in X \subset \mathbb{R}^D$. We wish to build a rule $r: \mathbb{R}^D \to \Omega$ representing one's guess of the class of a given $x$; this mapping is called a classifier. In the statistical approach to the classification problem, objects are assumed to occur randomly according to some true class-conditional pdfs $p^\star(x \mid \omega)$ and the respective a priori probabilities $P^\star(\omega)$. The vector $x$ can then be optimally classified using the Bayes decision rule $r: X \to \Omega$: $$\text{if } P^\star(\omega_i \mid x) \geqslant P^\star(\omega_j \mid x) \text{ for all } j = 1, 2, \dots, C \tag{1}$$ then the object is classified as belonging to class $\omega_i$, i.e. $r(x) = \omega_i$. Here $P^\star(\omega_j \mid x)$ are the posterior class probabilities $$P^\star(\omega \mid x) = \frac{P^\star(\omega)\, p^\star(x \mid \omega)}{\sum_{\omega' \in \Omega} P^\star(\omega')\, p^\star(x \mid \omega')} \tag{2}$$ normalised so that $\sum_{\omega \in \Omega} P^\star(\omega \mid x) = 1$.
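The decision rule (1) and posterior (2) can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes two classes with one-dimensional Gaussian class-conditional densities, and the means, variances, and priors below are hypothetical values chosen for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Class-conditional density p*(x|omega), assumed Gaussian here."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x, priors, params):
    """Return the class maximising the posterior P*(omega|x) of Eq. (2)."""
    # Unnormalised posteriors P*(omega) p*(x|omega); the shared
    # denominator in (2) does not affect the argmax in rule (1).
    scores = {w: priors[w] * gaussian_pdf(x, *params[w]) for w in priors}
    return max(scores, key=scores.get)

# Hypothetical two-class problem: equal priors, unit-variance Gaussians
# centred at 0 and 4 (illustrative numbers only).
priors = {"omega_1": 0.5, "omega_2": 0.5}                # P*(omega)
params = {"omega_1": (0.0, 1.0), "omega_2": (4.0, 1.0)}  # (mean, variance)

print(classify(0.5, priors, params))  # omega_1 (0.5 is nearer that class mean)
print(classify(3.5, priors, params))  # omega_2
```

With equal priors and equal variances the rule reduces to assigning $x$ to the class with the nearer mean, which is why the two test points above split as they do.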
