Linear discriminant analysis (LDA) is an important and well-developed area of classification, and many linear (and nonlinear) discrimination methods have been proposed to date. A complication in applying LDA to real data arises when the number of features exceeds the number of observations. In this case, the covariance estimates do not have full rank and thus cannot be inverted. There are a number of ways to deal with this problem. In this paper, we present a new approach to LDA that uses a generalization of the Moore-Penrose pseudoinverse to remove this weakness. In addition to handling the problem of inverting the covariance matrix, our approach significantly improves the quality of classification, including on data sets where the covariance matrix can be inverted. Experimental results on various data sets demonstrate that our improvements to LDA are efficient and that our approach outperforms LDA.
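The rank problem described in the abstract can be illustrated with a minimal sketch. It uses the standard Moore-Penrose pseudoinverse from NumPy, not the generalization proposed in the paper, which the abstract does not specify. The function names `fit_pinv_lda` and `predict`, the synthetic data, and the cut-off `1e-10` are assumptions made only for illustration.

```python
import numpy as np

def fit_pinv_lda(X, y):
    """Fit LDA with the ordinary Moore-Penrose pseudoinverse (illustrative only)."""
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance estimate; its rank is at most n - K.
    S = np.zeros((p, p))
    for c, m in zip(classes, means):
        D = X[y == c] - m
        S += D.T @ D
    S /= n - len(classes)
    # Explicit cut-off (an assumption) so near-zero singular values are dropped.
    S_pinv = np.linalg.pinv(S, rcond=1e-10)
    return classes, means, priors, S, S_pinv

def predict(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, means, priors, _, S_pinv = model
    A = means @ S_pinv  # one row per class: m_k' S^+
    # Score: x' S^+ m_k - 0.5 * m_k' S^+ m_k + log(prior_k)
    scores = X @ A.T - 0.5 * np.sum(A * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Synthetic data with more features (p = 20) than observations (n = 10).
rng = np.random.default_rng(0)
n_per_class, p = 5, 20
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, p)),
               rng.normal(3.0, 1.0, (n_per_class, p))])
y = np.array([0] * n_per_class + [1] * n_per_class)

model = fit_pinv_lda(X, y)
S, S_pinv = model[3], model[4]
rank = np.linalg.matrix_rank(S, tol=1e-8)
pred = predict(model, X)
print("features:", p, "rank of covariance estimate:", rank)
```

Because the pooled estimate is built from `n - K` independent deviations, its rank cannot exceed 8 here, so the 20 x 20 matrix has no ordinary inverse and the pseudoinverse stands in for it in the discriminant score.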
@article{bwmeta1.element.bwnjournal-article-amcv23z2p463bwm,
  author = {Tomasz G\'orecki and Maciej \L uczak},
  title = {Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse},
  journal = {International Journal of Applied Mathematics and Computer Science},
  volume = {23},
  year = {2013},
  pages = {463-471},
  zbl = {06246503},
  language = {en},
  url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv23z2p463bwm}
}
Tomasz Górecki; Maciej Łuczak. Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse. International Journal of Applied Mathematics and Computer Science, Volume 23 (2013) pp. 463-471. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv23z2p463bwm/