Soit un couple aléatoire à valeurs dans et de loi inconnue. Soient des répliques i.i.d. de , de loi empirique associée . Soit un dictionnaire composé de fonctions. Pour tout , on note . Soit fonction de perte donnée que l’on suppose convexe en la seconde variable. On note . On étudie le problème de minimisation du risque empirique pénalisé suivant qui correspond à la version empirique du problème (ici est un paramètre de régularisation; correspond au cas ). Ce cadre général englobe un certain nombre de problèmes de régression et de classification. On s’intéresse au cas où , mais reste proche de 1 (de sorte que soit de l’ordre , ou inférieur). On montre que la «sparsité» de implique la «sparsité» de . En outre, on étudie les conséquences de la «sparsité» en termes de bornes supérieures sur l’excès de risque des solutions obtenues pour les différents problèmes de minimisation du risque empirique.
Let be a random couple in with unknown distribution . Let be i.i.d. copies of , being their empirical distribution. Let be a dictionary consisting of functions. For , denote . Let be a given loss function, which is convex with respect to the second variable. Denote . We study the following penalized empirical risk minimization problem which is an empirical version of the problem (here is a regularization parameter; corresponds to ). A number of regression and classification problems fit this general framework. We are interested in the case when , but it is close enough to 1 (so that is of the order , or smaller). We show that the “sparsity” of implies the “sparsity” of and study the impact of “sparsity” on bounding the excess risk of solutions of empirical risk minimization problems.
@article{AIHPB_2009__45_1_7_0, author = {Koltchinskii, Vladimir}, title = {Sparsity in penalized empirical risk minimization}, journal = {Annales de l'I.H.P. Probabilit\'es et statistiques}, volume = {45}, year = {2009}, pages = {7-57}, doi = {10.1214/07-AIHP146}, mrnumber = {2500227}, zbl = {1168.62044}, language = {en}, url = {http://dml.mathdoc.fr/item/AIHPB_2009__45_1_7_0} }
Koltchinskii, Vladimir. Sparsity in penalized empirical risk minimization. Annales de l'I.H.P. Probabilités et statistiques, Tome 45 (2009) pp. 7-57. doi : 10.1214/07-AIHP146. http://gdmltest.u-ga.fr/item/AIHPB_2009__45_1_7_0/
[1] Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 (1999) 301-413. | MR 1679028 | Zbl 0946.62036
, and .[2] Local Rademacher complexities. Ann. Statist. 33 (2005) 1497-1537. | MR 2166554 | Zbl 1083.62034
, and .[3] Lectures on Modern Convex Optimization. Analysis, Algorithms and Engineering Applications. MPS/SIAM, Series on Optimization, Philadelphia, 2001. | MR 1857264 | Zbl 0986.90032
and .[4] Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674-1697. | MR 2351101 | Zbl pre05201517
, and .[5] Sparsity oracle inequalities for the LASSO. Electron. J. Statist. 1 (2007) 169-194. | MR 2312149 | Zbl 1146.62028
, and .[6] The Dantzig selector statistical estimation when p is much larger than n. Ann. Statist. 35 (2007) 2313-2351. | MR 2382644 | Zbl 1139.62019
and .[7] Error correction via linear programming. In Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS05) 295-308. IEEE, Pittsburgh, PA, 2005.
, , and .[8] Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 (2006) 1207-1223. | MR 2230846 | Zbl 1098.94009
, and .[9] Statistical Learning Theory and Stochastic Optimization. Springer, New York, 2004. | MR 2163920 | Zbl 1076.93002
.[10] For most large underdetermined systems of equations the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Preprint, 2004. | Zbl 1105.90068
.[11] For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 (2006) 797-829. | MR 2217606 | Zbl 1113.15004
.[12] Compressed sensing. IEEE Trans. Inform. Theory 52 (2006) 1289-1306. | MR 2241189
.[13] Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (2006) 6-18. | MR 2237332
, and .[14] High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 (2008) 614-645. | MR 2396809 | Zbl 1138.62323
.[15] Model selection and aggregation in sparse classification problems. Oberwolfach Reports Meeting on Statistical and Probabilistic Methods of Model Selection, October, 2005.
.[16] Local Rademacher complexities and oracle inequalities in risk mnimization. Ann. Statist. 34 (2006) 2593-2656. | MR 2329442 | Zbl 1118.62065
.[17] Complexities of convex combinations and bounding the generalization error in classification. Ann. Statist. 33 (2005) 1455-1496. | MR 2166553 | Zbl 1080.62045
and .[18] Probability in Banach Spaces. Springer, New York, 1991. | MR 1102015 | Zbl 0748.60004
and .[19] Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Tolouse (IX) 9 (2000) 245-303. | Numdam | MR 1813803 | Zbl 0986.62002
.[20] Concentration Inequalities and Model Selection. Springer, Berlin, 2007. | MR 2319879 | Zbl 1170.60006
.[21] Reconstruction and subgaussian operators in Asymptotic Geometric Analysis. Geom. Funct. Anal. 17 (2007) 1248-1282. | MR 2373017 | Zbl 1163.46008
, and .[22] High-dimensional graphs and variable selection with the LASSO. Ann. Statist. 34 (2006) 1436-1462. | MR 2278363 | Zbl 1113.62082
and .[23] Topics in non-parametric statistics. In Ecole d'Et'e de Probabilités de Saint-Flour XXVIII, 1998 85-277. P. Bernard (Ed). Springer, New York, 2000. | MR 1775640 | Zbl 0998.62033
.[24] Geometric approach to error correcting codes and reconstruction of signals. Int. Math. Res. Not. 64 (2005) 4019-4041. | MR 2206919 | Zbl 1103.94014
and .[25] Regression shrinkage and selection via Lasso. J. Royal Statist. Soc. Ser. B 58 (1996) 267-288. | MR 1379242 | Zbl 0850.62538
.[26] Optimal rates of aggregation. In Proc. 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, 303-313. Lecture Notes in Artificial Intelligence 2777. Springer, New York, 2003. | Zbl pre05686304
.[27] Weak Convergence and Empirical Processes. Springer, New York, 1996. | MR 1385671 | Zbl 0862.60002
and .[28] Mixing strategies for density estimation. Ann. Statist. 28 (2000) 75-87. | MR 1762904 | Zbl 1106.62322
.[29] Aggregating regression procedures for a better performance. Bernoulli 10 (2004) 25-47. | MR 2044592 | Zbl 1040.62030
.[30] 7 (2006) 2541-2563. | MR 2274449
and . On model selection consistency of LASSO. J. Mach. Learn. Res.