Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture
Wahba, Grace ; Wang, Yuedong ; Gu, Chong ; Klein, Ronald ; Klein, Barbara
Ann. Statist., Vol. 23 (1995), no. 6, pp. 1865-1895 / Harvested from Project Euclid
Let $y_i, i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where b and c are given functions and b is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \in \mathsf{T}^{(1)} \otimes \dots \otimes \mathsf{T}^{(d)} = \mathsf{T}$, the $\mathsf{T}^{(\alpha)}$ are measurable spaces of rather general form and f is an unknown function on $\mathsf{T}$ with some assumed "smoothness" properties. Given $\{y_i, t(i), i = 1, \dots, n\}$, it is desired to estimate $f(t)$ for t in some region of interest contained in $\mathsf{T}$. We develop the fitting, to such data, of smoothing spline ANOVA models of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha \beta} (t_{\alpha}, t_{\beta}) + \dots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of f is obtained as the minimizer, in an appropriate function space, of $\mathsf{L}(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha < \beta} \lambda_{\alpha \beta} J_{\alpha \beta}(f_{\alpha \beta}) + \dots$, where $\mathsf{L}(y, f)$ is the negative log likelihood of $y = (y_1, \dots, y_n)'$ given f, the $J_{\alpha}, J_{\alpha \beta}, \dots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha \beta}, \dots$, (3) methods for making confidence statements concerning the estimate, (4) numerical algorithms for the calculations and, finally, (5) public software.
In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiological studies.
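As a rough illustration of the penalized likelihood criterion in the Bernoulli case, the sketch below fits a single smooth term on an equally spaced grid. For Bernoulli data with success probability $p_i$, the density has the stated form with $f_i = \log[p_i/(1 - p_i)]$, $b(f) = \log(1 + e^f)$ and $c(y) = 0$, a standard exponential-family fact. Everything else is an assumption made for illustration and is not the authors' algorithm or software: the quadratic penalty is a discrete second-difference roughness penalty standing in for $J_{\alpha}$, the smoothing parameter `lam` is fixed by hand rather than chosen from the data, the data are simulated, and the names `neg_log_lik` and `fit_penalized_logit` are hypothetical.

```python
import numpy as np

def neg_log_lik(y, f):
    # Bernoulli negative log likelihood with b(f) = log(1 + e^f), c(y) = 0.
    return np.sum(np.logaddexp(0.0, f) - y * f)

def fit_penalized_logit(y, lam, n_iter=50, tol=1e-8):
    """Minimize neg_log_lik(y, f) + lam * f'Kf by Newton's method.

    K = D'D, where D takes second differences of f on an equally spaced
    grid (an assumed stand-in for a spline roughness penalty).
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
    K = D.T @ D

    def objective(f):
        return neg_log_lik(y, f) + lam * (f @ K @ f)

    f = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-f))               # fitted probabilities
        grad = (p - y) + 2.0 * lam * (K @ f)
        hess = np.diag(p * (1.0 - p)) + 2.0 * lam * K
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Backtrack so that the objective never increases.
        while objective(f - t * step) > objective(f) and t > 1e-8:
            t *= 0.5
        f_new = f - t * step
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f, K

# Simulated example: true logit is a sine curve on [0, 1].
rng = np.random.default_rng(0)
n = 200
t_grid = np.linspace(0.0, 1.0, n)
f_true = 2.0 * np.sin(2.0 * np.pi * t_grid)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-f_true))).astype(float)

f_hat, K = fit_penalized_logit(y, lam=1e4)      # moderate smoothing
f_stiff, _ = fit_penalized_logit(y, lam=1e6)    # heavy smoothing, nearly linear
p_hat = 1.0 / (1.0 + np.exp(-f_hat))            # estimated risk curve
```

Larger values of `lam` pull the fit toward the null space of the penalty (here, functions linear in the grid index), which mirrors the role of the smoothing parameters $\lambda_{\alpha}$ in the criterion above. The sketch does not address model selection, data-driven choice of smoothing parameters or confidence statements.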
Published: 1995-12-14
Classification: Smoothing spline ANOVA, nonparametric regression, exponential families, risk factor estimation, 62G07, 92C60, 68T05, 65D07, 65D10, 62A99, 62J07, 41A63, 41A15, 62M30, 65D15, 92H25, 49M15
@article{1034713638,
     author = {Wahba, Grace and Wang, Yuedong and Gu, Chong and Klein, Ronald and Klein, Barbara},
     title = {Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture},
     journal = {Ann. Statist.},
     volume = {23},
     number = {6},
     year = {1995},
     pages = {1865--1895},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1034713638}
}
Wahba, Grace; Wang, Yuedong; Gu, Chong; Klein, Ronald; Klein, Barbara. Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy : the 1994 Neyman Memorial Lecture. Ann. Statist., Vol. 23 (1995), no. 6, pp. 1865-1895. http://gdmltest.u-ga.fr/item/1034713638/