Let $y_i, i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp{y_i f_i - b(f_i) + c(y_i)]$, where b and c are given functions and b is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \epsilon \mathsf{T}^{(1)} \otimes \dots \otimes \mathsf{T}^{(d)} = \mathsf{T}$, the $\mathsf{T}^{(\alpha)}$ are measurable spaces of rather general form and f is an unknown function on $\mathsf{T}$ with some assumed "smoothness" properties. Given ${y_i, t(i), i = 1, \dots, n}$, it is desired to estimate $f(t)$ for t in some region of interest contained in $\mathsf{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha \beta} (t_{\alpha}, t_{\beta}) + \dots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of f is obtained as the minimizer, in an appropriate function space, of $\mathsf{L}(y, f) + \sum_{\alpha} \lambda_{\alpha} J_{\alpha}(f_{\alpha}) + \sum_{\alpha <\beta} \lambda_{\alpha \beta} J_{\alpha \beta}(f_{\alpha \beta}) + \dots$, where $\mathsf{L}(y, f)$ is the negative log likelihood of $y = (y_1, \dots, y_n)'$ given f, the $J_{\alpha}, J_{\alpha \beta}, \dots$ are quadratic penalty functionals and the ANOVA decomposition is terminated in some manner. There are five major parts required to turn this program into a practical data analysis tool: (1) methods for deciding which terms in the ANOVA decomposition
to include (model selection), (2) methods for choosing good values of the smoothing parameters $\lambda_{\alpha}, \lambda_{\alpha \beta}, \dots$, (3) methods for making confidence statements concerning the estimate, (4) numerical
algorithms for the calculations and, finally, (5) public software. In this paper we carry out this program, relying on earlier work and filling in important gaps. The overall scheme is applied to Bernoulli data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy to model the risk of progression of diabetic retinopathy as a function of glycosylated hemoglobin, duration of diabetes and body mass index. It is believed that the results have wide practical application to the analysis of data from large epidemiological studies.