Let $(X, Y)$ be a pair of random variables such that $X = (X_1, \ldots, X_J)$ and let $f$ be a function that depends on the joint distribution of $(X, Y).$ A variety of parametric and nonparametric models for $f$ are discussed in relation to flexibility, dimensionality, and interpretability. It is then supposed that each $X_j \in \lbrack 0, 1\rbrack,$ that $Y$ is real valued with mean $\mu$ and finite variance, and that $f$ is the regression function of $Y$ on $X.$ Let $f^\ast,$ of the form $f^\ast(x_1, \ldots, x_J) = \mu + f^\ast_1(x_1) + \cdots + f^\ast_J(x_J),$ be chosen subject to the constraints $Ef^\ast_j = 0$ for $1 \leq j \leq J$ to minimize $E\lbrack(f(X) - f^\ast(X))^2\rbrack.$ Then $f^\ast$ is the closest additive approximation to $f,$ and $f^\ast = f$ if $f$ itself is additive. Spline estimates of $f^\ast_j$ and its derivatives are considered based on a random sample from the distribution of $(X, Y).$ Under a common smoothness assumption on $f^\ast_j, 1 \leq j \leq J,$ and some mild auxiliary assumptions, these estimates achieve the same (optimal) rate of convergence for general $J$ as they do for $J = 1.$
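
The additive spline fit described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the estimator analyzed in the paper: it uses a truncated power basis with equally spaced knots (a B-spline basis would be the more usual numerical choice), imposes the sample analogue of the constraint $Ef^\ast_j = 0$ by centering each basis column, and fits everything by ordinary least squares. The helper names `spline_basis` and `fit_additive`, the number of knots, the sample size, and the simulated component functions are all hypothetical choices made for illustration.

```python
import numpy as np


def spline_basis(x, knots, degree=3):
    """Truncated power basis on [0, 1], without the constant column."""
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def fit_additive(X, y, n_knots=4, degree=3):
    """Least-squares fit of mu + f_1(x_1) + ... + f_J(x_J) with spline components.

    Each basis column is centered at its sample mean, so every fitted
    component averages to zero over the sample (sample version of E f_j = 0).
    """
    n, J = X.shape
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots only
    blocks, means = [], []
    for j in range(J):
        B = spline_basis(X[:, j], knots, degree)
        m = B.mean(axis=0)
        blocks.append(B - m)
        means.append(m)
    design = np.column_stack([np.ones(n)] + blocks)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    p = degree + n_knots  # number of basis functions per coordinate
    betas = [coef[1 + j * p: 1 + (j + 1) * p] for j in range(J)]

    def component(j, x):
        """Evaluate the fitted j-th component at the points x."""
        B = spline_basis(np.asarray(x, dtype=float), knots, degree)
        return (B - means[j]) @ betas[j]

    return coef[0], component


# Simulated example (assumed setup): J = 3, X uniform on [0, 1]^3, and an
# additive regression function whose components have mean zero.
rng = np.random.default_rng(0)
n, J, mu = 2000, 3, 2.0
X = rng.uniform(size=(n, J))
truth = [
    lambda x: np.sin(2 * np.pi * x),
    lambda x: (x - 0.5) ** 2 - 1.0 / 12.0,
    lambda x: x - 0.5,
]
y = mu + sum(f(X[:, j]) for j, f in enumerate(truth)) + 0.1 * rng.standard_normal(n)

mu_hat, component = fit_additive(X, y)

grid = np.linspace(0.05, 0.95, 50)
for j, f in enumerate(truth):
    err = np.max(np.abs(component(j, grid) - f(grid)))
    print(f"component {j + 1}: max abs error on grid = {err:.3f}")
print(f"estimated mu = {mu_hat:.3f}")
```

Because the regression function in this simulation is itself additive, $f^\ast = f$ and each fitted component can be compared directly with the function that generated the data. The number of parameters grows linearly in $J$ rather than exponentially, which is the practical counterpart of the rate result stated above.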