This paper deals with the following approach for estimating the mean $\mu$ of an $n$-dimensional random vector $Y$: first, a family $\mathbf{S}$ of $n \times n$ matrices is specified. Then, an element $\widehat{S} \in \mathbf{S}$ is selected by Mallows' $C_L$, and $\mu$ is estimated by $\widehat{\mu} = \widehat{S}\cdot Y$. The case is considered in which $\mathbf{S}$ is an "ordered linear smoother," as defined by some easily interpretable, qualitative conditions. Examples include linear smoothing procedures in nonparametric regression (e.g., smoothing splines, minimax spline smoothers and kernel estimators). Probability bounds are given for the difference $(1/n)\|\mu - \widehat{S}\cdot Y\|^2_2 - (1/n)\|\mu - \widehat{S}_\mu\cdot Y\|^2_2$, where $\widehat{S}_\mu$ denotes the minimizer of $(1/n)\|\mu - S\cdot Y\|^2_2$ over $S \in \mathbf{S}$. These probability bounds are generalized to the situation in which $\mathbf{S}$ is the union of a moderate number of ordered linear smoothers. The results complement work by Li on the asymptotic optimality of $C_L$. Implications for nonparametric regression are studied in detail. It is shown that there exists a direct connection between James-Stein estimation and the use of smoothing procedures, leading to a decision-theoretic justification of the latter. Further conclusions concern the choice of the order of a smoothing spline or a minimax spline smoother and the rates of convergence of smoothing parameters.
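The selection step described above can be illustrated with a minimal sketch, under assumptions that are not taken from the paper: the noise is homoscedastic and uncorrelated with known variance $\sigma^2$, so that $C_L(S) = (1/n)\|Y - S\cdot Y\|^2_2 + (2\sigma^2/n)\,\mathrm{tr}(S) - \sigma^2$, and the family $\mathbf{S}$ is a simple stand-in consisting of nested diagonal projections (each matrix keeps the first $k$ coordinates of $Y$). All function names, the mean vector and the sample size are hypothetical and chosen only for illustration.

```python
import random


def mallows_cl(y, weights, sigma2):
    """Mallows' C_L for a diagonal smoother S = diag(weights).

    C_L(S) = (1/n) * ||Y - S Y||^2 + (2 * sigma2 / n) * tr(S) - sigma2,
    assuming a known, common noise variance sigma2 (an assumption here).
    """
    n = len(y)
    rss = sum(((1.0 - w) * yi) ** 2 for w, yi in zip(weights, y))
    return rss / n + 2.0 * sigma2 * sum(weights) / n - sigma2


def truncation_family(n):
    """Nested projections: member k keeps the first k coordinates, k = 0..n."""
    return [[1.0] * k + [0.0] * (n - k) for k in range(n + 1)]


def select_by_cl(y, family, sigma2):
    """Return the member of the family that minimizes C_L (computable from data)."""
    return min(family, key=lambda w: mallows_cl(y, w, sigma2))


def loss(mu, y, weights):
    """Average squared error (1/n) * ||mu - S Y||^2 for S = diag(weights)."""
    n = len(y)
    return sum((m - w * yi) ** 2 for m, w, yi in zip(mu, weights, y)) / n


# Hypothetical example: a decaying mean vector observed with Gaussian noise.
rng = random.Random(0)
n, sigma = 200, 1.0
mu = [10.0 / (i + 1) for i in range(n)]
y = [m + rng.gauss(0.0, sigma) for m in mu]

family = truncation_family(n)
s_hat = select_by_cl(y, family, sigma ** 2)            # data-driven choice
s_oracle = min(family, key=lambda w: loss(mu, y, w))   # needs the unknown mu

# Excess loss of the C_L choice over the best member of the family.
excess = loss(mu, y, s_hat) - loss(mu, y, s_oracle)
print("excess loss:", excess)
```

The quantity `excess` is the difference that the probability bounds control; it is nonnegative by construction, since the oracle minimizes the loss over the same family. The diagonal family is used only because it keeps the sketch free of matrix libraries; smoothing splines or kernel estimators would require full $n \times n$ smoother matrices.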