Let $(X, U)$ be jointly distributed on $\mathscr{X} \times \mathscr{R}^n$. Let $Y = E(U\mid X)$ and let $\mathscr{U}$ be the convex hull of the range of $U$. Let $C: \mathscr{X} \rightarrow \mathscr{C} = \{1,2,\ldots,k\}$, $k \geq 1$, induce a measurable $k$-way partition $\{\mathscr{X}_1,\ldots,\mathscr{X}_k\}$ of $\mathscr{X}$. Define the impurity of $\mathscr{X}_c = C^{-1}(c)$ to be $\phi(c, E(U\mid C(X) = c))$, where $\phi: \mathscr{C} \times \mathscr{U} \rightarrow \mathscr{R}^1$ is concave in its second argument. Define the impurity $\Psi$ of the partition as the average impurity of its members: $\Psi(C) = E\phi(C(X), E(U\mid C(X)))$. We show that for any $C: \mathscr{X} \rightarrow \mathscr{C}$ there exists a mapping $\tilde{C}: \mathscr{U} \rightarrow \mathscr{C}$ such that $\Psi(\tilde{C}(Y)) \leq \Psi(C)$ and such that each $\tilde{C}^{-1}(c)$ is convex, in the sense that for each $i, j \in \mathscr{C}$, $i \neq j$, there exists a separating hyperplane between $\tilde{C}^{-1}(i)$ and $\tilde{C}^{-1}(j)$. This generalizes some results in statistics and information theory. Suitable choices of $U$ and $\phi$ lead to optimal partitions of simple form useful in the construction of classification trees and multidimensional regression trees.
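
As a minimal numerical sketch of the inequality $\Psi(\tilde{C}(Y)) \leq \Psi(C)$ (an illustration added here, not taken from the text): suppose $U$ is a one-hot class indicator in $\mathscr{R}^2$, so that $Y = E(U\mid X)$ is the vector of posterior class probabilities, and take $\phi$ to be the entropy of its second argument, which is concave. The toy distribution, the particular partitions, and the code below are assumptions chosen only for illustration.

\begin{verbatim}
import numpy as np

# Toy setting (illustrative assumption): X takes 4 equally likely values and
# U is a one-hot class indicator in R^2, so Y(x) = E(U | X = x) is the
# posterior class-probability vector; phi is the entropy impurity.
p_x = np.array([0.25, 0.25, 0.25, 0.25])            # P(X = x)
Y = np.array([[0.9, 0.1], [0.6, 0.4],
              [0.4, 0.6], [0.1, 0.9]])              # Y(x) = E(U | X = x)

def entropy(q):
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def Psi(cells):
    # Psi(C) = sum_c P(C(X) = c) * phi(E(U | C(X) = c)),
    # with each cell given as a list of indices into the support of X.
    total = 0.0
    for cell in cells:
        w = p_x[cell].sum()                                   # P(C(X) = c)
        y_bar = (p_x[cell][:, None] * Y[cell]).sum(0) / w     # E(U | C(X) = c)
        total += w * entropy(y_bar)
    return total

print(Psi([[0, 2], [1, 3]]))   # an arbitrary 2-way partition of X
print(Psi([[0, 1], [2, 3]]))   # the partition induced by a hyperplane
                               # (here a threshold) on Y: lower impurity
\end{verbatim}

In this toy case the threshold partition on $Y$ attains strictly smaller impurity than the arbitrary partition, in line with the stated result; this is only a spot check under the assumptions above, not a proof.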