Minimum Impurity Partitions
Burshtein, David ; Pietra, Vincent Della ; Kanevsky, Dimitri ; Nadas, Arthur
Ann. Statist., Tome 20 (1992) no. 1, p. 1637-1646 / Harvested from Project Euclid
Let $(X, U)$ be jointly distributed on $\mathscr{X} \times \mathscr{R}^n$. Let $Y = E(U\mid X)$ and let $\mathscr{U}$ be the convex hull of the range of $U$. Let $C: \mathscr{X} \rightarrow \mathscr{C} = \{1,2,\ldots,k\}, k \geq 1$, induce a measurable $k$ way partition $\{\mathscr{X}_1,\ldots,\mathscr{X}_k\}$ of $\mathscr{X}$. Define the impurity of $\mathscr{X}_c = C^{-1}(c)$ to be $\phi(c, E(U\mid C(X) = c))$, where $\phi: \mathscr{C} \times \mathscr{U} \rightarrow \mathscr{R}^1$ is a concave function in its second argument. Define the impurity $\Psi$ of the partition as the average impurity of its members: $\Psi(C) = E\phi(C(X), E(U\mid C(X))$. We show that for any $C: \mathscr{X} \rightarrow \mathscr{C}$ there exists a mapping $\tilde{C}: \mathscr{U} \rightarrow \mathscr{C}$, such that $\Psi(\tilde{C}(Y)) \leq \Psi(C)$ and such that $\tilde{C}^{-1}(c)$ is convex, that is, for each $i, j \in C, i \neq j$, there exists a separating hyperplane between $\tilde{C}^{-1}(i)$ and $\tilde{C}^{-1}(j)$. This generalizes some results in statistics and information theory. Suitable choices of $U$ and $\phi$ lead to optimal partitions of simple form useful in the construction of classification trees and multidimensional regression trees.
Publié le : 1992-09-14
Classification:  Classification,  discrimination,  decision theory,  regression,  partitioning,  decision trees,  CART,  62H30,  62C05,  62C10,  62J02,  68T05,  68T10
@article{1176348789,
     author = {Burshtein, David and Pietra, Vincent Della and Kanevsky, Dimitri and Nadas, Arthur},
     title = {Minimum Impurity Partitions},
     journal = {Ann. Statist.},
     volume = {20},
     number = {1},
     year = {1992},
     pages = { 1637-1646},
     language = {en},
     url = {http://dml.mathdoc.fr/item/1176348789}
}
Burshtein, David; Pietra, Vincent Della; Kanevsky, Dimitri; Nadas, Arthur. Minimum Impurity Partitions. Ann. Statist., Tome 20 (1992) no. 1, pp.  1637-1646. http://gdmltest.u-ga.fr/item/1176348789/