The problem of clustering a data set is formulated in terms of the
Wasserstein barycenter problem in optimal transport. The objective proposed is
the maximization of the variability attributable to class, further
characterized as the minimization of the variance of the Wasserstein
barycenter. Existing theory, which constrains the transport maps to rigid
translations, is generalized to include affine transformations, which are
proved optimal for the purpose of clustering. The resulting non-parametric
clustering algorithms include k-means as a special case and have more robust
performance, demonstrated by comparisons with popular clustering algorithms on
both artificial and real-world data sets.