Real-world data often exhibit inhomogeneity: the complexity of the target
function, the noise level, etc. are not uniform over the input space. We
address the issue of estimating the locally optimal kernel bandwidth as a way
to describe
inhomogeneity. Estimated kernel bandwidths can be used not only to improve
regression/classification performance, but also for Bayesian optimization and
active learning, since more samples are needed in regions where the function
complexity or the noise level is higher. Our method, called kernel
mixture of kernel experts regression (KMKER), follows the concept of mixture
of experts, which combines several complementary inference models, the
so-called experts, with a latent classifier, called the gate, that predicts in
advance which expert fits each test input best. For the experts we
implement Gaussian process regression models at different (global) bandwidths
and a multinomial kernel logistic regression model as the gate. The basic idea
behind mixture of experts is that several distinct ground-truth functions over
a joint input space drive the observations, and one may want to disentangle
them. Each expert is meant to model one of these incompatible functions, so
each expert needs its own set of hyperparameters. We depart from that idea in
the sense that we assume a single ground-truth function which, however,
exhibits spatially inhomogeneous behavior. Under this assumption we share the
hyperparameters among the experts and keep their number constant. We compare
KMKER to previous methods (which cope with inhomogeneity but do not provide an
estimate of the locally optimal bandwidth) on artificial and benchmark data and
analyze its performance and interpretability on datasets from quantum
chemistry. We also demonstrate how KMKER can be applied to automatic adaptive
grid selection in fluid dynamics simulations.
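
The following is a minimal, self-contained sketch in Python (NumPy and
scikit-learn) of the construction described above; it is not the authors'
implementation. The toy data, the fixed expert bandwidths, the gate bandwidth
(gamma), the assumed noise scale, and the single responsibility-estimation
pass are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): GP experts at fixed, shared
# bandwidths plus a multinomial kernel logistic regression gate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Toy 1-D target whose local frequency (hence optimal bandwidth) varies.
n = 200
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.sin(2 * np.pi * X[:, 0] * (1 + 6 * X[:, 0])) + 0.1 * rng.standard_normal(n)

# Experts: GP regressors sharing all hyperparameters except the bandwidth.
bandwidths = [0.02, 0.1, 0.5]          # assumed fixed expert bandwidths
def make_expert(b):
    return GaussianProcessRegressor(kernel=RBF(length_scale=b),
                                    alpha=1e-2, optimizer=None)

# Responsibilities: how well each expert predicts each point out-of-fold.
cv_means = np.stack([cross_val_predict(make_expert(b), X, y, cv=5)
                     for b in bandwidths], axis=1)              # (n, K)
resp = np.exp(-(cv_means - y[:, None]) ** 2 / (2 * 0.1 ** 2))  # assumed noise scale
resp += 1e-12                                                  # guard against underflow
resp /= resp.sum(axis=1, keepdims=True)

# Gate: multinomial kernel logistic regression fit to the responsibilities
# by gradient descent on the cross-entropy.
K_gate = rbf_kernel(X, X, gamma=50.0)   # assumed gate bandwidth
W = np.zeros((n, len(bandwidths)))
for _ in range(500):
    logits = K_gate @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= (1.0 / n) * K_gate.T @ (p - resp)

# Prediction: the gate weights the experts, i.e. it selects the locally
# best-fitting bandwidth for each test input.
experts = [make_expert(b).fit(X, y) for b in bandwidths]
X_test = np.linspace(0.05, 0.95, 5).reshape(-1, 1)
logits_t = rbf_kernel(X_test, X, gamma=50.0) @ W
p_test = np.exp(logits_t - logits_t.max(axis=1, keepdims=True))
p_test /= p_test.sum(axis=1, keepdims=True)
y_hat = (p_test * np.stack([e.predict(X_test) for e in experts], axis=1)).sum(axis=1)
print("locally preferred bandwidths:",
      [bandwidths[k] for k in p_test.argmax(axis=1)])
```

The printed per-point bandwidths are the quantity of interest above: small
bandwidths should be selected where the toy target oscillates quickly and
larger ones where it is smooth, which is the sense in which the gate serves
as a locally optimal bandwidth estimator.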