Sample survey inference has historically been concerned with
finite-population parameters, that is, functions (like means and totals) of the
observations for the individuals in the population. In scientific applications,
however, interest usually focuses on the “superpopulation”
parameters associated with a stochastic mechanism hypothesized to generate the
observations in the population rather than the finite-population parameters.
Two relevant findings discussed in this paper are that (1) with stratified
sampling, it is not sufficient to drop finite-population correction factors
from standard design-based variance formulas to obtain appropriate variance
formulas for superpopulation inference, and (2) with cluster sampling, standard
design-based variance formulas can dramatically underestimate superpopulation
variability, even with a small sampling fraction of the final units. A
literature review of inference for superpopulation parameters is given, with
emphasis on why these findings have not been previously appreciated. Examples
are provided for estimating superpopulation means, linear regression
coefficients and logistic regression coefficients using U.S. data from the 1987
National Health Interview Survey, the third National Health and Nutrition
Examination Survey and the 1986 National Hospital Discharge Survey.