Extending work of Rubin, this paper explores a Bayesian counterpart of the classical $p$-value, namely, a tail-area probability of a "test statistic" under a null hypothesis. The Bayesian formulation, using posterior predictive replications of the data, allows a "test statistic" to depend on both the data and the unknown (nuisance) parameters and thus permits a direct measure of the discrepancy between sample and population quantities. The tail-area probability for a "test statistic" is then found under the joint posterior distribution of the replicate data and the (nuisance) parameters, both conditional on the null hypothesis. This posterior predictive $p$-value can also be viewed as the posterior mean of a classical $p$-value, averaging over the posterior distribution of the (nuisance) parameters under the null hypothesis, and thus it provides one general method for dealing with nuisance parameters. Two classical examples, including the Behrens-Fisher problem, are used to illustrate the posterior predictive $p$-value and some of its interesting properties, which also reveal new (Bayesian) interpretations of some classical $p$-values. An application to multiple-imputation inference is also presented. A frequency evaluation shows that, in general, if the replication is defined by both new (nuisance) parameters and new data, then the frequentist Type I error of an $\alpha$-level posterior predictive test is often close to, but less than, $\alpha$ and will never exceed $2\alpha$.
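In symbols the abstract leaves implicit (the notation $y$, $y^{\mathrm{rep}}$, $\theta$, $T$, and $p_{\mathrm{ppp}}$ below is ours, not fixed by the paper), the quantity described can be sketched as follows: for a discrepancy $T(y,\theta)$ depending on both the data and the parameter,
\[
p_{\mathrm{ppp}}(y) \;=\; \Pr\bigl\{\,T(y^{\mathrm{rep}},\theta) \ge T(y,\theta) \,\big|\, y, H_0 \bigr\}
\;=\; \int \Pr\bigl\{\,T(y^{\mathrm{rep}},\theta) \ge T(y,\theta) \,\big|\, \theta, H_0 \bigr\}\, \pi(\theta \mid y, H_0)\, d\theta ,
\]
where $y^{\mathrm{rep}}$ denotes a replicate data set drawn from the sampling model at $\theta$. The second expression makes the posterior-mean interpretation explicit: writing $p(\theta)$ for the classical $p$-value computed as if $\theta$ were known, we have $p_{\mathrm{ppp}}(y) = E\{\,p(\theta) \mid y, H_0\,\}$.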
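As a computational illustration only (the model, prior, and discrepancy below are hypothetical choices of ours, not an example from the paper), the tail-area probability can be estimated by Monte Carlo: draw $\theta$ from its posterior under the null model, draw a replicate data set given that $\theta$, and record how often the replicate discrepancy meets or exceeds the observed one. A minimal sketch in Python, assuming a normal sampling model with known variance and a conjugate normal prior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: n draws from a (hypothetical) null model N(theta, 1).
y = rng.normal(0.3, 1.0, size=50)
n = len(y)

# Conjugate prior theta ~ N(0, 10^2); known sampling variance sigma2 = 1.
prior_var, sigma2 = 100.0, 1.0
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (y.sum() / sigma2)

def discrepancy(data, theta):
    # A "test statistic" depending on both data and parameter:
    # the largest absolute deviation of the data from theta.
    return np.max(np.abs(data - theta))

draws, exceed = 10_000, 0
for _ in range(draws):
    theta = rng.normal(post_mean, np.sqrt(post_var))    # theta from its posterior
    y_rep = rng.normal(theta, np.sqrt(sigma2), size=n)  # replicate data given theta
    if discrepancy(y_rep, theta) >= discrepancy(y, theta):
        exceed += 1

p_post_pred = exceed / draws  # Monte Carlo posterior predictive p-value
print(f"posterior predictive p-value ~= {p_post_pred:.3f}")
```

Because the discrepancy $\max_i |y_i - \theta|$ involves the unknown $\theta$ as well as the data, a classical tail area is not directly available without first eliminating $\theta$; averaging over its posterior distribution is exactly the device the abstract describes.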