Suppose that a fixed total number $N$ of observations is to be made in a population $\Pi$ which is composed of two strata $\Pi_1$ and $\Pi_2$. For $i = 1, 2$, it is assumed that each observation in the stratum $\Pi_i$ has a normal distribution with unknown mean $\theta_i$ and specified precision $r_i$ $(r_i > 0)$. Recall that the precision of a normal distribution is the reciprocal of its variance. Let $p$ denote the unknown proportion of the total population $\Pi$ which is included in the stratum $\Pi_1$. Then the mean $\bar{\theta}$ of the population $\Pi$ is given by the equation $\bar{\theta} = p\theta_1 + q\theta_2$, where $p + q = 1$. In this paper, the problem of estimating the value of $\bar{\theta}$ will be studied. It will be assumed that the loss which results from any estimate $\delta$ is the squared error $(\delta - \bar{\theta})^2$. It is well known that for any prior distribution of $\theta_1, \theta_2$, and $p$, the Bayes estimate of $\bar{\theta}$, after all of the observations have been taken, will be the mean of the posterior distribution of $\bar{\theta}$. Furthermore, the expected loss from this estimate will be the variance of the posterior distribution of $\bar{\theta}$. Therefore, we must find a sampling procedure which minimizes the expected value, with respect to the prior distribution, of this posterior variance.
Throughout this paper, it will be assumed that the joint prior distribution of $\theta_1, \theta_2$, and $p$ is as follows: $\theta_1, \theta_2$, and $p$ are independent; the distribution of $p$ is a beta distribution with parameters $\alpha$ and $\beta$ $(\alpha > 0, \beta > 0)$; and for $i = 1, 2$, the distribution of $\theta_i$ is a normal distribution with mean $\mu_i$ and precision $h_i$. This joint distribution has the following fundamental property: After any number of observations have been taken from $\Pi, \Pi_1,$ or $\Pi_2$, the posterior joint distribution of $\theta_1, \theta_2$, and $p$ will again be of the same form and, in particular, $\theta_1, \theta_2$, and $p$ will again be independent under their posterior distribution. We assume that sampling will be carried out in two stages. At the first stage, a random sample of size $m$ $(0 \leqq m \leqq N)$ will be taken from the whole population $\Pi$. At the second stage, the remaining $N - m$ observations are to be allocated between the two strata $\Pi_1$ and $\Pi_2$. Hence, at the second stage, $n_i$ observations are taken from the stratum $\Pi_i$, where $n_1 + n_2 = N - m$. The problem is to find an optimal choice of the design constants $m, n_1,$ and $n_2$. Note that the value of $m$ must be chosen in advance of any sampling, whereas the constants $n_1$ and $n_2$ need not be chosen until the values of the first $m$ observations obtained from the whole population have been studied. In this paper, we shall develop effective approximations to the optimal sampling procedure for situations in which the total number $N$ of available observations is large and, therefore, the optimal number $m$ of observations which should be obtained at the first stage will also be large. The techniques which will be presented can be extended for studying populations which are composed of $k$ strata $(k \geqq 2)$, in each of which the observations have a normal distribution.
However, although the theory can be extended without difficulty, the actual computations become somewhat more complex, and we shall not consider these extensions. The optimal allocation of observations from the Bayesian point of view has also been studied by Ericson [2], [3] and Draper and Guttman [1]. Ericson [2] studied a related optimal one-stage stratified sampling scheme in which the proportion in each stratum is known. Draper and Guttman [1] considered the optimal allocation at the second stage of a two-stage process, extending [2]. Ericson [3] investigated an optimal two-stage design, different from ours, in a nonresponse context. Here we shall study the basic problem of finding the optimal choice of $m$ at the first stage.
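The conjugate structure described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, and it assumes that the stratum of every first-stage observation is recorded, so that the beta parameters are updated by the stratum counts and each normal prior is updated only by the observations from its own stratum. The variance formula used is the one that follows from the posterior independence of $\theta_1, \theta_2$, and $p$, namely $E(p^2)/h_1 + E(q^2)/h_2 + \mathrm{Var}(p)(\mu_1 - \mu_2)^2$.

```python
def update_normal(mu, h, r, xs):
    """Posterior (mean, precision) of theta_i given observations xs.

    mu, h : prior mean and prior precision of theta_i
    r     : known precision of a single observation in the stratum
    """
    n = len(xs)
    h_post = h + n * r
    mu_post = (h * mu + r * sum(xs)) / h_post
    return mu_post, h_post


def update_beta(alpha, beta, k1, k2):
    """Posterior beta parameters of p after k1 first-stage observations fall
    in stratum 1 and k2 in stratum 2 (assumes the stratum is observed)."""
    return alpha + k1, beta + k2


def theta_bar_moments(alpha, beta, mu1, h1, mu2, h2):
    """Mean and variance of theta_bar = p*theta_1 + q*theta_2 when
    p ~ Beta(alpha, beta), theta_i ~ Normal(mu_i, precision h_i), all independent."""
    s = alpha + beta
    ep, eq = alpha / s, beta / s                 # E(p), E(q)
    var_p = alpha * beta / (s * s * (s + 1.0))   # Var(p) = Var(q)
    ep2, eq2 = var_p + ep * ep, var_p + eq * eq  # E(p^2), E(q^2)
    mean = ep * mu1 + eq * mu2
    var = ep2 / h1 + eq2 / h2 + var_p * (mu1 - mu2) ** 2
    return mean, var


# Illustrative first stage with m = 3: two observations from stratum 1, one from stratum 2.
x1, x2 = [1.0, 3.0], [0.0]
a_post, b_post = update_beta(1.0, 1.0, len(x1), len(x2))
mu1_post, h1_post = update_normal(0.0, 1.0, 1.0, x1)
mu2_post, h2_post = update_normal(0.0, 1.0, 1.0, x2)
estimate, risk = theta_bar_moments(a_post, b_post, mu1_post, h1_post, mu2_post, h2_post)
```

Here `estimate` is the Bayes estimate of $\bar{\theta}$ (the posterior mean) and `risk` is the corresponding posterior variance if sampling stopped after the first stage. The numerical inputs are invented for illustration only.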