NTU Institute of Applied Mathematical Sciences (Master's Program), Year-102 (2013) General Entrance Exam: Probability and Statistics Solutions

Problem sheet: https://exam.lib.ntu.edu.tw/sites/default/files/exam/graduate/102/102061.pdf

Comments: Problems 2, 3, 5, and 6 are relatively easy. Problem 1 looks like a basic problem, but it requires a bit of technique. Problem 4 asks for an unbiased test, a type of problem that is rarely seen; most examinees probably would not know where to start.

1. Let X,Y be independent standard normal random variables. Find the distribution of 2XY / \sqrt{X^2+Y^2}.

(Approach) Use polar coordinates X=r\cos\theta, Y=r\sin\theta where r>0, 0 \leq \theta < 2\pi; then r and \theta are independent random variables. Since 2XY / \sqrt{X^2+Y^2}=r\sin2\theta, it suffices to show that \sin\theta and \sin 2\theta have the same distribution.

(Solution) Define X=r\cos\theta, Y=r\sin\theta where r\geq 0 and \theta \in [0,2\pi). Let g: \mathbb{R}^2 \to \mathbb{R} be an arbitrary bounded continuous function. Recall that the joint distribution of (r,\theta) is characterized by the expected value of g(r,\theta):

E[g(r,\theta)] = \iint_{(x,y) \in \mathbb{R}^2} g(r(x,y),\theta(x,y)) \frac{1}{2\pi} \exp(-\frac{x^2+y^2}{2}) dxdy,
where r(x,y) = \sqrt{x^2+y^2} and \theta(x,y) is given by the equation below:
\theta = \begin{cases} \arctan(y/x) &  (x\geq0,y\geq0) \\ \arctan(y/x) +\pi & (x<0) \\ \arctan(y/x) + 2\pi &  (x\geq0, y<0) \end{cases}
By a variable transformation, the right hand side can be further rewritten as:
= \int_{\theta \in [0,2\pi)} \int_{r>0} g(r,\theta) \frac{1}{2\pi} r \exp(-\frac{r^2}{2})drd\theta,
from which we can see that r and \theta are independent, with marginal distributions given by 1. the probability density function f_r(r) = r \exp(-\frac{r^2}{2})\mathbb{1}(r>0) for r, and 2. {\rm Unif}(0,2\pi) for \theta.

Recall that r\sin\theta = Y follows N(0,1), and that r, \theta are independent random variables. Therefore, if \sin\theta and \sin2\theta follow the same distribution, then r\sin2\theta = \frac{2XY}{\sqrt{X^2+Y^2}} also follows N(0,1).

Again let g: \mathbb{R}\to\mathbb{R} be an arbitrary bounded continuous function and consider E[g(U)] where U=\sin\theta:

E[g(U)] = E[g(\sin\theta)] = \int_{0}^{2\pi} g(\sin\theta) \frac{1}{2\pi} d\theta.

We divide the integral into three parts, over [0,\frac{\pi}{2}), [\frac{\pi}{2}, \frac{3\pi}{2}) and [\frac{3\pi}{2}, 2\pi), so that the map \theta \mapsto u = \sin\theta is one-to-one on each part. (The first and the third intervals will be merged later.)

The integral on the right-hand side can thus be split into three parts:
\int_{0}^{\pi/2} g(\sin\theta) \frac{1}{2\pi} d\theta + \int_{\pi/2}^{3\pi/2} g(\sin\theta) \frac{1}{2\pi} d\theta + \int_{3\pi/2}^{2\pi} g(\sin\theta) \frac{1}{2\pi} d\theta.

By the change of variables u = \sin\theta, for which |d\theta| = \frac{du}{\sqrt{1-u^2}} on each part, the three integrals become:
\int_{0}^{1} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}} du + \int_{-1}^{1} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}} du + \int_{-1}^{0} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}}du.

By merging the above three integrals, we have:
E[g(U)] = \int_{-1}^{1} g(u) \frac{1}{\pi} \frac{1}{\sqrt{1-u^2}} du,
which implies that the probability density function of U=\sin\theta is:
\frac{1}{\pi} \frac{1}{\sqrt{1-u^2}} \mathbb{1}_{[-1,1]}(u).

By repeating a similar argument for \sin2\theta (or simply noting that 2\theta \bmod 2\pi is again {\rm Unif}(0,2\pi)), we see that the distribution of \sin2\theta is identical to that of \sin\theta.

Finally, we conclude that the distribution of \frac{2XY}{\sqrt{X^2+Y^2}} is also standard normal.
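As a sanity check (not part of the exam solution), a short Monte Carlo simulation can confirm that 2XY/\sqrt{X^2+Y^2} behaves like a standard normal; the sample size and seed below are arbitrary choices.

```python
import random
import statistics

# Monte Carlo sanity check: sample X, Y ~ N(0,1) and verify that
# W = 2XY / sqrt(X^2 + Y^2) looks like a standard normal.
random.seed(0)
n = 200_000
samples = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = random.gauss(0.0, 1.0)
    samples.append(2 * x * y / (x * x + y * y) ** 0.5)

mean = statistics.fmean(samples)       # should be close to 0
var = statistics.pvariance(samples)    # should be close to 1
# For a standard normal, P(|W| <= 1.96) is about 0.95.
frac = sum(abs(w) <= 1.96 for w in samples) / n
```

Matching the mean, variance, and a central-interval probability is of course not a proof, but it catches algebra mistakes in the derivation cheaply.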

2. Let U_1, \ldots, U_n, \ldots be iid Uniform(0,1) random variables and X have the distribution:
P(X=x)=\frac{1}{(e-1)x!}\mathbb{1}_{\mathbb{N}}(x).
Find the distribution of Z=\min\{U_1, \ldots, U_X\}.

(Approach) Consider the CDF of Z. To compute it, first take the conditional expectation given X, then take the expectation over X.

(Solution) Note that
P(Z \leq z) = 1-P(Z>z).
Consider P(Z>z):
P(Z>z) = E[E[\mathbb{1}(Z>z)\mid X]] = E[P(Z>z\mid X)].
Consider the conditional probability P(Z>z\mid X):
P(Z>z\mid X) = \prod_{i=1}^X P(U_i > z) = \begin{cases} 1 & (z<0) \\ (1-z)^X & (0 \leq z \leq 1) \\ 0 & (z>1) \end{cases}
Note that for 0 \leq z \leq 1, we have
P(Z>z) = E[P(Z>z\mid X)] = E[(1-z)^X].
Furthermore,
E[(1-z)^X] = \frac{1}{e-1} \sum_{x=1}^\infty \frac{(1-z)^x}{x!} = \frac{e^{1-z}-1}{e-1}.
Finally, we conclude that the CDF of Z is as follows:
P(Z\leq z) = \begin{cases} 0 & (z<0) \\ 1-\frac{e^{1-z}-1}{e-1} & (0 \leq z \leq 1) \\ 1 & (z>1) \end{cases}.
We can also characterize the distribution of Z by its probability density function:
f_Z(z)= \frac{e}{e-1} e^{-z} \mathbb{1}_{[0,1]}(z).
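A quick simulation sketch (not part of the solution) can verify this CDF. It uses the fact that X here is a Poisson(1) random variable conditioned on X \geq 1; the truncation point and seed below are arbitrary.

```python
import math
import random

random.seed(1)

# Sampler for X with P(X = x) = 1 / ((e - 1) x!), x = 1, 2, ...,
# i.e. Poisson(1) conditioned on X >= 1; truncating at x = 19 is
# numerically negligible since the remaining tail mass is tiny.
weights = [1.0 / ((math.e - 1.0) * math.factorial(x)) for x in range(1, 20)]

def sample_x():
    u = random.random()
    cum = 0.0
    for x, w in enumerate(weights, start=1):
        cum += w
        if u <= cum:
            return x
    return len(weights)

def cdf(z):
    # The CDF derived above, valid for 0 <= z <= 1.
    return 1.0 - (math.exp(1.0 - z) - 1.0) / (math.e - 1.0)

n = 100_000
zs = [min(random.random() for _ in range(sample_x())) for _ in range(n)]
emp = sum(z <= 0.5 for z in zs) / n    # empirical P(Z <= 0.5)
```

The empirical probability at z = 0.5 should agree with the closed-form CDF up to Monte Carlo error.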

3. Let X_1, \ldots, X_n be a random sample from a population with probability density function:
f(x\mid \theta_0) = \theta_0 x^{\theta_0-1} \mathbb{1}_{(0,1)}(x),
where \theta_0>0. Find the UMVUE of \theta_0.

(Approach) The Beta distribution belongs to the exponential family. After finding a complete sufficient statistic T(\bm{x}) for \theta, construct from T(\bm{x}) a statistic whose expected value is \theta.

(Solution) Note that the joint probability density function of x_1, \ldots, x_n can be written as:
\frac{\theta^n}{x_1\cdots x_n} \exp((-\theta)\sum_{i=1}^n (-\ln x_i)) \mathbb{1}(0<x_{(1)}, x_{(n)}<1).
Therefore the distribution belongs to an exponential family. Furthermore, the natural parameter space \{-\theta \mid \theta > 0\} = (-\infty, 0) \subset \mathbb{R} contains an interior point (indeed an open interval), so \sum_{i=1}^n (-\ln x_i) is a complete sufficient statistic for \theta.

Next let us find the distribution of Y = -\ln X where X \sim {\rm Beta}(\theta,1). As before, the distribution of Y is characterized by the expectation of g(Y), where g(\cdot) is an arbitrary bounded continuous function.

Consider E[g(Y)]:
E[g(Y)] = E[g(-\ln X)] = \int_0^1 g(-\ln x) \theta x^{\theta-1} dx = \int_0^\infty g(y) \theta \exp(-\theta y) dy,
where the third equality follows from a change of variables. From the last expression we see that the probability density function of Y is \theta \exp(-\theta y) \mathbb{1}_{(0,\infty)}(y), so Y \sim {\rm Exp}(\theta) = {\rm Gamma}(1,\theta) (rate parameterization) with E[Y] = \frac{1}{\theta}.

By the reproductive property of the Gamma distribution, T \stackrel{\rm def}{=} \sum_{i=1}^n (-\ln X_i) \sim {\rm Gamma}(n,\theta).

Since E[\frac{1}{T}] = \frac{\theta}{n-1} for T \sim {\rm Gamma}(n,\theta), we have E[\frac{n-1}{T}] = \theta. Since T is a complete sufficient statistic and \frac{n-1}{T} is an unbiased estimator that is a function of T, the Lehmann–Scheffé theorem implies that \frac{n-1}{T} is the UMVUE for \theta.
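A small simulation (not part of the solution) can check the unbiasedness of (n-1)/T; the values of \theta, n, and the replication count below are arbitrary illustrative choices. It uses the fact that U^{1/\theta} \sim {\rm Beta}(\theta,1) for U \sim {\rm Unif}(0,1), so -\ln X \sim {\rm Exp}(\theta).

```python
import math
import random

random.seed(2)

theta = 2.5     # true parameter (arbitrary illustrative value)
n = 10          # sample size (arbitrary)
reps = 100_000

est_sum = 0.0
for _ in range(reps):
    # X ~ Beta(theta, 1) can be sampled as U^(1/theta) (inverse-CDF method),
    # hence -ln X = -(1/theta) ln U, an Exp(theta) draw.
    t = sum(-math.log(random.random()) / theta for _ in range(n))
    est_sum += (n - 1) / t

avg = est_sum / reps    # should be close to theta if (n-1)/T is unbiased
```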

4. Let X_1, \ldots, X_n \stackrel{\rm iid}{\sim} N(\theta, \sigma^2). Find a size-\alpha unbiased test for the hypotheses H_0: \theta \in [\theta_1, \theta_2] vs H_1: \theta \notin [\theta_1, \theta_2].

(Approach) The problem does not specify how the unbiased test should be constructed; the simplest approach is to use a randomized test.

(Solution) This problem does not require any specific form for the unbiased test, so let \varphi(\bm{x}) \stackrel{\rm def}{=} \alpha; that is, reject H_0 with probability \alpha regardless of the data. This randomized test has size \alpha and is unbiased for the given hypotheses.

Note that
\alpha = \sup_{\theta \in [\theta_1, \theta_2]} E_\theta[\varphi(\bm{x})] \leq \inf_{\theta \notin [\theta_1, \theta_2]} E_\theta[\varphi(\bm{x})] = \alpha.

So we conclude that the above test is an unbiased test.

(Supplement) One can also construct a non-randomized unbiased test. (To be added later; or see the exercises in Casella.)

5. Let X_1, \ldots, X_n \stackrel{\rm iid}{\sim} {\rm Po}(\lambda) and suppose that \lambda follows a Gamma distribution with parameters \alpha, \beta, which are known constants. Find the Bayes estimator of \lambda under the squared error loss function.

(Approach) Under squared error loss, the Bayes estimator is the posterior mean. The examiners probably want to know whether candidates understand the definition of a Bayes estimator, so we proceed from the definition.

(Solution) Let \delta(\bm{x}) be the desired statistic. We are required to find \delta(\bm{x}) such that the average risk

R(\delta) \stackrel{\rm def}{=} E_{\lambda\sim{\rm Gamma}(\alpha,\beta)}\big[E_{x_1,\ldots, x_n \stackrel{\rm iid}{\sim} {\rm Po}(\lambda)}[(\delta(\bm{x})-\lambda)^2\mid\lambda]\big]

is minimized. By swapping the order of integration, the expression above can be rewritten as:

E_{\bm{x} \sim m(\bm{x})}\big[E_{\lambda \sim \pi(\lambda\mid\bm{x})}[(\delta(\bm{x})-\lambda)^2\mid \bm{x}]\big],
where m(\bm{x}) denotes the marginal distribution of \bm{x} and \pi(\lambda\mid\bm{x}) the posterior distribution of \lambda.

In the above equation, let us consider the minimization of the conditional expectation given \bm{x}:

E[(\delta(\bm{x})-\lambda)^2\mid \bm{x}].

This can be rewritten as:
E[(\delta(\bm{x})-E[\lambda\mid\bm{x}]+E[\lambda\mid\bm{x}]-\lambda)^2\mid \bm{x}],

which can further be rearranged as:

= E[(\delta(\bm{x})-E[\lambda\mid\bm{x}])^2\mid \bm{x}]+ E[(\lambda-E[\lambda\mid\bm{x}])^2\mid \bm{x}].

Note that E[(\lambda-E[\lambda\mid\bm{x}])^2\mid \bm{x}] = V[\lambda\mid\bm{x}].

So we can rewrite it as:

= E[(\delta(\bm{x})-E[\lambda\mid\bm{x}])^2\mid \bm{x}]+ V[\lambda \mid \bm{x}] \geq V[\lambda\mid\bm{x}].

In the above inequality, the equality holds when \delta(\bm{x}) = E[\lambda\mid\bm{x}] (a.s.).

Therefore, the Bayes estimator under the squared error loss is the posterior mean. Now we consider the posterior distribution of \lambda. Note that

\pi(\lambda\mid \bm{x}) \propto f(\bm{x}\mid \lambda) \pi(\lambda),

where \propto denotes equality up to a factor that does not depend on \lambda.

The right hand side is:
\propto e^{-n\lambda} \lambda^{x_1+\ldots+x_n} \lambda^{\alpha-1} \exp(-\beta \lambda) = \lambda^{\sum_{i=1}^n x_i + \alpha-1}  \exp(-(n+\beta)\lambda),

which implies that the posterior distribution is {\rm Gamma}(\sum_{i=1}^n x_i + \alpha, n+\beta).

So the posterior mean is \frac{\sum_{i=1}^n x_i + \alpha}{n+\beta}, and this is the desired Bayes estimator.
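As a numerical sanity check (the data and hyperparameters below are arbitrary illustrative choices), we can compare the closed-form posterior mean with direct numerical integration of \lambda against the unnormalized posterior:

```python
import math

alpha, beta = 2.0, 3.0        # prior hyperparameters (arbitrary)
xs = [1, 0, 2, 4, 1]          # observed Poisson counts (arbitrary)
n, s = len(xs), sum(xs)

def unnormalized_posterior(lam):
    # Poisson likelihood times Gamma(alpha, beta) prior, with all
    # factors not depending on lambda dropped.
    return lam ** (s + alpha - 1) * math.exp(-(n + beta) * lam)

# A fine Riemann sum on (0, 20] is enough for a sanity check; the
# posterior Gamma(s + alpha, n + beta) has negligible mass beyond 20.
h = 1e-4
grid = [i * h for i in range(1, 200_000)]
num = sum(lam * unnormalized_posterior(lam) * h for lam in grid)
den = sum(unnormalized_posterior(lam) * h for lam in grid)

numeric_mean = num / den
closed_form = (s + alpha) / (n + beta)
```

The two values should agree to several decimal places, confirming the conjugacy computation.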

6. Let X_i \sim {\rm Bin}(n_i, p_i) ~ (i=1,\ldots,m) be independent random variables. Derive the likelihood ratio test for the hypotheses H_0: p_1 = \ldots = p_m vs H_1: \exists i \neq j ~{\rm s.t.}~ p_i \neq p_j.

(Approach) Construct the test according to the definition of the likelihood ratio test. At the end we can also discuss the asymptotic distribution of the test statistic in order to give a concrete rejection region.

(Solution)
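The solution is omitted above; as a standard sketch (not the original author's writeup): the unrestricted MLEs are \hat{p}_i = x_i/n_i, while under H_0 the pooled MLE is \hat{p} = \sum_i x_i / \sum_i n_i, so the likelihood ratio is \lambda(\bm{x}) = \prod_i \hat{p}^{x_i}(1-\hat{p})^{n_i-x_i} / \prod_i \hat{p}_i^{x_i}(1-\hat{p}_i)^{n_i-x_i}, and by Wilks' theorem -2\ln\lambda(\bm{x}) is asymptotically \chi^2_{m-1} under H_0, giving the rejection region -2\ln\lambda(\bm{x}) > \chi^2_{m-1,\alpha}. A minimal implementation of the statistic -2\ln\lambda, assuming 0 < \sum_i x_i < \sum_i n_i:

```python
import math

def lrt_statistic(xs, ns):
    """-2 log likelihood ratio for H0: p_1 = ... = p_m,
    with independent X_i ~ Bin(n_i, p_i)."""
    pooled = sum(xs) / sum(ns)          # MLE under H0
    stat = 0.0
    for x, n in zip(xs, ns):
        p_i = x / n                     # unrestricted MLE
        # Success and failure contributions; terms with zero counts
        # vanish (0 * log 0 is taken as 0).
        for count, p1, p0 in ((x, p_i, pooled), (n - x, 1 - p_i, 1 - pooled)):
            if count > 0:
                stat += 2 * count * math.log(p1 / p0)
    return stat

# If every sample proportion equals the pooled one, the statistic is 0;
# heterogeneous proportions give a strictly positive statistic.
same = lrt_statistic([2, 4, 6], [10, 20, 30])   # all p_hat_i = 0.2
diff = lrt_statistic([1, 18], [20, 20])
```

The statistic would then be compared with the upper-\alpha quantile of \chi^2_{m-1} (here m-1 = 1 for the second example).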
