NTU Institute of Applied Mathematical Sciences (Master's Program), Year-102 (2013) General Entrance Exam: Probability and Statistics Solutions

Problem sheet: https://exam.lib.ntu.edu.tw/sites/default/files/exam/graduate/102/102061.pdf

Comments: Problems 2, 3, 5, and 6 are relatively easy. Problem 1 looks like a basic problem, but it requires a bit of technique. Problem 4 asks for an unbiased test, a type of problem that is rarely seen; most examinees probably would not know where to start.

1. Let X,Y be independent standard normal random variables. Find the distribution of 2XY / \sqrt{X^2+Y^2}.

(Approach) Use polar coordinates X=r\cos\theta, Y=r\sin\theta where r>0, 0 \leq \theta < 2\pi; then r and \theta are independent random variables. Since 2XY / \sqrt{X^2+Y^2}=r\sin2\theta, it suffices to show that \sin\theta and \sin 2\theta have the same distribution.

(Solution) Define X=r\cos\theta, Y=r\sin\theta where r\geq 0 and \theta \in [0,2\pi). Let g: \mathbb{R}^2 \to \mathbb{R} be an arbitrary bounded continuous function. Recall that the joint distribution of (r,\theta) is characterized by the expected value of g(r,\theta):

E[g(r,\theta)] = \iint_{(x,y) \in \mathbb{R}^2} g(r(x,y),\theta(x,y)) \frac{1}{2\pi} \exp(-\frac{x^2+y^2}{2}) dxdy,
where r(x,y) = \sqrt{x^2+y^2} and \theta(x,y) is given by the equation below:
\theta = \begin{cases} \arctan(y/x) &  (x\geq0,y\geq0) \\ \arctan(y/x) +\pi & (x<0) \\ \arctan(y/x) + 2\pi &  (x\geq0, y<0) \end{cases}
By a variable transformation, the right hand side can be further rewritten as:
= \int_{\theta \in [0,2\pi)} \int_{r>0} g(r,\theta) \frac{1}{2\pi} r \exp(-\frac{r^2}{2})drd\theta,
from which we can see that r and \theta are independent, with marginal distributions given by 1. the probability density function f_r(r) = r \exp(-\frac{r^2}{2})\mathbb{1}(r>0) for r, and 2. {\rm Unif}(0,2\pi) for \theta.

Recall that r\sin\theta = Y follows N(0,1), and that r, \theta are independent random variables. Therefore, if \sin\theta and \sin2\theta follow the same distribution, then r\sin2\theta = \frac{2XY}{\sqrt{X^2+Y^2}} also follows N(0,1).

Again let g: \mathbb{R}\to\mathbb{R} be an arbitrary bounded continuous function and consider E[g(U)] where U=\sin\theta:

E[g(U)] = E[g(\sin\theta)] = \int_{0}^{2\pi} g(\sin\theta) \frac{1}{2\pi} d\theta.

We divide the integral into three parts, over [0,\frac{\pi}{2}), [\frac{\pi}{2}, \frac{3\pi}{2}) and [\frac{3\pi}{2}, 2\pi), so that the map \theta \mapsto u = \sin\theta is one-to-one on each part. (The first and the third intervals will be merged later.)

The integral on the right-hand side can thus be split into three parts:
\int_{0}^{\pi/2} g(\sin\theta) \frac{1}{2\pi} d\theta + \int_{\pi/2}^{3\pi/2} g(\sin\theta) \frac{1}{2\pi} d\theta + \int_{3\pi/2}^{2\pi} g(\sin\theta) \frac{1}{2\pi} d\theta.

By the change of variables u = \sin\theta, for which |d\theta| = \frac{du}{\sqrt{1-u^2}} on each part, the three integrals become:
\int_{0}^{1} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}} du + \int_{-1}^{1} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}} du + \int_{-1}^{0} g(u) \frac{1}{2\pi} \frac{1}{\sqrt{1-u^2}}du.

By merging the above three integrals, we have:
E[g(U)] = \int_{-1}^{1} g(u) \frac{1}{\pi} \frac{1}{\sqrt{1-u^2}} du,
which implies that the probability density function of U=\sin\theta is:
\frac{1}{\pi} \frac{1}{\sqrt{1-u^2}} \mathbb{1}_{[-1,1]}(u).

By repeating a similar argument for \sin2\theta (or simply noting that 2\theta \bmod 2\pi is again {\rm Unif}(0,2\pi)), we see that the distribution of \sin2\theta is identical to that of \sin\theta.

Finally, we conclude that the distribution of \frac{2XY}{\sqrt{X^2+Y^2}} is also standard normal.
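As a sanity check (not part of the exam solution), a short Monte Carlo simulation can confirm that 2XY/\sqrt{X^2+Y^2} behaves like a standard normal; the sample size and seed below are arbitrary choices.

```python
import random
import statistics

# Monte Carlo sanity check: sample X, Y ~ N(0,1) and verify that
# W = 2XY / sqrt(X^2 + Y^2) looks like a standard normal.
random.seed(0)
n = 200_000
samples = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    y = random.gauss(0.0, 1.0)
    samples.append(2 * x * y / (x * x + y * y) ** 0.5)

mean = statistics.fmean(samples)       # should be close to 0
var = statistics.pvariance(samples)    # should be close to 1
# For a standard normal, P(|W| <= 1.96) is about 0.95.
frac = sum(abs(w) <= 1.96 for w in samples) / n
```

Matching the mean, variance, and a central-interval probability is of course not a proof, but it catches algebra mistakes in the derivation cheaply.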

2. Let U_1, \ldots, U_n, \ldots be iid Uniform(0,1) random variables and X have the distribution:
P(X=x)=\frac{1}{(e-1)x!}\mathbb{1}_{\mathbb{N}}(x).
Find the distribution of Z=\min\{U_1, \ldots, U_X\}.

(Approach) Consider the CDF of Z. To compute it, first take the conditional expectation given X, then take the expectation over X.

(Solution) Note that
P(Z \leq z) = 1-P(Z>z).
Consider P(Z>z):
P(Z>z) = E[E[\mathbb{1}(Z>z)\mid X]] = E[P(Z>z\mid X)].
Consider the conditional probability P(Z>z\mid X):
P(Z>z\mid X) = \prod_{i=1}^X P(U_i > z) = \begin{cases} 1 & (z<0) \\ (1-z)^X & (0 \leq z \leq 1) \\ 0 & (z>1) \end{cases}
Note that for 0 \leq z \leq 1, we have
P(Z>z) = E[P(Z>z\mid X)] = E[(1-z)^X].
Furthermore,
E[(1-z)^X] = \frac{1}{e-1} \sum_{x=1}^\infty \frac{(1-z)^x}{x!} = \frac{e^{1-z}-1}{e-1}.
Finally, we conclude that the CDF of Z is as follows:
P(Z\leq z) = \begin{cases} 0 & (z<0) \\ 1-\frac{e^{1-z}-1}{e-1} & (0 \leq z \leq 1) \\ 1 & (z>1) \end{cases}.
We can also characterize the distribution of Z by its probability density function:
f_Z(z)= \frac{e}{e-1} e^{-z} \mathbb{1}_{[0,1]}(z).
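A quick simulation sketch (not part of the solution) can verify this CDF. It uses the fact that X here is a Poisson(1) random variable conditioned on X \geq 1; the truncation point and seed below are arbitrary.

```python
import math
import random

random.seed(1)

# Sampler for X with P(X = x) = 1 / ((e - 1) x!), x = 1, 2, ...,
# i.e. Poisson(1) conditioned on X >= 1; truncating at x = 19 is
# numerically negligible since the remaining tail mass is tiny.
weights = [1.0 / ((math.e - 1.0) * math.factorial(x)) for x in range(1, 20)]

def sample_x():
    u = random.random()
    cum = 0.0
    for x, w in enumerate(weights, start=1):
        cum += w
        if u <= cum:
            return x
    return len(weights)

def cdf(z):
    # The CDF derived above, valid for 0 <= z <= 1.
    return 1.0 - (math.exp(1.0 - z) - 1.0) / (math.e - 1.0)

n = 100_000
zs = [min(random.random() for _ in range(sample_x())) for _ in range(n)]
emp = sum(z <= 0.5 for z in zs) / n    # empirical P(Z <= 0.5)
```

The empirical probability at z = 0.5 should agree with the closed-form CDF up to Monte Carlo error.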

3. Let X_1, \ldots, X_n be a random sample from a population with probability density function:
f(x\mid \theta_0) = \theta_0 x^{\theta_0-1} \mathbb{1}_{(0,1)}(x),
where \theta_0>0. Find the UMVUE of \theta_0.

(Approach) The Beta distribution belongs to the exponential family. After finding a complete sufficient statistic T(\bm{x}) for \theta, construct from T(\bm{x}) a statistic whose expected value is \theta.

(Solution) Note that the joint probability density function of x_1, \ldots, x_n can be written as:
\frac{\theta^n}{x_1\cdots x_n} \exp((-\theta)\sum_{i=1}^n (-\ln x_i)) \mathbb{1}(0<x_{(1)}, x_{(n)}<1).
Therefore the distribution belongs to an exponential family. Furthermore, the natural parameter space \{-\theta \mid \theta > 0\} = (-\infty, 0) \subset \mathbb{R} contains an interior point (indeed an open interval), so \sum_{i=1}^n (-\ln x_i) is a complete sufficient statistic for \theta.

Next let us find the distribution of Y = -\ln X where X \sim {\rm Beta}(\theta,1). As before, the distribution of Y is characterized by the expectation of g(Y), where g(\cdot) is an arbitrary bounded continuous function.

Consider E[g(Y)]:
E[g(Y)] = E[g(-\ln X)] = \int_0^1 g(-\ln x) \theta x^{\theta-1} dx = \int_0^\infty g(y) \theta \exp(-\theta y) dy,
where the third equality follows from a change of variables. From the last expression we see that the probability density function of Y is \theta \exp(-\theta y) \mathbb{1}_{(0,\infty)}(y), so Y \sim {\rm Exp}(\theta) = {\rm Gamma}(1,\theta) (rate parameterization) with E[Y] = \frac{1}{\theta}.

By the reproductive property of the Gamma distribution, T \stackrel{\rm def}{=} \sum_{i=1}^n (-\ln X_i) \sim {\rm Gamma}(n,\theta).

Since E[\frac{1}{T}] = \frac{\theta}{n-1} for T \sim {\rm Gamma}(n,\theta), we have E[\frac{n-1}{T}] = \theta. Since T is a complete sufficient statistic and \frac{n-1}{T} is an unbiased estimator that is a function of T, the Lehmann–Scheffé theorem implies that \frac{n-1}{T} is the UMVUE for \theta.
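A small simulation (not part of the solution) can check the unbiasedness of (n-1)/T; the values of \theta, n, and the replication count below are arbitrary illustrative choices. It uses the fact that U^{1/\theta} \sim {\rm Beta}(\theta,1) for U \sim {\rm Unif}(0,1), so -\ln X \sim {\rm Exp}(\theta).

```python
import math
import random

random.seed(2)

theta = 2.5     # true parameter (arbitrary illustrative value)
n = 10          # sample size (arbitrary)
reps = 100_000

est_sum = 0.0
for _ in range(reps):
    # X ~ Beta(theta, 1) can be sampled as U^(1/theta) (inverse-CDF method),
    # hence -ln X = -(1/theta) ln U, an Exp(theta) draw.
    t = sum(-math.log(random.random()) / theta for _ in range(n))
    est_sum += (n - 1) / t

avg = est_sum / reps    # should be close to theta if (n-1)/T is unbiased
```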

4. Let X_1, \ldots, X_n \stackrel{\rm iid}{\sim} N(\theta, \sigma^2). Find a size-\alpha unbiased test for the hypotheses H_0: \theta \in [\theta_1, \theta_2] vs H_1: \theta \notin [\theta_1, \theta_2].

(Approach) The problem does not specify how the unbiased test should be constructed; the simplest approach is to use a randomized test.

(Solution) This problem does not require any specific form for the unbiased test, so let \varphi(\bm{x}) \stackrel{\rm def}{=} \alpha; that is, reject H_0 with probability \alpha regardless of the data. This randomized test has size \alpha and is unbiased for the given hypotheses.

Note that
\alpha = \sup_{\theta \in [\theta_1, \theta_2]} E_\theta[\varphi(\bm{x})] \leq \inf_{\theta \notin [\theta_1, \theta_2]} E_\theta[\varphi(\bm{x})] = \alpha.

So we conclude that the above test is an unbiased test.

(Supplement) One can also construct a non-randomized unbiased test. (To be added later; or see the exercises in Casella.)

5. Let X_1, \ldots, X_n \stackrel{\rm iid}{\sim} {\rm Po}(\lambda) and suppose that \lambda follows a Gamma distribution with parameters \alpha, \beta, which are known constants. Find the Bayes estimator of \lambda under the squared error loss function.

(Approach) Under squared error loss, the Bayes estimator is the posterior mean. The examiners probably want to know whether candidates understand the definition of a Bayes estimator, so we proceed from the definition.

(Solution) Let \delta(\bm{x}) be the desired statistic. We are required to find \delta(\bm{x}) such that the average risk

R(\delta) \stackrel{\rm def}{=} E_{\lambda\sim{\rm Gamma}(\alpha,\beta)}\big[E_{x_1,\ldots, x_n \stackrel{\rm iid}{\sim} {\rm Po}(\lambda)}[(\delta(\bm{x})-\lambda)^2\mid\lambda]\big]

is minimized. By swapping the order of integration, the expression above can be rewritten as:

E_{\bm{x} \sim m(\bm{x})}\big[E_{\lambda \sim \pi(\lambda\mid\bm{x})}[(\delta(\bm{x})-\lambda)^2\mid \bm{x}]\big],
where m(\bm{x}) denotes the marginal distribution of \bm{x} and \pi(\lambda\mid\bm{x}) the posterior distribution of \lambda.

In the above equation, let us consider the minimization of the conditional expectation given \bm{x}:

E[(\delta(\bm{x})-\lambda)^2\mid \bm{x}].

This can be rewritten as:
E[(\delta(\bm{x})-E[\lambda\mid\bm{x}]+E[\lambda\mid\bm{x}]-\lambda)^2\mid \bm{x}],

which can further be rearranged as:

= E[(\delta(\bm{x})-E[\lambda\mid\bm{x}])^2\mid \bm{x}]+ E[(\lambda-E[\lambda\mid\bm{x}])^2\mid \bm{x}].

Note that E[(\lambda-E[\lambda\mid\bm{x}])^2\mid \bm{x}] = V[\lambda\mid\bm{x}].

So we can rewrite it as:

= E[(\delta(\bm{x})-E[\lambda\mid\bm{x}])^2\mid \bm{x}]+ V[\lambda \mid \bm{x}] \geq V[\lambda\mid\bm{x}].

In the above inequality, the equality holds when \delta(\bm{x}) = E[\lambda\mid\bm{x}] (a.s.).

Therefore, the Bayes estimator under the squared error loss is the posterior mean. Now we consider the posterior distribution of \lambda. Note that

\pi(\lambda\mid \bm{x}) \propto f(\bm{x}\mid \lambda) \pi(\lambda),

where \propto denotes equality up to a factor that does not depend on \lambda.

The right hand side is:
\propto e^{-n\lambda} \lambda^{x_1+\ldots+x_n} \lambda^{\alpha-1} \exp(-\beta \lambda) = \lambda^{\sum_{i=1}^n x_i + \alpha-1}  \exp(-(n+\beta)\lambda),

which implies that the posterior distribution is {\rm Gamma}(\sum_{i=1}^n x_i + \alpha, n+\beta).

So the posterior mean is \frac{\sum_{i=1}^n x_i + \alpha}{n+\beta}, and this is the desired Bayes estimator.
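As a numerical sanity check (the data and hyperparameters below are arbitrary illustrative choices), we can compare the closed-form posterior mean with direct numerical integration of \lambda against the unnormalized posterior:

```python
import math

alpha, beta = 2.0, 3.0        # prior hyperparameters (arbitrary)
xs = [1, 0, 2, 4, 1]          # observed Poisson counts (arbitrary)
n, s = len(xs), sum(xs)

def unnormalized_posterior(lam):
    # Poisson likelihood times Gamma(alpha, beta) prior, with all
    # factors not depending on lambda dropped.
    return lam ** (s + alpha - 1) * math.exp(-(n + beta) * lam)

# A fine Riemann sum on (0, 20] is enough for a sanity check; the
# posterior Gamma(s + alpha, n + beta) has negligible mass beyond 20.
h = 1e-4
grid = [i * h for i in range(1, 200_000)]
num = sum(lam * unnormalized_posterior(lam) * h for lam in grid)
den = sum(unnormalized_posterior(lam) * h for lam in grid)

numeric_mean = num / den
closed_form = (s + alpha) / (n + beta)
```

The two values should agree to several decimal places, confirming the conjugacy computation.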

6. Let X_i \sim {\rm Bin}(n_i, p_i) ~ (i=1,\ldots,m) be independent random variables. Derive the likelihood ratio test for the hypotheses H_0: p_1 = \ldots = p_m vs H_1: \exists i \neq j ~{\rm s.t.}~ p_i \neq p_j.

(Approach) Construct the test according to the definition of the likelihood ratio test. At the end we can also discuss the asymptotic distribution of the test statistic in order to give a concrete rejection region.

(Solution)
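The solution is omitted above; as a standard sketch (not the original author's writeup): the unrestricted MLEs are \hat{p}_i = x_i/n_i, while under H_0 the pooled MLE is \hat{p} = \sum_i x_i / \sum_i n_i, so the likelihood ratio is \lambda(\bm{x}) = \prod_i \hat{p}^{x_i}(1-\hat{p})^{n_i-x_i} / \prod_i \hat{p}_i^{x_i}(1-\hat{p}_i)^{n_i-x_i}, and by Wilks' theorem -2\ln\lambda(\bm{x}) is asymptotically \chi^2_{m-1} under H_0, giving the rejection region -2\ln\lambda(\bm{x}) > \chi^2_{m-1,\alpha}. A minimal implementation of the statistic -2\ln\lambda, assuming 0 < \sum_i x_i < \sum_i n_i:

```python
import math

def lrt_statistic(xs, ns):
    """-2 log likelihood ratio for H0: p_1 = ... = p_m,
    with independent X_i ~ Bin(n_i, p_i)."""
    pooled = sum(xs) / sum(ns)          # MLE under H0
    stat = 0.0
    for x, n in zip(xs, ns):
        p_i = x / n                     # unrestricted MLE
        # Success and failure contributions; terms with zero counts
        # vanish (0 * log 0 is taken as 0).
        for count, p1, p0 in ((x, p_i, pooled), (n - x, 1 - p_i, 1 - pooled)):
            if count > 0:
                stat += 2 * count * math.log(p1 / p0)
    return stat

# If every sample proportion equals the pooled one, the statistic is 0;
# heterogeneous proportions give a strictly positive statistic.
same = lrt_statistic([2, 4, 6], [10, 20, 30])   # all p_hat_i = 0.2
diff = lrt_statistic([1, 18], [20, 20])
```

The statistic would then be compared with the upper-\alpha quantile of \chi^2_{m-1} (here m-1 = 1 for the second example).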
