Problems: https://exam.lib.ntu.edu.tw/sites/default/files/exam/graduate/103/103061.pdf
1 . Let X_1, X_2, X_3 be a random sample from {\rm Po}(\lambda). Moreover, let Y_1 = X_1+X_3, Y_2=X_2+X_3 and Z_i = \mathbb{1}_{\{Y_i=0\}}. Compute the correlation between Z_1, Z_2.
(Solution)
First, consider V[Z_1]. It is important to note that Z_1^2 = Z_1. So we have
V[Z_1] = E[Z_1^2] - E[Z_1]^2 = E[Z_1] - E[Z_1]^2 = E[Z_1] (1-E[Z_1]).
Here, since X_1 + X_3 \sim {\rm Po}(2\lambda),
E[Z_1] = P(Y_1=0) = P(X_1+X_3=0) = e^{-2\lambda}.
So we have:
V[Z_1] = e^{-2\lambda} (1-e^{-2\lambda}).
By a similar argument, we also have:
V[Z_2] = e^{-2\lambda} (1-e^{-2\lambda}).
So now it is sufficient to find Cov[Z_1, Z_2]:
Cov[Z_1,Z_2] = E[Z_1Z_2]-E[Z_1]E[Z_2] = P(Y_1=0, Y_2=0) - e^{-4\lambda}.
Here P(Y_1=0, Y_2=0) can further be rewritten as:
= P(X_1+X_3=0, X_2+X_3=0) = P(X_1=0, X_2=0, X_3=0) = e^{-3\lambda}.
Finally
\rho = \frac{Cov[Z_1,Z_2]}{V[Z_1]^{1/2} V[Z_2]^{1/2}} = \frac{e^{-3\lambda}-e^{-4\lambda}}{e^{-2\lambda} (1- e^{-2\lambda})} = \frac{e^{-\lambda}}{1+e^{-\lambda}}
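As a quick sanity check of this closed form (not part of the exam solution), here is a Monte Carlo sketch in Python; the value \lambda = 0.5 and the sample size are illustrative choices:

```python
import numpy as np

# Monte Carlo check of rho = e^{-lambda} / (1 + e^{-lambda}).
# lambda = 0.5 and n = 200_000 are illustrative choices.
rng = np.random.default_rng(0)
lam, n = 0.5, 200_000

x1, x2, x3 = rng.poisson(lam, (3, n))
z1 = (x1 + x3 == 0).astype(float)   # Z_1 = 1{Y_1 = 0}
z2 = (x2 + x3 == 0).astype(float)   # Z_2 = 1{Y_2 = 0}

empirical = np.corrcoef(z1, z2)[0, 1]
theoretical = np.exp(-lam) / (1 + np.exp(-lam))
print(empirical, theoretical)
```

The empirical and theoretical correlations should agree to about two decimal places at this sample size.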
2 . Let (X,Y)^\top be a bivariate random vector with finite variances. Show that
a.) Cov[X, Y-E[Y\mid X]]=0
b.) V[Y-E[Y\mid X]] = E[V[Y\mid X]]
(Method) The law of iterated expectations, i.e., EE[Y\mid X] = E[Y].
Consider a probability space (\Omega, \mathcal{F}, P) and let X, Y be random variables, i.e., measurable functions on that space. Consider the sub-\sigma-algebra of \mathcal{F} defined by \sigma[X] \stackrel{\rm def}{=} \{X^{-1}(B) \mid B \in \mathcal{B}\}, where \mathcal{B} is the Borel \sigma-algebra.
In measure-theoretic probability, the conditional expectation is defined as follows:
E[Y \mid X] is the \sigma[X]-measurable function (which can be thought of as g(X) for some Borel function g) such that for every A \in \sigma[X]:
\int_{A} E[Y \mid X] dP = \int_{A} Y dP.
When A = \Omega, this reduces to the law of iterated expectations.
(Solution a.) The left-hand side is
Cov[X, Y-E[Y\mid X]] = Cov[X,Y] - Cov[X, E[Y \mid X]].
Note that Cov[X, E[Y \mid X]] can be rewritten as:
= E[X \cdot E[Y \mid X]] - E[X] \cdot EE[Y\mid X] = EE[XY \mid X] - E[X]E[Y] = E[XY]-E[X]E[Y],
which is equal to Cov[X,Y], and we have the desired result.
(Solution b.) The left-hand side is:
V[Y-E[Y\mid X]] = E[(Y-E[Y\mid X])^2] - E[Y - E[Y \mid X]]^2 = E[(Y-E[Y\mid X])^2] - 0^2,
since E[Y - E[Y \mid X]] = E[Y] - EE[Y \mid X] = 0.
Furthermore, by the law of iterated expectations, the remaining term is:
= E[E[(Y-E[Y\mid X])^2 \mid X]] = E[V[Y \mid X]],
which is our desired conclusion.
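Both identities can be checked numerically in a toy model where E[Y\mid X] has a closed form. The model below (my illustrative choice, not from the problem) takes X \sim N(0,1) and Y = X^2 + \varepsilon with \varepsilon \sim N(0,1) independent, so E[Y\mid X] = X^2 and V[Y\mid X] = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Toy model (illustrative): X ~ N(0,1), eps ~ N(0,1) independent,
# Y = X**2 + eps, so E[Y|X] = X**2 and V[Y|X] = 1 for every x.
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)
resid = y - x**2                    # Y - E[Y|X]

cov_xr = np.cov(x, resid)[0, 1]     # part a: should be ~ 0
var_resid = resid.var()             # part b: should be ~ E[V[Y|X]] = 1
print(cov_xr, var_resid)
```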
3 . Let (X_1, Y_1), \ldots, (X_n, Y_n) be a random sample from a bivariate normal distribution with correlation \rho. Using the fact that \sqrt{n}(r-\rho) \stackrel{d}{\rightarrow} N(0, (1-\rho^2)^2), where r is the sample correlation coefficient, find a transformation g(r) that converges to a normal distribution with constant variance.
(Method) This is known as a variance stabilizing transformation. We apply the \delta-method and solve a simple differential equation.
(Solution) Let g: \mathbb{R} \mapsto \mathbb{R} be a differentiable function. By the \delta-method, we have
\sqrt{n}(g(r) - g(\rho)) \stackrel{d}{\rightarrow} N(0, (1-\rho^2)^2 \cdot g'(\rho)^2).
Since we want the limiting variance (1-\rho^2)^2 \cdot g'(\rho)^2 not to depend on \rho, we may, for example, set:
(1-\rho^2)^2 \cdot g'(\rho)^2 = 1,
and thus one of the solutions is:
g'(\rho) = \frac{1}{1-\rho^2} = \frac{1}{2}\left(\frac{1}{1-\rho}+\frac{1}{1+\rho}\right).
By solving this differential equation, we can eventually consider g(\rho) = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right) (Fisher's z-transformation), which satisfies the requirement of the problem.
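A simulation sketch of the stabilizing effect: for several values of \rho, the variance of \sqrt{n}\, g(r) with g(r) = {\rm arctanh}(r) should be roughly 1 regardless of \rho (in fact about n/(n-3) for finite n). The sample size n = 200 and the \rho grid below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 3000

results = {}
for rho in (0.0, 0.5, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]
    z = np.empty(reps)
    for i in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
        z[i] = 0.5 * np.log((1 + r) / (1 - r))   # g(r) = arctanh(r)
    results[rho] = n * z.var()   # variance of sqrt(n) * g(r), should be ~ 1
    print(rho, results[rho])
```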
4 . Let X_1, \ldots, X_n be a random sample from a population with a probability density function f(x\mid \theta) = \theta x^{\theta-1} \mathbb{1}_{(0,1)}(x). Find the UMVUE for \theta.
(Solution) See the solution to the 102 academic year general exam.
5 . Let X_1, \ldots, X_n be a random sample from N(\theta, \sigma^2). Find a size \alpha unbiased test for hypotheses: H_0: \theta \in [\theta_1, \theta_2] vs H_1: \theta \notin [\theta_1, \theta_2].
(Solution) See the solution to the 102 academic year general exam.
6 . Let X_1, \ldots, X_n be a random sample from {\rm Beta}(\theta,1) and let \theta have the prior distribution {\rm Gamma}(\alpha, \beta), where \alpha,\beta are known constants. Find the Bayes estimator for \theta under the squared error loss function.
(Method) Compute the posterior mean of \theta.
(Solution) Let \delta(\bm{x}) be an estimator. We are required to find \delta(\bm{x}) such that the average risk:
R(\delta) \stackrel{\rm def}{=} E[E[(\delta(\bm{x})-\theta)^2\mid \theta]_{x_1,\ldots, x_n \sim {\rm Beta}(\theta,1)}]_{\theta\sim{\rm Gamma}(\alpha,\beta)}
is minimized. By swapping the order of integration, the above can be written as follows:
E[E[(\delta(\bm{x})-\theta)^2\mid \bm{x}]_{\theta \sim \pi(\theta\mid \bm{x})}]_{\bm{x} \sim m(\bm{x})}.
In the above equation, let us consider minimizing the conditional expectation given \bm{x}:
E[(\delta(\bm{x})-\theta)^2\mid \bm{x}].
This can be rewritten as:
E[(\delta(\bm{x})-E[\theta\mid\bm{x}]+E[\theta\mid\bm{x}]-\theta)^2\mid \bm{x}],
which can further be rearranged as:
= E[(\delta(\bm{x})-E[\theta\mid\bm{x}])^2\mid \bm{x}] + E[(\theta-E[\theta\mid\bm{x}])^2\mid \bm{x}],
where the cross term vanishes because E[\theta - E[\theta\mid\bm{x}] \mid \bm{x}] = 0. Note that E[(\theta-E[\theta\mid\bm{x}])^2\mid \bm{x}] = V[\theta\mid\bm{x}].
So we can rewrite it as:
= E[(\delta(\bm{x})-E[\theta\mid\bm{x}])^2\mid \bm{x}] + V[\theta \mid \bm{x}] \geq V[\theta\mid\bm{x}].
In the above inequality, equality holds when \delta(\bm{x}) = E[\theta\mid\bm{x}] (a.s.).
Therefore, the Bayes estimator under the squared error loss is the posterior mean. Now we consider the posterior distribution of \theta. Note that
\pi(\theta\mid \bm{x}) \propto f(\bm{x}\mid \theta) \pi(\theta), where \propto denotes equality up to a factor that does not depend on \theta.
The right-hand side is:
\propto \theta^n (x_1 \cdots x_n)^\theta \cdot \theta^{\alpha-1} \exp(-\beta \theta),
since f(\bm{x}\mid \theta) = \prod_{i=1}^n \theta x_i^{\theta-1} \propto \theta^n (x_1 \cdots x_n)^\theta. This can be rearranged as:
= \theta^{n+\alpha-1} \exp\left(-\left(\beta-\sum_{i=1}^n \ln x_i\right)\theta\right).
This implies that the posterior distribution is {\rm Gamma}(n+\alpha, \beta-\sum_{i=1}^n \ln x_i) (shape-rate parameterization); note that \beta-\sum_{i=1}^n \ln x_i > 0 since each x_i \in (0,1).
So its posterior mean is \frac{n + \alpha}{\beta-\sum_{i=1}^n \ln x_i}. And this is the desired Bayes estimator.
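The closed-form posterior mean can be cross-checked against a brute-force quadrature of the unnormalized posterior. The hyperparameters and the true \theta below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, n = 2.0, 3.0, 20
theta_true = 1.5                     # illustrative value used only to simulate data

# Beta(theta, 1) has cdf x**theta on (0,1), so sample by inverse transform.
x = rng.uniform(size=n) ** (1.0 / theta_true)
s = np.log(x).sum()                  # sum of ln x_i (negative)

# Closed-form Bayes estimator derived above: (n + alpha) / (beta - sum ln x_i).
closed = (n + alpha) / (beta - s)

# Numerical check: posterior mean by quadrature over a theta grid.
theta = np.linspace(1e-6, 30.0, 200_001)
log_post = (n + alpha - 1) * np.log(theta) - (beta - s) * theta
w = np.exp(log_post - log_post.max())        # unnormalized posterior density
numerical = (theta * w).sum() / w.sum()

print(closed, numerical)
```

The two values should agree to several decimal places, confirming the conjugate update.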
7 . Let X \sim f(x) and generate Y_1, \ldots, Y_n \stackrel{\rm iid}{\sim} g(y), where f, g are probability density functions. Given Y_1, \ldots, Y_n, define a random variable X^* by P(X^*=Y_i \mid Y_1, \ldots, Y_n) = q_i, where
q_i = \frac{f(Y_i)/g(Y_i)}{\sum_{j=1}^n f(Y_j)/g(Y_j)}.
Show that P(X^* \leq x \mid Y_1, \ldots, Y_n) \stackrel{P}{\rightarrow} P(X \leq x).
(Method) Apply the law of large numbers.
(Solution) First, it is important to note that P(X^* \leq x \mid Y_1, \ldots, Y_n) equals the sum of those q_i for which Y_i \leq x. So we have
P(X^* \leq x \mid Y_1, \ldots, Y_n) = \sum_{i=1}^n q_i \mathbb{1}(Y_i \leq x).
Let A_i \stackrel{\rm def}{=} f(Y_i)/g(Y_i) \cdot \mathbb{1}(Y_i \leq x) and B_i \stackrel{\rm def}{=} f(Y_i)/g(Y_i), so that the sum above can be rearranged as:
\sum_{i=1}^n q_i \mathbb{1}(Y_i \leq x) = \frac{\frac{1}{n}\sum_{i=1}^n A_i}{\frac{1}{n}\sum_{i=1}^n B_i}.
By the Strong Law of Large Numbers,
\frac{1}{n} \sum_{i=1}^n A_i \stackrel{a.s.}{\rightarrow} E[f(Y)/g(Y) \mathbb{1}(Y \leq x)]_{Y \sim g},
which is equal to:
= \int f(y)/g(y) \mathbb{1}(y \leq x) \cdot g(y)dy = \int_{-\infty}^x f(y)dy = P(X \leq x).
And similarly, we also have:
\frac{1}{n} \sum_{i=1}^n B_i \stackrel{a.s.}{\rightarrow} E[f(Y)/g(Y)]_{Y \sim g} = \int f(y)/g(y) \cdot g(y)dy = \int f(y) dy = 1.
Taking the ratio of the two limits, we have P(X^* \leq x \mid Y_1,\ldots, Y_n) \stackrel{a.s.}{\rightarrow} P(X \leq x)/1 = P(X \leq x), which also implies convergence in probability.
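This is the weighting step of sampling/importance resampling (SIR). A small simulation with target f = N(0,1) and proposal g = N(0, 2^2) (my illustrative choice, for which f/g is well behaved) shows the weighted conditional CDF matching P(X \leq x):

```python
import math
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Target f = N(0,1); proposal g = N(0, 2**2), an illustrative choice.
y = rng.normal(0.0, 2.0, size=n)
w = 2.0 * np.exp(-3.0 * y**2 / 8.0)   # f(y)/g(y); shared constants cancel on normalizing
q = w / w.sum()                       # the q_i from the problem

x = 1.0
weighted_cdf = q[y <= x].sum()                       # P(X* <= x | Y_1,...,Y_n)
target_cdf = 0.5 * (1 + math.erf(x / math.sqrt(2)))  # P(X <= x) = Phi(1)
print(weighted_cdf, target_cdf)
```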