Question


1. Given $\beta = X^T_{1\times n} A_{n\times n} X_{n\times 1}$, show that the gradient of $\beta$ with respect to $X$ has the following form: $\nabla\beta = X^T(A + A^T)$. Also, simplify the above result when $A$ is symmetric. (Hint: $\beta$ can be written as $\sum_{j=1}^{n}\sum_{i=1}^{n} a_{ij} x_i x_j$.)
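A quick way to sanity-check the identity before writing the proof is to compare the claimed gradient against central finite differences on random data. This is a minimal NumPy sketch (the matrix sizes and seed are illustrative); it checks the column-vector form $(A + A^T)x$, which is just the transpose of the row-vector form $X^T(A + A^T)$ in the problem statement:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))   # deliberately non-symmetric
x = rng.normal(size=n)

def beta(v):
    """Quadratic form beta = v^T A v."""
    return v @ A @ v

# Claimed analytic gradient: (A + A^T) x
grad_analytic = (A + A.T) @ x

# Independent check via central finite differences
eps = 1e-6
grad_fd = np.array([
    (beta(x + eps * e) - beta(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_analytic, grad_fd, atol=1e-4))  # True
```

When $A$ is symmetric, $A + A^T = 2A$, so the gradient simplifies to $2Ax$, which the same check confirms if you replace `A` with `(A + A.T) / 2`.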

2. In this problem, we consider a probabilistic view of linear regression, $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, $i = 1, \ldots, n$, which can be expressed as $y = X\theta + \epsilon$. Define the prior on the parameter $\theta$ as $p(\theta) = \mathcal{N}(0, \tau_\theta^{-1} I)$, where $\tau_\theta$ is a known scalar that controls the variance of the Gaussian prior. Recall that a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ is given by the probability density function
$$\frac{1}{|2\pi\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\theta - \mu)^T \Sigma^{-1} (\theta - \mu)\right).$$
Also define the likelihood as $p(\mathcal{D}|\theta) = \mathcal{N}(X\theta, \tau_n^{-1} I)$, where $\tau_n$ is another fixed scalar defining the variance of the noise.

(a) Show that maximizing the log posterior, i.e., $\log p(\theta|\mathcal{D})$, has the following form:
$$\operatorname{argmax}_\theta \log p(\theta|\mathcal{D}) = \operatorname{argmax}_\theta \left[\log p(\theta) + \log p(\mathcal{D}|\theta)\right].$$
Hint: you may want to use Bayes' theorem and conclude that the posterior is proportional to the prior times the likelihood.

(b) Show that maximizing the log posterior is equivalent to minimizing a regularized loss function of the form $L(\theta) + \lambda R(\theta)$, for a $\lambda$ expressed in terms of the constants $\tau_\theta$ and $\tau_n$, where
$$L(\theta) = \frac{1}{2}\|y - X\theta\|_2^2, \qquad R(\theta) = \frac{1}{2}\|\theta\|_2^2.$$
Hint: you may want to drop constant terms and recall that for any vector $\theta$, we have $\theta^T\theta = \|\theta\|_2^2$.

(c) Notice that the form of the posterior is the same as the form of the ridge regression loss. Compute the gradient of the loss above with respect to $\theta$.
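The pieces of this problem fit together numerically: part (b) gives $\lambda = \tau_\theta/\tau_n$, and setting the part (c) gradient $\nabla = X^T(X\theta - y) + \lambda\theta$ to zero yields the ridge closed form $\theta = (X^T X + \lambda I)^{-1} X^T y$. A minimal sketch, assuming random data and illustrative precision values, checks that the gradient vanishes at that minimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

tau_theta, tau_n = 2.0, 5.0   # illustrative prior/noise precisions
lam = tau_theta / tau_n       # lambda from part (b)

# MAP / ridge solution: (X^T X + lam I) theta = X^T y
theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Part (c) gradient should vanish at the minimizer
grad = X.T @ (X @ theta - y) + lam * theta
print(np.allclose(grad, 0, atol=1e-8))  # True
```

Note the correspondence: a tighter prior (larger $\tau_\theta$) or noisier data (smaller $\tau_n$) both increase $\lambda$, pulling the estimate toward zero.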

3. Suppose we have a single-variable linear regression model using the hypothesis $h_\theta(x) = \theta_1 x + \theta_0$ and the cost function
$$J(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2.$$
(a) Find the partial derivatives of $J$ with respect to $\theta_1$ and $\theta_0$.
(b) Show that
$$\theta_1 = \frac{\sum_{i=1}^{N} x^{(i)} y^{(i)} - N\bar{x}\bar{y}}{\sum_{i=1}^{N} (x^{(i)})^2 - N\bar{x}^2}, \qquad \theta_0 = \bar{y} - \theta_1\bar{x},$$
where $\bar{x} = \sum_{i=1}^{N} x^{(i)}/N$ and $\bar{y} = \sum_{i=1}^{N} y^{(i)}/N$.
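The closed-form expressions in part (b) can be verified against a standard least-squares fitter, since the same normal equations underlie both. A minimal sketch on synthetic data (the slope, intercept, and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
x = rng.normal(size=N)
y = 1.7 * x + 0.4 + 0.1 * rng.normal(size=N)  # noisy line

# Closed-form estimates from part (b)
xbar, ybar = x.mean(), y.mean()
theta1 = (np.sum(x * y) - N * xbar * ybar) / (np.sum(x**2) - N * xbar**2)
theta0 = ybar - theta1 * xbar

# np.polyfit solves the same least-squares problem
# (coefficients returned highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)

print(np.allclose([theta1, theta0], [slope, intercept]))  # True
```

The agreement holds regardless of the $1/N$ versus $1/2N$ scaling of $J$, since a positive constant factor does not move the minimizer.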

4. Suppose we have a regression problem with input vectors $x_n \in \mathbb{R}^D$ and three outputs, such that $y_n = [y_{n1}, y_{n2}, y_{n3}]$. The output of each linear model is given by
$$y_{ni} = \theta_{i0} + \theta_{i1} x_{n1} + \theta_{i2} x_{n2} + \cdots + \theta_{iD} x_{nD} = \theta_i^T x_n, \qquad i = 1, 2, 3.$$
Find $\theta_1$, $\theta_2$, and $\theta_3$ that minimize the following cost function:
$$L(\theta_1, \theta_2, \theta_3) = \sum_{n=1}^{N} \sum_{i=1}^{3} \frac{1}{2}\left[\left(y_{ni} - \theta_i^T x_n\right)^2 + \lambda_i \sum_{d=0}^{D} \theta_{id}^2\right].$$
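Because the cost is a sum over $i$ with no cross-terms, it decouples into three independent ridge problems. One subtlety: the regularizer sits inside the sum over $n$, so it is counted $N$ times, giving $\theta_i = (X^T X + N\lambda_i I)^{-1} X^T y_{:,i}$ per output. A minimal sketch, assuming random data and illustrative $\lambda_i$, checks the stationarity condition for each output:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 40, 4
# Prepend x_{n0} = 1 so theta_{i0} acts as the intercept
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])
Y = rng.normal(size=(N, 3))          # three outputs
lams = [0.1, 1.0, 10.0]              # illustrative per-output lambda_i

# Each theta_i solves its own ridge problem; the penalty is summed
# over n, hence the factor N in front of lambda_i:
thetas = [
    np.linalg.solve(X.T @ X + N * lam * np.eye(D + 1), X.T @ Y[:, i])
    for i, lam in enumerate(lams)
]

# Gradient of L with respect to theta_i must vanish at the minimizer
for i, (theta, lam) in enumerate(zip(thetas, lams)):
    grad = X.T @ (X @ theta - Y[:, i]) + N * lam * theta
    print(np.allclose(grad, 0, atol=1e-8))  # True for each output
```

If the intended convention is instead that the penalty is applied once (outside the sum over $n$), drop the factor `N`; the decoupling argument is unchanged.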
