Question

1. Given $\beta = X^T_{1\times n} A_{n\times n} X_{n\times 1}$, show that the gradient of $\beta$ with respect to $X$...

1. Given $\beta = X^T_{1\times n} A_{n\times n} X_{n\times 1}$, show that the gradient of $\beta$ with respect to $X$ has the following form: $\nabla\beta = X^T(A + A^T)$. Also, simplify the above result when $A$ is symmetric. (Hint: $\beta$ can be written as $\sum_{j=1}^{n}\sum_{i=1}^{n} a_{ij} x_i x_j$.)
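Below is a minimal numerical sketch (a sanity check, not a derivation) of the claimed identity, comparing it against a finite-difference gradient; all names here (`n`, `rng`, `beta`, `grad_fd`) are illustrative choices and not part of the problem statement.

```python
# Check that for beta = x^T A x the gradient w.r.t. x is x^T (A + A^T)
# (row-vector convention), reducing to 2 x^T A when A is symmetric.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

beta = lambda v: v @ A @ v          # beta(x) = x^T A x

# Central finite-difference gradient.
eps = 1e-6
grad_fd = np.array([(beta(x + eps * e) - beta(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

grad_formula = x @ (A + A.T)        # x^T (A + A^T)
print(np.allclose(grad_fd, grad_formula, atol=1e-5))       # True

A_sym = (A + A.T) / 2               # symmetric case: gradient is 2 x^T A_sym
print(np.allclose(x @ (A_sym + A_sym.T), 2 * x @ A_sym))   # True
```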

2. In this problem, we consider a probabilistic view of linear regression, $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, $i = 1, \dots, n$, which can be expressed as $y = X\theta + \epsilon$. Define the prior on the parameter $\theta$ as $p(\theta) = \mathcal{N}(0, \tau_\theta^{-1} I)$, where $\tau_\theta$ is a known scalar that controls the variance of the Gaussian prior. Recall that a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ is given by the probability density function $\frac{1}{|2\pi\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\theta - \mu)^T \Sigma^{-1} (\theta - \mu)\right)$. Also define the likelihood as $p(\mathcal{D}|\theta) = \mathcal{N}(X\theta, \tau_n^{-1} I)$, where $\tau_n$ is another fixed scalar defining the variance of the noise. (a) Show that maximizing the log posterior, i.e., $\log p(\theta|\mathcal{D})$, has the following form: $\operatorname{argmax}_\theta \log p(\theta|\mathcal{D}) = \operatorname{argmax}_\theta \left[\log p(\theta) + \log p(\mathcal{D}|\theta)\right]$. Hint: you may want to use Bayes' theorem and conclude that the posterior is proportional to the prior times the likelihood.
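As a small numerical illustration of part (a) (a sanity check, not a proof): for the Gaussian prior and likelihood above, the posterior is also Gaussian, and $\log p(\theta|\mathcal{D}) - [\log p(\theta) + \log p(\mathcal{D}|\theta)]$ equals the same constant ($-\log p(\mathcal{D})$) for every $\theta$, which is why the two argmax problems coincide. The values of $\tau_\theta$, $\tau_n$ and the data below are arbitrary, and names such as `Sigma_post` are my own notation rather than anything from the problem.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(1)
n, d = 20, 3
tau_th, tau_n = 2.0, 5.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=tau_n**-0.5, size=n)

# Gaussian posterior: precision tau_n X^T X + tau_th I, mean tau_n Sigma_post X^T y.
Sigma_post = np.linalg.inv(tau_n * X.T @ X + tau_th * np.eye(d))
mu_post = tau_n * Sigma_post @ X.T @ y

def gap(theta):
    log_post = mvn.logpdf(theta, mu_post, Sigma_post)
    log_prior = mvn.logpdf(theta, np.zeros(d), np.eye(d) / tau_th)
    log_lik = mvn.logpdf(y, X @ theta, np.eye(n) / tau_n)
    return log_post - (log_prior + log_lik)   # should equal -log p(D)

# The gap is the same constant for two unrelated values of theta.
print(np.isclose(gap(rng.normal(size=d)), gap(rng.normal(size=d))))  # True
```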

(b) Show that maximizing the log posterior is equivalent to minimizing a regularized loss function of the form $L(\theta) + \lambda R(\theta)$, for a $\lambda$ expressed in terms of the constants $\tau_\theta$ and $\tau_n$, where $L(\theta) = \frac{1}{2}\|y - X\theta\|_2^2$ and $R(\theta) = \frac{1}{2}\|\theta\|_2^2$. Hint: you may want to drop constant terms and recall that for any vector $\theta$, we have $\theta^T\theta = \|\theta\|_2^2$. (c) Notice that the negative log posterior has the same form as the ridge regression loss. Compute the gradient of the loss above with respect to $\theta$.
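A quick numerical sketch for parts (b) and (c), under the assumption $\lambda = \tau_\theta / \tau_n$ (which is what dropping constants in the log posterior suggests): it checks that the ridge minimizer $(X^T X + \lambda I)^{-1} X^T y$ coincides with the posterior mean, and that the gradient of $L(\theta) + \lambda R(\theta)$ is $X^T(X\theta - y) + \lambda\theta$. The data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 4
tau_th, tau_n = 2.0, 5.0
lam = tau_th / tau_n
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=tau_n**-0.5, size=n)

# MAP estimate (mean of the Gaussian posterior) vs. ridge solution.
theta_map = np.linalg.solve(tau_n * X.T @ X + tau_th * np.eye(d), tau_n * X.T @ y)
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(np.allclose(theta_map, theta_ridge))            # True

loss = lambda t: 0.5 * np.sum((y - X @ t) ** 2) + 0.5 * lam * t @ t
grad = lambda t: X.T @ (X @ t - y) + lam * t          # claimed gradient for part (c)

t0, eps = rng.normal(size=d), 1e-6
grad_fd = np.array([(loss(t0 + eps * e) - loss(t0 - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
print(np.allclose(grad(t0), grad_fd, atol=1e-4))      # True
print(np.allclose(grad(theta_ridge), 0))              # gradient vanishes at the minimizer
```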

3. Suppose we have a single-variable linear regression model using the hypothesis $h_\theta(x) = \theta_1 x + \theta_0$ and the cost function $J(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$. (a) Find the partial derivatives of $J$ with respect to $\theta_1$ and $\theta_0$. (b) Show that: $\theta_1 = \frac{\sum_{i=1}^{N} x^{(i)} y^{(i)} - N\bar{x}\bar{y}}{\sum_{i=1}^{N} (x^{(i)})^2 - N\bar{x}^2}$ and $\theta_0 = \bar{y} - \theta_1 \bar{x}$, where $\bar{x} = \sum_{i=1}^{N} x^{(i)}/N$ and $\bar{y} = \sum_{i=1}^{N} y^{(i)}/N$.
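The closed form in part (b) can be checked numerically; the sketch below compares it against NumPy's least-squares line fit on synthetic data (the $1/N$ factor in $J$ does not change the minimizer). The data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = rng.normal(size=N)
y = 2.5 * x - 1.0 + rng.normal(scale=0.3, size=N)

x_bar, y_bar = x.mean(), y.mean()
theta1 = (np.sum(x * y) - N * x_bar * y_bar) / (np.sum(x**2) - N * x_bar**2)
theta0 = y_bar - theta1 * x_bar

# Compare with numpy's least-squares line fit (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose([theta1, theta0], [slope, intercept]))  # True
```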

4. Suppose we have a regression problem with input vectors $x_n \in \mathbb{R}^D$ and three outputs, $y_n = [y_{n1}, y_{n2}, y_{n3}]$. The output of each linear model is given by $y_{ni} = \theta_{i0} + \theta_{i1} x_{n1} + \theta_{i2} x_{n2} + \dots + \theta_{iD} x_{nD} = \theta_i^T x_n$, where $i = 1, 2, 3$. Find $\theta_1$, $\theta_2$, and $\theta_3$ that minimize the following cost function: $L(\theta_1, \theta_2, \theta_3) = \sum_{n=1}^{N} \sum_{i=1}^{3} \frac{1}{2}\left[(y_{ni} - \theta_i^T x_n)^2 + \lambda_i \sum_{d=0}^{D} \theta_{id}^2\right]$
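A minimal numerical sketch for this problem, under two stated assumptions: the penalty term sits inside the sum over $n$ exactly as written (so it is counted $N$ times), and $x_{n0} = 1$ plays the role of the bias term. Under that reading the three outputs decouple and each $\theta_i$ solves its own ridge problem, $\theta_i = (X^T X + N\lambda_i I)^{-1} X^T y_{:,i}$. This is a sketch under those assumptions, not a definitive answer; the data and names are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 40, 6
lambdas = np.array([0.1, 1.0, 10.0])
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])   # x_n with x_{n0} = 1
Y = rng.normal(size=(N, 3))                                  # y_n = [y_n1, y_n2, y_n3]

def cost(thetas):
    # L = sum_n sum_i 0.5 * [(y_ni - theta_i^T x_n)^2 + lambda_i * sum_d theta_id^2]
    resid = Y - X @ thetas.T                                 # thetas has shape (3, D+1)
    return 0.5 * np.sum(resid**2) + 0.5 * N * np.sum(lambdas * np.sum(thetas**2, axis=1))

# Closed-form minimizer: one ridge problem per output.
thetas = np.stack([np.linalg.solve(X.T @ X + N * lam * np.eye(D + 1), X.T @ Y[:, i])
                   for i, lam in enumerate(lambdas)])

# Check: no small random perturbation improves the cost.
best = cost(thetas)
print(all(cost(thetas + 1e-3 * rng.normal(size=thetas.shape)) > best
          for _ in range(100)))                              # True
```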

Similar Questions
Suppose that $x = \{x_1, \dots, x_m\}$ is an independent and identically distributed...
Suppose that $x = \{x_1, \dots, x_m\}$ is an independent and identically distributed sample from $N(\theta_1, \sigma^2)$, and $y = \{y_1, \dots, y_n\}$ is an independent and identically distributed sample from $N(\theta_2, \sigma^2)$; $x$ and $y$ are independent. The unknown parameters are $\theta_1 \in (-\infty, \infty)$, $\theta_2 \in (-\infty, \infty)$, and $\sigma^2 \in (0, \infty)$. a) Under the improper prior $p(\theta_1, \theta_2, \sigma^2) \propto (\sigma^2)^{-2}$, find the joint posterior distribution of $(\theta_1, \theta_2, \sigma^2)$ by finding the following...
Given $\beta = X^T_{1\times n} A_{n\times n} X_{n\times 1}$, show that the gradient of $\beta$ with respect to $X$...
Given $\beta = X^T_{1\times n} A_{n\times n} X_{n\times 1}$, show that the gradient of $\beta$ with respect to $X$ has the following form: $\nabla\beta = X^T(A + A^T)$. Also, simplify the above result when $A$ is symmetric. (Hint: $\beta$ can be written as $\sum_{j=1}^{n}\sum_{i=1}^{n} a_{ij} x_i x_j$.)
Let $X_1, \dots, X_n$ be a random sample from Exponential($\beta$) with pdf $f(x) = \frac{1}{\beta} e^{-x/\beta}\, I_{(0,\infty)}(x)$...
Let $X_1, \dots, X_n$ be a random sample from Exponential($\beta$) with pdf $f(x) = \frac{1}{\beta} e^{-x/\beta}\, I_{(0,\infty)}(x)$, $\beta > 0$, where $\beta$ is an unknown parameter. Find the UMVUE of $\beta^2$.
1. (a) $Y_1, Y_2, \dots, Y_n$ form a random sample from a probability distribution with cumulative distribution function $F_Y(y)$...
1. (a) $Y_1, Y_2, \dots, Y_n$ form a random sample from a probability distribution with cumulative distribution function $F_Y(y)$ and probability density function $f_Y(y)$. Let $Y_{(1)} = \min\{Y_1, Y_2, \dots, Y_n\}$. Write the cumulative distribution function for $Y_{(1)}$ in terms of $F_Y(y)$ and hence show that the probability density function for $Y_{(1)}$ is $f_{Y_{(1)}}(y) = n\{1 - F_Y(y)\}^{n-1} f_Y(y)$. [8 marks] (b) An engineering system consists of 5 components connected in series, so if one component fails, the system fails. The lifetimes (measured in...
Let $\{X_1, \dots, X_n\}$ be i.i.d. from a distribution with pdf $f(x; \theta) = \theta/x^{\theta+1}$ for...
Let $\{X_1, \dots, X_n\}$ be i.i.d. from a distribution with pdf $f(x; \theta) = \theta/x^{\theta+1}$ for $\theta > 2$ and $x > 1$. (a) (10 points) Calculate $E X_1$ and $\mathrm{Var}(X_1)$. (b) (5 points) Find the method of moments estimator of $\theta$. (c) (5 points) If we denote the method of moments estimator as $\hat{\theta}_1$, what does $\sqrt{n}(\hat{\theta}_1 - \theta)$ converge in distribution to? (d) (5 points) Is the method of moments estimator efficient? Verify your answer.
Let $X \sim \mathrm{Beta}(\alpha, \beta)$. (a) Show that $E X^2 = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)}$...
Let $X \sim \mathrm{Beta}(\alpha, \beta)$. (a) Show that $E X^2 = \frac{(\alpha + 1)\alpha}{(\alpha + \beta + 1)(\alpha + \beta)}$. (b) Use the fact that $E X = \alpha/(\alpha + \beta)$ and your answer to the previous part to show that $\mathrm{Var}\, X = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$. (c) Suppose $X$ is the proportion of free-throws made over the lifetime of a randomly sampled kid, and assume that $X \sim \mathrm{Beta}(2, 8)$. ...
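
A quick numerical check (not part of the question) of the two Beta moments stated above, using SciPy; the $\alpha, \beta$ values chosen here are arbitrary.

```python
import numpy as np
from scipy import stats

a, b = 2.0, 8.0
ex2 = (a + 1) * a / ((a + b + 1) * (a + b))          # claimed E[X^2]
var = a * b / ((a + b) ** 2 * (a + b + 1))           # claimed Var(X)
print(np.isclose(ex2, stats.beta(a, b).moment(2)))   # True
print(np.isclose(var, stats.beta(a, b).var()))       # True
```
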
A geometric distribution has a pdf given by $P(X = x) = p(1-p)^x$, where $x = 0, 1, 2, \dots$...
A geometric distribution has a pdf given by $P(X = x) = p(1-p)^x$, where $x = 0, 1, 2, \dots$, and $0 < p < 1$. This form of the geometric starts at $x = 0$, not at $x = 1$. Given are the following properties: $E(X) = (1-p)/p$ and $\mathrm{Var}(X) = (1-p)/p^2$. A random sample of size $n$ is drawn, with data $x_1, x_2, \dots, x_n$. Maximizing the likelihood gives $p = 1/(1 + \bar{x})$; the MLE is $\hat{p} = 1/(1 + \bar{x})$; the asymptotic distribution is $\hat{p} \sim$...
A Bernoulli differential equation is one of the form $\frac{dy}{dx} + P(x)y = Q(x)y^n$. Observe that, if $n = 0$ or $1$,...
A Bernoulli differential equation is one of the form $\frac{dy}{dx} + P(x)y = Q(x)y^n$. Observe that, if $n = 0$ or $1$, the Bernoulli equation is linear. For other values of $n$, the substitution $u = y^{1-n}$ transforms the Bernoulli equation into the linear equation $\frac{du}{dx} + (1-n)P(x)u = (1-n)Q(x)$. Use an appropriate substitution to solve the equation $y' - \frac{3}{x}y = \frac{y^4}{x^2}$ and find the solution that satisfies $y(1) = 1$.
(1 point) A Bernoulli differential equation is one of the form $\frac{dy}{dx} + P(x)y = Q(x)y^n$ (∗). Observe that, if $n = 0$...
(1 point) A Bernoulli differential equation is one of the form $\frac{dy}{dx} + P(x)y = Q(x)y^n$ (∗). Observe that, if $n = 0$ or $1$, the Bernoulli equation is linear. For other values of $n$, the substitution $u = y^{1-n}$ transforms the Bernoulli equation into the linear equation $\frac{du}{dx} + (1-n)P(x)u = (1-n)Q(x)$. Consider the initial value problem $y' = -y(1 + 9xy^3)$, $y(0) = -3$. (a) This differential equation can be written in the form (∗) with $P(x) =$ , $Q(x) =$ , and $n =$ . (b) The substitution $u =$ will transform it into the linear equation $\frac{du}{dx} +$ $u =$ . (c) Using...
Use RStudio code to compute: 1. Let $T$ be $t$-distributed with the given degrees...
Use RStudio code to compute: 1. Let $T$ be $t$-distributed with the given degrees of freedom (df); then compute the following probabilities with a nice little picture beside each problem: [5 points] (e) df = ∞, $P(T > 2.3)$. 2. Let $T$ be $t$-distributed with the given degrees of freedom (df); compute the following quantiles (percentiles) with a nice little picture beside each problem: [5 points] (a) df = 2, 0.05th percentile (b) df = 7,...