1. Given $\beta = X^T_{1 \times n} A_{n \times n} X_{n \times 1}$, show that the gradient of $\beta$ with respect to $X$ has the following form: $\nabla \beta = X^T (A + A^T)$. Also, simplify the above result when $A$ is symmetric. (Hint: $\beta$ can be written as $\sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij} x_i x_j$.)
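Not part of the required derivation, but the identity is easy to sanity-check numerically. The sketch below (assuming NumPy; the dimension n, the seed, and the step size eps are arbitrary choices, not from the problem) compares $X^T(A + A^T)$ against a finite-difference gradient of $\beta = X^T A X$ for a random non-symmetric $A$.

```python
import numpy as np

# Finite-difference check of grad beta = x^T (A + A^T) for beta = x^T A x.
# n, the seed, and eps are illustrative choices, not from the problem.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)

beta = lambda v: v @ A @ v

analytic = x @ (A + A.T)          # the form the problem asks you to derive

eps = 1e-6
numeric = np.array([
    (beta(x + eps * e) - beta(x - eps * e)) / (2 * eps)
    for e in np.eye(n)            # perturb one coordinate at a time
])

print(np.allclose(analytic, numeric, atol=1e-5))  # expected: True
```

When $A$ is symmetric, $A + A^T = 2A$, so the gradient reduces to $2 X^T A$.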
2. In this problem, we consider a probabilistic view of linear regression, $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$, $i = 1, \ldots, n$, which can be expressed as $y = X\theta + \epsilon$. Define the prior on the parameter $\theta$ as $p(\theta) = \mathcal{N}(0, \tau_\theta^{-1} I)$, where $\tau_\theta$ is a known scalar that controls the variance of the Gaussian prior. Recall that a multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$ is given by the probability density function $\frac{1}{|2\pi\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\theta - \mu)^T \Sigma^{-1} (\theta - \mu)\right)$. Also define the likelihood as $p(\mathcal{D} \mid \theta) = \mathcal{N}(X\theta, \tau_n^{-1} I)$, where $\tau_n$ is another fixed scalar defining the variance of the noise.
(a) Show that maximizing the log posterior, i.e., $\log p(\theta \mid \mathcal{D})$, has the following form: $\operatorname{argmax}_\theta \log p(\theta \mid \mathcal{D}) = \operatorname{argmax}_\theta \left[\log p(\theta) + \log p(\mathcal{D} \mid \theta)\right]$. Hint: you may want to use Bayes' theorem and conclude that the posterior is proportional to the prior times the likelihood.
(b) Show that maximizing the log posterior is equivalent to minimizing a regularized loss function of the form $L(\theta) + \lambda R(\theta)$, for a $\lambda$ expressed in terms of the constants $\tau_\theta$ and $\tau_n$, where $L(\theta) = \frac{1}{2}\|y - X\theta\|_2^2$ and $R(\theta) = \frac{1}{2}\|\theta\|_2^2$. Hint: you may want to drop constant terms and recall that for any vector $\theta$, we have $\theta^T \theta = \|\theta\|_2^2$.
(c) Notice that the form of the posterior is the same as the form of the ridge regression loss. Compute the gradient of the loss above with respect to $\theta$.
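For parts (b) and (c): expanding the log prior and log likelihood and dropping constants should yield $\lambda = \tau_\theta / \tau_n$, and the gradient in (c) works out to $\nabla_\theta \left[L(\theta) + \lambda R(\theta)\right] = X^T(X\theta - y) + \lambda \theta$. A minimal sketch checking this numerically (assuming NumPy; the data and the precision values below are made up):

```python
import numpy as np

# Sketch for 2(b)-(c): with lambda = tau_theta / tau_n, the gradient
#   grad = X^T (X theta - y) + lambda * theta
# vanishes at the closed-form ridge / MAP solution. Data are synthetic.
rng = np.random.default_rng(1)
n, d = 50, 4
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

tau_theta, tau_n = 2.0, 0.5              # illustrative precisions
lam = tau_theta / tau_n

# MAP / ridge estimate: (X^T X + lam I)^{-1} X^T y
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

grad = X.T @ (X @ theta_map - y) + lam * theta_map
print(np.allclose(grad, 0, atol=1e-6))   # expected: True
```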
3. Suppose we have a single-variable linear regression model using the hypothesis $h_\theta(x) = \theta_1 x + \theta_0$ and the cost function $J(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left(y^{(i)} - h_\theta(x^{(i)})\right)^2$.
(a) Find the partial derivatives of $J$ with respect to $\theta_1$ and $\theta_0$.
(b) Show that
$$\theta_1 = \frac{\sum_{i=1}^{N} x^{(i)} y^{(i)} - N \bar{x} \bar{y}}{\sum_{i=1}^{N} (x^{(i)})^2 - N \bar{x}^2}, \qquad \theta_0 = \bar{y} - \theta_1 \bar{x},$$
where $\bar{x} = \sum_{i=1}^{N} x^{(i)} / N$ and $\bar{y} = \sum_{i=1}^{N} y^{(i)} / N$.
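The closed-form estimates in (b) can be cross-checked against an off-the-shelf least-squares fit; a sketch on synthetic data (assuming NumPy) follows.

```python
import numpy as np

# Sketch for 3(b): the closed-form theta_1, theta_0 should agree with
# numpy's degree-1 polynomial fit. All data below are synthetic.
rng = np.random.default_rng(2)
N = 100
x = rng.uniform(-1, 1, N)
y = 3.0 * x + 1.5 + 0.1 * rng.standard_normal(N)

x_bar, y_bar = x.mean(), y.mean()
theta_1 = (np.sum(x * y) - N * x_bar * y_bar) / (np.sum(x**2) - N * x_bar**2)
theta_0 = y_bar - theta_1 * x_bar

slope, intercept = np.polyfit(x, y, 1)   # degree 1 -> [slope, intercept]
print(np.allclose([theta_1, theta_0], [slope, intercept]))  # expected: True
```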
4. Suppose we have a regression problem with input vectors $x_n \in \mathbb{R}^D$ and three outputs, such that $y_n = [y_{n1}, y_{n2}, y_{n3}]$. The output of each linear model is given by $y_{ni} = \theta_{i0} + \theta_{i1} x_{n1} + \theta_{i2} x_{n2} + \cdots + \theta_{iD} x_{nD} = \theta_i^T x_n$, where $i = 1, 2, 3$. Find $\theta_1$, $\theta_2$, and $\theta_3$ that minimize the following cost function:
$$L(\theta_1, \theta_2, \theta_3) = \sum_{n=1}^{N} \sum_{i=1}^{3} \frac{1}{2} \left[ (y_{ni} - \theta_i^T x_n)^2 + \lambda_i \sum_{d=0}^{D} \theta_{id}^2 \right]$$
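Since the cost decouples across the three outputs, each $\theta_i$ can be minimized independently. One subtlety: as written, the $\lambda_i$ term sits inside the sum over $n$, so it contributes $\frac{N \lambda_i}{2} \|\theta_i\|_2^2$ in total, which would give $\theta_i = (X^T X + N \lambda_i I)^{-1} X^T y_i$. A sketch of that structure (assuming NumPy, synthetic data, and a leading column of ones in $X$ for the $d = 0$ bias term):

```python
import numpy as np

# Sketch for problem 4: the cost decouples over i = 1, 2, 3, so each theta_i
# solves its own ridge problem. The lambda_i term sits inside the sum over n,
# hence the factor N in the normal equations. Data and lambdas are synthetic.
rng = np.random.default_rng(3)
N, D = 60, 5
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, D))])  # d = 0 bias col
Y = rng.standard_normal((N, 3))                                # three outputs
lams = [0.1, 1.0, 10.0]                                        # illustrative

thetas = [
    np.linalg.solve(X.T @ X + N * lam * np.eye(D + 1), X.T @ Y[:, i])
    for i, lam in enumerate(lams)
]

# The gradient of each per-output cost should vanish at its solution.
grads = [
    X.T @ (X @ t - Y[:, i]) + N * lam * t
    for i, (t, lam) in enumerate(zip(thetas, lams))
]
print(all(np.allclose(g, 0, atol=1e-6) for g in grads))  # expected: True
```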