Question


Data set:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
diabetes_X_train, diabetes_X_test, diabetes_y_train, diabetes_y_test = train_test_split(
    diabetes['data'], diabetes['target'], random_state=0)

What are the training and test R² scores for the Lasso model using the default parameters? How many features does this model use? What are the names of those features?

Homework Answers

Answer #1

Here is the code to fit the model:

from sklearn.linear_model import Lasso

# Lasso with default parameters (alpha=1.0), fit on the training split
lr = Lasso()
lr.fit(diabetes_X_train, diabetes_y_train)
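"Default parameters" here means Lasso() with no arguments, which in scikit-learn uses alpha=1.0. Per the scikit-learn documentation, Lasso minimizes (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1, so alpha sets how hard the L1 term pushes coefficients toward exactly 0. A small sketch for checking the defaults yourself with get_params() (the variable name params is just for illustration):

```python
from sklearn.linear_model import Lasso

# Inspect the settings Lasso() uses when no arguments are passed.
params = Lasso().get_params()
print("alpha:", params["alpha"])                  # strength of the L1 penalty
print("fit_intercept:", params["fit_intercept"])  # an unpenalized intercept is fit
print("max_iter:", params["max_iter"])            # coordinate-descent iteration cap
```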

Code to compute R-squared on the training and test sets:

from sklearn.metrics import r2_score

# R-squared on the training set
tr_p = lr.predict(diabetes_X_train)
print(r2_score(diabetes_y_train, tr_p))

Score on Train: 0.4141

# R-squared on the test set
te_p = lr.predict(diabetes_X_test)
print(r2_score(diabetes_y_test, te_p))

Score on Test: 0.2781
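r2_score computes the coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. The plain-Python sketch below implements that formula for illustration; the helper name r_squared is hypothetical and it does not handle the constant-target edge case that scikit-learn treats specially. (Calling lr.score(X, y) on a fitted regressor also returns R².)

```python
def r_squared(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    # Note: ss_tot == 0 (constant y_true) would raise ZeroDivisionError here.
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3], [1, 2, 4]))  # SS_res = 1, SS_tot = 2
```

Predicting the mean for every sample gives R² = 0, and predictions worse than that give a negative R², which can happen on a test set.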

The model is fit on all 10 features, but because the Lasso's L1 penalty shrinks coefficients and sets some of them exactly to 0, only 2 features are left with non-zero coefficients:

1.) bmi

2.) s5
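The feature count and names can be read off the fitted model instead of being stated by hand: a feature is "used" when its entry in coef_ is non-zero, and load_diabetes() provides the column names in feature_names. The sketch below refits the model so it runs on its own (the variable names are illustrative). It does not hard-code the answer, so the printed list is whatever your installed scikit-learn version produces.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes["data"], diabetes["target"], random_state=0)

lasso = Lasso().fit(X_train, y_train)  # default alpha=1.0

# A feature counts as "used" if Lasso left its coefficient non-zero.
used = lasso.coef_ != 0
n_used = int(np.sum(used))
used_names = [name for name, keep in zip(diabetes.feature_names, used) if keep]

print("Train R^2:", lasso.score(X_train, y_train))
print("Test R^2:", lasso.score(X_test, y_test))
print("Features used:", n_used, used_names)
```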
