Question

When training a machine learning model with some dataset, what are some assumptions we are making about the data? What are some things that it is important for us not to assume? Please give a few examples for each.

Homework Answers

Answer #1

Checking model assumptions is essential before relying on a model for prediction. If the assumptions are not met, the model may not reflect the data accurately and will likely produce inaccurate predictions. Different models make different assumptions, so checking them matters both when choosing a model and when verifying that the chosen model is appropriate.

Diagnostics

Diagnostics are used to evaluate the model assumptions and to determine whether any observations have a large, undue influence on the analysis. They help you improve the model by confirming that it is actually appropriate for the data you are analyzing. There are many ways to assess a model's validity with diagnostics. Diagnostics is an umbrella term that covers the other topics under model assumptions. It may include checking the model's basic statistical assumptions, examining the structure of the model by considering more, fewer, or different explanatory variables, or looking for observations that the model represents poorly (such as outliers) or that have a disproportionately large effect on the regression model's predictions.

Diagnostics can take many forms. Some are numerical. The statsmodels package reports many of them at once through the summary method of a fitted model.

With this summary, we can see important values such as R², the F-statistic, and many others. You can also analyze a model using a graphical diagnostic, such as a plot of the residuals against the fitted (predicted) values.

In the fitted-versus-residual plot for our weight-height dataset, using height as the predictor, the points are random for the most part. However, as the fitted values increase, so does the range of the residuals. This means that as BMI increases, there is higher variance between our model and the actual data. The residuals also tend to be more negative at higher BMIs. This does not mean that a linear model is incorrect, but it is something to investigate and may point to a way to change or improve the model.

Another useful residual plot is the scale-location plot, which shows whether the residuals are spread equally along the range of our predictor. When the errors all have the same finite variance, they are said to be homoscedastic. To build the plot, plot the square root of the absolute standardized residuals against the fitted values. Randomly spread points indicate that the model is appropriate.

In this plot, we want a random scatter within a horizontal band. This would indicate that the data is homoscedastic, meaning the spread of the dependent variable around the fitted relationship is roughly equal across values of the independent variables. Our trend line is mostly horizontal at the beginning but slopes upward near the end, so the variance may not be equal everywhere. This may be a result of not fixing the issue we discovered in the residual-fitted plot, and it is another indicator that something in our model may need to change.

When fitting a regression model, you want the residuals to look random. If they do not, the form of regression you chose may not be correct. For example, if you fit a linear regression and the residual plot shows a clear pattern, that suggests the relationship in the data is not linear.
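A small numerical sketch of that last point, using made-up data with a quadratic relationship: the residuals from a straight-line fit track the curvature that the line misses, while the residuals from a quadratic fit do not.

```python
import numpy as np

# Synthetic data with a genuinely quadratic relationship (an assumption).
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 120)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 1, size=x.size)

def fit_residuals(degree):
    """Residuals from a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return y - np.polyval(coeffs, x)

linear_resid = fit_residuals(1)
quadratic_resid = fit_residuals(2)

# Correlate each set of residuals with the centered curvature term.
curvature = x**2 - np.mean(x**2)
linear_corr = np.corrcoef(linear_resid, curvature)[0, 1]
quadratic_corr = np.corrcoef(quadratic_resid, curvature)[0, 1]

print("Linear fit, residual correlation with x^2:", linear_corr)
print("Quadratic fit, residual correlation with x^2:", quadratic_corr)
```

A strong correlation for the linear fit means the residuals are patterned rather than random, which is the signal to reconsider the model form.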
