When training a machine learning model with some dataset, what are some assumptions we are making...

Question

When training a machine learning model with some dataset, what are some assumptions we are making about the data? What are some things that it is important for us not to assume? Please give a few examples for each.

Answer 1

Checking model assumptions is essential before relying on a model for prediction. If its assumptions are not met, the model may misrepresent the data and is likely to produce inaccurate predictions. Each kind of model makes different assumptions, so checking them matters both when choosing a model and when verifying that the chosen model is appropriate.

Diagnostics

Diagnostics are used to evaluate the model assumptions and to find out whether any observations have a large, undue influence on the analysis. They help confirm that the model you use is actually appropriate for the data you are analyzing. "Diagnostics" is an umbrella term that covers the other topics under model assumptions. It may include checking the model's basic statistical assumptions, examining the structure of the model by considering more, fewer, or different explanatory variables, or looking for observations that the model represents poorly, such as outliers, or that have a disproportionate effect on the regression model's predictions.

Diagnostics can take many forms. Some are numerical. The statsmodels package reports many of them at once through the summary method of a fitted model's results.

The summary reports important values such as R², the F-statistic, and many others. You can also analyze a model with a graphical diagnostic, such as a plot of the residuals against the fitted (predicted) values.

For our weight-height dataset, with height as the predictor, the plot of residuals against fitted values is mostly random. However, as the fitted values increase, so does the range of the residuals. This means that as BMI increases, the model's errors vary more widely. The residuals also tend to be more negative at higher BMIs. This does not mean that a linear model is incorrect, but it is worth investigating and may point to a way to change or improve the model.
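The widening pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the answer's analysis: the data are synthetic, with noise deliberately made to grow with the predictor, and `plot_residuals_vs_fitted` is a hypothetical helper name.

```python
import numpy as np

# Synthetic stand-in data (hypothetical, not the answer's dataset):
# the noise grows with the predictor, so the residual spread should fan out.
rng = np.random.default_rng(0)
height = rng.uniform(150, 200, size=300)
noise_sd = 0.2 * (height - 140)            # standard deviation from 2 to 12
weight = -100 + 1.0 * height + rng.normal(0, noise_sd)

# Ordinary least squares fit of a straight line.
slope, intercept = np.polyfit(height, weight, 1)
fitted = slope * height + intercept
residuals = weight - fitted

# Compare the residual spread in the lower and upper halves of the fitted values.
cut = np.median(fitted)
spread_low = residuals[fitted <= cut].std()
spread_high = residuals[fitted > cut].std()
print(f"residual std, low fitted values:  {spread_low:.2f}")
print(f"residual std, high fitted values: {spread_high:.2f}")


def plot_residuals_vs_fitted(fitted, residuals):
    """Draw the residuals-versus-fitted scatter plot (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(fitted, residuals, s=10)
    ax.axhline(0, color="gray", linestyle="--")   # reference line at zero
    ax.set_xlabel("Fitted values")
    ax.set_ylabel("Residuals")
    ax.set_title("Residuals vs. fitted")
    plt.show()
```

Calling `plot_residuals_vs_fitted(fitted, residuals)` draws the plot. A funnel shape that opens to the right corresponds to the growing spread reported by the two printed numbers.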

Another useful residual plot is the scale-location plot. It shows whether the residuals are spread equally along the range of the fitted values. If all the errors have the same finite variance, they are said to be homoscedastic. To make the plot, you plot the square root of the absolute standardized residuals against the fitted values. Randomly spread points with no trend suggest that the constant-variance assumption is reasonable.

In this plot, we want a random scatter in a horizontal band. That would indicate that the data are homoscedastic, meaning the random variation around the fitted relationship is roughly the same across the range of the independent variables. Our trend line is mostly flat at the beginning but slopes upward near the end, which suggests that the variance may not be equal everywhere. This may be a consequence of the issue we found in the residuals-versus-fitted plot, and it is another indicator that something in our model may need to change.
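The quantities behind a scale-location plot can be computed by hand. The sketch below is an assumption-laden illustration on synthetic heteroscedastic data, not the answer's dataset. It uses internally standardized residuals for a one-predictor regression and summarizes the trend with a straight line, which is a simplification of the smoothed curve such plots usually show.

```python
import numpy as np

# Synthetic stand-in data (hypothetical): noise grows with the predictor.
rng = np.random.default_rng(1)
x = rng.uniform(150, 200, size=300)
y = -100 + 1.0 * x + rng.normal(0, 0.2 * (x - 140))

# Straight-line least squares fit.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

n = x.size
# Leverage of each point in simple linear regression.
centered = x - x.mean()
leverage = 1.0 / n + centered ** 2 / np.sum(centered ** 2)

# Residual standard error with n - 2 degrees of freedom (slope and intercept).
sigma_hat = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# Standardized residuals, then the scale-location quantity.
standardized = residuals / (sigma_hat * np.sqrt(1.0 - leverage))
scale_location = np.sqrt(np.abs(standardized))

# A flat trend suggests constant variance; a positive slope suggests
# that the spread grows with the fitted values.
trend_slope, _ = np.polyfit(fitted, scale_location, 1)
print(f"trend of sqrt(|standardized residual|) vs fitted: {trend_slope:.4f}")
```

Plotting `scale_location` against `fitted` gives the scale-location plot itself. With these synthetic data the trend is upward, matching the kind of pattern described in the text.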

When fitting a regression model, you want the residuals to look random. If they do not, the form of regression you chose may not be appropriate. For example, if you fit a linear regression and the residual plot shows a clear pattern, that indicates that the relationship in the data is not linear.
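As a small illustration of a non-random residual pattern, the sketch below fits a straight line to data that are actually quadratic. The data are synthetic and the split into "middle" and "outer" regions is an arbitrary choice made for this example.

```python
import numpy as np

# Synthetic data with a curved (quadratic) relationship plus a little noise.
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 121)
y = x ** 2 + rng.normal(0, 0.3, size=x.size)

# Fit a straight line even though the true relationship is curved.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# With a correct model, residual means would be near zero everywhere.
# Here the line overshoots in the middle and undershoots at the ends.
middle = np.abs(x) < 1
mean_middle = residuals[middle].mean()
mean_outer = residuals[~middle].mean()
print(f"mean residual, middle of range: {mean_middle:.2f}")
print(f"mean residual, ends of range:   {mean_outer:.2f}")
```

Residuals that are systematically negative in one region and positive in another form the U-shaped pattern that signals a linear model is missing curvature.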
