For the following code, describe why there is a difference in the two estimates of the MSE. Which of the two is more believable as a long-run estimate of the MSE?
> library(randomForest)
> set.seed(461431)
> carrun <- randomForest(dist~speed, data=cars)
> carrun

Call:
 randomForest(formula = dist ~ speed, data = cars)
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 1

          Mean of squared residuals: 255.5619
                    % Var explained: 60.73
> #FYI this matches the printout MSE
> carrun$mse[500]
[1] 255.5619
> #FYI this also matches printout
> sum((cars$dist-predict(carrun))^2)/length(cars$dist)
[1] 255.5619
> #but this is different!
> sum((cars$dist-predict(carrun, newdata=cars))^2)/length(cars$dist)
[1] 139.8054
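The difference comes from how predict() treats a randomForest object. With no newdata, predict(carrun) returns the out-of-bag (OOB) predictions: each observation is predicted only by the trees whose bootstrap sample did not include it, which is why it matches carrun$mse[500] and the printed 255.5619. With newdata=cars, every observation is pushed through all 500 trees, including the trees that were fit on it, so 139.8054 is a resubstitution error and is optimistically low. The OOB value of 255.5619 is the more believable long-run estimate of the MSE.

The sketch below recomputes the OOB predictions by hand to confirm this (names like per_tree and oob_pred are illustrative, and it assumes the model is refit with keep.inbag=TRUE so bootstrap membership is recorded):

library(randomForest)
set.seed(461431)
rf <- randomForest(dist ~ speed, data = cars, keep.inbag = TRUE)

# n x ntree matrix holding each tree's prediction for each observation
per_tree <- predict(rf, newdata = cars, predict.all = TRUE)$individual

# For each observation, average only the trees that did NOT see it (out of bag)
oob_pred <- sapply(seq_len(nrow(cars)),
                   function(i) mean(per_tree[i, rf$inbag[i, ] == 0]))

mean((cars$dist - oob_pred)^2)           # honest OOB MSE, ~255.6
mean((cars$dist - predict(rf, cars))^2)  # resubstitution MSE, ~139.8

Averaging only the out-of-bag trees reproduces rf$predicted, which is exactly what predict(rf) returns when newdata is omitted.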
From the last question (cars data), was “bagging” (trees) or “random forests” performed and how do you know this? When only one predictor is available, what element of bagging (trees) or random forests cannot be taken advantage of?
From the last question (cars data), was “bagging” (trees) or “random forests” performed and how do you know this?
If you look at the call, the model was fit with randomForest(), so nominally a random forest was performed; the carrun object also reports "Type of random forest: regression". However, the printout shows "No. of variables tried at each split: 1". With only one predictor (speed), mtry must equal p = 1, so every split considers the same variable and the procedure is exactly bagged regression trees. You can tell from the call and from the printed mtry value that the "random forest" here coincides with bagging.
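A quick check from the fitted object (a minimal sketch; carrun is the object fit above):

carrun$mtry     # 1: number of variables tried at each split
ncol(cars) - 1  # 1 predictor in the data, so mtry = p and the fit is bagging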
When only one predictor is available, what element of bagging (trees) or random forests cannot be taken advantage of?

The element that cannot be used is the random selection of a subset of predictors (mtry) at each split. Random forests decorrelate their trees by letting different splits choose among different variables; with a single predictor, every split must use that one variable, so this per-split sampling of the feature space does nothing. Bootstrap resampling and averaging of the trees' predictions still work; only the feature subsampling is lost.
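To see what is being lost, here is a hedged sketch on a dataset with several predictors (mtcars, which ships with R; the seed and object names are illustrative). With mtry = p the forest is just bagging, while the default mtry < p turns on the per-split variable sampling, which is the option the single-predictor cars data forecloses:

library(randomForest)
set.seed(1)                      # illustrative seed
p <- ncol(mtcars) - 1            # 10 predictors available for mpg
bagged <- randomForest(mpg ~ ., data = mtcars, mtry = p)  # all variables at every split: bagging
forest <- randomForest(mpg ~ ., data = mtcars)            # default mtry = floor(p/3): a true random forest
c(bagging = bagged$mse[500], forest = forest$mse[500])    # compare the two OOB MSEs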