Question

We have used the 1987 baseball salary data to illustrate linear regression. In this project, we consider
the 1992 baseball salary data set, which is available from

http://www.amstat.org/publications/jse/datasets/baseball.dat.txt

This data set (of dimension 337 × 18) contains salary information (and performance measures) of
337 Major League Baseball players in 1992. More detailed information can be found at
http://www.amstat.org/publications/jse/datasets/baseball.txt

The data set contains the following variables.

Table 1: Variable Description for the 1992 Baseball Salary Data

Var      Columns   Description
salary   1 – 4     Salary (in thousands of dollars)
X1       6 – 10    Batting average
X2       12 – 16   On-base percentage (OBP)
X3       18 – 20   Number of runs
X4       22 – 24   Number of hits
X5       26 – 27   Number of doubles
X6       29 – 30   Number of triples
X7       32 – 33   Number of home runs
X8       35 – 37   Number of runs batted in (RBI)
X9       39 – 41   Number of walks
X10      43 – 45   Number of strike-outs
X11      47 – 48   Number of stolen bases
X12      50 – 51   Number of errors
X13      53        Indicator of “free agency eligibility”
X14      55        Indicator of “free agent in 1991/2”
X15      57        Indicator of “arbitration eligibility”
X16      59        Indicator of “arbitration in 1991/2”
ID       61 – 79   Player’s name (in quotation marks)

The data set can be input into R by reading directly from the website, with the following R commands:


# Read the whitespace-delimited file; it has no header row
baseball <- read.table(
  file = "http://www.amstat.org/publications/jse/datasets/baseball.dat.txt",
  header = FALSE,
  col.names = c("salary", paste0("x", 1:16), "ID"))
# Use log(salary) as the response
baseball$logsalary <- log(baseball$salary)
# Remove salary (column 1) and ID (column 18)
baseball <- baseball[, -c(1, 18)]
dim(baseball)
head(baseball)

Complete the project by following the specific instructions given below.
1. Starting with the full model that includes all predictors (i.e., X1, X2, . . . , X16), apply one
   model selection procedure of your choice to select your best model. (Use either best subset
   selection or a regularization method.)
   (a) Provide the fitting results from your ‘best’ model, i.e., the table of parameter estimates
       and the ANOVA table.
   (b) Obtain the resulting R² and interpret it.
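
One possible way to approach item 1 is sketched below, assuming best subset selection with BIC as the selection criterion (the question does not prescribe a criterion, so this is only one reasonable choice). The helper `best_subset` is hypothetical and written in base R for illustration; it is not part of any package. The small data frame `toy` is simulated purely to demonstrate the calls and is not the baseball data.

```r
# Hypothetical helper: exhaustive best-subset search over all non-empty
# predictor subsets, keeping the linear model with the smallest BIC.
# (The intercept-only model is not considered.)
best_subset <- function(data, response) {
  predictors <- setdiff(names(data), response)
  best_bic <- Inf
  best_fit <- NULL
  for (k in seq_along(predictors)) {
    # All subsets of size k, each as a character vector of names
    for (vars in combn(predictors, k, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      bic <- BIC(fit)
      if (bic < best_bic) {
        best_bic <- bic
        best_fit <- fit
      }
    }
  }
  best_fit
}

# Simulated illustration data (NOT the baseball data):
# the response depends on x1 and x3 only.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$logsalary <- 1 + 2 * toy$x1 - 1.5 * toy$x3 + rnorm(n, sd = 0.5)

fit <- best_subset(toy, "logsalary")

summary(fit)$coefficients  # (a) table of parameter estimates
anova(fit)                 # (a) ANOVA table: sequential (Type I) sums of squares, one row per term
summary(fit)$r.squared     # (b) R²: share of variance in the response explained by the model
```

To apply the same sketch to the project data, one could call `best_subset(baseball, "logsalary")` after running the input commands above. With 16 predictors that means 2^16 - 1 = 65,535 model fits, so it may take a minute or more; the `regsubsets` function in the `leaps` package is a commonly used faster alternative. Because the response is log salary, R² here describes variation explained on the log scale.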
