2. Consider a data set with four variables: Y, X1, X2 and X3.
Construct a multiple regression
model using Y as the response variable and the X variables as
explanatory variables.
(a) Write the mathematical formulas (including the assumptions) and
give R commands to obtain the linear
regression models Y ~ Xi, i = 1, 2 and 3.
(b) Write several lines of R commands to obtain the correlations
between Xi and Xj, i ≠ j and i, j =
1, 2,...
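A minimal sketch of the kind of R commands parts (a) and (b) ask for. The question supplies no data, so the data frame `dat` below is simulated purely for illustration; the sample size, coefficients, seed, and the object names `fit_full`, `fits_simple` and `cor_x` are all assumptions. The model being fitted is Y = b0 + b1 X1 + b2 X2 + b3 X3 + e, with the usual assumptions that the errors e are independent, have mean zero and constant variance, and are normally distributed.

```r
# Simulated stand-in data (an assumption; the exercise gives no values)
set.seed(1)
n <- 50
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 - 0.5 * dat$X2 + 0.3 * dat$X3 + rnorm(n)

# Multiple regression of Y on all three explanatory variables
fit_full <- lm(Y ~ X1 + X2 + X3, data = dat)

# (a) Simple regressions Y ~ Xi, i = 1, 2, 3
fits_simple <- lapply(c("X1", "X2", "X3"), function(v) {
  lm(reformulate(v, response = "Y"), data = dat)
})

# (b) Pairwise correlations between the explanatory variables
cor_x <- cor(dat[, c("X1", "X2", "X3")])

summary(fit_full)
cor_x
```

With real data, only the `lm()` and `cor()` calls are needed; the off-diagonal entries of `cor_x` are the correlations between Xi and Xj for i ≠ j.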
1. Consider an actual proof problem
2. The problem should estimate one multiple linear regression,
or a logistic regression. Explain the estimation.
3. Consider a model different from the one in step 2. You can add
other variables to the model, or try a different method
(for example: probit model, lasso, trees, random
forest, gradient boosting, or neural net). Then compare the
estimation results of the original model and the new model.
ps. If you answer the question with...
Using R and the data in the table below, perform the regression
of D on C (i.e., report the regression equation).
Hint: The code to enter the vectors C and D into R is:
    C <- c(3, 6, 8, 9, 1, 3)
    D <- c(2, 7, 5, 4, 0, 4)

C    D
3    2
6    7
8    5
9    4
1    0
3    4
You must figure out how to obtain the regression equation from
R. Enter the code below...
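One way to get the regression equation, sketched with base R's `lm()` and `coef()`. The object name `fit` is an arbitrary choice, not something the question prescribes.

```r
# Data as given in the hint
C <- c(3, 6, 8, 9, 1, 3)
D <- c(2, 7, 5, 4, 0, 4)

# Regress D (response) on C (predictor)
fit <- lm(D ~ C)

# Intercept and slope of the fitted line: D-hat = b0 + b1 * C
coef(fit)
```

As a hand check, the least-squares slope is Sxy / Sxx = 26 / 50 = 0.52 and the intercept is mean(D) - 0.52 * mean(C) = 22/6 - 2.6, about 1.067, so the fitted equation is roughly D-hat = 1.067 + 0.52 C.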
Data Set A: 7, 7, 7, 9, 9, 9, 10
Data Set B: 4, 6, 6, 6, 8, 9, 9, 9, 10, 10, 10
Step 1
Create a bar graph that examines a variable or variables in
your data set.
Step 2
Find the sample mean, median, and mode (if it exists) for each
set of data.
Next, find the sample standard deviation for each set of
data.
Create a box-and-whisker plot for each variable in each set of
data.
Eliminate any outliers from the samples. Redo part 1 for each
variable (if necessary, if not...
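The summary statistics in Step 2 can be sketched in R as follows. Base R has no built-in function for the statistical mode, so `modes()` is a hypothetical helper written for this example, and `describe()` is likewise an illustrative name. Outliers are flagged with the default `boxplot.stats()` rule, which is one common convention and may differ from the rule your course uses.

```r
A <- c(7, 7, 7, 9, 9, 9, 10)
B <- c(4, 6, 6, 6, 8, 9, 9, 9, 10, 10, 10)

# Hypothetical helper: returns every value that attains the highest frequency
modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Collect the requested statistics for one data set
describe <- function(x) {
  list(
    mean     = mean(x),
    median   = median(x),
    mode     = modes(x),
    sd       = sd(x),                 # sample SD (n - 1 denominator)
    outliers = boxplot.stats(x)$out   # points beyond 1.5 times the hinge spread
  )
}

describe(A)
describe(B)
```

In an interactive session, `barplot(table(A))` draws a bar graph of the frequencies in Data Set A and `boxplot(list(A = A, B = B))` draws the two box-and-whisker plots side by side.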
1. General features of economic time series: trends, cycles,
seasonality.
2. Simple linear regression model and multiple regression model:
dependent variable, regressor, error term; fitted value, residuals;
interpretation.
3. Population vs. sample: a sample is a subset of a population.
4. Estimator vs. estimate.
5. For what kind of models can we use OLS?
6. R-squared vs. adjusted R-squared.
7. Model selection criteria: R-squared/adjusted R-squared; residual
variance; AIC, BIC.
8. Hypothesis testing: p-value, confidence interval (CI), (null
hypothesis, significance...
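The model selection criteria in items 6 and 7 can be read off a fitted `lm` object. The sketch below is only an illustration: it uses R's built-in `mtcars` data as a stand-in (an assumption, since the review list names no data set), and `compare()` is a hypothetical helper.

```r
# mtcars ships with base R; it is used here only as a stand-in data set
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: gather the selection criteria for one fitted model
compare <- function(fit) {
  s <- summary(fit)
  c(r2        = s$r.squared,
    adj_r2    = s$adj.r.squared,
    resid_var = s$sigma^2,      # estimated residual variance
    AIC       = AIC(fit),
    BIC       = BIC(fit))
}

rbind(small = compare(small), big = compare(big))
```

R-squared never decreases when regressors are added to an OLS model, which is why adjusted R-squared, AIC and BIC, all of which penalize extra parameters, are preferred for comparing models of different size.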
1) When we fit a model to data, which is typically larger?
a) Test Error b) Training Error
2) What are reasons why test error could be LESS than training
error? (Pick all that apply)
a) By chance, the test set has easier cases than the training
set.
b) The model is highly complex, so training error systematically
overestimates test error
c) The model is not very complex, so training error
systematically overestimates test error
3) Suppose we want to...
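To experiment with the training error and test error these questions refer to, a small simulation can help. Everything in the sketch below is an assumption made for illustration: the simulated data, the 50/50 split, the polynomial degrees, and the helper `mse()`. It shows how to compute both errors and does not answer the questions.

```r
set.seed(42)                            # arbitrary seed, for reproducibility
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.5)    # simulated data (an assumption)
dat <- data.frame(x = x, y = y)

# Random 50/50 split into training and test sets
train_idx <- sample(n, size = n / 2)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Mean squared prediction error of a fitted model on a data set
mse <- function(fit, newdata) mean((newdata$y - predict(fit, newdata))^2)

# Polynomial fits of increasing complexity, each fitted on the training set
degrees <- c(1, 3, 10)
errors <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d, train = mse(fit, train), test = mse(fit, test))
}))
errors
```

Changing the seed or the split shows how much the gap between the two columns varies by chance alone.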
We have used the 1987 baseball salary data to illustrate linear
regression. In this project, we consider
the 1992 baseball salary data set, which is available from
http://www.amstat.org/publications/jse/datasets/baseball.dat.txt
This data set (of dimension 337 × 18 ) contains salary
information (and performance measures) of
337 Major League Baseball players in 1992. More detailed
information can be found at
http://www.amstat.org/publications/jse/datasets/baseball.txt
The data set contains the following variables.
Table 1: Variable Description for the 1992 Baseball Salary
Data
Var Columns Description
salary...