Question

1) When we fit a model to data, which is typically larger?

a) Test Error b) Training Error

2) What are reasons why test error could be LESS than training error? **(Pick all that apply)**

a) By chance, the test set has easier cases than the training set.

b) The model is highly complex, so training error systematically overestimates test error

c) The model is not very complex, so training error systematically overestimates test error

3) Suppose we want to use cross-validation to estimate the error of the following procedure:

Step 1: Find the k variables most correlated with y

Step 2: Fit a linear regression using those variables as predictors

We will estimate the error for each k from 1 to p, and then choose the best k.

True or false: a correct cross-validation procedure may choose a different set of k variables in every fold.

4) Suppose that we perform forward stepwise regression and use cross-validation to choose the best model size.

Using the full data set to choose the sequence of models is the WRONG way to do cross-validation (we need to redo the model selection step within each training fold). If we do cross-validation the WRONG way, which of the following is true?

a) The selected model will probably be too complex

b) The selected model will probably be too simple

5) One way of carrying out the bootstrap is to average equally over all possible bootstrap samples from the original data set (where two bootstrap data sets are different if they have the same data points but in different order). Unlike the usual implementation of the bootstrap, this method has the advantage of not introducing extra noise due to resampling randomly. (You can use "^" to denote power, as in "n^2")

To carry out this implementation on a data set with n data points, how many bootstrap data sets would we need to average over?

6) If we have n data points, what is the probability that a given data point does not appear in a bootstrap sample?

7) If we use ten-fold cross-validation as a means of model selection, the cross-validation estimate of test error is:

a) biased upward

b) biased downward

c) unbiased

d) potentially any of the above

8) Why can't we use the standard bootstrap for some time series data? **(Pick all that apply)**

a) The data points in most time series aren't i.i.d.

b) Some points will be used twice in the same sample

c) The standard bootstrap doesn't accurately mimic the real-world data-generating mechanism

Answer #1

The answer to question 7 is (d).

If we use ten-fold cross-validation as a means of model selection, the cross-validation estimate of test error may be biased upward, biased downward, or unbiased.

There are competing biases. On one hand, the cross-validated estimate is based on models trained on smaller training sets than the full data set (nine tenths of the data in each fold), so it tends to overestimate the test error of the model fit to all of the data.
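As a toy illustration of this first bias (an assumed setup, not part of the original question): suppose the "model" is simply the sample mean of m i.i.d. training responses with variance σ², used to predict a new independent response from the same distribution. Its expected squared test error shrinks as m grows, so a fit on the 0.9n points available in each of the ten folds is slightly worse than a fit on all n points:

```latex
\mathbb{E}\big[(y_{\text{new}} - \bar{y}_m)^2\big] = \sigma^2 + \frac{\sigma^2}{m}
\quad\Longrightarrow\quad
\underbrace{\sigma^2\Big(1 + \frac{1}{0.9\,n}\Big)}_{\text{what ten-fold CV estimates}}
\;>\;
\underbrace{\sigma^2\Big(1 + \frac{1}{n}\Big)}_{\text{error of the fit on all } n \text{ points}}
```

The gap is small when n is large, which is one reason this upward bias can be outweighed by other effects.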

On the other hand, cross-validation gives a noisy estimate of test error for each candidate model, and we select the model with the best (smallest) estimate. This makes us more likely to choose a model whose estimate falls below its true test error rate, so we may underestimate test error. In any given case, either source of bias may dominate the other.
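The selection effect described above can be sketched with a small simulation. This is a minimal sketch under simplifying assumptions that are not in the original question: every candidate model has the same true error, and each cross-validation estimate is that true error plus independent Gaussian noise. The function name `mean_selected_estimate` and all numeric settings are hypothetical choices made for illustration.

```python
import random
import statistics

def mean_selected_estimate(n_models, true_error, noise_sd, n_trials, seed=0):
    """Average, over repeated trials, of the smallest noisy error estimate."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_trials):
        # Assumption: all candidates share one true error, and each CV
        # estimate is that error plus independent Gaussian noise.
        estimates = [rng.gauss(true_error, noise_sd) for _ in range(n_models)]
        # Model selection keeps the candidate with the smallest estimate.
        selected.append(min(estimates))
    return statistics.mean(selected)

true_error = 0.30
avg_single = mean_selected_estimate(1, true_error, 0.05, 2000)
avg_best_of_10 = mean_selected_estimate(10, true_error, 0.05, 2000)

print(f"true error:              {true_error:.3f}")
print(f"estimate, one candidate: {avg_single:.3f}")
print(f"estimate, best of ten:   {avg_best_of_10:.3f}")
```

With a single candidate the average estimate sits close to the true error, while the winner among ten candidates reports a noticeably smaller value on average. That is the downward bias introduced by selecting on a noisy estimate.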
