Question

# VIII. Regarding the data offered in problem VII, we are interested in identifying which is the...

VIII. Regarding the data offered in problem VII, we are interested in identifying which is the best statistical relationship between the variables considered.

Using all the information you obtained previously, analyze:

- the correlation coefficients, identifying which is the best and the weakest among all the possible regressions and equations.
- the regression errors obtained, identifying which is the best and the weakest among all the possible regressions and equations.
- the required hypothesis tests

The data from problem VII was:

VII. Fran’s Convenience Marts are located throughout the Erie, Pennsylvania metro area. Fran, the owner, wants to expand her business to other communities in northwest Pennsylvania and southwest New York, such as Jamestown, Corry, Meadville, and Warren. To prepare her presentation to the local bank, she would like to better understand the factors that make a particular discount store productive. Fran must do all the work on her own, so she won't be able to study all the discount stores. Therefore, she selects a random sample of 15 stores and records the average daily sales, the floor space (area), the number of parking spaces, and the average income of the families in the region for each of the stores. The sample information is reported below.

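| Store | Daily Sales | Store Area | Parking Spaces | Income (thousands of dollars) |
|------:|------------:|-----------:|---------------:|------------------------------:|
| 1 | \$1840 | 532 | 6 | 44 |
| 2 | 1746 | 478 | 4 | 51 |
| 3 | 1812 | 530 | 7 | 45 |
| 4 | 1806 | 508 | 7 | 46 |
| 5 | 1792 | 514 | 5 | 44 |
| 6 | 1825 | 556 | 6 | 46 |
| 7 | 1811 | 541 | 4 | 49 |
| 8 | 1803 | 513 | 6 | 52 |
| 9 | 1830 | 532 | 5 | 46 |
| 10 | 1827 | 537 | 5 | 46 |
| 11 | 1764 | 499 | 3 | 48 |
| 12 | 1825 | 510 | 8 | 47 |
| 13 | 1763 | 490 | 4 | 48 |
| 14 | 1846 | 516 | 8 | 45 |
| 15 | 1815 | 482 | 7 | 43 |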

- Present and identify the best equation to predict the monthly average purchase volume, and explain why it is the best equation.

- With the best estimated equation, present the confidence interval to predict the monthly average purchase volume when the store area is 585, the family income is 50,000, and the number of parking spaces is 10.

Based on the data set given in the question:

1. Data cleansing: The first Daily Sales value needs its dollar sign removed.
2. Data correlation: Examine the correlation between daily sales (y) and each of the predictors (store area, parking spaces, and income) using scatter plots or a Pearson correlation matrix.
3. Model fitting: Use multiple linear regression to fit the model y = BX + c.
4. Insignificant variables: Any variable whose p-value exceeds 0.05 is treated as insignificant.
5. Model equation: Y = 1480.74461 + 0.73150 \* StoreArea + 9.99149 \* ParkingSpace - 2.30826 \* HouseholdIncome
6. Extrapolation with the given values:

STORE AREA = 585

FAMILY INCOME = 50 (thousands of dollars)

PARKING NUMBER = 10
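Substituting these values into the model equation from step 5 gives the point prediction. The short Python sketch below only reproduces that arithmetic; the function name `predict_daily_sales` and the constant names are illustrative and do not come from the original answer.

```python
# Coefficients copied from the model equation in step 5.
INTERCEPT = 1480.74461
B_AREA = 0.73150
B_PARKING = 9.99149
B_INCOME = -2.30826


def predict_daily_sales(area, parking, income):
    """Point prediction of daily sales; income is in thousands of dollars."""
    return INTERCEPT + B_AREA * area + B_PARKING * parking + B_INCOME * income


# Hypothetical helper call with the values given in the question.
prediction = predict_daily_sales(area=585, parking=10, income=50)
print(round(prediction, 2))  # 1893.17
```

Note that an area of 585 and 10 parking spaces both lie outside the range observed in the sample (maximums of 556 and 8), so this is an extrapolation in the strict sense.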

With an 82% C.I., we can say that Daily_Sales will be about 1893 for the point given above.
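Steps 2, 3, and 5 above can be sketched in plain Python as follows. The answer does not say which software was used, so this is only one possible way to carry out the calculation, using the standard library alone. The helper names `pearson`, `solve`, and `fit_ols` are illustrative assumptions, and the data are transcribed from the table in the question with the dollar sign already removed (step 1).

```python
# Sample from problem VII, transcribed from the table in the question.
sales = [1840, 1746, 1812, 1806, 1792, 1825, 1811, 1803,
         1830, 1827, 1764, 1825, 1763, 1846, 1815]
area = [532, 478, 530, 508, 514, 556, 541, 513,
        532, 537, 499, 510, 490, 516, 482]
parking = [6, 4, 7, 7, 5, 6, 4, 6, 5, 5, 3, 8, 4, 8, 7]
income = [44, 51, 45, 46, 44, 46, 49, 52, 46, 46, 48, 47, 48, 45, 43]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # design matrix
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, residuals


# Step 2: correlation of each predictor with daily sales.
predictors = {"area": area, "parking": parking, "income": income}
correlations = {name: pearson(col, sales) for name, col in predictors.items()}

# Steps 3 and 5: fit the multiple regression and summarise the fit.
beta, residuals = fit_ols([area, parking, income], sales)
n, k = len(sales), len(beta)
mean_y = sum(sales) / n
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - mean_y) ** 2 for yi in sales)
r_squared = 1 - sse / sst
std_error = (sse / (n - k)) ** 0.5  # standard error of the estimate

print("correlations with sales:", correlations)
print("intercept, area, parking, income:", beta)
print("R^2:", r_squared, "standard error:", std_error)
```

The printed coefficients can be compared with the equation in step 5, and the same `fit_ols` call can be repeated with other subsets of predictors to compare their correlation coefficients and regression errors, as the question asks.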