R coding:
The premise is the question: do Democratic voters tend to be younger than Republicans? I am using an election.txt data set that contains responses from voters in the 1996 US Presidential Election; the columns are id, party, inc, and age. I need the sample mean and variance of age for the Democrats and Republicans, separately. I wrote the vectors of ages for the Democrats and Republicans as:
age_dem <- election$age[election$party=="Democrat"]
age_repub <- election$age[election$party=="Republican"]
But after establishing that, I do not know how to find the sample means, variances, test statistic, and p-value using R.
To calculate the means, you can add the following statements after yours:
mean_age_dem<-mean(age_dem)
mean_age_repub<-mean(age_repub)
To calculate variance you can use the following code:
var_age_dem<-var(age_dem)
var_age_repub<-var(age_repub)
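The same per-group summaries can also be obtained in one step with tapply, without building the two vectors first. This is only a sketch: the data frame `toy` and its values are made up for illustration and merely mimic the party and age columns of election.txt.

```r
# Made-up stand-in for the election data (NOT the real election.txt values)
toy <- data.frame(
  party = c("Democrat", "Republican", "Democrat", "Republican", "Democrat"),
  age   = c(34, 51, 42, 63, 29)
)

# Mean and sample variance of age within each party
group_means <- tapply(toy$age, toy$party, mean)
group_vars  <- tapply(toy$age, toy$party, var)

print(group_means)
print(group_vars)
```

With the real data, replacing `toy` by `election` should give the same numbers as mean(age_dem), var(age_dem), and so on.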
Note that var() in R already divides by n-1, so var_age_dem and var_age_repub are the sample variances and no (n/(n-1)) correction is needed. We still need the lengths of the two vectors, say n1 and n2, for the test statistic:
n1<-length(age_dem)
n2<-length(age_repub)
For the remaining steps we keep the sample variances under these names:
svar_age_dem<-var_age_dem
svar_age_repub<-var_age_repub
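A quick way to confirm which denominator var() uses is to compute both versions by hand and compare. The numbers below are made up for illustration and are not taken from election.txt.

```r
# Made-up ages, purely for illustration
x <- c(23, 35, 41, 58, 62)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

sample_var <- ss / (n - 1)  # sample variance: divides by n - 1
pop_var    <- ss / n        # population variance: divides by n

# var() matches the n - 1 version, not the n version
print(isTRUE(all.equal(var(x), sample_var)))
print(isTRUE(all.equal(var(x), pop_var)))
```

So multiplying the output of var() by n/(n-1) would inflate the variance rather than correct it.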
To calculate the test statistic we first calculate the pooled variance, weighting each sample variance by its degrees of freedom:
S2<-((n1-1)*svar_age_dem+(n2-1)*svar_age_repub)/(n1+n2-2)
The test statistic is calculated as follows:
tstat<-(mean_age_dem-mean_age_repub)/sqrt(S2*(1/n1+1/n2))
The p-value can be calculated as follows:
pvalue<-pt(tstat, df=n1+n2-2) # one-sided p-value for the alternative that Democrats are younger (lower tail)
pvalue2<-2*pt(abs(tstat), df=n1+n2-2, lower.tail = FALSE) # two-sided p-value
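The manual steps can be collected into one small function and checked against t.test. This is a sketch under stated assumptions: `pooled_t` is a hypothetical helper name, the ages are made up, and the equal-variance (pooled) form of the test is assumed.

```r
# Hypothetical helper: pooled-variance two-sample t-test computed by hand
pooled_t <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  df <- n1 + n2 - 2
  # Pooled variance: sample variances weighted by their degrees of freedom
  s2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
  tstat <- (mean(x) - mean(y)) / sqrt(s2 * (1 / n1 + 1 / n2))
  list(
    tstat = tstat,
    df = df,
    p_less = pt(tstat, df = df),  # H1: mean(x) < mean(y)
    p_two_sided = 2 * pt(abs(tstat), df = df, lower.tail = FALSE)
  )
}

# Made-up ages, purely for illustration
dem   <- c(24, 31, 38, 45, 52, 29)
repub <- c(41, 56, 63, 48, 70)

res <- pooled_t(dem, repub)
ref <- t.test(dem, repub, alternative = "less", var.equal = TRUE)

# The hand calculation should agree with t.test when var.equal = TRUE
print(all.equal(res$tstat, unname(ref$statistic)))
print(all.equal(res$p_less, ref$p.value))
```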
To avoid all of the above calculation we can simply use R's t.test function. By default t.test runs the Welch (unequal-variance) test; add var.equal=TRUE to reproduce the pooled-variance calculation:
t.test(age_dem, age_repub, alternative="two.sided", paired=FALSE, conf.level = 0.95) # two-sided test
t.test(age_dem, age_repub, alternative="less", paired=FALSE, conf.level = 0.95) # one-sided test (Democrats younger)
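t.test returns an object whose components can be read off directly, which is handy when only the test statistic and p-value are needed. The sketch below uses made-up ages (not the election.txt data) and shows both the default Welch test and the pooled version.

```r
# Made-up ages, purely for illustration
dem   <- c(24, 31, 38, 45, 52, 29)
repub <- c(41, 56, 63, 48, 70)

# Default: Welch test (var.equal = FALSE)
welch  <- t.test(dem, repub, alternative = "less")
# Pooled-variance test, matching the manual formula with n1 + n2 - 2 df
pooled <- t.test(dem, repub, alternative = "less", var.equal = TRUE)

print(welch$estimate)    # the two sample means
print(welch$statistic)   # t statistic
print(welch$parameter)   # Welch degrees of freedom (usually not an integer)
print(welch$p.value)     # one-sided p-value

print(pooled$parameter)  # n1 + n2 - 2
print(pooled$p.value)
```

Which version is appropriate depends on whether equal variances in the two groups is a reasonable assumption; the Welch default is the more cautious choice.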
If anything is still unclear, leave a comment.