Question

(Use RStudio to find the answer.)

5. Consider the Batting data frame from the Lahman library.

(a) How many observations and how many variables does Batting contain?

(b) Use filter from the dplyr library to make a new data frame with just the information from yearID 2015. How many observations does this contain?

(c) Who had the most Bases on Balls (BB) in 2015? (You can just give the player ID)

(d) How many players had more than 100 BB in 2015?

Homework Answers

Answer #1

a)

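After loading the Lahman library with library(Lahman), running dim(Batting) gives the output below (rows first, then columns). The row count depends on the installed version of Lahman.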

[1] 105861 22

Number of observations = 105861

Number of variables = 22
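The two numbers printed by dim() can also be read off separately with nrow() and ncol(). A minimal sketch on a small made-up data frame; `toy` and its values are hypothetical stand-ins, not Lahman data:

```r
# Hypothetical stand-in for Batting; the values are invented for illustration.
toy <- data.frame(
  playerID = c("aaa01", "bbb01", "bbb01", "ccc01"),
  yearID   = c(2014L, 2015L, 2015L, 2015L),
  stint    = c(1L, 1L, 2L, 1L),
  BB       = c(40L, 60L, 55L, 101L),
  stringsAsFactors = FALSE
)

dims   <- dim(toy)    # integer vector: c(rows, columns)
n_obs  <- nrow(toy)   # number of observations (rows)
n_vars <- ncol(toy)   # number of variables (columns)
```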

b)

Load the dplyr library:

library(dplyr)

Use filter from the dplyr library to make a new data frame with just the rows from yearID 2015. yearID is stored as a number, so compare it with 2015 rather than the string '2015':

Batting.2015 <- filter(Batting, yearID == 2015)

dim(Batting.2015)
[1] 1486 22

Number of observations this contains = 1486
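The same row selection can be written in base R with a logical index. Comparing yearID with the quoted string "2015" happens to return the same rows because R coerces the number to a string before comparing, but the numeric form is clearer. A sketch on hypothetical toy data (not the real Batting values):

```r
# Hypothetical stand-in for Batting; the values are invented for illustration.
toy <- data.frame(
  playerID = c("aaa01", "bbb01", "bbb01", "ccc01"),
  yearID   = c(2014L, 2015L, 2015L, 2015L),
  BB       = c(40L, 60L, 55L, 101L),
  stringsAsFactors = FALSE
)

# Base R equivalent of filter(toy, yearID == 2015)
by_number <- toy[toy$yearID == 2015, ]

# Works only through implicit coercion of yearID to character
by_string <- toy[toy$yearID == "2015", ]

same_rows <- identical(by_number, by_string)
n_2015    <- nrow(by_number)
```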

c)

Ran the R code below to get the player ID with the most Bases on Balls (BB) in 2015:

filter(Batting.2015, BB == max(Batting.2015$BB))$playerID

[1] "vottojo01"

The player ID with the most Bases on Balls (BB) in 2015 is vottojo01.
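Two details can change the result of a "row with the maximum" query: max() returns NA if the column contains any missing value, and several rows can share the maximum. The sketch below, on hypothetical toy data, handles both cases in base R; whether the real 2015 data contains missing BB values or ties is not assumed here.

```r
# Hypothetical data with a missing value and a tie for the maximum.
toy <- data.frame(
  playerID = c("aaa01", "bbb01", "ccc01", "ddd01"),
  BB       = c(88L, 101L, NA, 101L),
  stringsAsFactors = FALSE
)

# Ignore missing values when computing the maximum
top_bb <- max(toy$BB, na.rm = TRUE)

# which() drops NA comparisons, so every tied leader is returned
leaders <- toy$playerID[which(toy$BB == top_bb)]

# which.max() returns only the first row holding the maximum
first_leader <- toy$playerID[which.max(toy$BB)]
```

Comparing against max(), as the answer does, keeps every tied player, whereas which.max() silently reports one of them.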

d)

Five players had more than 100 BB in 2015, as the row count below shows:

> dim(filter(Batting.2015, BB > 100))
[1] 5 22
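Batting holds one row per player per stint, so a player who changed teams during a season has several rows for that year. Counting rows with BB > 100 therefore counts stints, not players, and would miss a player whose walks are split across teams. A sketch that sums BB per player before counting, on hypothetical toy data (the real 2015 answer is not recomputed here):

```r
# Hypothetical data: bbb01 played two stints in the same season.
toy <- data.frame(
  playerID = c("aaa01", "bbb01", "bbb01", "ccc01"),
  stint    = c(1L, 1L, 2L, 1L),
  BB       = c(40L, 60L, 55L, 101L),
  stringsAsFactors = FALSE
)

# Row-wise count: only looks at individual stints
n_rows_over_100 <- sum(toy$BB > 100)

# Sum BB across stints for each player, then count players
per_player <- aggregate(BB ~ playerID, data = toy, FUN = sum)
n_players_over_100 <- sum(per_player$BB > 100)
```

In the toy data the row-wise count is 1 and the per-player count is 2, because bbb01 totals 115 walks over two stints. With dplyr, the same idea is group_by(playerID) followed by summarise(BB = sum(BB)).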
