Question

Suppose that a data mining situation has 40 features that attempt to predict a numerical target...

Suppose that a data mining situation has 40 features that attempt to predict a numerical target value, and individual regressions are run for all 40 of them, with a p-value of .05 as the threshold for significance. How many features would you expect to show significance at this level, even if none are in reality related at all to the target?

- I would really appreciate it if you could explain why as well. Thank you.

Homework Answers

Answer #1

The threshold on p-value is 0.05. Hence, we are using a size 0.05 test for testing the significance of the coefficients. Now a type I error occurs when we conclude that a feature is significant, even when it is not. The probability of this type I error is precisely the size of the test  .

Note that a type II error cannot occur in the test as in reality the alternative of significance of features in not true for any of them.

Now there are 40 features, each can be wrongly concluded to be significant with probability .

Thus we can expect about many features to show false significance.

Thus the expected number is: 40 x p = 40 x 0.05 = 2 (Ans.)

Know the answer?
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for?
Ask your own homework help question
Similar Questions
During the first 20 games of a basketball season, one player made 34 of her 40...
During the first 20 games of a basketball season, one player made 34 of her 40 free throws. A teammate made 35 of her 44 free throws. Assume that all conditions for running two proportion inference procedures have been met - even though this is questionable. a) determine the two different standard errors you would use for the following procedures. -Confidence interval -test of significance b) The 95% confidence interval for the difference in the two proportions (player-teammate) is about...
Suppose a data set with 15 observations has an outlier present (i.e. the extreme observation very...
Suppose a data set with 15 observations has an outlier present (i.e. the extreme observation very far away from the rest of the data). Would the mean of the data set be affected by this outlier? Explain: Would the median of the data set be affected by this outlier? Explain: 0 A student took a Biology exam and scored 77. If the class exam scores were mound-shaped with a mean score of 70 and standard deviation of 16, use z-score...
Please do not attempt to solve if you can not answer all!!! THE ENERGY BAR INDUSTRY...
Please do not attempt to solve if you can not answer all!!! THE ENERGY BAR INDUSTRY In 1986, PowerBar, a firm in Berkeley, California, single-handedly created the energy bar category. Positioned as an athletic energy food, it was distributed at bike shops and events that usually involved running or biking. The target segment was the athlete who needed an efficient, effective energy source. Six years later, seeking to provide an alternative to the sticky, dry nature of the PowerBar, a...
Outline and answer all discussion questions following case description in details. (Do not attempt to solve...
Outline and answer all discussion questions following case description in details. (Do not attempt to solve if you can not fulfill all the requirements!!!!) THE ENERGY BAR INDUSTRY In 1986, PowerBar, a firm in Berkeley, California, single-handedly created the energy bar category. Positioned as an athletic energy food, it was distributed at bike shops and events that usually involved running or biking. The target segment was the athlete who needed an efficient, effective energy source. Six years later, seeking to...
1.    In a multiple regression model, the following coefficients were obtained: b0 = -10      b1 =...
1.    In a multiple regression model, the following coefficients were obtained: b0 = -10      b1 = 4.5     b2 = -6.0 a.    Write the equation of the estimated multiple regression model. (3 pts) b     Suppose a sample of 25 observations produces this result, SSE = 480. What is the estimated standard error of the estimate? (5 pts) 2.    Consider the following estimated sample regression equation: Y = 12 + 6X1 -- 3 X2 Determine which of the following statements are true,...
Question 1 Jamal is a 40 year-old African American male. He is brought to a local,...
Question 1 Jamal is a 40 year-old African American male. He is brought to a local, out-patient mental health facility by his wife. She expresses concern about him, because he has been withdrawn for the last 2 or 3 weeks. She describes him as being irritable and restless. She also says he eats very little and has lost almost 10 pounds in the past 2 weeks. He agrees that he has been irritable; however, he attributes his irritability to the...
Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in...
Use the dependent variable (labeled Y) and the independent variables (labeled X1, X2, and X3) in the data file. Use Excel to perform the regression and correlation analysis to answer the following. Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the "best fit" line. Interpret. Determine the equation of the "best fit" line, which describes the relationship between the dependent variable and the selected independent variable. Determine the coefficient of...
Please read the article and answear about questions. Determining the Value of the Business After you...
Please read the article and answear about questions. Determining the Value of the Business After you have completed a thorough and exacting investigation, you need to analyze all the infor- mation you have gathered. This is the time to consult with your business, financial, and legal advis- ers to arrive at an estimate of the value of the business. Outside advisers are impartial and are more likely to see the bad things about the business than are you. You should...
Write a program in python that reads in the file quiztext (txt) and creates a list...
Write a program in python that reads in the file quiztext (txt) and creates a list of each unique word used in the file, then prints the list. Generate a report that lists each individual word followed by the number of times it appears in the text, for example: Frequency_list = [('was', 3), ('bird',5), ('it', 27)….] and so on. Note: Notice the structure: a list of tuples. If you're really feeling daring, Google how to sort this new list based...
These tests are intended for undergraduate students in college or those under 18 years of age....
These tests are intended for undergraduate students in college or those under 18 years of age. Read these directions carefully! The below test includes 10 questions, randomly selected from a large inventory. Most questions will be different each time you take the test, You must answer at least 9 out of 10 questions correctly to receive your Certificate. You have 40 minutes to complete each test, and you must answer all 10 questions in order to to see your results....
ADVERTISEMENT
Need Online Homework Help?

Get Answers For Free
Most questions answered within 1 hours.

Ask a Question
ADVERTISEMENT