Question

Suppose that a data mining situation has 40 features that attempt to predict a numerical target value, and individual regressions are run for all 40 of them, with a p-value of .05 as the threshold for significance. How many features would you expect to show significance at this level, even if none are in reality related at all to the target?

- I would really appreciate it if you could explain why as well. Thank you.

Homework Answers

Answer #1

The threshold on the p-value is 0.05. Hence, we are using a size 0.05 test for the significance of each coefficient. A type I error occurs when we conclude that a feature is significant even though it is not. The probability of this type I error is precisely the size of the test, α = 0.05.

Note that a type II error cannot occur here, because in reality the alternative hypothesis (that a feature is significant) is not true for any of the features.

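Now there are 40 features, and each can be wrongly concluded to be significant with probability α = 0.05.

By linearity of expectation, the expected number of features showing false significance is the number of tests times the per-test error probability. This holds whether or not the 40 tests are independent.

Thus the expected number is: 40 × α = 40 × 0.05 = 2 (Ans.)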
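The result can also be checked by simulation. The sketch below is only an illustration and is not part of the original answer: it assumes standard normal features and target that are generated independently (so no feature is truly related to the target), 200 observations per regression, and a normal approximation to the t distribution for the slope test. The function names, sample size, trial count, and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist

ALPHA = 0.05       # significance threshold from the question
N_FEATURES = 40    # number of individual regressions


def slope_p_value(x, y):
    """Two-sided p-value for the slope in a simple regression of y on x.

    Assumption: uses a normal approximation to the t distribution,
    which is reasonable only when the sample size is large.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy                  # residual sum of squares
    se = (sse / (n - 2) / sxx) ** 0.5        # standard error of the slope
    z = slope / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(rng, n_obs=200):
    """Run 40 regressions on pure noise and count the 'significant' ones."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    count = 0
    for _ in range(N_FEATURES):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to y
        if slope_p_value(x, y) < ALPHA:
            count += 1
    return count


def average_false_positives(n_trials=200, seed=0):
    """Average the false-positive count over repeated simulated data sets."""
    rng = random.Random(seed)
    total = sum(count_false_positives(rng) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    print("theoretical expectation:", N_FEATURES * ALPHA)
    print("simulated average:", average_false_positives())
```

In any single data set the count varies (0, 1, 2, 3 or more falsely significant features are all plausible), but the average over many simulated data sets should settle close to 2.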
