We very often set the cutoff for a significant p value at 0.05 and the minimum acceptable power at 0.80. Sometimes a cutoff of 0.01 is demanded; sometimes power of 0.90 is. But what are these values, and how are they chosen? The p value is *not* the probability that your results are due to chance. That is a common misunderstanding. The p value answers a very specific question:

If, in the population from which this sample was randomly drawn, the null hypothesis were strictly true, how likely is it that I would get a test statistic at least as extreme as the one I got, in a sample the same size as mine?
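That question can be answered literally by simulation. The sketch below assumes a one-sample z-test of "population mean = 0" with a known population standard deviation of 1 (a deliberately simple, hypothetical setup; the function names are illustrative, not from any library). It computes the p value from the normal distribution, then estimates the same number by repeatedly drawing samples of the same size from a population in which the null is exactly true and counting how often the statistic is at least as extreme as the observed one.

```python
import math
import random
from statistics import NormalDist, fmean


def z_statistic(sample, mu0=0.0, sigma=1.0):
    """One-sample z statistic, assuming the population SD `sigma` is known."""
    return (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))


def analytic_p_value(z):
    """Two-sided p value: P(|Z| >= |z|) when the null is strictly true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_p_value(z_obs, n, reps=20_000, seed=1):
    """Brute-force version of the same question: sample from a population
    where the null (mean 0, SD 1) holds exactly, using samples of size n,
    and count how often the statistic is at least as extreme as z_obs."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(z_statistic(sample)) >= abs(z_obs):
            extreme += 1
    return extreme / reps


# Hypothetical observed result: z = 2.1 from a sample of 25.
z_obs, n = 2.1, 25
print(f"analytic p  = {analytic_p_value(z_obs):.4f}")
print(f"simulated p = {simulated_p_value(z_obs, n):.4f}")
```

The two numbers should agree to within simulation noise. Note that nothing in the calculation asks whether the null is actually true; the p value is computed entirely inside the world where it is.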

So, if the null is *false*, the p value is irrelevant: you can’t make a type I error when rejecting the null is the correct decision.

Power, on the other hand, is the probability of *correctly* rejecting the null when it is false; it equals 1 minus the probability of a type II error. If the null is true, we can’t make a type II error.
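To make "power" concrete, here is a minimal sketch for the same hypothetical setup: a two-sided one-sample z-test of "mean = 0" with known standard deviation. The true effect size and sample size are assumptions chosen for illustration. Power is the chance the test rejects when the true mean really is `effect`; the type II error rate is what is left over.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def z_test_power(effect, n, alpha=0.05, sigma=1.0):
    """Power of a two-sided one-sample z-test of H0: mean = 0.

    Assumes the population SD `sigma` is known and the true mean is `effect`.
    Power = P(reject H0 | true mean = effect) = 1 - beta.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # reject if |z| exceeds this
    shift = effect * math.sqrt(n) / sigma       # centre of z under this effect
    # Under the alternative z ~ N(shift, 1); add both rejection tails.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


# Hypothetical study: true effect of 0.5 SD, 32 observations.
power = z_test_power(effect=0.5, n=32)
print(f"power = {power:.3f}, type II error rate = {1 - power:.3f}")

# With no real effect the null is true, so every rejection is a type I error
# and the rejection rate is just alpha; a type II error is impossible.
print(f"rejection rate when the null is true = {z_test_power(effect=0.0, n=32):.3f}")
```

The second line illustrates the point in the text: when the null is true, "power" collapses to the type I error rate.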

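By setting the significance cutoff and the power, we are implicitly saying something about how bad a type I error is relative to a type II error. In particular, if we set the cutoff at 0.05 and power at 0.80, we accept a 5% chance of a type I error when the null is true and a 20% chance of a type II error when it is false. Tolerating type II errors four times as often amounts to saying that a type I error is four times worse than a type II error. Is this reasonable?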
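The arithmetic behind "four times worse" is just the ratio of the two tolerated error rates, beta / alpha with beta = 1 - power. A small sketch (the function name is hypothetical) applies that same reading to the other conventional combinations mentioned at the start:

```python
from itertools import product


def implied_cost_ratio(alpha, power):
    """Tolerated type II error rate (beta = 1 - power) divided by the
    tolerated type I error rate (alpha). Read as in the text, this is how
    many times worse a type I error is implicitly being treated."""
    beta = 1 - power
    return beta / alpha


for alpha, power in product((0.05, 0.01), (0.80, 0.90)):
    ratio = implied_cost_ratio(alpha, power)
    print(f"cutoff={alpha:.2f}  power={power:.2f}  ->  type I treated {ratio:.0f}x worse")
```

Demanding a cutoff of 0.01 with power 0.80 therefore implies a ratio of 20, and 0.05 with power 0.90 implies a ratio of 2, whether or not anyone chose those ratios deliberately.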

Sometimes it is, sometimes it isn’t. Sometimes a type II error is much worse than a type I error; other times a type I error is more than four times worse than a type II error. Let’s consider two situations:

Situation I: You develop a drug for a disease that is currently rapidly fatal. You test it to see if it works. Type I error: You say the drug works when it does nothing. Type II error: You say the drug does nothing (or, rather, that there is insufficient evidence that it does something) when it works. Which is worse? If you make a type I error, you give a useless drug to dying people. If you make a type II error, you withhold an effective drug from dying people.

Situation II: You develop a drug for a relatively innocuous disease that is already somewhat treatable. Your drug is more expensive than the existing drug and has a side effect the old drug does not have, but you think it may be more effective. Type I error: You say your drug is better when it is not (let’s ignore the possibility that your drug is *worse*). Type II error: You fail to say your drug is better when it is. If you make a type I error, people pay more and suffer side effects for no reason. If you make a type II error, people get somewhat worse results than they would have, for a non-fatal disease.

Of course, always using the same cutoff and the same power does have one advantage: it spares us from having to think about, or justify, the choice. Everyone does it. It also pleases journal editors and committee chairs. But it’s wrong.