Peter Flom’s Statistics 101: Which kind of regression model should I choose?

By , November 19, 2012 10:33 am

Regression is a set of statistical techniques for relating a dependent variable to one or more independent variables. Briefly, a dependent variable (sometimes called an outcome variable) is one that you think is related to the independent variables. Although regression can’t prove causation, you usually think that the relationship goes from the independent variable(s) to the dependent variable. (I will discuss the distinction in another post.)

Peter Flom’s Statistics 101: Sensitivity and specificity

By , November 5, 2012 2:22 pm

What are sensitivity and specificity?

Sensitivity and specificity are measures of the effectiveness of a diagnostic test. Most often they are used as part of medical research when doctors (or others) try to determine if a patient has a disease or not. This leads to four possible results:

  • True positive – The patient has the disease and you say he does
  • True negative – The patient does not have the disease and you say he does not
  • False positive – The patient does not have the disease and you say he does
  • False negative – The patient has the disease and you say he does not

Sensitivity is defined (see Dictionary of Statistics) as the conditional probability of having a positive test result, given that the patient has the disease. That is:

True positive/(true positive + false negative)

Specificity is defined as the “conditional probability of a negative test result, given that the patient does not have the disease”. That is:

True negative/(true negative + false positive)
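
As a minimal sketch, the two ratios above can be computed directly from the four counts. The function names and the counts here are hypothetical, invented purely for illustration; they are not taken from any study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients who have the disease and test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of patients who do not have the disease and test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed for illustration only).
tp, fn, tn, fp = 90, 10, 160, 40

sens = sensitivity(tp, fn)  # 90 / (90 + 10)
spec = specificity(tn, fp)  # 160 / (160 + 40)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts, the test catches 90 of the 100 patients who have the disease and correctly clears 160 of the 200 who do not.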

What are good values of sensitivity and specificity?
This varies by the state of knowledge in the field. Higher values of both measures are always better, but some diseases already have excellent diagnostic tools and some do not. Sometimes sensitivity and specificity are used to compare two tests, one of which may be a “gold standard” and the other new and possibly better. Or the new test may be less expensive, easier to administer, or have fewer side effects. Then researchers must decide whether the loss of sensitivity and specificity is worth those advantages.

How can sensitivity and specificity be adjusted?
Many, if not most, diagnostic measures give a quantitative result. For example, a person’s blood pressure is not simply high or low; it is measured in millimeters of mercury. Researchers can trade sensitivity against specificity by changing the cut-off used to diagnose disease. When higher readings indicate disease, lowering the cut-off increases sensitivity and decreases specificity, and raising it does the reverse.

Taking it to ridiculous extremes, if you simply say that every patient has the disease, then sensitivity will be 1, which is perfect. But then specificity will be 0, which is as bad as it can be.
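
The trade-off can be sketched in a few lines of code. The readings and disease labels below are hypothetical, and the sketch assumes that a reading at or above the cut-off counts as a positive test. The function name `rates_at_cutoff` is likewise just an illustration.

```python
def rates_at_cutoff(readings, has_disease, cutoff):
    """Return (sensitivity, specificity) when reading >= cutoff means 'positive'."""
    tp = fn = tn = fp = 0
    for value, diseased in zip(readings, has_disease):
        positive = value >= cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical readings (mmHg) and true disease status, invented for illustration.
readings = [110, 118, 125, 131, 138, 142, 150, 165]
has_disease = [False, False, False, True, False, True, True, True]

# A cut-off of 0 labels every patient as diseased; 200 labels no one.
for cutoff in (0, 130, 140, 200):
    sens, spec = rates_at_cutoff(readings, has_disease, cutoff)
    print(f"cutoff {cutoff:>3}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With this made-up data, the cut-off of 0 reproduces the extreme case described above (sensitivity 1, specificity 0), and each increase in the cut-off gives up some sensitivity in exchange for specificity.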

Which is more important, sensitivity or specificity?
This depends on the disease, its prognosis, the effectiveness of treatment and the side effects of treatment. Is a false negative worse than a false positive? Sometimes it is, sometimes it is not.

If the disease is easily treated but very harmful if not treated, then sensitivity is more important. In this situation, you do not want to miss anyone who has the disease. But if the disease is difficult to treat and not harmful, then specificity is more important. In this case, you do not want to tell people who do not have the disease that they do.

Can sensitivity and specificity be used outside medicine?

Certainly. They can be used whenever a diagnosis is being made. This could be a promotion decision at work, a decision to issue a credit card or any decision that has two choices and can be right or wrong.

Sources: B.S. Everitt, Dictionary of Statistics, Cambridge University Press.

Peter Flom’s Statistics 101: How to read a statistics book

By , October 22, 2012 12:47 pm

So, you’ve got to read a statistics book. Maybe you’re taking a statistics course, or maybe you are working on some research and need to learn something about statistics. And the text isn’t something with a brightly colored cover and a title like “Statistics for Dummies who think they don’t like statistics: The cartoon version”. No, it’s a text. And it hasn’t got a lot of fancy sidebars and things. And it’s got formulas. And you don’t like formulas.

What to do?

Peter Flom’s Statistics 101: ANOVA, ANCOVA, regression: What’s what?

By , October 15, 2012 10:23 am

When learning statistics, you may learn about ANOVA (analysis of variance), ANCOVA (analysis of covariance) and ordinary least squares regression. The way these are taught in many fields leaves many people confused. Indeed, many people do not realize these are all the same model.

Peter Flom’s statistics 101: Dependent and independent data

By , October 8, 2012 1:18 pm

Often, when reading a statistics book, you will see some variation on the phrase “independent data”. Many models assume that the data are independent. Sometimes this appears as part of the abbreviation iid, which means independent and identically distributed.

You may get confused between this and the case of independent and dependent variables.  But the two ideas are quite different.
