Regression is a set of statistical techniques for relating a dependent variable to one or more independent variables. Briefly, a dependent variable (sometimes called an outcome variable) is one that you think is related to the independent variables. Although regression can’t prove causation, you usually think that the relationship runs from the independent variable(s) to the dependent variable. (I will discuss the distinction in another post.)
The most common kind of regression, and the first one (sometimes the only one) learned in statistics classes, is known as ordinary least squares, or OLS, regression. OLS regression is appropriate when the dependent variable is continuous, either interval or ratio level (if you do not know what those terms mean, see this article). OLS regression makes other assumptions as well, but if the dependent variable is not continuous, OLS regression is not appropriate.
Of course, many variables are continuous, or nearly so: income, weight, IQ, SAT score, and many others. But many are not, and they come in various forms.
Some variables are counts: for example, the number of children you have, the number of cars you own, the number of times you have been married. What distinguishes count variables is that they are integers, that is, whole numbers, and that they are never negative. You cannot have a negative number of children or cars. The two most common kinds of regression for count variables are Poisson regression and negative binomial regression. So, for example, if you wanted to look at the number of children people had, and relate it to (say) age, income, racial/ethnic group, and the number of brothers and sisters the parents have, you would use one of these models. The main difference is that Poisson regression makes a very restrictive assumption about the relationship between the conditional mean and the conditional variance (it assumes they are equal), while negative binomial regression relaxes this assumption. In my experience, the assumption of Poisson regression is almost never met.
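To make this concrete, here is a minimal sketch of fitting a Poisson regression by Newton-Raphson, using made-up data and only numpy. (The data, the single predictor, and the coefficient values are all hypothetical, chosen just for illustration; in practice you would use a statistics package.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical count data: y depends on one standardized predictor x.
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])      # intercept + predictor
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))    # counts: non-negative integers

# Fit by Newton-Raphson on the Poisson log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                 # conditional mean (= variance, under Poisson)
    grad = X.T @ (y - mu)                 # score vector
    hess = X.T @ (X * mu[:, None])        # Fisher information
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # close to beta_true
```

The line `mu = np.exp(...)` is where the restrictive Poisson assumption lives: the same `mu` serves as both the conditional mean and the conditional variance, which is exactly what negative binomial regression relaxes.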
Many count variables have a lot of zero counts. For example, the number of heart attacks a person has had: Most people have had none. To deal with such variables, there are variations called zero-inflated Poisson regression and zero-inflated negative binomial regression.
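You can often see the problem directly: under a Poisson model with mean m, the predicted share of zeros is exp(-m), and zero-inflated data will have far more zeros than that. A quick sketch with simulated (entirely hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical zero-inflated counts: a structural-zero group (60% of
# people, always zero) mixed with a Poisson(2) group.
n = 10_000
structural_zero = rng.random(n) < 0.6
y = np.where(structural_zero, 0, rng.poisson(2.0, size=n))

observed_zero_frac = np.mean(y == 0)
poisson_zero_frac = np.exp(-y.mean())  # P(Y = 0) for a Poisson with the same mean
print(observed_zero_frac, poisson_zero_frac)  # observed is much larger
```

When the observed fraction of zeros is far above the Poisson prediction, that is a signal to consider one of the zero-inflated models.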
Some variables are dichotomies: they can take only two values. These include living vs. dying, getting a particular disease vs. not getting the disease, voting for Obama vs. voting for McCain, and so on. For these variables, the appropriate regression method is binary logistic regression.
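As with the Poisson sketch above, logistic regression can be fit by Newton-Raphson; here is a minimal numpy-only illustration on made-up data (the predictor and coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 0/1 outcome (e.g. disease vs. no disease) driven by one predictor.
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 1.0])
p = 1 / (1 + np.exp(-(X @ beta_true)))    # logistic link: probability of a 1
y = rng.binomial(1, p)

# Fit by Newton-Raphson on the binomial log-likelihood.
beta = np.zeros(2)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-(X @ beta)))
    grad = X.T @ (y - p_hat)
    w = p_hat * (1 - p_hat)               # per-observation weights
    hess = X.T @ (X * w[:, None])
    beta = beta + np.linalg.solve(hess, grad)

print(beta)  # close to beta_true
```

The key difference from OLS is the logistic link: the model predicts the probability of the outcome, which is always between 0 and 1, rather than the outcome itself.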
Some variables are nominal or ordinal (again, see this article for a detailed explanation). Briefly, ordinal variables have an order but no defined interval between levels, while nominal variables have no order at all. For nominal variables, the right kind of regression is multinomial logistic regression; for ordinal variables, the right method is ordinal logistic regression.
Finally, some variables are times to events: for example, time until death, time until getting a degree, and so on. One key trait here is that such variables are often censored. Although there are different kinds of censoring, the most common is right censoring. This means that some people do not have the event happen while you are studying them, but they may have it after the study is over. There is a whole field of statistics called survival analysis that deals with these variables, but the most common kind of regression for times to events is Cox proportional hazards regression.
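Right censoring is easy to picture with simulated data. In this hypothetical sketch, true event times follow an exponential distribution, but the study ends at time 5, so anyone whose event has not yet happened is censored at that point (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical survival data: true event times, with the study ending at t = 5.
n = 1000
event_time = rng.exponential(scale=4.0, size=n)
study_end = 5.0

observed_time = np.minimum(event_time, study_end)  # what you actually record
event_observed = event_time <= study_end           # False = right-censored

print(np.mean(~event_observed))  # fraction censored
```

The pair (observed_time, event_observed) is exactly the input that survival methods such as Cox regression work with: ignoring the censoring indicator and treating censored times as event times would badly bias the results.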
This is just a brief survey; typically, courses are offered in each of the topics listed. But I hope it gives you some ideas about the varieties of regression.
Specialties: Regression, logistic regression, cluster analysis, statistical graphics, quantile regression.