In regression and ANOVA, an interaction occurs when the effect of one independent variable on the dependent variable differs across the levels of another independent variable. When one or both of the independent variables are categorical, two common strategies for dealing with interactions are stratifying and adding an interaction term. A somewhat less common method is classification and regression trees. Each has its advantages and disadvantages.
For example, suppose you are trying to explain voting behavior in the presidential election of 2012 and you have (among other things) data on
1) Who a person voted for (Obama, Romney, other, no one)
2) Their income (in dollars)
3) Their level of education (less than high school, high school, some college, college degree, some graduate school).
You suspect that education and income may interact: That is, you think that the effect of education on voting behavior is different among the wealthy, the middle class and the poor. Put another way, you suspect that the effect of income on voting behavior is different at different levels of education. How could you investigate this?
Since the dependent variable is categorical, with more than two categories that have no order, you probably want multinomial logistic regression (but see below). How should you deal with the interaction? You could stratify or you could add an interaction term. In stratification, you separate the data and run multiple models; here, you could run one regression of voting behavior on income for each level of education. Adding an interaction term is nearly always done by multiplying the two variables. In this case you would multiply the continuous variable income by each of the dummy codes for education.
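To make the two approaches concrete, here is a minimal sketch of the data preparation each one implies. It fits no model; it only shows how the records would be split for stratification and how the product terms would be built. The function names (stratify, dummy_code, design_row), the choice of "less than high school" as the reference category, and the respondent records are all assumptions made up for illustration.

```python
from collections import defaultdict

# Education levels from the example; the first is assumed to be the reference category.
EDUCATION_LEVELS = [
    "less than high school",
    "high school",
    "some college",
    "college degree",
    "some graduate school",
]

def stratify(records):
    """Group (income, education, vote) records by education level."""
    strata = defaultdict(list)
    for income, education, vote in records:
        strata[education].append((income, vote))
    return dict(strata)

def dummy_code(education):
    """Return one 0/1 indicator per non-reference education level."""
    return [1 if education == level else 0 for level in EDUCATION_LEVELS[1:]]

def design_row(income, education):
    """Build predictors: intercept, income, dummies, and income x dummy products."""
    dummies = dummy_code(education)
    interactions = [income * d for d in dummies]
    return [1, income] + dummies + interactions

# Made-up respondents, purely for illustration.
records = [
    (25000, "high school", "Obama"),
    (90000, "college degree", "other"),
    (42000, "some college", "Obama"),
    (61000, "high school", "no one"),
]

strata = stratify(records)              # one data set per education level
row = design_row(42000, "some college")  # one row of the single combined model
print(strata["high school"])
print(row)
```

With five education levels there are four dummy codes and therefore four product terms, so each row has ten predictors including the intercept. In the stratified approach, each entry of strata would be passed to its own regression instead.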
Stratifying has the advantage of being easier to understand. The output of the model for people with, e.g., a high school education indicates the relationship between income and voting for high school graduates. It has the disadvantages of 1) producing a lot of output (one regression for each stratum) and 2) not providing estimates, p values or confidence intervals for the interaction. Adding an interaction term has the advantage of producing a single output with all terms evaluated, but the disadvantages of having many parameters to estimate (4 additional product terms in our example, one per education dummy code, each with a separate coefficient for every non-reference outcome category in a multinomial model) and of being harder to understand (the parameter estimates for interaction terms are often confusing).
Another choice is a tree. In tree models, all the subjects (here, people) start out together. Then the computer finds the best way to split them into different types of voters (perhaps it splits “people with over $50,000 income” from those with less). The resulting groups become new nodes, and each is split again in the same way. Trees have the advantage of being able to find types of interactions that are hard to find with regression (particularly when there are many variables) and of producing a graphical output that is easy to understand. They have the disadvantages of being unfamiliar to many readers and of not producing the familiar parameter estimates. In addition, they may be unstable if the sample size is not large.
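The splitting step can be sketched as follows. This is a rough illustration under stated assumptions, not how any particular tree package works: it uses Gini impurity as the measure of the "best" split (one common choice among several), it considers only a single split on income, and the voter data are made up. The function names gini and best_income_split are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_income_split(records):
    """Find the income threshold that minimizes weighted Gini impurity.

    records is a list of (income, vote) pairs. Returns (threshold, impurity).
    """
    incomes = sorted({income for income, _ in records})
    n = len(records)
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between adjacent distinct incomes.
    for lo, hi in zip(incomes, incomes[1:]):
        threshold = (lo + hi) / 2
        left = [vote for income, vote in records if income <= threshold]
        right = [vote for income, vote in records if income > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Made-up voters, for illustration only.
voters = [
    (20000, "Obama"), (30000, "Obama"), (40000, "Obama"),
    (60000, "other"), (80000, "other"), (120000, "other"),
]

threshold, impurity = best_income_split(voters)
print(threshold, impurity)
```

In this toy data a threshold of 50000 separates the two groups perfectly, so the weighted impurity is 0. A full tree would repeat the search within each resulting node and over every candidate variable, including education.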
Which should you choose? That depends on your situation, but you might choose more than one!