There is a lot of interest in measuring change in all sorts of areas:

Am I losing weight?

Is Bush losing support?

Is our children learning? (hehe).

At first, this seems quite simple. Weigh yourself Monday. Weigh yourself next Monday. Did your weight go down or up?

But it’s not so simple, for two reasons.

The first reason is that, in all the questions above (and most of the ones we are interested in), there is error in our measurement. Scales aren’t perfect. Polls aren’t perfect. And tests of ability are a long way from perfect (ANY test of ability, standardized or not). Further, even if the scale or test were perfect, it wouldn’t measure exactly what you want. Even a perfect test can only measure a student’s ability on a particular day. If the kid has a cold, or didn’t sleep well, or whatever, then, even if the test is an accurate view of the child’s ability on that day, it isn’t a good measure of his or her true ability.

This problem can be dealt with by having many measurements. For example, if you only test a kid twice, and his scores are 92 and 75, then you have to figure that his score went down, and that he is not doing well. There’s no way to factor in the cold he had on the second day. But suppose you test him many times, and his scores are

92 91 93 94 72 94 92 91

now you can see that the 72 is some kind of mistake. You may not know what the problem is (he was sick, the student next to him was hitting him, he fell asleep, whatever) but you know that the 72 is weird in some way, and his ability is more or less constant.
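With many measurements, one simple way to spot the odd one out is to compare each score with the median, which a single weird score like the 72 barely moves. A minimal Python sketch; the 10-point tolerance is an arbitrary assumption for illustration, not a rule:

```python
from statistics import median

# Test scores from the example above
scores = [92, 91, 93, 94, 72, 94, 92, 91]

def flag_unusual(values, tolerance=10):
    """Return the values lying more than `tolerance` points from the median.

    The default tolerance of 10 points is a hypothetical, illustrative choice.
    """
    center = median(values)
    return [v for v in values if abs(v - center) > tolerance]

print(median(scores))        # 92.0
print(flag_unusual(scores))  # [72]
```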

The second problem is that many of the common methods for measuring change make assumptions about the data that are completely unrealistic. This one is harder to deal with, but it is possible, and it’s the main topic of this diary.

There are a number of good books on this sort of analysis, but to my mind, one of the best and clearest is *Longitudinal Data Analysis* by Donald Hedeker and Robert Gibbons. Don gave me permission to quote a bunch from the beginning of the book, and I do that, with my own comments interspersed, because Hedeker and Gibbons assume a background that might not be realistic here.

If you DO have some statistics background but want to learn more about longitudinal studies, I highly recommend the Hedeker and Gibbons book.

Broadly speaking, there are six methods that have been proposed for dealing with the measurement of change over time. None of them is perfect, but some are better than others.

Before getting into the six methods, a little background is in order. When you have one variable that you think is related to a bunch of other variables, perhaps even causally, then you are in the field of regression. By far the most common kind of regression analysis is called ordinary least squares (OLS) regression. This is very useful in many situations, but it makes certain assumptions. There is a good web page on regression here. One of those assumptions is that the observations (strictly, the errors) are independent of one another, and longitudinal data violate it. If you weigh yourself today and tomorrow, clearly the two measurements are not independent: today’s weight tells you a lot about tomorrow’s.

The first method eliminates the longitudinal problem by reducing the repeated measures into a summary score. This could be an average, or a gain score, or a trend, or a more complex score. Suppose, for example, you have measurements as follows over several days:

182 181 183 181 179 180

you could take the average (181), or the change (180 - 182 = -2), or find a line that fits the points, or a number of other things. There are three problems with this method:

- “our uncertainty in the derived measure is [inversely] proportional to the number of measurements” – that is, you are better off if you’ve measured something six times than if you’ve measured it twice. This causes big problems if different subjects have different numbers of measurements.
- “By reducing multiple measurements to a single variable, there is typically a substantial loss of statistical power” – briefly, statistical power is your ability to detect an effect that is really there.
- “the use of time-varying covariates is not possible”, for example, if you weigh yourself every day, but go on a diet after 5 days, and then off the diet 2 days later, there is no way to account for that.
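The three summary scores mentioned above (the average, the change, and a line through the points) each take only a few lines to compute. A minimal Python sketch using the six daily weights; "a line that fits the points" is taken here to mean an ordinary least-squares line against day number, which is one common choice among several:

```python
# Daily weights from the example above
weights = [182, 181, 183, 181, 179, 180]
days = list(range(len(weights)))  # day 0, 1, ..., 5

# Summary 1: the average
average = sum(weights) / len(weights)

# Summary 2: the change from the first to the last measurement
change = weights[-1] - weights[0]

# Summary 3: the slope of the least-squares line through the points
mean_x = sum(days) / len(days)
mean_y = average
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, weights))
sxx = sum((x - mean_x) ** 2 for x in days)
slope = sxy / sxx  # change in weight per day

print(average)          # 181.0
print(change)           # -2
print(round(slope, 3))  # -0.514
```

Each of these collapses six numbers into one, which is exactly where the loss of power and the inability to use time-varying covariates come from.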

“Second, perhaps the simplest but most restrictive model is the ANOVA for repeated measures”

The problem with this method is that it assumes compound symmetry, which means that the variances and covariances are constant over time. What this means, in plain language, is that the relationship between your weight today and your weight yesterday is just as strong as the relationship between your weight today and your weight 2 weeks ago. Clearly, this is nonsensical. The part about the variances means that the amount of spread in people’s weights is constant over time, and this is also typically untrue. Oh, and if you have only two measurements, this is the same as a paired t-test.
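To make the contrast concrete, here is a compound-symmetric covariance matrix next to one common alternative, a first-order autoregressive (AR(1)) structure, in which measurements further apart in time are less strongly related. This is a sketch with made-up numbers (four time points, variance 1, correlation 0.6), using numpy:

```python
import numpy as np

def compound_symmetry(n_times, variance, covariance):
    """Covariance matrix under compound symmetry: every pair of
    time points has the same covariance, however far apart they are."""
    m = np.full((n_times, n_times), covariance, dtype=float)
    np.fill_diagonal(m, variance)
    return m

def ar1(n_times, variance, rho):
    """AR(1) covariance matrix: the correlation between two time
    points is rho raised to the number of steps between them."""
    idx = np.arange(n_times)
    lags = np.abs(idx[:, None] - idx[None, :])
    return variance * rho ** lags

cs = compound_symmetry(4, variance=1.0, covariance=0.6)
ar = ar1(4, variance=1.0, rho=0.6)

print(cs[0, 1], cs[0, 3])  # same covariance one step apart and three steps apart
print(ar[0, 1], ar[0, 3])  # covariance shrinks as the gap grows
```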

“Third, MANOVA models have also been proposed for analysis of longitudinal data”. MANOVA stands for multivariate analysis of variance. The main problem with this method is that it does not allow for missing values. Yet, in most of the research that we will be interested in, there are missing values.
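MANOVA works on data laid out "wide" (one row per subject, one column per time point), so a subject with even one missing time point is typically dropped entirely. A small sketch with hypothetical scores shows how much that throws away compared with keeping every observed value in a "long" layout (one row per subject per time point):

```python
# Hypothetical scores for three subjects at three time points; None = missing.
wide = {
    "A": [10, 12, 14],
    "B": [11, None, 15],
    "C": [9, 10, None],
}

# Complete-case analysis: drop any subject with at least one missing value.
complete_cases = {s: v for s, v in wide.items() if None not in v}

# Long layout: one (subject, time, value) row per observed measurement.
long_rows = [
    (subject, time, value)
    for subject, values in wide.items()
    for time, value in enumerate(values, start=1)
    if value is not None
]

print(len(complete_cases))  # 1 subject survives
print(len(long_rows))       # 7 observations survive
```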

The above three methods cannot be recommended. Now, three that can be recommended.

“Fourth, generalized mixed-effects regression models…..[more on these below]…..are quite robust to missing data, and can handle time-invariant as well as time-variant covariates……the disadvantage is that they are computationally complex…..”

Fifth, covariance pattern models model the variance-covariance matrix directly. The advantage is that they are computationally simpler than the mixed-effects models (and therefore allow estimation of the full likelihood). “The disadvantage is that they do not attempt to distinguish within-subject variance”. That is, if you attempt to model the effect of a new education method by assigning some students to it, this method does not allow separate estimation of the effect of a person being a particular person vs. the effect of a person being in a particular group.

Finally, GEE, or generalized estimating equations, are often useful, but they “assume that missing data are only ignorable if the missing data are explained by covariates in the model”. Briefly, this means that the reasons why Johnny missed school on Tuesday are captured by something in the model. This is unrealistic in much of the sort of research that we are interested in.

For the above reasons, I (like Hedeker and Gibbons) think that mixed models are often the way to go. So, what are they?

Well…..there’s one more preliminary. It’s almost impossible to write about these models without matrices. So….VERY briefly:

A matrix is a rectangular array of numbers, arranged in rows and columns.

Adding two matrices is easy, but they have to be the same size. Just add each element to its corresponding element (the one in the same row and column of the other matrix).
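For example, adding two made-up 2-by-3 matrices with Python's numpy library:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# Element by element: each entry of a is added to the entry in the same
# row and column of b. (numpy will "broadcast" some mismatched shapes,
# but that is not matrix addition in the mathematical sense.)
total = a + b
print(total)
# [[11 22 33]
#  [44 55 66]]
```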

Multiplying two matrices is harder, and I am not going to try to show it in this format. See this site.

But, if none of that makes sense to you, just ignore it and think of it as ordinary multiplication.
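For readers who do want to see it: in matrix multiplication, each entry of the result comes from one row of the first matrix and one column of the second, multiplied pairwise and summed. A small numpy illustration with made-up numbers:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[5, 6],
              [7, 8]])

# Row of x times column of y, summed: top-left = 1*5 + 2*7 = 19.
# Note: x * y in numpy multiplies element by element, which is NOT
# matrix multiplication; the @ operator is.
product = x @ y
print(product)
# [[19 22]
#  [43 50]]

# Unlike ordinary multiplication, order matters.
print(y @ x)
# [[23 34]
#  [31 46]]
```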

OK. In regular regression, we model a dependent variable (Y) as a linear combination of several independent variables (X).

In matrix terms

Y = XB + e

where B is a bunch of parameters to estimate, and e is the error.

Suppose we wanted to predict a person’s weight based on age, sex, and height, with sex coded as female = 1, male = 0. Then we have 3 independent variables, and the above becomes

$$\text{Weight} = b_0 + b_1\,\text{age} + b_2\,\text{female} + b_3\,\text{height} + e$$

Regression makes assumptions about e; specifically, it says that the errors are independent and identically distributed with mean 0 and constant variance.
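To see the pieces of Y = XB + e working together, the sketch below simulates made-up data from the weight equation (the coefficients, sample size and error spread are all invented for illustration) and recovers B by ordinary least squares with numpy. The errors are generated to be independent, which is exactly the assumption longitudinal data break:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (made-up) independent variables
age = rng.uniform(20, 70, n)
female = rng.integers(0, 2, n)   # 1 = female, 0 = male
height = rng.normal(170, 10, n)  # cm

# Invented "true" coefficients b0..b3 used to generate the data
true_b = np.array([-60.0, 0.2, -5.0, 0.75])
X = np.column_stack([np.ones(n), age, female, height])

# Independent errors with mean 0 and constant variance
e = rng.normal(0, 5, n)
weight = X @ true_b + e  # Y = XB + e

# Ordinary least squares estimate of B
b_hat, *_ = np.linalg.lstsq(X, weight, rcond=None)
print(np.round(b_hat, 2))  # should land close to true_b
```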

The independence assumption is violated with longitudinal data, so we use a different model:

Y = XB + ZG + e

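Here G holds what are called random effects, and Z is the matrix that connects them to the observations, much as X connects the parameters B to the observations.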

The essential idea is that we let each individual have a random intercept and slope. So a person’s equation is partially based on general characteristics about that person (e.g. race, sex, age, or whatever is relevant) and partially on them being them.
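A mixed model is easiest to picture from the data it assumes. The sketch below simulates, with invented numbers, a random-intercept, random-slope model: everyone shares a fixed intercept and slope (the XB part), and each person gets their own random shift to both (the ZG part). It only generates data and fits a naive straight line per person; actually estimating a mixed model needs dedicated mixed-model software:

```python
import numpy as np

rng = np.random.default_rng(1)

n_people, n_times = 200, 6
time = np.arange(n_times)

# Fixed effects: the average intercept and slope shared by everyone
# (made-up values for illustration).
b0, b1 = 180.0, -0.5

# Random effects: each person's own deviation from the average
# intercept and slope ("them being them").
u0 = rng.normal(0, 10, n_people)   # person-specific intercept shifts
u1 = rng.normal(0, 0.3, n_people)  # person-specific slope shifts
e = rng.normal(0, 1, (n_people, n_times))  # measurement error

# y[i, t] = (b0 + u0[i]) + (b1 + u1[i]) * time[t] + e[i, t]
y = (b0 + u0)[:, None] + (b1 + u1)[:, None] * time[None, :] + e

# A separate least-squares line per person shows individual slopes
# scattered around the shared slope b1.
slopes = np.polyfit(time, y.T, 1)[0]
print(round(slopes.mean(), 2), round(slopes.std(), 2))
```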

The price you pay for this flexibility is a lot more complexity both in terms of conceiving the model properly and interpreting the results. There are questions about the covariance structure that need to be answered, for instance.

But this post is already very long and complex. If people have questions, I will try to answer.

It’s almost impossible to write about these models with matrices.

should read

It’s almost impossible to write about these models *without* matrices.

I think.

Useful blog post… I think.

Thanks! I fixed the article.

I am trying to analyse changes in a species’ range over a period of 20 years. I have a measure of the range for each year, but I need to know whether the range is declining or increasing and whether that change is statistically significant. Is that something that the mixed models you are talking about could be used for? I have the SPSS package but am very confused by too many options. If you could point me in the right direction I would be grateful.

Thanks

Hi Michael

Depending on how many time points you have (is it one per year or many per year?) you might want time series analysis. It also depends on what other variables you have. If you just have range and time, then mixed models are not what you want.

Peter

Hi Peter,

I have a dataset where students rate their instructors’ teaching effectiveness 7 times over the course of a semester. I am interested in determining whether the change we are seeing in ratings over time is significant. I am also interested in determining whether change between two specific points (e.g., between time 3 and 4) is significant. I was planning on using repeated measures, but it looks like that might not be the right move? What might you recommend in this situation? Thanks for your blog–it is clearly written and has helped to motivate me to start my analyses!

Kari

Thanks

You probably want some kind of mixed model (aka hierarchical linear model, multi-level model)

Thanks, Peter!

Hi Dr. Flom,

Thank you for this post. I’m finding it really useful as a refresher.

I am working with a dataset of 1s and 0s, where 1s represent the relatively uncommon event of spore release in a particular algae during a time interval. I’m hoping to determine whether release becomes more or less frequent over time. I was thinking of using a regression, but I am wondering whether there is another test more appropriate to testing changes in frequency instead of changes in value.

I can’t really tell without more information – what variables have you got? How many time points?

It’s like this: I recorded “spores released” every ten minutes for 3-5 days (so around 500 times). “Spores released” is usually 0, occasionally 1. My only independent variable is time – I want to know if the release frequency increases or decreases over the 4 days.

I guess my main question is this: since the sort of data I’m looking at is very different from yours (“0 0 0 1 0 0 0 1 0 0 0…” vs. “92 91 93 94 72 94 92 91”), are the same approaches still valid? Maybe some kind of transformation is in order?

It could still be analyzed the same way; it depends on whether the data meet the assumptions of the model.

Hello Dr. Flom,

I’m hoping you might kindly be able to help me with a statistical problem I am having. I am analysing a landcover type over time. I have four 1 km² grids, and aerial imagery from 1968, 1976, 1989 and 2005. I have data for landcover both as a % and in hectares and m². Which tests would best test for significant temporal change? Many thanks, Jules

Hi Jules

Sounds like you probably want some sort of nonlinear multilevel model – a complicated analysis.

My background is in quality improvement in health care settings. What is your opinion of run charts as a simple way of assessing whether a change over time is statistically significant?

A run chart by itself can’t assess statistical significance (although it may give an indication of practical importance). To find statistical significance you need a hypothesis and a test of that hypothesis.

What’s your opinion of using multiple OLS regression and then removing serial autocorrelation of the residuals? Example: variation in the dependent variable is highly associated with variation in water temperature and rainfall. “Time” is days from the start of the sampling period, with sampling at approximately equal intervals across time. Model y = Time – Wtemp + Rain, then remove autocorrelation.

I am not an expert in time series but why not use established methods such as ARIMA and ARIMAX?

Two reasons. One is that I’m not familiar with ARIMA and ARIMAX; I just had to Google to see what they are. The second is that, though I do statistical analysis as part of my job, I’m not classified as a statistician. My agency issues us computers with standard software. I submitted a “nonstandard software” request for a statistics package to do multivariable analyses and was told I can use Linest in Excel to do it. So that’s what I have to use, except that I use online calculators when I can. The technique I use is supposed to be valid for looking at variables such as rainfall and water temperature in time series. I’m just wondering if it’s valid if you’re including time itself as a variable, since the idea is that it eliminates the autocorrelation in the residuals.

Oh, I forgot to mention that when I Googled ARIMA and ARIMAX, a quick look gave me the impression that they are for forecasting. I’m not trying to forecast. I need a quick tool to determine whether or not there is a substantial trend over time within data collected in the past. I’m also trying to get an idea of how much impact any trend over time had on mean levels of the dependent variable once you take out the “noise” introduced by things like rain events, seasonal changes, etc.

One last thing, since you were kind enough to respond: this has to do with programs for sampling bivalve molluscan shellfish harvest areas to classify them to allow harvest for human consumption under given conditions. One doesn’t want to use obsolete data if there has been a true underlying trend over time. At the same time, one doesn’t want to unnecessarily discard data that might yield a better understanding of variation in response to day-to-day occurrences such as rainfall and water temperature. So it’d be good to have some understanding of whether or not an underlying trend exists and, if so, how important it is, so one can decide how “far back” to go in using data.

Hi John

You can certainly model the effects of time on a single variable using fairly standard methods. But such a trend wouldn’t give a cutoff date for when data become obsolete. And if you have to handle more than one variable at a time, linear estimation isn’t going to be enough; you will need time series analysis, and Excel hasn’t got that.

If you are going to be doing statistical analysis with complex data as part of your job, you really need more than Excel; R is freely available and can do almost anything, although it has a bit of a learning curve

Peter

Thanks. I agree with what you say, if I’m able to get access to and effectively use software to apply more appropriate approaches. However, I asked the question because of your statement in the discussion at the top of this page: “Regression makes assumptions about e; specifically, it says that the errors are independent and identically distributed with mean 0 and constant variance.” I am wondering if removing first-order serial autocorrelation from the residuals effectively addresses that problem. For instance, I did a dataset the other day (n = 284). I included “Time” along with rainfall and water temperature as independent variables. I tested for serial autocorrelation and got a Durbin-Watson (D) statistic of 1.67. So I transformed all the variables using a technique I found designed to remove autocorrelation. I re-ran the regression, then tested autocorrelation of the residuals again. This time D = 1.96, very close to 2, the value expected when there is no autocorrelation. I’m interpreting that to mean that the assumption of independent errors is no longer violated. Am I wrong about that?
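For reference, the Durbin-Watson statistic is computed from the regression residuals as the sum of squared differences between neighbouring residuals divided by the sum of squared residuals; it comes out near 2(1 − ρ) when neighbouring residuals have correlation ρ. A minimal numpy sketch on simulated residuals (the 0.5 correlation and the sample size are arbitrary choices). One caveat: the statistic only looks at neighbouring residuals, so a value near 2 is consistent with no first-order autocorrelation but does not by itself establish that the errors are fully independent.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals divided by
    their sum of squares. Near 2: little first-order autocorrelation;
    well below 2: positive autocorrelation."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(r) ** 2) / np.sum(r ** 2)

rng = np.random.default_rng(2)

# Simulated independent residuals
independent = rng.normal(size=1000)

# Simulated AR(1) residuals: each one is 0.5 times the previous plus noise
correlated = np.empty(1000)
correlated[0] = rng.normal()
for t in range(1, 1000):
    correlated[t] = 0.5 * correlated[t - 1] + rng.normal()

print(round(durbin_watson(independent), 2))  # close to 2
print(round(durbin_watson(correlated), 2))   # close to 2 * (1 - 0.5) = 1
```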

Oh…another P.S. I understand that even if the assumption of independent errors is addressed by what I did I can’t make any predictions because I can’t predict outside of the range of the independent variables and since “Time” is one of them that’s it. Any prediction about anything that happens after the last date represented in the dataset is outside of the range of that independent variable.

I am not completely certain, but that sounds like it should be right.

Hi Dr. Flom,

This is a great article! I am working with someone who has performed a randomized controlled trial where a variable was measured at three time points. There are some missing data. She has created a composite (like with the weight example you give) of the difference in the variable (a1c levels). Basically she subtracts the amount at time one from time 3. To assess treatment effects she runs a regression and controls for time 1 level of a1c. The simple model is a1cdiff = a1c1 treatgroup. Because of the missing data and the fact that she measured many variables over three time points, I wanted to use proc mixed instead. Her data were in the “wide” format. We played around and created a “long” format data set, and I ran a basic proc mixed model using a1c as the DV. In the wide format, the effects of treatgroup on that “diff” variable were significant. In the long format, the effects were not significant. She thinks it is the proc mixed, but based on gut and some things I’ve learned, and your article, I believe it is wrong to use that composite variable. She actually has higher statistical power with it, though. Can you explain why the effects would differ between the wide and long models? It’s mathematical, I know. I apologize for the length of this message! Thanks for your help in advance! Also, there was a “survey” that popped up on your page. It asks for my contact information. I wasn’t sure if it was you or coming from some other untrustworthy source.

The survey comes from me.

For the rest, things get complex. One good source is Hedeker & Gibbons, *Longitudinal Data Analysis*, but any good book on longitudinal data will explain things.

Thanks. I have Singer & Willett, which is a good source, but finding simple answers is not easy! I’ll check out Hedeker & Gibbons too.

Liz

Hi Dr. Flom,

This is a really informative article. I would like to start an analysis with the dataset below. Could you please let me know which model would be better to start with, and why? The dataset shows the time taken for data delivery to the client for each month.

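| Client | JAN | FEB | MAR | APR | MAY | JUN | JUL |
|---|---|---|---|---|---|---|---|
| 1 | 12 | 14 | 13 | 11 | 9 | 7 | 4 |
| 2 | 21 | NULL | 17 | NULL | 15 | 10 | 8 |
| 3 | 7 | 8 | NULL | 7 | 5 | 2 | 1 |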

Thank you in advance for all your help! Looking forward to your reply.

Any links about the model you recommend would also be highly appreciated.

With only 3 people there’s not much you can do. If you have a lot more people, then it depends on what your goals are. First, figure out what you want to know (that should be done *before* you start collecting data); then you can figure out a) what data you need and b) what analysis to do.

Peter

Nice overview! Any suggestions for the following situation: I have two tools to measure one aspect of behavior, and I would like to compare the two tools and find out which one is more sensitive to change over time. So I have baseline and 24 months as my 2 time points, and tool A and B. The behavior in question gets gradually worse over time, so both tools will reflect that progression. What would be the appropriate statistical approach to compare these two tools regarding their respective sensitivity to change? Thanks a lot.

The one that changes more is more sensitive to change, but whether it is over-sensitive or the other one is under-sensitive seems impossible to tell with what you’ve given me.

Dear Peter,

Thanks for taking the time to help us non-experts. Your input would be very helpful to the following experiments:

We have a set of 5 independent longitudinal experiments. One experiment is measuring the activity of at least 50 neurons at 10-12 different time points. We acutely manipulate the activity of all measured neurons, usually between time points 2 and 3. Thus, the first two time points are ‘baseline’, then we manipulate, and then we hope to see changes compared to baseline. Unfortunately, we have different numbers of time points for each of the experiments, but always between 10 and 12. We align the experiments according to the acute manipulation. After the manipulation, the time intervals between time points are identical between experiments, but the interval length varies: after manipulation, the time points are 3h, 6h, 9h, 12h, (then my student goes to bed), 25h, 33h, 55h, and 72h. Within one experiment, it is always the same 50 or more neurons we are looking at, but each experiment is carried out in a different sample. We now want to see if there is a reproducible trend across the 5 independent experiments with respect to changes in activity. I hope I was clear in my description. Which statistical test would you recommend to assess significance and why?

Thank you very, very much in advance!

Best,

Sidney

Hi Sidney

I think you need multilevel models.

Regards

Peter

Hi Peter

Thank you for this and taking the time to help! I’m looking at 4 metrics over time, trying to determine relationships between changes in consumption habits and perceptions of various barriers to change (e.g. taste), plans for future consumption and motivations for that change. So, I’m trying to see how people’s consumption changes over time depending on their perceptions of barriers and/or certain motivations, and also how their perceptions of these barriers change too! Any advice on what tools to use greatly appreciated. 🙂

Cheers,

Trent

Hi Trent

Probably some kind of multilevel model.

Peter

Great article. I am trying to monitor, evaluate or even quantify what I call “drift” in a population of measured parameters.

For example, I log each day the temperature of an incubator that I have calibrated for 37 °C.

Over the course of many months how do I know if the temperature is “drifting” in one direction or another?

37, 37, 37.1, 37.2, 37.1, 37, 37.2, etc.

It would be nice to have an alarm/warning if the measured parameters start to drift and the incubator may therefore need recalibration.

Thank you for your help

You would have to precisely define ‘drift’ then you could develop an algorithm to check for it.
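As an illustration of what "precisely define drift" might look like, here is one hypothetical definition turned into code: raise an alarm when the average of the last seven daily readings is more than 0.2 °C away from the 37 °C set point. Both numbers are assumptions chosen for illustration; a real rule would need to reflect the incubator's actual tolerances.

```python
def drift_alarm(readings, target=37.0, tolerance=0.2, window=7):
    """Return the index of the first reading at which the mean of the last
    `window` readings differs from `target` by more than `tolerance`,
    or None if that never happens.

    This encodes one hypothetical definition of 'drift'; the default
    tolerance and window length are illustrative, not recommendations.
    """
    for end in range(window, len(readings) + 1):
        recent = readings[end - window:end]
        mean = sum(recent) / window
        if abs(mean - target) > tolerance:
            return end - 1  # index of the reading that triggered the alarm
    return None

# The readings from the comment above: no alarm under this definition.
readings = [37, 37, 37.1, 37.2, 37.1, 37, 37.2]
print(drift_alarm(readings))  # None

# A made-up series that jumps to 37.5 after a week.
drifting = [37.0] * 7 + [37.5] * 7
print(drift_alarm(drifting))  # 9
```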