I received a question from a colleague about computing variances and covariances for standardized mean difference effect sizes from a design involving a single group, measured repeatedly over time. Deriving these quantities is a little exercise in normal distribution theory, which I find kind of relaxing sometimes (hey, we all have our coping mechanisms!).
The set-up
Consider a study in which a single group of $n$ participants was measured at each of $T + 1$ time-points, indexed as $t = 0, 1, \dots, T$. At the first time-point, there has not yet been any exposure to an intervention. At the second and subsequent time-points, there is some degree of exposure, and so we are interested in describing change between time-point $t$ and time-point 0. For each time-point, we have a sample mean $\bar{y}_t$ and a sample standard deviation $s_t$. For now, assume that there is complete response. Let $\mu_t$ denote the population mean and $\sigma_t$ denote the population standard deviation, both at time $t$. Let $\rho_{st}$ denote the correlation between outcomes measured at time $s$ and time $t$, for $s \neq t$, where $\rho_{st} = \rho_{ts}$. We might also have sample correlations between each pair of time-points, denoted $r_{st}$. We calculate a standardized mean difference for each time-point by taking
$$d_t = \frac{\bar{y}_t - \bar{y}_0}{s_p},$$
where $s_p$ is the sample standard deviation pooled across all time-points:
$$s_p = \sqrt{\frac{1}{T + 1} \sum_{t=0}^T s_t^2}.$$
The question is then, what is $\text{Var}(d_t)$ and what is $\text{Cov}(d_s, d_t)$, for $s \neq t$?
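To make the notation concrete, here is a minimal sketch in Python (numpy only) of how one might compute the $d_t$ from reported summary statistics. The function names (`pooled_sd`, `smd`) and the example numbers are my own illustration, not part of the original question.

```python
import numpy as np

def pooled_sd(sds):
    """Sample SD pooled across all T + 1 time-points: sqrt of the average of s_t^2."""
    sds = np.asarray(sds, dtype=float)
    return np.sqrt(np.mean(sds ** 2))

def smd(means, sds):
    """Standardized mean differences d_t = (ybar_t - ybar_0) / s_p, for t = 1, ..., T."""
    means = np.asarray(means, dtype=float)
    return (means[1:] - means[0]) / pooled_sd(sds)

# Hypothetical example with T + 1 = 4 time-points
means = [10.0, 11.2, 12.5, 12.9]
sds = [4.0, 4.3, 4.8, 5.1]
print(smd(means, sds))  # d_1, d_2, d_3
```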
The results
Define the unstandardized mean difference between time-point $t$ and time-point 0 as $D_t = \bar{y}_t - \bar{y}_0$. Then, from the algebra of variances and covariances, we have
$$\text{Var}(D_t) = \frac{\sigma_0^2 + \sigma_t^2 - 2 \rho_{0t} \sigma_0 \sigma_t}{n}$$
and
$$\text{Cov}(D_s, D_t) = \frac{\sigma_0^2 + \rho_{st} \sigma_s \sigma_t - \rho_{0s} \sigma_0 \sigma_s - \rho_{0t} \sigma_0 \sigma_t}{n}.$$
From a previous post about the distribution of sample variances, we have that
$$\text{Var}(s_t^2) = \frac{2 \sigma_t^4}{n - 1} \qquad \text{and} \qquad \text{Cov}(s_s^2, s_t^2) = \frac{2 \rho_{st}^2 \sigma_s^2 \sigma_t^2}{n - 1}.$$
Consequently,
$$\text{Var}(s_p^2) = \frac{2}{(T + 1)^2 (n - 1)} \sum_{j=0}^T \sum_{k=0}^T \rho_{jk}^2 \sigma_j^2 \sigma_k^2,$$
where $\rho_{jj} = 1$. Let $\bar\sigma^2 = \frac{1}{T + 1} \sum_{t=0}^T \sigma_t^2$ denote the average population variance across all time-points, and let $\delta_t = \frac{\mu_t - \mu_0}{\bar\sigma}$ denote the standardized mean difference parameter at time $t$. Because the sample means are independent of the sample variances under multivariate normality, $\text{Cov}(D_t, s_p^2) = 0$, and so, following the multivariate delta method,
$$\text{Var}(d_t) \approx \frac{\text{Var}(D_t)}{\bar\sigma^2} + \frac{\delta_t^2 \times V}{4 \bar\sigma^4} \qquad \text{and} \qquad \text{Cov}(d_s, d_t) \approx \frac{\text{Cov}(D_s, D_t)}{\bar\sigma^2} + \frac{\delta_s \delta_t \times V}{4 \bar\sigma^4},$$
where $V = \text{Var}(s_p^2)$ as given above.
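As a way of checking these expressions numerically, here is a small Python sketch that evaluates the delta-method approximations to $\text{Var}(d_t)$ and $\text{Cov}(d_s, d_t)$ from assumed population parameters. The function name `smd_var_cov` and its argument conventions are my own; the indexing follows the set-up above, with time-points $0, \dots, T$.

```python
import numpy as np

def smd_var_cov(sigma, rho, delta, n):
    """
    Delta-method approximations to Var(d_t) and Cov(d_s, d_t), for s, t = 1, ..., T,
    given population SDs sigma[0..T], a (T+1) x (T+1) correlation matrix rho with 1s
    on the diagonal, SMD parameters delta (delta_t stored as delta[t - 1]), and
    per-wave sample size n. Returns a T x T matrix with variances on the diagonal.
    """
    sigma = np.asarray(sigma, dtype=float)
    rho = np.asarray(rho, dtype=float)
    delta = np.asarray(delta, dtype=float)
    T = len(sigma) - 1
    sigsq = sigma ** 2
    sigbar_sq = np.mean(sigsq)  # average population variance, sigma-bar^2

    # V = Var(s_p^2) = 2 / ((T+1)^2 (n-1)) * sum_{j,k} rho_jk^2 sigma_j^2 sigma_k^2
    V = 2.0 * np.sum(rho ** 2 * np.outer(sigsq, sigsq)) / ((T + 1) ** 2 * (n - 1))

    out = np.empty((T, T))
    for s in range(1, T + 1):
        for t in range(1, T + 1):
            if s == t:
                num = sigsq[0] + sigsq[t] - 2 * rho[0, t] * sigma[0] * sigma[t]
            else:
                num = (sigsq[0] + rho[s, t] * sigma[s] * sigma[t]
                       - rho[0, s] * sigma[0] * sigma[s]
                       - rho[0, t] * sigma[0] * sigma[t])
            out[s - 1, t - 1] = (num / (n * sigbar_sq)
                                 + delta[s - 1] * delta[t - 1] * V / (4 * sigbar_sq ** 2))
    return out
```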
Without imposing further assumptions, and assuming that we have access to the sample correlations between time-points, a feasible estimator of the sampling variance of $d_t$ is
$$\widehat{\text{Var}}(d_t) = \frac{s_0^2 + s_t^2 - 2 r_{0t} s_0 s_t}{n \, s_p^2} + \frac{d_t^2 \times \widehat{V}}{4 s_p^4},$$
where
$$\widehat{V} = \frac{2}{(T + 1)^2 (n - 1)} \sum_{j=0}^T \sum_{k=0}^T r_{jk}^2 s_j^2 s_k^2,$$
with $r_{jj} = 1$. Similarly, a feasible estimator for the covariance between $d_s$ and $d_t$ is
$$\widehat{\text{Cov}}(d_s, d_t) = \frac{s_0^2 + r_{st} s_s s_t - r_{0s} s_0 s_s - r_{0t} s_0 s_t}{n \, s_p^2} + \frac{d_s d_t \times \widehat{V}}{4 s_p^4}.$$
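The feasible estimators translate into code by plugging the sample quantities into the same expressions. A minimal sketch, again with my own naming conventions, where `r` holds the $(T + 1) \times (T + 1)$ matrix of sample correlations:

```python
import numpy as np

def smd_var_cov_est(means, sds, r, n):
    """
    Feasible estimators of Var(d_t) and Cov(d_s, d_t), s, t = 1, ..., T, formed by
    plugging sample means, SDs, and correlations into the delta-method expressions.
    `r` is the (T+1) x (T+1) matrix of sample correlations with 1s on the diagonal.
    Returns the vector of d_t and the T x T estimated variance-covariance matrix.
    """
    means = np.asarray(means, dtype=float)
    sd = np.asarray(sds, dtype=float)
    ssq = sd ** 2
    r = np.asarray(r, dtype=float)
    T = len(means) - 1
    sp_sq = np.mean(ssq)                      # pooled sample variance s_p^2
    d = (means[1:] - means[0]) / np.sqrt(sp_sq)

    # Plug-in estimate of Var(s_p^2)
    V_hat = 2.0 * np.sum(r ** 2 * np.outer(ssq, ssq)) / ((T + 1) ** 2 * (n - 1))

    out = np.empty((T, T))
    for s in range(1, T + 1):
        for t in range(1, T + 1):
            if s == t:
                num = ssq[0] + ssq[t] - 2 * r[0, t] * sd[0] * sd[t]
            else:
                num = (ssq[0] + r[s, t] * sd[s] * sd[t]
                       - r[0, s] * sd[0] * sd[s]
                       - r[0, t] * sd[0] * sd[t])
            out[s - 1, t - 1] = (num / (n * sp_sq)
                                 + d[s - 1] * d[t - 1] * V_hat / (4 * sp_sq ** 2))
    return d, out
```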
In some cases, it might be reasonable to use further assumptions about distributional structure in order to simplify these approximations. In particular, suppose we assume that the population variances are constant across time-points, $\sigma_0^2 = \sigma_1^2 = \cdots = \sigma_T^2 = \sigma^2$. In this case, the variances and covariances no longer depend on the scale of the outcome, and we have
$$\text{Var}(d_t) \approx \frac{2 (1 - \rho_{0t})}{n} + \frac{\delta_t^2 \left(1 + T \overline{\rho^2}\right)}{2 (T + 1)(n - 1)} \qquad \text{and} \qquad \text{Cov}(d_s, d_t) \approx \frac{1 + \rho_{st} - \rho_{0s} - \rho_{0t}}{n} + \frac{\delta_s \delta_t \left(1 + T \overline{\rho^2}\right)}{2 (T + 1)(n - 1)}$$
(here, $\overline{\rho^2}$ is the average of the squared correlations between pairs of distinct time-points). Since $\overline{\rho^2}$ will always be less than 1, the quantity
$$\nu = \frac{(T + 1)(n - 1)}{1 + T \overline{\rho^2}}$$
will always be larger than $n - 1$. If sample correlations aren't reported or available, it would seem fairly reasonable to use $\nu = n - 1$, or to make a rough assumption about the average squared correlation $\overline{\rho^2}$. With the approximate degrees of freedom $\nu$, the variances and covariances are then given by
$$\widehat{\text{Var}}(d_t) = \frac{2 (1 - r_{0t})}{n} + \frac{d_t^2}{2 \nu} \qquad \text{and} \qquad \widehat{\text{Cov}}(d_s, d_t) = \frac{1 + r_{st} - r_{0s} - r_{0t}}{n} + \frac{d_s d_t}{2 \nu}.$$
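Under the equal-variance simplification, the calculations reduce to the short sketch below. The function name and the option to pass an assumed value of $\overline{\rho^2}$ are my own choices; if sample correlations are unavailable, `r` could instead be filled with rough assumed correlation values.

```python
import numpy as np

def smd_var_cov_equal_var(d, r, n, rho_sq_bar=None):
    """
    Variance-covariance estimates for d_1, ..., d_T under equal population
    variances across time-points. `d` holds d_1, ..., d_T; `r` is the
    (T+1) x (T+1) correlation matrix (sample or assumed). If `rho_sq_bar`
    (assumed average squared correlation between distinct time-points) is None,
    it is computed from the off-diagonal entries of `r`.
    """
    d = np.asarray(d, dtype=float)
    r = np.asarray(r, dtype=float)
    T = len(d)
    if rho_sq_bar is None:
        off_diag = ~np.eye(T + 1, dtype=bool)
        rho_sq_bar = np.mean(r[off_diag] ** 2)

    # Approximate degrees of freedom nu = (T+1)(n-1) / (1 + T * rho_sq_bar)
    nu = (T + 1) * (n - 1) / (1 + T * rho_sq_bar)

    out = np.empty((T, T))
    for s in range(1, T + 1):
        for t in range(1, T + 1):
            if s == t:
                first = 2 * (1 - r[0, t]) / n
            else:
                first = (1 + r[s, t] - r[0, s] - r[0, t]) / n
            out[s - 1, t - 1] = first + d[s - 1] * d[t - 1] / (2 * nu)
    return out, nu
```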
Extension
In some contexts, one might encounter a design that uses overlapping but not identical samples at each time-point. For instance, in a rotating panel survey, each participant is measured repeatedly for some small number of time-points, and new participants are added to the sample with each new time-point. The simple repeated-measures set-up that I described in this post is an imperfect approximation for such designs. In dealing with such a design, suppose that one knew the total number of observations at each time-point, denoted $n_t$ for $t = 0, \dots, T$, as well as the number of observations that were common across any pair of time-points, denoted $n_{st}$ for $s \neq t$. Further suppose that the drop-outs and additions are ignorable (missing completely at random), so that any subset of participants defined by a pattern of response or non-response is still representative of the full population. I leave it as an exercise for the reader (a relaxing and fun one!) to derive $\text{Var}(d_t)$ and $\text{Cov}(d_s, d_t)$ under such a model.