An Old and Under-Appreciated Method for Assessing Whether a Model is Any Good
2025-05-15
Collective brain dump (15 minutes)
Predictive checks for statistical models of single-case data (30-45 minutes)
Small- and large-group discussion (30-45 minutes)
Editor perspective: What common problems are you seeing in manuscript review with respect to statistical analysis and reporting? Have you noticed any improvements in application of statistical analysis?
Researcher perspective: What challenges or limitations are you running into when applying statistical analyses that could potentially enhance your work?
Student perspective: Have you seen any applications of statistical analysis that you found especially compelling? Have you learned about any statistical methodology work that you find exciting?
We need tools for evaluating the plausibility and credibility of statistical analyses, even when they are based on complex statistical models.
A succinct, highly stylized description of the process of collecting data.
A mathematical story about where you got your data.
A model describes not just the data you obtained, but also other possible outcomes of the study.
Predictive checks are a general technique for evaluating the fit of a parametric statistical model by examining other possible data generated from the model.
Well-known part of the Bayesian model development process (Berkhof, Van Mechelen, and Hoijtink 2000; Gelman et al. 2020; Sinharay and Stern 2003)
Grekov, Pustejovsky, and Klingbeil (2024) explore the use of predictive checks for meta-analytic models of oral reading fluency outcomes in SCDs.
Fit a statistical model.
Use the fitted model to simulate artificial data.
Examine the simulated data…
By graphing it just like you would with real data.
By calculating summary statistics for important features.
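To make the three steps concrete, here is a minimal sketch in Python. The AB-phase series, the level-shift regression, and all numeric values are illustrative assumptions, not the models or data from the study described later.

```python
import numpy as np

# Illustrative AB-phase data for one participant (hypothetical values):
# baseline (A) then intervention (B) oral reading fluency scores.
y = np.array([18, 22, 20, 19, 23, 35, 41, 38, 44, 47, 45, 50], dtype=float)
phase = np.array([0] * 5 + [1] * 7)      # 0 = baseline, 1 = intervention
session = np.arange(len(y))

# Step 1: Fit a statistical model (here, a simple level-shift regression).
X = np.column_stack([np.ones_like(session), phase])
beta, res_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma = np.sqrt(res_ss[0] / (len(y) - X.shape[1]))   # residual SD

# Step 2: Use the fitted model to simulate artificial datasets.
rng = np.random.default_rng(1)
n_rep = 1000
y_rep = X @ beta + rng.normal(0, sigma, size=(n_rep, len(y)))

# Step 3: Examine the simulated data, e.g., by comparing a summary
# statistic (the intervention-phase mean) between real and simulated data.
stat_obs = y[phase == 1].mean()
stat_rep = y_rep[:, phase == 1].mean(axis=1)
print(f"Observed B-phase mean: {stat_obs:.1f}")
print(f"Simulated B-phase means (5th/50th/95th pctile): {np.percentile(stat_rep, [5, 50, 95])}")
```

Each simulated series could also be plotted in the same layout as the real data, so the graphical check and the summary-statistic check use the same replicates.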
Multiple baseline across participant pairs
Third graders with emotional and/or behavioral disabilities.
Horizon Fast-Track reading program and Peer-Assisted Learning Strategies
One-minute oral reading fluency (words read correct)
Are the simulated data points plausible considering what you know about the participants, behavior, and study context?
Are the simulated data points similar to the real data?
Are the simulated data points more realistic than data simulated from alternative models?
But which summary statistics?
Initial performance: First baseline observation
Level: Mean/median of each phase
Trend: Slope from linear regression (Manolov, Lebrault, and Krasny-Pacini 2024)
Variability: SD of each phase
Extinction: Proportion of zeros (Scotti et al. 1991)
Overlap: Non-overlap of all pairs (Parker and Vannest 2009)
Immediacy: Fine-grained effects (Ferron, Kirby, and Lipien 2024)
Auto-correlation: First-order auto-correlation estimate (Busk and Marascuilo 1988; Matyas and Greenwood 1996)
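A hedged sketch of how several of these features might be computed for a single series; the formulas follow common definitions (e.g., NAP as the proportion of baseline-intervention pairs showing improvement, with ties counted as half), but the exact versions used in the cited sources may differ.

```python
import numpy as np

def summary_stats(y, phase):
    """Illustrative versions of several summary features listed above.
    y: outcome series; phase: 0 = baseline (A), 1 = intervention (B)."""
    yA, yB = y[phase == 0], y[phase == 1]

    # Level: mean of each phase
    level = {"A": yA.mean(), "B": yB.mean()}

    # Trend: OLS slope within each phase
    def slope(v):
        return np.polyfit(np.arange(len(v)), v, 1)[0]
    trend = {"A": slope(yA), "B": slope(yB)}

    # Variability: SD of each phase
    sd = {"A": yA.std(ddof=1), "B": yB.std(ddof=1)}

    # Extinction: proportion of zeros in the full series
    prop_zero = np.mean(y == 0)

    # Overlap: non-overlap of all pairs (B > A, ties counted as half)
    diffs = yB[None, :] - yA[:, None]
    nap = (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size

    # Auto-correlation: first-order autocorrelation of the full series
    yc = y - y.mean()
    ar1 = np.sum(yc[1:] * yc[:-1]) / np.sum(yc ** 2)

    return {"level": level, "trend": trend, "SD": sd,
            "prop_zero": prop_zero, "NAP": nap, "AR1": ar1}
```

Applying the same function to the real series and to each simulated series from the earlier sketch gives the comparison that the check is built on.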
Examples so far are all about predicting possible data for the original participants.
With hierarchical models, we can also predict possible data for new participants.
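A rough sketch of the distinction, assuming a two-level model with participant-specific intercepts and treatment effects; the parameter values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative estimates from a fitted hierarchical model (assumed values):
gamma = np.array([20.0, 18.0])   # average intercept and treatment effect
tau = np.array([5.0, 4.0])       # between-participant SDs of those effects
sigma = 3.0                      # within-participant residual SD
phase = np.array([0] * 5 + [1] * 7)  # design for a 12-session series

# Predict possible data for a NEW participant: first draw that participant's
# intercept and treatment effect from the between-participant distribution,
# then draw session-level observations around them.
b_new = rng.normal(gamma, tau)
y_new = rng.normal(b_new[0] + b_new[1] * phase, sigma)

# For an ORIGINAL participant, the participant-level effects would instead be
# held at that participant's estimated (or posterior-sampled) values.
```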
Predictive checking methods are limited to parametric statistical models.
I have demonstrated posterior predictive checking using Bayesian methods.
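In the posterior predictive version, replicate datasets are drawn one per posterior draw rather than from a single set of point estimates, so uncertainty about the parameters carries into the simulated data. A minimal sketch, assuming posterior draws have already been extracted from a fitted model (the arrays below are stand-ins for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)
phase = np.array([0] * 5 + [1] * 7)

# Assumed: arrays of posterior draws for the model parameters, e.g. exported
# from an MCMC fit (the values generated here are purely illustrative).
beta0_draws = rng.normal(20, 1.5, size=500)        # baseline level
beta1_draws = rng.normal(18, 2.0, size=500)        # treatment effect
sigma_draws = np.abs(rng.normal(3, 0.5, size=500)) # residual SD

# Posterior predictive simulation: one replicate dataset per posterior draw.
mu = beta0_draws[:, None] + beta1_draws[:, None] * phase[None, :]
y_rep = rng.normal(mu, sigma_draws[:, None])
```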
Which summary statistics are generally useful?
How to do raw data graphical checks for meta-analytic models?
How to make the computations easier and more feasible?
How to share predictive checks as part of a study report?
Predictive checks are a useful and accessible way to evaluate the credibility of a parametric statistical model.
Judgments can be informed by context and subject-matter expertise.
Interpretation focuses on observable data, not unobservable parameters.
A potential bridge between statistical and visual analysis.
Next time you need to evaluate a statistical model of single-case data, ask:
Can we look at some predictive checks?