Some elaborations of the step-function selection model
2025-03-27
Martyna Citkowicz
Megha Joshi
Ryan Williams
Joshua Polanin
Melissa Rodgers
David Miller
The research reported here was supported, in whole or in part, by the Institute of Education Sciences, U.S. Department of Education, through grant R305D220026 to the American Institutes for Research. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Selective reporting occurs when affirmative findings are more likely to be reported and therefore available for inclusion in meta-analysis
Selective reporting can distort the evidence base available for systematic review/meta-analysis
There are strong concerns about selective reporting across the social, behavioral, and health sciences.
Graphical diagnostics
Tests/adjustments for funnel plot asymmetry
Selection models
p-value diagnostics
Dependent effect sizes are ubiquitous in education and social science meta-analyses.
We have well-developed methods for modeling dependent effect sizes assuming no selection.
But methods for investigating selective reporting in databases with dependent effect sizes have been developed only very recently (Chen and Pustejovsky 2024).
Location-scale-selection regressions
Alternative estimation methods
Handling dependence
Simulation findings
The data for
The step-function model implies that the distribution of observed effect size estimates is piece-wise normal.
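In notation matching the interactive demonstration below (cutpoints alpha_h, selection weights lambda_h, and marginal SD eta), the step-function model and the implied observed-data density can be sketched as follows (a reconstruction; the indexing conventions are assumed):

```latex
% One-sided step-function selection on the p-value, with
% 0 = \alpha_0 < \alpha_1 < \cdots < \alpha_{H+1} = 1 and \lambda_0 = 1:
\Pr(T_{ij}\ \text{observed} \mid p_{ij}) \propto \lambda_h
  \quad \text{if } \alpha_h \le p_{ij} < \alpha_{h+1}
% The observed effect size estimates then follow a piece-wise normal density:
f(t \mid \text{observed}) =
  \frac{\lambda(p(t))}{A_{ij}\,\eta_{ij}}\,
  \phi\!\left(\frac{t - \mu}{\eta_{ij}}\right),
  \qquad \eta_{ij} = \sqrt{\tau^2 + \sigma_{ij}^2}
```

where A_{ij} is the normalizing constant, equal to the marginal probability that T_{ij} is observed.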
math = require("mathjs")
norm = import('https://unpkg.com/norm-dist@3.1.0/index.js?module')
eta = math.sqrt(tau**2 + sigma**2)
H = 2
alpha = [alpha1, alpha2]
lambda = [1, lambda1, lambda2_ratio * lambda1]
lambda_max = math.max(lambda)
// Return the selection weight lambda for a one-sided p-value p,
// by locating the interval of the alpha cutpoints that p falls into
function findlambda(p, alp, lam) {
var m = 0;
while (m < alp.length && p >= alp[m]) {
m += 1;
}
return lam[m];
}
function findMoments(mu, tau, sigma, alp, lam) {
let H = alp.length;
// Marginal SD of the effect size estimates: eta = sqrt(tau^2 + sigma^2)
let eta = math.sqrt(tau**2 + sigma**2);
// Thresholds on the effect-size scale, decreasing from +Inf to -Inf
let gamma_h = Array(H+2).fill(null).map((x,i) => {
if (i==0) {
return Infinity;
} else if (i==H+1) {
return -Infinity;
} else {
return sigma * norm.icdf(1 - alp[i-1]);
}
});
// Standardized thresholds
let c_h = Array(H+2).fill(null).map((x,i) => {
return (gamma_h[i] - mu) / eta;
});
// Probability of each interval under the unselected distribution
let B_h = Array(H+1).fill(null).map((x,i) => {
return norm.cdf(c_h[i]) - norm.cdf(c_h[i+1]);
});
// Normalizing constant: Pr(T_ij is observed)
let Ai = 0;
for (let i = 0; i <= H; i++) {
Ai += lam[i] * B_h[i];
}
// Truncated-normal mean adjustment within each interval
let psi_h = Array(H+1).fill(null).map((x,i) => {
return (norm.pdf(c_h[i+1]) - norm.pdf(c_h[i])) / B_h[i];
});
let psi_top = 0;
for (let i = 0; i <= H; i++) {
psi_top += lam[i] * B_h[i] * psi_h[i];
}
let psi_bar = psi_top / Ai;
// Mean of the observed effect size estimates
let ET = mu + eta * psi_bar;
// c * pdf(c), using the limiting value 0 at +/- infinity
let dc_h = c_h.map((c_val) => {
if (math.abs(c_val) == Infinity) {
return 0;
} else {
return c_val * norm.pdf(c_val);
}
});
// Truncated-normal variance adjustment within each interval
let kappa_h = Array(H+1).fill(null).map((x,i) => {
return (dc_h[i] - dc_h[i+1]) / B_h[i];
});
let kappa_top = 0;
for (let i = 0; i <= H; i++) {
kappa_top += lam[i] * B_h[i] * kappa_h[i];
}
let kappa_bar = kappa_top / Ai;
// Standard deviation of the observed effect size estimates
let SDT = eta * math.sqrt(1 - kappa_bar - psi_bar**2);
return ({Ai: Ai, ET: ET, SDT: SDT});
}
moments = findMoments(mu, tau, sigma, alpha, lambda)
Ai_toprint = moments.Ai.toFixed(3)
ET_toprint = moments.ET.toFixed(3)
eta_toprint = eta.toFixed(3)
SDT_toprint = moments.SDT.toFixed(3)
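The same moment calculations can be reproduced in a standalone sketch, runnable outside the Observable runtime. The normal pdf/cdf below use the Abramowitz and Stegun polynomial approximation and a bisection-based quantile, which are stand-ins for the norm-dist package used in the notebook. A useful sanity check: with all selection weights equal to 1 there is no selection, so the observed moments must match the unselected distribution.

```javascript
function phi(x) { return Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI); }

function Phi(x) {
  // Abramowitz and Stegun 26.2.17 approximation to the normal CDF
  const t = 1 / (1 + 0.2316419 * Math.abs(x));
  const poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 +
    t * (-1.821255978 + t * 1.330274429))));
  const p = 1 - phi(Math.abs(x)) * poly;
  return x >= 0 ? p : 1 - p;
}

function probit(p) {
  // Invert Phi by bisection: crude but adequate for a sketch
  let lo = -10, hi = 10;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (Phi(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

function stepFunctionMoments(mu, tau, sigma, alphas, lambdas) {
  // alphas: H increasing p-value cutpoints; lambdas: H + 1 selection
  // weights, one per p-value interval (first interval normalized to 1)
  const H = alphas.length;
  const eta = Math.sqrt(tau * tau + sigma * sigma);
  // Thresholds on the effect-size scale, decreasing from +Inf to -Inf
  const gamma = [Infinity, ...alphas.map(a => sigma * probit(1 - a)), -Infinity];
  const c = gamma.map(g => (g - mu) / eta);
  // x * phi(x), with the limiting value 0 at +/- infinity
  const d = x => (Math.abs(x) === Infinity ? 0 : x * phi(x));
  let A = 0, psiTop = 0, kappaTop = 0;
  for (let h = 0; h <= H; h++) {
    const B = Phi(c[h]) - Phi(c[h + 1]);          // interval probability
    const psi = (phi(c[h + 1]) - phi(c[h])) / B;  // mean adjustment
    const kappa = (d(c[h]) - d(c[h + 1])) / B;    // variance adjustment
    A += lambdas[h] * B;
    psiTop += lambdas[h] * B * psi;
    kappaTop += lambdas[h] * B * kappa;
  }
  const psiBar = psiTop / A, kappaBar = kappaTop / A;
  return {
    A,                                     // Pr(T is observed)
    ET: mu + eta * psiBar,                 // E(T | observed)
    SDT: eta * Math.sqrt(1 - kappaBar - psiBar * psiBar) // SD(T | observed)
  };
}

// No selection: A = 1, E(T) = mu, SD(T) = sqrt(tau^2 + sigma^2)
const none = stepFunctionMoments(0.1, 0.15, 0.25, [0.025, 0.5], [1, 1, 1]);
// Weights below 1 for larger p-values: A < 1 and E(T) > mu
const sel = stepFunctionMoments(0.1, 0.15, 0.25, [0.025, 0.5], [1, 0.5, 0.5]);
```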
pts = 201
// Density grid over mu +/- 3 * eta: unselected density and selected
// density (rescaled by lambda / max(lambda) for display)
density_dat = Array(pts).fill().map((element, index) => {
let t = mu - 3 * eta + index * eta * 6 / (pts - 1);
let p = 1 - norm.cdf(t / sigma);
let dt = norm.pdf((t - mu) / eta) / eta;
let lambda_val = findlambda(p, alpha, lambda);
return ({
t: t,
d_unselected: dt,
d_selected: lambda_val * dt / lambda_max
})
})
viewof mu = Inputs.range(
[-2, 2],
{value: 0.1, step: 0.01, label: tex`\mu`}
)
viewof tau = Inputs.range(
[0, 2],
{value: 0.15, step: 0.01, label: tex`\tau`}
)
viewof sigma = Inputs.range(
[0, 1],
{value: 0.25, step: 0.01, label: tex`\sigma_i`}
)
viewof alpha1 = Inputs.range(
[0, 1],
{value: 0.025, step: 0.005, label: tex`\alpha_1`}
)
viewof alpha2 = Inputs.range(
[0, 1],
{value: 0.50, step: 0.005, label: tex`\alpha_2`}
)
viewof lambda1 = Inputs.range(
[0, 2],
{value: 1, step: 0.01, label: tex`\lambda_1`}
)
viewof lambda2_ratio = Inputs.range(
[0, 2],
{value: 1, step: 0.01, label: tex`\lambda_2 / \lambda_1`}
)
Plot.plot({
height: 500,
width: 1000,
y: {
grid: false,
label: "Density"
},
x: {
label: "Effect size estimate (Ti)"
},
marks: [
Plot.ruleY([0]),
Plot.ruleX([0]),
Plot.areaY(density_dat, {x: "t", y: "d_unselected", fillOpacity: 0.3}),
Plot.areaY(density_dat, {x: "t", y: "d_selected", fill: "blue", fillOpacity: 0.5}),
Plot.lineY(density_dat, {x: "t", y: "d_selected", stroke: "blue"})
]
})
tex`
\begin{aligned}
\mu &= ${mu} & \qquad \sqrt{\tau^2 + \sigma_{ij}^2} &= ${eta_toprint} \\
\mathbb{E}\left(T_{ij}\right) &= ${ET_toprint}
& \qquad \sqrt{\mathbb{V}\left(T_{ij}\right)} &= ${SDT_toprint} \\
\Pr(T_{ij} \text{ is observed}) &= ${Ai_toprint}
\end{aligned}
`
Viechtbauer and López‐López (2022) proposed location-scale meta-regression:
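One common parameterization of such a model is sketched below (notation assumed here, with covariates x_i predicting the mean and z_i predicting the log of the heterogeneity variance):

```latex
T_i = \mathbf{x}_i' \boldsymbol{\beta} + u_i + e_i, \qquad
u_i \sim N(0, \tau_i^2), \quad e_i \sim N(0, \sigma_i^2), \qquad
\ln \tau_i^2 = \mathbf{z}_i' \boldsymbol{\omega}
```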
Coburn and Vevea (2015) investigated variation in strength of selection as a function of study characteristics.
Meta-scientific questions about how selective reporting changes over time, as a result of intervention, or by outcome type.
A change in the selection process could act as a confounder of real secular changes.
Account for studies that follow reporting practices that are not susceptible to selective reporting (Van Aert 2025).
Pre-registered reports are assumed to be fully reported.
A selection regression with no intercept:
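A sketch of what this could look like with a single dummy predictor (hypothetical notation): omitting the intercept forces the selection weight to 1 for pre-registered studies, consistent with the assumption that pre-registered reports are fully reported.

```latex
\ln \lambda_{1i} = \zeta_1 \, \mathrm{NotPreReg}_i
\qquad \Rightarrow \qquad
\lambda_{1i} = \begin{cases}
1 & \text{if study } i \text{ is pre-registered} \\
\exp(\zeta_1) & \text{otherwise}
\end{cases}
```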
Lehmann, Elliot, and Calin-Jageman (2018) reported a systematic review of studies on color-priming, examining whether exposure to the color red influenced attractiveness judgements.
Many published studies where selective reporting was suspected.
Review included 11 pre-registered studies.
| Coef. | Mean ES Est. | Mean ES SE | Het. Var. Est. | Het. Var. SE |
|---|---|---|---|---|
| (A) Summary meta-analysis | | | | |
| Overall | 0.2073 | 0.0571 | 0.1032 | 0.0251 |
| (B) Moderation by study type | | | | |
| Pre-Registered | -0.0456 | 0.0407 | 0.0959 | 0.0240 |
| Not Pre-Registered | 0.2504 | 0.0633 | 0.0959 | 0.0240 |
| Coef. | Mean ES Est. | Mean ES SE | Het. Var. Est. | Het. Var. SE | Sel. Param. Est. | Sel. Param. SE |
|---|---|---|---|---|---|---|
| (A) Summary selection model | | | | | | |
| Overall | 0.1328 | 0.1373 | 0.0811 | 0.0845 | 0.548 | 0.616 |
| (B) Mean moderation by study type | | | | | | |
| Pre-Registered | -0.0854 | 0.0587 | 0.0738 | 0.0776 | 0.547 | 0.606 |
| Not Pre-Registered | 0.2591 | 0.1173 | 0.0738 | 0.0776 | 0.547 | 0.606 |
| (C) Selective reporting of non-pre-registered studies | | | | | | |
| Pre-Registered | 0.1031 | 0.0966 | 0.0723 | 0.0756 | 1 | - |
| Not Pre-Registered | 0.1031 | 0.0966 | 0.0723 | 0.0756 | 0.366 | 0.369 |
Augmented, reweighted Gaussian likelihood:
Clustered bootstrap re-sampling
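The resampling scheme can be sketched in plain JavaScript: draw whole studies (clusters) with replacement so that dependent effect sizes stay together, then take percentile endpoints of the bootstrap distribution. The toy data and grand-mean estimator below are hypothetical stand-ins, not the ARGL estimator itself.

```javascript
// Clustered bootstrap percentile interval: resample entire studies
// with replacement, preserving within-study dependence
function clusteredBootstrapCI(studies, estimator, B = 399, level = 0.95) {
  const stats = [];
  for (let b = 0; b < B; b++) {
    const resample = [];
    for (let j = 0; j < studies.length; j++) {
      // Draw a whole study (cluster), not individual effect sizes
      resample.push(studies[Math.floor(Math.random() * studies.length)]);
    }
    stats.push(estimator(resample));
  }
  stats.sort((x, y) => x - y);
  const lo = stats[Math.floor(((1 - level) / 2) * B)];
  const hi = stats[Math.ceil(((1 + level) / 2) * B) - 1];
  return [lo, hi];
}

// Toy usage: each study contributes one or more effect size estimates
const studies = [[0.2, 0.3], [0.1], [0.5, 0.4, 0.6], [0.0, 0.2]];
const grandMean = s => {
  const all = s.flat();
  return all.reduce((sum, x) => sum + x, 0) / all.length;
};
const [lo, hi] = clusteredBootstrapCI(studies, grandMean);
```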
Simulated summary statistics for two-group comparison designs with multiple, correlated continuous outcomes. Correlated-and-hierarchical effects generating process:
Varying primary study sample sizes
Censored one-sided
Comparison estimators
Correlated-and-hierarchical effects (CHE) model
PET-PEESE regression adjustment with cluster-robust SEs
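The correlated-and-hierarchical effects generating process described above can be sketched as follows (standard CHE notation, assumed here rather than taken from the slides):

```latex
T_{ij} = \mu + u_i + v_{ij} + e_{ij}, \qquad
u_i \sim N(0, \tau^2), \quad
v_{ij} \sim N(0, \omega^2), \quad
\mathrm{cor}(e_{ij}, e_{ik}) = \rho
```

with between-study heterogeneity tau^2, within-study heterogeneity omega^2, and correlation rho between outcomes from the same study, matching the parameters varied in the simulation design table.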
viewof mu_bias = Inputs.input(0.0)
viewof tau_bias = Inputs.input(0.15)
// Filter simulation results to the selected mu and tau condition
bias_dat = transpose(dat).filter(function(el) {
return el.mean_smd == mu_bias && el.tau == tau_bias;
})
Plot.plot({
x: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
label: null,
axis: null
},
fill: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
},
y: {
grid: true,
domain: [-0.1,0.4],
label: "Bias"
},
fx: {
padding: 0.10,
label: "Selection probability",
labelAnchor: "center"
},
width: 800,
height: 500,
color: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
legend: true
},
marks: [
Plot.ruleY([0.0], {stroke: "black"}),
Plot.boxY(bias_dat, {fx: "weights", x: "estimator", y: "bias", stroke: "estimator", fill: "estimator"})
]
})
Plot.plot({
x: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
label: null,
axis: null
},
fill: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
},
y: {
grid: true,
label: "Scaled RMSE"
},
fx: {
padding: 0.10,
label: "Selection probability",
labelAnchor: "center"
},
width: 800,
height: 500,
color: {
domain: ["CML","ARGL","PET/PEESE","CHE"],
legend: true
},
marks: [
Plot.ruleY([0.0], {stroke: "black"}),
Plot.boxY(bias_dat, {fx: "weights", x: "estimator", y: "scrmse", stroke: "estimator", fill: "estimator"})
]
})
// Filter CI results to the selected mu, tau, and estimator condition
coverage_dat = transpose(CI_dat).filter(function(el) {
return el.mean_smd == mu_CI && el.tau == tau_CI && estimator.includes(el.estimator);
})
viewof mu_CI = Inputs.select(
[0.0,0.4,0.8],
{
value: 0.0,
label: tex`\mu`,
width: "100px"
}
)
viewof tau_CI = Inputs.select(
[0.05,0.15,0.30,0.45],
{
value: 0.05,
label: tex`\tau`,
width: "100px"
}
)
viewof estimator = Inputs.select(
["CML","ARGL"],
{
value: "CML",
label: "estimator",
width: "100px"
}
)
Plot.plot({
x: {
axis: null
},
y: {
grid: true,
domain: [0.70,1.00],
label: "Coverage rate"
},
fx: {
padding: 0.10,
label: "Number of studies (J)",
labelAnchor: "center"
},
color: {
legend: true
},
width: 800,
height: 500,
marks: [
Plot.ruleY([0.95], {stroke: "black", strokeDasharray: "5,3"}),
Plot.boxY(coverage_dat, {fx: "J", x: "CI_boot_method", y: "coverage", stroke: "CI_boot_method", fill: "CI_boot_method"})
]
})
Marginal step-function selection models are worth adding to the toolbox.
Low bias compared to other selective reporting adjustments (including PET-PEESE)
Bias-variance trade-off relative to regular meta-analytic models
Clustered bootstrap percentile confidence intervals work tolerably well
Marginal modeling costs precision
Our data-generating process involved selective reporting of each outcome, conditionally independent across outcomes within a study.
Need further models (and diagnostics) for multivariate selection processes
metaselection
Currently available on Github at https://github.com/jepusto/metaselection
Install using
remotes::install_github("jepusto/metaselection", build_vignettes = TRUE)
Under active development, suggestions welcome!
| Parameter | Full Simulation | Bootstrap Simulation |
|---|---|---|
| Overall average effect | 0.0, 0.2, 0.4, 0.8 | 0.0, 0.2, 0.4, 0.8 |
| Between-study heterogeneity | 0.05, 0.15, 0.30, 0.45 | 0.05, 0.45 |
| Within-study heterogeneity ratio | 0.0, 0.5 | 0.0, 0.5 |
| Correlation between outcomes | 0.4, 0.8 | 0.8 |
| Selection probability | 0.02, 0.05, 0.10, 0.20, 0.50, 1.00 | 0.05, 0.20, 1.00 |
| Number of primary studies | 15, 30, 60, 90, 120 | 15, 30, 60 |
| Primary study sample sizes | Typical, Small | Typical, Small |
2000 replications per condition
399 bootstraps per replication