Synthetic RWE HPV Vaccine-Effectiveness Study

Cohort attrition · Survival · Vaccine effectiveness · Registry QC catch

Published

May 28, 2026

Synthetic data — methods demonstration only

Every figure on this page was computed on fully synthetic claims generated by this repository, with a “true” effect deliberately built into the data generator. Nothing here is a real HPV vaccine-effectiveness estimate. The purpose is to demonstrate the analysis and the data-quality control end to end, and to show that the pipeline recovers the planted ground truth.

Cohort attrition

How the source population narrows to the analytic cohort through each pre-specified inclusion/exclusion criterion (CONSORT-style).

Step  Criterion                                               Remaining  Excluded  % of source
1     Source population (all members)                             50000        NA        100.0
2     Female (cervical-outcome population)                        26001     23999         52.0
3     + Continuous enrollment >= 365d, index in study window      22721      3280         45.4
4     + Adult (>= 18 years) at index                               9909     12812         19.8
5     + No prior CIN2+/cervical cancer (washout)                   9864        45         19.7
6     + Alive at index (final cohort)                              9850        14         19.7

Final analytic cohort: 9,850 members.
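
The "Excluded" and "% of source" columns follow directly from the "Remaining" counts. A minimal sketch of that arithmetic, using the counts from the table above; the function name `attrition_table` and the tuple layout are illustrative assumptions, not the repository's actual code:

```python
# Remaining counts after each step, copied from the attrition table above.
steps = [
    ("Source population (all members)", 50000),
    ("Female (cervical-outcome population)", 26001),
    ("Continuous enrollment >= 365d, index in study window", 22721),
    ("Adult (>= 18 years) at index", 9909),
    ("No prior CIN2+/cervical cancer (washout)", 9864),
    ("Alive at index (final cohort)", 9850),
]


def attrition_table(steps):
    """Return rows of (step, criterion, remaining, excluded, pct_of_source).

    Hypothetical helper: 'excluded' is the drop from the previous step and
    'pct_of_source' is relative to the first (source) row.
    """
    source_n = steps[0][1]
    rows = []
    previous = None
    for number, (criterion, remaining) in enumerate(steps, start=1):
        excluded = None if previous is None else previous - remaining
        pct_of_source = round(100.0 * remaining / source_n, 1)
        rows.append((number, criterion, remaining, excluded, pct_of_source))
        previous = remaining
    return rows


rows = attrition_table(steps)
for row in rows:
    print(row)
```

Because the criteria are applied sequentially, each "Excluded" value counts only members who passed every earlier step.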

Outcome ascertainment

Members, incident CIN2+/cervical-cancer events, and person-years by exposure group.

Exposure      Members  Events  Person-years
unvaccinated     6629     119       18347.6
vaccinated       3218      28        8938.1

Survival analysis

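Descriptive cumulative incidence by ever-vaccinated status. The primary estimate comes from the time-varying Cox model below. The Kaplan-Meier (KM) figure is descriptive only: grouping members by ever-vaccinated status counts their pre-vaccination follow-up as vaccinated time, so it is subject to immortal-time bias.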

Cumulative incidence of CIN2+/cervical cancer (synthetic data).
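
The immortal-time caveat is usually handled by splitting each member's follow-up at the vaccination date, so that time before vaccination counts as unexposed. A minimal sketch of that split, assuming follow-up is expressed in days since index; `split_person_time` and its arguments are hypothetical names for illustration, not the pipeline's actual function:

```python
def split_person_time(entry_day, vaccination_day, exit_day):
    """Split one member's follow-up into (unexposed_days, exposed_days).

    Assumptions for this sketch: days are integers measured from index,
    vaccination_day is None for never-vaccinated members, and exposure
    begins on the vaccination day itself (no lag period).
    """
    if vaccination_day is None or vaccination_day >= exit_day:
        # Never vaccinated during follow-up: all time is unexposed.
        return (exit_day - entry_day, 0)
    # Vaccination before entry means the member is exposed from entry.
    exposed_start = max(vaccination_day, entry_day)
    return (exposed_start - entry_day, exit_day - exposed_start)


# A member vaccinated on day 400 of a 1,000-day follow-up contributes
# 400 unexposed days and 600 exposed days.
mid_follow_up = split_person_time(0, 400, 1000)
never_vaccinated = split_person_time(0, None, 1000)
vaccinated_before_entry = split_person_time(0, -30, 1000)
print(mid_follow_up, never_vaccinated, vaccinated_before_entry)
```

An event is then attributed to whichever interval it falls in. An ever-vaccinated grouping would instead credit the whole follow-up, including pre-vaccination events, to the vaccinated group.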

Incidence rates (per 1,000 person-years)

Group                    Events  Person-years  Rate / 1,000 PY
unexposed PT                127         20224             6.28
exposed (vaccinated) PT      20          7061             2.83
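
The rates above are events divided by person-years, scaled to 1,000. A short sketch that reproduces them from the table's counts; the helper name `rate_per_1000` is an assumption for illustration. The counts here differ from the outcome-ascertainment table, presumably because person-time in this table is classified by exposure status at the time it accrues rather than by ever-vaccinated status:

```python
def rate_per_1000(events, person_years):
    """Crude incidence rate per 1,000 person-years."""
    return 1000.0 * events / person_years


# Counts copied from the incidence-rate table above.
unexposed_rate = rate_per_1000(127, 20224)
exposed_rate = rate_per_1000(20, 7061)

# Crude rate ratio (exposed / unexposed), an unadjusted analogue of the HR.
rate_ratio = exposed_rate / unexposed_rate

print(f"unexposed: {unexposed_rate:.2f} per 1,000 PY")
print(f"exposed:   {exposed_rate:.2f} per 1,000 PY")
print(f"crude rate ratio: {rate_ratio:.2f}")
```

The crude rate ratio comes out at roughly 0.45, close to the Cox hazard ratio reported below, which is what one would expect when rates are low and confounding is limited.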

Vaccine effectiveness VE = (1 − HR) × 100%

Model                              Events  HR    HR 95% CI     VE (%)  VE 95% CI (%)
crude                              147     0.44  0.28 to 0.71  55.7    29 to 72
adjusted (age, region, screening)  147     0.44  0.28 to 0.71  55.7    29 to 72

Adjusted VE = 56% (95% CI 29 to 72%), which recovers the generator’s true VE of 55%: the planted value lies well inside the confidence interval.
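
Converting a hazard ratio and its confidence limits to VE uses the formula VE = (1 − HR) × 100%. Because the transformation is decreasing, the upper HR limit gives the lower VE limit and vice versa. A minimal sketch using the rounded values from the table; `ve_from_hr` is a hypothetical helper name. The table's 55.7% was presumably computed from the unrounded HR, so the rounded HR of 0.44 gives 56% here:

```python
def ve_from_hr(hr, hr_lower, hr_upper):
    """Convert a hazard ratio and its CI to vaccine effectiveness in percent.

    VE = (1 - HR) * 100. The CI bounds swap because VE decreases as HR rises.
    """
    ve = (1.0 - hr) * 100.0
    ve_lower = (1.0 - hr_upper) * 100.0
    ve_upper = (1.0 - hr_lower) * 100.0
    return ve, ve_lower, ve_upper


# Rounded HR and 95% CI copied from the adjusted-model row above.
ve, ve_lower, ve_upper = ve_from_hr(0.44, 0.28, 0.71)
print(f"VE = {ve:.0f}% (95% CI {ve_lower:.0f} to {ve_upper:.0f}%)")

# The generator's planted VE of 55% should fall inside the interval.
true_ve = 55.0
recovered = ve_lower <= true_ve <= ve_upper
print("planted VE inside interval:", recovered)
```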

Data-quality finding (the catch)

Cross-source reconciliation independently re-discovered a systematic vaccine-classification anomaly planted in one registry feed: Gardasil 9 (HPV9) doses recorded before that product existed, contradicted by the members’ claims-derived product.

The check flagged 1,669 registry records across 1,288 patients, all within the CAIR2 feed.

Registry doses over time by product. HPV9 doses dated before 2014 are impossible; they are the planted anomaly.
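
The date-plausibility part of this check can be sketched as a filter on product and dose date. Everything below is illustrative: the record layout, the field names, the `flag_impossible_hpv9` helper, and the four example records are assumptions, and the cutoff simply follows the "before 2014" wording above. The comparison against the claims-derived product is not shown.

```python
from datetime import date

# Assumption: cutoff follows the "before 2014" wording used on this page.
HPV9_EARLIEST = date(2014, 1, 1)


def flag_impossible_hpv9(records, earliest=HPV9_EARLIEST):
    """Return registry records coded HPV9 with a dose date before `earliest`."""
    return [
        r for r in records
        if r["product"] == "HPV9" and r["dose_date"] < earliest
    ]


# Made-up example records (hypothetical field names, not the real schema).
records = [
    {"patient_id": "P1", "feed": "CAIR2", "product": "HPV9", "dose_date": date(2011, 3, 2)},
    {"patient_id": "P1", "feed": "CAIR2", "product": "HPV9", "dose_date": date(2011, 9, 5)},
    {"patient_id": "P2", "feed": "CAIR2", "product": "HPV4", "dose_date": date(2012, 1, 10)},
    {"patient_id": "P3", "feed": "OTHER", "product": "HPV9", "dose_date": date(2016, 6, 1)},
]

flagged = flag_impossible_hpv9(records)
n_records = len(flagged)
n_patients = len({r["patient_id"] for r in flagged})
feeds = {r["feed"] for r in flagged}
print(n_records, "records,", n_patients, "patients, feeds:", sorted(feeds))
```

Counting distinct patients separately from records matters because one patient can have several mis-coded doses, which is why the record count reported above exceeds the patient count.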

Impact & handling. The error misclassifies which product was given, not whether a dose was given, so the all-product VE above is robust to it while product-specific estimates are not. The full finding and the vendor-escalation plan are in docs/results_summary.md.