Synthetic RWE HPV Vaccine-Effectiveness Study
Cohort attrition · Survival · Vaccine effectiveness · Registry QC catch
Every figure on this page was computed on fully synthetic claims generated by this repository, with a “true” effect deliberately built into the data generator. Nothing here is a real HPV vaccine-effectiveness estimate. The purpose is to demonstrate the analysis and the data-quality control end to end, and to show that the pipeline recovers the planted ground truth.
Cohort attrition
How the source population narrows to the analytic cohort through each pre-specified inclusion/exclusion criterion (CONSORT-style).
| Step | Criterion | Remaining | Excluded | % of source |
|---|---|---|---|---|
| 1 | Source population (all members) | 50000 | NA | 100.0 |
| 2 | Female (cervical-outcome population) | 26001 | 23999 | 52.0 |
| 3 | + Continuous enrollment >= 365d, index in study window | 22721 | 3280 | 45.4 |
| 4 | + Adult (>= 18 years) at index | 9909 | 12812 | 19.8 |
| 5 | + No prior CIN2+/cervical cancer (washout) | 9864 | 45 | 19.7 |
| 6 | + Alive at index (final cohort) | 9850 | 14 | 19.7 |
Final analytic cohort: 9,850 members.
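The Excluded and % of source columns of the attrition table can be re-derived from the Remaining counts alone. The sketch below is only an illustration of that arithmetic: `attrition_rows` and the `steps` list are hypothetical names, not the repository's code.

```python
# Remaining counts copied from the attrition table; names are illustrative.
steps = [
    ("Source population (all members)", 50000),
    ("Female (cervical-outcome population)", 26001),
    ("Continuous enrollment >= 365d, index in study window", 22721),
    ("Adult (>= 18 years) at index", 9909),
    ("No prior CIN2+/cervical cancer (washout)", 9864),
    ("Alive at index (final cohort)", 9850),
]


def attrition_rows(steps):
    """Derive per-step exclusions and percent of source from remaining counts."""
    source_n = steps[0][1]
    rows = []
    previous = None
    for number, (criterion, remaining) in enumerate(steps, start=1):
        # The first step has no predecessor, so nothing is excluded yet.
        excluded = None if previous is None else previous - remaining
        rows.append({
            "step": number,
            "criterion": criterion,
            "remaining": remaining,
            "excluded": excluded,
            "pct_of_source": round(100.0 * remaining / source_n, 1),
        })
        previous = remaining
    return rows


rows = attrition_rows(steps)
for row in rows:
    print(row["step"], row["remaining"], row["excluded"], row["pct_of_source"])
```

A check like this is a cheap guard against a criterion being applied out of order, since the exclusions must sum to the difference between the source population and the final cohort.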
Outcome ascertainment
Members, incident CIN2+/cervical-cancer events, and person-years by exposure group.
| Exposure | Members | Events | Person-years |
|---|---|---|---|
| unvaccinated | 6629 | 119 | 18347.6 |
| vaccinated | 3218 | 28 | 8938.1 |
Survival analysis
Kaplan-Meier (KM) cumulative incidence by ever-vaccinated status, shown for description only. Grouping members as ever-vaccinated from index counts their pre-vaccination follow-up as vaccinated time (immortal-time bias), so the primary estimate comes from the time-varying Cox model below.
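One common way to avoid immortal-time bias is to split each member's follow-up at the vaccination date, so that time before the first dose counts as unexposed. The sketch below illustrates that split for a single member. The function name, the day-level resolution, and the convention that follow-up runs from index to an end date are all assumptions, not the pipeline's actual implementation.

```python
from datetime import date


def split_person_time(index, end, vaccinated_on=None):
    """Return (unexposed_days, exposed_days) for one member's follow-up.

    Assumes follow-up runs from `index` (inclusive) to `end` (exclusive) and
    that exposure starts on the vaccination date and never reverts.
    """
    total = (end - index).days
    if vaccinated_on is None or vaccinated_on >= end:
        return total, 0          # never vaccinated during follow-up
    if vaccinated_on <= index:
        return 0, total          # vaccinated before follow-up began
    unexposed = (vaccinated_on - index).days
    return unexposed, total - unexposed


# A member vaccinated halfway through one year of follow-up.
example = split_person_time(date(2015, 1, 1), date(2016, 1, 1), date(2015, 7, 1))
print(example)
```

Under this convention an ever-vaccinated member contributes person-time, and possibly events, to the unexposed group before the first dose. That would explain why unexposed totals in a time-varying analysis can exceed the totals for never-vaccinated members.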
Incidence rates (per 1,000 person-years)
| Group | Events | Person-years | Rate / 1,000 PY |
|---|---|---|---|
| unexposed person-time | 127 | 20224 | 6.28 |
| exposed (vaccinated) person-time | 20 | 7061 | 2.83 |
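The rates in the table follow directly from events divided by person-years, scaled to 1,000. A minimal sketch of that arithmetic, using the table's own counts (the function name is illustrative):

```python
def rate_per_1000_py(events, person_years):
    """Incidence rate expressed per 1,000 person-years."""
    return 1000.0 * events / person_years


unexposed_rate = rate_per_1000_py(127, 20224)
exposed_rate = rate_per_1000_py(20, 7061)

# Crude rate ratio: a rough cross-check, not a substitute for the Cox model.
rate_ratio = exposed_rate / unexposed_rate

print(round(unexposed_rate, 2), round(exposed_rate, 2), round(rate_ratio, 2))
```

The crude rate ratio of about 0.45 sits close to the hazard ratio reported in the next table, which is the agreement one would expect when confounding is weak.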
Vaccine effectiveness VE = (1 − HR) × 100%
| Model | Events | HR | HR 95% CI | VE (%) | VE 95% CI (%) |
|---|---|---|---|---|---|
| crude | 147 | 0.44 | 0.28 to 0.71 | 55.7 | 29 to 72 |
| adjusted (age, region, screening) | 147 | 0.44 | 0.28 to 0.71 | 55.7 | 29 to 72 |
Adjusted VE = 56% (95% CI 29 to 72%). The interval contains the generator’s true VE of 55%, so the pipeline recovers the planted effect.
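Because VE = (1 − HR) × 100% is a decreasing function of HR, the confidence bounds swap: the upper HR bound gives the lower VE bound, and the lower HR bound gives the upper VE bound. A short sketch of the conversion, with an illustrative function name:

```python
def ve_from_hr(hr, hr_lower, hr_upper):
    """Convert a hazard ratio and its CI into VE (%) and its CI.

    The bounds swap because VE decreases as HR increases.
    """
    ve = (1 - hr) * 100
    ve_lower = (1 - hr_upper) * 100
    ve_upper = (1 - hr_lower) * 100
    return ve, ve_lower, ve_upper


ve, ve_lower, ve_upper = ve_from_hr(0.44, 0.28, 0.71)
print(round(ve), round(ve_lower), round(ve_upper))
```

Using the rounded HR of 0.44 gives 56%. The table's 55.7 presumably comes from the unrounded HR, which this page does not report.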
Data-quality finding (the catch)
Cross-source reconciliation independently re-discovered a systematic vaccine-classification anomaly planted in one registry feed: Gardasil 9 (HPV9) doses recorded before that product existed, contradicted by the members’ claims-derived product.
1,669 registry records across 1,288 patients flagged, entirely within the CAIR2 feed.
Impact & handling. The error misclassifies product, not dose presence, so the all-product VE above is robust while product-specific estimates are not. The full finding and the vendor-escalation plan are in docs/results_summary.md.
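A rule of the kind described above can be sketched as a date check against product availability, with the claims-derived product attached for comparison. Everything in the sketch is an assumption made for illustration: the record layout, the function name, the sample rows, and the use of the US FDA approval date of Gardasil 9 (10 December 2014) as the cutoff. The repository's actual reconciliation logic may differ.

```python
from datetime import date

# Assumption: cutoff is the US FDA approval date of Gardasil 9.
HPV9_AVAILABLE_FROM = date(2014, 12, 10)


def flag_anachronistic_hpv9(registry_records, claims_product):
    """Flag registry doses coded HPV9 that predate the product's availability.

    registry_records: iterable of dicts with patient_id, dose_date, product, feed.
    claims_product: dict mapping (patient_id, dose_date) to the claims-derived product.
    """
    flagged = []
    for rec in registry_records:
        if rec["product"] == "HPV9" and rec["dose_date"] < HPV9_AVAILABLE_FROM:
            claimed = claims_product.get((rec["patient_id"], rec["dose_date"]))
            flagged.append({
                **rec,
                "claims_product": claimed,
                # True only when claims name a different product for this dose.
                "contradicted": claimed is not None and claimed != "HPV9",
            })
    return flagged


# Hypothetical sample rows; "OTHER_FEED" is a made-up feed name.
records = [
    {"patient_id": "P1", "dose_date": date(2012, 3, 1), "product": "HPV9", "feed": "CAIR2"},
    {"patient_id": "P1", "dose_date": date(2012, 5, 1), "product": "HPV9", "feed": "CAIR2"},
    {"patient_id": "P2", "dose_date": date(2016, 1, 15), "product": "HPV9", "feed": "CAIR2"},
    {"patient_id": "P3", "dose_date": date(2011, 6, 1), "product": "HPV4", "feed": "OTHER_FEED"},
]
claims = {
    ("P1", date(2012, 3, 1)): "HPV4",
    ("P1", date(2012, 5, 1)): "HPV4",
}

flagged = flag_anachronistic_hpv9(records, claims)
n_records = len(flagged)
n_patients = len({r["patient_id"] for r in flagged})
feeds = {r["feed"] for r in flagged}
print(n_records, n_patients, feeds)
```

Counting flagged records and distinct patients separately, and grouping by feed, mirrors how the finding is summarised: records, patients, and the single feed affected.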