Data Sources and Methods

Data collection and harmonization

The tool includes data on primary invasive cancers of the oesophagus (C15), stomach (C16), colon (C18–19), rectum (C20), liver (C22), pancreas (C25), lung (C34), and ovary (C48.1–2, C56, C57.0) from population-based cancer registries covering 21 jurisdictions in seven countries: Australia (New South Wales, Victoria, and Western Australia), Canada (Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, and Saskatchewan), Denmark, Ireland, New Zealand, Norway, and the United Kingdom (England, Northern Ireland, Scotland, and Wales). The data include cases diagnosed in the period 1995–2014 and followed up until 31 December 2015, except for Ontario, for which follow-up ended on 31 December 2014. Two registries provided data for less than the full 20-year period: Ireland (1995–2013) and Quebec, Canada (2000–2011). National cancer mortality estimates were obtained from national vital statistics for the period 1995–2014, except for New Zealand, for which the period covered is 1995–2013.

All data were submitted through a secure portal at IARC, and the data were handled according to applicable data protection and privacy laws, rules, and regulations. Ethical approval was obtained from each participating registry, as well as from the IARC Ethics Committee.

Data quality control

Quality control measures were carried out on each dataset, and any identified issues were followed up with the registries and discussed in detail.

For survival analyses, cases were excluded according to the following criteria:

  • Diagnosed based on death certificate only (DCO) or at autopsy
  • Invalid or missing dates or age, or non-malignant tumour behaviour
  • Younger than 15 years or older than 99 years at diagnosis
  • Second or higher-order cancers at the same site
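
The exclusion criteria above can be sketched as a simple filter. This is a minimal illustration only, not the project's actual processing code: the record layout and every field name (basis, behaviour, sequence_at_site, and so on) are hypothetical, and behaviour code 3 is assumed to denote a malignant tumour.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Case:
    # All field names are hypothetical, not the registries' actual variables.
    patient_id: str
    site: str                          # ICD-10 site code, e.g. "C16"
    age_at_diagnosis: Optional[int]    # None if missing or invalid
    diagnosis_date: Optional[date]     # None if missing or invalid
    basis: str                         # "DCO", "autopsy", or "clinical"
    behaviour: int                     # assumed coding: 3 = malignant
    sequence_at_site: int              # 1 = first primary at this site


def exclusion_reason(case: Case) -> Optional[str]:
    """Return the first exclusion criterion a case meets, or None if eligible."""
    if case.basis in ("DCO", "autopsy"):
        return "DCO or autopsy"
    if case.diagnosis_date is None or case.age_at_diagnosis is None:
        return "invalid or missing date or age"
    if case.behaviour != 3:
        return "non-malignant behaviour"
    if case.age_at_diagnosis < 15 or case.age_at_diagnosis > 99:
        return "age outside 15-99 years"
    if case.sequence_at_site > 1:
        return "second or higher-order cancer at same site"
    return None


# Toy records, invented for illustration.
cases = [
    Case("p1", "C16", 62, date(2012, 5, 1), "clinical", 3, 1),
    Case("p2", "C16", 70, date(2011, 3, 9), "DCO", 3, 1),
    Case("p3", "C25", 12, date(2013, 7, 2), "clinical", 3, 1),
    Case("p4", "C20", 55, date(2010, 1, 15), "clinical", 3, 2),
]
eligible = [c for c in cases if exclusion_reason(c) is None]
```

Returning a reason rather than a bare true/false makes it easy to tabulate how many cases each criterion removes, which is the kind of quality indicator referred to below.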

Details on the numbers of cases submitted and the quality indicators of the final datasets can be found here. Any interpretation of the data should take into account differences in data quality and registration practices.

More detailed information on the data processing can be found here.

Statistical analyses

Net survival (the survival patients would experience if cancer were the only possible cause of death) was calculated to estimate population-based cancer survival. Net survival for patients aged 15–99 years was estimated for each primary site in each jurisdiction, and also for the Australian, Canadian, and United Kingdom jurisdictions combined.

Background mortality in each jurisdiction was obtained from life tables of all-cause death rates by sex, single year of age, and calendar year during 1995–2014. Complete life tables were available for all jurisdictions except one (Prince Edward Island, Canada), for which a Poisson model was used to interpolate 1-year intervals from an abridged (5-year) life table. Missing adjacent years (a maximum of 2) were imputed using the last observation carried forward (LOCF) method. Missing years and/or ages in the life tables for New Zealand were supplemented with data from the Human Mortality Database.
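
The carry-forward step for missing life-table years can be illustrated as follows. This is a sketch under stated assumptions, not the code used for the tool: the function name locf_fill, the dictionary layout (calendar year mapped to a death rate for one sex and age), and the example rates are all invented for illustration. The two-year limit mirrors the maximum gap mentioned above.

```python
from typing import Dict, List


def locf_fill(rates: Dict[int, float], years: List[int],
              max_gap: int = 2) -> Dict[int, float]:
    """Fill missing calendar years by carrying the last observed rate forward.

    rates   : observed death rates keyed by calendar year (missing years absent)
    years   : every calendar year that must be present in the result
    max_gap : largest run of consecutive missing years that may be imputed
    """
    filled: Dict[int, float] = {}
    last_value = None
    gap = 0
    for year in sorted(years):
        if year in rates:
            last_value = rates[year]
            gap = 0
        else:
            gap += 1
            if last_value is None:
                raise ValueError(f"no earlier observation to carry forward for {year}")
            if gap > max_gap:
                raise ValueError(f"gap ending at {year} exceeds {max_gap} years")
        filled[year] = last_value
    return filled


# Hypothetical death rates for one sex and single year of age.
observed = {2010: 0.0100, 2011: 0.0098, 2014: 0.0095}
complete = locf_fill(observed, years=list(range(2010, 2015)))
```

In this toy input, 2012 and 2013 both take the 2011 value; a run of three missing years raises an error instead of being imputed silently.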

Net survival at 1, 3, and 5 years after diagnosis was computed by age, sex, period, and cancer site for each jurisdiction using the Pohar–Perme estimator (Perme et al., 2012; Dickman and Coviello, 2015). Age-standardization was carried out using international cancer survival standard weights (Corazziari et al., 2004). The cohort approach was used for 1995–1999, 2000–2004, and 2005–2009, and the period approach was used for 2010–2014 (Brenner and Gefeller, 1996).
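
Age-standardization of survival amounts to a weighted average of age-specific net survival estimates. The sketch below shows only that arithmetic. The age groups and weights are placeholders chosen for illustration and should be replaced by the values published in Corazziari et al. (2004); the function name and input layout are likewise assumptions.

```python
from typing import Dict

# Placeholder weights for five age groups; NOT to be read as the official
# standard. Take the real values from Corazziari et al. (2004).
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "15-44": 0.07,
    "45-54": 0.12,
    "55-64": 0.23,
    "65-74": 0.29,
    "75-99": 0.29,
}


def age_standardize(survival_by_age: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted average of age-specific net survival estimates."""
    if set(survival_by_age) != set(weights):
        raise ValueError("age groups in estimates and weights must match")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[g] * survival_by_age[g] for g in weights)
    return weighted_sum / total_weight


# Hypothetical 5-year net survival by age group for one site and jurisdiction.
net_survival = {"15-44": 0.80, "45-54": 0.70, "55-64": 0.60,
                "65-74": 0.50, "75-99": 0.30}
standardized = age_standardize(net_survival, ILLUSTRATIVE_WEIGHTS)
```

Because every jurisdiction is weighted with the same standard, differences in the standardized estimates are not driven by differences in the age structure of the patient populations.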

Age-standardized incidence and mortality rates (ASRs) were computed for ages 25 years and older using the age-truncated World Standard Population (Segi, 1960) and are expressed per 100 000 person-years.
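
A truncated, directly standardized rate can be computed as a weighted average of age-specific rates, with the standard weights renormalized over the age groups that are kept. The following is a minimal sketch: the function name, the input layout, and the two-group toy data are assumptions made for illustration, and the weights passed in should come from the standard population actually used.

```python
from typing import Dict, Tuple


def truncated_asr(observed: Dict[str, Tuple[float, float]],
                  standard: Dict[str, float],
                  per: float = 100_000) -> float:
    """Directly age-standardized rate over the age groups supplied.

    observed : age group -> (number of cases or deaths, person-years at risk)
    standard : age group -> standard population weight for the same groups
    per      : multiplier for reporting, here per 100 000 person-years
    """
    if set(observed) != set(standard):
        raise ValueError("age groups in data and standard must match")
    total_weight = sum(standard.values())
    weighted_rate = sum(
        standard[g] * (events / person_years)
        for g, (events, person_years) in observed.items()
    )
    return weighted_rate / total_weight * per


# Toy data restricted to two age groups; all values are invented.
toy_observed = {"25-29": (10, 100_000), "30-34": (30, 100_000)}
toy_standard = {"25-29": 8_000, "30-34": 6_000}
asr = truncated_asr(toy_observed, toy_standard)
```

Dividing by the sum of the retained weights is what makes the rate "truncated": groups below the lower age limit contribute nothing to the numerator or the denominator.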

Survival analyses by stage at diagnosis were carried out for the most recent period (2010–2014 for colorectal, lung, and ovarian cancer and 2012–2014 for stomach, oesophageal, and pancreatic cancer) and only for jurisdictions able to provide stage at diagnosis for at least 50% of the registered cases at a given site. To allow stage comparisons across countries, the stage information provided by the cancer registries was mapped to one common system by translating the individual T, N, and M elements to SEER Summary Stage (categorized as localized, regional, and distant), using a predefined mapping algorithm. For cases with missing stage at diagnosis, stage was imputed using multiple imputation; a total of 30 imputations were performed, and the results were combined using Rubin’s rules to estimate net survival and its 95% confidence interval.
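
Combining results across imputed datasets with Rubin's rules can be sketched as below. This illustration makes several assumptions that the text does not settle: the estimates are pooled on the scale on which they are supplied (survival estimates are often pooled on a transformed scale instead), the confidence interval uses a normal approximation rather than the t reference distribution of the full rules, and the function name and toy numbers are invented.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float],
               z: float = 1.96) -> Tuple[float, float, float]:
    """Pool per-imputation estimates; return (estimate, lower, upper).

    estimates : point estimate from each imputed dataset
    variances : squared standard error from each imputed dataset
    z         : normal quantile for the interval (simplifying assumption)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)               # pooled point estimate
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    se = math.sqrt(total)
    return pooled, pooled - z * se, pooled + z * se


# Toy input with three imputations; the analysis described above used 30.
est, lower, upper = pool_rubin([0.50, 0.52, 0.54], [0.0004, 0.0004, 0.0004])
```

The between-imputation term widens the interval to reflect uncertainty about the missing stage values, so a pooled interval is never narrower than one based on the average within-imputation variance alone.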

Details of the conversion algorithm for oesophageal, gastric, and pancreatic cancer are described in Cabasag et al. (2021), and further information on site-specific inclusion and exclusion criteria can be found in the respective site-specific papers.


References

Brenner H, Gefeller O (1996). An alternative approach to monitoring cancer patient survival. Cancer. 78(9):2004–2010.

Brenner H, Hakulinen T (2002). Up-to-date long-term survival curves of patients with cancer by period analysis. J Clin Oncol. 20(3):826–832.

Corazziari I, Quinn M, Capocaccia R (2004). Standard cancer patient population for age standardising survival ratios. Eur J Cancer. 40(15):2307–2316.

Dickman PW, Coviello E (2015). Estimating and modeling relative survival. Stata J. 15(1):186–215.

Doll R, Payne P, Waterhouse JAH, editors (1966). Cancer incidence in five continents: a technical report. Geneva, Switzerland: International Union Against Cancer.

Hieke S, Kleber M, König C, Engelhardt M, Schumacher M (2015). Conditional survival: a useful concept to provide information on how prognosis evolves over time. Clin Cancer Res. 21(7):1530–1536.

Lambert PC, Royston P (2009). Further development of flexible parametric models for survival analysis. Stata J. 9(2):265–290.

Lambert PC, Dickman PW, Nelson CP, Royston P (2010). Estimating the crude probability of death due to cancer and other causes using relative survival models. Stat Med. 29(7–8):885–895.

Nelson CP, Lambert PC, Squire IB, Jones DR (2007). Flexible parametric models for relative survival, with application in coronary heart disease. Stat Med. 26(30):5486–5498.

Perme MP, Stare J, Estève J (2012). On estimation in relative survival. Biometrics. 68(1):113–120.

Royston P, Lambert PC (2011). Flexible parametric survival analysis using Stata: beyond the Cox model. StataCorp LP. College Station (TX), USA: Stata Press.

Rutherford MJ, Crowther MJ, Lambert PC (2013). The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Sim. 85(4):777–793.

Segi M (1960). Cancer mortality for selected sites in 24 countries (1950–57). Sendai, Japan: Department of Public Health, Tohoku University of Medicine.

Cabasag CJ, Arnold M, Piñeros M, Morgan E, Brierley J, Hofferkamp J, et al. (2021). Population-based cancer staging for oesophageal, gastric, and pancreatic cancer 2012–2014: International Cancer Benchmarking Partnership SurvMark-2. Int J Cancer. 149:1239–1246.