Cancer Registry of Catalonia, Plan for Oncology of the Catalan Government, IDIBELL, Hospital Duran i Reynals. Av. Gran Via de l’Hospitalet, 199-203, 08908 – L’Hospitalet de Llobregat, Catalonia, Spain

Department of Clinical Sciences, University of Barcelona, Barcelona 08907, Spain

Epidemiology Unit and Cancer Registry of Girona, Institut Català d’Oncologia (ICO), Girona 17004, Catalonia, Spain

Tarragona Cancer Registry. Foundation Society for Cancer research and Prevention, Pere Virgili Health Research Institute, Reus 43201, Spain

Abstract

Background

The repertoire of statistical methods dealing with the descriptive analysis of the burden of a disease has been expanded and implemented in statistical software packages in recent years. The purpose of this paper is to present a web-based tool,

Results

We show a real-data application through the assessment of the burden of tobacco-related cancer incidence in two Spanish regions in the period 1995–2004. Making use of

Conclusions

Unlike other similar applications,

Background

One aim of public health assessment is to describe the health status of a defined population by looking at its changes over time or by comparing its health events to those occurring in other populations. Descriptive epidemiology of cancer, for example, may assess the size of the problem that cancer poses to health, measuring the risk in the same population at different periods of time

Worldwide, statistical methods for descriptive analysis have been expanded and implemented in statistical software packages in recent years. The most comprehensive coverage of statistical methods for analyzing cancer data is SEER*Stat

The purpose of this paper is to present a set of web-based tools,

In this paper

Implementation

Descriptive statistics for rates

Suppose that we want to assess the burden of a disease in a certain population of size N during a certain period of time. If X cases of the disease under study have been observed, the crude rate (CR) is defined as X/N. The CR is the simplest and most straightforward summary measure of the disease burden in the population. However, the events may be strongly related to age, so the age-specific rates can differ greatly from one another and it is of interest to calculate them. The use of a world standard population allows age-standardized rates (ASR) to be calculated, so that populations with different age structures can be compared.
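The three summary measures described above can be illustrated with a minimal sketch. The data and the standard weights below are hypothetical (they are not an official world standard population), and the function names are illustrative rather than those used in the web applications.

```python
def crude_rate(cases, population, per=100_000):
    """Crude rate: all cases divided by the whole population, per `per` persons."""
    return per * sum(cases) / sum(population)


def age_specific_rates(cases, population, per=100_000):
    """One rate per age group, per `per` persons."""
    return [per * c / n for c, n in zip(cases, population)]


def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate: weighted mean of the age-specific rates."""
    total_weight = sum(standard_weights)
    weighted = sum(w * c / n for c, n, w in zip(cases, population, standard_weights))
    return per * weighted / total_weight


# Hypothetical data for three age groups (young, middle, old).
cases = [2, 10, 40]
population = [50_000, 30_000, 20_000]
standard_weights = [60, 30, 10]  # illustrative weights, NOT an official standard

cr = crude_rate(cases, population)
rates = age_specific_rates(cases, population)
asr = age_standardized_rate(cases, population, standard_weights)
print(f"CR = {cr:.1f}, age-specific = {[round(r, 1) for r in rates]}, ASR = {asr:.1f}")
```

Because the hypothetical standard gives more weight to the young age group than the observed population does, the standardized rate comes out lower than the crude rate in this example.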

Estimating the annual percent change in rates (EAPC)

In descriptive epidemiology, the evolution of incidence or mortality rates of a certain disease during a determined time period can generate etiological hypotheses. The estimated annual percent change (EAPC) is one way to characterize trends in disease rates over time. This means that the rates are assumed to change at a constant percentage of the rate of the previous year. Let $r_T$ be the rate in the $T$-th year; the trend is then modelled on the log scale as

$$\log(r_T) = \alpha + \beta T + \varepsilon_T \qquad (1)$$

where the EAPC is estimated as $100 \cdot (e^{\beta} - 1)$.
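As a worked illustration of the EAPC, the sketch below fits a straight line to the log-rates by ordinary least squares and converts the slope into a percentage. The use of ordinary least squares is an assumption made for simplicity (the web application may use a different estimation method), and the rates are hypothetical.

```python
import math


def eapc(years, rates):
    """Estimated annual percent change from a least-squares fit of log(rate) on year."""
    n = len(years)
    mean_t = sum(years) / n
    log_rates = [math.log(r) for r in rates]
    mean_y = sum(log_rates) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, log_rates))
    beta = sxy / sxx  # slope of the log-linear trend
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical rates growing by exactly 2% per year from 1995 to 2004.
years = list(range(1995, 2005))
rates = [50.0 * 1.02 ** (t - 1995) for t in years]

eapc_value = eapc(years, rates)
print(f"EAPC = {eapc_value:.2f}% per year")
```

Since the hypothetical rates follow the constant-percentage assumption exactly, the fitted slope recovers the 2% annual increase used to generate them.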

Predicting the Expected number of incident (or death) disease cases by age group using the time trends of rates

Prediction of future disease burden is essential for effective health service planning, as it may be utilized by public health authorities to formulate prevention, diagnosis and treatment strategies. Assuming that $C_{iT}$ is the number of cases for the $i$-th age-group and period $T$, that $Y_{iT}$ are the corresponding person-years at risk, and that the trend in rates is linear in its log-scale, the following log-linear model can be fitted to these rates:

$$\log\left(\frac{\mathrm{E}[C_{iT}]}{Y_{iT}}\right) = \alpha_i + \beta_i\,(T - T_0) \qquad (2)$$

where $T_0$ is the reference time, $\alpha_i$ is the log-rate at $T_0$ for the $i$-th age-group and $\beta_i$ is the age-specific slope. A parsimonious version of this model can also be used, assuming a common slope for each age group,

$$\log\left(\frac{\mathrm{E}[C_{iT}]}{Y_{iT}}\right) = \alpha_i + \beta\,(T - T_0) \qquad (3)$$

where this model is known as the age-drift model. For these models we assume $C_{iT}$ to follow the Poisson distribution.

Prediction of incidence at a future time $F$ is obtained by replacing $T$ by $F$ and $Y_{iT}$ by $Y_{iF}$ in the fitted model. Poisson and Negative Binomial distributions are both assumed for each model in (2) and (3). Therefore 4 models are assessed for the selection of the best fitting model to data. The assessment is made through the Akaike’s Information Criterion (AIC).
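The fitting and prediction steps can be sketched for the Poisson version of the age-specific model. Under a Poisson likelihood, a model with a separate intercept and slope per age group can be fitted one age group at a time, so the sketch handles a single group with a small Newton-Raphson routine. The data, function names and starting values are hypothetical; the Negative Binomial and age-drift variants are not shown.

```python
import math


def fit_poisson_trend(cases, person_years, times, t0, max_iter=50, tol=1e-12):
    """Fit log(E[C_T] / Y_T) = alpha + beta * (T - t0) for ONE age group
    by Newton-Raphson on the Poisson log-likelihood."""
    x = [t - t0 for t in times]
    alpha = math.log(sum(cases) / sum(person_years))  # start from the overall rate
    beta = 0.0
    for _ in range(max_iter):
        mu = [y * math.exp(alpha + beta * xi) for y, xi in zip(person_years, x)]
        # Score vector (first derivatives of the log-likelihood).
        u_a = sum(c - m for c, m in zip(cases, mu))
        u_b = sum(xi * (c - m) for xi, c, m in zip(x, cases, mu))
        # Information matrix (negative second derivatives).
        i_aa = sum(mu)
        i_ab = sum(xi * m for xi, m in zip(x, mu))
        i_bb = sum(xi * xi * m for xi, m in zip(x, mu))
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * u_a - i_ab * u_b) / det
        step_b = (i_aa * u_b - i_ab * u_a) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return alpha, beta


def poisson_aic(cases, person_years, times, t0, alpha, beta, n_params=2):
    """AIC = 2k - 2 log L for the fitted Poisson model."""
    loglik = 0.0
    for c, y, t in zip(cases, person_years, times):
        mu = y * math.exp(alpha + beta * (t - t0))
        loglik += c * math.log(mu) - mu - math.lgamma(c + 1)
    return 2 * n_params - 2 * loglik


def predict_cases(alpha, beta, t0, future_time, future_person_years):
    """Expected cases at a future time, given projected person-years."""
    return future_person_years * math.exp(alpha + beta * (future_time - t0))


# Hypothetical single age group: rate of 50 per 100,000 in 1995, rising 3% per year
# on the log scale. Counts are set to their expected values for illustration.
times = list(range(1995, 2005))
person_years = [100_000.0] * len(times)
true_alpha, true_beta = math.log(50 / 100_000), 0.03
cases = [y * math.exp(true_alpha + true_beta * (t - 1995))
         for y, t in zip(person_years, times)]

alpha_hat, beta_hat = fit_poisson_trend(cases, person_years, times, t0=1995)
aic = poisson_aic(cases, person_years, times, 1995, alpha_hat, beta_hat)
predicted_2014 = predict_cases(alpha_hat, beta_hat, 1995, 2014, 110_000.0)
print(f"beta = {beta_hat:.4f}, AIC = {aic:.1f}, predicted cases in 2014 = {predicted_2014:.1f}")
```

In practice the AIC of each candidate model would be computed in the same way and the model with the lowest value retained.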

Comparing risk between two groups (time periods or geographical areas): standardized incidence or mortality ratio (SIMR)

where

Assessment of differences due to risk and demographic factors when comparing disease rates of two populations

To assess differences in the number of incident cases or deaths between two time points or two areas, and to clarify the role of changes in demographic factors and in the risk of developing or dying from a disease, we used the method of Bashir and Estève.

Note that, within each age group, rates are assumed to be constant throughout the period 1995–1999 and throughout the period 2000–2004. If the population size increases by 10%, the number of incident cases is also expected to increase by 10%. The effect of population structure is estimated by comparing the rate observed in 1995–1999 with the rate expected in 2000–2004, obtained by applying the age-specific rates observed in 1995–1999 to the population pyramid of 2000–2004. Lastly, the percent change not explained by changes in the population is considered to be due to the variation in the risk of developing the disease.
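The partition described above can be sketched as an additive decomposition of the net percent change in cases into size, structure and risk components. The formulation below is one additive version consistent with the description; whether the web application computes exactly these terms is an assumption, and all numbers are hypothetical.

```python
def partition_net_change(pop_ref, rates_ref, pop_target, rates_target):
    """Split the net % change in cases between a reference and a target period
    into population size, population structure and risk components."""
    observed_ref = sum(n * r for n, r in zip(pop_ref, rates_ref))
    observed_target = sum(n * r for n, r in zip(pop_target, rates_target))
    crude_ref = observed_ref / sum(pop_ref)

    # Cases expected if only the population SIZE had changed.
    expected_size_only = sum(pop_target) * crude_ref
    # Cases expected if size AND age structure had changed but risk had not:
    # reference age-specific rates applied to the target population pyramid.
    expected_demography = sum(n * r for n, r in zip(pop_target, rates_ref))

    def as_percent(delta):
        return 100.0 * delta / observed_ref

    return {
        "size": as_percent(expected_size_only - observed_ref),
        "structure": as_percent(expected_demography - expected_size_only),
        "risk": as_percent(observed_target - expected_demography),
        "net": as_percent(observed_target - observed_ref),
    }


# Hypothetical two-age-group example (rates are cases per person).
pop_ref = [1000, 1000]
rates_ref = [0.001, 0.010]
pop_target = [1000, 1200]       # total population grows by 10%, and it ages
rates_target = [0.001, 0.011]   # risk rises in the older group only

parts = partition_net_change(pop_ref, rates_ref, pop_target, rates_target)
for name, value in parts.items():
    print(f"{name:>9}: {value:6.2f}%")
```

By construction the three components add up to the net change, which is what allows the differences to be displayed as stacked contributions.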

** freeware statistical tools for the analysis of disease population databases used in health and social studies”.**

Assessing survival of a cohort of patients diagnosed with a certain disease

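The observed survival (OS) rate is the basic measure of the survival experience of a group of patients from the date of diagnosis to a certain time. However, information on the cancer patients’ causes of death might not always be suitable, or it might be vague or unavailable. Relative survival (RS) avoids the need for cause-of-death information by comparing the observed survival of the patients with the survival expected in a comparable group from the general population:

$$RS(t) = \frac{S_o(t)}{S_E(t)}$$

where $S_o(t)$ is the observed survival of the cohort at time $t$ and $S_E(t)$ is the corresponding expected survival. Changes in RS between two periods therefore depend on both components: for example, if $S_E(5)$ between both periods increased but $S_o(5)$ remained stable, RS(5) decreased.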
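The ratio of observed to expected survival, and the way it reacts to a change in expected survival alone, can be shown with a short sketch. The survival values are hypothetical, and the estimation of expected survival from population life tables is not shown.

```python
def relative_survival(observed, expected):
    """Relative survival: observed survival divided by expected survival."""
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed survival must be in [0, 1]")
    return observed / expected


# Hypothetical 5-year values: observed survival is stable across two periods,
# while expected survival in the general population improves.
observed_5y = 0.10
expected_5y_early = 0.86
expected_5y_late = 0.90

rs_early = relative_survival(observed_5y, expected_5y_early)
rs_late = relative_survival(observed_5y, expected_5y_late)
print(f"RS(5) early = {rs_early:.3f}, RS(5) late = {rs_late:.3f}")
```

With observed survival unchanged, the improvement in expected survival alone lowers RS(5), which is why both components need to be inspected when comparing periods.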

The set of web-based applications included in REGSTATTOOLS

Figure

**Schematic overview of Web-based applications included in REGSTATTOOLS and the required input files to perform each one of the statistical analyses.**

The

The

Finally, the RS must be obtained through the

Throughout the paper, AF refers to the additional files, which contain supplementary figures and tables, including those related to the example section.

Results

To illustrate the use of

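| Patient_ID | Sex | d_group | d_age | i_month | i_year | f_month | f_year | Status | Follow_up |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Lung | 84 | 3 | 1995 | 3 | 1995 | 1 | 0.08 |
| 2 | 1 | Lung | 65 | 2 | 1995 | 12 | 1999 | 1 | 0.83 |
| 3 | 1 | Lung | 63 | 3 | 1995 | 5 | 1999 | 1 | 4.17 |
| … | … | … | … | … | … | … | … | … | … |
| 194 | 2 | Lung | 72 | 2 | 1995 | 8 | 1995 | 1 | 0.50 |
| 195 | 2 | Lung | 42 | 9 | 1995 | 11 | 1995 | 1 | 0.17 |
| 196 | 1 | Lung | 72 | 10 | 1995 | 1 | 1996 | 1 | 0.25 |
| 197 | 1 | Lung | 52 | 5 | 1995 | 12 | 2008 | 0 | 12.58 |
| 198 | 1 | Lung | 79 | 1 | 1996 | 9 | 1997 | 1 | 1.67 |
| … | … | … | … | … | … | … | … | … | … |
| 6087 | 1 | Larynx | 59 | 6 | 2000 | 8 | 2004 | 1 | 4.17 |
| 6088 | 2 | Larynx | 68 | 7 | 2001 | 8 | 2004 | 1 | 3.08 |
| 6089 | 2 | Larynx | 53 | 3 | 2000 | 12 | 2006 | 0 | 6.75 |
| 6090 | 1 | Larynx | 87 | 7 | 1995 | 12 | 2006 | 0 | 11.42 |
| … | … | … | … | … | … | … | … | … | … |
| 12989 | 1 | Kidney | 50 | 7 | 2002 | 5 | 2003 | 1 | 0.83 |
| 12990 | 2 | Kidney | 90 | 7 | 2003 | 7 | 2003 | 1 | 0.08 |
| 12991 | 1 | Kidney | 49 | 6 | 2004 | 6 | 2004 | 1 | 0.08 |
| 12992 | 1 | Kidney | 67 | 4 | 2004 | 6 | 2004 | 1 | 0.08 |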

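| Sex | Age.group | Year | Group | Cases | Population |
|---|---|---|---|---|---|
| 1 | 1 | 2000 | Kidney | 0 | 27251 |
| 1 | 2 | 2000 | Kidney | 0 | 27741 |
| 1 | 3 | 2000 | Kidney | 0 | 29381 |
| … | … | … | … | … | … |
| 2 | 16 | 2004 | Larynx | 0 | 26115 |
| 2 | 17 | 2004 | Larynx | 0 | 19665 |
| 2 | 18 | 2004 | Larynx | 0 | 15737 |
| … | … | … | … | … | … |
| 1 | 1 | 2000 | Lung | 0 | 27251 |
| 1 | 2 | 2000 | Lung | 0 | 29381 |
| … | … | … | … | … | … |
| 2 | 16 | 2004 | Stomach | 8 | 26115 |
| 2 | 17 | 2004 | Stomach | 19 | 19665 |
| 2 | 18 | 2004 | Stomach | 15 | 15737 |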

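**Men**

**1995-1999**

| Risk | T | RS | LCI | UCI | OS |
|---|---|---|---|---|---|
| 2252 | 0 | 0.999 | 0.998 | 1 | 0.999 |
| 2250 | 1 | 0.292 | 0.273 | 0.312 | 0.282 |
| 616 | 2 | 0.156 | 0.141 | 0.173 | 0.146 |
| 319 | 3 | 0.113 | 0.1 | 0.128 | 0.103 |
| 224 | 4 | 0.096 | 0.084 | 0.11 | 0.085 |
| 186 | 5 | 0.083 | 0.071 | 0.096 | 0.071 |
| 155 | 6 | 0.074 | 0.063 | 0.087 | 0.062 |
| 135 | 7 | 0.066 | 0.055 | 0.079 | 0.054 |
| 117 | 8 | 0.056 | 0.046 | 0.068 | 0.044 |
| 89 | 9 | 0.051 | 0.042 | 0.063 | 0.039 |
| 61 | 10 | 0.047 | 0.037 | 0.059 | 0.035 |
| 41 | 11 | 0.04 | 0.031 | 0.052 | 0.029 |
| 21 | 12 | 0.037 | 0.028 | 0.05 | 0.026 |
| 12 | 13 | 0.036 | 0.025 | 0.05 | 0.024 |
| 4 | 14 | 0.027 | 0.014 | 0.053 | 0.018 |

**2000-2004**

| Risk | T | RS | LCI | UCI | OS |
|---|---|---|---|---|---|
| 2514 | 1 | 0.325 | 0.307 | 0.344 | 0.313 |
| 775 | 2 | 0.191 | 0.175 | 0.207 | 0.179 |
| 436 | 3 | 0.15 | 0.135 | 0.165 | 0.136 |
| 299 | 4 | 0.129 | 0.116 | 0.144 | 0.114 |
| 219 | 5 | 0.115 | 0.101 | 0.13 | 0.099 |
| 140 | 6 | 0.106 | 0.093 | 0.121 | 0.089 |
| 89 | 7 | 0.101 | 0.087 | 0.117 | 0.082 |
| 40 | 8 | 0.089 | 0.073 | 0.108 | 0.07 |
| 17 | 9 | 0.065 | 0.045 | 0.093 | 0.049 |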

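**Women**

**1995-1999**

| Risk | T | RS | LCI | UCI | OS |
|---|---|---|---|---|---|
| 240 | 1 | 0.275 | 0.222 | 0.339 | 0.267 |
| 61 | 2 | 0.174 | 0.131 | 0.232 | 0.166 |
| 38 | 3 | 0.135 | 0.097 | 0.189 | 0.127 |
| 29 | 4 | 0.123 | 0.086 | 0.177 | 0.114 |
| 26 | 5 | 0.103 | 0.071 | 0.157 | 0.096 |
| 22 | 6 | 0.093 | 0.061 | 0.142 | 0.083 |
| 19 | 8 | 0.085 | 0.054 | 0.033 | 0.074 |
| 15 | 9 | 0.08 | 0.05 | 0.128 | 0.069 |
| 12 | 10 | 0.08 | 0.05 | 0.129 | 0.069 |
| 7 | 11 | 0.082 | 0.051 | 0.031 | 0.069 |
| 3 | 12 | 0.083 | 0.052 | 0.134 | 0.069 |
| 1 | 13 | 0.088 | 0.055 | 0.141 | 0.069 |

**2000-2004**

| Risk | T | RS | LCI | UCI | OS |
|---|---|---|---|---|---|
| 312 | 1 | 0.339 | 0.289 | 0.397 | 0.33 |
| 100 | 2 | 0.227 | 0.184 | 0.281 | 0.218 |
| 65 | 3 | 0.174 | 0.135 | 0.224 | 0.164 |
| 42 | 4 | 0.147 | 0.11 | 0.196 | 0.137 |
| 31 | 5 | 0.14 | 0.104 | 0.189 | 0.128 |
| 20 | 6 | 0.128 | 0.092 | 0.179 | 0.115 |
| 9 | 7 | 0.073 | 0.037 | 0.143 | 0.064 |
| 1 | 9 | 0.086 | 0.044 | 0.169 | 0.064 |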

The Descriptive application has been used after preparing an age-groups file (Additional file

EAPC of tobacco-related cancer rates during the period 1995–2004 in both sexes is depicted in Figure

**Estimated annual percent change in cancer incidence in Girona and Tarragona.** 1995–2004.

Although some cancers did not show a significant time trend during the whole time period, there might be a change in the risk of developing the cancer at the individual level. Therefore, we could prepare two datasets, one which includes 1995–1999 aggregated data (reference period) and another with 2000–2004 aggregated data (target period) (Additional file

**Standardized incidence ratio of cancer in Girona and Tarragona.** 2000–2004 vs. 1995–1999.
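The standardized incidence ratio used in this comparison is, in its usual indirect-standardization form, the number of cases observed in the target period divided by the number expected if the age-specific rates of the reference period applied to the target population. A minimal sketch of that ratio follows; the numbers are hypothetical, the function name is illustrative, and the confidence interval reported by the application is not reproduced.

```python
def standardized_ratio(observed_cases, target_population, reference_rates):
    """Observed cases in the target period divided by the cases expected
    under the reference period's age-specific rates."""
    expected = sum(n * r for n, r in zip(target_population, reference_rates))
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return sum(observed_cases) / expected


# Hypothetical two-age-group example (reference rates are cases per person).
observed_target = [3, 12]          # cases observed in the target period
target_population = [1000, 1200]   # population of the target period
reference_rates = [0.001, 0.010]   # age-specific rates of the reference period

sir = standardized_ratio(observed_target, target_population, reference_rates)
print(f"Standardized ratio = {sir:.3f}")
```

A ratio above 1 indicates more cases in the target period than the reference rates would predict, and a ratio below 1 indicates fewer.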

To assess the changes in the number of incident lung cancer cases among time periods making use of

**Partition of the net change between risk, structure and size in lung cancer incidence from Tarragona and Girona.** 2000–2004 vs. 1995–1999: (I) Difference in the crude rate (number of lung cancer cases per 100,000 person-years) between 2000–2004 and 1995–1999 among men (**A.1**) and among women (**A.2**); (II) Difference in the absolute number of lung cancer cases between 2000–2004 and 1995–1999 among men (**B.1**) and among women (**B.2**). Note that differences in the number of cases are partitioned into those due to risk and those due to population structure and population size.

We also assessed the evolution of the 5-year RS rates of lung cancer between the time period 2000–2004 and the time period 1995–1999 using

Finally, we predicted the burden of lung cancer for the year 2014 in Catalonia through the

Discussion

There are many “stand-alone” web pages which are designed to perform only a single statistical test or calculation.

Up to date of publication

Some limitations should be noted in these applications in their current versions.

Conclusions

Unlike other similar applications,

Availability and requirements

**Project name:** REGSTATTOOLS.

**Project home page:** Access to the set of applications

**Operating system:** Platform independent for accessing the public web server.

**Programming language:** R and PHP.

**Requirements:** The R statistical software, available at http://www.r-project.org/, is required for the implemented functions.

**License:** None.

**Any restriction to use by non-academics:** None.

Abbreviations

ASR: Age standardized rate; CR: Crude rate; TR: Truncated rate; cumR: Cumulative rate; EAPC: Estimated annual percent change; OS: Observed survival; RS: Relative survival; SIMR: Standardized incidence or mortality ratio; AF: Additional file; 95%CI: 95% confidence interval; RiskDif: A web tool for the analysis of the difference due to risk and demographic factors for incidence or mortality data; SART: Statistical analysis of rates and trends; WAERS: Web-assisted estimation of relative survival.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LE, RC, JG, LP and JR initially conceived the tool and were involved in its design. LE, RC and LP did the statistical analysis and the implementation in R. JG implemented the web interface. JG and AI contributed to the collection, processing and interpretation of the data. All authors have been involved in drafting the manuscript and revising it critically. All authors approved the final version.

Acknowledgements

We would like to thank the two reviewers as well as the editor for their careful reviews and constructive comments that have improved the manuscript significantly. We would like to thank Dr. Genevieve Buckland for the very constructive and helpful comments. This work was supported by a UICC International Cancer Technology Transfer Fellowship and with Federal funds from the National Cancer Institute, National Institutes of Health under Contract NO2-CO-41101.
