BMC Med Inform Decis Mak. 2008 Oct 5;8:44.
doi: 10.1186/1472-6947-8-44.

A new scoring system in Cystic Fibrosis: statistical tools for database analysis - a preliminary report



G M Hafen et al. BMC Med Inform Decis Mak.

Abstract

Background: Cystic fibrosis (CF) is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessing CF disease severity have been used for almost 50 years, but they have not been adapted to the milder phenotype of the disease seen in the 21st century. The aim of this project is to develop a new scoring system from a database using various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system.

Methods: The evaluation is based on the CF database of the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The resulting clusters were characterised using rules derived from decision trees, and the results were examined by clinicians. To obtain a clearer definition of classes, expert opinion on each individual's clinical severity was sought on a 3-point scale (mild, moderate and severe disease). After data preparation, two multivariate techniques, Canonical Analysis of Principal Coordinates (CAP) and Linear Discriminant Analysis (DA), were used throughout the analysis to determine which method performed better in feature selection and model derivation. A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 severity classes defined by expert opinion, and (3) establishment of calibration datasets.
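The Methods name incremental clustering but not a specific algorithm. As an illustration only, the Python sketch below implements the leader algorithm, one of the simplest single-pass incremental clustering methods. It is an assumption, not necessarily the algorithm the authors used; `leader_clustering`, the distance threshold and the records are all hypothetical.

```python
import math
from typing import List, Sequence


def leader_clustering(records: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Single-pass (incremental) leader clustering.

    Each record joins the nearest existing cluster leader if its Euclidean
    distance is within `threshold`; otherwise it starts a new cluster and
    becomes that cluster's leader. Records are visited once, in order.
    """
    leaders: List[Sequence[float]] = []
    labels: List[int] = []
    for record in records:
        best_index, best_dist = -1, math.inf
        for index, leader in enumerate(leaders):
            dist = math.dist(record, leader)  # Euclidean distance (Python 3.8+)
            if dist < best_dist:
                best_index, best_dist = index, dist
        if best_dist <= threshold:
            labels.append(best_index)
        else:
            leaders.append(record)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical records: two standardized clinical variables per patient.
records = [(0.0, 0.1), (0.2, 0.0), (3.0, 3.1), (0.1, 0.2), (3.2, 2.9)]
print(leader_clustering(records, threshold=1.0))  # → [0, 0, 1, 0, 1]
```

Because each record is seen only once, the clusters depend on the record order and on the threshold, which is typical of incremental methods and one reason to compare several algorithms.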

Results: (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), discriminant functions (DF) were used to define the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with most misallocations falling into adjacent severity classes, particularly for males.
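Step (3) summarises each confusion table by an overall misclassification rate and by whether errors fall into adjacent severity classes. A minimal sketch of that arithmetic follows, assuming rows hold the expert-assigned (reference) class and columns the predicted class, both in ordinal order. The counts are invented and do not reproduce the reported 19.1% and 16.5%; `misclassification_summary` is a hypothetical helper.

```python
from typing import List, Tuple


def misclassification_summary(confusion: List[List[int]]) -> Tuple[float, float]:
    """Summarise a square confusion table whose classes are in ordinal order.

    Rows are the reference (true) classes and columns the predicted classes.
    Returns (misclassification rate, share of errors that fall into a class
    adjacent to the true one).
    """
    total = errors = adjacent = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            total += count
            if i != j:
                errors += count
                if abs(i - j) == 1:  # neighbouring severity class
                    adjacent += count
    rate = errors / total
    adjacent_share = adjacent / errors if errors else 0.0
    return rate, adjacent_share


# Hypothetical counts; order: mild, intermediate moderate, moderate,
# intermediate severe, severe.
table = [
    [8, 2, 0, 0, 0],
    [1, 7, 1, 0, 0],
    [0, 1, 9, 1, 0],
    [0, 0, 1, 6, 1],
    [0, 0, 1, 1, 8],
]
rate, adjacent_share = misclassification_summary(table)
print(f"misclassified: {rate:.1%}, adjacent: {adjacent_share:.0%}")
# → misclassified: 20.8%, adjacent: 90%
```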

Conclusion: Our preliminary data show that using CAP for feature selection and linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations: in particular, more data entry points are needed to finalize a score, and the statistical tools need further refinement and validation by re-running the statistical methods on the larger dataset.


Figures

Figure 1
Linear Discriminant Analysis in the three contiguous severity groups (O = mild; Δ = moderate; + = severe). DA was used to derive the direction in space (a vector) associated with the difference between the groups: DF 1 for mild vs moderate and DF 2 for moderate vs severe. Note: circles represent regions where each group's mean can be found; the size of each circle indicates that group's variability.
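For two groups, Fisher's linear discriminant gives such a direction as the inverse of the pooled within-group scatter matrix applied to the difference of the group means. The sketch below is limited to two features and uses hypothetical data and helper names (`fisher_direction`, `df_score`); the study's DA was based on its own selected clinical variables, not on this toy example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_point(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_direction(group_a: List[Point], group_b: List[Point]) -> Point:
    """Fisher's two-group linear discriminant direction for two features:
    w = S_w^-1 (mean_b - mean_a), with S_w the pooled within-group scatter."""
    mean_a, mean_b = mean_point(group_a), mean_point(group_b)
    sxx = sxy = syy = 0.0
    for points, m in ((group_a, mean_a), (group_b, mean_b)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    dmx, dmy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Closed-form inverse of [[sxx, sxy], [sxy, syy]] applied to the mean difference.
    return ((syy * dmx - sxy * dmy) / det, (sxx * dmy - sxy * dmx) / det)


def df_score(point: Point, w: Point) -> float:
    """Project a patient onto the discriminant direction."""
    return point[0] * w[0] + point[1] * w[1]


# Hypothetical two-feature data for a "mild vs moderate" comparison (DF 1).
mild = [(0, 0), (2, 0), (0, 2), (2, 2)]
moderate = [(4, 0), (6, 0), (4, 2), (6, 2)]
w = fisher_direction(mild, moderate)
print(w)  # → (0.5, 0.0)
print([df_score(p, w) for p in mild + moderate])
# → [0.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 3.0]
```

Only the direction of `w` matters: rescaling it changes the DF scores but not the ordering of patients along the axis.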
Figure 2
From 3 to 5 severity groups. The regions for the three new classes (derived from the original two classes, e.g. mild vs moderate) are chosen such that the number of patients in each group is approximately equal.
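Assuming "approximately equal" means cutting the ranked DF scores into tertiles, the region assignment might be sketched as follows. The scores, patient IDs, the `split_into_three` helper and the assumption that lower DF 1 scores are milder are all hypothetical.

```python
from typing import Dict, Tuple


def split_into_three(scores: Dict[str, float], labels: Tuple[str, str, str]) -> Dict[str, str]:
    """Assign patients to three regions along one discriminant axis so that
    each region holds approximately the same number of patients.

    Patients are ranked by score; rank r of n goes to region 3 * r // n,
    so region sizes differ by at most one.
    """
    ordered = sorted(scores, key=lambda pid: scores[pid])
    n = len(ordered)
    return {pid: labels[3 * rank // n] for rank, pid in enumerate(ordered)}


# Hypothetical DF 1 (mild vs moderate) scores; lower scores are assumed milder.
df1_scores = {"P1": -1.2, "P2": 0.4, "P3": -0.3, "P4": 1.5, "P5": 0.9, "P6": -2.0}
groups = split_into_three(df1_scores, ("mild", "intermediate moderate", "moderate"))
print(groups)
```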
Figure 3
Profiles of predicted severity groups (Set 1, males). The proportion on the Y-axis indicates the percentage of subjects having that condition in each of the 5 predicted severity groups.
Figure 4
Profiles of predicted severity groups (Set 2, males). The proportion on the Y-axis indicates the percentage of subjects having that condition in each of the 5 predicted severity groups.
Figure 5
Profiles of predicted severity groups for quantitative variables (males). Mean values of the quantitative variables for each of the CF severity groups.
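The profile figures plot, for each predicted severity group, either the proportion of subjects with a condition or the mean of a quantitative variable. Both are group means, since the mean of a 0/1 indicator is a proportion. A hedged sketch with hypothetical data and a hypothetical `group_profile` helper:

```python
from collections import defaultdict
from typing import Dict, Sequence


def group_profile(groups: Sequence[str], values: Sequence[float]) -> Dict[str, float]:
    """Mean of `values` within each predicted severity group.

    For a 0/1 indicator the group mean is the proportion of subjects in that
    group who have the condition; for a quantitative variable it is the group mean.
    """
    if len(groups) != len(values):
        raise ValueError("groups and values must have the same length")
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for group, value in zip(groups, values):
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical predicted groups and variables for five patients.
predicted = ["mild", "mild", "moderate", "moderate", "severe"]
has_condition = [0, 1, 1, 1, 1]                 # binary indicator
quantitative = [95.0, 85.0, 70.0, 64.0, 40.0]   # any continuous measure
print(group_profile(predicted, has_condition))  # → {'mild': 0.5, 'moderate': 1.0, 'severe': 1.0}
print(group_profile(predicted, quantitative))   # → {'mild': 90.0, 'moderate': 67.0, 'severe': 40.0}
```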
Figure 6
Profiles of predicted severity groups (Set 1, females). The proportion on the Y-axis indicates the percentage of subjects having that condition in each of the 5 predicted severity groups.
Figure 7
Profiles of predicted severity groups (Set 2, females). The proportion on the Y-axis indicates the percentage of subjects having that condition in each of the 5 predicted severity groups.
Figure 8
Profiles of predicted severity groups for quantitative variables (females). Mean values of the quantitative variables for each of the CF severity groups.
