Acta Crystallogr D Biol Crystallogr. 2013 Jul;69(Pt 7):1215-22.
doi: 10.1107/S0907444913001121. Epub 2013 Jun 15.

Better models by discarding data?

K Diederichs et al. Acta Crystallogr D Biol Crystallogr. 2013 Jul.

Abstract

In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, that can be used for this purpose were characterized and it was shown that CC1/2 has superior properties compared with "merging" R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and of the more conventional indicators was compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations controlled "paired-refinement" tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator whose behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
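
The abstract compares CC1/2 with merging R values without stating formulas. As a minimal sketch, assuming the standard textbook definitions (R_merge as the summed absolute deviations of repeated intensities from their mean divided by the summed intensities; CC1/2 as the Pearson correlation between the mean intensities of two random half data sets; CC* = sqrt(2 CC1/2 / (1 + CC1/2))), the Python below computes these quantities on simulated data. The simulation (exponentially distributed true intensities, Gaussian noise, constant multiplicity), the rejection threshold, and all function names are hypothetical illustrations, not the authors' code or data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_merge(observations):
    """R_merge = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    `observations` maps each unique reflection h to its list of repeated intensities I_hi.
    """
    num = den = 0.0
    for obs in observations.values():
        mean = sum(obs) / len(obs)
        num += sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den


def cc_half(observations, rng):
    """CC1/2: correlation between the mean intensities of two random half data sets.

    Each reflection's observations are shuffled and split in two; reflections
    measured fewer than twice are skipped.
    """
    half1, half2 = [], []
    for obs in observations.values():
        if len(obs) < 2:
            continue
        shuffled = list(obs)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        first, second = shuffled[:mid], shuffled[mid:]
        half1.append(sum(first) / len(first))
        half2.append(sum(second) / len(second))
    return pearson(half1, half2)


def cc_star(cc12):
    """CC* = sqrt(2 CC1/2 / (1 + CC1/2)); negative CC1/2 is clamped to 0 (an assumption of this sketch)."""
    cc12 = max(cc12, 0.0)
    return math.sqrt(2.0 * cc12 / (1.0 + cc12))


def reject_weak(observations, threshold):
    """Hypothetical filter: drop unique reflections whose merged (mean) intensity is below threshold."""
    return {h: obs for h, obs in observations.items()
            if sum(obs) / len(obs) >= threshold}


def simulate(n_refl=2000, multiplicity=4, noise=1.0, seed=0):
    """Toy data: exponential (acentric Wilson-like) true intensities plus Gaussian measurement noise."""
    rng = random.Random(seed)
    data = {}
    for h in range(n_refl):
        true_i = rng.expovariate(1.0)
        data[h] = [true_i + rng.gauss(0.0, noise) for _ in range(multiplicity)]
    return data


if __name__ == "__main__":
    data = simulate()
    cc12 = cc_half(data, random.Random(1))
    print(f"all data:      R_merge={r_merge(data):.3f}  CC1/2={cc12:.3f}  CC*={cc_star(cc12):.3f}")
    strong = reject_weak(data, threshold=1.0)
    print(f"weak rejected: R_merge={r_merge(strong):.3f}  ({len(strong)} of {len(data)} reflections kept)")
```

In this toy setup, discarding the weaker unique reflections lowers R_merge, because their measurement noise still enters the numerator while they contribute little to the denominator. The sketch says nothing about model quality; that is what the paired-refinement tests described in the abstract address.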

Keywords: R value; correlation coefficient; data quality; model quality; outlier rejection.

Figures

Figure 1
Scheme documenting the relationships of correlation coefficients calculated between squared observed and calculated amplitudes. This figure was adapted from Diederichs & Karplus (2013).
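
Each coefficient in such a scheme is an ordinary Pearson correlation. For example, writing I_obs(h) = F_obs(h)² and I_calc(h) = F_calc(h)² for reflection h (notation chosen for this note, not taken from the figure), the data-to-model correlation over a reflection set S can be written as follows, with bars denoting means over S; evaluated over the working or the free (test) reflections it gives the quantities usually called CC_work and CC_free.

```latex
\mathrm{CC}(I_{\mathrm{obs}}, I_{\mathrm{calc}}) =
\frac{\sum_{h \in S} \left(I_{\mathrm{obs}}(h) - \bar{I}_{\mathrm{obs}}\right)\left(I_{\mathrm{calc}}(h) - \bar{I}_{\mathrm{calc}}\right)}
     {\sqrt{\sum_{h \in S} \left(I_{\mathrm{obs}}(h) - \bar{I}_{\mathrm{obs}}\right)^{2}\,
            \sum_{h \in S} \left(I_{\mathrm{calc}}(h) - \bar{I}_{\mathrm{calc}}\right)^{2}}},
\qquad I_{\mathrm{obs}}(h) = F_{\mathrm{obs}}(h)^{2},\quad I_{\mathrm{calc}}(h) = F_{\mathrm{calc}}(h)^{2}
```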
Figure 2
Data statistics for CDO3 (blue), CDO3b (green) and CDO3c (red).
Figure 3
Example demonstrating the possibility of negative CC1/2 when rejecting reflections with negative intensities from a data set. The plots show ∊1 versus ∊2 for simulated data having Gaussian noise and no signal (τ = 0). (a) 1000 unique reflections, each represented by two observations; no rejections. The correlation of ∊1 and ∊2 is near zero. (b) From the 1000 unique reflections, those with negative intensity (∊1 + ∊2 < 0) were rejected. The resulting correlation between ∊1 and ∊2 is about −0.47. (c) From the 1000 unique reflections, those with negative ∊1 or negative ∊2 were rejected, which also leaves only reflections with positive (merged) intensity. The resulting correlation between ∊1 and ∊2 is near zero.
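
The three panels can be reproduced approximately with a short simulation. This is an illustrative sketch, not the authors' code: it assumes unit-variance Gaussian noise (the correlation does not depend on the noise scale), uses Python's standard `random` module, and the function names are hypothetical. With two observations per reflection, each half data set holds one observation, so the correlation of ∊1 with ∊2 is CC1/2. Panel (b) also has an exact value: the sum ∊1 + ∊2 and the difference ∊1 − ∊2 of independent Gaussians are independent, so rejecting on the sign of the sum shrinks only the variance of the sum, giving a correlation of −1/(π − 1) ≈ −0.467, consistent with the ≈ −0.47 quoted in the caption.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def figure3_panels(n_unique=1000, seed=0):
    """Correlation of eps1 and eps2 for pure-noise pairs (tau = 0) under three rejection rules."""
    rng = random.Random(seed)
    # Two unit-variance Gaussian noise observations per unique reflection, no signal.
    pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_unique)]
    panels = {
        "a": pairs,                                          # no rejection
        "b": [p for p in pairs if p[0] + p[1] >= 0],         # reject negative merged intensity
        "c": [p for p in pairs if p[0] >= 0 and p[1] >= 0],  # reject if either observation is negative
    }
    return {name: pearson([e1 for e1, _ in sel], [e2 for _, e2 in sel])
            for name, sel in panels.items()}


if __name__ == "__main__":
    for name, cc in figure3_panels().items():
        print(f"panel ({name}): CC1/2 = {cc:+.2f}")
    print(f"analytic value for (b): {-1 / (math.pi - 1):+.3f}")
```

Panel (c) stays near zero because requiring both ∊1 ≥ 0 and ∊2 ≥ 0 truncates each observation independently, whereas panel (b) truncates only their sum, which couples the two observations and induces the negative correlation.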

References

    1. Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
    2. Arndt, U. W., Crowther, R. A. & Mallett, J. F. W. (1968). J. Phys. E Sci. Instrum. 1, 510–516.
    3. Diederichs, K. & Karplus, P. A. (1997). Nature Struct. Biol. 4, 269–275.
    4. Diederichs, K. & Karplus, P. A. (2013). In Advancing Methods for Biomolecular Crystallography, edited by R. Read, A. G. Urzhumtsev & V. Y. Lunin. New York: Springer-Verlag.
    5. Evans, P. R. (2011). Acta Cryst. D67, 282–292.