J Biomed Inform. 2016 Dec;64:179-191.
doi: 10.1016/j.jbi.2016.10.005. Epub 2016 Oct 8.

Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction

Wen-Wai Yim et al. J Biomed Inform. 2016 Dec.

Abstract

Background: Anaphoric references occur ubiquitously in clinical narrative text. However, resolving them remains an open challenge and typically receives less attention in clinical-text applications. Furthermore, existing research on reference resolution is often conducted in isolation from the real-world tasks that motivate it.

Objective: In this paper, we present a machine-learning system that automatically performs reference resolution and a rule-based system that extracts tumor characteristics, with component-based and end-to-end evaluations. Specifically, our goal was to build an algorithm that takes in tumor templates and outputs tumor characteristics, e.g. tumor number and largest tumor size, necessary for identifying patient liver cancer stage phenotypes.
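As an illustration of the objective above, the following minimal sketch shows how tumor templates, once grouped by coreference chains, could yield a tumor number and a largest tumor size. It is not the authors' rule-based system: the `TumorTemplate` class, its fields, and the `summarize_tumors` function are hypothetical names introduced here, and particularization relations are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Set


@dataclass
class TumorTemplate:
    """Hypothetical, simplified tumor template (not the paper's schema)."""
    mention_id: str
    size_cm: Optional[float] = None  # largest stated dimension, if any


def summarize_tumors(templates: List[TumorTemplate],
                     chains: List[Set[str]]) -> dict:
    """Count distinct tumors and find the largest stated size.

    `chains` is assumed to be the output of a coreference step: each set
    holds the mention ids that refer to the same tumor. Mentions that
    appear in no chain are treated as their own tumor.
    """
    chain_of: Dict[str, Hashable] = {}
    for idx, chain in enumerate(chains):
        for mention_id in chain:
            chain_of[mention_id] = idx

    tumors: Dict[Hashable, List[TumorTemplate]] = {}
    for template in templates:
        key = chain_of.get(template.mention_id, ("singleton", template.mention_id))
        tumors.setdefault(key, []).append(template)

    sizes = [t.size_cm for group in tumors.values() for t in group
             if t.size_cm is not None]
    return {
        "tumor_number": len(tumors),
        "largest_size_cm": max(sizes) if sizes else None,
    }


templates = [
    TumorTemplate("m1", 3.2),   # e.g. a lesion with a stated measurement
    TumorTemplate("m2"),        # e.g. a later mention of the same lesion
    TumorTemplate("m3", 1.1),   # a second, separate lesion
]
without_resolution = summarize_tumors(templates, chains=[])
with_resolution = summarize_tumors(templates, chains=[{"m1", "m2"}])
```

In this toy case the largest size is the same with or without resolution, while the tumor count drops from three to two once the repeated mention is merged, which illustrates why different output variables can react differently to coreference quality.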

Results: Our reference resolution system reached a modest performance of 0.66 F1, averaged over the MUC, B-cubed, and CEAF scores, for coreference resolution and 0.43 F1 for particularization relations. However, even this modest performance substantially improved automatic tumor characteristics annotation over using no reference resolution.
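For readers unfamiliar with the coreference metrics named above, the following is a minimal sketch of the B-cubed score only (MUC and CEAF are not shown, and this is not the scorer used in the paper). It assumes gold and predicted chains are given as sets of mention ids over the same mention set; the function name `b_cubed` and the example chains are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def b_cubed(gold_chains: Iterable[Set[str]],
            pred_chains: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the B-cubed definition.

    For every mention, precision is the fraction of its predicted chain
    that is also in its gold chain, and recall is the fraction of its gold
    chain that is also in its predicted chain; both are averaged over
    mentions. Only mentions present in both inputs are scored (assumption).
    """
    gold_of: Dict[str, FrozenSet[str]] = {
        m: frozenset(c) for c in gold_chains for m in c}
    pred_of: Dict[str, FrozenSet[str]] = {
        m: frozenset(c) for c in pred_chains for m in c}

    mentions = gold_of.keys() & pred_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0

    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m])
                 for m in mentions) / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [{"a", "b", "c"}, {"d"}]
pred = [{"a", "b"}, {"c", "d"}]
p, r, f1 = b_cubed(gold, pred)  # p = 0.75, r = 2/3, f1 = 12/17
```

A combined figure such as the one reported would then be an average of this F1 with the MUC and CEAF F1 values, each computed by its own definition.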

Conclusion: Experiments revealed the benefit of reference resolution even for relatively simple tumor characteristic variables such as largest tumor size. However, we found that different overall variables had different tolerances to upstream reference resolution errors, highlighting the need to characterize systems by end-to-end evaluations.

Keywords: Cancer stages; Information extraction; Liver cancer; Natural language processing; Radiology report; Reference resolution.

Figures

Figure 1. Radiology report excerpt
Figure 2. Example of one reference and its particularizations
Figure 3. Three canonical template annotation examples. The last one is a case in which the template head is measurement entity.
Figure 4. Brat annotation with augmentations
Figure 5. Examples of coreference relations that can be mistaken as particularizations
Figure 6. Tumor characteristics annotation
Figure 7. Logic for >50% of liver invaded
Figure 8. Conjunction ambiguities
Figure 9. Ambiguity in tumor invasion area
Figure 10. Reference resolution set up
Figure 11. Tumor characteristics annotator
Figure 12. Algorithm for >50% liver is invaded
Figure 13. Different parts of the report have anatomical context not necessarily immediately available in the same sentence or not explicitly clear. In the third sentence, “right base” can be inferred to be part of the lungs by the reference to “Lungs bases” in the previous sentence or the mention of “pleural” in the same sentence.
Figure 14. Starting organ concept identifiers
Figure 15. Conjunction normalization process. Step 1: Isolate relevant parts of the dependency tree and connect loose items as necessary. Step 2: Find the “base string” to connect other items to, by using the longest match intersected with the highest dependency node. Step 3: Cycle through the dependency tree and connect with “base string” ignoring conjunction tokens.
