#HSLIDE
- Supervised Machine Learning needs annotated data
- Existing training data are
  - often outdated
  - in military context
  - too universal or too specific
- Need to generate own training data
- Existing solutions are time-consuming and exhausting
- ... and as such expensive
#HSLIDE
- Available annotation interfaces are inconvenient
  - Mostly with a linguistic focus
  - Only whole-document views
- Documents are distributed to annotators manually
#VSLIDE
#VSLIDE
#VSLIDE
#HSLIDE
- To solve these problems, we fundamentally restructured how annotation is done.
- We designed an iterative workflow that automates as much as possible and conserves annotators' attention.
- The result is the open-source project Dalphi.
#HSLIDE
- Web application, runs everywhere
- Helps build and maintain annotated data
- Key features:
  - Iterative workflow framework supported by active learning
  - Human-readable presentation
  - Server side proposes useful annotations
  - Parallel distribution to annotators
  - Problem-agnostic document handling
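#VSLIDE
One way the server side could decide which documents are worth an annotator's attention is least-confidence sampling, a common active learning strategy. The deck does not say which strategy is actually used, so the sketch below is an assumption; the function name `least_confident` and the probability layout (one list of class probabilities per document) are hypothetical.

```python
def least_confident(probabilities, batch_size):
    """Return the indices of the documents the model is least sure about.

    `probabilities` holds one list of class probabilities per document
    (assumed layout). A document's confidence is its highest class
    probability; the lowest-confidence documents are returned first.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical model output for four documents and two classes.
predictions = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]
next_batch = least_confident(predictions, batch_size=2)
```

Documents selected this way would go to annotators in the next iteration, while confidently predicted ones could be left for later.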
#HSLIDE
- Service
- Raw data
- Annotation document
- Statistic
- Interface
- Project
#VSLIDE
- any system capable of communicating over HTTP
- maintains problem-specific jobs
- three types:
  - Iterate
  - Merge
  - Machine Learning
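#VSLIDE
A service in this sense can be sketched with Python's standard library alone. The role names mirror the three types above, but everything else is an assumption made for illustration and not Dalphi's actual service API: the JSON payload shape (`role`, `documents`, `label`), the handler functions, and the port.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def iterate(raw_documents):
    """Split raw data into annotation documents (here: one per non-empty line)."""
    docs = []
    for raw in raw_documents:
        for line in raw.splitlines():
            if line.strip():
                docs.append({"content": line.strip(), "label": None})
    return docs


def merge(annotation_documents):
    """Keep only the documents that an annotator has labeled."""
    return [d for d in annotation_documents if d.get("label") is not None]


def machine_learning(annotation_documents):
    """Stand-in for training: only counts how often each label occurs."""
    counts = {}
    for d in annotation_documents:
        if d.get("label") is not None:
            counts[d["label"]] = counts.get(d["label"], 0) + 1
    return counts


# Hypothetical mapping from role name to job handler.
ROLES = {"iterate": iterate, "merge": merge, "machine_learning": machine_learning}


def handle_job(job):
    """Dispatch a decoded JSON job to the handler for its role."""
    handler = ROLES.get(job.get("role"))
    if handler is None:
        raise ValueError("unknown role: %r" % job.get("role"))
    return handler(job.get("documents", []))


class ServiceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job = json.loads(self.rfile.read(length) or b"{}")
            status, body = 200, handle_job(job)
        except (ValueError, AttributeError) as error:
            # Malformed JSON, a non-object payload, or an unknown role.
            status, body = 400, {"error": str(error)}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the sketch service on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), ServiceHandler).serve_forever()
```

Calling `serve()` starts the service; posting `{"role": "iterate", "documents": ["..."]}` to it would return the generated annotation documents as JSON.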
#VSLIDE
- data that needs to be annotated
#VSLIDE
JSON

    {
      "foo": "bar",
      "foobar": 1.23
    }

HTML

    <h1>Impressum</h1>
    <p>3antworten UG (haftungsbeschränkt)<br>Karl-Kunger Straße 64<br>12435 Berlin</p>
#VSLIDE
- a subset of raw data
- a document that can be rendered and annotated
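#VSLIDE
To make "a subset of raw data" concrete, the sketch below cuts one HTML raw datum into several small annotation documents, one per text chunk between tags. The `AnnotationDocument` fields and the splitting rule are assumptions for illustration; in practice the split is problem specific.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class AnnotationDocument:
    raw_datum_id: int          # which raw datum this subset was taken from
    content: str               # the part shown to the annotator
    annotations: list = field(default_factory=list)


class _TextChunks(HTMLParser):
    """Collect the non-empty text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def split_raw_datum(raw_datum_id, html):
    """Turn one HTML raw datum into one annotation document per text chunk."""
    parser = _TextChunks()
    parser.feed(html)
    parser.close()
    return [AnnotationDocument(raw_datum_id, chunk) for chunk in parser.chunks]


documents = split_raw_datum(1, "<h1>Title</h1><p>first line<br>second line</p>")
```

Each resulting document is small enough to be rendered and annotated on its own, instead of presenting the whole page at once.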
#VSLIDE
- key-value pair
- mostly numeric values recorded over time
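#VSLIDE
A statistic of this kind could be modeled as a small record and grouped into one series per key. The field names, in particular `iteration` as the time axis, and the example keys are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statistic:
    key: str        # e.g. "accuracy"
    value: float    # the numeric value
    iteration: int  # assumed time axis: the workflow iteration it belongs to


def series_by_key(statistics):
    """Group values by key, ordered by iteration, ready for plotting."""
    series = defaultdict(list)
    for stat in sorted(statistics, key=lambda s: s.iteration):
        series[stat.key].append(stat.value)
    return dict(series)


# Hypothetical values reported after two iterations.
history = series_by_key([
    Statistic("accuracy", 0.7, 2),
    Statistic("accuracy", 0.6, 1),
    Statistic("loss", 0.9, 1),
])
```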
#VSLIDE
- problem-specific user interface
- renders an annotation document as a subset of raw data
#VSLIDE
#HSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#VSLIDE
#HSLIDE
#VSLIDE
#HSLIDE
#VSLIDE
#VSLIDE
#HSLIDE
#VSLIDE