Add more description for readme
hatianzhang committed Jul 9, 2018
1 parent 92b328e commit ffd8681
Showing 2 changed files with 9 additions and 4 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -1,9 +1,14 @@
 # Description of sampled test collection
 The text collection in this repository is a sample of the Athome4 collection,
 which was used in the TREC 2016 Total Recall Track [1]. The original dataset
 contains 290,000 Jeb Bush emails and 34 topics.
 
-We provided 9 topics (`athome4.topics.sample`), some sampled documents
-(`athome4_sample.tgz`), and some relevance judgments (`athome4.qrel.sample`)
-for this sampled test collection.
+We provided 9 topics (`athome4.topics.sample`), 50000 sampled documents
+(`athome4_sample.tgz`), and sampled relevance judgments (`athome4.qrel.sample`) for this sampled test collection.
+
+# Extract paragraphs for full documents
+```bash
+python3 process.py athome4_sample.tgz
+```
+
 [1] Grossman, Maura R., Gordon V. Cormack, and Adam Roegiest. "TREC 2016 Total Recall Track Overview." TREC. 2016.
2 changes: 1 addition & 1 deletion process.py
@@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """ Preprocess the sample dataset
-Usage: python process.py ./athome4_test.tgz
+Usage: python process.py athome4_sample.tgz
 """
 
 import sys
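
The rest of `process.py` is collapsed in this diff, so its actual logic is not visible here. Purely as an illustration of what paragraph extraction from the sampled archive could look like, here is a minimal sketch. It assumes that `athome4_sample.tgz` is a gzip-compressed tar of plain-text files and that paragraphs are separated by blank lines; both are assumptions, and the helper names `split_paragraphs` and `iter_paragraphs` are hypothetical, not taken from the repository.

```python
import re
import sys
import tarfile


def split_paragraphs(text):
    """Split text on blank lines and drop empty pieces (assumed paragraph rule)."""
    pieces = re.split(r"\n\s*\n", text)
    return [piece.strip() for piece in pieces if piece.strip()]


def iter_paragraphs(archive_path):
    """Yield (member_name, paragraph_index, paragraph) for each regular file."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            raw = archive.extractfile(member).read()
            # Decode leniently, since the encoding of the files is not known here.
            text = raw.decode("utf-8", errors="replace")
            for index, paragraph in enumerate(split_paragraphs(text)):
                yield member.name, index, paragraph


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one line per paragraph: file name, paragraph index, character count.
    for name, index, paragraph in iter_paragraphs(sys.argv[1]):
        print(f"{name}\t{index}\t{len(paragraph)}")
```

Saved under a hypothetical name such as `sketch.py`, it would be run the same way as the documented command: `python3 sketch.py athome4_sample.tgz`.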