Commit
1 parent 92b328e, commit ffd8681
Showing 2 changed files with 9 additions and 4 deletions.
@@ -1,9 +1,14 @@
# Description of sampled test collection
The text collection in this repository is a sample of the Athome4 collection,
which was used in the TREC 2016 Total Recall Track [1]. The original dataset
contains 290,000 Jeb Bush emails and 34 topics.

-We provided 9 topics (`athome4.topics.sample`), some sampled documents
-(`athome4_sample.tgz`), and some relevance judgments (`athome4.qrel.sample`)
-for this sampled test collection.
+We provided 9 topics (`athome4.topics.sample`), 50000 sampled documents
+(`athome4_sample.tgz`), and sampled relevance judgments (`athome4.qrel.sample`) for this sampled test collection.
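As a rough illustration of how the sampled relevance judgments might be consumed, the sketch below parses qrel lines into a per-topic dictionary. It assumes `athome4.qrel.sample` follows the common four-column TREC layout (topic, iteration, document ID, relevance); the commit does not confirm this, and the helper names `load_qrels` and `count_relevant` are hypothetical.

```python
from collections import defaultdict


def load_qrels(lines):
    """Parse qrel lines into {topic: {doc_id: relevance}}.

    Assumption: each line is "topic iteration doc_id relevance",
    the usual TREC qrels layout. Other lines are skipped.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, doc_id, relevance = parts
        qrels[topic][doc_id] = int(relevance)
    return dict(qrels)


def count_relevant(qrels):
    """Count documents judged relevant (relevance > 0) for each topic."""
    return {
        topic: sum(1 for rel in docs.values() if rel > 0)
        for topic, docs in qrels.items()
    }
```

If the file does use that layout, `load_qrels(open("athome4.qrel.sample"))` followed by `count_relevant(...)` would give a quick per-topic summary of the sample.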

# Extract paragraphs for full documents
```bash
python3 process.py athome4_sample.tgz
```
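The contents of `process.py` are not shown in this commit, so the following is only a minimal sketch of what paragraph extraction from a gzipped tarball could look like. It assumes that each archive member is a plain-text document and that paragraphs are separated by blank lines; `split_paragraphs` and `iter_paragraphs` are hypothetical names, not functions from the repository.

```python
import re
import tarfile


def split_paragraphs(text):
    """Split text into paragraphs on one or more blank lines."""
    chunks = re.split(r"\n\s*\n", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def iter_paragraphs(archive_path):
    """Yield (member_name, paragraph_index, paragraph) for each file in a .tgz.

    Assumption: members are text files; undecodable bytes are replaced
    rather than raising an error.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and special entries
            raw = archive.extractfile(member).read()
            text = raw.decode("utf-8", errors="replace")
            for index, paragraph in enumerate(split_paragraphs(text)):
                yield member.name, index, paragraph
```

Under those assumptions, `for name, i, para in iter_paragraphs("athome4_sample.tgz"): ...` would stream paragraphs without unpacking the archive to disk. Real email documents may need header handling that this sketch ignores.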

[1] Grossman, Maura R., Gordon V. Cormack, and Adam Roegiest. "TREC 2016 Total Recall Track Overview." TREC. 2016.