upd readme

castorini · lintool · Aug 1, 2017 · Jul 31, 2017
commit b9f567d51386f5a7dfa0e9c7e35842ba7c8d1f82
diff --git a/README.md b/README.md
@@ -4,9 +4,17 @@ This repo contains the implementation of extracting high quality training exampl
 + Haotian Zhang, Jinfeng Rao, Jimmy Lin and Mark Smucker. Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering. SIGIR 2017.
 
 
-## Negative Example DataSet 
-We provided top k={1,3,5,7} negative examples for each (question,answer) pair in the TrecQA train-all set. The different examples sets locate in `NegExSets/` folder and are named as `splitDocNegTopk.tgz`.
+# Negative Example DataSet 
+* We provide the top-k negative examples (k = 1, 3, 5, 7) for each (question, answer) pair in the TrecQA train-all set. The example sets are located in the `NegExSets/` folder and are named `splitDocNegTopk.tgz`.
 
+* After you uncompress a NegExSet with ```tar zxvf splitDocNegTopk.tgz```, you will see our negative examples for each (question, answer) pair. Each negative example is a single sentence extracted from the same document that contains the answer. Each example is named as follows:
+
+** `ID of answer` + `relevance` + `ID of doc` + `ID of sentence`
+
+* `ID of answer` is the ID of the answer. The TrecQA train-all set has 56082 (question, answer) pairs in total, so answer IDs range from 1 to 56082.
+* `relevance` is the relevance of the extracted example. If the relevance is 1, the example is the answer itself. Otherwise, it is one of the top k negative examples for that answer.
+* `ID of doc` is the ID of the document that contains the answer. All negative example sentences come from this document.
+* `ID of sentence` is the ID of the extracted sentence within the document. IDs start at 0, so sentence 0 is the first sentence of the document, and the upper bound depends on the number of sentences in the document.
 
 ## Prepare TrecQA DataSet 
 Please download the TrecQA Dataset and refer to: https://github.com/castorini/data/tree/master/TrecQA
@@ -68,4 +76,4 @@ $ python selectLowestShingleDist.py --input=shingledist.qaans.list
 4.Select the sentences in the document with the lowest shingle matching scores matching the question/answer. 
 ```
 $ python splitSentence.py shingledist.ans.doc.pair.top1.list splitDoc
-```
+```
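
The naming scheme added to the README in this commit (`ID of answer` + `relevance` + `ID of doc` + `ID of sentence`) could be parsed roughly as sketched below. This is only an illustration under stated assumptions: the README does not say which character joins the four fields, so the `sep` parameter (defaulting to `_`) is a guess, and `NegExampleName`, `parse_example_name`, and the sample name are hypothetical and not part of the repo. Check the names in an extracted archive before relying on it.

```python
from typing import NamedTuple

# Number of (question, answer) pairs in the TrecQA train-all set, per the README.
NUM_ANSWERS = 56082


class NegExampleName(NamedTuple):
    """Hypothetical container for the four fields of an example name."""
    answer_id: int    # 1 .. 56082
    relevance: int    # 1 means the example is the answer itself
    doc_id: str       # document that contains the answer
    sentence_id: int  # 0-based index of the sentence in the document

    @property
    def is_answer(self) -> bool:
        return self.relevance == 1


def parse_example_name(name: str, sep: str = "_") -> NegExampleName:
    """Split an example name into its four fields.

    ASSUMPTION: fields are joined by `sep`; the README does not specify
    the separator. The first two fields are taken from the left and the
    last from the right, so a doc ID containing `sep` is still handled.
    """
    head = name.split(sep, 2)
    if len(head) != 3:
        raise ValueError(f"expected 4 fields in {name!r}")
    answer_id, relevance, rest = head
    tail = rest.rsplit(sep, 1)
    if len(tail) != 2:
        raise ValueError(f"expected 4 fields in {name!r}")
    doc_id, sentence_id = tail

    parsed = NegExampleName(int(answer_id), int(relevance), doc_id, int(sentence_id))
    if not 1 <= parsed.answer_id <= NUM_ANSWERS:
        raise ValueError(f"answer ID out of range in {name!r}")
    if parsed.sentence_id < 0:
        raise ValueError(f"sentence ID must start from 0 in {name!r}")
    return parsed


# Made-up example name, for illustration only.
example = parse_example_name("12_0_APW19990101.0001_3")
print(example, example.is_answer)
```

Keeping the separator as a parameter means only the call site changes if the real names turn out to use a different delimiter.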