Commit

Showing 17 changed files with 12 additions and 20 deletions.
4 changes: 2 additions & 2 deletions docs/experiments-car17.md
Original file line number Diff line number Diff line change
@@ -7,8 +7,8 @@ Typical indexing command:
```
nohup sh target/appassembler/bin/IndexCollection -collection CarCollection \
-generator LuceneDocumentGenerator -threads 40 -input /path/to/car17 -index \
- lucene-index.car17.pos+docvectors -storeRawDocs -storePositions -storeDocvectors \
- -optimize >& log.car17.pos+docvectors+rawdocs &
+ lucene-index.car17.pos+docvectors -storePositions -storeDocvectors -storeRawDocs \
+ >& log.car17.pos+docvectors+rawdocs &
```

The directory `/path/to/Car17` should be the root directory of Car17 collection, i.e., `ls /path/to/Car17` should bring up a list of `.cbor` files.
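The `ls` check described above can be scripted before launching a long indexing run. This is a minimal sketch, not part of the commit or of the indexing tool: the function name `check_cbor_dir` is hypothetical, and it assumes only what the note states, namely that the collection root directly contains `.cbor` files.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an existing tool):
# succeed only if the given directory directly contains at least one .cbor file.
check_cbor_dir() {
  dir="$1"
  # Reject paths that are not directories.
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  # With default shell options an unmatched glob stays literal, so ls fails.
  ls "$dir"/*.cbor >/dev/null 2>&1 || { echo "no .cbor files in $dir" >&2; return 1; }
}
```

A possible use is `check_cbor_dir /path/to/car17 && echo "ready to index"`, run just before the `nohup` command.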
2 changes: 1 addition & 1 deletion docs/experiments-core17.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection \
NewYorkTimesCollection -generator JsoupGenerator -threads 16 -input \
/path/to/core17 -index lucene-index.core17.pos+docvectors -storePositions \
- -storeDocvectors -storeRawDocs -optimize >& log.core17.pos+docvectors+rawdocs &
+ -storeDocvectors -storeRawDocs >& log.core17.pos+docvectors+rawdocs &
```

The directory `/path/to/nyt_corpus/` should be the root directory of TREC Core collection, i.e., `ls /path/to/nyt_corpus/`
2 changes: 1 addition & 1 deletion docs/experiments-disk12.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection \
-generator JsoupGenerator -threads 16 -input /path/to/disk12 -index \
lucene-index.disk12.pos+docvectors -storePositions -storeDocvectors \
- -storeRawDocs -optimize >& log.disk12.pos+docvectors+rawdocs &
+ -storeRawDocs >& log.disk12.pos+docvectors+rawdocs &
```

The directory `/path/to/disk12/` should be the root directory of the Disk12 collection, i.e., `ls /path/to/disk12/` should bring up subdirectories like `doe`, `wsj`.
2 changes: 1 addition & 1 deletion docs/experiments-gov2.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection TrecwebCollection \
-generator JsoupGenerator -threads 44 -input /path/to/gov2 -index \
lucene-index.gov2.pos+docvectors -storePositions -storeDocvectors -storeRawDocs \
- -optimize >& log.gov2.pos+docvectors+rawdocs &
+ >& log.gov2.pos+docvectors+rawdocs &
```

The directory `/path/to/gov2/` should be the root directory of Gov2 collection, i.e., `ls /path/to/gov2/` should bring up a bunch of subdirectories, `GX000` to `GX272`.
4 changes: 2 additions & 2 deletions docs/experiments-mb11.md
@@ -13,8 +13,8 @@ Indexing the Tweets2011 collection:
nohup sh target/appassembler/bin/IndexCollection -collection TweetCollection \
-generator TweetGenerator -threads 44 -input /path/to/mb11 -index \
lucene-index.mb11.pos+docvectors -storePositions -storeDocvectors -storeRawDocs \
- -optimize -uniqueDocid -tweet.keepUrls -tweet.stemming >& \
- log.mb11.pos+docvectors+rawdocs &
+ -uniqueDocid -tweet.keepUrls -tweet.stemming >& log.mb11.pos+docvectors+rawdocs \
+ &
```
__NB:__ The process is backgrounded

4 changes: 2 additions & 2 deletions docs/experiments-mb13.md
@@ -13,8 +13,8 @@ Indexing the Tweets2013 collection:
nohup sh target/appassembler/bin/IndexCollection -collection TweetCollection \
-generator TweetGenerator -threads 44 -input /path/to/mb13 -index \
lucene-index.mb13.pos+docvectors -storePositions -storeDocvectors -storeRawDocs \
- -optimize -uniqueDocid -tweet.keepUrls -tweet.stemming >& \
- log.mb13.pos+docvectors+rawdocs &
+ -uniqueDocid -tweet.keepUrls -tweet.stemming >& log.mb13.pos+docvectors+rawdocs \
+ &
```
__NB:__ The process is backgrounded

2 changes: 1 addition & 1 deletion docs/experiments-robust04.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection \
-generator JsoupGenerator -threads 16 -input /path/to/robust04 -index \
lucene-index.robust04.pos+docvectors -storePositions -storeDocvectors \
- -storeRawDocs -optimize >& log.robust04.pos+docvectors+rawdocs &
+ -storeRawDocs >& log.robust04.pos+docvectors+rawdocs &
```

The directory `/path/to/disk45/` should be the root directory of Disk4 and Disk5 collection; inside each there should be subdirectories like `ft`, `fr94`.
2 changes: 1 addition & 1 deletion docs/experiments-robust05.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection TrecCollection \
-generator JsoupGenerator -threads 16 -input /path/to/robust05 -index \
lucene-index.robust05.pos+docvectors -storePositions -storeDocvectors \
- -storeRawDocs -optimize >& log.robust05.pos+docvectors+rawdocs &
+ -storeRawDocs >& log.robust05.pos+docvectors+rawdocs &
```

The directory `/path/to/aquaint/` should be the root directory of AQUAINT collection; under subdirectory `disk1/` there should be `NYT/` and under subdirectory `disk2/` there should be `APW/` and `XIE/`.
2 changes: 1 addition & 1 deletion docs/experiments-wt10g.md
@@ -8,7 +8,7 @@ Typical indexing command:
nohup sh target/appassembler/bin/IndexCollection -collection TrecwebCollection \
-generator JsoupGenerator -threads 16 -input /path/to/wt10g -index \
lucene-index.wt10g.pos+docvectors -storePositions -storeDocvectors -storeRawDocs \
- -optimize >& log.wt10g.pos+docvectors+rawdocs &
+ >& log.wt10g.pos+docvectors+rawdocs &
```

The directory `/path/to/wt10g/` should be the root directory of Wt10g collection, containing a bunch of subdirectories, `WTX001` to `WTX104`.
1 change: 0 additions & 1 deletion src/main/resources/regression/core17.yaml
@@ -13,7 +13,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
input: /tuna1/collections/newswire/NYTcorpus/
index_path: "/tuna1/indexes/lucene-index.core17.pos+docvectors+rawdocs" # path to the existing index, used in regression test if `--index` option is absent
1 change: 0 additions & 1 deletion src/main/resources/regression/disk12.yaml
@@ -14,7 +14,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
evals:
- command: eval/trec_eval.9.0/trec_eval
1 change: 0 additions & 1 deletion src/main/resources/regression/gov2.yaml
@@ -16,7 +16,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
evals:
- command: eval/trec_eval.9.0/trec_eval
1 change: 0 additions & 1 deletion src/main/resources/regression/mb11.yaml
@@ -16,7 +16,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
- -uniqueDocid
- -tweet.keepUrls
- -tweet.stemming
1 change: 0 additions & 1 deletion src/main/resources/regression/mb13.yaml
@@ -16,7 +16,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
- -uniqueDocid
- -tweet.keepUrls
- -tweet.stemming
1 change: 0 additions & 1 deletion src/main/resources/regression/robust04.yaml
@@ -14,7 +14,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
evals:
- command: eval/trec_eval.9.0/trec_eval
1 change: 0 additions & 1 deletion src/main/resources/regression/robust05.yaml
@@ -14,7 +14,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
evals:
- command: eval/trec_eval.9.0/trec_eval
1 change: 0 additions & 1 deletion src/main/resources/regression/wt10g.yaml
@@ -14,7 +14,6 @@ index_options:
- -storePositions
- -storeDocvectors
- -storeRawDocs
- - -optimize
topic_reader: Trec
evals:
- command: eval/trec_eval.9.0/trec_eval
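Because this commit drops the same `-optimize` flag from many files, a recursive search is one way to confirm that no occurrence was missed. This is a rough sketch, assuming it is run from the repository root. The function name `find_optimize_flag` is hypothetical, and `grep -r` is a widely available extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): print every remaining
# occurrence of the -optimize flag under the given files or directories.
# Exit status follows grep: 0 if at least one match is found, non-zero otherwise.
find_optimize_flag() {
  # -r recurses into directories, -n prints line numbers, and -e marks the
  # next argument as the pattern even though it starts with a dash.
  grep -rn -e '-optimize' "$@"
}
```

For the paths touched here, a call might look like `find_optimize_flag docs src/main/resources/regression`. No output and a non-zero status would suggest the flag is gone from those trees.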

0 comments on commit 6b19d37
