TODO

#TODO: WRITE SPECS BEFORE EACH IMPL!

# INDEXER:
..........

THINK ON NEW DESIGN...

1. Test on TREC file (edit it)
2. Memory: write to disk...
3. Compression?
4. LATER: improve by tokenizing on each iteration over the file
5. Dynamic indexing...

# TOKENIZATION
..........
add positions to edge n-gram / n-gram tokens
0. Redefine the behaviours and improve them
0. fix standard tokenization for cases like ".2" and "a-"
1. Implement n-gram tokenization


# SEARCH / SCORING
........
#TODO: improve boolean search and boolean modeling... lousy!
TF-IDF indexing!
2. Run queries and check results
4. Query processor + optimization

# SPELLING CORRECTION
...........
1.

# CONCURRENCY:
.............
1. Implement MapReduce for indexing

# QUERY PROCESSOR:
.......
1. Implement using some existing library

# MICROSERVICES
.......
1. Find tutorial on this
2. Start...

# THEORY:
.......
1. Elasticsearch theory behind relevance scoring
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html


# WRITE ON:
........
1. Inverted index
2. Retrieval models
    - applying boolean search (via the Stanford book)
    - pipeline: document data >> modified tokens >> indexer builds the inverted index
3. Boolean search
4. Phrase search
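
Draft sketch for the write-up (pipeline above: document data >> tokens >> inverted index >> boolean AND). This is only an illustration, not the project's actual code: the names tokenize, build_inverted_index, intersect and boolean_and are hypothetical, the regex tokenizer is a stand-in for the real tokenization step, and the sample documents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Stand-in tokenizer: lowercase, split on non-alphanumeric runs."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to a sorted postings list of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, *terms):
    """AND query: intersect postings, shortest list first to do less work."""
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Made-up sample collection, for illustration only.
DOCS = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
INDEX = build_inverted_index(DOCS)
```

Usage: boolean_and(INDEX, "home", "july") returns the ids of documents containing both terms. A term missing from the index yields an empty result. OR, NOT and phrase search (which needs positions in the postings) are left out of this sketch.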


#TODO LAST:
- Set up internal CI/CD to run tests/build after each commit
- Benchmark NLTK vs. my implementation
- Code up some of the laws of text

# TESTS
- add test documentation