[collection] consider SQLite for keeping analysis results #20
Open
Description
PROs:
- it's easier and faster to check for and skip already existing results than to walk the filesystem
- no additional deps (sqlite3 is built into Python)
- still a self-contained, file-based storage that can be read from multiple languages
- we can implement some of the logic in SQL (object filtering and counting, conditional indexing, etc.)
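A minimal sketch of the check-and-skip idea, using only the standard library. The `results` table, its columns, and the `has_result` / `save_result` helpers are hypothetical names chosen for illustration, not an existing schema of this project:

```python
import sqlite3

# Hypothetical schema: one row per (item, analysis) pair.
conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS results (
        item_id  TEXT NOT NULL,
        analysis TEXT NOT NULL,
        payload  TEXT,
        PRIMARY KEY (item_id, analysis)
    )
    """
)


def has_result(conn, item_id, analysis):
    """Return True if a result row already exists (primary-key lookup)."""
    row = conn.execute(
        "SELECT 1 FROM results WHERE item_id = ? AND analysis = ?",
        (item_id, analysis),
    ).fetchone()
    return row is not None


def save_result(conn, item_id, analysis, payload):
    """Store a result; INSERT OR IGNORE leaves an existing row untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO results (item_id, analysis, payload) "
            "VALUES (?, ?, ?)",
            (item_id, analysis, payload),
        )


items = ["a.wav", "b.wav", "c.wav"]
save_result(conn, "a.wav", "tempo", "120")  # pretend this one was done earlier

# Skip whatever is already stored instead of walking the filesystem.
todo = [i for i in items if not has_result(conn, i, "tempo")]
for item in todo:
    save_result(conn, item, "tempo", "pending")

# Counting and filtering can be pushed down to SQL.
done_count = conn.execute(
    "SELECT COUNT(*) FROM results WHERE analysis = ?", ("tempo",)
).fetchone()[0]
```

The composite primary key doubles as the index for the existence check, so no separate index is needed for this lookup.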
CONs:
- SQL-like management: we'll have to cope with schema migrations
- will probably use more disk space
- embeddings have to be stored as BLOBs. Saving and loading are fairly fast though, measured on an SSD:
Bulk insertion of 100k 1024-d vectors took 5.6198 seconds.
Reading out 100k 1024-d vectors took 2.3810 seconds.
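For reference, a sketch of how an embedding round trip through a BLOB column could look. The issue does not say how the vectors in the timings above were serialized, so this is an assumption: it uses the stdlib `array` module with 32-bit floats, and the `embeddings` table and helper names are hypothetical. A NumPy-based version would follow the same pattern with `tobytes()` and `numpy.frombuffer`.

```python
import sqlite3
from array import array

DIM = 1024  # same dimensionality as in the timings above

conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
conn.execute(
    "CREATE TABLE embeddings ("
    "item_id TEXT PRIMARY KEY, dim INTEGER NOT NULL, vec BLOB NOT NULL)"
)


def to_blob(values):
    """Pack floats as C floats in native byte order (an assumption)."""
    return array("f", values).tobytes()


def from_blob(blob):
    """Unpack a BLOB written by to_blob back into an array of floats."""
    out = array("f")
    out.frombytes(blob)
    return out


# Toy data: 100 constant vectors instead of 100k real embeddings.
vectors = {f"item-{i}": [float(i)] * DIM for i in range(100)}

# One transaction plus executemany keeps bulk insertion cheap.
with conn:
    conn.executemany(
        "INSERT INTO embeddings (item_id, dim, vec) VALUES (?, ?, ?)",
        ((key, DIM, to_blob(vec)) for key, vec in vectors.items()),
    )

blob = conn.execute(
    "SELECT vec FROM embeddings WHERE item_id = ?", ("item-7",)
).fetchone()[0]
restored = from_blob(blob)
```

Because the bytes are written in native byte order, a file produced this way is only portable between machines with the same endianness; storing the dtype and byte order alongside `dim` would be one way to make that explicit.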