Skip to content

[collection] consider SQLite for keeping analysis results #20

Open
@fabiocarrara

Description

PROs:

  • its easier/faster to check&skip already existing results w.r.t. walking the filesystem
  • no additional deps (built-in in python)
  • still a multi-language self-contained file-based storage
  • we can implement several logic on SQL (object filtering and counting, conditional indexing, etc.)

CONs:

  • SQL-like management, we'll have to cope with migrations
  • probably will use more disk space
  • embeddings as BLOBs. Save and load are pretty fast though, on SSD:

    Bulk insertion of 100k 1024-d vectors took 5.6198 seconds.
    Reading out 100k 1024-d vectors took 2.3810 seconds.

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions