We are using Lance to store a text corpus. Before training, some lightweight normalization has to be applied to the text, for example removing sensitive words. Currently we have to store both the normalized and the unnormalized text, which doubles the storage.
Maybe Lance could implement generated columns, similar to https://www.sqlite.org/gencol.html.
The benefits are:
- save storage size
- better IO performance
- no need to update normalized text any more.
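
To illustrate the behavior I have in mind, here is a minimal sketch using the SQLite feature linked above through Python's built-in `sqlite3` module (it needs SQLite 3.31 or newer). This is not a Lance API. The table name `corpus`, the column names, and the `replace(...)` rule are made up for the example; the real normalization would be whatever the training pipeline needs.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE corpus (
        id INTEGER PRIMARY KEY,
        raw TEXT NOT NULL,
        -- VIRTUAL: the value is computed on read and never stored.
        normalized TEXT GENERATED ALWAYS AS (replace(raw, 'secret', '***')) VIRTUAL
    )
    """
)

# Only the raw text is written.
conn.execute("INSERT INTO corpus (raw) VALUES (?)", ("the secret plan",))
rows = conn.execute("SELECT raw, normalized FROM corpus").fetchall()
print(rows)

# Updating the raw text changes the derived column without a second write.
conn.execute("UPDATE corpus SET raw = ? WHERE id = 1", ("no secret here",))
updated = conn.execute("SELECT normalized FROM corpus WHERE id = 1").fetchone()[0]
print(updated)
```

Only `raw` is stored here, and `normalized` is derived from it at query time, which is where the storage saving and the "no need to update" benefit would come from.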