docs: add retriever guide, address minor onboarding feedbacks & enhancement #1326
AyushExel commented on May 27, 2024
- Address some of the onboarding feedback listed in the OSS Examples / Benchmarks / Feature docs epic (#1224)
- Improve the visibility of the Pydantic integration and the embedding API (based on onboarding feedback: there are many ways to ingest data and define a schema, but it is unclear which one to use for a specific use case)
- Add a guide that walks users through testing and improving retriever performance using built-in utilities such as hybrid search and reranking
- Add some benchmarks for the above
- Add the missing Cohere docs
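To make "testing and improving retriever performance" concrete, here is a minimal, library-free sketch, assuming a toy setup in which each retriever returns a ranked list of document IDs per query. It fuses vector and full-text results with reciprocal rank fusion (RRF), one common way to combine hybrid-search results that is not necessarily what LanceDB's built-in hybrid search or rerankers use, and scores each variant by hit rate (the fraction of queries whose expected document appears in the top-k results). The function names, the data, and the constant `k=60` are illustrative assumptions, not LanceDB APIs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hit_rate(results_by_query, expected_by_query, top_k=1):
    """Fraction of queries whose expected document appears in the top_k results."""
    hits = sum(
        1
        for query, expected in expected_by_query.items()
        if expected in results_by_query[query][:top_k]
    )
    return hits / len(expected_by_query)


# Toy evaluation set: hypothetical queries, document IDs and rankings.
expected = {"q1": "d1", "q2": "d4"}
vector_results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
fts_results = {"q1": ["d1", "d5", "d3"], "q2": ["d8", "d6", "d4"]}
hybrid_results = {
    q: reciprocal_rank_fusion([vector_results[q], fts_results[q]]) for q in expected
}

for name, results in [("vector", vector_results), ("fts", fts_results), ("hybrid", hybrid_results)]:
    print(name, hit_rate(results, expected))
```

In this toy data neither retriever alone ranks the expected document first for both queries, while the fused list does; real gains depend on the dataset and have to be measured.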
ACTION NEEDED: Lance follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
The lancedb JNI test failure seems unrelated to this PR.
Very cool, I think this is a great description
docs/src/basic.md
@@ -180,6 +180,9 @@ table.

!!! info "Under the hood, LanceDB reads in the Apache Arrow data and persists it to disk using the [Lance format](https://www.github.com/lancedb/lance)."

!!! info "Automatic vectorization with Embedding API"
    When working with embedding models, it is recommended to use LanceDB embedding API to automatically vectorize the data and queries in the background. See the [quickstart example](#using-embedding-api) or the embedding API [guide](./embeddings/)
Hmm, I don't know if "vectorize" is the right word to use here.
I'm used to the definition "use an algorithm that takes advantage of vectorized CPU capabilities (e.g. AVX2)" but I think you mean "convert into a vector". But maybe this is a second definition of "vectorize" that is common in ML descriptions?
Yeah, it's pretty common. I guess the usage difference is "vectorizing an operation or algorithm" vs. "vectorizing data" (example: https://neptune.ai/blog/vectorization-techniques-in-nlp-guide). But I can try to say something like "create vector embeddings of the data" to avoid confusing both audiences.
Okay, changed it to "create vector representation of the data".
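The idea behind the embedding API discussed in this thread is that a single embedding function is attached to the table and applied both when rows are added and when a query is run, so callers only handle raw text and query vectors land in the same space as the stored vectors. The following is a minimal, library-free sketch of that design, assuming a toy bag-of-words "model" over a fixed vocabulary; `VOCAB`, `toy_embed`, and `AutoEmbeddingTable` are hypothetical names for illustration and are not LanceDB's embedding API (see the embedding API guide linked in the diff for real usage).

```python
import math
import re

# Hypothetical fixed vocabulary standing in for a real embedding model.
VOCAB = ["arrow", "lance", "format", "disk", "vector", "embedding", "search", "query"]


def toy_embed(text):
    """Hypothetical embedding: word counts over VOCAB, L2-normalized."""
    tokens = re.findall(r"\w+", text.lower())
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class AutoEmbeddingTable:
    """Hypothetical table that owns one embedding function and applies it on add and search."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = []

    def add(self, texts):
        # Callers pass raw text; vectors are computed here, "in the background".
        for text in texts:
            self.rows.append({"text": text, "vector": self.embed_fn(text)})

    def search(self, query, limit=1):
        # The query goes through the same function, so the dot product of
        # normalized vectors is a cosine similarity between comparable vectors.
        q = self.embed_fn(query)
        scored = sorted(
            self.rows,
            key=lambda row: -sum(a * b for a, b in zip(q, row["vector"])),
        )
        return [row["text"] for row in scored[:limit]]


# Example usage: no explicit vectors appear in caller code.
table = AutoEmbeddingTable(toy_embed)
table.add(["Lance is a columnar format on disk", "An embedding turns text into a vector"])
print(table.search("vector embedding search"))
```

Keeping the embedding function on the table, rather than in caller code, is what prevents data and queries from being embedded with different models.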
Co-authored-by: Weston Pace <weston.pace@gmail.com>