docs: add retriever guide, address minor onboarding feedbacks & enhancement #1326

Merged: 15 commits, Jun 8, 2024

Contributor

@AyushExel AyushExel commented May 27, 2024

  • Address some of the onboarding feedback listed in OSS Examples/ benchmarks/ Feature docs epic #1224
  • Improve the visibility of the pydantic integration and the embedding API. (Based on onboarding feedback: there are many ways to ingest data and define a schema, but it is unclear which to use for a specific use case.)
  • Add a guide that walks users through testing and improving retriever performance using built-in utilities such as hybrid search and reranking
  • Add some benchmarks for the above
  • Add the missing Cohere docs

@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 27, 2024

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@AyushExel AyushExel changed the title docs: Add retriever guide, address minor onboarding feedbacks & enhancement (May audit) docs: Add retriever guide, address minor onboarding feedbacks & enhancement May 27, 2024
@AyushExel AyushExel changed the title docs: Add retriever guide, address minor onboarding feedbacks & enhancement docs: add retriever guide, address minor onboarding feedbacks & enhancement May 27, 2024
@AyushExel AyushExel changed the title docs: add retriever guide, address minor onboarding feedbacks & enhancement docs: add retriever guide, address minor onboarding feedbacks & enhancement May 29, 2024
@github-actions github-actions bot added the Python Python SDK label May 29, 2024
AyushExel added a commit that referenced this pull request May 30, 2024
@AyushExel AyushExel requested review from westonpace, changhiskhan and wjones127 and removed request for westonpace and changhiskhan June 5, 2024 12:34
@AyushExel
Contributor Author

The lancedb JNI test failure seems unrelated to this PR.

@AyushExel AyushExel assigned tanaymeh and unassigned tanaymeh Jun 5, 2024
@AyushExel AyushExel requested a review from tanaymeh June 5, 2024 15:00
Contributor

@westonpace westonpace left a comment

Very cool, I think this is a great description

docs/src/basic.md (outdated, resolved)
@@ -180,6 +180,9 @@ table.

!!! info "Under the hood, LanceDB reads in the Apache Arrow data and persists it to disk using the [Lance format](https://www.github.com/lancedb/lance)."

!!! info "Automatic vectorization with Embedding API"
When working with embedding models, it is recommended to use LanceDB embedding API to automatically vectorize the data and queries in the background. See the [quickstart example](#using-embedding-api) or the embedding API [guide](./embeddings/)
Contributor

Hmm, I don't know if vectorize is the right word to use here.

I'm used to the definition "use an algorithm that takes advantage of vectorized CPU capabilities (e.g. AVX2)" but I think you mean "convert into a vector". But maybe this is a second definition of "vectorize" that is common in ML descriptions?

Contributor Author

@AyushExel AyushExel Jun 8, 2024

Yeah, it's pretty common. I guess the usage difference is "vectorizing an operation or algorithm" vs. "vectorizing data". Example: https://neptune.ai/blog/vectorization-techniques-in-nlp-guide. But I can try to say something like "create vector embeddings of the data" to avoid confusing either audience.

Contributor Author

Okay, changed it to "create vector representation of the data".
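
For anyone reading this thread later, here is a toy sketch of the "convert data into a vector" sense of "vectorize" discussed above. It is only an illustration, not the LanceDB embedding API: `bag_of_words_vector` and the sample vocabulary are hypothetical names made up for this example, and a real embedding model produces dense learned vectors rather than word counts.

```python
from collections import Counter


def bag_of_words_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Hypothetical helper: map text to a count vector over a fixed vocabulary.

    This is "vectorizing data" in the ML sense (text -> vector). It has nothing
    to do with SIMD-style vectorization of an algorithm (e.g. AVX2).
    """
    # Lowercase and split on whitespace; a deliberately naive tokenizer.
    counts = Counter(text.lower().split())
    # One dimension per vocabulary word; words missing from the text count as 0.
    return [counts[word] for word in vocabulary]


# Assumed sample vocabulary and sentence, chosen only for illustration.
vocabulary = ["vector", "search", "table"]
vec = bag_of_words_vector("Vector search over a table of vector data", vocabulary)
print(vec)  # [2, 1, 1]
```

The output has one entry per vocabulary word, which is the "vector representation of the data" the reworded docs refer to, in its simplest possible form.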

docs/src/basic.md (outdated, resolved)
docs/src/basic.md (outdated, resolved)
docs/src/basic.md (outdated, resolved)
docs/src/basic.md (outdated, resolved)
docs/src/guides/tuning_retrievers/1_query_types.md (outdated, resolved)
docs/src/guides/tuning_retrievers/1_query_types.md (outdated, resolved)
docs/src/guides/tuning_retrievers/1_query_types.md (outdated, resolved)
docs/src/guides/tuning_retrievers/2_reranking.md (outdated, resolved)
AyushExel and others added 2 commits June 8, 2024 05:59
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
AyushExel and others added 7 commits June 8, 2024 06:02
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@AyushExel AyushExel merged commit 76fc16c into main Jun 8, 2024
14 of 15 checks passed
@AyushExel AyushExel deleted the docs_may branch June 8, 2024 00:55
Labels: documentation (Improvements or additions to documentation), Python (Python SDK)
3 participants