🚀 feat: Add Atlas MongoDB as an option for Vector Store #21
…nstead of pgvector
…ctor code; working pass the /query langchain portion
…t iterable by jsonable_encoder
Note on Atlas Vector Search Index
I created the "default" index manually with the following JSON:
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "file_id",
      "type": "filter"
    }
  ]
}
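For context on why `file_id` is declared as a `filter` field: an Atlas `$vectorSearch` stage can only pre-filter on fields that the index lists with type `filter`. The following is a minimal sketch of the kind of aggregation pipeline this index supports, assuming the index name "default" and the field names from the JSON above. The helper `build_vector_search_pipeline` and its parameters are hypothetical names for illustration, not code from this PR.

```python
from typing import Any, Dict, List, Optional

# Must match "numDimensions" in the index definition.
EMBEDDING_DIMENSIONS = 1536


def build_vector_search_pipeline(
    query_vector: List[float],
    file_id: Optional[str] = None,
    k: int = 4,
    index_name: str = "default",
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline with one $vectorSearch stage (hypothetical helper)."""
    if len(query_vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(query_vector)}"
        )
    stage: Dict[str, Any] = {
        "index": index_name,
        "path": "embedding",        # the field indexed with type "vector"
        "queryVector": query_vector,
        "numCandidates": k * 10,    # oversampling factor chosen for illustration
        "limit": k,
    }
    if file_id is not None:
        # Only fields indexed with type "filter" may be used here.
        stage["filter"] = {"file_id": {"$eq": file_id}}
    return [{"$vectorSearch": stage}]
```

The resulting list could be passed to a collection's `aggregate` call; the oversampling factor of 10 is an arbitrary choice for the sketch, not a recommendation from the PR.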
Does this share a collection with the File collection? Maybe we can do that, so the files and vectors can be deleted at once.
@danny-avila thanks for the comments.
@danny-avila I think this PR is ready for review. I've got the most basic APIs you listed working. I haven't implemented the async methods as you have done for pgvector, but I think that can wait for a separate PR once this is merged. I do have one question that is best left to your decision: how do you want a user to switch between pgvector and atlas-mongo? Right now, my environment settings look like this:
# # pgvector
# DB_HOST=xxx
# DB_PORT=5432
# POSTGRES_DB=librechat-pgvector
# POSTGRES_USER=postgres.
# POSTGRES_PASSWORD=
# atlas-mongo vector
ATLAS_MONGO_DB_URI=mongodb+srv:/
COLLECTION_NAME=testcollection
Do you want me to introduce a new variable for this? Thanks a lot.
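One way to make the switch explicit is a single selector variable read at startup. This is a minimal sketch, assuming a hypothetical `VECTOR_DB_TYPE` variable that is not defined anywhere in this PR; the required-variable lists are taken from the settings shown above, and `select_vector_backend` is an illustrative name.

```python
import os
from typing import Mapping, Optional

# Variables each backend needs, taken from the settings shown above.
REQUIRED_VARS = {
    "pgvector": ("DB_HOST", "DB_PORT", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD"),
    "atlas-mongo": ("ATLAS_MONGO_DB_URI", "COLLECTION_NAME"),
}


def select_vector_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured backend name, defaulting to pgvector."""
    env = os.environ if env is None else env
    # VECTOR_DB_TYPE is a hypothetical variable name used for illustration.
    backend = env.get("VECTOR_DB_TYPE", "pgvector").strip().lower()
    if backend not in REQUIRED_VARS:
        raise ValueError(f"unsupported vector backend: {backend!r}")
    # Only check presence, since a value such as POSTGRES_PASSWORD may be empty.
    missing = [name for name in REQUIRED_VARS[backend] if name not in env]
    if missing:
        raise ValueError(f"{backend} is missing settings: {', '.join(missing)}")
    return backend
```

Defaulting to pgvector keeps existing deployments working without a configuration change, which may or may not be the behavior the maintainers want.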
Thanks, it's on my list to review. Does this have to be done manually for Atlas to work here?
Yes. There is likely an API or CLI tool to set it up as well, but the vector index does need to be created.
Can you do a quick write-up in the README on how to do that exactly? I'm sure I could research this, but it would be nice.
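As a starting point for such a write-up, the index can also be created from Python rather than through the Atlas UI. The following is a rough sketch, assuming PyMongo 4.7 or newer (where `SearchIndexModel` accepts a `type` argument) and an Atlas cluster that supports vector search; `vector_index_definition` and `create_vector_index` are hypothetical helper names, and the definition mirrors the JSON posted earlier in this thread.

```python
from typing import Any, Dict


def vector_index_definition(num_dimensions: int = 1536) -> Dict[str, Any]:
    """Return the same index definition as the JSON posted earlier in this thread."""
    return {
        "fields": [
            {
                "numDimensions": num_dimensions,
                "path": "embedding",
                "similarity": "cosine",
                "type": "vector",
            },
            {"path": "file_id", "type": "filter"},
        ]
    }


def create_vector_index(collection: Any, name: str = "default") -> str:
    """Create the vector search index on a PyMongo collection (assumes PyMongo >= 4.7)."""
    # Imported lazily so the definition helper works without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(),
        name=name,
        type="vectorSearch",
    )
    # Returns the name of the index that was requested.
    return collection.create_search_index(model)
```

Index creation on Atlas is asynchronous, so the index may not be queryable immediately after this call returns.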
config.py
Outdated
# mode="async",
# )

# new atlas-mongo vector:
vector_store = get_vector_store(
I know this is a draft, but some way to initialize one or the other would be good.
main.py
Outdated
@@ -57,8 +57,8 @@
 @asynccontextmanager
 async def lifespan(app: FastAPI):
     # Startup logic goes here
-    await PSQLDatabase.get_pool()  # Initialize the pool
-    await ensure_custom_id_index_on_embedding()
+    # await PSQLDatabase.get_pool()  # Initialize the pool
I know it's a draft, but it would be good to initialize differently here as well; the functions could probably be imported from a different module.
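One possible shape for backend-specific startup is a lifespan factory that runs only the hooks registered for the selected backend. This is a sketch under assumptions: `make_lifespan`, the `hooks` mapping, and the backend names are hypothetical, and the app parameter is typed loosely so the example runs without FastAPI installed.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List

StartupHook = Callable[[], Awaitable[None]]


def make_lifespan(backend: str, hooks: Dict[str, List[StartupHook]]):
    """Build a lifespan context manager that runs the startup hooks for one backend."""

    @asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        # Startup: run only the hooks registered for the selected backend.
        for hook in hooks.get(backend, []):
            await hook()
        yield
        # Shutdown logic for the selected backend would go here.

    return lifespan
```

With this layout, the pgvector entry could list `PSQLDatabase.get_pool` and `ensure_custom_id_index_on_embedding` from the diff above, while the atlas-mongo entry stays empty until it needs startup work.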
I can start working on an async implementation of the MongoDB code, and that might be a good time to do these things.
awesome thanks for doing that!
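Until a native async implementation exists, one interim option is to run the blocking calls in a worker thread so they do not stall the event loop. This is a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread` and a synchronous store that exposes `similarity_search(query, k=...)` as LangChain vector stores do; `AsyncVectorStoreAdapter` is a hypothetical name, not part of this PR.

```python
import asyncio
from typing import Any, List


class AsyncVectorStoreAdapter:
    """Expose a blocking vector store through awaitable methods (hypothetical)."""

    def __init__(self, sync_store: Any) -> None:
        self._store = sync_store

    async def asimilarity_search(self, query: str, k: int = 4) -> List[Any]:
        # Run the blocking call in a worker thread and await its result.
        return await asyncio.to_thread(self._store.similarity_search, query, k=k)
```

This keeps request handlers responsive, but it does not make the underlying driver calls asynchronous; an async driver would still be the longer-term approach.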
related to danny-avila/LibreChat#2304
So far, only two APIs work:
/embed
/query
They are sufficient for the most important use case, asking questions based on a file, to work (see screenshot above).