
🚀 feat: Add Atlas MongoDB as an option for Vector Store #21

Merged: 16 commits merged on May 11, 2024

@jinzishuai (Contributor) commented Apr 5, 2024

Related to danny-avila/LibreChat#2304

[Screenshot]

So far, only two APIs work:

  • /embed
  • /query

They are enough to make the most important use case work: asking questions about an uploaded file (see the screenshot above).
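A minimal sketch of the kind of document /embed could write and /query could later search, assuming each chunk is stored with an `embedding` field and a `file_id` field (the two paths referenced by the index definition in this PR). The `text` field name and the `build_chunk_documents` helper are hypothetical, not the PR's code:

```python
from typing import Any

# Matches "numDimensions" in the Atlas index definition used in this PR.
EMBEDDING_DIM = 1536


def build_chunk_documents(
    file_id: str, chunks: list[str], embeddings: list[list[float]]
) -> list[dict[str, Any]]:
    """Pair each text chunk with its embedding, tagged with the owning file_id."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    docs = []
    for text, vector in zip(chunks, embeddings):
        # Atlas rejects vectors whose length differs from numDimensions,
        # so fail early with a clearer message.
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
        docs.append({"file_id": file_id, "text": text, "embedding": vector})
    return docs
```

The resulting list could then be passed to a pymongo `insert_many` call on the vector collection.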

@jinzishuai (Contributor Author)

Note on the Atlas Vector Search Index

I manually created an index named "default" with the following JSON definition:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "file_id",
      "type": "filter"
    }
  ]
}
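For context, here is a sketch of how a /query handler might use this index through an Atlas `$vectorSearch` aggregation stage. The `file_id` restriction only works because `file_id` is declared as a `filter` field above. The helper name, the `text` field, and the `numCandidates = 10 * k` rule of thumb are assumptions for illustration, not taken from the PR:

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    file_ids: list[str],
    k: int = 4,
    index_name: str = "default",
) -> list[dict[str, Any]]:
    """Build an Atlas $vectorSearch pipeline limited to chunks of the given files."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,         # the manually created "default" index
                "path": "embedding",         # the field declared with type "vector"
                "queryVector": query_vector,
                "numCandidates": 10 * k,     # oversampling rule of thumb, not tuned
                "limit": k,
                # Pre-filter on the field declared with type "filter".
                "filter": {"file_id": {"$in": file_ids}},
            }
        },
        # Return the chunk fields plus the similarity score for each match.
        {
            "$project": {
                "_id": 0,
                "file_id": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With pymongo, the results would come from something like `list(collection.aggregate(build_vector_search_pipeline(vec, ["<file id>"])))`.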

@danny-avila (Owner)

Does this share a collection with the File collection? If not, maybe it could, so that files and their vectors can be deleted at once.

@jinzishuai (Contributor Author) commented Apr 6, 2024

@danny-avila thanks for the comments.

  • I think sharing the collection with the File collection would work well if we use MongoDB exclusively. However, if the goal is to let users choose among different vector databases while still using MongoDB for the API layer, keeping them separate is probably better. In my test above, I used a separate database with one collection; it would be easy to move this collection into the API's MongoDB database.
  • Regarding your comment in the discussion on how to switch: I agree that an environment variable flag to select the vector database is sufficient, and there is no need to complicate things with live data migration when switching.
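Keeping vectors in a separate collection does not rule out deleting a file and its vectors together, because every chunk carries a `file_id`. A sketch, assuming the caller invokes this hypothetical helper whenever File records are deleted; `delete_many` and `deleted_count` are pymongo's standard `Collection`/`DeleteResult` API:

```python
from typing import Any, Protocol


class DeletableCollection(Protocol):
    """The slice of pymongo's Collection API this sketch relies on."""

    def delete_many(self, filter: dict) -> Any: ...


def delete_file_vectors(collection: DeletableCollection, file_ids: list[str]) -> int:
    """Remove every embedded chunk belonging to the given files; return the count."""
    result = collection.delete_many({"file_id": {"$in": file_ids}})
    return result.deleted_count
```

Typing against a small Protocol keeps the helper testable with a stub and independent of which database holds the File collection.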

@jinzishuai marked this pull request as ready for review on April 24, 2024 03:12

@jinzishuai (Contributor Author)

@danny-avila I think this PR is ready for review. I've got the most basic APIs you listed working. I haven't implemented the async methods as you did for pgvector, but I think that can wait for a separate PR once this one is merged.

I do have one question that is best left to your decision: how do you want a user to switch between pgvector and atlas-mongo?

Right now, my .env looks like this:

# # pgvector
# DB_HOST=xxx
# DB_PORT=5432
# POSTGRES_DB=librechat-pgvector
# POSTGRES_USER=postgres
# POSTGRES_PASSWORD=

# atlas-mongo vector
ATLAS_MONGO_DB_URI=mongodb+srv:/
COLLECTION_NAME=testcollection

Should I introduce a new variable such as VECTOR_DB_TYPE=pgvector/atlas-mongo, defaulting to pgvector? In that case, the variables for the other type would simply be ignored.

Thanks a lot.
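The proposed switch could be read roughly like this: a sketch assuming `VECTOR_DB_TYPE` defaults to `pgvector` and only the selected backend's variables (names taken from the .env above) are validated. `load_vector_db_config` and `VectorDBConfig` are hypothetical names, not the PR's code:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_TYPES = ("pgvector", "atlas-mongo")

# Variables each backend needs, taken from the .env example above.
REQUIRED_VARS = {
    "pgvector": ("DB_HOST", "DB_PORT", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD"),
    "atlas-mongo": ("ATLAS_MONGO_DB_URI", "COLLECTION_NAME"),
}


@dataclass(frozen=True)
class VectorDBConfig:
    db_type: str
    settings: dict[str, str]


def load_vector_db_config(env: Optional[Mapping[str, str]] = None) -> VectorDBConfig:
    """Read VECTOR_DB_TYPE (default "pgvector") and only that backend's variables."""
    env = os.environ if env is None else env
    db_type = env.get("VECTOR_DB_TYPE", "pgvector").strip().lower()
    if db_type not in SUPPORTED_TYPES:
        raise ValueError(f"VECTOR_DB_TYPE must be one of {SUPPORTED_TYPES}, got {db_type!r}")
    # Presence check only: an empty POSTGRES_PASSWORD is still accepted.
    missing = [name for name in REQUIRED_VARS[db_type] if name not in env]
    if missing:
        raise ValueError(f"missing settings for {db_type}: {', '.join(missing)}")
    # Variables belonging to the other backend are deliberately ignored.
    return VectorDBConfig(db_type, {name: env[name] for name in REQUIRED_VARS[db_type]})
```

Failing at startup on a missing variable is usually easier to debug than a connection error on the first request.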

@jinzishuai changed the title from "POC: Use Atlas MongoDB as the Vector Store" to "🚀 feat: Add Atlas MongoDB as an option for Vector Store" on Apr 28, 2024

@danny-avila (Owner)

Thanks, it's on my list to review. Does the "default" vector search index from the note above have to be created manually for Atlas to work here?

@jinzishuai (Contributor Author) commented May 4, 2024

Yes, the vector index does need to be created. There is likely an API or CLI tool that can set it up as well.

@danny-avila (Owner)

Can you do a quick write-up in the README on exactly how to do that? I'm sure I could research it, but it would be nice to have.
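Besides the Atlas UI, the index can be created from Python. A sketch, assuming a pymongo release whose `SearchIndexModel` accepts `type="vectorSearch"` (4.7 or later) and a `collection` pointing at the vector collection; `ensure_vector_index` is a hypothetical helper, and the definition is the same JSON as the "default" index in this PR:

```python
from typing import Any

# Same definition as the manually created "default" index in this PR.
VECTOR_INDEX_DEFINITION: dict[str, Any] = {
    "fields": [
        {"numDimensions": 1536, "path": "embedding", "similarity": "cosine", "type": "vector"},
        {"path": "file_id", "type": "filter"},
    ]
}


def ensure_vector_index(collection: Any, name: str = "default") -> None:
    """Create the Atlas Vector Search index unless one with that name already exists."""
    existing = {idx["name"] for idx in collection.list_search_indexes()}
    if name in existing:
        return
    # Imported lazily so this module still loads where pymongo is not installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(definition=VECTOR_INDEX_DEFINITION, name=name, type="vectorSearch")
    collection.create_search_index(model=model)
```

Atlas builds search indexes asynchronously, so queries may return no results for a short while after this call.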

Review thread on README.md (resolved)
Review thread on config.py (outdated):
# mode="async",
# )

# new atlas-mongo vector:
vector_store = get_vector_store(

@danny-avila (Owner):

I know this is a draft, but some way to initialize one store or the other would be good.
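One way config.py could initialize exactly one store is a small registry keyed by `VECTOR_DB_TYPE`. This is a sketch under assumptions: the factory callables stand in for the real pgvector and Atlas constructors (whose signatures are elided in the diff above), and `select_vector_store` is a hypothetical name:

```python
from typing import Any, Callable, Dict

# A factory takes that backend's settings and returns an initialized vector store.
VectorStoreFactory = Callable[[Dict[str, str]], Any]


def select_vector_store(
    db_type: str, settings: Dict[str, str], factories: Dict[str, VectorStoreFactory]
) -> Any:
    """Initialize exactly one vector store, chosen by VECTOR_DB_TYPE."""
    try:
        factory = factories[db_type]
    except KeyError:
        raise ValueError(f"no vector store registered for {db_type!r}") from None
    return factory(settings)
```

Only the selected factory runs, so the unused backend never attempts a connection, and adding another vector database means registering one more entry.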

Review thread on main.py (outdated):
@@ -57,8 +57,8 @@
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup logic goes here
    await PSQLDatabase.get_pool()  # Initialize the pool
    await ensure_custom_id_index_on_embedding()
    # await PSQLDatabase.get_pool()  # Initialize the pool

@danny-avila (Owner):

I know it's a draft, but it would also be good to initialize differently here; this could probably import functions from a different module.
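A sketch of a lifespan that runs only the selected backend's startup, assuming each backend module exposes an async startup hook (for pgvector, the existing pool and index setup; for Atlas, something like an index check). `make_lifespan` and the hook names are hypothetical; the result would be passed as `FastAPI(lifespan=...)`:

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, Dict

StartupHook = Callable[[], Awaitable[None]]


def make_lifespan(db_type: str, startup_hooks: Dict[str, StartupHook]):
    """Return a FastAPI-style lifespan that runs only the chosen backend's startup."""
    hook = startup_hooks.get(db_type)
    if hook is None:
        raise ValueError(f"no startup hook for {db_type!r}")

    @asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        await hook()  # e.g. create the pgvector pool, or check the Atlas index
        yield
        # Shutdown logic (closing pools or clients) would go here.

    return lifespan


if __name__ == "__main__":

    async def _demo() -> None:
        async def start_atlas() -> None:
            print("atlas-mongo startup")

        lifespan = make_lifespan("atlas-mongo", {"atlas-mongo": start_atlas})
        async with lifespan(None):
            print("serving")

    asyncio.run(_demo())
```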

@jinzishuai (Contributor Author):

I can start working on an async implementation of the MongoDB code, and that might be a good time to do these things.
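One interim option for async MongoDB access is to offload the blocking pymongo calls to worker threads with `asyncio.to_thread` (Python 3.9+). This is a swapped-in technique, not the PR's implementation; a natively async driver is the alternative. `AsyncCollectionAdapter` is a hypothetical name, and it assumes a synchronous pymongo `Collection` is passed in:

```python
import asyncio
from typing import Any


class AsyncCollectionAdapter:
    """Expose async versions of blocking pymongo calls by running them in threads."""

    def __init__(self, collection: Any) -> None:
        self._collection = collection  # assumed: a synchronous pymongo Collection

    async def aggregate(self, pipeline: list[dict]) -> list[dict]:
        # list() drains the cursor inside the worker thread, so no blocking
        # network I/O happens on the event loop.
        return await asyncio.to_thread(lambda: list(self._collection.aggregate(pipeline)))

    async def insert_many(self, documents: list[dict]) -> list[Any]:
        result = await asyncio.to_thread(self._collection.insert_many, documents)
        return list(result.inserted_ids)

    async def delete_many(self, filter: dict) -> int:
        result = await asyncio.to_thread(self._collection.delete_many, filter)
        return result.deleted_count
```

This keeps the FastAPI event loop responsive without changing drivers, at the cost of one thread-pool hop per call.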

@danny-avila (Owner):

Awesome, thanks for doing that!

@danny-avila merged commit ad107dc into danny-avila:main on May 11, 2024