feat: Add scalar storage support for DynamoDB #531

gauthamchandra · 2023-08-31T17:25:07Z

This change allows GPTCache to use DynamoDB as the underlying scalar
storage for the cache. Addresses #200

The underlying implementation uses 2 tables:

- `gptcache_questions` - which holds all questions and session information.
- `gptcache_reports` - which holds the reporting information.

Normally, we would use a single-table design and roll up
`gptcache_reports` into the same table as `gptcache_questions`. However,
this was not done for one key reason: billing.

If a lot of analytics data is being created, table scans
for operations like `count()` and `get_ids()` would also read
these reporting rows before filtering them out, resulting in higher
read costs for end users of GPTCache.
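
To make the billing argument concrete, here is a rough back-of-the-envelope sketch. It relies on the DynamoDB behaviour that a Scan is charged for the data it reads, not for the items left after filtering, at about one read unit per 4 KB for strongly consistent reads and half that for eventually consistent reads. The item counts and sizes are made-up assumptions, and the function ignores per-page rounding, so the numbers are illustrative only.

```python
import math

READ_UNIT_BYTES = 4096  # one read unit covers up to 4 KB of scanned data


def estimated_scan_read_units(total_bytes, strongly_consistent=False):
    """Rough estimate of the read units consumed by scanning `total_bytes`.

    Filters do not reduce the cost: every byte scanned is counted.
    """
    units = math.ceil(total_bytes / READ_UNIT_BYTES)
    return float(units) if strongly_consistent else units / 2


# Hypothetical workload (assumed numbers, not taken from the PR):
QUESTION_BYTES = 10_000 * 2_048   # 10k question rows of about 2 KB each
REPORT_BYTES = 1_000_000 * 200    # 1M report rows of about 200 bytes each


def single_table_scan_units():
    # One shared table: a scan for questions also reads every report row.
    return estimated_scan_read_units(QUESTION_BYTES + REPORT_BYTES)


def separate_table_scan_units():
    # Dedicated questions table: report rows are never touched.
    return estimated_scan_read_units(QUESTION_BYTES)
```

With these assumed sizes, a full scan of the shared table costs roughly ten times as many read units as a scan of a dedicated questions table, even though both return the same question rows.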

Preview of what the tables look like (images not shown):

`gpt_cache` table

Secondary indexes: `gsi_items_by_type`, `gsi_questions_by_deletion_status`

sre-ci-robot · 2023-08-31T17:25:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gauthamchandra
To complete the pull request process, please assign cxie after the PR has been reviewed.
You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot · 2023-08-31T17:25:16Z

Welcome @gauthamchandra! It looks like this is your first PR to zilliztech/GPTCache 🎉

Signed-off-by: Gautham Chandra <gautham.chandra@live.com>

gauthamchandra · 2023-09-02T14:18:32Z

Sorry for the failing tests. It turns out that specifying the `region_name` for AWS overrides the `endpoint_url` field, so the tests ended up trying to connect directly to us-east-2 instead of the local DynamoDB container.

My latest changes should hopefully fix the issue. 🙏🏼

gauthamchandra · 2023-09-05T02:35:57Z

It seems a totally unrelated test is failing. Might be finicky 🤔

SimFG · 2023-09-05T02:59:50Z

@gauthamchandra thanks for your patience. I will fix it in the next PR.

mikeblackk · 2023-10-24T00:19:45Z

Has this been released? Can we use DynamoDB as a storage backend for caching now? I can't find it anywhere in the documentation.

SimFG · 2023-10-24T02:24:02Z

Yes, you can. The usage is like that of the other cache storages, and its params include:

```python
elif name == "dynamo":
    from gptcache.manager.scalar_data.dynamo_storage import DynamoStorage

    return DynamoStorage(
        aws_access_key_id=kwargs.get("aws_access_key_id"),
        aws_secret_access_key=kwargs.get("aws_secret_access_key"),
        aws_region_name=kwargs.get("region_name"),
        aws_profile_name=kwargs.get("aws_profile_name"),
        aws_endpoint_url=kwargs.get("endpoint_url"),
```
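
Note that the factory branch above reads the keys `region_name` and `endpoint_url` (no `aws_` prefix) and forwards them as `aws_region_name` and `aws_endpoint_url`. A minimal sketch of how a caller might assemble those keyword arguments follows. The helper names are hypothetical, and it assumes the branch lives in the `CacheBase` factory exported by `gptcache.manager`; the local endpoint URL is only an example value.

```python
from typing import Any, Dict, Optional


def dynamo_cache_kwargs(
    region_name: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    aws_profile_name: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect kwargs under the key names the factory reads.

    Unset values are dropped, which matches `kwargs.get(...)` returning None.
    """
    candidates = {
        "region_name": region_name,
        "endpoint_url": endpoint_url,
        "aws_profile_name": aws_profile_name,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def make_dynamo_cache_base(**kwargs: Any):
    """Hypothetical wrapper around the factory call."""
    # Imported lazily so the helper above works without gptcache installed.
    # Assumption: `CacheBase` is the factory containing the "dynamo" branch.
    from gptcache.manager import CacheBase

    return CacheBase("dynamo", **kwargs)


# Example: point at a local DynamoDB container (assumed URL).
local_kwargs = dynamo_cache_kwargs(endpoint_url="http://localhost:8000")
```

Calling `make_dynamo_cache_base(**local_kwargs)` would then build the storage, provided gptcache and boto3 are installed and the endpoint is reachable.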

gauthamchandra · 2023-10-31T15:31:49Z

@mikeblackk, just want to clarify that it uses boto3 under the hood, so not all params (like `aws_endpoint_url`, etc.) are required.

As a result, it relies on the standard AWS resolution logic (so if you have environment variables like `AWS_ACCESS_KEY_ID`, they will be picked up automatically without you specifying them explicitly). Parameters like `aws_endpoint_url` act as overrides so that you have an easier time developing on your local machine, where you might be using LocalStack or have multiple AWS profiles.

Let me know if you run into any issues. I am not an expert on Dynamo by any means so there might be issues which need to be fixed 😄
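
The override behaviour described in this comment can be pictured with a deliberately simplified model: an explicitly passed value wins, otherwise the environment variable is used. This is only an illustration of the precedence, not boto3's code; the real resolution chain consults more sources (shared credentials files, profiles, and so on), and `resolve_setting` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Simplified precedence model: explicit argument first, then environment."""
    if explicit is not None:
        return explicit
    return environ.get(env_var)


# A fake environment keeps the example independent of the real machine.
fake_env = {"AWS_ACCESS_KEY_ID": "key-from-env"}

from_env = resolve_setting(None, "AWS_ACCESS_KEY_ID", fake_env)
overridden = resolve_setting("key-from-argument", "AWS_ACCESS_KEY_ID", fake_env)
missing = resolve_setting(None, "AWS_SECRET_ACCESS_KEY", fake_env)
```

Here `from_env` falls back to the environment, `overridden` keeps the explicit argument, and `missing` is `None` because neither source provides a value.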

sre-ci-robot requested review from SimFG and xiaofan-luan August 31, 2023 17:25

sre-ci-robot added the size/XL label Aug 31, 2023

mergify bot added the needs-dco label Aug 31, 2023

chore: fix small typo in the directory

651a411

Signed-off-by: Gautham Chandra <gautham.chandra@live.com>

gauthamchandra force-pushed the dev branch from 18b9a94 to 86f46e7 Compare August 31, 2023 17:26

mergify bot added dco-passed and removed needs-dco labels Aug 31, 2023

gauthamchandra force-pushed the dev branch 3 times, most recently from 1587ffb to 0ccdb2e Compare September 2, 2023 14:16

gauthamchandra force-pushed the dev branch from 0ccdb2e to 35d8704 Compare September 2, 2023 15:17

SimFG merged commit bca8de9 into zilliztech:dev Sep 5, 2023

gauthamchandra deleted the dev branch October 31, 2023 15:31

gauthamchandra mentioned this pull request Apr 17, 2024

[Feature]: Support DynamoDB #200

Closed
