
Cache: Query result cache #4521

Closed
@BohuTANG

Description

Summary

Background

For a read, the main flow is:

  1. Get the source plan: the names of the partition files to read.
  2. Read those files by name from object storage (such as AWS S3).
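As a rough illustration of these two steps, the sketch below models the source plan as a list of partition files and uses plain dicts in place of the table catalog and of S3. `Partition`, `get_source_plan`, and `read_partitions` are hypothetical names for illustration only, not Databend APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Partition:
    """One source-plan entry (hypothetical): a partition file and where to start reading it."""
    file_name: str
    offset: int


def get_source_plan(catalog: Dict[str, List[str]], table: str) -> List[Partition]:
    # Step 1: resolve the table to the partition files it is stored in.
    # The catalog dict is a stand-in for real table metadata.
    return [Partition(file_name=name, offset=0) for name in catalog[table]]


def read_partitions(object_store: Dict[str, bytes], plan: List[Partition]) -> List[bytes]:
    # Step 2: fetch each partition file by name; a dict stands in for S3 here.
    return [object_store[p.file_name][p.offset:] for p in plan]
```

The list of `Partition` entries produced by step 1 is exactly the input that a `source_plan_id` fingerprint would be computed from.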

With a query result cache, the flow becomes:

  • Step 1. Parse the query and calculate its fingerprint: query_id.
  • Step 2. Get the source plan (read_plan) and calculate its fingerprint: source_plan_id.
  • Step 3. Check the cache:
    • 3.1 If the cache entry /query_id/source_plan_id/result exists, read and return the result.
    • 3.2 If it does not exist, execute the query and put the result into the cache.
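A minimal get-or-compute sketch of Step 3, assuming an in-memory dict as the cache backend and SHA-256 (truncated) as the fingerprint hash; neither choice comes from the issue. The key string mirrors the `/query_id/source_plan_id/result` layout above, and `QueryResultCache` and `fingerprint` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


def fingerprint(text: str) -> str:
    # Assumption: SHA-256 of a canonical string, truncated to 16 hex chars.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


class QueryResultCache:
    """In-memory stand-in for the result cache (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Rows] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query_id: str, source_plan_id: str,
                       execute: Callable[[], Rows]) -> Rows:
        key = f"/{query_id}/{source_plan_id}/result"
        if key in self._store:
            # 3.1 Cache hit: return the stored result without running the query.
            self.hits += 1
            return self._store[key]
        # 3.2 Cache miss: run the query, then store its result under the key.
        self.misses += 1
        result = execute()
        self._store[key] = result
        return result
```

Because the key includes `source_plan_id`, a write that produces new partition files would presumably yield a different key, so stale entries simply stop matching rather than needing explicit invalidation.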

Where the cache is stored

The cache is stored in S3 under the path /<bucket>/<tenant>/result/cache/, so users can download it.
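One way to combine the root path with the per-entry layout from Step 3.1 is sketched below. Nesting `/query_id/source_plan_id/result` under the tenant's `result/cache` prefix is an assumption, and `cache_object_key` is a hypothetical helper.

```python
import posixpath


def cache_object_key(bucket: str, tenant: str, query_id: str, source_plan_id: str) -> str:
    # Reject empty or slash-containing components so an entry cannot
    # escape its tenant's prefix (a defensive choice, not from the issue).
    for part in (bucket, tenant, query_id, source_plan_id):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    # Root prefix from the issue: /<bucket>/<tenant>/result/cache/
    root = posixpath.join("/", bucket, tenant, "result", "cache")
    # Assumed entry layout under the root: <query_id>/<source_plan_id>/result
    return posixpath.join(root, query_id, source_plan_id, "result")
```

Keeping every tenant's entries under its own prefix means a user can list or download their cached results with ordinary S3 prefix operations.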

How to calculate the fingerprint

  • Should query_id be based on the AST? For example, select * from t1 where a>1 should have the same fingerprint as select * from t1 where a>1 and 1=1.
  • source_plan_id is based on the partition file names and file offsets.
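The sketch below illustrates both fingerprints under stated assumptions. For `query_id`, it uses a hypothetical tuple-based predicate AST (a real implementation would work on the SQL parser's AST) and folds constant comparisons such as `1=1` before hashing, so the two example queries collide. For `source_plan_id`, it hashes (file name, offset) pairs in sorted order, assuming partition order does not affect the result. All function names are illustrative.

```python
import hashlib
from typing import Iterable, Tuple

# Hypothetical toy AST: ("col", name), ("lit", value), (op, left, right).
Expr = tuple
TRUE = ("true",)  # sentinel for a predicate folded to constant true


def simplify(e: Expr) -> Expr:
    """Fold trivially true comparisons like 1=1 and drop them from AND chains."""
    if e[0] in ("col", "lit", "true"):
        return e
    op, left, right = e[0], simplify(e[1]), simplify(e[2])
    if (op == "eq" and left[0] == "lit" and right[0] == "lit"
            and type(left[1]) is type(right[1]) and left[1] == right[1]):
        return TRUE
    if op == "and":
        if left == TRUE:
            return right
        if right == TRUE:
            return left
    return (op, left, right)


def query_fingerprint(select_list: Tuple[str, ...], table: str, where: Expr) -> str:
    # Hash a canonical form of the simplified query, not the raw SQL text.
    canonical = repr((select_list, table, simplify(where)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def source_plan_fingerprint(partitions: Iterable[Tuple[str, int]]) -> str:
    # Hash every (file name, offset) pair; sorting makes the id order-independent.
    h = hashlib.sha256()
    for name, offset in sorted(partitions):
        h.update(f"{name}\0{offset}\n".encode("utf-8"))
    return h.hexdigest()
```

Normalizing only a few safe rewrites keeps the risk low: two queries that differ in a way the simplifier does not understand just get different ids and miss the cache, which costs performance but never returns a wrong result.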
