Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save Moustafa-Elgammal/dbdd2d1c2fde553bf65cd83a8aded238 to your computer and use it in GitHub Desktop.
Save Moustafa-Elgammal/dbdd2d1c2fde553bf65cd83a8aded238 to your computer and use it in GitHub Desktop.
Find Logs By Date Range Optimized Query by Binary Search in SQL.

Description

How to perform a binary search in SQL to efficiently find records in a logs table where the created_at date has no type of any indices. This method narrows down the search range by iteratively adjusting the id range, and then performs the final query on the optimized range, while id is sequential auto increment column.

Example Case:

Find the logs which have been created after 2024-04-21.

-- Step 1: Determine the range of `id`s
SELECT MIN(id) INTO min_id FROM logs;
SELECT MAX(id) INTO max_id FROM logs;

-- Initialize variables
SET search_date = '2024-04-21';
SET found = FALSE;

-- Step 2: Perform the binary search
WHILE NOT found LOOP
    SET mid_id = (min_id + max_id) / 2;
    
    -- Check the `created_at` value at `mid_id`
    SELECT created_at INTO mid_date FROM logs WHERE id = mid_id;
    
    IF mid_date >= search_date THEN
        -- If the `created_at` at mid_id is greater than or equal to the search date,
        -- move the upper bound to the midpoint
        SET max_id = mid_id;
    ELSE
        -- Otherwise, move the lower bound to the midpoint
        SET min_id = mid_id;
    END IF;
    
    -- If the range is sufficiently narrow, exit the loop
    IF max_id - min_id <= 1 THEN
        SET found = TRUE;
    END IF;
END LOOP;

-- Step 3: Perform the final query using the found `id` range
SELECT *
FROM logs
WHERE id >= min_id
  AND created_at > '2024-04-21';
  

Explanation

Range Detection:

  • SELECT MIN(id) INTO min_id FROM logs; and SELECT MAX(id) INTO max_id FROM logs; determine the minimum and maximum id values in the logs table.

Binary Search Loop:

  • A loop is used to perform binary search to narrow down the id range.
  • mid_id is calculated as the midpoint of the current range.
  • The created_at value at mid_id is checked.
  • If the created_at at mid_id is greater than or equal to the target date (search_date), the upper bound (max_id) is moved to mid_id.
  • Otherwise, the lower bound (min_id) is moved to mid_id.
  • The loop continues until the range is sufficiently narrow (max_id - min_id <= 1).

Final Query Execution:

  • Once the binary search loop narrows down the id range, a final query is performed to select the desired records with created_at greater than '2024-04-21'.

This method is useful for optimizing queries on large datasets where a full table scan is impractical, especially when there is no index on the created_at column.

Generated @ poe.com powered by GPT-4o ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment