Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose dateutil.rrule functionality #1

Closed
jcmkk3 opened this issue Sep 29, 2020 · 15 comments
Closed

Expose dateutil.rrule functionality #1

jcmkk3 opened this issue Sep 29, 2020 · 15 comments
Labels
enhancement New feature or request

Comments

@jcmkk3
Copy link

jcmkk3 commented Sep 29, 2020

This is a really great plugin and definitely simplifies preparing dates for SQlite!

I'm not sure if this would fit into this plugin (I realize that it is pretty aligned with the python library), but something that could potentially be really helpful would be an easy way to generate a date table. This would make it really easy to aggregated by different date parts to see a zoomed out picture, analyze seasonality, account for days where zero events happened, or calculate days between timestamps while excluding weekends or holidays.

Power BI, for instance, provides a function to generate all days that have been observed in the tables of the data model.

Emile posted details about how her team generated a date table when she was at GitLab. You may not be familiar with dbt so the syntax might look a bit strange to you (though also familiar since it's jinja2). I found where the table is currently documented to see what the code actually looks like (check "Compiled" under "Code"). It takes a while for that page to load for some reason.

That technique for generating a sequence is used because it works on pretty much all databases, but many databases have a function to do it. It looks like there's one for SQlite

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

That's really interesting. I've been thinking I'd like to expose the dateutil module`s rrule (recurrence rule) stuff as functions, so maybe that could serve a similar purpose here?

https://dateutil.readthedocs.io/en/stable/rrule.html

SQLite does have a mechanism for creating virtual tables but it's not exposed in the Python sqlite3 module. What IS available is SQLite's JSON module, which can turn a JSON column into a table that can be joined against other results.

So I wonder... if I had a function that returned a JSON list of every date in a given year, would that address these kind of use-cases?

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

Let's try that out, with a JSON array representing just a single month:

In [2]: days = ['2020-10-{:02d}'.format(i) for i in range(1, 32)]

In [3]: import json

In [4]: print(json.dumps(days))
["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-07", "2020-10-08", "2020-10-09", "2020-10-10", "2020-10-11", "2020-10-12", "2020-10-13", "2020-10-14", "2020-10-15", "2020-10-16", "2020-10-17", "2020-10-18", "2020-10-19", "2020-10-20", "2020-10-21", "2020-10-22", "2020-10-23", "2020-10-24", "2020-10-25", "2020-10-26", "2020-10-27", "2020-10-28", "2020-10-29", "2020-10-30", "2020-10-31"]
select * from json_each('["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04", "2020-10-05", "2020-10-06", "2020-10-07", "2020-10-08", "2020-10-09", "2020-10-10", "2020-10-11", "2020-10-12", "2020-10-13", "2020-10-14", "2020-10-15", "2020-10-16", "2020-10-17", "2020-10-18", "2020-10-19", "2020-10-20", "2020-10-21", "2020-10-22", "2020-10-23", "2020-10-24", "2020-10-25", "2020-10-26", "2020-10-27", "2020-10-28", "2020-10-29", "2020-10-30", "2020-10-31"]')

https://latest.datasette.io/fixtures?sql=select+*+from+json_each%28%27%5B%222020-10-01%22%2C+%222020-10-02%22%2C+%222020-10-03%22%2C+%222020-10-04%22%2C+%222020-10-05%22%2C+%222020-10-06%22%2C+%222020-10-07%22%2C+%222020-10-08%22%2C+%222020-10-09%22%2C+%222020-10-10%22%2C+%222020-10-11%22%2C+%222020-10-12%22%2C+%222020-10-13%22%2C+%222020-10-14%22%2C+%222020-10-15%22%2C+%222020-10-16%22%2C+%222020-10-17%22%2C+%222020-10-18%22%2C+%222020-10-19%22%2C+%222020-10-20%22%2C+%222020-10-21%22%2C+%222020-10-22%22%2C+%222020-10-23%22%2C+%222020-10-24%22%2C+%222020-10-25%22%2C+%222020-10-26%22%2C+%222020-10-27%22%2C+%222020-10-28%22%2C+%222020-10-29%22%2C+%222020-10-30%22%2C+%222020-10-31%22%5D%27%29

key value type atom id parent fullkey path
0 2020-10-01 text 2020-10-01 1   $[0] $
1 2020-10-02 text 2020-10-02 2   $[1] $
2 2020-10-03 text 2020-10-03 3   $[2] $
3 2020-10-04 text 2020-10-04 4   $[3] $
4 2020-10-05 text 2020-10-05 5   $[4] $
5 2020-10-06 text 2020-10-06 6   $[5] $
6 2020-10-07 text 2020-10-07 7   $[6] $
7 2020-10-08 text 2020-10-08 8   $[7] $
8 2020-10-09 text 2020-10-09 9   $[8] $
9 2020-10-10 text 2020-10-10 10   $[9] $
10 2020-10-11 text 2020-10-11 11   $[10] $
11 2020-10-12 text 2020-10-12 12   $[11] $
12 2020-10-13 text 2020-10-13 13   $[12] $
13 2020-10-14 text 2020-10-14 14   $[13] $
14 2020-10-15 text 2020-10-15 15   $[14] $
15 2020-10-16 text 2020-10-16 16   $[15] $
16 2020-10-17 text 2020-10-17 17   $[16] $
17 2020-10-18 text 2020-10-18 18   $[17] $
18 2020-10-19 text 2020-10-19 19   $[18] $
19 2020-10-20 text 2020-10-20 20   $[19] $
20 2020-10-21 text 2020-10-21 21   $[20] $
21 2020-10-22 text 2020-10-22 22   $[21] $
22 2020-10-23 text 2020-10-23 23   $[22] $
23 2020-10-24 text 2020-10-24 24   $[23] $
24 2020-10-25 text 2020-10-25 25   $[24] $
25 2020-10-26 text 2020-10-26 26   $[25] $
26 2020-10-27 text 2020-10-27 27   $[26] $
27 2020-10-28 text 2020-10-28 28   $[27] $
28 2020-10-29 text 2020-10-29 29   $[28] $
29 2020-10-30 text 2020-10-30 30   $[29] $
30 2020-10-31 text 2020-10-31 31   $[30] $

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

So it looks to me like custom SQL functions that return JSON arrays of date strings could be used in joins!

@simonw simonw changed the title Possible Feature: Date Table Expose dateutil.rrule functionality Sep 29, 2020
@simonw
Copy link
Owner

simonw commented Sep 29, 2020

I think the easiest way to do this would be to expose the rrulestr() function https://dateutil.readthedocs.io/en/stable/rrule.html#rrulestr-examples

I thought about exposing custom SQL functions for each of the different types of recur rules -rrule_daily(...), rrule_monthly(...) etc - but SQLite custom functions need a fixed number of arguments and don't take keyword parameters. This makes it difficult to come up with a clean design for a function where the Python version looks like this: rrule(freq=MONTHLY, count=4, dtstart=start_date)

The nice thing about rrulestr() is that it already provides a standardized mini-language for describing recurrence rules. It would look something like this:

select dateutil_rrulestr("FREQ=DAILY;INTERVAL=10;COUNT=5");
-- returns ["2020-09-28", "2020-10-08", "2020-10-18", "2020-10-28", "2020-11-07"]

Would need to limit the number of records that can come back from this - maybe to 100 or 1000 - otherwise someone could enter a malicious string that exhausts all memory.

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

I could add some additional SQL functions with nicer syntax for really common cases - like "give me every day between date X and date Y":

select dateutil_dates_between('2020-01-01', '2020-03-01');

This would be inclusive (so you'd get everything from 2020-01-01 to 2020-03-01). Maybe have an optional 3rd argument to specify that you don't want it to be inclusive?

select dateutil_dates_between('2020-01-01', '2020-03-01', 0);

This would return 2020-01-01 up to 2020-02-28.

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

dateutil_rrulestr() could return full datetimes, dateutil_rrulestr_date() could return just dates.

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

The documentation will have to show people how to use these with json_each(). Here's a neater example query (it's easier to imagine how you might join this against other tables):

with dates as (
  select
    value
  from
    json_each('["2020-10-01", "2020-10-02", "2020-10-03"]')
)
select
  *
from
  dates;

https://latest.datasette.io/fixtures?sql=with+dates+as+%28%0D%0A++select%0D%0A++++value%0D%0A++from%0D%0A++++json_each%28%27%5B%222020-10-01%22%2C+%222020-10-02%22%2C+%222020-10-03%22%5D%27%29%0D%0A%29%0D%0Aselect%0D%0A++*%0D%0Afrom%0D%0A++dates%3B

With the proposed dateutil_rrulestr() function that would look like this:

with dates as (
  select
    value
  from
    json_each(dateutil_rrulestr("FREQ=DAILY;INTERVAL=10;COUNT=5"))
)
select
  *
from
  dates;

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

Rather than cutting off at 1000 items I think the functions should raise an exception, because otherwise people may write queries that were truncated at 1000 without realizing it.

@simonw simonw added the enhancement New feature or request label Sep 29, 2020
@simonw
Copy link
Owner

simonw commented Sep 29, 2020

A limit of 10,000 values returned in a single JSON list might work fine, and would allow people with even pretty extreme use-cases some head room.

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

This still needs documentation. I also haven't implemented dateutil_dates_between() yet.

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

I'm not crazy about this:

select dateutil_rrule('DTSTART:20200101
FREQ=DAILY;INTERVAL=10;COUNT=5')

The DTSTART needs to be on a separate line, and uses : instead of = which is hard to remember.

Instead, how about an optional second argument which you can pass a datetime to - or the result of dateutil_parse?

select dateutil_rrule('FREQ=DAILY;INTERVAL=10;COUNT=5', dateutil_parse('10 october 2020'))

@simonw
Copy link
Owner

simonw commented Sep 29, 2020

Actually I'll have that second argument always parsed by parser.parse(), so the above example can look like this:

select dateutil_rrule('FREQ=DAILY;INTERVAL=10;COUNT=5', '10 october 2020')

@simonw simonw closed this as completed in 1b364ec Sep 29, 2020
simonw added a commit that referenced this issue Sep 29, 2020
simonw added a commit to simonw/latest-datasette-with-all-plugins that referenced this issue Sep 29, 2020
@simonw
Copy link
Owner

simonw commented Sep 29, 2020

@simonw
Copy link
Owner

simonw commented Sep 30, 2020

Added documentation on how to use this with json_each() and a LEFT JOIN here: https://github.com/simonw/datasette-dateutil/blob/0.2.1/README.md#joining-data-using-json_each

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants