# Roadmap and rationale
...Some "what-and-why" items explaining the project's motivation... Or jump to the roadmap.
- Transfer standard Datasets to a PostgreSQL database, transforming them into a fast, reliable and compact data representation (CSV lines as JSONB arrays).
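As a minimal sketch of this idea (the table and column names here are illustrative assumptions, not the project's actual schema), every CSV line of every dataset can be stored as one JSONB array in a single table:

```sql
-- Hypothetical one-table representation: each CSV line of each
-- dataset becomes one JSONB array row in the same table.
CREATE TABLE dataset_big (
  id     bigserial PRIMARY KEY,
  source int NOT NULL,    -- identifies the origin dataset (its CSV file)
  key    text,            -- optional natural key of the row
  c      jsonb NOT NULL CHECK (jsonb_typeof(c) = 'array')  -- the CSV line
);

-- A CSV line like "BR,Brazil,South America" is stored as:
INSERT INTO dataset_big (source, key, c)
VALUES (1, 'BR', '["BR","Brazil","South America"]');
```

Keeping all rows in one table, with the line itself as a compact JSONB array, is what makes the representation both portable and refreshable.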
- Offer "standard SQL VIEWs" (standard table names and column names) for the dataset's SQL representation.
  PS: this also offers a simple, universal data-transformation language (SQL) to describe, in a standard and reproducible way, the data provenance of my datasets (and perhaps of any standard dataset).
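A sketch of such a VIEW, assuming the one-table JSONB representation above (all names here are hypothetical, not taken from the project): each dataset gets a VIEW that indexes into the JSONB array and exposes conventional column names.

```sql
-- Hypothetical standard VIEW over dataset number 1 (say, "country-codes"),
-- mapping JSONB array positions to named text columns.
CREATE VIEW vw_country_codes AS
  SELECT c->>0 AS iso_code,
         c->>1 AS name,
         c->>2 AS region
  FROM dataset_big
  WHERE source = 1;
```

Because the VIEW is plain SQL, it doubles as a readable, reproducible record of how the published columns are derived from the raw CSV lines.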
- Do class modeling of all datasets (illustrated below, from sql-unifier/src/Appendix).
- Make it easy to build new datasets, as mixed datasets, from standard ones, by knowing or modeling their relationships (SQL JOINs, etc.). See the illustration below.
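As a sketch of a mixed dataset built this way (both source VIEWs are hypothetical names, following the VIEW convention above): two standard-dataset VIEWs joined on a shared key produce a new, derived dataset that is itself just another VIEW.

```sql
-- Hypothetical mixed dataset: country names joined with GDP figures,
-- both coming from standard-dataset VIEWs, related by ISO code.
CREATE VIEW vw_country_gdp AS
  SELECT cc.iso_code, cc.name, g.gdp_usd
  FROM vw_country_codes cc
  JOIN vw_gdp g ON g.iso_code = cc.iso_code;
```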
- See standard Datasets and Datasets-BR as an "ecosystem of reliable data" that has long-term digital preservation in git repositories, and enhanced access through standard class modeling, standard API interfaces (such as GraphQL with PostGraphQL), or consumption as standard SQL tables.
...
- To be portable: all datasets in one table.
- To be easy for me: I like SQL, and there is a cost to testing and learning other (non-standard) tools like CSVkit.
- To be easy to produce new mixed datasets, in a reliable and standard way.
- To be easy to use the Datasets in PostgreSQL databases, as an independent, refreshable and flexible SQL schema.
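One way to get that independence and refreshability (a sketch, assuming a dedicated schema name such as `dataset`, which is an illustrative choice) is to keep all dataset objects in their own schema, so the whole thing can be dropped and re-imported without touching the rest of the database:

```sql
-- Sketch: an independent, refreshable schema for all dataset objects.
-- Dropping it with CASCADE removes the table and all dataset VIEWs at once.
DROP SCHEMA IF EXISTS dataset CASCADE;
CREATE SCHEMA dataset;
-- ...then re-run the import, re-creating the one big table and its VIEWs...
```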
- To be easy to offer the Datasets to my users, through standard APIs.
...
... next steps? ...
- Reach consensus on an "SQL-kernel" framework to do all the basic things.
- Test with other people and with import/export/analysis tools such as CSVkit or Goodtables.
- Test with all datasets of http://github.com/datasets
- API: consume the datasets online through the GraphQL interface of the PostGraphQL project... or through others like http://postgrest.com
- Expand and finish.