Data provenance provides a historical record of the data and its origins.
Most of the data in this project (Datasets-BR), as in similar projects like Datahub.io,
is simply a copy of portions of data from an authority that worked to produce it and whose processes offer reliability.
Some datasets, or portions of datasets, in this project also originate from "filtering" those raw original datasets.
Such filtering can be described as a preparation algorithm (a simple recipe), or expressed as software that anyone can run again to reproduce the process.
As a convention over configuration strategy, there are two general approaches to describe the original sources of a dataset:
- Registering here as `datapackage.json` in the FrictionlessData standard;
- Registering at Wikidata: in each item, give its source as a reference, with has quality or with stated in.
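
As an illustration of the first approach, the sketch below builds a minimal `datapackage.json` descriptor whose `sources` entry records where the data came from. The helper names (`build_descriptor`, `to_json`), the dataset name, the authority title, and the URL are all hypothetical placeholders, not values used by this project; only the `name`, `sources` (`title`, `path`) and `resources` fields are assumed from the Data Package descriptor format.

```python
import json


def build_descriptor(name, source_title, source_path, resource_path):
    """Return a minimal Data Package descriptor that records its source."""
    return {
        "name": name,
        # "sources" lists the original authorities the data was copied from.
        "sources": [{"title": source_title, "path": source_path}],
        # "resources" lists the data files kept in this repository.
        "resources": [{"name": name, "path": resource_path}],
    }


def to_json(descriptor):
    """Serialize the descriptor as readable JSON, keeping non-ASCII text."""
    return json.dumps(descriptor, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    descriptor = build_descriptor(
        name="example-dataset",
        source_title="Example Authority",
        source_path="https://example.org/original-data",
        resource_path="data/example.csv",
    )
    with open("datapackage.json", "w", encoding="utf-8") as f:
        f.write(to_json(descriptor) + "\n")
```

Keeping the source in the descriptor itself means the provenance travels with the data files whenever the dataset is copied.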
To express the algorithms of the transformations, please prefer the simplest, most functional and "standard" way. By convention, languages are chosen in the following order of priority:
- With SQL statements (with no or some PL/pgSQL functions).
- With Python or Javascript.
- With Perl, make, shell and others.
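
To make the "SQL first" preference concrete, here is a minimal sketch of a filtering recipe whose whole logic is one SQL statement, with Python used only as a thin runner. It assumes an in-memory SQLite database purely so the example is self-contained; the list above refers to PL/pgSQL, so a real recipe would more likely target PostgreSQL. The table name, columns, rows, and function name are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for a raw dataset copied from an authority.
RAW_ROWS = [
    ("B2", "item-b", 2020),
    ("A1", "item-a", 2020),
    ("A1", "item-c", 2019),
]

# The recipe itself: a single declarative statement that anyone can re-run.
FILTER_SQL = """
    SELECT code, name
    FROM raw_data
    WHERE year = 2020
    ORDER BY code, name
"""


def run_recipe(rows):
    """Load the raw rows, apply the SQL filter, and return the result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE raw_data (code TEXT, name TEXT, year INTEGER)")
        conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)", rows)
        return conn.execute(FILTER_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_recipe(RAW_ROWS):
        print(row)
```

Because the transformation lives entirely in `FILTER_SQL`, the same statement can be reviewed, versioned, and re-executed against the original data without reading any procedural code.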