Data

Page dedicated to exploratory data analysis, data preparation, cleaning, preprocessing / wrangling, generation, feature engineering and other related topics

The question to ask ourselves: Do we know our data...?


Table of contents


Ethics / altruistic motives

See Ethics / altruistic motives

Datasets and sources of raw data

Raw / unclean datasets

Clean / ready-to-use datasets

Data Exploratory Analysis

Data preparation

Data cleaning

Data preprocessing / Data Wrangling

Misc

Data Generation

See Data Generation

Feature engineering / selection

Statistics

Visualisation

See Visualisation

Common mistakes when training models (data related)

  • Having a lot more training examples of one type of object than the other types
  • Accidentally testing the neural network using images that were in the training set
  • Training the neural network on data that is easier to recognize or more consistent than the real-world data it will be used to classify later on
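
The first two mistakes above can be caught with simple checks before training. The following is a minimal sketch using only the Python standard library; the function names (`class_imbalance_ratio`, `leaked_examples`) and the sample labels and identifiers are hypothetical and only for illustration. It assumes each example has a label and a unique identifier (for instance a file name or a content hash). The third mistake, a mismatch between training data and real-world data, cannot be detected this way and needs a comparison against samples from the target environment.

```python
from collections import Counter


def class_imbalance_ratio(labels):
    """Return the ratio between the most and least frequent class counts.

    A ratio close to 1.0 means the classes are roughly balanced;
    a large ratio means one class dominates the training set.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


def leaked_examples(train_ids, test_ids):
    """Return the identifiers that appear in both the training and test sets."""
    return set(train_ids) & set(test_ids)


# Hypothetical data: 90 examples of one class, 10 of the other.
train_labels = ["cat"] * 90 + ["dog"] * 10
print("imbalance ratio:", class_imbalance_ratio(train_labels))

# Hypothetical identifiers: one image is present in both splits.
train_ids = ["img_001", "img_002", "img_003"]
test_ids = ["img_003", "img_004"]
print("leaked examples:", sorted(leaked_examples(train_ids, test_ids)))
```

What counts as an acceptable imbalance ratio depends on the problem, so the sketch reports the ratio rather than applying a fixed threshold.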

Cheatsheets

See under Cheatsheets

Course / books

Best practices / rules / an unordered list of high-level or low-level guidelines

Framework(s) / checklist(s)

Notebooks

Programs and Tools

See Programs and Tools

Databases

Time-series databases

  • TimescaleDB
  • kdb+ - a column-based relational time-series database (TSDB) with in-memory (IMDB) capabilities, developed and marketed by Kx Systems

References

Credits

Big thanks to Jeremie Charlet for his contributions to many of the resources on this page, and to the others who have helped build the above resources.

Contributing

Contributions are very welcome; please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.


Back to main page (table of contents)