02-devoxx-uk-2019

Do we know our data, as good as we know our tools?

Slides

Video

See YouTube video on the Devoxx UK channel (~51 mins)

Speakers

Abstract

See Devoxx UK 2019 - talk abstract | cached version of Talk Abstract

Many of us are developers turned data scientists, and our first concern is usually how to build a model, train it, and so on. And yes, we want the best accuracy (close to 99%).

But as every seasoned data scientist will advise, we first and foremost need to understand our data and ensure it is clean and prepared before doing any training on it.

During the talk, we will explore several problems that occur during data analysis or preparation and, for each one, a technique to solve it (picked from a longer list of techniques). You will go away with a number of resources to explore at your own pace.

We will cover these categories of problems:

dirty data
disparate datasets - needing normalisation
too much information to process
and others…
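As a rough illustration of the "disparate datasets" problem, here is a minimal sketch of two common normalisation approaches, min-max scaling and z-score standardisation, using only the Python standard library. The sample values and the function names `min_max_scale` and `z_score` are made up for this example and are not taken from the talk or its notebooks.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on 0 and scale them to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features measured on very different scales.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
incomes = [20000.0, 30000.0, 40000.0, 50000.0, 60000.0]

scaled_heights = min_max_scale(heights_cm)
scaled_incomes = min_max_scale(incomes)
standardised_incomes = z_score(incomes)

print(scaled_heights)
print(scaled_incomes)
```

After scaling, both features live in the same range, so neither dominates a distance-based model simply because of its units.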

We will cover some of these techniques:

analysis - detecting misleading data, outliers, specific time series issues
cleaning - dealing with missing/ambiguous values, outliers, generating synthetic data, resampling
preparation - using statistical and physics functions, dimensionality reduction, feature selection, resampling

We will also use different kinds of plots relevant at different stages.
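To make two of the cleaning techniques listed above concrete, the following is a minimal sketch of median imputation for missing values followed by outlier detection with the interquartile range (IQR) rule. It is one possible approach under assumed data, not necessarily the method shown in the talk; the `readings` list, the helper names, and the `k=1.5` multiplier are illustrative assumptions.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the values that are present."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with gaps (None) and one suspicious spike.
readings = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0, None, 11.0]

cleaned = impute_missing(readings)
outliers = iqr_outliers(cleaned)

print(cleaned)
print(outliers)
```

The median is used for imputation here because, unlike the mean, it is not pulled towards the spike. Whether a flagged value is a genuine error or a meaningful event still needs to be judged against knowledge of the data.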

Code

Simple Data generation code

Notebooks

Notebooks used during the talk
Also see towards the bottom of Notebooks

