Options for visualising larger datasets #65

Open
njtierney opened this issue Feb 11, 2018 · 1 comment

@njtierney

Downsample the data

Use only the first 1,000 rows / 100 columns, and give a warning stating that this is not all of the data.
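A rough sketch of the downsample-and-warn idea, in Python for illustration only (visdat itself is R). The function name `downsample` and the `max_rows` / `max_cols` defaults are assumptions taken from the numbers above, not an existing visdat API.

```python
import warnings


def downsample(rows, max_rows=1000, max_cols=100):
    """Keep the first max_rows rows and max_cols columns of a list-of-rows table.

    Hypothetical helper: warns whenever rows or columns were dropped.
    """
    n_rows = len(rows)
    n_cols = max((len(r) for r in rows), default=0)
    subset = [list(r[:max_cols]) for r in rows[:max_rows]]
    if n_rows > max_rows or n_cols > max_cols:
        warnings.warn(
            f"Showing the first {min(n_rows, max_rows)} of {n_rows} rows and "
            f"{min(n_cols, max_cols)} of {n_cols} columns; "
            "this is not all of the data.",
            UserWarning,
        )
    return subset
```

Data that already fits within the limits passes through unchanged and without a warning.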

Animation / scrolling

"Roll" the data through visdat as if it were an animation, like scrolling down the data (suggested by @krlmlr)
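One possible way to drive the scrolling, sketched in Python for illustration: generate overlapping row ranges and draw one frame per range. `row_windows`, `window` and `step` are hypothetical names, and the plotting of each frame is left out.

```python
def row_windows(n_rows, window=1000, step=500):
    """Yield (start, stop) row ranges that scroll down a dataset of n_rows rows.

    Hypothetical helper: each range would become one frame of the animation.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < n_rows:
        stop = min(start + window, n_rows)
        yield (start, stop)
        if stop == n_rows:
            break  # the last frame reaches the bottom of the data
        start += step
```

A `step` smaller than `window` makes consecutive frames overlap, which should make the scroll look smoother.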

Use transparency to indicate missingness

@krlmlr's idea: give each column an alpha level that corresponds to its missingness, so high transparency means lots of missing data.
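A minimal sketch of the alpha mapping, in Python for illustration. It assumes missing values are represented as `None`, that all rows have the same length, and that a floor (`min_alpha`, a made-up parameter) is wanted so a fully missing column does not vanish.

```python
def column_alphas(rows, min_alpha=0.1):
    """Return one alpha per column: 1 minus the proportion of missing values.

    Assumes missing values are None. Lower alpha means more transparent,
    so columns with lots of missing data fade out.
    """
    if not rows:
        return []
    n_rows = len(rows)
    alphas = []
    for column in zip(*rows):  # transpose: iterate over columns
        prop_missing = sum(value is None for value in column) / n_rows
        alphas.append(max(min_alpha, 1.0 - prop_missing))
    return alphas
```

For example, a column that is three quarters missing gets an alpha of 0.25.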

@njtierney

Could also explore this option from @JonnoB:

https://github.com/JonnoB/BigHeat/blob/master/CompressDF.R

To visualise the missingness, I made a heat map ordered by hierarchical similarity. However, when I visualised it I had two problems. The first was that the dataframe was so large that ggplot started choking. The second was that the heatmap could be difficult to understand, as individual points were too small to see. I solved both problems by aggregating areas of pixels of size x by y to an average value between 0 and 1 and plotting the resulting resized matrix on a red-white-blue scale. Because the data had already been hierarchically clustered, the aggregation does not disturb the pattern much. This change allows very large datasets to be visualised while remaining interpretable.
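The aggregation step described above could look roughly like this. This is a Python sketch for illustration and not a port of CompressDF.R. It assumes missing values are `None`, that rows and columns are already in their clustered order, and that the table is rectangular; `compress_missingness`, `block_rows` and `block_cols` are made-up names.

```python
def compress_missingness(rows, block_rows, block_cols):
    """Average a missing/not-missing indicator over block_rows x block_cols tiles.

    Returns a smaller matrix whose cells are the proportion of missing
    values (between 0 and 1) in each tile. Edge tiles may be smaller.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    compressed = []
    for r0 in range(0, n_rows, block_rows):
        r1 = min(r0 + block_rows, n_rows)
        out_row = []
        for c0 in range(0, n_cols, block_cols):
            c1 = min(c0 + block_cols, n_cols)
            tile = [rows[r][c] is None for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(sum(tile) / len(tile))
        compressed.append(out_row)
    return compressed
```

The resulting matrix has roughly `block_rows * block_cols` times fewer cells to plot, and each cell can be mapped onto a diverging colour scale.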
