Lesson on OpenRefine for ecology
- This data set is derived from [The Portal Project Long-term desert ecology](http://portal.weecology.org/) project data. The data file was downloaded and then modified specifically for use with OpenRefine.
- Taxon names were put back into the file.
- Globally Unique Identifiers (in the form of UUIDs) were added.
- These modifications were made to illustrate some features of OpenRefine:
  - Running the clustering algorithms on the taxon names column shows how quickly they find discrepancies and how simple it is to fix every issue found.
  - Using UUIDs highlights the importance of being able to merge many data sets together without the risk of garbling the data.
- Known errors were introduced into the taxon names, again to show what OpenRefine can do.
- We also made a version of the file that contains a duplicate UUID, to highlight the importance of checking for uniqueness.
- Anyone already familiar with OpenRefine can easily substitute a different data set, as desired.
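
To give a feel for the two checks mentioned above, here is a minimal Python sketch, offered as an illustration rather than as how OpenRefine works internally. The `fingerprint` function only loosely approximates OpenRefine's fingerprint keying method (lowercase, strip accents and punctuation, sort the unique tokens), and the function names and sample values are hypothetical, not taken from the lesson's data file.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a rough clustering key: lowercase, ASCII-only, no punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents and other non-ASCII marks
    text = re.sub(r"[^\w\s]", "", text)                    # drop punctuation
    return " ".join(sorted(set(text.split())))


def cluster_names(names):
    """Group names that share a key; return only groups with more than one spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


def duplicate_ids(ids):
    """Return the identifiers that occur more than once."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical sample values, for illustration only.
taxa = ["Dipodomys merriami", "dipodomys merriami ", "Merriami, Dipodomys", "Chaetodipus baileyi"]
record_ids = ["a1", "b2", "a1", "c3"]

print(cluster_names(taxa))        # one cluster holding the three spellings of Dipodomys merriami
print(duplicate_ids(record_ids))  # the identifier that appears twice
```

Key-based grouping like this only catches differences in case, spacing, punctuation, and word order; genuine misspellings need a distance-based method, which OpenRefine also offers through its nearest-neighbor clustering options.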