Copyright (c) 2016 by SAS Institute Inc., Cary, NC 27513 USA
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Git client: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
Git LFS client: https://git-lfs.github.com/
SAS University Edition: http://www.sas.com/en_us/software/university-edition.html
Fork the enlighten-apply repository from: https://github.com/sassoftware/enlighten-apply
Enter the following commands on the Git Bash command line:
$ mkdir enlighten-apply
$ cd enlighten-apply
$ git init
$ git remote add origin https://github.com/<your username>/enlighten-apply.git
$ git remote add upstream https://github.com/sassoftware/enlighten-apply.git
$ git pull origin master
$ git lfs install
$ git lfs track "*.jpg"
$ git lfs track "*.png"
This repository contains three SAS programs that help you tidy messy data. Tidying data is a structured process for addressing common analytical data quality problems. These SAS macros do not use all of Hadley Wickham's original terminology; instead, they try to represent the Tidy Data process in a form that is natural to SAS programming.
Hadley Wickham describes Tidy Data in detail in the article “Tidy Data”, Journal of Statistical Software, Vol. 59, Issue 10 (August 2014), https://www.jstatsoft.org/article/view/v059i10.
Specifically, these SAS programs address Sections 3.1, 3.2, and 3.3 in that article.
This example will run in the free SAS® University Edition.
Section 3.1 (column headers are values, not variable names) covers data stored in a presentation-style format, where multiple measure columns are described by the dimension values in their names.
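To make the layout described above concrete, here is a minimal sketch of the reshaping it calls for. It is written in Python purely for illustration and is not the repository's SAS code. The table, the `sales_<year>` column naming, and the `melt_years` helper are all hypothetical.

```python
# Hypothetical wide table: a single measure (sales) is spread across
# columns, and the year dimension is encoded in each column name.
wide_rows = [
    {"region": "East", "sales_2014": 100, "sales_2015": 120},
    {"region": "West", "sales_2014": 90, "sales_2015": 95},
]


def melt_years(rows, id_col="region", prefix="sales_"):
    """Turn every <prefix><year> column into its own (id, year, sales) row."""
    tidy = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(prefix):
                tidy.append({
                    id_col: row[id_col],
                    # The dimension value is recovered from the column name.
                    "year": int(name[len(prefix):]),
                    "sales": value,
                })
    return tidy


tidy_rows = melt_years(wide_rows)
for r in tidy_rows:
    print(r)
```

After the reshape, each row holds one observation (one region in one year), and `year` is an ordinary column rather than part of a column name.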
Section 3.2 (multiple variables stored in one column) covers data where a single column contains values for multiple dimension variables.
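A small sketch of this case, again in Python for illustration only and not taken from the repository's SAS code: the `group` column, its `sex_ageband` encoding, and the `split_group` helper are assumptions made up for the example.

```python
# Hypothetical input: the "group" column packs two dimension variables
# (sex and age band) into one value, separated by an underscore.
combined_rows = [
    {"group": "F_18-24", "count": 12},
    {"group": "M_25-34", "count": 7},
]


def split_group(rows, sep="_"):
    """Split the combined column into one column per dimension variable."""
    tidy = []
    for row in rows:
        # Split only on the first separator so the age band stays intact.
        sex, age_band = row["group"].split(sep, 1)
        tidy.append({"sex": sex, "age_band": age_band, "count": row["count"]})
    return tidy


split_rows = split_group(combined_rows)
for r in split_rows:
    print(r)
```

Each dimension now has its own column, so the data can be filtered or grouped by sex or by age band independently.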
Section 3.3 (variables are stored in both rows and columns) covers data stored in a presentation-style format, where multiple measure columns are described by the dimension values in their names and, in addition, multiple measure variables are stored in rows.
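This combined case needs two moves: the dimension encoded in the column names becomes a column, and the measure names stored in rows become columns. The sketch below shows one way to do both in a single pass. It is in Python for illustration only, it is not the repository's SAS code, and the weather-style table, the `day_<n>` naming, and the `tidy_mixed` helper are hypothetical.

```python
# Hypothetical input: the day dimension lives in the column names
# (day_1, day_2), while the measure names (tmax, tmin) live in rows.
mixed_rows = [
    {"station": "S1", "element": "tmax", "day_1": 30, "day_2": 31},
    {"station": "S1", "element": "tmin", "day_1": 18, "day_2": 19},
]


def tidy_mixed(rows, prefix="day_"):
    """Produce one record per (station, day) with one column per measure."""
    by_key = {}
    for row in rows:
        for name, value in row.items():
            if not name.startswith(prefix):
                continue
            # Column name -> dimension value (the day number).
            key = (row["station"], int(name[len(prefix):]))
            record = by_key.setdefault(
                key, {"station": key[0], "day": key[1]}
            )
            # Row value of "element" -> measure column name.
            record[row["element"]] = value
    return [by_key[k] for k in sorted(by_key)]


tidy_weather = tidy_mixed(mixed_rows)
for r in tidy_weather:
    print(r)
```

The result has one row per station and day, with `tmax` and `tmin` as separate measure columns.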
This example was tested in the following environment:
- Windows 7 Enterprise
- Intel i7-5600U @ 2.60 GHz
- 16 GB RAM
- VMware Workstation 12 Player
- SAS University Edition