
Updating documentation
Gregory E. Maurer committed Sep 22, 2016
1 parent a668b67 commit 7cd882a
Showing 5 changed files with 225 additions and 83 deletions.
95 changes: 12 additions & 83 deletions README.md
# NMEG_FluxProc

[This](https://github.com/gremau/NMEG_FluxProc) is the repository for
FluxProc code used to process data from the New Mexico Elevation Gradient,
an ecological research project based at the University of New Mexico.
For project information see [the project website](http://biology.unm.edu/litvak/res_NM_elev.html).
This code is used to process and quality-assure eddy covariance and other
environmental sensor data.

## Documentation

Documentation is found in the `doc` directory -- start [here](doc/README.md).

## Data releases

Many versions of NMEG flux data have been produced by this code as it has
changed over the years. Recently we have made an effort to version the data
using a release system with associated git tags. Release information is on
the [Releases page](doc/Releases.md) of the docs.

110 changes: 110 additions & 0 deletions doc/AncillaryMetData.md
# AncillaryMetData

This file contains instructions for downloading ancillary data used to fill
missing NMEG site meteorology data. Unless otherwise specified, this information
applies to the `FLUXROOT/Ancillary_met_data` directory, and new data files
should be put inside that directory.


## Valles Caldera met stations (MCon and PPine)

There is a network of stations in the Valles that have been managed by different entities over the years. NPS staff now maintain the stations, but data are available through the Desert Research Institute (DRI) Western Regional Climate Center (WRCC).

All sites at VCNP can be seen [here](http://www.wrcc.dri.edu/vallescaldera/). You can click on a site on that map, or use the direct links below.

Data for Jemez, the closest to PPine, can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmxjem>

Data for Redondo, the closest to MCon, can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmvrdd>

Data for VC Headquarters can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmvhvc>

From each page, select "Data Lister" and fill in the form for the dates and format desired (delimited format, short headers, metric units, etc.). The password needed to download data is "wrcc14" (or wrcc15 for Jemez?). The headers have changed periodically, so after downloading all data (2006-present) the files have been broken into periods that share a single header. The original files are in the "raw_DRI_VC_files" directory, and this directory contains the doctored files. The "parse_valles_met_data.m" script should do a pretty good job of stitching these back together.
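
In practice the stitching amounts to reading each same-header period and
stacking the shared columns. A minimal MATLAB sketch is below; the file and
column names are hypothetical, and "parse_valles_met_data.m" remains the
authoritative implementation.

    % Sketch: stitch the period-split DRI/WRCC files back together.
    % File and column names are hypothetical examples.
    files = { 'jemez_period1.txt', 'jemez_period2.txt' };
    common = { 'timestamp', 'T_air', 'precip' };  % columns shared by all periods
    parts = cell( size( files ) );
    for i = 1:numel( files )
        t = readtable( files{ i }, 'Delimiter', ',' );
        parts{ i } = t( :, common );  % keep only the shared columns
    end
    met = sortrows( vertcat( parts{ : } ), 'timestamp' );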

### NOTE

Data from some, but not all, of these VCNP met stations are also available in a different format from the Sev LTER network. There was once a parser for these files (see the git history), and in some ways they are a little easier to use. They can be found here:

<http://tierra.unm.edu:8080/research/climate/meteorology/VCNP/index>

using username "sevuser" and password "mes4paje". In the past these same
files were at:

<http://sev.lternet.edu/research/climate/meteorology/VCNP/index.html>


## SNOTEL stations (MCon and PPine)

SNOTEL stations can be reached from the interactive map at the [NRCS site](http://www.wcc.nrcs.usda.gov/snow/). Navigate to the desired site page, where there is a data download interface. Select hourly or daily csv data from standard SNOTEL sensors for the desired calendar year (all days), then press the View Historic button. This should download a file named 'XXX_STAND_YEAR=YYYY.csv', where XXX is the site code and YYYY is the year. Save it in the 'SNOTEL_daily' or 'SNOTEL_hourly' directory, as appropriate.

We currently use data from Quemazon (708), Senorita Divide #2 (744), and Vacas Locas (1017).


## Sevilleta sites (GLand, SLand, New_Gland)

We use the Sevilleta LTER met network to fill our sites there. These stations are managed by Doug Moore. There are several options for acquiring data.

1. Contact Doug directly (dmoore@sevilleta.unm.edu)
2. Raw data from all sites is posted periodically to the web at
   <http://sev.lternet.edu/research/climate/meteorology/SEV/outputXX.dat>,
   where XX is the year, or at <http://sev.lternet.edu/data> (follow the
   Meteorology data link).

3. Get the raw wireless data from the socorro ftp server at:

ftp://eddyflux@socorro.unm.edu/export/db/work/wireless/data/current/met


## Global Historical Climate Network data from NCDC (JSav)

GHCN data is available at <http://www.ncdc.noaa.gov/cdo-web/datasets>. Follow
the link to "Daily Summaries", then select the time period and search for the
site name (ESTANCIA and PROGRESSO are near the JSav and PJ sites). Once the
site is found, add it to the cart and:

1. Select the desired format (Custom Daily CSV), adjust date range if needed
2. Select station metadata outputs (all - station name, location, flags) and set
units to METRIC
3. Select data outputs (all - precip, temp, + all subcategories)
4. Enter email address and submit order

A link to the data will be sent via email. The raw datafile should be
parseable by MATLAB. Note that the data outputs change over the years, so it
may be wise to always request files for the entire period (2006/01/01 to
present).
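
Since the delivered file should be parseable by MATLAB, a quick sanity check
can be as simple as the sketch below; the file name is a hypothetical example.

    % Sketch: quick look at a downloaded GHCN daily CSV.
    % The file name is a hypothetical example of an NCDC delivery.
    ghcn = readtable( 'GHCND_ESTANCIA_2006_present.csv' );
    summary( ghcn )  % inspect station metadata columns, units, and coverage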


## PRISM data

Daily PRISM data is downloaded as zipped archives of 1-day .bil files (a georeferenced data format). Yearly archives for the continental US can be downloaded [here](http://prism.oregonstate.edu/recent/). Note that PRISM data is provisional for 6 months. If desired, provisional data can be downloaded [here](http://prism.oregonstate.edu/6month/). Save the archives in the 'PRISM_daily/raw_bil' directory.

There are two python scripts and a configuration file needed to process the
.bil file archives into usable precip data. These can be found in [this
repository](http://github.com/gremau/NMEG_utils).

* `site_coords.txt` is a list of site names and lat/lon coordinates.
* `bilParser.py` defines a `BilFile` object with a method to extract a data
  value at a given coordinate location. It also defines some functions to
  extract data from particular types of bil files (monthly, provisional, etc.).
* `getPRISMdata.py` is the master script. It sets the needed parameters, makes
  calls to the `bilParser` functions (and, by extension, `BilFile` methods) to
  extract data for each day, year, and site, and then outputs a csv for each
  year.

I use the Anaconda python distribution and run getPRISMdata.py with ipython, but other python distributions that include NumPy, pandas, and matplotlib should work. In addition to python, [OSGeo GDAL](http://www.gdal.org/) and its python bindings need to be installed. Decent instructions for this can be found [here](http://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/). For the Sandia lab computer, this has all been done.


## DayMet data

DayMet has its own single pixel extractor program (daymet_multiple_extraction.jar) that can be downloaded [here](http://daymet.ornl.gov/singlepixel.html). This currently resides in the 'DayMet' directory.

* Sites to be extracted should be added to latlon.txt.
* Open a terminal in this directory and run `./daymet_multiple_extraction.sh`

Currently, DayMet data only seem to be available through 2013.



53 changes: 53 additions & 0 deletions doc/Install_and_Configure.md
# Installation and configuration

The source code can be downloaded from [the GitHub repository](https://github.com/gremau/NMEG_FluxProc). MATLAB must be installed, and some tasks also require cygwin (for the bash shell), Campbell Scientific's CardConvert utility, and R to be installed and on the path.


## Local setup

### Data and configuration directories

NMEG_FluxProc scripts and functions require access to a local data
directory (`FLUXROOT`) and a configuration file directory (`FluxProcConfig`).
On the local machine, MATLAB must be able to find these two directories.

#### FLUXROOT

The NMEG_FluxProc scripts and functions perform operations on data in a
designated path (termed `FLUXROOT`). An example FLUXROOT directory, with data
from an imaginary site, can be downloaded from the socorro ftp and used to
test the NMEG_FluxProc code.

#### FluxProcConfig

Site-specific configuration files must also be present in the `FLUXROOT`
path, and NMEG_FluxProc is currently set to look for them in
`FLUXROOT/FluxProcConfig`. Configuration files for NMEG sites, including
the test site mentioned above, can be found
[here](https://github.com/gremau/NMEG_FluxProcConfig).

### Paths and environment variables

An environment variable must be set for FluxProc to find the FLUXROOT
directory on the local file structure. In your `startup.m` file, add this
line:

    setenv('FLUXROOT', '/.../')

where "/.../" is the path to the FLUXROOT directory. This will set the
needed environment variable each time you start MATLAB.

Once this is done, start MATLAB and add NMEG_FluxProc to your path:

    addpath('path/to/FluxProc')

Enter the directory:

    cd 'path/to/FluxProc'

The rest of the paths needed for FluxProc can then be set with:

    fluxproc_setpaths

FluxProc should now be initialized and ready to use the data and
configuration files in the FLUXROOT directory.
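
Putting these steps together, a `startup.m` along the following lines performs
the whole setup automatically; all paths below are placeholders for the local
machine.

    % startup.m -- executed by MATLAB at launch.
    % All paths are placeholders; substitute the local locations.
    setenv( 'FLUXROOT', '/home/flux/FLUXROOT' );  % data and FluxProcConfig root
    addpath( '/home/flux/code/NMEG_FluxProc' );   % FluxProc source code
    cd( '/home/flux/code/NMEG_FluxProc' );
    fluxproc_setpaths;                            % set the remaining paths
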
11 changes: 11 additions & 0 deletions doc/README.md
# NMEG_FluxProc Documentation

This is the documentation for the NMEG_FluxProc code repository, which is
primarily used to process and quality-assure data from the New Mexico
Elevation Gradient. It is mostly written in and called from MATLAB.

* [Installing and configuring on a local machine](Install_and_Configure.md)
* [Common tasks and scripts](TaskScripts.md)
* [Data releases (and corresponding git tags)](DataReleases.md)
* [Managing and updating ancillary met data](AncillaryMetData.md)
* [The old README](/doc/old_README.md) by Tim Hilton
39 changes: 39 additions & 0 deletions doc/TaskScripts.md
# Scripts for common tasks

Common tasks have scripts that can be run with common configurations and are
easily modified. These scripts can be found in the [scripts](scripts/)
directory. Each of them can be set to run for a list of sites and years, and
to overwrite existing output files or not.

### Create new "fluxall" files

Fluxall files (`{site}_{year}_fluxall.txt`) should contain raw data from all sensors at a site for one year. The [script_make_fluxall.m](/scripts/script_make_fluxall.m) script makes these files, primarily by calling [card_data_processor.m](card_data_processor.m) in various configurations and reading the raw data in the 'toa5' and 'ts_data' directories. Though these files should contain all sensor data, in practice some sites have dataloggers that have not been configured to be merged into the fluxall file (namely the Valles Caldera sites). A schematic driver loop is sketched below.
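
The sketch below shows only how such a driver script is configured; the helper
name is hypothetical, and the script itself contains the working
card_data_processor calls.

    % Schematic fluxall driver. make_fluxall_for_site_year is a hypothetical
    % stand-in for the card_data_processor calls in script_make_fluxall.m.
    sites = { 'GLand', 'SLand', 'JSav' };  % sites to process
    years = 2013:2015;                     % years to process
    overwrite = false;                     % keep existing fluxall files
    for s = 1:numel( sites )
        for yr = years
            make_fluxall_for_site_year( sites{ s }, yr, overwrite );
        end
    end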

### Create new "qc", "for_gapfilling", and "for_gapfilling_filled" files

There are several files created from the NMEG quality control pipeline, all output to the "processed_flux" directory. These are:

1. "qc" files (`{site}_{years}_fluxall_qc.txt`): Contain all variables that are quality-controlled and then output by the [UNM_RemoveBadData.m](UNM_RemoveBadData.m) script.

2. "for_gapfilling" files (`{site}_flux_all_{year}_for_gap_filling.txt`): Also output by [UNM_RemoveBadData.m](UNM_RemoveBadData.m) script and contain a subset of quality-controlled variables in a format ready to be filled with ancillary met data.

3. "for_gapfilling_filled" files (`{site}_flux_all_{year}_for_gap_filling_filled.txt`): Same as the file above, but gaps in the met variables have been filled with ancillary met data by the [UNM_fill_met_gaps_from_nearby_site.m](UNM_fill_met_gaps_from_nearby_site.m) script.

To make these files, run [script_make_qc_gf.m](/scripts/script_make_qc_gf.m). This script may also run the REddyProc gapfilling tool by calling the [R code from the Max Planck Institute](https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWebRPackage); that output (also in 'processed_flux') can be used to make the AmeriFlux files described below, if desired. The expected output file names for one site-year are sketched below.
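
For reference, this sketch constructs the three expected output names for one
site-year; the exact processed_flux location under FLUXROOT is an assumption
about the local layout.

    % Sketch: expected QC-pipeline outputs for one site-year. The output
    % directory is an assumed layout; the file-name patterns follow the
    % list above.
    site = 'GLand';  yr = 2015;
    outdir   = fullfile( getenv( 'FLUXROOT' ), 'processed_flux' );  % assumed
    qc_file  = sprintf( '%s_%d_fluxall_qc.txt', site, yr );
    gf_file  = sprintf( '%s_flux_all_%d_for_gap_filling.txt', site, yr );
    gff_file = sprintf( '%s_flux_all_%d_for_gap_filling_filled.txt', site, yr );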

### Create new AmeriFlux files

AmeriFlux files (`{af-site}_{year}_gapfilled.txt` and `{af-site}_{year}_with_gaps.txt`) contain quality controlled sensor data, gapfilled met data, gapfilled fluxes, and partitioned C fluxes. There are several steps currently needed to create them.

1. Send the "for_gapfilling_filled" file for each site/year to the [MPI EddyProc web service](http://www.bgc-jena.mpg.de/~MDIwork/eddyproc/upload.php). This service provides gapfilled and partitioned flux data, and is currently our only way to get the Lasslop-partitioned fluxes used for the lower-elevation NMEG sites.

2. Once you receive notification (by email) that the partitioner has finished, copy the job number and run [download_gapfilled_partitioned_flux.m](/retrieve_card_data/download_gapfilled_partitioned_flux.m) with the job number as an argument. This downloads the resulting files to the "processed_flux" directory.

3. Run [script_make_ameriflux.m](/scripts/script_make_ameriflux.m), which calls `UNM_Ameriflux_File_Maker.m` with the specified configuration options and outputs the new AmeriFlux files to "FLUXROOT/FluxOut/". Steps 2 and 3 are sketched after this list.
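
Steps 2 and 3 reduce to a couple of MATLAB calls, sketched below; the job
number is a placeholder for the one in the notification email, and the
site/year lists are configured inside the script itself.

    % Sketch of steps 2 and 3. The job number is a placeholder.
    job = 12345;                                 % from the notification email
    download_gapfilled_partitioned_flux( job );  % downloads to processed_flux
    script_make_ameriflux                        % edit its site/year lists first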

### Create soil "qc" and "qc_rbd" files

Running the [script_make_soilmet_qc.m](/scripts/script_make_soilmet_qc.m) script for a site/year runs [soil_met_correct.m](soil_met_correct.m), which assembles all soil temperature, water content, heat flux, and TCAV sensor outputs, applies temperature corrections to VWC, and then removes bad data. The two output files, which are placed in "processed_soil", are listed below; a configuration sketch follows the list.

1. "qc" files (`{site}_{year}_soilmet_qc.txt`): Contains all sensor outputs, with raw outputs converted to VWC and temperature corrected (if applicable), for a site/year.

2. "qc_rbd" files (`{site}_{year}_soilmet_qc.txt`): Same as above, but with outliers, level shifts, and other bad data removed on a site/year basis (see [soil_met_correct.m](soil_met_correct.m) for details.
