
Updating documentation
Gregory E. Maurer committed Sep 22, 2016
1 parent a668b67 commit 7cd882a
Showing 5 changed files with 225 additions and 83 deletions.
95 changes: 12 additions & 83 deletions README.md
# NMEG_FluxProc

[This](https://github.com/gremau/NMEG_FluxProc) is the repository for
FluxProc code used to process data from the New Mexico Elevation Gradient,
an ecological research project based at the University of New Mexico.
For project information see [the project website](http://biology.unm.edu/litvak/res_NM_elev.html).
This code is used to process and quality-assure eddy covariance and other
environmental sensor data.

## Documentation

Documentation is found in the `doc` directory -- start [here](doc/README.md).

## Data releases

Many versions of NMEG flux data have been produced by this code as it has
changed over the years. Recently we have made an effort to version the data
using a release system with associated git tags. Release information is on
the [Releases page](doc/Releases.md) of the docs.

110 changes: 110 additions & 0 deletions doc/AncillaryMetData.md
# AncillaryMetData

This file contains instructions for downloading ancillary data used to fill
missing NMEG site meteorology data. Unless otherwise specified, this information
applies to the `FLUXROOT/Ancillary_met_data` directory, and new data files
should be put inside that directory.


## Valles Caldera met stations (MCon and PPine)

There is a network of stations in the Valles that have been managed by different entities over the years. NPS staff now maintain the stations, but data are available through the Desert Research Institute (DRI) Western Regional Climate Center (WRCC).

All sites at VCNP can be seen [here](http://www.wrcc.dri.edu/vallescaldera/). You can click on a site on that map, or use the direct links below.

Data for Jemez, the closest to PPine, can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmxjem>

Data for Redondo, the closest to MCon, can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmvrdd>

Data for VC Headquarters can be accessed from:

<http://www.wrcc.dri.edu/cgi-bin/rawMAIN3.pl?nmvhvc>

From each page, select "Data Lister" and fill in the form for the dates and format desired (delimited format, short headers, metric units, etc.). The password needed to download data is "wrcc14" (or wrcc15 for Jemez?). The headers have changed periodically, so after downloading all data (2006-present) the files have been broken into periods that share a single header. The original files are in the "raw_DRI_VC_files" directory, and this directory contains the doctored files. The "parse_valles_met_data.m" script should do a pretty good job of stitching these back together.
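
In practice the stitching amounts to reading each same-header period and
stacking the shared columns. A minimal MATLAB sketch is below; the file and
column names are hypothetical, and "parse_valles_met_data.m" remains the
authoritative implementation.

    % Sketch: stitch the period-split DRI/WRCC files back together.
    % File and column names are hypothetical examples.
    files = { 'jemez_period1.txt', 'jemez_period2.txt' };
    common = { 'timestamp', 'T_air', 'precip' };  % columns shared by all periods
    parts = cell( size( files ) );
    for i = 1:numel( files )
        t = readtable( files{ i }, 'Delimiter', ',' );
        parts{ i } = t( :, common );  % keep only the shared columns
    end
    met = sortrows( vertcat( parts{ : } ), 'timestamp' );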

### NOTE

Data from some, but not all, of these VCNP met stations are also available in a different format from the Sev LTER network. There was once a parser for these files (see the git history), and in some ways they are a little easier to use. They can be found here:

<http://tierra.unm.edu:8080/research/climate/meteorology/VCNP/index>

using username "sevuser" and password "mes4paje". In the past these same
files were at:

<http://sev.lternet.edu/research/climate/meteorology/VCNP/index.html>


## SNOTEL stations (MCon and PPine)

SNOTEL stations can be reached from the interactive map at the [NRCS site](http://www.wcc.nrcs.usda.gov/snow/). Navigate to the desired site page, where there is a data download interface. Select hourly or daily csv data from standard SNOTEL sensors for the desired calendar year (all days), then press the View Historic button. This should download a file named 'XXX_STAND_YEAR=YYYY.csv', where XXX is the site code and YYYY is the year. Save it in the 'SNOTEL_daily' or 'SNOTEL_hourly' directory, as appropriate.

We currently use data from Quemazon (708), Senorita Divide #2 (744), and Vacas Locas (1017).


## Sevilleta sites (GLand, SLand, New_Gland)

We use the Sevilleta LTER met network to fill our sites there. These stations are managed by Doug Moore. There are several options for acquiring data.

1. Contact Doug directly (dmoore@sevilleta.unm.edu)
2. Raw data from all sites is posted periodically to the web at
   <http://sev.lternet.edu/research/climate/meteorology/SEV/outputXX.dat>,
   where XX is the year, or at <http://sev.lternet.edu/data> (follow the
   Meteorology data link).

3. Get the raw wireless data from the socorro ftp server at:

ftp://eddyflux@socorro.unm.edu/export/db/work/wireless/data/current/met


## Global Historical Climate Network data from NCDC (JSav)

GHCN data is available at <http://www.ncdc.noaa.gov/cdo-web/datasets>. Follow
the link to "Daily Summaries", then select the time period and search for the
site name (ESTANCIA and PROGRESSO are near the JSav and PJ sites). Once the
site is found, add it to the cart and:

1. Select the desired format (Custom Daily CSV), adjust date range if needed
2. Select station metadata outputs (all - station name, location, flags) and set
units to METRIC
3. Select data outputs (all - precip, temp, + all subcategories)
4. Enter email address and submit order

A link to the data will be sent via email. The raw datafile should be
parseable by MATLAB. Note that the data outputs change over the years, so it
may be wise to always request files for the entire period (2006/01/01 to
present).
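
Since the delivered file should be parseable by MATLAB, a quick sanity check
can be as simple as the sketch below; the file name is a hypothetical example.

    % Sketch: quick look at a downloaded GHCN daily CSV.
    % The file name is a hypothetical example of an NCDC delivery.
    ghcn = readtable( 'GHCND_ESTANCIA_2006_present.csv' );
    summary( ghcn )  % inspect station metadata columns, units, and coverage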


## PRISM data

Daily PRISM data is downloaded as zipped archives of 1-day .bil files (a georeferenced data format). Yearly archives for the continental US can be downloaded [here](http://prism.oregonstate.edu/recent/). Note that PRISM data is provisional for 6 months. If desired, provisional data can be downloaded [here](http://prism.oregonstate.edu/6month/). Save the archives in the 'PRISM_daily/raw_bil' directory.

There are two python scripts and a configuration file needed to process the
.bil file archives into usable precip data. These can be found in [this
repository](http://github.com/gremau/NMEG_utils).

* `site_coords.txt` is a list of site names and lat/lon coordinates.
* `bilParser.py` defines a `BilFile` object with a method to extract a data
  value at a given coordinate location. It also defines some functions to
  extract data from particular types of bil files (monthly, provisional, etc.).
* `getPRISMdata.py` is the master script. It sets the needed parameters, makes
  calls to the `bilParser` functions (and, by extension, `BilFile` methods) to
  extract data for each day, year, and site, and then outputs a csv for each
  year.

I use the Anaconda python distribution and run getPRISMdata.py with ipython, but other python distributions that include NumPy, pandas, and matplotlib should work. In addition to python, [OSGeo GDAL](http://www.gdal.org/) and its python bindings need to be installed. Decent instructions for this can be found [here](http://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/). For the Sandia lab computer, this has all been done.


## DayMet data

DayMet has its own single pixel extractor program (daymet_multiple_extraction.jar) that can be downloaded [here](http://daymet.ornl.gov/singlepixel.html). This currently resides in the 'DayMet' directory.

* Sites to be extracted should be added to latlon.txt.
* Open a terminal in this directory and run `./daymet_multiple_extraction.sh`

Currently, DayMet data only seem to be available through 2013.



53 changes: 53 additions & 0 deletions doc/Install_and_Configure.md
# Installation and configuration

The source code can be downloaded from [the GitHub repository](https://github.com/gremau/NMEG_FluxProc). MATLAB must be installed, and some tasks also require cygwin (for the bash shell), Campbell Scientific's CardConvert utility, and R to be installed and on the path.


## Local setup

### Data and configuration directories

NMEG_FluxProc scripts and functions require access to a local data
directory (`FLUXROOT`) and a configuration file directory (`FluxProcConfig`).
On the local machine, MATLAB must be able to find these two directories.

#### FLUXROOT

The NMEG_FluxProc scripts and functions perform operations on data in a
designated path (termed `FLUXROOT`). An example FLUXROOT directory, with data
from an imaginary site, can be downloaded from the socorro ftp and used to
test the NMEG_FluxProc code.

#### FluxProcConfig

Site-specific configuration files must also be present in the `FLUXROOT`
path, and NMEG_FluxProc is currently set to look for them in
`FLUXROOT/FluxProcConfig`. Configuration files for NMEG sites, including
the test site mentioned above, can be found
[here](https://github.com/gremau/NMEG_FluxProcConfig).

### Paths and environment variables

An environment variable must be set for FluxProc to find the FLUXROOT
directory on the local file structure. In your `startup.m` file, add this
line:

    setenv('FLUXROOT', '/.../')

where "/.../" is the path to the FLUXROOT directory. This will set the
needed environment variable each time you start MATLAB.

Once this is done, start MATLAB and add NMEG_FluxProc to your path:

    addpath('path/to/FluxProc')

Enter the directory:

    cd 'path/to/FluxProc'

The rest of the paths needed for FluxProc can then be set with:

    fluxproc_setpaths

FluxProc should now be initialized and ready to use the data and
configuration files in the FLUXROOT directory.
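
Putting these steps together, a `startup.m` along the following lines performs
the whole setup automatically; all paths below are placeholders for the local
machine.

    % startup.m -- executed by MATLAB at launch.
    % All paths are placeholders; substitute the local locations.
    setenv( 'FLUXROOT', '/home/flux/FLUXROOT' );  % data and FluxProcConfig root
    addpath( '/home/flux/code/NMEG_FluxProc' );   % FluxProc source code
    cd( '/home/flux/code/NMEG_FluxProc' );
    fluxproc_setpaths;                            % set the remaining paths
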
11 changes: 11 additions & 0 deletions doc/README.md
# NMEG_FluxProc Documentation

This is the documentation for the NMEG_FluxProc code repository, which is
primarily used to process and quality-assure data from the New Mexico
Elevation Gradient. It is mostly written in and called from MATLAB.

* [Installing and configuring on a local machine](Install_and_Configure.md)
* [Common tasks and scripts](TaskScripts.md)
* [Data releases (and corresponding git tags)](DataReleases.md)
* [Managing and updating ancillary met data](AncillaryMetData.md)
* [The old README](/doc/old_README.md) by Tim Hilton
39 changes: 39 additions & 0 deletions doc/TaskScripts.md
# Scripts for common tasks

Common tasks have scripts that can be run with common configurations and are
easily modified. These scripts can be found in the [scripts](scripts/)
directory. Each of them can be set to run for a list of sites and years, and
to overwrite existing output files or not.

### Create new "fluxall" files

Fluxall files (`{site}_{year}_fluxall.txt`) should contain raw data from all sensors at a site for one year. The [script_make_fluxall.m](/scripts/script_make_fluxall.m) script makes these files, primarily by calling [card_data_processor.m](card_data_processor.m) in various configurations and reading the raw data in the 'toa5' and 'ts_data' directories. Though these files should contain all sensor data, in practice some sites have dataloggers that have not been configured to be merged into the fluxall file (namely the Valles Caldera sites). A schematic driver loop is sketched below.
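
The sketch below shows only how such a driver script is configured; the helper
name is hypothetical, and the script itself contains the working
card_data_processor calls.

    % Schematic fluxall driver. make_fluxall_for_site_year is a hypothetical
    % stand-in for the card_data_processor calls in script_make_fluxall.m.
    sites = { 'GLand', 'SLand', 'JSav' };  % sites to process
    years = 2013:2015;                     % years to process
    overwrite = false;                     % keep existing fluxall files
    for s = 1:numel( sites )
        for yr = years
            make_fluxall_for_site_year( sites{ s }, yr, overwrite );
        end
    end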

### Create new "qc", "for_gapfilling", and "for_gapfilling_filled" files

There are several files created from the NMEG quality control pipeline, all output to the "processed_flux" directory. These are:

1. "qc" files (`{site}_{years}_fluxall_qc.txt`): Contain all variables that are quality-controlled and then output by the [UNM_RemoveBadData.m](UNM_RemoveBadData.m) script.

2. "for_gapfilling" files (`{site}_flux_all_{year}_for_gap_filling.txt`): Also output by [UNM_RemoveBadData.m](UNM_RemoveBadData.m) script and contain a subset of quality-controlled variables in a format ready to be filled with ancillary met data.

3. "for_gapfilling_filled" files (`{site}_flux_all_{year}_for_gap_filling_filled.txt`): Same as the file above, but gaps in the met variables have been filled with ancillary met data by the [UNM_fill_met_gaps_from_nearby_site.m](UNM_fill_met_gaps_from_nearby_site.m) script.

To make these files, run [script_make_qc_gf.m](/scripts/script_make_qc_gf.m). This script may also run the REddyProc gapfilling tool by calling the [R code from the Max Planck Institute](https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWebRPackage); that output (also in 'processed_flux') can be used to make the AmeriFlux files described below, if desired. The expected output file names for one site-year are sketched below.
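
For reference, this sketch constructs the three expected output names for one
site-year; the exact processed_flux location under FLUXROOT is an assumption
about the local layout.

    % Sketch: expected QC-pipeline outputs for one site-year. The output
    % directory is an assumed layout; the file-name patterns follow the
    % list above.
    site = 'GLand';  yr = 2015;
    outdir   = fullfile( getenv( 'FLUXROOT' ), 'processed_flux' );  % assumed
    qc_file  = sprintf( '%s_%d_fluxall_qc.txt', site, yr );
    gf_file  = sprintf( '%s_flux_all_%d_for_gap_filling.txt', site, yr );
    gff_file = sprintf( '%s_flux_all_%d_for_gap_filling_filled.txt', site, yr );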

### Create new AmeriFlux files

AmeriFlux files (`{af-site}_{year}_gapfilled.txt` and `{af-site}_{year}_with_gaps.txt`) contain quality controlled sensor data, gapfilled met data, gapfilled fluxes, and partitioned C fluxes. There are several steps currently needed to create them.

1. Send the "for_gapfilling_filled" file for each site/year to the [MPI EddyProc web service](http://www.bgc-jena.mpg.de/~MDIwork/eddyproc/upload.php). This service provides gapfilled and partitioned flux data, and is currently our only way to get the Lasslop-partitioned fluxes used for the lower-elevation NMEG sites.

2. Once you receive notification (by email) that the partitioner has finished, copy the job number and run [download_gapfilled_partitioned_flux.m](/retrieve_card_data/download_gapfilled_partitioned_flux.m) with the job number as an argument. This downloads the resulting files to the "processed_flux" directory.

3. Run [script_make_ameriflux.m](/scripts/script_make_ameriflux.m), which calls `UNM_Ameriflux_File_Maker.m` with the specified configuration options and outputs the new AmeriFlux files to "FLUXROOT/FluxOut/". Steps 2 and 3 are sketched after this list.
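
Steps 2 and 3 reduce to a couple of MATLAB calls, sketched below; the job
number is a placeholder for the one in the notification email, and the
site/year lists are configured inside the script itself.

    % Sketch of steps 2 and 3. The job number is a placeholder.
    job = 12345;                                 % from the notification email
    download_gapfilled_partitioned_flux( job );  % downloads to processed_flux
    script_make_ameriflux                        % edit its site/year lists first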

### Create soil "qc" and "qc_rbd" files

Running the [script_make_soilmet_qc.m](/scripts/script_make_soilmet_qc.m) script for a site/year runs [soil_met_correct.m](soil_met_correct.m), which assembles all soil temperature, water content, heat flux, and TCAV sensor outputs, applies temperature corrections to VWC, and then removes bad data. The two output files, which are placed in "processed_soil", are listed below; a configuration sketch follows the list.

1. "qc" files (`{site}_{year}_soilmet_qc.txt`): Contains all sensor outputs, with raw outputs converted to VWC and temperature corrected (if applicable), for a site/year.

2. "qc_rbd" files (`{site}_{year}_soilmet_qc.txt`): Same as above, but with outliers, level shifts, and other bad data removed on a site/year basis (see [soil_met_correct.m](soil_met_correct.m) for details.
