|
|
The STG application is a command line tool that organizes satellite data onto an equal angle grid so that it can be compared across space and time. The application also includes several types of statistical analysis.
|
|
|
|
|
|
The algorithms below are based upon those described in “A Uniform Space–Time Gridding Algorithm for Comparison of Satellite Data Products: Characterization and Sensitivity Study” (Smith et al., Jan. 2013).
|
|
|
|
|
|
### Installing
|
|
|
|
|
|
To install STG on servers like iris and thunder, first create a ~/.condarc file as described at http://sips.ssec.wisc.edu/docs/using_thunder.html. Then run the following commands in a directory where you wish to store the STG code:
|
|
|
|
|
|
module load anaconda27
|
|
|
conda create -n stg python=2.7 anaconda
|
|
|
source activate stg
|
|
|
conda install netCDF4=1.1.9
|
|
|
conda install hdf4 python-hdf4
|
|
|
conda install basemap
|
|
|
git clone https://gitlab.ssec.wisc.edu/evas/STG.git uwstg
|
|
|
cd uwstg
|
|
|
python setup.py develop
|
|
|
easy_install -vi http://larch.ssec.wisc.edu/eggs/repos keoni
|
|
|
|
|
|
This will create a virtual environment called “stg” which includes all the software needed to run the STG command line calls. When the virtual environment is activated, the “stg” and “stg_plot” command line calls should be available on your PATH.
|
|
|
|
|
|
To re-activate this virtual environment when you reconnect to a server where you have already set up the STG virtual environment, you will need to run the following commands:
|
|
|
|
|
|
module load anaconda27
|
|
|
source activate stg
|
|
|
|
|
|
### Command Line Usage
|
|
|
|
|
|
STG can be invoked on the command line using the “stg” command. It includes several sub-commands to run parts of the gridding process and to create auxiliary files. If you invoke the “stg” command without a sub-command, it will print help information about the sub-commands and command line options.
|
|
|
|
|
|
For all calls you will want to specify the “-o” or “--output” and “-i” or “--input” parameters to give the paths of your output and input directories. These default to ./out and ./ respectively. If the output directory does not exist, STG will create it.
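For example, a space gridding call (described in the sections below) with explicit directories might look like the following, where the paths are placeholders:

stg space_gridding_day -i /data/modis/2013_03_29 -o /data/stg_out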
|
|
|
|
|
|
Currently, output from the space and time gridding commands is written to netCDF4 formatted files. STG produces one file per day of space gridding and one file per day of time gridding (multiple variables for the same day are stored in the same netCDF file).
|
|
|
|
|
|
### Space Gridding
|
|
|
|
|
|
The first step when using the STG application is to space grid your data by day. Each call to “stg space_gridding_day” handles one day's worth of data and should be given an input directory that contains all the granule files for that day.
|
|
|
|
|
|
STG knows how many granules to expect for a day (for example, 288 granules for a day of MODIS data). If the application is unable to process at least ⅔ of the expected number of files for a day, it will issue a warning and produce no data for that day. If you wish to run space gridding for a day with fewer than ⅔ of the expected files, you can use the “-p” or “--do_process_with_little_data” command line argument to bypass this check.
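A minimal sketch of this completeness check (the variable names and counts here are illustrative; 288 granules per day corresponds to MODIS's five-minute granules):

expected_granules = 288              # e.g. MODIS: one granule every five minutes
processed_granules = 150             # however many granule files were successfully read
do_process_with_little_data = False  # the -p / --do_process_with_little_data flag

if processed_granules < (2.0 / 3.0) * expected_granules and not do_process_with_little_data:
    print("WARNING: fewer than 2/3 of the expected granules were processed; "
          "skipping output for this day")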
|
|
|
|
|
|
The application transforms the data for each granule to fall onto an equal angle grid. The size of the grid cells is measured in latitude and longitude degrees and can be controlled with the “-g” or “--grid_degrees” command line argument. The default produces grid cells 0.5 degrees on a side.
|
|
|
|
|
|
The transformation from the latitude and longitude in the granule file to the equal angle grid is done using the following formulas.
|
|
|
|
|
|
lon_index = numpy.round((lon_data + 180.0) / grid_degrees) % (360.0 / grid_degrees)
|
|
|
lat_index = numpy.round((lat_data + 90.0) / grid_degrees) % (180.0 / grid_degrees)
|
|
|
|
|
|
lon_index and lat_index are the indices into the final equal angle grid, lat_data and lon_data are the input latitude and longitude values for each data point, and grid_degrees is the size of a grid cell in latitude and longitude degrees. Because the longitude indices are wrapped with a modulo, the algorithm handles longitudes in the 0 to 360 range as well as the -180 to 180 range.
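A runnable version of the same transformation, demonstrating that the two longitude conventions land in the same cell (the sample values are arbitrary):

import numpy

grid_degrees = 0.5  # the -g / --grid_degrees setting

# 359.9 (0-to-360 convention) and -0.1 (-180-to-180 convention) are the same longitude
lon_data = numpy.array([359.9, -0.1])
lat_data = numpy.array([45.2, 45.2])

lon_index = numpy.round((lon_data + 180.0) / grid_degrees) % (360.0 / grid_degrees)
lat_index = numpy.round((lat_data + 90.0) / grid_degrees) % (180.0 / grid_degrees)

print(lon_index)  # both conventions map to longitude cell 360
print(lat_index)  # both points map to latitude cell 270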
|
|
|
|
|
|
As it processes the granules, the space gridding algorithm will either keep only the data in a given grid cell that comes from the overpass with the smallest maximum sensor zenith angle, or keep all the data that falls into that cell (even if it comes from multiple overpasses). By default the algorithm limits each grid cell to a single overpass, but you can include data from all overpasses by specifying the “-m” or “--multiple_overpasses_per_cell” command line flag. Limiting data to a single overpass is especially useful in the polar regions, where successive orbits overlap heavily.
|
|
|
|
|
|
All data is limited by scan angle: any data with a scan angle greater than the angle specified by the “-a” or “--min_scan_angle” command line parameter (default 32 degrees) is discarded.
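A sketch of these two filters for a single grid cell; the names and data structure here are illustrative, not the application's actual internals:

import numpy

min_scan_angle = 32.0  # the -a / --min_scan_angle cutoff

def filter_cell(overpasses, multiple_overpasses_per_cell=False):
    # overpasses: list of (data, scan_angle, sensor_zenith) 1D arrays for one grid cell
    trimmed = []
    for data, scan, zenith in overpasses:
        keep = scan <= min_scan_angle          # discard wide scan angle measurements
        if numpy.any(keep):
            trimmed.append((data[keep], zenith[keep]))
    if not trimmed:
        return numpy.array([])
    if multiple_overpasses_per_cell:           # the -m flag: keep every overpass
        return numpy.concatenate([data for data, _ in trimmed])
    # default: keep the overpass whose largest sensor zenith angle is smallest
    best_data, _ = min(trimmed, key=lambda pair: pair[1].max())
    return best_data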
|
|
|
|
|
|
The space gridding algorithm further separates variable data by time of day. The current code for MODIS processing separates the data into four time sets: evening, night, morning, and afternoon.
|
|
|
|
|
|
evening data = (solar zenith angle > 85 degrees) and (local time >= 12)
|
|
|
night data = (solar zenith angle > 85 degrees) and (local time < 12)
|
|
|
morning data = (solar zenith angle <= 85 degrees) and (local time < 12)
|
|
|
afternoon data = (solar zenith angle <= 85 degrees) and (local time >= 12)
|
|
|
|
|
|
The local time is the number of hours on a 24-hour clock, calculated from the scan line times in the file as follows (for MODIS the input scan line time is seconds since 1993-01-01 00:00:00.0).
|
|
|
|
|
|
local time = ((scan line time / seconds per hour) + (longitude degrees * hours per degree longitude)) % 24 hours per day
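Putting the pieces together, a sketch of the local time calculation and the four MODIS time sets (the function and variable names are illustrative):

import numpy

SECONDS_PER_HOUR = 3600.0
HOURS_PER_DEGREE = 24.0 / 360.0  # one hour of local time per 15 degrees of longitude

def local_time_hours(scan_line_time, lon_data):
    # scan_line_time: seconds since 1993-01-01 00:00:00.0 (MODIS); lon_data: degrees east
    return ((scan_line_time / SECONDS_PER_HOUR) + (lon_data * HOURS_PER_DEGREE)) % 24.0

def time_set_masks(solar_zenith, local_time):
    dark = solar_zenith > 85.0
    pm = local_time >= 12.0
    return {"evening": dark & pm,
            "night": dark & ~pm,
            "morning": ~dark & ~pm,
            "afternoon": ~dark & pm}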
|
|
|
|
|
|
Local time is also used to ensure that data from different overpasses is not mixed in the same grid cell (unless you specify the “-m” or “--multiple_overpasses_per_cell” command line flag).
|
|
|
|
|
|
This calculation does not yet take into account granules that overlap the dateline; handling them is a planned future improvement.
|
|
|
|
|
|
The space gridding code creates two output variables for each input variable and time category: the number of observations and the gridded data values. Processing Cloud Top Pressure, for example, would create up to eight variables: two each for evening, night, morning, and afternoon. If no data is available for a variable during a time category, those output variables are not created.
|
|
|
|
|
|
The number of observations counts every time a data point from a granule would have fallen into a given grid cell (limited by overpass and scan angle as described above). This includes cases where the satellite did not record a data value for that point, and during time gridding it also includes cases where the associated data value does not fall into the current filtering category (for example, when you are selecting high CTP and this data point is low). The number of measurements counts how many finite data measurements actually fell into a grid cell for this variable.
|
|
|
|
|
|
This means that different grid cells will contain different amounts of data after space gridding. The space gridded data is therefore stored as a 3D array indexed as [measurement index, latitude cell index, longitude cell index]. Data values are packed from the start of the measurement dimension, and numpy NaN values fill the remaining depth in cells that have fewer measurements. The number of observations requires only one integer per grid cell, so it is stored as a 2D array indexed as [latitude cell index, longitude cell index].
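A sketch of this NaN packing, assuming the measurement dimension has already been sized to the deepest cell (the names and input structure are illustrative):

import numpy

def pack_measurements(values_by_cell, grid_shape, max_depth):
    # values_by_cell: {(lat_index, lon_index): 1D array of measurements}
    # grid_shape: (number of latitude cells, number of longitude cells) tuple
    packed = numpy.full((max_depth,) + grid_shape, numpy.nan)
    for (lat_i, lon_i), values in values_by_cell.items():
        packed[:len(values), lat_i, lon_i] = values  # unused depth stays NaN
    return packed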
|
|
|
|
|
|
### Time Gridding
|
|
|
|
|
|
Time gridding primarily involves calculating aggregated statistics for the space gridded data. Daily time gridding produces the following statistics for each grid cell (a sketch of these calculations follows the list):
|
|
|
|
|
|
cloud fraction (number of measurements / number of observations)
|
|
|
mean
|
|
|
number of measurements
|
|
|
number of observations
|
|
|
standard deviation
|
|
|
uncertainty (standard deviation / number of measurements)
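A sketch of these per-cell statistics computed from the space gridded arrays described earlier; data is the NaN-packed 3D array and nobs the 2D observation counts, and the uncertainty follows the definition above (standard deviation divided by the number of measurements):

import numpy

def daily_statistics(data, nobs):
    # data: NaN packed [measurement, lat, lon] array; nobs: [lat, lon] observation counts
    num_measurements = numpy.sum(numpy.isfinite(data), axis=0)
    mean = numpy.nanmean(data, axis=0)
    std_dev = numpy.nanstd(data, axis=0)
    cloud_fraction = num_measurements / numpy.asarray(nobs, dtype=numpy.float64)
    uncertainty = std_dev / num_measurements
    # cells with no measurements come out as NaN (with runtime warnings);
    # guards for those cases are omitted for brevity
    return cloud_fraction, mean, num_measurements, nobs, std_dev, uncertainty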
|
|
|
|
|
|
These statistics are saved in a separate variable for each combination of input variable, time set (morning/afternoon/evening/night), and any additional filtering category (described below).
|
|
|
|
|
|
Time gridding can be done for each day of space gridded data using the “stg time_gridding_day” command. The input directory should be the directory where your daily space gridded netCDF files are stored.
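For example, continuing with the placeholder paths used above:

stg time_gridding_day -i /data/stg_out -o /data/stg_time_out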
|
|
|
|
|
|
Variable data read in for daily time gridding is additionally filtered into sets based on value ranges specified for some variables. Variables currently filtered into sets include cloud top pressure (CTP) and cloud effective emissivity (CEFF); the ranges are listed below, followed by a sketch of the corresponding masks.
|
|
|
|
|
|
high CTP = variable CTP < 440
|
|
|
mid CTP = (variable CTP >= 440) and (variable CTP < 680)
|
|
|
low CTP = variable CTP >= 680
|
|
|
|
|
|
thin CEFF = variable CEFF < 0.5
|
|
|
thick CEFF = (variable CEFF >= 0.5) and (variable CEFF < 0.95)
|
|
|
opaque CEFF = variable CEFF >= 0.95
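These ranges translate directly into boolean masks over numpy arrays; the CTP thresholds are assumed to be in hPa:

def ctp_category_masks(ctp):
    # thresholds from the ranges above; CTP values assumed to be in hPa
    return {"high": ctp < 440.0,
            "mid": (ctp >= 440.0) & (ctp < 680.0),
            "low": ctp >= 680.0}

def ceff_category_masks(ceff):
    return {"thin": ceff < 0.5,
            "thick": (ceff >= 0.5) & (ceff < 0.95),
            "opaque": ceff >= 0.95}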
|
|
|
|
|
|
Data can also be limited during time gridding based on the number of observations that fell into a grid cell for that day. If you want to retain only data with more than a fixed number of observations, you can specify this threshold using the “-f” or “--fixed_nobs_cutoff” command line argument.
|
|
|
|
|
|
If you prefer a dynamic cutoff, you can use “-d” or “--dynamic_nobs_cutoff” to specify the fraction of the standard deviation to allow out from the mean, so 1.0 would be +/- one standard deviation from the mean. When using the dynamic cutoff you must also use the “-l” or “--nobs_lut” argument to point to a representative number of observations look up table. These look up tables can be generated from sets of space gridded files; at least one month's worth of space gridded files is preferred.
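A heavily hedged sketch of the two cutoffs; the exact comparison the dynamic cutoff makes against the look up table is an assumption here, and all names are illustrative:

import numpy

def nobs_keep_mask(nobs, fixed_cutoff=None, dynamic_fraction=None, nobs_lut=None):
    keep = numpy.ones(nobs.shape, dtype=bool)
    if fixed_cutoff is not None:        # -f / --fixed_nobs_cutoff
        keep &= nobs > fixed_cutoff
    if dynamic_fraction is not None:    # -d / --dynamic_nobs_cutoff with -l / --nobs_lut
        # assumed reading: keep cells whose nobs lies within the allowed band around
        # the representative LUT value, where 1.0 means +/- one standard deviation
        band = dynamic_fraction * numpy.nanstd(nobs_lut)
        keep &= numpy.abs(nobs - nobs_lut) <= band
    return keep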
|
|
|
|
|
|
Unless you indicate a fixed or dynamic cutoff in your command line call, no grid cells will be excluded by their associated number of observations.
|
|
|
|
|
|
Multi-day aggregated time gridding will be implemented in the future to produce longer term statistics; it will take the daily time gridded files as input.
|
|
|
|
|
|
### Instrument Specific Information
|
|
|
|
|
|
Individual instruments like MODIS or VIIRS are managed in the code with their own guidebook and IO management modules. Most of the filtering and time category examples above apply to the MODIS specific code; similar filtering and time categories will be implemented for other instruments in the future.
|
|
|
|
|
|
#### MODIS
|
|
|
|
|
|
TODO: separate out MODIS specific information once other instruments have been implemented.
|
|
|
|
|
|
### Other Testing Details and Future Plans
|
|
|
|
|
|
Testing of this code initially focused on MODIS Collection 6 Level 2 cloud products. The intent was to prioritize creating gridded output that would allow comparisons of the continuity of cloud products produced from different types of satellite data. Future plans include expanding the code to handle the anticipated PATMOS-x VIIRS and AVHRR/HIRS products.
|
|
|
|
|
|
The golden day chosen for our initial testing is March 29th, 2013. Most tests have been run on this day before attempting to process other daily sets.
|
|
|
|
|
|
---------
|
|
|
|
|
|
Below are some of the older notes on this project for reference:
|
|
|
|
|
|
Output: This process will be performed for a day on thunder, and after we take a look, we would request that the SIPS provide similar files over the entire MODIS data record. We envision this as a standalone process.
|
|
|
|
|
|
Specific Outputs:
|
|
|
|
|
|
Daily frequency (percent) of high thin, high thick, and opaque clouds from 5-km and 1-km data
|
|
|
Daily frequency (percent) of mid thin, mid thick, and opaque clouds from 5-km and 1-km data
|
|
|
Daily frequency (percent) of low thin, low thick, and opaque clouds from 5-km and 1-km data
|
|
|
Daily frequency of ice, water, and unknown cloud phase from the 1-km IR cloud phase product (do we want 5-km in here?)
|
|
|
Daily mean cloud fraction (percent) from 1-km and 5-km data
|
|
|
Daily mean cloud top pressure and height for high, mid, and low cloud categories from 5-km and 1-km data
|
|
|
Daily mean cloud effective emissivity for high, mid, and low cloud categories from 5-km and 1-km data
|
|
|
Weekly and Monthly means from above daily means
|
|
|
|
|
|
Filtering and Aggregation Rules:
|
|
|
|
|
|
use the SDS 'cloud_top_pressure_1km'
|
|
|
use the SDS 'cloud_top_height_1km'
|
|
|
use the SDS 'cloud_emissivity_1km'
|
|
|
|
|
|
use the SDS 'Cloud_Phase_Infrared_1km'
|
|
|
use 'Sensor_Zenith' (5-km data; the center 1-km pixel in each 5x5 block corresponds to the values in the 5-km geolocation SDS)
|
|
|
|
|
|
For the VIIRS files they would like to start with the following variables:
|
|
|
cld_height_acha
|
|
|
cld_press_acha
|
|
|
cld_emiss_acha
|
|
|
|