# yori issues
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues

---
**Issue #27: Minimum Valid Pixel/Days in D3/M3 for L3 Aerosol Products**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/27
Paolo Veglio, 2020-07-22

# Request
A two-layer filtering is requested by the Aerosol Team to produce M3 products with Yori:
1. A minimum number of `Pixel_Counts` in a given 1x1 grid cell is required to populate any Aerosol-related group in the D3 product
2. A minimum number of "valid days" in a given 1x1 grid cell is required to populate any Aerosol-related group in the M3 product
**Note:**
the minimum values for both `Pixel_Counts` and the "valid days" are subject to change, and the user should be allowed to set them.
# Implementation
The implementation of this change is split into two parts, in line with the two items listed above. All the changes will require the user to explicitly opt in to these features; this will allow us to add functionality to Yori while maintaining the same workflow for current users.
## Minimum `Pixel_Counts`
The minimum `Pixel_Counts` threshold will be enforced in the gridding phase. As such, an additional, optional line will be added to the configuration file to specify the minimum threshold.
The following example shows how this option can be added to the configuration file (names are not final):
```yaml
variable_settings:
  - name_in: Aerosol_Variable
    name_out: Aerosol_Variable_Out
    attributes:
      ...
    min_pixel_counts: 4
```
On the code side this should be a relatively easy feature to add. At the moment Yori does this filtering implicitly by populating grid cells with fill values whenever `Pixel_Counts = 0`. We can implement this functionality by passing `min_pixel_counts` to the `ComputeVariables` class in the `gridtools` module. Again, this parameter will default to zero, so the behavior of `yori-grid` won't change unless explicitly requested by the user.
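A rough sketch of how this filter could behave (the function and parameter names below are hypothetical, not the actual Yori API):

```python
import numpy as np

def apply_min_pixel_counts(mean, pixel_counts, min_pixel_counts=0,
                           fill_value=np.nan):
    """Blank out grid cells with fewer than min_pixel_counts pixels."""
    out = mean.copy()
    out[pixel_counts < min_pixel_counts] = fill_value
    return out

# 2x2 toy grid: with a threshold of 4, the first two cells are blanked
counts = np.array([[0, 2], [5, 7]])
means = np.array([[1.0, 2.0], [3.0, 4.0]])
filtered = apply_min_pixel_counts(means, counts, min_pixel_counts=4)
```

With the default `min_pixel_counts=0` no additional cells are blanked, so the current behavior (fill values only where `Pixel_Counts = 0`) is preserved.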
## Minimum "Valid Days"
*Valid Days* is defined, for a given grid cell, as the number of D3 files with more than zero `Pixel_Counts` over a one-month period.
The hard requirement for making the M3 products filtered by minimum valid days is having a month's worth of D3 products ready to be passed to `yori-aggr`.
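As a sketch of this bookkeeping (hypothetical function name, tiny grid for illustration):

```python
import numpy as np

def count_valid_days(daily_pixel_counts):
    """Count, per grid cell, the number of D3 files with Pixel_Counts > 0."""
    valid_days = np.zeros_like(daily_pixel_counts[0], dtype=int)
    for counts in daily_pixel_counts:
        valid_days += counts > 0
    return valid_days

# three "days" on a tiny 1x2 grid
days = [np.array([[0, 3]]), np.array([[1, 0]]), np.array([[2, 5]])]
valid = count_valid_days(days)
meets_criterion = valid >= 3  # e.g. a minimum of 3 valid days
```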
During the aggregation from D3 to M3, a temporary 360x180 `valid_days` array is initialized with zeros for each variable that requires the "minimum valid days" filtering. Iterating over the D3 files, a one is added to every grid cell that is not empty. At the end of the 30-day aggregation, each grid cell in the `valid_days` array holds a value between 0 and 30. This array is used to determine whether a given grid cell meets the criterion, and it is deleted from the final M3 file. This is basically what Paul also suggested, but I'm still thinking about a better alternative.

---
**Issue #26: QA-weighted Statistics**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/26
Paolo Veglio, 2020-06-03

# Goal
Compute QA-weighted means and standard deviations and propagate them during the aggregation process. Ideally the end result should look something like the sample ncdump below:
```
group: Aerosol_Variable {
  variables:
    double Mean(longitude, latitude) ;
    double Standard_Deviation(longitude, latitude) ;
    double Sum(longitude, latitude) ;
    int Pixel_Counts(longitude, latitude) ;
    double Sum_Squares(longitude, latitude) ;
    double QA_Mean(longitude, latitude) ;                // new variable
    double QA_Standard_Deviation(longitude, latitude) ;  // new variable
    double Weighted_Sum(longitude, latitude) ;           // new variable, required for aggregation
    int Sum_of_Weights(longitude, latitude) ;            // new variable, required for aggregation
    double Weighted_Sum_Squares(longitude, latitude) ;   // new variable, required for aggregation
    ...
} // group Aerosol_Variable
```
**Changes to the Configuration File**
- The configuration file will have an additional option to turn on the computation of the QA-weighted quantities (default off):
```yaml
variable_settings:
  - name_in: aerosol_variable
    name_out: aerosol_var_out_with_qa
    compute_QA: QA_weights
    attributes:
      ...
```
If `compute_QA` is not defined, the code runs as usual; if it is defined, the name of the variable that contains the weights needs to be specified (e.g. `QA_weights`). Names are not final.
**Changes to the Code:**
In `yori-grid` the QA-weighted quantities are computed like the normal statistics. Weighted sum, weighted sum of squares and sum of weights have to be saved for the aggregation. An additional function to compute these quantities will be added in `gridtools.py` and called when requested by the configuration file.
In `yori-aggr` the QA-weighted quantities are aggregated if the weighted sum, weighted sum of squares and sum of weights are found. A function to propagate the QA-weighted quantities from the gridded files will be added to `yori_aggregate.py`, along with a conditional to run the new function when the "QA quantities" are found in the gridded files. Depending on how the code ends up looking, I might restructure the `yori_aggregate` module to be more readable (e.g. move utility functions to a separate module and leave only the aggregation logic).
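To illustrate the aggregation step, here is a sketch (not the planned implementation) of how the three saved quantities combine across gridded files, using the variable names from the sample ncdump above:

```python
import numpy as np

def aggregate_qa(files):
    """Combine per-file QA quantities and derive the weighted statistics.

    files: iterable of dicts holding 'Sum_of_Weights', 'Weighted_Sum'
    and 'Weighted_Sum_Squares' arrays for one variable.
    """
    W = sum(f['Sum_of_Weights'] for f in files)
    Sw = sum(f['Weighted_Sum'] for f in files)
    SSw = sum(f['Weighted_Sum_Squares'] for f in files)
    qa_mean = Sw / W
    qa_std = np.sqrt(SSw / W - qa_mean**2)
    return qa_mean, qa_std

# two single-cell "files"
f1 = {'Sum_of_Weights': np.array([2.0]), 'Weighted_Sum': np.array([4.0]),
      'Weighted_Sum_Squares': np.array([10.0])}
f2 = {'Sum_of_Weights': np.array([3.0]), 'Weighted_Sum': np.array([9.0]),
      'Weighted_Sum_Squares': np.array([29.0])}
qa_mean, qa_std = aggregate_qa([f1, f2])
```

The per-file arrays simply add, which is why only these three quantities need to be carried through the aggregation.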
# On the Computation of the QA-weighted quantities
## Recap of the Math Behind Yori
Let's define the **sum** of a given variable, that is, the sum of its values $`v_j`$ within a grid cell $`c`$, as:
```math
S_c = \sum_j v_j
```
and similarly the **sum of squares**:
```math
\textit{SS}_c = \sum_j v_j ^2
```
where $`j`$ is the $`j`$-th pixel and $`v_j`$ is the quantity under consideration for that pixel.
With these quantities we can compute the **mean** $`M`$ as:
```math
M_c = \frac{\sum_i S_{c,i}}{\sum_i N_{c,i}}
```
with $`c`$ and $`i`$ the $`c`$-th cell and the $`i`$-th file respectively, and $`N_{c,i}`$ the number of points in cell $`c`$ from file $`i`$. With the same indexing logic, the **standard deviation** is derived from:
```math
\textit{STD}_c = \left[\frac{\sum_i \textit{SS}_{c,i}}{\sum_i N_{c,i}} - \left(\frac{\sum_i S_{c,i}}{\sum_i N_{c,i}}\right)^2\right]^{1/2}
```
## Computation of the QA-weighted Quantities
For the QA-weighted quantities we can use the same approach as before, with some minor changes to account for the weights.
First let's define the general formulas for the weighted mean and the standard deviation:
##### Weighted Mean:
```math
M_w = \frac{\sum_j w_j v_j}{\sum_j w_j}
```
##### Weighted Standard Deviation:
```math
\textit{STD}_w = \left( \frac{\sum_j w_j (v_j - \bar{v})^2}{\sum_j w_j} \right)^{1/2}
```
where $`w_j`$ is the weight of the $`j`$-th pixel and $`\bar{v}`$ is the mean value of all $`v_j`$.
Now, in order to propagate the weighted mean and standard deviation we need three additional quantities in the same way we did for the unweighted mean and standard deviation. Let's define these three additional quantities as follows:
* **sum of weights**:
```math
W = \sum_j w_j
```
* **weighted sum**:
```math
S_w = \sum_j w_j v_j
```
* **weighted sum of squares** (note: the *raw* weighted second moment, not the centered one, so that it can be accumulated across files without knowing $`\bar{v}`$ in advance):
```math
\textit{SS}_w = \sum_j w_j v_j ^2
```
Now we can use $`W`$, $`S_w`$ and $`\textit{SS}_w`$ to compute the weighted mean $`M_w`$ and standard deviation $`\textit{STD}_w`$:
```math
M_w = \frac{S_w}{W}
```
```math
\textit{STD}_w = \left[\frac{\sum_i \textit{SS}_{w,i}}{\sum_i W_i} - \left(\frac{\sum_i S_{w,i}}{\sum_i W_i}\right)^2\right]^{1/2}
```
Note that in the previous two equations the subscript $`c`$ has been dropped for readability but these are still per-cell quantities.
## Test
Here is a code snippet to verify the formulas above:
```python
import numpy as np


def main():
    # create fake data from two "files"
    scale = np.random.randint(1, 100)
    data1 = np.random.random(500)*scale
    data2 = np.random.random(300)*scale
    # create the weight arrays (e.g. QA values between 0 and 3)
    w1 = np.random.randint(4, size=len(data1))
    w2 = np.random.randint(4, size=len(data2))
    # merge the two arrays to compute the reference values directly
    alldata = np.append(data1, data2)
    allW = np.append(w1, w2)
    # per-file sum of weights, weighted sum and weighted sum of squares
    W1 = np.sum(w1)
    W2 = np.sum(w2)
    ws1 = np.sum(w1*data1)
    ws2 = np.sum(w2*data2)
    wss1 = np.sum(w1*data1**2)
    wss2 = np.sum(w2*data2**2)
    # reference values computed on the merged data
    reference_mean = np.sum(allW*alldata)/np.sum(allW)
    reference_std = np.sqrt(np.sum(allW*(alldata - reference_mean)**2)/np.sum(allW))
    # weighted mean and standard deviation with the formulas defined above
    qa_mean = (ws1 + ws2)/(W1 + W2)
    qa_std = np.sqrt((wss1 + wss2)/(W1 + W2) - ((ws1 + ws2)/(W1 + W2))**2)
    # compare results
    print('Means are the same: {}'.format(np.isclose(reference_mean, qa_mean)))
    print('Stds are the same: {}'.format(np.isclose(reference_std, qa_std)))


if __name__ == '__main__':
    main()
```

---
**Issue #22: Greater than 2-D data throws error on histogram**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/22
Ethan Nelson (ethan.nelson@ssec.wisc.edu), 2019-09-30

When trying to histogram data with more than two dimensions, yori raises an error:
```
Traceback (most recent call last):
File "monitor.py", line 71, in yori_aggregate
callYori('%s-settings.yaml' % product, new_f, new_grid_f)
File "/home/enelson/venv/lib/python3.6/site-packages/yori-1.3.11.dev0+g7e044fb.d20190927-py3.6.egg/yori/run_yori.py", line 187, in callYori
Yori.runYori(debug, compression)
File "/home/enelson/venv/lib/python3.6/site-packages/yori-1.3.11.dev0+g7e044fb.d20190927-py3.6.egg/yori/run_yori.py", line 149, in runYori
tmp_gridvar = myGrid.compute_stats(masked_data_in, var_name_out)
File "/home/enelson/venv/lib/python3.6/site-packages/yori-1.3.11.dev0+g7e044fb.d20190927-py3.6.egg/yori/gridtools.py", line 100, in compute_stats
var = reformat_vector(var)
File "/home/enelson/venv/lib/python3.6/site-packages/yori-1.3.11.dev0+g7e044fb.d20190927-py3.6.egg/yori/gridtools.py", line 251, in reformat_vector
vec = np.reshape(vec, np.shape(vec)[0]*np.shape(vec)[1])
File "<__array_function__ internals>", line 6, in reshape
File "/home/enelson/venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 301, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "/home/enelson/venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
return bound(*args, **kwds)
ValueError: cannot reshape array of size 1131200 into shape (2828,)
```
This error stems from a hard-coded assumption of data shape as 2-D in the `reformat_vector` function. (It would have actually been raised sooner in execution, https://gitlab.ssec.wisc.edu/pveglio/yori/blob/master/yori/gridtools.py#L23, except I formatted my input lat/lon arrays incorrectly to be 2-D instead of 3-D.)
One path forward may be to change [this line](https://gitlab.ssec.wisc.edu/pveglio/yori/blob/master/yori/gridtools.py#L251) in the `reformat_vector` function to use [`numpy.ndarray.flatten`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html) to handle arrays of higher dimension. The program would then need to make sure that the lat/lon and data variables have the same length after flattening (https://gitlab.ssec.wisc.edu/pveglio/yori/blob/master/yori/gridtools.py#L103).
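A minimal sketch of that change (standalone function for illustration, not the actual Yori code):

```python
import numpy as np

def reformat_vector(vec):
    # flatten arrays of any dimensionality to 1-D, replacing the
    # hard-coded 2-D reshape that raises the error above
    return np.asarray(vec).flatten()

data = np.zeros((40, 70, 2))  # e.g. a 3-D variable such as (y, x, band)
flat = reformat_vector(data)
```

The 2-D case still behaves as before, since flattening a 2-D array is equivalent to the existing reshape.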
Another may be to enforce a dimension limit in the program by checking that all inputs are 2-D early in the execution. I don't know how that aligns with the program philosophy; I can see a case of someone wanting to grid data with shape `(time, lat, lon)`.

---
**Issue #20: daily aggregation vs C6?**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/20
Steve Dutcher, 2019-03-27

I believe this is the current logic we use for daily aggregation in Yori:
```
180W to 90W: use 3z on target day to 3z on next day
90W to 0W: use 21z on previous day to 21z on target day
0E to 90E: use 3z on target day to 3z on next day
90E to 180E: use 21z on previous day to 21z on target day
```
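For reference, the four zones above can be sketched as a window selector (hypothetical function; hours are UTC, expressed relative to 00z of the target day):

```python
def daily_window(lon):
    """Return (start_hour, end_hour) of the current Yori daily window."""
    if -180 <= lon < -90 or 0 <= lon < 90:
        return (3, 27)    # 3z on target day to 3z on next day
    return (-3, 21)       # 21z on previous day to 21z on target day
```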
On the PLAT-33 Jira ticket, Paul Hubanks said this is the logic they use for C6:
```
For Aqua
Longitude Zone [-180 to -90]
Aqua Late Day Only: 24 hours starting at 03:00 GMT
Longitude Zone [-90 to -0]
Aqua Standard Day: 24 hours starting at 00:00 GMT
Longitude Zone [0 to 90]
Aqua Late Day Only: 24 hours starting at 03:00 GMT
Longitude Zone [90 to 180]
Aqua Standard Day: 24 hours starting at 00:00 GMT
For Terra
Longitude Zone [-180 to -90]
Terra Standard Day: 24 hours starting at 00:00 GMT
Longitude Zone [-90 to -0]
Terra Early Day Only: 24 hours ending at 21:00 GMT
Longitude Zone [0 to 90]
Terra Standard Day: 24 hours starting at 00:00 GMT
Longitude Zone [90 to 180]
Terra Early Day Only: 24 hours ending at 21:00 GMT
```
So for Aqua they use the same zones as we do for 1 & 3, and for Terra they use the same zones as we do for 2 & 4. The question we have is whether our aggregation produces a similar result; in other words, does using all four zones for both satellites cause differences in the product? I know there will be differences for granules near the poles, but for granules near the equator I suspect not.
What I would suggest is that you make a test branch of Yori with two new command-line arguments, something like `aqua_daily` and `terra_daily`, that use the logic above. Then create an L3 product for both Aqua and Terra using both our current daily aggregation and the corresponding new `aqua_daily` and `terra_daily` options. You should be able to validate the result by gridding up observation time and then comparing the two datasets.
Paolo,
Do you have the free cycles to look into this?

---
**Issue #14: Multiple overpasses**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/14
Paolo Veglio, 2017-11-07

Bryan asked if there's a way of dealing with multiple overpasses.
Specifically, how to solve the problem of having data from multiple orbits in the same grid cell.
The goal would be to have a grid cell filled with only one overpass per day, so that it isn't filled with data hours apart where the conditions might have changed.
edit:
just noticed that this is the same issue #5 that Steve opened a while ago. I'll leave it here for now, so that we know there's more than one person asking for this feature.

---
**Issue #13: Thresholding on aggregation based on number of valid pixels per grid cell**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/13
Paolo Veglio, 2017-11-02

Another request from Andy Sayer:
> is there any ability to implement thresholding? If not, is this something which is easy to add?
> For example, in the MODIS level 3 aerosol products, at least 3 (I think) retrievals are needed for a daily 1 degree grid cell to be considered valid, and at least 3 days are needed for a monthly grid cell to be considered valid.

---
**Issue #8: Groups I/O**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/8
Paolo Veglio, 2017-10-26

At the moment Yori doesn't support any kind of groups on either the input or the output side.
We could change the code so that the users can decide to save their products in groups as well, or we can leave it as it is.

---
**Issue #7: Inverse Masks**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/7
Paolo Veglio, 2017-12-08

Do we want to support the use of inverse masks?
Pros:
- save some space for the input files
- Yori can handle them with some minor tweaks
Cons:
- limited use
- unnecessary "complication" of the code

---
**Issue #4: Support MOD08 features**
https://gitlab.ssec.wisc.edu/pveglio/yori/-/issues/4
Greg Quinn (greg.quinn@ssec.wisc.edu), 2017-11-01

I've been familiarizing myself with the various types of gridding operations supported in the MOD08 MODIS product. This ticket is meant as a place to list and discuss what features Yori is missing that we'd need in order to be able to reproduce MOD08. There are still some MOD08 constructs I need to spend a bit more time on to fully understand, but here's what I've uncovered so far:
* **Minimum / maximum** So far we've just talked about storing `count`, `sum`, `sum_of_squares`, `mean`, and `stdev`.
* **QA weighting** MOD08 allows variables to be gridded such that higher-quality observations/retrievals are given higher weighting in the statistics. One way I could see us supporting that sort of thing is by extending the mask concept from a simple boolean 0/1 to a floating-point array of weights.
* **Log stats** For some variables (e.g., optical depths) MOD08 takes stats of the logarithm of the value instead of the value itself. Maybe we should support this as well, though I could also see placing the responsibility on the user to store logarithms in the input NetCDF file before invoking Yori.
* **Joint / multivariate histograms** We've talked a bit about this one already, but it's still not clear how we're going to support it.