Reduce aggr mem usage
Denis has been running some VIIRS CLDPROP aggregations in the SIPS that use over 100GB of memory and take over 12 hours to run.
This MR includes two primary changes to aggregation:
- Instead of handling all output groups at once, it handles them in batches (of 2 groups at a time currently). This means we no longer have to allocate all output arrays simultaneously, which was the primary reason for the large memory footprint.
- Last time I investigated aggregation performance I put in an improvement that utilized Pixel_Counts to limit the read size on gridded granule files. This didn't help, however, when Pixel_Counts is not available, which is the case when only histograms are requested. This MR adds a global attribute to gridded granule files (called granule_edges) that indicates the subset of the global grid that actually contains data.
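
For reviewers who want the idea without reading the diff, here is a rough, standard-library-only sketch of how the two changes fit together. It is not the code in this MR. The layout of granule_edges as (row_start, row_stop, col_start, col_stop), the in-memory "granule" dicts, the group names, and every function name below are assumptions made for illustration.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def read_window(full_grid, edges):
    """Stand-in for a partial file read: return only the cells inside `edges`.

    `edges` is assumed to be (row_start, row_stop, col_start, col_stop),
    half-open. The real attribute layout may differ.
    """
    r0, r1, c0, c1 = edges
    return [row[c0:c1] for row in full_grid[r0:r1]]


def aggregate(group_names, granules, grid_shape, write_group, batch_size=2):
    """Sum every group over all granules, one batch of groups at a time.

    Returns the largest number of output arrays that were alive at once.
    """
    nrows, ncols = grid_shape
    peak = 0
    for batch in batched(group_names, batch_size):
        # Allocate output arrays for this batch only, not for every group.
        out = {name: [[0.0] * ncols for _ in range(nrows)] for name in batch}
        peak = max(peak, len(out))
        for granule in granules:
            edges = granule["granule_edges"]
            r0, _, c0, _ = edges
            for name in batch:
                # Read only the part of the grid that contains data.
                window = read_window(granule["groups"][name], edges)
                for i, row in enumerate(window):
                    target = out[name][r0 + i]
                    for j, value in enumerate(row):
                        target[c0 + j] += value
        for name in batch:
            write_group(name, out[name])
        # `out` is rebound on the next pass, so this batch can be freed.
    return peak


def full_grid(shape, edges, value):
    """Build a hypothetical granule grid that is zero outside `edges`."""
    nrows, ncols = shape
    r0, r1, c0, c1 = edges
    return [
        [value if r0 <= i < r1 and c0 <= j < c1 else 0.0 for j in range(ncols)]
        for i in range(nrows)
    ]


# Tiny demo with made-up group names and two overlapping granules.
shape = (4, 6)
names = ["group_a", "group_b", "group_c"]
granules = []
for edges, value in [((0, 2, 1, 3), 1.0), ((1, 3, 2, 5), 2.0)]:
    grid = full_grid(shape, edges, value)
    granules.append({"granule_edges": edges, "groups": {n: grid for n in names}})

written = {}
peak = aggregate(names, granules, shape, written.__setitem__)
print("peak output arrays alive:", peak)
```

With three groups and a batch size of 2, at most two output arrays exist at any time, and each granule read touches only the window given by its edges rather than the full global grid.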
I tested this change on a daily aggregation and memory usage was down under 10GB and run time was about 1.5 hours.