
Reduce aggr mem usage

Greg Quinn requested to merge reduce-aggr-mem-usage into master

Denis has been running some VIIRS CLDPROP aggregations in the SIPS that use over 100 GB of memory and take over 12 hours to run.

This MR includes two primary changes to aggregation:

  1. Instead of handling all output groups at once, it handles them in batches (currently 2 groups at a time). This means we no longer have to allocate all output arrays simultaneously, which was the primary reason for the large memory footprint.

  2. The last time I investigated aggregation performance, I added an improvement that uses Pixel_Counts to limit the read size on gridded granule files. That doesn't help when Pixel_Counts is not available, which is the case when only histograms are requested. This MR adds a global attribute to gridded granule files (called granule_edges) that indicates the subset of the global grid that actually contains data, so reads can be limited even without Pixel_Counts.
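
For reviewers, here is a rough sketch of how the two changes fit together. It is not the code in this MR: the function names (`batched`, `data_edges`, `aggregate`), the in-memory dict standing in for a granule file, and the `(row_start, row_stop, col_start, col_stop)` layout for granule_edges are all assumptions made for illustration.

```python
import numpy as np


def batched(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def data_edges(mask):
    """Bounding box of the True cells in a 2-D mask.

    Returns (row_start, row_stop, col_start, col_stop) with exclusive stops,
    or None when the mask is empty. This tuple layout is an assumption; the
    actual granule_edges attribute may be encoded differently.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def aggregate(granules, group_names, grid_shape, batch_size=2):
    """Sum granule grids per output group, one batch of groups at a time.

    Each granule is a hypothetical stand-in for a gridded granule file:
    {"edges": (r0, r1, c0, c1), "vars": {group_name: full-grid 2-D array}}.
    Returns the grand total per group so the sketch stays small.
    """
    totals = {}
    for batch in batched(group_names, batch_size):
        # Only the output arrays for this batch exist at the same time.
        out = {name: np.zeros(grid_shape) for name in batch}
        for granule in granules:
            r0, r1, c0, c1 = granule["edges"]
            for name in batch:
                # Slicing stands in for a partial read of the on-disk variable:
                # only the region that holds data is touched.
                out[name][r0:r1, c0:c1] += granule["vars"][name][r0:r1, c0:c1]
        # A real run would write the batch to the output file here.
        for name, arr in out.items():
            totals[name] = float(arr.sum())
        del out  # release this batch before allocating the next one
    return totals
```

With this shape, peak memory for output arrays scales with `batch_size` rather than with the total number of groups, and the per-granule read scales with the size of the data-bearing region rather than the global grid.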

I tested this change on a daily aggregation: memory usage dropped to under 10 GB and run time was about 1.5 hours.
