Parallel Segments (pseg) Demo
This demo specifies a clavrx_options and a file_list.
You can change these however you like; the input does not even need to be GOES data.
I recommend first testing your configuration by running ./clavrxorb on its own.
This demo runs as fast as possible, meaning that there is no limit to the number of workers created.
Therefore you should only run it on a powerful machine with lots of memory (>50GB).
Also try tuning the segment size (scan_lines) in clavrx_options.
Smaller segments mean more workers running in parallel, but also more memory used.
Dependencies
This demo is very light on dependencies, but requires a non-ancient Python version (>=3.7) and numpy.
The other major dependency is clavrx, which must support tracing and reopening the netcdf every segment. This was implemented in commit c1809117028f on 2022-11-14, so any newer version of clavrx is supported.
This demo also requires the nm tool to read the symbol table of the executable, though the values could be hardcoded if necessary.
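As a minimal sketch of the symbol lookup, assuming the default nm output of one `ADDRESS TYPE NAME` triple per line. The helper names and the sample symbols are hypothetical and are not taken from run_pseg.py.

```python
import subprocess


def parse_nm_output(text):
    """Map symbol name -> address from nm lines of the form 'ADDRESS TYPE NAME'."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column ("  U name"), so skip them.
        if len(parts) != 3:
            continue
        address, _kind, name = parts
        try:
            symbols[name] = int(address, 16)
        except ValueError:
            # Not a hexadecimal address; ignore the line.
            continue
    return symbols


def read_symbols(executable="./clavrxorb"):
    """Run nm on the executable and return its defined symbols."""
    result = subprocess.run(
        ["nm", executable], check=True, capture_output=True, text=True
    )
    return parse_nm_output(result.stdout)
```

If nm is not available, the dictionary returned by `read_symbols` is the kind of value that could be hardcoded instead.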
Invocation
Parallel Segments
python run_pseg.py is a wrapper that executes ./clavrxorb.
On SSEC machines the basic miniconda environment satisfies all requirements:
module load miniconda/3.7-base
/usr/bin/time -v python run_pseg.py
Pseg will redirect clavrx output to files in the cwd and periodically display the process tree to help monitor progress.
Normal clavrx
The clavrx executable can be used normally too:
./clavrxorb
Performance
Note that memory usage is difficult to measure in the parallel case. You need cgroups to accurately measure the proportional set size (PSS), which is what slurm checks. For the most part you can add up the resident set size (RSS) of all the concurrent clavrxorb processes.
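The RSS sum can be approximated by reading /proc. This is a minimal sketch, assuming a Linux system; the helper names are hypothetical and not part of this demo. Pages shared between forked clones are counted once per process in RSS, so treat the total as an upper bound.

```python
import os


def rss_kib_from_status(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # Kernel threads and exited processes have no VmRSS line.


def total_rss_kib(name="clavrxorb"):
    """Sum VmRSS over every running process whose command name matches."""
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != name:
                    continue
            with open(f"/proc/{pid}/status") as f:
                total += rss_kib_from_status(f.read())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return total
```

Calling `total_rss_kib()` periodically while run_pseg.py is running gives a rough picture of the peak.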
Parallel Segments
Command being timed: "python run_pseg.py"
User time (seconds): 2097.66
System time (seconds): 326.85
Percent of CPU this job got: 660%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06.94
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 7512888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 4
Minor (reclaiming a frame) page faults: 91394621
Voluntary context switches: 35157
Involuntary context switches: 4777
Swaps: 0
File system inputs: 15332856
File system outputs: 649152
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Normal
Command being timed: "./clavrxorb"
User time (seconds): 1850.83
System time (seconds): 35.67
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 32:28.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 12181464
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 10085835
Voluntary context switches: 10582
Involuntary context switches: 1286
Swaps: 0
File system inputs: 4638016
File system outputs: 568936
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
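To compare the two runs, the elapsed and CPU times reported above can be reduced to two ratios. This is a small worked calculation using only the numbers in the listings; the function name is hypothetical.

```python
def elapsed_seconds(text):
    """Convert an 'h:mm:ss' or 'm:ss' string from time -v into seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


parallel_wall = elapsed_seconds("6:06.94")
normal_wall = elapsed_seconds("32:28.49")
speedup = normal_wall / parallel_wall

# Total CPU time is user time plus system time.
parallel_cpu = 2097.66 + 326.85
normal_cpu = 1850.83 + 35.67
cpu_ratio = parallel_cpu / normal_cpu

print(f"wall clock speedup: {speedup:.2f}x")   # 5.31x
print(f"CPU time ratio:     {cpu_ratio:.2f}x")  # 1.29x
```

In this run the parallel version finished about 5.3 times faster in wall clock time while using about 1.29 times the total CPU time.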
How it works
If clavrx is executed with the environment variable CLAVRX_ENABLE_TRACER=1, it runs in "tracing mode".
This means that at certain points during the processing it will stop, and another process must signal it with SIGCONT.
Another environment variable, CLAVRX_TRACER_CLONES=N, makes clavrx create N clones for every segment it processes.
This happens after the L1b data is read in, but before processing starts.
So, with clavrx in tracing mode and set to make a single clone every segment, we can dedicate a single clone to reading the L1b and fork a worker clone for each segment in parallel.
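The stop-and-continue handshake can be sketched as follows. This is an illustration only: the two environment variables come from the description above, but the helper names are hypothetical, and a small Python child that stops itself with SIGSTOP stands in for ./clavrxorb, since how clavrx stops and how run_pseg.py detects it are not described here. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def tracer_env(clones=1):
    """Copy the current environment and enable tracing mode."""
    env = dict(os.environ)
    env["CLAVRX_ENABLE_TRACER"] = "1"
    env["CLAVRX_TRACER_CLONES"] = str(clones)
    return env


def wait_until_stopped(pid):
    """Block until direct child `pid` stops; return False if it exited instead."""
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume(pid):
    """Let a stopped process continue."""
    os.kill(pid, signal.SIGCONT)


# Stand-in for ./clavrxorb (an assumption): stops itself, then prints when resumed.
STAND_IN = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSTOP); print('resumed')",
]


def demo():
    proc = subprocess.Popen(
        STAND_IN, env=tracer_env(), stdout=subprocess.PIPE, text=True
    )
    if wait_until_stopped(proc.pid):
        resume(proc.pid)
    out, _ = proc.communicate()
    return out.strip()
```

A real controller would repeat the wait and resume steps for every stop point and would have to track the clones as well, which are not direct children of the controlling process.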
Slides
Parallel Segments with Clavrx Tracer
How the CLAVR-x Tracer System Works
More Notes
- the responsibility of writing the netcdf has returned to clavrx