Parallel Segments (pseg) Demo
=================

This demo specifies a `clavrx_options` and a `file_list`.  
You can change these however you like; the input doesn't even need to be GOES.  
I recommend first testing your configuration by running `./clavrxorb` directly.

This demo runs as fast as possible, meaning that there is no limit on the number of workers created.
Therefore you should only run it on a powerful machine with lots of memory (>50GB).  
Also try tuning the segment size (`scan_lines`) in `clavrx_options`.
Smaller segments mean more workers running in parallel, but also more memory used.
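
As a rough illustration of that trade-off, the sketch below estimates how many workers a given `scan_lines` setting produces, assuming one worker per segment. The image height of 5424 lines is a hypothetical placeholder, not a value taken from this demo; substitute the height of your own input.

```python
import math

def segment_count(total_scan_lines: int, scan_lines: int) -> int:
    """Number of segments (and so workers) needed to cover the whole image."""
    if scan_lines <= 0:
        raise ValueError("scan_lines must be positive")
    return math.ceil(total_scan_lines / scan_lines)

# Hypothetical image height; replace with the value for your own input files.
TOTAL_SCAN_LINES = 5424

for scan_lines in (100, 200, 400):
    workers = segment_count(TOTAL_SCAN_LINES, scan_lines)
    print(f"scan_lines={scan_lines}: {workers} workers")
```

Halving `scan_lines` roughly doubles the worker count, so memory use should be expected to grow accordingly.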


Dependencies
------------

This demo is very light on dependencies, but requires a non-ancient Python version (>=3.7) and numpy.  

The other major dependency is clavrx, which must support tracing and reopening the netcdf every segment. 
This was implemented in commit c1809117028f on 2022-11-14, so any newer version of clavrx is supported.

This demo also requires the `nm` tool to read the symbol table of the executable, though the values could be hardcoded if necessary.
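
For reference, here is a minimal sketch of reading symbol addresses with `nm` from Python. It is not the code used by `run_pseg.py`, and the symbol names in the sample (`main`, `counter`) are placeholders, since the symbols the demo actually looks up are not listed here.

```python
import subprocess

def parse_nm_output(text: str) -> dict:
    """Map symbol name -> address from `nm` lines like '0000000000401136 T main'."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column ("U printf"), so skip them.
        if len(parts) != 3:
            continue
        address, _kind, name = parts
        symbols[name] = int(address, 16)
    return symbols

def read_symbols(executable: str) -> dict:
    """Run `nm` on an executable and return its defined symbols."""
    result = subprocess.run(
        ["nm", executable], capture_output=True, text=True, check=True
    )
    return parse_nm_output(result.stdout)

if __name__ == "__main__":
    # Placeholder sample in the default `nm` output format.
    sample = (
        "0000000000401136 T main\n"
        "                 U printf\n"
        "0000000000404028 B counter\n"
    )
    print(parse_nm_output(sample))
```

If `nm` is unavailable, the addresses returned by `read_symbols("./clavrxorb")` on another machine with the same binary are the kind of values that could be hardcoded instead.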

Invocation
----------


### Parallel Segments

`python run_pseg.py` is a wrapper that executes `./clavrxorb`.

On SSEC machines, the basic miniconda environment satisfies all requirements:

`module load miniconda/3.7-base`

`/usr/bin/time -v python run_pseg.py`

Pseg will redirect clavrx output to files in the cwd and periodically display the process tree to help monitor progress.

### Normal clavrx

The clavrx executable can be used normally too:

`./clavrxorb`


Performance
-------------

Note that memory usage is difficult to measure in the parallel case.
You need cgroups to accurately measure the proportional set size (PSS), which is what slurm checks.
For a rough estimate, you can add up the resident set size (RSS) of all the concurrent `clavrxorb` processes.
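
The RSS sum can be sketched as below. This is Linux-only, since it reads `/proc/<pid>/status`, and it assumes the workers show up under the process name `clavrxorb`. Because forked processes share pages, and RSS counts shared pages once per process, treat the total as an upper bound rather than an exact figure.

```python
import os
import re

def parse_vmrss_kb(status_text: str) -> int:
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status; 0 if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0

def total_rss_kb(process_name: str = "clavrxorb") -> int:
    """Sum VmRSS over all running processes with the given name."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as status_file:
                text = status_file.read()
        except OSError:
            continue  # The process exited while we were scanning.
        name = re.search(r"^Name:\s+(\S+)", text, re.MULTILINE)
        if name and name.group(1) == process_name:
            total += parse_vmrss_kb(text)
    return total

if __name__ == "__main__":
    print(f"clavrxorb total RSS: {total_rss_kb() / 1024 / 1024:.2f} GB")
```

Running this in a loop while `run_pseg.py` is active gives a rough picture of how memory grows as workers are created.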

### Parallel Segments

```
        Command being timed: "python run_pseg.py"
        User time (seconds): 2097.66
        System time (seconds): 326.85
        Percent of CPU this job got: 660%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06.94
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 7512888
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 4
        Minor (reclaiming a frame) page faults: 91394621
        Voluntary context switches: 35157
        Involuntary context switches: 4777
        Swaps: 0
        File system inputs: 15332856
        File system outputs: 649152
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
```


### Normal

```
        Command being timed: "./clavrxorb"
        User time (seconds): 1850.83
        System time (seconds): 35.67
        Percent of CPU this job got: 96%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 32:28.49
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 12181464
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 10085835
        Voluntary context switches: 10582
        Involuntary context switches: 1286
        Swaps: 0
        File system inputs: 4638016
        File system outputs: 568936
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
```
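
To compare the two runs, the short sketch below converts the elapsed times above to seconds and computes the ratios. The numbers are copied from the two `/usr/bin/time -v` outputs; only the helper name `elapsed_to_seconds` is new.

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert GNU time's 'h:mm:ss' or 'm:ss.ss' elapsed format to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# Wall-clock times from the two runs above.
pseg_wall = elapsed_to_seconds("6:06.94")
normal_wall = elapsed_to_seconds("32:28.49")

# Total CPU time is user time plus system time.
pseg_cpu = 2097.66 + 326.85
normal_cpu = 1850.83 + 35.67

print(f"wall-clock speedup: {normal_wall / pseg_wall:.2f}x")
print(f"CPU time ratio:     {pseg_cpu / normal_cpu:.2f}x")
```

For these two runs, that works out to roughly a 5.3x wall-clock speedup at the cost of roughly 1.3x the total CPU time.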

How it works
------------

If clavrx is executed with the environment variable `CLAVRX_ENABLE_TRACER=1`, it runs in "tracing mode".
This means that at certain points during processing it will stop, and another process must signal it with SIGCONT.
Another environment variable, `CLAVRX_TRACER_CLONES=N`, makes clavrx create N clones every segment.
This happens after the L1b data is read in, but before processing starts.
So, with clavrx in tracing mode and set to make a single clone every segment, we can dedicate a single clone to reading the L1b and fork a worker clone for each segment, with the workers running in parallel.
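
The stop-and-resume handshake can be illustrated with a small sketch. This is not the logic in `run_pseg.py`: it assumes the traced process stops itself in a way its parent can observe with `waitpid` (for example by raising SIGSTOP), it uses a stand-in child instead of clavrx, and it does not attempt to manage per-segment clones. The name `run_and_resume` is made up for this example.

```python
import os
import signal
import subprocess
import sys

def run_and_resume(command, extra_env=None):
    """Start command, send SIGCONT each time it stops; return (stops, exit_code)."""
    env = dict(os.environ, **(extra_env or {}))
    child = subprocess.Popen(command, env=env)
    stops = 0
    while True:
        # WUNTRACED makes waitpid also report when the child has stopped.
        _pid, status = os.waitpid(child.pid, os.WUNTRACED)
        if os.WIFSTOPPED(status):
            stops += 1
            os.kill(child.pid, signal.SIGCONT)  # Let the child carry on.
            continue
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)  # Killed by a signal.
        child.returncode = code  # We reaped the child ourselves.
        return stops, code

# Stand-in for a traced process: it stops itself three times, then exits.
STAND_IN = [
    sys.executable,
    "-c",
    "import os, signal\n"
    "for _ in range(3):\n"
    "    os.kill(os.getpid(), signal.SIGSTOP)\n",
]

if __name__ == "__main__":
    print(run_and_resume(STAND_IN))
```

Under the same assumption, a call such as `run_and_resume(["./clavrxorb"], {"CLAVRX_ENABLE_TRACER": "1"})` would resume clavrx at each stop point, but a real driver also has to track the clones, which are children of clavrx rather than of the Python process.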

### Slides

[Parallel Segments with Clavrx Tracer](https://docs.google.com/presentation/d/1pV3MD5wOQUVJsM9lisW1UyKNGT1ygb3PtH6lDOQSMn0/edit?usp=sharing)  
[How the CLAVR-x Tracer System Works](https://docs.google.com/presentation/d/1tpFyA5_hQfRg3RB7-AelzXZnUhMfa5gMCqXDWc8-dRU/edit?usp=sharing)


More Notes
---------

* The responsibility for writing the netcdf has returned to clavrx.