Parallel Segments (pseg) Demo
=================
This demo specifies a `clavrx_options` and a `file_list`.
You can change these however you like; the input doesn't even need to be GOES data.
I recommend first testing your configuration by running `./clavrxorb` on its own.
This demo runs as fast as possible, meaning there is no limit on the number of workers created.
You should therefore only run it on a powerful machine with plenty of memory (>50 GB).
Also try tuning the segment size (`scan_lines`) in `clavrx_options`.
Smaller segments mean more workers running in parallel, but also more memory used.
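
As a rough illustration of the `scan_lines` trade-off, the sketch below counts how many segments (and so how many workers, assuming one worker per segment) a given segment size produces. The image height of 5424 lines and the function name `segment_count` are illustrative assumptions, not values read from this demo.

```python
import math


def segment_count(total_scan_lines, scan_lines):
    """Number of segments needed to cover an image of the given height."""
    return math.ceil(total_scan_lines / scan_lines)


if __name__ == "__main__":
    # 5424 lines is an assumed image height used purely for illustration.
    for scan_lines in (100, 200, 400):
        print(scan_lines, segment_count(5424, scan_lines))
```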
Dependencies
------------
This demo is very light on dependencies, but it requires a non-ancient Python version (>=3.7) and numpy.
The other major dependency is clavrx, which must support tracing and reopening the netCDF file for every segment.
This was implemented in commit c1809117028f on 2022-11-14, so that commit or any newer version of clavrx is supported.
This demo also requires the `nm` tool to read the symbol table of the executable, though the values could be hardcoded if necessary.
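
For reference, reading a symbol table with `nm` from Python might look like the sketch below. It is not taken from `run_pseg.py`; the function names are hypothetical, and it assumes the default `nm` output format of `address type name`, with undefined symbols carrying no address.

```python
import subprocess


def parse_nm_output(text):
    """Map symbol name -> address from default-format `nm` output."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column; skip them
        address, _symbol_type, name = parts
        symbols[name] = int(address, 16)
    return symbols


def read_symbols(executable):
    """Run `nm` on an executable and return its defined symbols."""
    result = subprocess.run(
        ["nm", executable], check=True, capture_output=True, text=True
    )
    return parse_nm_output(result.stdout)


if __name__ == "__main__":
    # Print a few defined symbols of the clavrx executable.
    for name, address in list(read_symbols("./clavrxorb").items())[:5]:
        print(hex(address), name)
```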
Invocation
----------
### Parallel Segments
`python run_pseg.py` is a wrapper that executes `./clavrxorb`.
On SSEC machines the basic miniconda environment satisfies all requirements:
`module load miniconda/3.7-base`
`/usr/bin/time -v python run_pseg.py`
Pseg redirects clavrx output to files in the current working directory and periodically displays the process tree to help monitor progress.
### Normal clavrx
The clavrx executable can also be run normally:
`./clavrxorb`
Performance
-------------
Note that memory usage is difficult to measure in the parallel case.
You need cgroups to accurately measure the proportional set size (PSS), which is what slurm checks.
For the most part you can add up the resident set size (RSS) of all the concurrent `clavrxorb` processes.
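
A minimal sketch of that RSS sum on Linux, reading `VmRSS` from `/proc/<pid>/status`, is shown below. The function names are hypothetical and this is not part of the demo. Because pages shared between forked processes are counted once per process, the sum should be read as an upper bound rather than an exact figure.

```python
import os
import re


def parse_vm_rss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status (0 if absent)."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def total_rss_kb(process_name="clavrxorb"):
    """Sum VmRSS over all running processes with the given name."""
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as status_file:
                text = status_file.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = re.search(r"^Name:\s+(\S+)", text, re.MULTILINE)
        if name and name.group(1) == process_name:
            total += parse_vm_rss_kb(text)
    return total


if __name__ == "__main__":
    print(f"clavrxorb total RSS: {total_rss_kb() / 1024 / 1024:.2f} GiB")
```

Running this in a loop while pseg is active gives a rough picture of peak memory use.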
### Parallel Segments
```
Command being timed: "python run_pseg.py"
User time (seconds): 2097.66
System time (seconds): 326.85
Percent of CPU this job got: 660%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06.94
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 7512888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 4
Minor (reclaiming a frame) page faults: 91394621
Voluntary context switches: 35157
Involuntary context switches: 4777
Swaps: 0
File system inputs: 15332856
File system outputs: 649152
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
### Normal
```
Command being timed: "./clavrxorb"
User time (seconds): 1850.83
System time (seconds): 35.67
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 32:28.49
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 12181464
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 10085835
Voluntary context switches: 10582
Involuntary context switches: 1286
Swaps: 0
File system inputs: 4638016
File system outputs: 568936
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
```
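
The two reports above can be compared directly. The short sketch below, with hypothetical helper names, converts the elapsed times to seconds and computes the wall-clock speedup and the ratio of total CPU time (user plus system); all numbers are copied from the `time -v` output above.

```python
def elapsed_to_seconds(text):
    """Convert an `h:mm:ss` or `m:ss` string from `time -v` to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


parallel_wall = elapsed_to_seconds("6:06.94")
normal_wall = elapsed_to_seconds("32:28.49")
speedup = normal_wall / parallel_wall

parallel_cpu = 2097.66 + 326.85  # user + system seconds, parallel run
normal_cpu = 1850.83 + 35.67     # user + system seconds, normal run
cpu_ratio = parallel_cpu / normal_cpu

if __name__ == "__main__":
    print(f"wall-clock speedup: {speedup:.2f}x")   # about 5.31x
    print(f"CPU time ratio:     {cpu_ratio:.2f}x") # about 1.29x
```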
How it works
------------
If clavrx is executed with the environment variable `CLAVRX_ENABLE_TRACER=1`, it runs in "tracing mode".
This means that at certain points during processing it stops and waits until another process signals it with SIGCONT.
Another environment variable, `CLAVRX_TRACER_CLONES=N`, makes clavrx create N clones at each segment.
This happens after the L1b data is read in, but before processing starts.
So, with clavrx in tracing mode and set to make a single clone every segment, we can dedicate a single process to reading the L1b data and fork a worker clone for each segment, with the workers running in parallel.
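
The stop-and-continue handshake can be sketched in Python as follows. This is not the code in `run_pseg.py`: the child here is a stand-in script that stops itself twice with SIGSTOP, the function name `resume_until_exit` is hypothetical, and the sketch assumes the controller is the direct parent of the stopped process so that `os.waitpid` with `os.WUNTRACED` can observe each stop.

```python
import os
import signal
import sys

# Stand-in for a traced program: it stops itself twice, then exits normally.
CHILD_CODE = """
import os, signal
for _ in range(2):
    os.kill(os.getpid(), signal.SIGSTOP)
"""


def resume_until_exit(argv, extra_env=None):
    """Start argv, send SIGCONT each time it stops, return (stops, exit_code)."""
    env = dict(os.environ, **(extra_env or {}))
    pid = os.spawnve(os.P_NOWAIT, argv[0], argv, env)
    stops = 0
    while True:
        _, status = os.waitpid(pid, os.WUNTRACED)
        if os.WIFSTOPPED(status):
            stops += 1
            os.kill(pid, signal.SIGCONT)  # let the child run to its next stop
            continue
        exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        return stops, exit_code


if __name__ == "__main__":
    print(resume_until_exit([sys.executable, "-c", CHILD_CODE]))
```

For clavrx itself, the same loop would presumably be started with `extra_env={"CLAVRX_ENABLE_TRACER": "1", "CLAVRX_TRACER_CLONES": "1"}` and would decide at each stop whether to resume the reader or a newly forked clone; those details are specific to pseg and are not reproduced here.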
### Slides
[Parallel Segments with Clavrx Tracer](https://docs.google.com/presentation/d/1pV3MD5wOQUVJsM9lisW1UyKNGT1ygb3PtH6lDOQSMn0/edit?usp=sharing)
[How the CLAVR-x Tracer System Works](https://docs.google.com/presentation/d/1tpFyA5_hQfRg3RB7-AelzXZnUhMfa5gMCqXDWc8-dRU/edit?usp=sharing)
More Notes
---------
* The responsibility for writing the netCDF has returned to clavrx.