We can now submit `fusion_matlab` to the cluster via condor, from the development (`flo`) account:

```bash
sudo su - flo
cd /home/geoffc/fusion_matlab/work/
cd /data/geoffc/vaerdt/cluster_processing/logs

$ python /home/geoffc/code/PeateScience/packages/fusion_matlab/submit_fusion_matlab.py
(INFO):submit_fusion_matlab.py:<module>:30: Submitting intervals...

condor_q -autoformat FloClusterComputations Owner ClusterID ProcID

condor_q -format '%d' ClusterId -format '.%d\n' ProcId

condor_q -constraint 'FloClusterComputations=="flo.sw.fusion_matlab:FUSION_MATLAB"' -format '%d' ClusterId -format '.%d\n' ProcId
```
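The `ClusterId.ProcId` lines printed by the two `-format` options can also be collected for scripting; a minimal sketch (the `parse_job_ids` helper is our own illustration, not part of the flo tooling, and assumes the dotted output format shown above):

```python
def parse_job_ids(condor_q_output):
    """Parse 'ClusterId.ProcId' lines, as printed by
    condor_q -format '%d' ClusterId -format '.%d\n' ProcId,
    into (cluster, proc) integer tuples."""
    job_ids = []
    for line in condor_q_output.splitlines():
        line = line.strip()
        if not line:
            continue
        cluster, proc = line.split('.')
        job_ids.append((int(cluster), int(proc)))
    return job_ids

# Example with captured condor_q output:
print(parse_job_ids("1501.0\n1501.1\n1502.0"))
```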

We can edit the ClassAd attributes of queued or running jobs (here, the memory request) by doing...

```bash
condor_qedit -const 'FloClusterComputations == "flo.sw.fusion_matlab:FUSION_MATLAB"' RequestMemory 6000
```
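When several attributes need changing, it can be convenient to build the `condor_qedit` argument list programmatically; a small sketch (the `qedit_command` helper is hypothetical, not part of the flo tooling):

```python
import shlex

def qedit_command(constraint, attribute, value):
    """Build the condor_qedit argv for setting one ClassAd attribute
    on all jobs matching the given -const expression."""
    return ['condor_qedit', '-const', constraint, attribute, str(value)]

cmd = qedit_command(
    'FloClusterComputations == "flo.sw.fusion_matlab:FUSION_MATLAB"',
    'RequestMemory', 6000)
# Show the shell-quoted form of the command we would run:
print(' '.join(shlex.quote(c) for c in cmd))
```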

To look at the log files of a particular job (or jobs):

```python
run -e /home/geoffc/git/sips_utils/snippets.py
```
## Examining log files of failed jobs

The details of failed jobs can be found from:

```bash
psql $flo_user -c "select * from failed_jobs where head_computation = 'flo.sw.fusion_matlab:FUSION_MATLAB' and context->'version'='''1.0dev10''' and timestamp > '2018-01-30';"

psql $flo_user -c "select * from failed_jobs where head_computation = 'flo.sw.fusion_matlab:FUSION_MATLAB' and context->'version'='''1.0dev1''' and timestamp > '2018-01-30';"
```

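Since the two queries above differ only in the version string, the SQL can be generated from a template; a sketch (the `failed_jobs_query` helper is our own illustration, not part of flo):

```python
def failed_jobs_query(version, since):
    """Build the failed_jobs query shown above for a given package
    version (matched against the hstore context) and cutoff date."""
    return ("select * from failed_jobs where head_computation = "
            "'flo.sw.fusion_matlab:FUSION_MATLAB' and "
            "context->'version'='''{}''' and timestamp > '{}';"
            .format(version, since))

print(failed_jobs_query('1.0dev10', '2018-01-30'))
```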

Generate a list of job numbers for failed jobs:

```bash
psql $flo_user -c "SELECT job, context FROM failed_jobs WHERE head_computation='flo.sw.fusion_matlab:FUSION_MATLAB' and context->'version'='''1.0dev10''' and timestamp > '2018-01-30' order by context;" | grep granule | gawk '{print $1}' > fusion_matlab_v1.0dev10_failed_granules.txt

psql $flo_user -c "SELECT job, context FROM failed_jobs WHERE head_computation='flo.sw.fusion_matlab:FUSION_MATLAB' and context->'version'='''1.0dev1''' and timestamp > '2018-01-30' order by context;" | grep granule | gawk '{print $1}' > fusion_matlab_v1.0dev1_failed_granules.txt
```

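The `grep granule | gawk '{print $1}'` filtering can equivalently be done in Python; a sketch, where both the `extract_failed_jobnumbers` helper and the hstore-style sample row are illustrative, not actual flo output:

```python
def extract_failed_jobnumbers(psql_output):
    """Mimic `grep granule | gawk '{print $1}'`: keep rows whose
    context mentions 'granule' and return the first column (job)."""
    jobnums = []
    for line in psql_output.splitlines():
        if 'granule' in line:
            jobnums.append(line.split()[0])
    return jobnums

# Illustrative row, assuming an hstore-rendered context column:
sample = ' 203878 | "granule"=>"2018-01-17 17:25:00", "version"=>"1.0dev10"'
print(extract_failed_jobnumbers(sample))
```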

Read a file containing the job numbers of failed jobs, and do something with them...

```python
import os
from glob import glob

import numpy as np

# Read the failed-granule job numbers for the version of interest
# (substitute the v1.0dev1 file as appropriate).
file_obj = open('fusion_matlab_v1.0dev10_failed_granules.txt', 'r')
jobnums = file_obj.readlines()
file_obj.close()
jobnums = [int(x) for x in jobnums]

# (IPython) load helper functions, including job_number_to_dir()...
run -e /mnt/sdata/geoffc/git/sips_utils/snippets.py

# Concatenate the stdout logs of the failed jobs into test.log,
# separated by a marker line.
_ = [os.system('cat {} >> test.log; echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>" >> test.log'.format(job_number_to_dir('/scratch/flo/jobs', job_num, suffix='-stdout'))) for job_num in jobnums]

# Collect the per-job stdout and stderr file paths.
job_file_branches = [job_number_to_dir('/scratch/flo/jobs', job) for job in jobnums]
job_stdout_files = list(np.squeeze([glob(branch + '-stdout') for branch in job_file_branches]))
job_stderr_files = list(np.squeeze([glob(branch + '-stderr') for branch in job_file_branches]))
```
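The `os.system('cat ...')` loop above spawns a shell per job; the same concatenation can be sketched in pure Python (the `concatenate_logs` helper is our own, and the separator mirrors the `echo` marker line):

```python
# Marker line matching the echo'd separator in the os.system() version.
SEPARATOR = '>' * 31 + '\n'

def concatenate_logs(stdout_paths, out_path):
    """Append each job's -stdout file to a single log, writing a
    separator line after each one."""
    with open(out_path, 'w') as out:
        for path in stdout_paths:
            with open(path) as f:
                out.write(f.read())
            out.write(SEPARATOR)
```

This avoids one shell fork per job and makes the separator easy to change in one place.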