The Glue Code
After package delivery, we can access the package information from ipython using delivered_software from flo.sw.lib.glutil:
from flo.sw.lib.glutil import delivered_software
delivered_software.lookup('hirs2nc', delivery_id='20180410-1')
which gives the output
Delivery(id='20180410-1', name='hirs2nc', version='1.0.0', path='/mnt/deliveredcode/deliveries/hirs2nc/20180410-1')
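The output above suggests that the lookup returns a record with id, name, version and path fields, and that the last component of the path is the delivery id. A minimal sketch of reading those fields, assuming a namedtuple-like shape (the Delivery class below is a stand-in defined here for illustration, not the real glutil type):

```python
import os
from collections import namedtuple

# Stand-in for the object returned by delivered_software.lookup();
# the real class lives in glutil and may differ.
Delivery = namedtuple('Delivery', ['id', 'name', 'version', 'path'])

delivery = Delivery(id='20180410-1', name='hirs2nc', version='1.0.0',
                    path='/mnt/deliveredcode/deliveries/hirs2nc/20180410-1')

# In the output shown above, the delivery directory is named after the delivery id.
print(delivery.name, delivery.version, os.path.basename(delivery.path))
```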
We also need to install the following packages:
pip install -i https://sips.ssec.wisc.edu/eggs -U flo rados
pip install -i https://sips.ssec.wisc.edu/eggs -U sipsprod glutil
pip install -i https://sips.ssec.wisc.edu/eggs -U timeutil
pip install -i https://sips.ssec.wisc.edu/eggs -U simple
Local Deployment of python glue code
The glue code for hirs2nc can be started by creating the file source/flo/__init__.py in ~/code/PeateScience/packages/hirs2nc. The glue code can then be linked into the local execution directory:
cd ~/code/PeateScience/local/dist/hirs2nc
ln -s ../../../packages/hirs2nc/source/flo ./
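The same relative link can be created from Python, which may be convenient in a setup script. This is a sketch only: the function name link_glue_code is hypothetical, and the demonstration runs in a throwaway directory that mimics the repository layout rather than touching ~/code/PeateScience.

```python
import os
import tempfile

def link_glue_code(packages_dir, dist_dir, package='hirs2nc'):
    """Symlink <packages_dir>/<package>/source/flo into <dist_dir>/<package>/flo."""
    target = os.path.join(packages_dir, package, 'source', 'flo')
    link_dir = os.path.join(dist_dir, package)
    link = os.path.join(link_dir, 'flo')
    # Use a relative target, as `ln -s ../../../packages/...` does.
    rel_target = os.path.relpath(target, link_dir)
    if not os.path.islink(link):
        os.symlink(rel_target, link)
    return link

# Demonstration in a temporary tree: <root>/packages and <root>/local/dist.
root = tempfile.mkdtemp()
packages = os.path.join(root, 'packages')
dist = os.path.join(root, 'local', 'dist')
os.makedirs(os.path.join(packages, 'hirs2nc', 'source', 'flo'))
os.makedirs(os.path.join(dist, 'hirs2nc'))

link = link_glue_code(packages, dist)
print(os.readlink(link))
```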
Local Processing
str_import_hirs2nc="import os; from timeutil import TimeInterval, datetime, timedelta; os.chdir('$repo/hirs2nc'); import example_local_prepare; os.chdir('$work_dir')"
python -c "$str_import_hirs2nc; granule = datetime(2015, 2, 1, 0, 15); interval = TimeInterval(granule, granule+timedelta(seconds=0)); example_local_prepare.local_execute_example(interval, '$satellite', '$hirs2nc_id', skip_prepare=False, skip_execute=False, verbosity=2)"
python -c "$str_import_hirs2nc; granule = datetime(2015, 2, 1, 0, 15); interval = TimeInterval(granule, granule+timedelta(minutes=0)); example_local_prepare.print_contexts(interval, '$satellite', '$hirs2nc_id', verbosity=2)"
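Both commands add a zero timedelta to the granule time, so the interval start and end are the same instant whichever unit is used. A small standard-library sketch of that arithmetic (timeutil's TimeInterval is deliberately not used here, so nothing is assumed about its semantics):

```python
from datetime import datetime, timedelta

granule = datetime(2015, 2, 1, 0, 15)

# A zero offset leaves the end time equal to the start time.
end_a = granule + timedelta(seconds=0)
end_b = granule + timedelta(minutes=0)
print(end_a == granule and end_b == granule)
```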
Cluster Processing
Deploy the glue code to the user account cluster
We import the flo3 interface Python code for hirs2nc into the software tree by running rsync:
sudo su - flo
cd /mnt/software/geoffc
mv hirs2nc hirs2nc_old
rsync -urLv /home/geoffc/code/PeateScience/local/dist/hirs2nc . --progress --exclude=.*.sw*
Deploy the glue code to the development (flo) account cluster
We import the flo3 interface Python code for hirs2nc into the software tree by changing to the flo account and running rsync:
sudo su - flo
cd /mnt/software/flo/
mv hirs2nc hirs2nc_old
rsync -urLv /home/geoffc/code/PeateScience/local/dist/hirs2nc . --progress --exclude=.*.sw*
Commit glue code to PeateScience repo
The actual glue code was copied to /mnt/software in the last step, but pushing the hirs2nc Python code to the PeateScience repo will provide the submission scripts and submit_hirs2nc.py for use on condor.
cd ~/code/PeateScience
git pull
git add ~/code/PeateScience/packages/hirs2nc
git commit packages/hirs2nc -m "Initial commit of the hirs2nc package."
git push
Running the hirs2nc code on the cluster
We can now submit hirs2nc to the cluster from condor, on the development (flo) account:
sudo su - flo
cd /home/geoffc/hirs2nc/work/
$ python /home/geoffc/code/PeateScience/packages/hirs2nc/submit_hirs2nc.py
(INFO):submit_hirs2nc.py:<module>:30: Submitting intervals...
(INFO):submit_hirs2nc.py:<module>:32: Submitting interval 2015-04-17 14:36:00 -> 2015-04-17 14:36:59
(INFO):submit_hirs2nc.py:<module>:36: There are 1 contexts in this interval
{'satellite': 'snpp', 'version': '1.0dev0', 'granule': datetime.datetime(2015, 4, 17, 14, 36)}
(INFO):submit_hirs2nc.py:<module>:42: First context: {'satellite': 'snpp', 'version': '1.0dev0', 'granule': datetime.datetime(2015, 4, 17, 14, 36)}
(INFO):submit_hirs2nc.py:<module>:43: Last context: {'satellite': 'snpp', 'version': '1.0dev0', 'granule': datetime.datetime(2015, 4, 17, 14, 36)}
(INFO):submit_hirs2nc.py:<module>:44: xrange(86694864, 86694865)
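The last log line appears to report the submitted job numbers as a half-open range (this reading is an assumption based on the output above). A sketch of pulling that range out of the log line so it can be reused when locating job log files; parse_job_range is a hypothetical helper, not part of flo:

```python
import re

log_line = "(INFO):submit_hirs2nc.py:<module>:44: xrange(86694864, 86694865)"

def parse_job_range(line):
    """Return (first, stop) from a line containing 'xrange(first, stop)', else None."""
    match = re.search(r'xrange\((\d+),\s*(\d+)\)', line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

job_range = parse_job_range(log_line)
# The stop value is exclusive, so this range holds a single job number.
job_numbers = list(range(*job_range))
print(job_range, job_numbers)
```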
We can keep track of running jobs with the following condor_q commands:
sudo su - flo
condor_q -autoformat FloClusterComputations | sort | uniq -c
condor_q -constraint 'FloClusterComputations=="flo.sw.hirs2nc:HIRS2NC"' -constraint 'Owner=="flo"'
condor_q -autoformat FloClusterComputations Owner ClusterID ProcID
condor_q -format '%d' ClusterId -format '.%d\n' ProcId
condor_q -constraint 'FloClusterComputations=="flo.sw.hirs2nc:HIRS2NC"' -format '%d' ClusterId -format '.%d\n' ProcId
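The per-computation tally produced by `sort | uniq -c` can also be computed in Python, which is handy inside an ipython session. The sketch below works on a sample string; the sample lines are made up (the second computation name is hypothetical), and tally is an illustrative helper.

```python
from collections import Counter

def tally(lines):
    """Count identical non-empty lines, like `sort | uniq -c`."""
    return Counter(line.strip() for line in lines if line.strip())

# Hypothetical `condor_q -autoformat FloClusterComputations` output.
sample = """flo.sw.hirs2nc:HIRS2NC
flo.sw.hirs2nc:HIRS2NC
flo.sw.other:OTHER
"""

counts = tally(sample.splitlines())
for name, count in sorted(counts.items()):
    print(count, name)
```

On the cluster, the sample string could be replaced by the text returned from subprocess.check_output(['condor_q', '-autoformat', 'FloClusterComputations']).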
To look at the log files of one or more jobs, run the following in ipython:
run -e /home/geoffc/git/sips_utils/snippets.py
job_range = (86694864, 86694865)
job_file_branches = [job_number_to_dir('/scratch/flo/jobs',job) for job in range(*job_range)]
# np.squeeze collapses a single-job result to a scalar, so handle that case separately
if len(job_file_branches)>1:
    job_stdout_files = list(np.squeeze([glob(dir+'-stdout') for dir in job_file_branches]))
    job_stderr_files = list(np.squeeze([glob(dir+'-stderr') for dir in job_file_branches]))
else:
    job_stdout_files = list([glob(dir+'-stdout') for dir in job_file_branches][0])
    job_stderr_files = list([glob(dir+'-stderr') for dir in job_file_branches][0])
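Each glob call returns a list, so the result is a list of lists. Squeezing it with numpy needs a special case for a single job, and it is awkward when a job has no matching file. One alternative is to flatten the lists directly. The sketch below assumes only that each job maps to a path prefix; the file names are made up, and collect is a hypothetical helper, not part of snippets.py.

```python
import os
import tempfile
from glob import glob
from itertools import chain

def collect(prefixes, suffix):
    """Return a flat, sorted list of files matching prefix+suffix for each prefix."""
    return sorted(chain.from_iterable(glob(prefix + suffix) for prefix in prefixes))

# Throwaway files standing in for job log files; the naming is hypothetical.
root = tempfile.mkdtemp()
prefixes = [os.path.join(root, name) for name in ('job1', 'job2', 'job3')]
for prefix in prefixes[:2]:          # job3 deliberately has no log files
    for suffix in ('-stdout', '-stderr'):
        open(prefix + suffix, 'w').close()

stdout_files = collect(prefixes, '-stdout')
print([os.path.basename(path) for path in stdout_files])
```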
To check the database for the hirs2nc output:
flo_user="-d postgresql://flo3@ratchet.sips/flo3"
psql $flo_user -c "SELECT job,size,context,file_name from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name;"
job | size | context | file_name
105198323 | 1834536 | "granule"=>"datetime.datetime(2013, 5, 20, 0, 29)", "satellite"=>"'metop-b'", "hirs2nc_delivery_id"=>"'20180410-1'" | NSS.HIRX.M1.D13140.S0029.E0127.B0347172.SV.nc
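The row above suggests that context values are stored as Python repr strings (note the quotes inside "'metop-b'"). Under that assumption, the triple single quotes in the query are simply the SQL string literal for a value that itself contains quotes. A sketch of building such a literal; hstore_literal is a hypothetical helper, not part of flo:

```python
from datetime import datetime

def hstore_literal(value):
    """SQL string literal for repr(value), doubling embedded single quotes."""
    return "'" + repr(value).replace("'", "''") + "'"

# A string value gains its repr quotes, which SQL then requires to be doubled.
print(hstore_literal('metop-b'))
# A datetime repr contains no quotes, so only the outer pair is added.
print(hstore_literal(datetime(2013, 5, 20, 0, 29)))
```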
To count granules grouped by day, month, etc. (set $timeunit accordingly):
psql $flo_user -c "SELECT date_trunc('$timeunit',pydt(context->'granule')) as m,count(*) from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' group by m order by m"
To select granules which match or are between certain dates:
psql $flo_user -c "SELECT job,size,context,file_name from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' and date_trunc('days',pydt(context->'granule'))='2015-01-01' order by file_name;"
psql $flo_user -c "SELECT job,size,context,file_name from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' and date_trunc('days',pydt(context->'granule'))>'2015-01-01' and date_trunc('days',pydt(context->'granule'))<'2015-01-03' order by file_name;"
To remove old files, first list the matching rows, then delete them:
psql $flo_user -c "SELECT job, size, context, file_name FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name" | less
psql $flo_user -c "DELETE FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1'''"
Other Database Queries
psql $flo_user -c "SELECT pydt(context->'granule') as d,count(*) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' group by d order by d;" | less
psql $flo_user -c "SELECT pydt(context->'granule') as d,count(*) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' and date_trunc('days',pydt(context->'granule'))='2014-01-01' group by d order by d;" | less
psql $flo_user -c "SELECT job,size,context,file_name from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name;" | less
psql $flo_user -c "SELECT job,size,context,file_name from stored_products where computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' and date_trunc('days',pydt(context->'granule'))='2014-01-01' order by file_name;" | less
# List file keys
psql $flo_user -tA -c "SELECT format ('flo3/%s/%s',job,file_name) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name limit 5;"
# List file keys and status
psql $flo_user -tA -c "SELECT format ('flo3/%s/%s',job,file_name) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name limit 5;" | xargs -n1 -IXX rados -p dev --id flo stat XX
# List file key basenames
psql $flo_user -tA -c "SELECT format ('flo3/%s/%s',job,file_name) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name limit 5;" | xargs -n1 -IXX basename XX
# List the rados commands to download files using the database file keys.
psql $flo_user -tA -c "SELECT format ('flo3/%s/%s',job,file_name) FROM stored_products WHERE computation='flo.sw.hirs2nc:HIRS2NC' and context->'satellite'='''$satellite''' and context->'hirs2nc_delivery_id'='''20180410-1''' order by file_name limit 5;" | xargs -n1 -IXX sh -c 'echo rados -p dev --id flo get XX "~/hirs2nc/work/links/"$(basename XX)'
# rados commands
rados -p dev --id flo get flo3/91069111/VNP02FSN.A2015091.0000.001.2018025170339.nc VNP02FSN.A2015091.0000.001.2018025170339.nc
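The download commands can also be generated in Python from a list of file keys, which avoids shell quoting issues around basename. This is a sketch: rados_get_commands is a hypothetical helper, it only builds command strings (nothing is executed), and the pool and user defaults are taken from the commands above.

```python
import posixpath

def rados_get_commands(keys, dest_dir, pool='dev', user='flo'):
    """Build one `rados get` command line per key, saving to dest_dir/<basename>."""
    commands = []
    for key in keys:
        dest = posixpath.join(dest_dir, posixpath.basename(key))
        commands.append('rados -p {} --id {} get {} {}'.format(pool, user, key, dest))
    return commands

keys = ['flo3/91069111/VNP02FSN.A2015091.0000.001.2018025170339.nc']
for command in rados_get_commands(keys, '~/hirs2nc/work/links'):
    print(command)
```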
Comparing Cluster Results with Test database
Running in Forward Stream
Examining log files of failed jobs
Generate a list of job numbers for failed jobs:
psql $flo_user -c "SELECT job, context FROM failed_jobs WHERE head_computation='flo.sw.hirs2nc:HIRS2NC' and context->'version'='''1.0dev1''' and timestamp > '2018-01-30' order by context;" | grep granule | gawk '{print $1}' > hirs2nc_v1.0dev1_failed_granules.txt
file_obj = open('hirs2nc_v1.0dev1_failed_granules.txt','r')
jobnums = file_obj.readlines()
jobnums = [int(x) for x in jobnums]
run -e /mnt/sdata/geoffc/git/sips_utils/snippets.py
job_file_branches = [job_number_to_dir('/scratch/flo/jobs',job) for job in jobnums]
job_stdout_files = list(np.squeeze([glob(dir+'-stdout') for dir in job_file_branches]))
job_stderr_files = list(np.squeeze([glob(dir+'-stderr') for dir in job_file_branches]))
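for files in job_stdout_files:
    result = search_logfile_for_string(files, 'input sounder_0')
    if result != []:
        print(result)  # assumed action: show the matching lines
    result = search_logfile_for_string(files, 'Dateline granule')
    if result != []:
        print(result)  # assumed action: show the matching lines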
for stdout_file, stderr_file in zip(job_stdout_files,job_stderr_files):
    try:
        if os.path.isfile(stderr_file) and (os.stat(stderr_file).st_size > 0):
            print('\n>>> stderr_file = {}'.format(stderr_file))
            file_obj = open(stdout_file,'r')
            for line in file_obj.readlines():
                searchObj = re.search( r'Dateline granule', line, re.M)
                if searchObj:
                    line = line.replace('\n','')
                    print('Checking {}: {}'.format(stdout_file, line))
            print('Checking {}:'.format(stdout_file))
            line = file_obj.readline()
            line = os.path.basename(line.replace('\n','').split(' ')[-1])
        else:
            pass
            #print('stderr_file {} does not exist or has zero size.'.format(stderr_file))
    except Exception:
        print('There was a problem with stderr_file {}'.format(stderr_file))
        print('stdout_file = {}'.format(stdout_file))
for stderr_file in [glob(dir+'-stdout') for dir in [job_number_to_dir('/scratch/flo/jobs',job) for job in range(77666696, 77667532)]]: check_call('tail -n 1 {}'.format(stderr_file[0]).split(' '))
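The helper search_logfile_for_string used above comes from snippets.py and its implementation is not shown here. A minimal stand-in, assuming it returns the list of matching lines (an empty list when nothing matches, as the `result != []` checks suggest), might look like this; the function name and log contents below are made up for illustration:

```python
import os
import re
import tempfile

def search_logfile(path, pattern):
    """Return the lines of the file at `path` that match the regex `pattern`."""
    with open(path) as handle:
        return [line.rstrip('\n') for line in handle if re.search(pattern, line)]

# Throwaway log file with made-up contents, for demonstration only.
log_path = os.path.join(tempfile.mkdtemp(), 'example-stdout')
with open(log_path, 'w') as handle:
    handle.write('starting job\nDateline granule detected\nfinished\n')

matches = search_logfile(log_path, r'Dateline granule')
print(matches)
```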
logfile_obj = open(logpath,'w')
# Write the geocat output to a log file, and parse it to determine the output
# HDF4 files.
hdf_files = []
for line in exe_out.splitlines():
    searchObj = re.search( r'geocat[LR].*\.hdf', line, re.M)
    if searchObj:
        # the output file name is the last space-separated field on the line
        hdf_files.append(line.split(" ")[-1])
Downloading Results from the Cluster
export satellite="noaa-19"
satellite='noaa-17'; psql -d $flo_dbase -tA -c "select job from stored_products where context->'satellite'='''$satellite''' and computation='flo.sw.hirs_csrb_monthly:HIRS_CSRB_MONTHLY' and output='zonal_means' order by context" | xargs -n 1 flo_fetch -j