College Park, MD and Boulder, CO MRMS feeds
College Park is the primary producer of the MRMS 2D severe weather grids (grib2 files) for ProbSevere; Boulder is the backup. The SSEC DC pulls data from both sources so that we still have data when NCEP/MRMS fails over, which happens roughly once every couple of months, sometimes more often.
The DC machine, usgeo1, receives the data and sends it via LDM to viper, gusto, etc. On each of these machines, LDM writes incoming grib2s to /data/common/PROBSEV_DATA/radar/raw/incoming, and the probsevere user then moves them to /data/common/PROBSEV_DATA/radar/raw/grib2. The LDM user has a script (scripts/mrms_process.bash) that checks the grib2 directory for each incoming file. If a grib2 with the same hhmm in its filename doesn't exist there, the LDM user goes ahead and writes what is in the 'buffer'. If it does exist (meaning the other stream already put a file for that hhmm into /incoming), the buffer is simply deleted.
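The check described above can be sketched roughly as follows. This is not the contents of scripts/mrms_process.bash; the function names, the argument order, and the assumption that filenames end in hhmmss.grib2 are all hypothetical, chosen only to illustrate the check-then-write pattern.

```shell
# Sketch only: hypothetical helpers, NOT the real mrms_process.bash.
# Assumes filenames end in ...HHMMSS.grib2 (an assumption for illustration).

# already_have_minute DIR PREFIX
# Succeeds if DIR already holds a grib2 whose name starts with PREFIX.
already_have_minute() {
    local dir="$1" prefix="$2" f
    for f in "$dir/$prefix"*.grib2; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# write_or_drop BUFFER GRIB2_DIR INCOMING_DIR NAME
# Writes BUFFER to INCOMING_DIR/NAME unless GRIB2_DIR already has a file
# for the same hhmm, in which case the buffer is deleted.
write_or_drop() {
    local buffer="$1" grib2_dir="$2" incoming_dir="$3" name="$4"
    # Strip the two seconds digits and the extension to get the hhmm prefix.
    local prefix="${name%??.grib2}"
    if already_have_minute "$grib2_dir" "$prefix"; then
        rm -f -- "$buffer"
        echo dropped
    else
        mv -- "$buffer" "$incoming_dir/$name"
        echo written
    fi
}
```

Note that the existence check and the write are two separate steps, so nothing stops two instances from both passing the check before either has written.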
A problem arises when files arrive from the two streams at about the same time. Two instances of mrms_process.bash then run concurrently, neither finds the file, and so both files are written (with different hhmmss). Then, when the probsevere user farms off two jobs, one usually stomps on the other and we get garbage data, which causes an outage. This generally happens 0.5 to 1 times a day. You can look in /data/common/PROBSEV_DATA/radar/raw to see fml files that failed to be moved (these are instances where this happened).
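One possible way to close this window, sketched here as an assumption rather than a tested fix, is to make the "does it exist?" check and the "claim it" step a single atomic operation. mkdir fails if the directory already exists, so only one of two concurrent instances can succeed for a given hhmm key. The function name, the lock directory, and the ".claim" suffix are hypothetical; whether mkdir is reliably atomic on the filesystem behind /data/common would need to be verified.

```shell
# Sketch only: hypothetical helper, not part of the current scripts.

# claim_minute LOCK_DIR KEY
# Succeeds for exactly one caller per KEY (e.g. product + date + hhmm).
# A caller that fails to claim should delete its buffer instead of writing.
claim_minute() {
    local lock_dir="$1" key="$2"
    mkdir -- "$lock_dir/$key.claim" 2>/dev/null
}
```

Old claim directories would accumulate and would need periodic cleanup, for example by the same job that already tidies the raw directory.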
We need to figure out a way to 1) know which stream a file is coming from (College Park or Boulder), 2) know which is the "operational stream", and 3) put a sleep, or "hold", on the backup stream.
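Points 2 and 3 could look something like the sketch below, assuming point 1 is already solved and the stream name reaches the script as an argument (how to derive it is still open). Everything here is hypothetical: the function names, the stream labels "college_park" and "boulder", the state file that records the operational stream, and the 20-second default hold.

```shell
# Sketch only: hypothetical helpers illustrating a "hold" on the backup stream.

# read_operational STATE_FILE DEFAULT
# Prints the operational stream recorded in STATE_FILE, or DEFAULT if the
# file is missing or empty.
read_operational() {
    local state_file="$1" default="$2"
    if [ -r "$state_file" ] && [ -s "$state_file" ]; then
        head -n 1 "$state_file"
    else
        echo "$default"
    fi
}

# hold_seconds STREAM OPERATIONAL [HOLD]
# Prints 0 for the operational stream and HOLD (default 20) for any other.
hold_seconds() {
    local stream="$1" operational="$2" hold="${3:-20}"
    if [ "$stream" = "$operational" ]; then
        echo 0
    else
        echo "$hold"
    fi
}

# Intended use inside the processing script (stream name passed in as $1):
#   op=$(read_operational /path/to/state_file college_park)
#   sleep "$(hold_seconds "$1" "$op")"
#   ...then run the existing hhmm existence check...
```

With a hold in place, the backup stream's file is only checked after the operational stream has had time to land, so in normal operation the backup copy is dropped, and after a failover it is written following the delay.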