From time to time the OneFS Job Engine can get…stuck. The official recommendation from DellEMC Support can be to start a new jobdb (after making a backup, of course). But what happens if you need to reference some of the old job reports after you create the new jobdb?
Problem
When the OneFS Job Engine and/or its Job Report database goes bad, the solution is often simple: move the old db out of the way and have the system create a new one.
In my experience, this has proved a workable solution. BUT, it obviously comes with a major caveat: you can’t (easily) pull reports from old jobs.
Solution
Save out the reports before moving the db aside
Personally, I like to store all cluster data, troubleshooting output, temporary working files, etc. in specific directories within “/ifs/data/Isilon_Support”.
When building a new cluster, I create several of those directories up front; the one I’ll target here is “job_engine_reports” (I also make directories for pcaps, ACL examples, and then each SR gets its own directory…).
To grab the full report list of the current, active Job Engine, I run this simple command:
# for i in $(isi job reports list --no-header --no-footer | awk '{print $2}');do isi job reports view $i ;done >> /ifs/data/Isilon_Support/job_engine_reports/$(date +"%Y%m%d_%H%M")-jobreports.txt
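The one-liner above can also be sketched as a small shell function, which is a bit easier to re-run and to check. This is only a sketch under a few assumptions: the function name `save_job_reports` is my own, it relies (like the one-liner) on the second field of the list output being the job ID, and it adds `sort -un` on the guess that the same job ID may appear on more than one row. It also writes a fresh file with `>` rather than appending with `>>`.

```shell
# Sketch only: save every job report into one timestamped file.
# Assumes, as the one-liner does, that field 2 of the list output is the job ID.
save_job_reports() {
    out_dir=$1
    out_file="$out_dir/$(date +"%Y%m%d_%H%M")-jobreports.txt"

    # Create the target directory if it is not there yet.
    mkdir -p "$out_dir" || return 1

    isi job reports list --no-header --no-footer |
        awk '{print $2}' |
        sort -un |
        while read -r id; do
            # Separator line so individual reports are easy to find later.
            echo "===== Job ID: $id ====="
            isi job reports view "$id"
        done > "$out_file"

    # Print the path so the caller can inspect the result.
    echo "$out_file"
}
```

Usage would look like `save_job_reports /ifs/data/Isilon_Support/job_engine_reports`, which prints the path of the file it wrote.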
Once I have the Job Report output saved out (AND I’ve confirmed that it saved correctly), I can stop the Job Engine, move the old db aside (never delete! Just move.), and start the Job Engine back up.
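The "confirm, then move" part can be sketched with two small helpers. Both names (`confirm_saved`, `move_aside`) are hypothetical, and the sketch deliberately leaves out the commands for stopping and starting the Job Engine as well as the location of the jobdb itself; take those from DellEMC Support for your OneFS version.

```shell
# Sketch only: confirm the saved report file, then move (never delete) a path aside.

confirm_saved() {
    # Succeeds only if the file exists and is non-empty.
    if [ -s "$1" ]; then
        echo "OK: $1 ($(wc -l < "$1" | tr -d ' ') lines)"
    else
        echo "MISSING OR EMPTY: $1" >&2
        return 1
    fi
}

move_aside() {
    # Renames $1 to $1.old-<timestamp>; refuses to overwrite an existing target.
    src=$1
    dst="$src.old-$(date +"%Y%m%d_%H%M%S")"
    [ -e "$src" ] || { echo "no such path: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "target exists: $dst" >&2; return 1; }
    mv "$src" "$dst" && echo "$dst"
}
```

A cautious sequence might be `confirm_saved <report file> && move_aside <old db path>`, run only after the Job Engine has been stopped, so the move never happens if the saved report is missing or empty.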