New 'Scavenger' Queue Does Not Count Against Standard HPCC Limits
On May 11th, 2022, ICER deployed a new 'scavenger' queue that allows users to run preemptable jobs on idle cores. Jobs running in this queue are not limited by the regular queues’ limits on running jobs and do not count against yearly usage totals, but they may be canceled at any time to allow general or buy-in jobs to run. The general and buy-in queues still function as-is, and no change is required for users who do not want to use this new queue.
With few exceptions, each researcher using the HPCC is limited to running up to 520 jobs or 1040 cores at one time. Annually, non-buy-in users are limited to a total of 500,000 CPU hours and 10,000 GPU hours. These limits do not apply to jobs submitted to the scavenger queue. Jobs in this queue can start on resources that would otherwise sit idle, improving research throughput. Like jobs submitted to the general-long queue, scavenger jobs can request a wall time of up to 7 days; unlike them, scavenger jobs may be interrupted whenever their resources are needed for non-scavenger jobs. By default, an interrupted job is requeued, but users can opt for cancellation instead if that suits their workflow better. We recommend this queue only for users whose jobs can checkpoint and restart, or whose workflows can otherwise handle jobs being canceled or requeued.
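A job that tolerates requeuing typically saves its progress at intervals and resumes from the last saved point when it starts again. The following is a minimal sketch of that pattern in a shell script, not an ICER-provided tool. The checkpoint file name, the step count, and the placeholder work are all hypothetical; a real application would use its own checkpoint mechanism.

```shell
#!/bin/bash
# Hypothetical checkpoint file; in a real job, keep it on persistent storage.
CHECKPOINT="${CHECKPOINT:-progress.chk}"
TOTAL_STEPS=5   # hypothetical amount of work

# Resume from the last completed step if a checkpoint exists, else start at 0.
if [ -f "$CHECKPOINT" ]; then
    step=$(cat "$CHECKPOINT")
else
    step=0
fi

while [ "$step" -lt "$TOTAL_STEPS" ]; do
    step=$((step + 1))
    echo "running step $step"   # placeholder for the real computation
    # Write to a temporary file and then rename it, so an interruption
    # mid-write cannot leave a truncated checkpoint behind.
    echo "$step" > "$CHECKPOINT.tmp"
    mv "$CHECKPOINT.tmp" "$CHECKPOINT"
done

echo "all $TOTAL_STEPS steps complete"
```

If the script is interrupted after any step and then run again, it continues from the step recorded in the checkpoint file rather than starting over.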
To submit jobs to this queue, add the line '#SBATCH --qos=scavenger' (without the quotes) to your job script. If you would like help deciding whether your workflow is a good match for this queue, or if you have any questions, please contact us at https://contact.icer.msu.edu.
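For illustration, a complete job script using this queue might look like the sketch below. Only the --qos=scavenger line comes from this announcement. The job name, wall time, and resource requests are placeholder values, and my_program is a hypothetical application. --requeue is a standard Slurm sbatch option (--no-requeue is its counterpart), but how ICER's configuration treats it for interrupted scavenger jobs is an assumption here, so please confirm against the HPCC documentation before relying on it.

```shell
#!/bin/bash
#SBATCH --job-name=scavenger-example   # placeholder name
#SBATCH --qos=scavenger                # run as a preemptable scavenger job
#SBATCH --time=24:00:00                # placeholder; up to 7 days may be requested
#SBATCH --ntasks=1                     # placeholder resource request
#SBATCH --cpus-per-task=1              # placeholder resource request
#SBATCH --requeue                      # assumption: request requeue on interruption

# Slurm sets SLURM_RESTART_COUNT once a job has been requeued and restarted;
# fall back to 0 on the first start, when the variable is unset.
restarts="${SLURM_RESTART_COUNT:-0}"
echo "start number $((restarts + 1)) of this job"

# Hypothetical application that resumes from its own checkpoint:
# ./my_program --resume
```

Saved under a name of your choice, such as scavenger_job.sb, the script would be submitted with sbatch in the usual way.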