Difference: PBSJobScheduler (7 vs. 8)

Revision 8 (2015-08-20) - dpane

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Deleted:
<
<

 

PBS Batch Job Scheduler

What is PBS? The Portable Batch System (PBS) is software that performs job scheduling. Its main task is to allocate computational jobs to the available computing resources within a cluster, and it offers ways to monitor and control the workload of those resources in a fair way. With PBS, jobs can be scheduled for execution according to scheduling policies that attempt to fully utilize system resources without overcommitting them, while being fair to all users. For more information about PBS, see the online manual page, which can be viewed by executing the command:

Line: 25 to 23
 
  • dsi - This queue maps to the nodes that were the dsi nodes on the hawk cluster. This queue is reserved for Timothy Verstynen's Lab. Verstynen lab members should consider using this queue instead of the default queue to keep the default queue open to other users.
  • default - This is the default queue, which maps to all nodes that were the CNBC nodes on the hawk cluster and the psycho nodes before the two clusters were merged. It is available to all cluster users. As its name suggests, when no queue is specified your job will be submitted to this queue by default.
  • loprio - This is the low priority queue. It encompasses all nodes on the cluster and is available to all users. It is low priority in that jobs in the higher priority queues (plaut, dsi and default) take priority over jobs in this queue; jobs in this queue will have to wait for those other jobs to be scheduled first. There is also a limit on the wall time of jobs in this queue, to ensure that users submitting jobs to the higher priority queues don't have to wait too long to have their jobs scheduled.
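To see which queues are currently configured and their limits, you can usually query the scheduler directly (a quick check, assuming a standard PBS/Torque installation with the qstat client on your path):
qstat -q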
Changed:
<
<
>
>
Format for submitting a job (1 core on 1 node):
qsub <script name>
 
Format for requesting a particular queue:
Changed:
<
<
qsub -I -X -l nodes=1:ppn=1 -q <queuename>  
Example command for a request for an interactive session on the low priority queue:
qsub -I -X -l nodes=1:ppn=1 -q loprio 
>
>
qsub -q <queuename> <script name>
Example command for submitting a job to the low priority queue:
qsub -q loprio <script name>

Example command for requesting an interactive session (an X session) on the low priority queue:
qsub -I -X -q loprio
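For reference, a minimal submission script might look something like the following. This is a sketch only; the job name, wall time value, and the program name my_analysis are illustrative assumptions, not site defaults:
#!/bin/bash
# Request the low priority queue, 1 core on 1 node, and a 1-hour wall time limit
#PBS -N example_job
#PBS -q loprio
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# Replace with your actual program (hypothetical name)
./my_analysis
You would then submit it with qsub as shown above, e.g. qsub myscript.sh (where myscript.sh is whatever you named the file).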
 

Reasonable Usage Policies for Running Jobs

Changed:
<
<
There are no configuration settings preventing users from submitting jobs to any of the queues on the system. It is expected that users will honor the restrictions and get permission to submit to queues that they ordinarily wouldn't be privileged to submit jobs to.
>
>
There are configuration settings that restrict which queues each user may submit jobs to. It is expected that users will get permission from the PI to submit to queues that they ordinarily wouldn't be privileged to use. Once the user receives permission, please email me and copy the PI so you can be added to the queue.
  The PBS job scheduler can do a very good job of fairly distributing the workload of multiple users' jobs, but it has some constraints. If one user submits a large number of long (more than 4 hrs) jobs that fill most or all of the available nodes, other users will have to wait at least that long before their jobs are scheduled. We need your cooperation so that you are not responsible for "taking over the cluster" and preventing others from having fair access to nodes to do their work.
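For example (an illustrative invocation, not a site requirement), declaring a wall time limit at submission tells the scheduler how long your job should run and makes it easier to interleave other users' work:
qsub -l walltime=04:00:00 <script name>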
Here are a couple recommendations:
Line: 122 to 125
 
  • qdel -p <job_id>: Purges the job from the queue, but does not do any cleanup.
  • qsig -s 0 <job_id>: Sends the job process a signal to exit.
How to clean up the processes
Changed:
<
<
  • tentakel -g compute 'killall -9 -u <userid>': Command to kill all <userid> processes on the compute nodes.
  • tentakel -g compute 'killall -9 -u <userid>': Verifies whether any orphaned processes are running.
>
>
  • rocks run host compute 'killall -9 -u <userid>': Command to kill all <userid> processes on the compute nodes.
  • rocks run host compute 'killall -9 -u <userid>': Running the same command again verifies whether any orphaned processes are still running.
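If you only want to check for leftover processes without sending another kill signal, a plain process listing works as well (mirroring the rocks run host form used above; ps -fu <userid> is standard Unix syntax):
  • rocks run host compute 'ps -fu <userid>': Lists any of your processes still running on the compute nodes.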
 
Job troubleshooting
  • diagnose -j <jobid>: This will provide information about why a job won't run.
  • tracejob: Provides historical data about a job.
Line: 148 to 151
  Standard error and standard output are printed to the terminal. These can be redirected to a file or piped to another command using Unix shell syntax.
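For example (my_analysis and run.log are hypothetical names), standard shell redirection captures both streams in a file:
./my_analysis > run.log 2>&1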
Some use cases for interactive sessions
Changed:
<
<
  1. Interactive Matlab sessions (for debugging or analysis).
  2. Anything requiring a Graphical User Interface (GUI).
>
>
  1. Interactive Matlab sessions (for debugging or analysis). Use the -X option to request an X session.
  2. Anything requiring a Graphical User Interface (GUI). Use the -X option to request an X session.
 
  1. Compiling code for large projects, which can be resource intensive.
Changed:
<
<
  1. Any other resource intensive interactive task (to avoid using CPU cycles on the headnode)
>
>
  1. Any other resource intensive interactive task (to avoid using CPU cycles on the headnode)
 

Wall Time

 