Parallelism¶
QCG-PilotJob Manager can handle jobs that require more than a single core. The number of required cores and nodes
is specified with the numCores and numNodes parameters of the Jobs.add method. The required resources
can be specified either as exact values or as a range (with minimum and maximum values), in which case
QCG-PilotJob Manager will try to assign as many resources as possible from those currently available.
The environment of a parallel job is prepared for MPI or OpenMP jobs.
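For instance, a range of cores can be requested by passing a dictionary with min and max keys as the numCores argument of Jobs.add. This is a minimal sketch; the exact/min/max forms mirror the resources syntax of the JSON job descriptions shown below, and my_app is just a placeholder executable:

from qcg.pilotjob.api.job import Jobs

# request between 4 and 8 cores; QCG-PilotJob Manager will try to
# assign as many cores from this range as are currently available
# ('my_app' is a placeholder for an arbitrary executable)
jobs = Jobs().add(
    name='range-example',
    exec='my_app',
    numCores={'min': 4, 'max': 8})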
MPI¶
Running MPI programs on HPC systems can be a complex process, as it depends on the chosen MPI implementation (OpenMPI,
IntelMPI) and the system configuration. Some sites support launching MPI programs directly with the scheduling system
client srun, while on others such applications should be launched with the standard mpirun command. Getting proper
process binding to specific cores is even harder, especially when programs are launched with the mpirun command.
To support running MPI applications, QCG-PilotJob Manager implements different execution models. A detailed description
of those models can be found in the Execution models section. In the following example we use the default model,
where QCG-PilotJob Manager starts only a single process, typically a script that calls the mpirun or mpiexec command.
The whole environment for the parallel job, such as the hosts file and environment variables, is prepared by
QCG-PilotJob Manager. For example, a program that runs the Quantum Espresso application may look like this:
from qcg.pilotjob.api.manager import LocalManager
from qcg.pilotjob.api.job import Jobs

manager = LocalManager()

jobs = Jobs().add(
    name='qe-example',
    exec='mpirun',
    args=['pw.x'],
    stdin='pw.benzene.scf.in',
    stdout='pw.benzene.scf.out',
    modules=['espresso/5.3.0', 'mkl', 'impi', 'mpich'],
    numCores=8)

job_ids = manager.submit(jobs)
manager.wait4(job_ids)
manager.finish()
As we can see in the example, we run a single program, mpirun, which is responsible for setting up a proper
parallel environment for the destination program and spawning the Quantum Espresso executable (pw.x).
In the example program we used some additional options of the Jobs.add method:

stdin - points to the file whose content should be sent to the job's standard input
modules - environment modules that should be loaded before the job starts
numCores - how many cores should be allocated for the job
The JSON job description file for the same example is presented below:
[
    {
        "request": "submit",
        "jobs": [
            {
                "name": "qe-example",
                "execution": {
                    "exec": "mpirun",
                    "args": ["pw.x"],
                    "stdin": "pw.benzene.scf.in",
                    "stdout": "pw.benzene.scf.out",
                    "modules": ["espresso/5.3.0", "mkl", "impi", "mpich"]
                },
                "resources": {
                    "numCores": { "exact": 8 }
                }
            }
        ]
    },
    {
        "request": "control",
        "command": "finishAfterAllTasksDone"
    }
]
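Such a description file can be passed to QCG-PilotJob Manager running in batch mode; assuming it was saved as jobs.json (the file name is arbitrary), the service can be started for example with:

$ qcg-pm-service --file --file-path=jobs.json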
OpenMP¶
For OpenMP programs (the shared memory parallel model), where a single process spawns many threads on the same
node, we need to use the special model option with the threads value.
To test execution of an OpenMP program, we first need to compile a sample application:
$ wget https://computing.llnl.gov/tutorials/openMP/samples/C/omp_hello.c
$ gcc -Wall -fopenmp -o omp_hello omp_hello.c
Now we can launch this application with QCG-PilotJob Manager:
from qcg.pilotjob.api.manager import LocalManager
from qcg.pilotjob.api.job import Jobs

manager = LocalManager()

jobs = Jobs().add(
    name='openmp-example',
    exec='omp_hello',
    stdout='omp.out',
    model='threads',
    numCores=8,
    numNodes=1)

job_ids = manager.submit(jobs)
manager.wait4(job_ids)
manager.finish()
The omp.out file should contain eight lines with Hello world from thread =. It is worth remembering that OpenMP
applications can operate only on a single node, so adding numNodes=1 might be necessary when more than one node
is available among the resources.
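A quick way to verify the run is to count the matching lines in the output file (assuming the message printed by the sample program):

$ grep -c "from thread" omp.out
8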
The equivalent JSON job description file for the given example is presented below:
[
    {
        "request": "submit",
        "jobs": [
            {
                "name": "openmp-example",
                "execution": {
                    "exec": "omp_hello",
                    "stdout": "omp.out",
                    "model": "threads"
                },
                "resources": {
                    "numCores": { "exact": 8 },
                    "numNodes": { "exact": 1 }
                }
            }
        ]
    },
    {
        "request": "control",
        "command": "finishAfterAllTasksDone"
    }
]