Execution models
The QCG-PJM service manages individual cores and assigns specific cores to each task. From the performance perspective, binding tasks to cores is more efficient because it isolates tasks from each other.
Note
To support CPU binding, the service must know which physical cores are available in the system. SLURM provides this information inside a created allocation, but it is not available for logical resources, i.e. when the user defines virtual cores and nodes. Binding is therefore currently supported only when the QCG-PJM service runs inside a SLURM allocation.
The binding of single-core tasks is achieved with:
- the custom launching agent, which uses the taskset command,
- the SLURM built-in mechanism based on the --cpu-bind option of the srun command.
Currently, parallel tasks that require more than one core are launched
only with the srun or mpirun commands. For srun, the mask_cpu flag of the
--cpu-bind parameter contains the CPU masks for all allocated nodes,
separated by commas. When the mpirun command is used to launch a parallel
task, binding is set either with the --rankfile parameter (OpenMPI model)
or with the I_MPI_PIN_PROCESSOR_LIST environment variable (Intel MPI
model).
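To illustrate what a mask_cpu value looks like, the sketch below converts lists of core identifiers into hexadecimal masks and joins them into a --cpu-bind value. The function names cores_to_mask and cpu_bind_option are hypothetical, and the way QCG-PJM actually builds this value may differ.

```python
def cores_to_mask(cores):
    """Return a hexadecimal CPU mask with one bit set per core identifier."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


def cpu_bind_option(core_sets):
    """Join one mask per core set into a value for srun's --cpu-bind option."""
    masks = ",".join(cores_to_mask(cores) for cores in core_sets)
    return "--cpu-bind=mask_cpu:" + masks


if __name__ == "__main__":
    # Assumed example: cores 0-3 in the first set, cores 4-7 in the second.
    print(cpu_bind_option([[0, 1, 2, 3], [4, 5, 6, 7]]))
```

Each bit position in a mask corresponds to one core identifier, so cores 0-3 give 0xf and cores 4-7 give 0xf0.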
Additionally, for all tasks launched by Slurm with binding supported, the QCG_PM_CPU_SET environment variable is available and contains the core identifiers separated by commas.
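A task written in Python could read this variable as in the minimal sketch below. The helper name allocated_cores is hypothetical, and the sketch assumes the value is a plain comma-separated list of integers (for example 0,1,5), with no ranges.

```python
import os


def allocated_cores(environ=os.environ):
    """Return the core identifiers from QCG_PM_CPU_SET as a list of ints."""
    value = environ.get("QCG_PM_CPU_SET", "")
    # Assumption: identifiers are plain integers; empty items are skipped.
    return [int(item) for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    cores = allocated_cores()
    print(f"bound to {len(cores)} core(s): {cores}")
```

When the variable is not set, for example outside a SLURM allocation, the helper returns an empty list.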
To support process affinity for different parallel applications, QCG-PJM supports different execution models. Currently the following models are available:
- default - in this model only a single process is launched within the allocation, which is prepared based on the task’s resource requirements. The allocation description can be found in environment variables such as:
  - SLURM_NODELIST - list of allocated nodes separated by commas
  - SLURM_NTASKS - total number of cores
  - SLURM_TASKS_PER_NODE - number of allocated cores on consecutive nodes, where each element has the form NODE_NAME (a single core on a node) or NODE_NAME(xNUM_OF_CORES) (many cores on a single node)
  - QCG_PM_CPU_SET - list of core identifiers on consecutive nodes separated by commas
- threads - designed for running OpenMP tasks on a single node; the process is started with the srun command with the --cpus-per-task parameter set to the number of cores defined in the resource requirements
- openmpi - the processes are started with the mpirun command with a rankfile created based on the task’s resource requirements
- intelmpi - the processes are started with the mpirun command (from the IntelMPI distribution) with multiple components defined, where each component describes an execution node and contains the -host element and I_MPI_PIN_PROCESSOR_LIST arguments set according to the allocated resources
- srunmpi - the processes are started with the srun command with the --cpu-bind parameter set according to the allocated resources; this model should be used only on sites that have the OpenMPI/IntelMPI/Slurm environments configured properly.
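To make the openmpi model more concrete, the sketch below generates rankfile text in the Open MPI "rank N=HOST slot=CORE" form, with one rank per allocated core. The function name build_rankfile and the node names are hypothetical, and the rankfile that QCG-PJM actually writes may differ in detail.

```python
def build_rankfile(allocation):
    """Build rankfile text from a list of (node_name, core_ids) pairs.

    One rank is emitted per allocated core, numbered in allocation order.
    """
    lines = []
    rank = 0
    for node, cores in allocation:
        for core in cores:
            lines.append(f"rank {rank}={node} slot={core}")
            rank += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical allocation: two cores on each of two nodes.
    print(build_rankfile([("node1", [0, 1]), ("node2", [4, 5])]), end="")
```

A file with this content would be passed to mpirun through the --rankfile parameter.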
The openmpi and srunmpi models provide additional configuration options that can be defined in the model_opts element. Currently the following options are supported:
- mpirun (str) - path to the mpirun command that should be used to launch applications; if not defined, the default command (which should be available through the PATH environment variable) is used
- mpirun_args (list) - additional arguments that should be passed to the mpirun command
We recommend using the srunmpi model for MPI applications on HPC sites wherever srun is properly configured to
execute MPI codes.
Note
For the intelmpi, srunmpi and openmpi models it is important to define the
appropriate IntelMPI/OpenMPI modules in the execution/modules element of the job description,
or to load them before starting QCG-PJM.
Examples
To run applications built with different MPI implementations in a single workflow, we can define the appropriate options for each task:
[
  {
    "request": "submit",
    "jobs": [
      {
        "name": "intelmpi_task",
        "execution": {
          "exec": "/home/user/my_intelmpi_application",
          "model": "intelmpi",
          "model_opts": {
            "mpirun": "/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun"
          },
          "modules": [ "impi" ]
        },
        "resources": {
          "numCores": {
            "exact": 8
          }
        }
      },
      {
        "name": "openmpi_task",
        "execution": {
          "exec": "/home/user/my_openmpi_application",
          "model": "openmpi",
          "model_opts": {
            "mpirun": "/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun",
            "mpirun_args": [ "--mca", "rmaps_rank_file_physical", "1" ]
          },
          "modules": [ "openmpi/4.1.0_gcc620" ]
        },
        "resources": {
          "numCores": {
            "exact": 8
          }
        }
      }
    ]
  }
]
With this input, the QCG-PilotJob service will launch the application of the
intelmpi_task task, /home/user/my_intelmpi_application, with the mpirun command
/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun
and will additionally load the impi module. The application of the second task,
openmpi_task, /home/user/my_openmpi_application, will be launched with the command
/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun with the additional
arguments --mca rmaps_rank_file_physical 1, and the module
openmpi/4.1.0_gcc620 will be loaded before the application starts.
The description for the API looks similar:
from qcg.pilotjob.api.job import Jobs

jobs = Jobs()

# task launched with the Intel MPI mpirun and the impi module
jobs.add(
    name='intelmpi_task',
    exec='/home/user/my_intelmpi_application',
    numCores={'exact': 4},
    model='intelmpi',
    model_opts={'mpirun': '/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun'},
    modules=['impi'])

# task launched with the OpenMPI mpirun, extra arguments and the OpenMPI module
jobs.add(
    name='openmpi_task',
    exec='/home/user/my_openmpi_application',
    numCores={'exact': 4},
    model='openmpi',
    model_opts={'mpirun': '/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun',
                'mpirun_args': ['--mca', 'rmaps_rank_file_physical', '1']},
    modules=['openmpi/4.1.0_gcc620'])
Instead of a compiled application, it is possible to use a Bash script from which the application is called. This gives more possibilities to configure the environment for the application. For example, using the following input description:
[
  {
    "request": "submit",
    "jobs": [
      {
        "name": "openmpi_task",
        "execution": {
          "exec": "bash",
          "args": [ "-l", "./app_script.sh" ],
          "model": "openmpi"
        },
        "resources": {
          "numCores": {
            "exact": 8
          }
        }
      }
    ]
  }
]
The script app_script.sh could look like the following:
#!/bin/bash
module load openmpi/4.1.0_gcc620
/home/user/my_openmpi_application
Warning
Remember that for a parallel task with a model other than default, as many instances of the script are created as the number of cores requested. Actions that should be executed only once for all of the application’s processes should therefore be enclosed in the following block:
if [ "x$OMPI_COMM_WORLD_RANK" == "x0" ] || [ "x$PMI_RANK" == "x0" ]; then
    # actions in this block will be executed only for rank 0 of OpenMPI/IntelMPI applications
fi
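If the wrapper is a Python script rather than a Bash script, the same check can be sketched as follows. The helper name is_first_rank is hypothetical. It relies on the same two environment variables as the Bash block above, OMPI_COMM_WORLD_RANK and PMI_RANK.

```python
import os


def is_first_rank(environ=os.environ):
    """Return True when either MPI rank variable is set to "0"."""
    return (environ.get("OMPI_COMM_WORLD_RANK") == "0"
            or environ.get("PMI_RANK") == "0")


if __name__ == "__main__":
    if is_first_rank():
        # Actions here run only in the rank 0 process.
        print("one-time setup")
```

As in the Bash version, a process started without either variable set is not treated as rank 0.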