Execution models ====================== The QCG-PJM service manages individual cores, so it assigns specific cores to the tasks. From the performance perspective, binding tasks to the cores is more efficient as it separates tasks from each other. .. note:: To support CPU binding, the service must have information about physical available cores in the system. This information is provided by *SLURM* in a created allocation, but it is not available in case of logical resources, i.e. where user defines *virtual* cores and nodes. So currently the binding is supported only when the QCG-PJM service is run inside a *SLURM* allocation. The binding of single core tasks is achieved with: - Custom Launching Agent, that uses ``taskset`` command, - SLURM built-in mechanism based on ``--cpu-bind`` option of the ``srun`` command. Currently, the parallel tasks that require more than one core are launched only by the ``srun`` or ``mpirun`` commands. The `mask_cpu` flag of the ``srun``'s ```--cpu-bind`` parameter will contain the CPU masks for all allocated nodes separated with the comma. When ``mpirun`` command is used to launch parallel task, either the ``--rankfile`` parameter is used for OpenMPI model or ``I_MPI_PIN_PROCESSOR_LIST`` environment variable for Intel MPI model. Additionally, for all tasks launched by the Slurm with binding supported, the *QCG_PM_CPU_SET* environment variable will be available and set with core identifiers separated with comma. To support process affinity for different parallel applications, QCG-PJM supports different execution models. Currently the following models are available: - ``default`` - in this model only a single process is launched within the allocation, which is prepared based on task's resource requirements, the allocation description can be found in environment variables, such as: - ``SLURM_NODELIST`` - list of allocated nodes separated by comma character - ``SLURM_NTASKS`` - total number of cores - ``SLURM_TASKS_PER_NODE`` - number of allocated cores on following nodes, where each element can be in a form ``NODE_NAME`` (a single core on a node) or ``NODE_NAME(xNUM_OF_CORES)`` (many cores on a single node) - ``QCG_PM_CPU_SET`` - list of core identifiers on following nodes separated by a comma character - ``threads`` - is designed for running OpenMP tasks on a single node, the process is started with the ``srun`` command with the ``--cpus-per-task`` parameter set according to a number of cores defined in resource requirements - ``openmpi`` - the processes are started with the ``mpirun`` command with rankfile created based on task's resource requirements - ``intelmpi`` - the processes are started with the ``mpirun`` command (from the IntelMPI distribution) with defined multiple components, where each component describing execution node, contains ``-host`` element and ``I_MPI_PIN_PROCESSOR_LIST`` arguments set according to the allocated resources - ``srunmpi`` - the processes are started with the ``srun`` command with ``--cpu-bind`` parameter set according to the allocated resources; this model should be used only on sites that have penMPI/IntelMPI/Slurm environments configured properly. The ``openmpi`` and ``srunmpi`` provide additional configuration options that that can be defined in the element ``model_opts``. Currently the following options are supported: - ``mpirun`` (str) - path to the `mpirun` command that should be used to launch applications, if not defined the default command (should be in the ``PATH`` environment variable) is used - ``mpirun_args`` (list) - additional arguments that should be passed to the `mpirun` command We recommend usage of ``srunmpi`` model for MPI applications on HPC sites wherever srun is properly configured to execute MPI codes. .. note:: It is important to define for the ``intelmpi``, ``srunmpi`` and ``openmpi`` models appropriate IntelMPI/OpenMPI modules in `executable/modules` element of the job description or to load them before staring QCG-PJM. Examples -------- 1) To use different MPI implementation applications in a single workflow we can define appropriate options .. code:: json [ { "request": "submit", "jobs": [ { "name": "intelmpi_task", "execution": { "exec": "/home/user/my_intelmpi_application", "model": "intelmpi", "model_opts": { "mpirun": "/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun" }, "modules": [ "impi" ] }, "resources": { "numCores": { "exact": 8 } } }, { "name": "openmpi_task", "execution": { "exec": "/home/user/my_openmpi_application", "model": "openmpi", "model_opts": { "mpirun": "/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun", "mpirun_args": [ "--mca", "rmaps_rank_file_physical", "1" ] }, "modules": [ "openmpi/4.1.0_gcc620" ] }, "resources": { "numCores": { "exact": 8 } } } ] } ] With this input, QCG-PilotJob service will launch task's `intelmpi_task` application ``/home/user/my_intelmpi_application`` with the mpirun command path ``/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun`` and additionally it will load `impi` module. The second task's `openmpi_task` application ``/home/user/my_openmpi_application`` will be launched with the command ``/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun`` with additional arguments ``--mca rmaps_rank_file_physical 1`` and the module ``openmpi/4.1.0_gcc620`` loaded before the application's start. The description for the API looks similar: .. code:: python jobs = Jobs() jobs.add(name = 'intelmpi_task', exec = '/home/user/my_intelmpi_application', numCores = { 'exact': 4 }, model = 'intelmpi', model_opts = { 'mpirun': '/opt/exp_soft/local/skylake/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpirun' }, modules = [ 'impi' ]) jobs.add(name = 'openmpi_task', exec = '/home/user/my_openmpi_application', numCores = { 'exact': 4 }, model = 'openmpi', model_opts = { 'mpirun': '/opt/exp_soft/local/skylake/openmpi/4.1.0_gcc620/bin/mpirun', 'mpirun_args': ['--mca', 'rmaps_rank_file_physical', '1']}, modules = [ 'openmpi/4.1.0_gcc620' ]) 2) Instead of compiled application, it is possible to use Bash script from which the application is called later. It gives us more possibilities to configure the environment for the application. For example using the following input description: .. code:: json [ { "request": "submit", "jobs": [ { "name": "openmpi_task", "execution": { "exec": "bash", "args": [ "-l", "./app_script.sh" ], "model": "openmpi", }, "resources": { "numCores": { "exact": 8 } } } ] } ] The script ``app_script.sh`` could look like the following: .. code:: bash #!/bin/bash module load openmpi/4.1.0_gcc620 /home/user/my_openmpi_application .. warning:: It is important to remember, that for the parallel task with a model different that default, there will be as many instances created of the script as the required number of cores. Thus the actions that should be executed only once per all application's processes should be enclosed in the following block: .. code:: bash if [ "x$OMPI_COMM_WORLD_RANK" == "x0" ] || [ "x$PMI_RANK" == "x0" ]; then # actions in this block will be executed only for rank 0 of OpenMPI/IntelMPI applications endif