qcg.pilotjob.slurmres module

class qcg.pilotjob.slurmres.SlurmArg(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

CPU_BIND()
qcg.pilotjob.slurmres.parse_local_cpus()

Return information about the CPUs and cores available in the local system. The information is gathered from the lscpu command, which reports both the available CPUs and the physical cores they belong to; this is useful on hyper-threading systems.

Returns

two maps:

core_id -> list of CPUs assigned to the core

cpu_id -> list of cores (in most situations this will be a single-element list)

Return type

dict(str, list), dict(str, list)
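The two maps can be illustrated with a minimal sketch. It assumes the input comes from `lscpu -p=CPU,CORE` (comment lines starting with `#`, then one `cpu,core` pair per line); the exact lscpu invocation used by the library is not stated here, and `parse_lscpu_pairs` is a hypothetical helper, not part of the module.

```python
from collections import defaultdict


def parse_lscpu_pairs(output):
    """Build core->cpus and cpu->cores maps from `lscpu -p=CPU,CORE` style text.

    Hypothetical helper for illustration; identifiers are kept as strings.
    """
    core_to_cpus = defaultdict(list)
    cpu_to_cores = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines printed by lscpu -p.
        if not line or line.startswith("#"):
            continue
        cpu_id, core_id = line.split(",")[:2]
        core_to_cpus[core_id].append(cpu_id)
        cpu_to_cores[cpu_id].append(core_id)
    return dict(core_to_cpus), dict(cpu_to_cores)


# Made-up sample: two physical cores, each with two hardware threads.
sample = "# CPU,Core\n0,0\n1,1\n2,0\n3,1\n"
cores, cpus = parse_lscpu_pairs(sample)
```

With hyper-threading, each core maps to more than one CPU, while each CPU still maps to a single core.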

qcg.pilotjob.slurmres.parse_nodelist(nodespec)

Return full node names based on the Slurm node specification.

This method calls scontrol show hostnames to get real node host names.

Parameters

nodespec (str) – Slurm node specification

Returns

node hostnames

Return type

list(str)
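As a rough sketch of the approach described above, the following calls `scontrol show hostnames` through `subprocess` and splits the output into one host name per line. The helper names `split_hostnames` and `expand_nodelist` are hypothetical, and the error handling of the real function may differ.

```python
import subprocess


def split_hostnames(output):
    """Turn the one-name-per-line output into a list, dropping blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def expand_nodelist(nodespec):
    """Expand a Slurm node specification such as 'c[1-2]' (needs scontrol in PATH)."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodespec],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return split_hostnames(result.stdout)
```

Keeping the output parsing separate from the subprocess call makes it possible to exercise the parsing without a Slurm installation.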

qcg.pilotjob.slurmres.get_allocation_data()

Get information about the Slurm allocation and pack it into a dictionary. The information is obtained with the ‘scontrol show job’ command.

Returns

a list of all allocation attributes and values, and also a dictionary. The dictionary can be used to check whether an attribute exists, but some attributes, such as Nodes and CPU_IDs, are not unique, so the dictionary holds only the last occurrence of each of them. The dictionary also carries no information about the order of attributes.

Return type

dict(str,str)
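The difference between the list and the dictionary can be seen in a small sketch. It assumes whitespace-separated `KEY=VALUE` tokens (values containing spaces would need extra handling), and the `'list'`/`'map'` result layout is a guess based on the parameter description of parse_slurm_allocation_cpu_ids; `parse_scontrol_job` is a hypothetical helper.

```python
def parse_scontrol_job(output):
    """Collect KEY=VALUE tokens as an ordered list and as a last-wins map."""
    pairs = []
    for token in output.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            pairs.append((key, value))
    # dict() keeps only the last value seen for a repeated key.
    return {"list": pairs, "map": dict(pairs)}


# Made-up sample resembling detailed job output.
sample = (
    "JobId=42 JobName=test\n"
    "  Nodes=c1 CPU_IDs=1 Mem=0 GRES=\n"
    "  Nodes=c2 CPU_IDs=0 Mem=0 GRES=\n"
)
data = parse_scontrol_job(sample)
```

Here the list keeps both `Nodes` entries in order, while the map retains only the entry for `c2`.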

qcg.pilotjob.slurmres.parse_slurm_cpu_binding(cpu_bind_list)

Return CPU identifier list based on Slurm’s SLURM_CPU_BIND_LIST variable’s value.

Parameters

cpu_bind_list (str) – the value of SLURM_CPU_BIND_LIST

Returns

list of CPU identifiers

Return type

list (int)
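A minimal sketch of such a conversion follows, assuming the variable holds a comma-separated list of hexadecimal CPU masks (as Slurm produces for mask-based binding). Whether the library handles other formats is not stated here, and `cpus_from_masks` is a hypothetical name.

```python
def cpus_from_masks(cpu_bind_list):
    """Return sorted, unique CPU ids whose bits are set in any of the masks.

    Assumes comma-separated hexadecimal masks such as '0x3,0xC'.
    """
    cpu_ids = set()
    for mask_text in cpu_bind_list.split(","):
        mask = int(mask_text, 16)  # accepts an optional '0x' prefix
        bit = 0
        while mask:
            if mask & 1:
                cpu_ids.add(bit)
            mask >>= 1
            bit += 1
    return sorted(cpu_ids)
```

For instance, the mask `0x5` has bits 0 and 2 set, so it selects CPUs 0 and 2.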

qcg.pilotjob.slurmres.parse_slurm_job_cpus(cpus)

Return number of cores allocated on each node in the allocation.

This method parses the value of Slurm’s SLURM_JOB_CPUS_PER_NODE variable.

Parameters

cpus (str) – the value of SLURM_JOB_CPUS_PER_NODE

Returns

the number of cores on each node

Return type

list (int)
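Slurm compresses this variable by folding repeated counts into the form `count(xN)`, for example `4(x2),2` for two nodes with 4 CPUs followed by one node with 2. The sketch below shows one way to expand it; `expand_cpus_per_node` is a hypothetical name and not the library's implementation.

```python
import re


def expand_cpus_per_node(cpus):
    """Expand a value such as '4(x2),2' into one CPU count per node."""
    counts = []
    for item in cpus.split(","):
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item.strip())
        if match is None:
            raise ValueError(f"unexpected element: {item!r}")
        count = int(match.group(1))
        repeat = int(match.group(2) or 1)  # no '(xN)' suffix means one node
        counts.extend([count] * repeat)
    return counts
```

The resulting list has one entry per node, in the same order as the expanded node list.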

qcg.pilotjob.slurmres.get_num_slurm_nodes()

Return number of nodes in Slurm allocation.

Returns

number of nodes

Return type

int

qcg.pilotjob.slurmres.parse_slurm_resources(config)

Return resources available in the Slurm allocation.

Parameters

config (dict) – QCG-PilotJob configuration

Returns

resources available in Slurm allocation

Return type

Resources

qcg.pilotjob.slurmres.parse_slurm_allocation_cpu_ids(allocation_data_list, node_names, cores_num)

Based on allocation data obtained via ‘scontrol show job --detail’, return information about core bindings per node. Slurm compacts the information in the allocation data, so extracting the bindings can be tricky. The data may take the form:

Nodes=c[1-2] CPU_IDs=0 Mem=0 GRES=

but also as:

Nodes=c1 CPU_IDs=1 Mem=0 GRES=
Nodes=c2 CPU_IDs=0 Mem=0 GRES=

Parameters
  • allocation_data_list (dict) – allocation data, given as a map (under the ‘map’ key) and a list (under the ‘list’ key)

  • node_names (list(str)) – node names for which the binding should be parsed

  • cores_num (list(int)) – the number of allocated cores for given nodes (the binding information must match # of cores)

Returns

map with node names as keys and lists of bound cores as values

Return type

dict
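Both data layouts shown above can be handled by walking the attribute list in order and attaching each `CPU_IDs` value to the most recent `Nodes` entry. The sketch below is a simplified illustration only: it takes the node expansion as a caller-supplied function (in practice something like `scontrol show hostnames`), does not validate against `cores_num`, and uses hypothetical helper names.

```python
def parse_id_ranges(spec):
    """Expand a range list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    ids = []
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        ids.extend(range(int(first), int(last if sep else first) + 1))
    return ids


def bindings_from_pairs(pairs, expand_nodes):
    """Map every node name to the CPU ids listed after its 'Nodes' entry."""
    bindings = {}
    current_nodes = []
    for key, value in pairs:
        if key == "Nodes":
            current_nodes = expand_nodes(value)
        elif key == "CPU_IDs":
            for node in current_nodes:
                bindings[node] = parse_id_ranges(value)
    return bindings


def fake_expand(spec):
    """Stand-in for real node-name expansion; handles the sample data only."""
    return {"c[1-2]": ["c1", "c2"]}.get(spec, [spec])


compact = [("Nodes", "c[1-2]"), ("CPU_IDs", "0"), ("Mem", "0"), ("GRES", "")]
split = [("Nodes", "c1"), ("CPU_IDs", "1"), ("Nodes", "c2"), ("CPU_IDs", "0")]
compact_bindings = bindings_from_pairs(compact, fake_expand)
split_bindings = bindings_from_pairs(split, fake_expand)
```

This is why the ordered list, and not the dictionary, is needed: the repeated `Nodes` and `CPU_IDs` entries only make sense in sequence.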

qcg.pilotjob.slurmres.parse_slurm_env_binding(slurm_cpu_bind_list, node_names, cores_num)

Based on the environment variable SLURM_CPU_BIND_LIST set by Slurm, return information about core bindings per node. WARNING: this information might not be as precise as that obtained from allocation data, because the environment variable contains the same information for all nodes; if the nodes do not all have the same architecture and number of allocated cores, the information might be incorrect.

Parameters
  • slurm_cpu_bind_list (str) – value of SLURM_CPU_BIND_LIST variable

  • node_names (list(str)) – name of the nodes

  • cores_num (list(int)) – number of cores on each node

Returns

map with node names and core identifier list

Return type

dict(str,int)

qcg.pilotjob.slurmres.in_slurm_allocation()

Check whether the program is running inside a Slurm allocation. Detection relies on environment variables (such as SLURM_NODELIST) that are always set by Slurm.

Returns

true if we are inside slurm allocation, otherwise false
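A check of this kind can be sketched as follows. Only SLURM_NODELIST is named in the description above, so the sketch tests that variable alone; the full set of variables checked by the library is not listed here, and `looks_like_slurm_allocation` is a hypothetical name.

```python
import os


def looks_like_slurm_allocation(env=None):
    """Return True when the variables expected inside an allocation are set.

    Only SLURM_NODELIST is checked here; this is an assumption, the library
    may require more variables.
    """
    env = os.environ if env is None else env
    required = ("SLURM_NODELIST",)
    return all(name in env for name in required)
```

Passing an explicit mapping instead of relying on `os.environ` makes the check easy to try outside a cluster.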

qcg.pilotjob.slurmres.get_slurm_version()

Return the Slurm version in the current environment. The version is obtained only once, via the srun --version command. If an error occurs while obtaining it, the version is initialized with the values MAJOR(0), MINOR(0), PATCH(‘’).

Returns

tuple(int,int,str) - the major, minor and patch version of slurm in current environment

qcg.pilotjob.slurmres.find_slurm_version()

Return version of Slurm.

This method calls srun --version and parses the result into the form MAJOR,MINOR,PATCH.

Returns

MAJOR,MINOR,PATCH

Return type

tuple (int,int,str)

Raises

SlurmEnvError - when the srun command is unavailable, exits with a non-zero status, or produces output that does not match the expected pattern
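The parsing step can be sketched with a regular expression, assuming output of the form `slurm 20.02.5`. The pattern is an illustration and not necessarily the one used by the library; this sketch raises `ValueError` where the library raises SlurmEnvError, and `parse_srun_version` is a hypothetical name.

```python
import re


def parse_srun_version(output):
    """Extract (major, minor, patch) from text such as 'slurm 20.02.5'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\S+))?", output)
    if match is None:
        raise ValueError(f"unexpected version output: {output!r}")
    # The patch part stays a string because it may carry suffixes such as '8-2'.
    return int(match.group(1)), int(match.group(2)), match.group(3) or ""
```

Keeping the patch component as a string matches the documented return type of tuple (int,int,str).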

qcg.pilotjob.slurmres.test_environment(env=None)

Try to parse Slurm resources based on an environment passed as a dictionary, or as a string in which each environment variable is placed on a separate line. WARNING: some information must be gathered through Slurm client programs such as ‘scontrol’, so resource information gathered from environment variables alone is limited.

Parameters

env (dict|string,optional) – environment to test:

  • None - the current environment is checked

  • dict - the environment to check, in the form of a dictionary

  • string - the environment to check, in the form of a string where each variable is placed on a separate line

Returns

instance with gathered slurm information.

Return type

Resources
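The string form of the environment can be turned into the dictionary form with a few lines. The sketch below assumes one `NAME=VALUE` entry per line, as in the output of the `env` command; how the library itself performs this conversion is not documented here, and `env_from_text` is a hypothetical name.

```python
def env_from_text(text):
    """Convert 'NAME=VALUE' lines into a dictionary, skipping malformed lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = line.partition("=")
        env[name] = value
    return env


# Made-up sample environment.
sample_env = env_from_text("SLURM_NODELIST=c[1-2]\nSLURM_JOB_CPUS_PER_NODE=4(x2)\n")
```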