qcg.pilotjob.slurmres module

class qcg.pilotjob.slurmres.SlurmArg

Bases: enum.Enum

An enumeration.

CPU_BIND
qcg.pilotjob.slurmres.parse_local_cpus()

Return information about the CPUs and cores available in the local system. The information is gathered with the `lscpu` command, which, besides the available CPUs, also reports the physical cores in the system; this is useful on hyper-threading systems.

Returns:
two maps:
  • core_id -> list of CPUs assigned to the core
  • cpu_id -> list of cores (in most situations a single-element list)
Return type:dict(str, list), dict(str, list)
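The two maps can be illustrated with a minimal sketch that parses text in the format printed by `lscpu --parse=CPU,CORE` (one `cpu,core` pair per line, with comment lines starting with `#`). The helper name `maps_from_lscpu` and the choice of `lscpu` options are assumptions for illustration; this is not necessarily how `parse_local_cpus` works internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def maps_from_lscpu(output: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build core->CPUs and CPU->cores maps from `lscpu --parse=CPU,CORE` text.

    Hypothetical helper: assumes 'cpu,core' pairs, one per line,
    with lscpu's header comments starting with '#'.
    """
    core_to_cpus: Dict[str, List[str]] = defaultdict(list)
    cpu_to_cores: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and header comments
        cpu_id, core_id = line.split(',')[:2]
        core_to_cpus[core_id].append(cpu_id)
        cpu_to_cores[cpu_id].append(core_id)
    return dict(core_to_cpus), dict(cpu_to_cores)


# Two physical cores with two hardware threads each (hyper-threading)
sample = """# CPU,Core
0,0
1,1
2,0
3,1
"""
cores, cpus = maps_from_lscpu(sample)
print(cores)  # {'0': ['0', '2'], '1': ['1', '3']}
print(cpus)   # {'0': ['0'], '1': ['1'], '2': ['0'], '3': ['1']}
```

On a hyper-threading system each core maps to several CPUs, while each CPU usually maps back to a single core.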
qcg.pilotjob.slurmres.parse_nodelist(nodespec)

Return full node names based on the Slurm node specification.

This function calls `scontrol show hostnames` to get the real node host names.

Parameters:nodespec (str) – Slurm node specification
Returns:node hostnames
Return type:list(str)
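`scontrol show hostnames` expands compressed Slurm host lists such as `c[1-3,5]` into one host name per line. The pure-Python sketch below only approximates that expansion to show what it produces: it assumes at most one bracket group per host entry, and the names `expand_nodespec` and `_split_top_level` are hypothetical. The module itself relies on `scontrol`.

```python
import re
from typing import List


def _split_top_level(spec: str) -> List[str]:
    """Split on commas that are not inside [...] groups."""
    parts: List[str] = []
    depth, current = 0, ''
    for ch in spec:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(current)
            current = ''
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_nodespec(nodespec: str) -> List[str]:
    """Expand e.g. 'c[1-3,5],gpu01' into ['c1', 'c2', 'c3', 'c5', 'gpu01']."""
    hosts: List[str] = []
    for entry in _split_top_level(nodespec):
        m = re.fullmatch(r'([^\[]*)\[([^\]]+)\](.*)', entry)
        if m is None:
            hosts.append(entry)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = m.groups()
        for rng in ranges.split(','):
            if '-' in rng:
                start, end = rng.split('-')
                width = len(start)  # keep zero padding such as n[01-03]
                hosts.extend(f'{prefix}{i:0{width}d}{suffix}'
                             for i in range(int(start), int(end) + 1))
            else:
                hosts.append(f'{prefix}{rng}{suffix}')
    return hosts


print(expand_nodespec('c[1-3,5],gpu01'))  # ['c1', 'c2', 'c3', 'c5', 'gpu01']
```

Nested or multi-dimensional host lists are outside the scope of this sketch, which is one reason to delegate the expansion to `scontrol`.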
qcg.pilotjob.slurmres.get_allocation_data()

Get information about the Slurm allocation and pack it into a dictionary. The information is obtained with the `scontrol show job` command.

Returns:
the allocation attributes and their values, both as a list of all attributes and as a dictionary. The dictionary can be used to check whether an attribute exists, but some attributes, such as Nodes and CPU_IDs, are not unique, so the dictionary holds only their last occurrence. The dictionary also does not preserve the order of attributes.
Return type:dict(str,str)
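The list-plus-dictionary packing described above can be sketched as follows, assuming the `scontrol show job` output consists of whitespace-separated `KEY=VALUE` tokens (values containing spaces are not handled). The helper name `pack_allocation_data` is hypothetical; the `'list'`/`'map'` keys mirror the `allocation_data_list` parameter of `parse_slurm_allocation_cpu_ids`.

```python
from typing import Dict, List, Tuple


def pack_allocation_data(text: str) -> Dict[str, object]:
    """Pack KEY=VALUE tokens into an ordered list and a last-wins map."""
    pairs: List[Tuple[str, str]] = []
    for token in text.split():
        if '=' not in token:
            continue  # ignore tokens that are not attributes
        key, value = token.split('=', 1)
        pairs.append((key, value))
    # dict() keeps only the last value of repeated keys such as Nodes or CPU_IDs
    return {'list': pairs, 'map': dict(pairs)}


sample = ("JobId=42 NumNodes=2\n"
          "   Nodes=c1 CPU_IDs=1 Mem=0 GRES=\n"
          "   Nodes=c2 CPU_IDs=0 Mem=0 GRES=\n")
data = pack_allocation_data(sample)
print(data['map']['Nodes'])                           # c2
print([v for k, v in data['list'] if k == 'Nodes'])   # ['c1', 'c2']
```

The map answers "is this attribute present?" cheaply, while the ordered list is still needed to pair each `Nodes` entry with the `CPU_IDs` that follow it.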
qcg.pilotjob.slurmres.parse_slurm_cpu_binding(cpu_bind_list)

Return a list of CPU identifiers based on the value of Slurm’s SLURM_CPU_BIND_LIST variable.

Parameters:cpu_bind_list (str) – the value of SLURM_CPU_BIND_LIST
Returns:list of CPU identifiers
Return type:list (int)
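A minimal sketch of turning a SLURM_CPU_BIND_LIST value into CPU identifiers, assuming the variable holds comma-separated hexadecimal CPU masks such as `0x3,0xC` (as with mask-style `--cpu-bind`). Map-style lists of plain CPU ids are not handled, and the helper name `cpus_from_bind_list` is hypothetical.

```python
from typing import List


def cpus_from_bind_list(cpu_bind_list: str) -> List[int]:
    """Decode comma-separated hexadecimal CPU masks into sorted CPU ids."""
    cpus = set()
    for mask_str in cpu_bind_list.split(','):
        mask = int(mask_str, 16)  # base 16 also accepts an optional '0x' prefix
        bit = 0
        while mask:
            if mask & 1:
                cpus.add(bit)  # bit number == CPU id
            mask >>= 1
            bit += 1
    return sorted(cpus)


print(cpus_from_bind_list('0x3,0xC'))  # [0, 1, 2, 3]
```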
qcg.pilotjob.slurmres.parse_slurm_job_cpus(cpus)

Return the number of cores allocated on each node in the allocation.

This function parses the value of Slurm’s SLURM_JOB_CPUS_PER_NODE variable.

Parameters:cpus (str) – the value of SLURM_JOB_CPUS_PER_NODE
Returns:the number of cores on each node
Return type:list (int)
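Slurm documents SLURM_JOB_CPUS_PER_NODE in the compressed form `CPU_count[(xnumber_of_nodes)][,CPU_count[(xnumber_of_nodes)]...]`, for example `72(x2),36`. A minimal sketch of expanding it into one entry per node follows; the helper name `expand_cpus_per_node` is hypothetical.

```python
import re
from typing import List


def expand_cpus_per_node(cpus: str) -> List[int]:
    """Expand e.g. '72(x2),36' into [72, 72, 36]."""
    result: List[int] = []
    for item in cpus.split(','):
        m = re.fullmatch(r'\s*(\d+)(?:\(x(\d+)\))?\s*', item)
        if m is None:
            raise ValueError(f'unexpected SLURM_JOB_CPUS_PER_NODE item: {item!r}')
        count, repeat = int(m.group(1)), int(m.group(2) or 1)
        result.extend([count] * repeat)  # '(xN)' repeats the count for N nodes
    return result


print(expand_cpus_per_node('72(x2),36'))  # [72, 72, 36]
```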
qcg.pilotjob.slurmres.get_num_slurm_nodes()

Return the number of nodes in the Slurm allocation.

Returns:number of nodes
Return type:int
qcg.pilotjob.slurmres.parse_slurm_resources(config)

Return the resources available in the Slurm allocation.

Parameters:config (dict) – QCG-PilotJob configuration
Returns:resources available in Slurm allocation
Return type:Resources
qcg.pilotjob.slurmres.parse_slurm_allocation_cpu_ids(allocation_data_list, node_names, cores_num)

Based on allocation data obtained via `scontrol show job --detail`, return information about core bindings per node. The allocation data stores this information in a compact form, so extracting the bindings can be tricky. The data can be described as:

Nodes=c[1-2] CPU_IDs=0 Mem=0 GRES=

but also as:

Nodes=c1 CPU_IDs=1 Mem=0 GRES=
Nodes=c2 CPU_IDs=0 Mem=0 GRES=
Parameters:
  • allocation_data_list (dict) – allocation data, as a map (under the ‘map’ key) and a list (under the ‘list’ key)
  • node_names (list(str)) – node names for which the binding should be parsed
  • cores_num (list(int)) – the number of allocated cores on the given nodes (the binding information must match the number of cores)
Returns:map with node names as keys and bound core lists as values
Return type:dict
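The two notations shown above can be reconciled with a sketch like the following. It assumes the node lists have already been expanded (for example `c[1-2]` into `c1`, `c2`) and that CPU_IDs values use the `0-3,8` range syntax; the helper names `expand_cpu_ids` and `bind_nodes` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def expand_cpu_ids(cpu_ids: str) -> List[int]:
    """Expand a CPU_IDs value such as '0-3,8' into [0, 1, 2, 3, 8]."""
    ids: List[int] = []
    for part in cpu_ids.split(','):
        if '-' in part:
            start, end = part.split('-')
            ids.extend(range(int(start), int(end) + 1))
        else:
            ids.append(int(part))
    return ids


def bind_nodes(records: Sequence[Tuple[List[str], str]],
               node_names: List[str], cores_num: List[int]) -> Dict[str, List[int]]:
    """Map each node to its CPU ids and check them against the core counts.

    records holds (expanded node names, CPU_IDs string) pairs, one per
    Nodes=... CPU_IDs=... group of the allocation data.
    """
    binding: Dict[str, List[int]] = {}
    for nodes, cpu_ids in records:
        for node in nodes:
            binding[node] = expand_cpu_ids(cpu_ids)
    for node, cores in zip(node_names, cores_num):
        if len(binding.get(node, [])) != cores:
            raise ValueError(f'binding for {node} does not match {cores} cores')
    return binding


# Compact form: Nodes=c[1-2] CPU_IDs=0
print(bind_nodes([(['c1', 'c2'], '0')], ['c1', 'c2'], [1, 1]))
# {'c1': [0], 'c2': [0]}
# Per-node form: Nodes=c1 CPU_IDs=1 / Nodes=c2 CPU_IDs=0
print(bind_nodes([(['c1'], '1'), (['c2'], '0')], ['c1', 'c2'], [1, 1]))
# {'c1': [1], 'c2': [0]}
```

Raising on a count mismatch reflects the requirement that the binding information must match the number of allocated cores.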

qcg.pilotjob.slurmres.parse_slurm_env_binding(slurm_cpu_bind_list, node_names, cores_num)

Based on the SLURM_CPU_BIND_LIST environment variable set by Slurm, return information about core bindings per node. WARNING: this information might be less precise than that obtained from the allocation data, because the environment variable contains the same information for all nodes. If the nodes differ in architecture or in the number of allocated cores, the result might be incorrect.

Parameters:
  • slurm_cpu_bind_list (str) – value of SLURM_CPU_BIND_LIST variable
  • node_names (list(str)) – name of the nodes
  • cores_num (list(int)) – number of cores on each node
Returns:map with node names and core identifier lists
Return type:dict(str,int)
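The limitation in the warning above can be made concrete with a sketch that applies one decoded CPU id list to every node and flags nodes whose allocated core count differs from the list length. The helper name `replicate_env_binding` and the choice to warn rather than fail are assumptions for illustration.

```python
import warnings
from typing import Dict, List


def replicate_env_binding(cpu_ids: List[int], node_names: List[str],
                          cores_num: List[int]) -> Dict[str, List[int]]:
    """Assign the same CPU id list to every node, warning on count mismatches."""
    binding: Dict[str, List[int]] = {}
    for node, cores in zip(node_names, cores_num):
        if len(cpu_ids) != cores:
            # SLURM_CPU_BIND_LIST carries one list for all nodes, so a node
            # with a different core count cannot be described correctly.
            warnings.warn(f'{node}: {cores} cores allocated but {len(cpu_ids)} CPU ids bound')
        binding[node] = list(cpu_ids)
    return binding


print(replicate_env_binding([0, 1], ['c1', 'c2'], [2, 2]))
# {'c1': [0, 1], 'c2': [0, 1]}
```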

qcg.pilotjob.slurmres.in_slurm_allocation()

Check whether the program is running inside a Slurm allocation. The check looks for environment variables (such as SLURM_NODELIST) that Slurm always sets.

Returns:True if the program runs inside a Slurm allocation, otherwise False
Return type:bool
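A minimal sketch of this kind of environment check, written against an optional mapping so it can be tested without a real allocation. SLURM_NODELIST comes from the description above; also requiring SLURM_JOB_ID is an assumption, and the helper name `looks_like_slurm_allocation` is hypothetical.

```python
import os
from typing import Iterable, Mapping, Optional


def looks_like_slurm_allocation(
        env: Optional[Mapping[str, str]] = None,
        required: Iterable[str] = ('SLURM_NODELIST', 'SLURM_JOB_ID')) -> bool:
    """Return True if all of the given Slurm variables are set."""
    if env is None:
        env = os.environ  # fall back to the current process environment
    return all(name in env for name in required)


print(looks_like_slurm_allocation({'SLURM_NODELIST': 'c[1-2]', 'SLURM_JOB_ID': '42'}))  # True
print(looks_like_slurm_allocation({}))  # False
```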
qcg.pilotjob.slurmres.get_slurm_version()

Return the Slurm version in the current environment. The version is obtained only once, via the `srun --version` command. If an error occurs while obtaining the version, it is initialized to MAJOR(0), MINOR(0), PATCH(‘’).

Returns:the major, minor and patch version of Slurm in the current environment
Return type:tuple(int,int,str)
qcg.pilotjob.slurmres.find_slurm_version()

Return version of Slurm.

This function calls `srun --version` and parses the result into MAJOR, MINOR, PATCH.

Returns:MAJOR,MINOR,PATCH
Return type:tuple (int,int,str)
Raises:SlurmEnvError – when the `srun` command is unavailable, finishes with a non-zero exit code, or its output does not match the expected pattern
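A minimal sketch of the parsing step, assuming `srun --version` prints a line such as `slurm 20.11.7` (some distribution packages print `slurm-wlm` instead of `slurm`). The helper name `parse_srun_version` is hypothetical, and it raises `ValueError` where the documented function raises `SlurmEnvError`, so the sketch stays self-contained.

```python
import re
from typing import Tuple


def parse_srun_version(output: str) -> Tuple[int, int, str]:
    """Parse output such as 'slurm 20.11.7' into (20, 11, '7')."""
    m = re.match(r'\s*slurm\S*\s+(\d+)\.(\d+)\.?(\S*)', output)
    if m is None:
        # the documented function raises SlurmEnvError in this case
        raise ValueError(f'unexpected srun --version output: {output!r}')
    return int(m.group(1)), int(m.group(2)), m.group(3)


print(parse_srun_version('slurm 20.11.7\n'))    # (20, 11, '7')
print(parse_srun_version('slurm 21.08.8-2'))    # (21, 8, '8-2')
```

The patch component is kept as a string because it may carry suffixes such as `-2`, which matches the `tuple(int,int,str)` return type.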
qcg.pilotjob.slurmres.test_environment(env=None)

Try to parse Slurm resources based on an environment passed either as a dictionary or as a string with each environment variable on a separate line. WARNING: some information must be gathered through Slurm client programs such as `scontrol`, so resource information gathered only from environment variables is limited.

Parameters:env (dict|string, optional) – the environment to test:
  • None – the current environment is checked
  • dict – the environment as a dictionary
  • string – the environment as a string with each variable on a separate line
Returns:a Resources instance with the gathered Slurm information
Return type:Resources
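When `env` is given as a string, it first has to be turned into a mapping. The sketch below shows one way to do that, assuming `KEY=VALUE` lines split on the first `=`; the helper name `env_from_string` is hypothetical and not part of the module.

```python
from typing import Dict


def env_from_string(text: str) -> Dict[str, str]:
    """Turn 'KEY=VALUE' lines into a dict, splitting on the first '='."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or '=' not in line:
            continue  # skip blank or malformed lines
        key, value = line.split('=', 1)
        env[key] = value
    return env


sample = """SLURM_NODELIST=c[1-2]
SLURM_JOB_CPUS_PER_NODE=4(x2)
"""
print(env_from_string(sample))
# {'SLURM_NODELIST': 'c[1-2]', 'SLURM_JOB_CPUS_PER_NODE': '4(x2)'}
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.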