qcg.pilotjob.slurmres module
- class qcg.pilotjob.slurmres.SlurmArg(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- CPU_BIND()
- qcg.pilotjob.slurmres.parse_local_cpus()
Return information about available CPUs and cores in the local system. The information is gathered from the
lscpu command, which, besides the available CPUs, also reports the physical cores in the system; this is useful on systems with hyper-threading.
- Returns
- two maps with mapping:
core_id -> list of CPUs assigned to the core
cpu_id -> list of cores (in most situations this will be a single-element list)
- Return type
dict(str, list), dict(str, list)
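A minimal sketch of how the two maps could be built, assuming the output of `lscpu --parse=CPU,CORE` (comment lines start with `#`, data lines are `cpu,core`). The helper name `parse_lscpu_cpu_core` is hypothetical; it takes the command output as a string so it can run without `lscpu`.

```python
from collections import defaultdict


def parse_lscpu_cpu_core(output):
    """Build core->cpus and cpu->cores maps from `lscpu --parse=CPU,CORE` text.

    Hypothetical helper, not the module's own code. Keys and values are kept
    as strings, matching the dict(str, list) return type above.
    """
    core_to_cpus = defaultdict(list)
    cpu_to_cores = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip lscpu's comment header and blank lines
        cpu, core = line.split(',')[:2]
        core_to_cpus[core].append(cpu)
        cpu_to_cores[cpu].append(core)
    return dict(core_to_cpus), dict(cpu_to_cores)


# Two physical cores, each with two hyper-threads.
sample = """# CPU,Core
0,0
1,1
2,0
3,1
"""
cores, cpus = parse_lscpu_cpu_core(sample)
print(cores)  # {'0': ['0', '2'], '1': ['1', '3']}
```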
- qcg.pilotjob.slurmres.parse_nodelist(nodespec)
Return full node names based on the Slurm node specification.
This method calls
scontrol show hostnames to get the real node host names.
- Parameters
nodespec (str) – Slurm node specification
- Returns
node hostnames
- Return type
list(str)
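The documented function delegates expansion to `scontrol show hostnames`, which prints one host name per line. The sketch below shows that call and, for illustration only, a hypothetical pure-Python expander (`expand_nodelist`) for simple specifications with at most one bracket group per host; it is a swapped-in approach and does not cover the full Slurm hostlist syntax that `scontrol` handles.

```python
import re
import subprocess


def hostnames_via_scontrol(nodespec):
    """Expand the node list with Slurm itself (requires scontrol on PATH)."""
    result = subprocess.run(['scontrol', 'show', 'hostnames', nodespec],
                            capture_output=True, text=True, check=True)
    return result.stdout.split()


def split_top_level(spec):
    """Split on commas that are not inside [...] groups."""
    parts, depth, current = [], 0, ''
    for ch in spec:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(current)
            current = ''
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_nodelist(nodespec):
    """Hypothetical expander for specs like 'c[01-03,7],gpu1'."""
    hosts = []
    for part in split_top_level(nodespec):
        match = re.fullmatch(r'([^\[]*)\[([^\]]+)\](.*)', part)
        if match is None:
            hosts.append(part)  # plain host name without a range
            continue
        prefix, ranges, suffix = match.groups()
        for item in ranges.split(','):
            if '-' in item:
                start, end = item.split('-')
                width = len(start)  # preserve zero padding such as '01'
                for num in range(int(start), int(end) + 1):
                    hosts.append(f'{prefix}{num:0{width}d}{suffix}')
            else:
                hosts.append(f'{prefix}{item}{suffix}')
    return hosts


print(expand_nodelist('c[01-03,7],gpu1'))  # ['c01', 'c02', 'c03', 'c7', 'gpu1']
```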
- qcg.pilotjob.slurmres.get_allocation_data()
Get information about the Slurm allocation and pack it into a dictionary. The information is obtained with the ‘scontrol show job’ command.
- Returns
- a dictionary with two entries: ‘list’, holding all allocation attributes and values in their original order, and ‘map’, mapping attribute names to values.
The map can be used to check whether an attribute exists, but some attributes, such as Nodes and CPU_IDs, are not unique, so the map keeps only their last occurrence. The map also does not preserve attribute order.
- Return type
dict
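A minimal sketch of packing `Key=Value` tokens into an ordered list and a lookup map, under the assumption that values contain no whitespace (not true for every scontrol field, e.g. command lines with arguments). The helper `parse_scontrol_job` and the `(key, value)` tuple layout of the list are hypothetical. The example shows why repeated keys such as Nodes survive only once in the map.

```python
def parse_scontrol_job(output):
    """Split 'Key=Value' tokens into an ordered list and a lookup map."""
    attr_list = []
    attr_map = {}
    for token in output.split():
        key, sep, value = token.partition('=')  # split on the first '=' only
        if not sep:
            continue  # not a Key=Value token
        attr_list.append((key, value))
        attr_map[key] = value  # a repeated key overwrites the earlier value
    return {'list': attr_list, 'map': attr_map}


sample = ('JobId=42 NumNodes=2 Nodes=c1 CPU_IDs=0-1 Mem=0 GRES= '
          'Nodes=c2 CPU_IDs=2-3 Mem=0 GRES=')
data = parse_scontrol_job(sample)
print(data['map']['Nodes'])  # c2 (only the last occurrence is kept)
```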
- qcg.pilotjob.slurmres.parse_slurm_cpu_binding(cpu_bind_list)
Return a list of CPU identifiers based on the value of Slurm’s SLURM_CPU_BIND_LIST environment variable.
- Parameters
cpu_bind_list (str) – the value of SLURM_CPU_BIND_LIST
- Returns
list of CPU identifiers
- Return type
list (int)
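A sketch of one plausible decoding, assuming SLURM_CPU_BIND_LIST holds comma-separated hexadecimal CPU masks (for example `0x0F,0x30`) and that the result is the sorted union of the bits set in any mask. The helper name `decode_cpu_masks` is hypothetical and may differ from how the module combines masks.

```python
def decode_cpu_masks(cpu_bind_list):
    """Turn comma-separated hex CPU masks into sorted CPU identifiers."""
    cpus = set()
    for mask_text in cpu_bind_list.split(','):
        mask = int(mask_text, 16)  # int() accepts the optional 0x prefix
        bit = 0
        while mask:
            if mask & 1:
                cpus.add(bit)  # bit position == CPU identifier
            mask >>= 1
            bit += 1
    return sorted(cpus)


print(decode_cpu_masks('0x0F,0x30'))  # [0, 1, 2, 3, 4, 5]
```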
- qcg.pilotjob.slurmres.parse_slurm_job_cpus(cpus)
Return the number of cores allocated on each node in the allocation.
This method parses the value of Slurm’s SLURM_JOB_CPUS_PER_NODE environment variable.
- Parameters
cpus (str) – the value of SLURM_JOB_CPUS_PER_NODE
- Returns
the number of cores on each node
- Return type
list (int)
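SLURM_JOB_CPUS_PER_NODE compresses runs of identical counts, e.g. `72(x2),36` means two nodes with 72 CPUs followed by one node with 36. A minimal sketch of expanding that format into one entry per node; the helper name `expand_cpus_per_node` is hypothetical.

```python
import re


def expand_cpus_per_node(cpus):
    """Expand a SLURM_JOB_CPUS_PER_NODE value such as '72(x2),36'."""
    result = []
    for item in cpus.split(','):
        # A count, optionally followed by '(xN)' = repeat for N nodes.
        match = re.fullmatch(r'(\d+)(?:\(x(\d+)\))?', item.strip())
        if match is None:
            raise ValueError(f'unexpected item: {item!r}')
        count = int(match.group(1))
        repeat = int(match.group(2) or 1)
        result.extend([count] * repeat)
    return result


print(expand_cpus_per_node('72(x2),36'))  # [72, 72, 36]
```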
- qcg.pilotjob.slurmres.get_num_slurm_nodes()
Return the number of nodes in the Slurm allocation.
- Returns
number of nodes
- Return type
int
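A sketch assuming the node count is read from Slurm's SLURM_JOB_NUM_NODES variable, falling back to the older SLURM_NNODES; the documented function may obtain it differently. The helper `count_nodes` is hypothetical and accepts an optional environment mapping so it can be tested outside Slurm.

```python
import os


def count_nodes(env=None):
    """Return the node count of the current Slurm allocation (assumed source)."""
    env = os.environ if env is None else env
    value = env.get('SLURM_JOB_NUM_NODES') or env.get('SLURM_NNODES')
    if value is None:
        raise RuntimeError('not running inside a Slurm allocation')
    return int(value)


print(count_nodes({'SLURM_JOB_NUM_NODES': '4'}))  # 4
```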
- qcg.pilotjob.slurmres.parse_slurm_resources(config)
Return the resources available in the Slurm allocation.
- Parameters
config (dict) – QCG-PilotJob configuration
- Returns
resources available in Slurm allocation
- Return type
- qcg.pilotjob.slurmres.parse_slurm_allocation_cpu_ids(allocation_data_list, node_names, cores_num)
Return information about core bindings per node, based on the allocation data obtained via ‘scontrol show job --detail’. The allocation data is written in a compact form, so extracting the bindings can be tricky. The data may be described as:
Nodes=c[1-2] CPU_IDs=0 Mem=0 GRES=
but also as:
Nodes=c1 CPU_IDs=1 Mem=0 GRES=
Nodes=c2 CPU_IDs=0 Mem=0 GRES=
- Parameters
allocation_data_list (dict) – allocation data, holding the attribute map under the ‘map’ key and the attribute list under the ‘list’ key
node_names (list(str)) – node names for which the binding should be parsed
cores_num (list(int)) – the number of allocated cores for the given nodes (the binding information must match the number of cores)
- Returns
map with node names as keys and the bound core lists as values
- Return type
dict
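A minimal sketch of walking the ordered attribute list so that each CPU_IDs value is applied to the nodes named by the preceding Nodes entry, which covers both the compact and the per-node forms shown above. It assumes the ‘list’ entry holds `(key, value)` pairs; the helpers `expand_hosts`, `expand_ids` and `cpu_ids_per_node` are hypothetical, and `expand_hosts` handles only a single bracket group such as `c[1-2]`.

```python
import re


def expand_hosts(spec):
    """Minimal expander for 'c[1-2]' or 'c1' style node specs."""
    match = re.fullmatch(r'([^\[]*)\[([^\]]+)\]', spec)
    if match is None:
        return [spec]
    prefix, ranges = match.groups()
    hosts = []
    for item in ranges.split(','):
        start, _, end = item.partition('-')
        for num in range(int(start), int(end or start) + 1):
            hosts.append(f'{prefix}{num:0{len(start)}d}')
    return hosts


def expand_ids(text):
    """Expand a CPU_IDs value such as '0-3,8' into [0, 1, 2, 3, 8]."""
    ids = []
    for item in text.split(','):
        start, _, end = item.partition('-')
        ids.extend(range(int(start), int(end or start) + 1))
    return ids


def cpu_ids_per_node(allocation_data, node_names, cores_num):
    """Map each requested node to its CPU identifiers."""
    bindings = {}
    current_nodes = []
    for key, value in allocation_data['list']:
        if key == 'Nodes':
            current_nodes = expand_hosts(value)  # nodes for following CPU_IDs
        elif key == 'CPU_IDs':
            for node in current_nodes:
                bindings[node] = expand_ids(value)
    result = {}
    for node, cores in zip(node_names, cores_num):
        ids = bindings.get(node)
        if ids is None or len(ids) != cores:
            raise ValueError(f'binding for {node} does not match {cores} cores')
        result[node] = ids
    return result


data = {'list': [('Nodes', 'c[1-2]'), ('CPU_IDs', '0-1'), ('Mem', '0'),
                 ('Nodes', 'c3'), ('CPU_IDs', '4'), ('Mem', '0')]}
print(cpu_ids_per_node(data, ['c1', 'c2', 'c3'], [2, 2, 1]))
# {'c1': [0, 1], 'c2': [0, 1], 'c3': [4]}
```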
- qcg.pilotjob.slurmres.parse_slurm_env_binding(slurm_cpu_bind_list, node_names, cores_num)
Return information about core bindings per node, based on the SLURM_CPU_BIND_LIST environment variable set by Slurm. WARNING: this information may be less precise than that obtained from the allocation data, because the environment variable contains the same information for all nodes; if the nodes differ in architecture or in the number of allocated cores, the result might be incorrect.
- Parameters
slurm_cpu_bind_list (str) – value of SLURM_CPU_BIND_LIST variable
node_names (list(str)) – name of the nodes
cores_num (list(int)) – number of cores on each node
- Returns
map with node names and core identifier list
- Return type
dict(str, list(int))
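A sketch of the environment-based approach and its limitation: the masks are decoded once and the same CPU list is reused for every node. Decoding the masks as hexadecimal and truncating the list to each node's core count are assumptions, and `env_binding` is a hypothetical name.

```python
def decode_masks(cpu_bind_list):
    """Union of set bits across comma-separated hex masks, sorted."""
    cpus = set()
    for mask_text in cpu_bind_list.split(','):
        mask = int(mask_text, 16)
        cpus.update(bit for bit in range(mask.bit_length()) if mask >> bit & 1)
    return sorted(cpus)


def env_binding(slurm_cpu_bind_list, node_names, cores_num):
    """Assign the same decoded CPUs to every node (hypothetical heuristic)."""
    cpus = decode_masks(slurm_cpu_bind_list)
    result = {}
    for node, cores in zip(node_names, cores_num):
        if len(cpus) < cores:
            raise ValueError(f'{node}: {cores} cores requested, only {len(cpus)} in mask')
        result[node] = cpus[:cores]  # identical CPU ids on every node
    return result


print(env_binding('0x3,0xC', ['c1', 'c2'], [4, 2]))
# {'c1': [0, 1, 2, 3], 'c2': [0, 1]}
```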
- qcg.pilotjob.slurmres.in_slurm_allocation()
Check whether the program runs inside a Slurm allocation. This is detected by checking for environment variables that Slurm always sets, such as SLURM_NODELIST.
- Returns
True if we are inside a Slurm allocation, otherwise False
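A minimal sketch of the check, assuming SLURM_NODELIST and SLURM_JOB_ID are used as markers (only SLURM_NODELIST is named above; SLURM_JOB_ID is an added assumption). The helper `inside_slurm` is hypothetical and accepts an optional mapping for testing.

```python
import os


def inside_slurm(env=None):
    """Return True when Slurm allocation variables are present."""
    env = os.environ if env is None else env
    return 'SLURM_NODELIST' in env and 'SLURM_JOB_ID' in env


print(inside_slurm({'SLURM_NODELIST': 'c[1-2]', 'SLURM_JOB_ID': '42'}))  # True
```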
- qcg.pilotjob.slurmres.get_slurm_version()
Return the Slurm version in the current environment. The version is obtained only once, via the srun --version command. If an error occurs while obtaining the version, it is initialized with the values MAJOR(0), MINOR(0), PATCH(‘’).
- Returns
tuple(int,int,str) - the major, minor and patch version of Slurm in the current environment
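A sketch of the "obtain once, fall back on error" behavior, using a hypothetical module-level cache `_cached_version` and a hypothetical `slurm_version()` function. The version pattern searched for in the srun output is an assumption.

```python
import re
import subprocess

_cached_version = None  # hypothetical cache, filled on the first call


def slurm_version():
    """Return (major, minor, patch), running `srun --version` at most once."""
    global _cached_version
    if _cached_version is None:
        try:
            out = subprocess.run(['srun', '--version'], capture_output=True,
                                 text=True, check=True).stdout
            match = re.search(r'(\d+)\.(\d+)(?:\.(\S+))?', out)
            if match is None:
                raise ValueError(out)
            _cached_version = (int(match.group(1)), int(match.group(2)),
                               match.group(3) or '')
        except (OSError, subprocess.CalledProcessError, ValueError):
            # srun missing, exiting with an error, or printing unexpected text
            _cached_version = (0, 0, '')
    return _cached_version
```

Without Slurm installed, the first call returns `(0, 0, '')` and later calls return the same cached tuple without starting a new process.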
- qcg.pilotjob.slurmres.find_slurm_version()
Return version of Slurm.
This method calls
srun --version and parses the result in the form MAJOR,MINOR,PATCH.
- Returns
MAJOR,MINOR,PATCH
- Return type
tuple (int,int,str)
- Raises
SlurmEnvError – when the srun command is unavailable, finishes with a non-zero exit code, or its output does not match the expected pattern
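A sketch of parsing `srun --version` output such as `slurm 23.02.4` (Debian packages print `slurm-wlm 21.08.5`). The exact pattern is an assumption; `SlurmEnvError` is redefined locally as a stand-in so the example is self-contained, and `parse_srun_version` is hypothetical.

```python
import re


class SlurmEnvError(Exception):
    """Local stand-in for the module's SlurmEnvError."""


def parse_srun_version(output):
    """Parse text like 'slurm 23.02.4' into (23, 2, '4')."""
    match = re.fullmatch(r'\s*slurm\S*\s+(\d+)\.(\d+)(?:\.(\S+))?\s*', output)
    if match is None:
        raise SlurmEnvError(f'unexpected srun --version output: {output!r}')
    return int(match.group(1)), int(match.group(2)), match.group(3) or ''


print(parse_srun_version('slurm 23.02.4\n'))  # (23, 2, '4')
```

The patch component stays a string because release suffixes are not always numeric.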
- qcg.pilotjob.slurmres.test_environment(env=None)
Try to parse Slurm resources based on an environment passed either as a dictionary or as a string with each environment variable on a separate line. WARNING: some information must be gathered through Slurm client programs such as ‘scontrol’, so resource information gathered from environment variables alone is limited.
- Parameters
env (dict|string, optional) – environment to test:
None – the current environment is checked
dict – the environment in the form of a dictionary
string – the environment as a string, with each variable on a separate line
- Returns
an instance with the gathered Slurm information.
- Return type