Python-controlled job execution across multiple platforms
Latest official release
$ pip install pyjob
Source code
$ git clone https://github.com/fsimkovic/pyjob.git
$ cd pyjob
$ python setup.py install
A Script is easily created by providing some optional information. Its content is stored just as in any other Python list.
>>> from pyjob import Script
>>> script = Script(directory='.', prefix='example', stem='', suffix='.sh')
>>> script.append('sleep 5')
>>> print(script)
#!/bin/bash
sleep 5
The path to the Script can be retrieved by accessing the associated path attribute.
>>> print(script.path)
./example.sh
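As a rough illustration of how the constructor arguments relate to the path printed above, the sketch below joins directory, prefix, stem and suffix using only the standard library. `compose_path` is a hypothetical helper, and the concatenation order is an assumption inferred from the example output, not PyJob's confirmed implementation.

```python
import os


def compose_path(directory, prefix, stem, suffix):
    """Hypothetical helper: build a script path from its four parts."""
    # Assumption: the file name is prefix + stem + suffix inside `directory`.
    return os.path.join(directory, prefix + stem + suffix)


path = compose_path(".", "example", "", ".sh")
print(path)
```

Under this assumption, changing only the stem (for example to `"0"` or `"1"`) yields distinct file names that share the same prefix and suffix.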
We could also write() the Script to disk, but do not worry: the Task will do this for you if you forget before execution.
>>> script.write()
If we are given a script that is already written to disk, i.e. the reverse of the previous few steps, we can simply use the read_script function to obtain a Script instance. This also allows us to conveniently edit a Script if necessary.
>>> from pyjob import read_script
>>> script = read_script('./example.sh')
>>> print(script)
#!/bin/bash
sleep 5
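The write, read and edit cycle described above can be sketched with plain files and lists. This is a standard-library stand-in, not PyJob code: `write_lines` and `read_lines` are hypothetical helpers, and treating the first `#!` line as a shebang to be stripped is an assumption.

```python
import os
import tempfile

SHEBANG = "#!/bin/bash"


def write_lines(path, commands):
    """Write a shebang followed by one command per line."""
    with open(path, "w") as handle:
        handle.write("\n".join([SHEBANG] + list(commands)) + "\n")


def read_lines(path):
    """Read a script back and return its commands without the shebang."""
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return lines


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.sh")
    write_lines(path, ["sleep 5"])
    commands = read_lines(path)
    commands.append("echo done")  # edit the content like any other list
    write_lines(path, commands)
    roundtrip = read_lines(path)
```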
To create multiple scripts in parallel we can use the LocalScriptCreator, given a function that generates a single Script, an iterable containing the options for each script, and the number of processors to use. You can then access the collector, which returns a ScriptCollector that can be passed directly to TaskFactory for execution (detailed below).
>>> from pyjob.script import LocalScriptCreator
>>> script_creator = LocalScriptCreator(func=example_function, iterable=example_iterable, processes=2)
>>> collector = script_creator.collector()
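The snippet above leaves example_function and example_iterable undefined. A minimal sketch of what they could look like follows; it assumes the function receives one item of the iterable per call, which the text does not confirm. To stay runnable without PyJob, it returns plain lists of lines and maps over a thread-backed pool from the standard library instead of using LocalScriptCreator.

```python
from multiprocessing.dummy import Pool  # thread-backed pool from the standard library


def example_function(seconds):
    """Hypothetical generator function: return the lines of one script."""
    # With PyJob, this would build and return a Script rather than a list.
    return ["#!/bin/bash", "sleep {}".format(seconds)]


# One entry per script to generate (hypothetical options).
example_iterable = [1, 2, 3]

with Pool(processes=2) as pool:
    scripts = pool.map(example_function, example_iterable)
```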
The Script created in the previous step can be easily executed across all supported platforms, i.e. operating systems and HPC queueing systems. To do so, we simply select a platform (local in the example below), provide one or more Script instances or paths to scripts, and then execute with the run() method. To simplify the selection of the correct platform, a TaskFactory is provided.
>>> from pyjob import TaskFactory
>>> with TaskFactory('local', script) as task:
... task.run()
In the example, the Task is handled with a Python context manager, which is the recommended way to handle all Task instances.
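The general benefit of a context manager is that cleanup runs even if the body raises an exception. The sketch below shows this with a hypothetical `DemoTask` class; it is not a PyJob class, and whether PyJob's Task performs the same cleanup on exit is an assumption.

```python
class DemoTask:
    """Hypothetical stand-in for a task; not a PyJob class."""

    def __init__(self):
        self.running = False
        self.closed = False

    def run(self):
        self.running = True

    def close(self):
        # Release resources, e.g. stop or clean up a running job.
        self.running = False
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


task = DemoTask()
try:
    with task:
        task.run()
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the task was still closed by __exit__
```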
>>> def dup_script(s, i=0):
... s1 = s[:]
... s1.stem = str(i)
... return s1
>>> script1 = dup_script(script, i=0)
>>> script2 = dup_script(script, i=1)
This process is identical to the previous example, except that this time we provide the Script instances as a list.
>>> with TaskFactory('local', [script1, script2]) as task:
... task.run()
If we would like to use multiple processes, simply provide the processes keyword argument with the relevant count.
>>> with TaskFactory('local', [script1, script2], processes=2) as task:
... task.run()
If a list of Script instances is inconvenient to maintain, or you would like to use the latest implementation, you can also use the ScriptCollector and provide it instead.
>>> from pyjob.script import ScriptCollector
>>> collector = ScriptCollector(script)
>>> for i in range(5):
... script = dup_script(script, i=i)
... collector.add(script)
>>> with TaskFactory('local', collector, processes=2) as task:
... task.run()
>>> with TaskFactory('sge', [script1, script2]) as task:
... task.run()
The first argument to TaskFactory, sge in this example, defines the platform on which the Task will be executed. Other options exist; you can try them by installing PyJob on a machine running such a platform and substituting any of the options below.
Platform | Argument | Task class |
---|---|---|
Local Machine | local | LocalTask |
Sun Grid Engine | sge | SunGridEngineTask |
Slurm | slurm | SlurmTask |
Load Sharing Facility | lsf | LoadSharingFacilityTask |
Portable Batch System | pbs | PortableBatchSystemTask |
TORQUE Resource Manager | torque | TorqueTask |
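The table can be read as a simple lookup from the platform argument to a task class. The sketch below encodes it as a dictionary of class names; `PLATFORM_TASKS` and `task_class_name` are hypothetical, and the case-insensitive lookup is an assumption rather than documented TaskFactory behaviour.

```python
# Platform argument -> task class name, as listed in the table.
PLATFORM_TASKS = {
    "local": "LocalTask",
    "sge": "SunGridEngineTask",
    "slurm": "SlurmTask",
    "lsf": "LoadSharingFacilityTask",
    "pbs": "PortableBatchSystemTask",
    "torque": "TorqueTask",
}


def task_class_name(platform):
    """Hypothetical lookup: return the task class name for a platform."""
    key = platform.lower()  # assumption: arguments are matched case-insensitively
    if key not in PLATFORM_TASKS:
        raise ValueError("Unknown platform: {!r}".format(platform))
    return PLATFORM_TASKS[key]


print(task_class_name("sge"))
```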
PyJob's Pool is simply an extension of multiprocessing.Pool that simplifies and tidies imports in your own code. It also provides a backwards-compatible context manager for multiprocessing.Pool, which is standard in Python 3.
>>> import time
>>> def sleep(t):
... time.sleep(t)
>>> from pyjob import Pool
>>> with Pool(processes=4) as pool:
... pool.map(sleep, [10] * 8)
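To illustrate what a backwards-compatible context means here, the sketch below adds `__enter__` and `__exit__` on top of a standard-library pool class. `ContextPool` is hypothetical and not PyJob's implementation; it uses the thread-backed `ThreadPool` so the example runs quickly in any environment.

```python
from multiprocessing.pool import ThreadPool


class ContextPool(ThreadPool):
    """Hypothetical pool that can always be used in a `with` block."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Stop the workers when the block is left, even after an error.
        self.terminate()


def square(x):
    return x * x


with ContextPool(processes=2) as pool:
    squares = pool.map(square, [1, 2, 3, 4])
```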
If you use PyJob frequently, you may find the manual definition of the same parameters for the system irritating. You are able to pre-define default configurations for your system by creating a YAML configuration file. To simplify the procedure of default-option setting, use:
$ pyjob conf platform:local processes:4
This would set the default platform to local and the number of processors to use to 4. You therefore do not need to define those in your constructors, unless you want to override them for a particular task.
If you decide that you would like to change a parameter, call the same command with a different value. Alternatively, to delete an option, simply set its value, for example local or 4 in the example above, to None.
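The set, change and delete semantics of the key:value arguments can be sketched as a dictionary update. `parse_value` and `apply_conf` are hypothetical helpers; the integer conversion and the exact parsing rules are assumptions, and PyJob's actual handling of its YAML configuration file may differ.

```python
def parse_value(text):
    """Convert a raw value: "None" deletes, digits become int, else keep text."""
    if text == "None":
        return None
    try:
        return int(text)
    except ValueError:
        return text


def apply_conf(config, tokens):
    """Return a copy of `config` updated with key:value tokens."""
    updated = dict(config)
    for token in tokens:
        key, _, raw = token.partition(":")
        value = parse_value(raw)
        if value is None:
            updated.pop(key, None)  # a value of None removes the option
        else:
            updated[key] = value
    return updated


defaults = apply_conf({}, ["platform:local", "processes:4"])
config = apply_conf(defaults, ["processes:None"])
```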