Running jobs on a cluster submitted via qsub from Python. Does it make sense?


I have a situation where I am doing a computation in Python, and based on the outcome I have a list of target files that are candidates to be passed to a second program.

For example, I have 50,000 files containing ~2,000 items each. I want to filter the items and call a command line program to do a calculation on some of them.

This program #2 can be used via the shell command line, but it requires a lengthy set of arguments. For performance reasons I have to run program #2 on a cluster.

Right now, I am running program #2 via subprocess.call("...", shell=True), but I would like to run it via qsub in the future. I have no experience with how this can be done in a reasonably efficient manner.

Would it make sense to write temporary qsub files and run them via subprocess() directly from the Python script? Is there a better, maybe more pythonic, solution?

Any ideas and suggestions are welcome!

It makes perfect sense, although I would go for another solution.

As far as I understand, you have programme #1 that determines which of the 50,000 files need to be computed by programme #2. Both programme #1 and #2 are written in Python. Excellent choice.

Incidentally, I have a Python module that might come in handy: https://gist.github.com/stefanedwards/8841307

If you are running the same qsub system as we have (no idea what ours is called), you cannot use command line arguments on the submitted scripts. Instead, options are submitted via the -v option, which puts them into environment variables, e.g.:

    [me@local ~] $ python isprime.py 1
    1: True

    [me@local ~] $ head -n 5 isprime.py
    #!/usr/bin/python
    ### python script ...
    import os
    os.chdir(os.environ.get('PBS_O_WORKDIR', '.'))

    [me@local ~] $ qsub -v isprime='1 2 3' isprime.py
    123456.cluster.control.com
    [me@local ~]

Here, isprime.py can handle its command line arguments using argparse. You then need to check whether the script is running as a submitted job, and if so, retrieve said arguments from the environment variables (os.environ).
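
A minimal sketch of what isprime.py could look like under those assumptions (the naive primality test is mine, purely for illustration; the point is the argument plumbing):

    #!/usr/bin/python
    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument('numbers', nargs='+', type=int)

    if 'PBS_O_WORKDIR' in os.environ:
        # Submitted via qsub: command line arguments are stripped, so
        # recover them from the variable passed with -v isprime='...'
        # and run from the directory the job was submitted from.
        os.chdir(os.environ['PBS_O_WORKDIR'])
        args = parser.parse_args(os.environ.get('isprime', '').split())
    else:
        # Run interactively: use the real command line.
        args = parser.parse_args()

    for n in args.numbers:
        # Naive primality test, just for illustration.
        prime = n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
        print('{0}: {1}'.format(n, prime))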

Once programme #2 is modified to run on the cluster, programme #1 can submit jobs using subprocess.call(['qsub', '-v', 'options=...', 'programme2.py'], shell=False).
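
In programme #1, that submission loop could look roughly like this (the variable name options and the file names are hypothetical):

    import subprocess

    # Files selected by programme #1's filtering step (hypothetical names).
    files_to_process = ['data_00001.txt', 'data_00002.txt']

    for path in files_to_process:
        # Hand the per-file arguments to the job through -v, since this
        # qsub system ignores command line arguments on submitted scripts.
        ret = subprocess.call(
            ['qsub', '-v', 'options={0}'.format(path), 'programme2.py'],
            shell=False)
        if ret != 0:
            print('qsub submission failed for {0}'.format(path))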

Another approach would be to queue the files in a database (say, an SQLite database). Programme #1 checks the database for non-processed entries and determines the outcome for each (run, do not run, run with special options), as sketched below. You then have the opportunity to run programme #2 in parallel on the cluster, with each instance checking the database for files to analyse.
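
A minimal sketch of the queueing side, assuming a hypothetical jobs.db with a single files table (schema and names are mine, not from the question):

    import sqlite3

    conn = sqlite3.connect('jobs.db')
    conn.execute("""CREATE TABLE IF NOT EXISTS files (
                        path    TEXT PRIMARY KEY,
                        options TEXT,
                        status  TEXT DEFAULT 'pending')""")

    # Programme #1 decides the outcome for each candidate and queues it;
    # the (path, options) pairs here are purely illustrative.
    for path, options in [('data_00001.txt', '--opt1 7'),
                          ('data_00002.txt', '')]:
        conn.execute('INSERT OR IGNORE INTO files (path, options) VALUES (?, ?)',
                     (path, options))
    conn.commit()
    conn.close()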

Edit: when programme #2 is an executable

Instead of a Python script, use a bash script that takes the environment variables and puts them on the command line for the programme:

    #!/bin/bash
    cd "${PBS_O_WORKDIR:-.}"
    # Put the options into context/flags etc.
    if [ -n "$option1" ]; then _opt1="--opt1 $option1"; fi
    # We can also define our own defaults
    _opt2='--no-verbose'
    if [ -n "$opt2" ]; then _opt2="-o $opt2"; fi
    /path/to/exe $_opt1 $_opt2

If you go for the database solution, you would have a Python script that checks the database for unprocessed files, marks a file as being processed (do these two steps in a single transaction), fetches the options, calls the executable with subprocess, marks the file as done when it finishes, then checks for a new file, and so on.
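
A sketch of such a worker, reusing the hypothetical files table from above; the BEGIN IMMEDIATE transaction makes the claim atomic, so parallel workers on different nodes should not grab the same file (note that SQLite locking can be unreliable on some network filesystems):

    import sqlite3
    import subprocess

    # isolation_level=None -> autocommit; we manage transactions explicitly.
    conn = sqlite3.connect('jobs.db', isolation_level=None)

    while True:
        # Claim one pending file inside a single transaction.
        conn.execute('BEGIN IMMEDIATE')
        row = conn.execute(
            "SELECT path, options FROM files WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute('COMMIT')
            break  # nothing left to do
        path, options = row
        conn.execute("UPDATE files SET status = 'processing' WHERE path = ?",
                     (path,))
        conn.execute('COMMIT')

        # Run the executable on the claimed file ('/path/to/exe' as above).
        ret = subprocess.call(['/path/to/exe'] + options.split() + [path],
                              shell=False)

        status = 'done' if ret == 0 else 'failed'
        conn.execute("UPDATE files SET status = ? WHERE path = ?",
                     (status, path))

    conn.close()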

