The subprocess module provides some very useful functionality for working with external programs from Python applications, but it is often criticised as being harder to use than it needs to be. See, for example, Kenneth Reitz’s Envoy project, which aims to provide an ease-of-use wrapper over subprocess. There’s also Andrew Moffat’s pbs project, which aims to let you do things like this:

    from pbs import ifconfig
    print ifconfig("eth0")

It does this by replacing sys.modules['pbs'] with a subclass of the module type which overrides __getattr__ to look for programs on the path. That is nice, and I can see that it would be useful in some contexts, but I don’t find wc(ls("/etc", "-1"), "-l") more readable than call("ls /etc -1 | wc -l") in the general case.
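
As a rough illustration of the module-replacement trick described above, here is a minimal sketch. It is not pbs's actual code: the module name progs and the class ProgramModule are made up for this example, and the real project may differ in many details. It only shows how a module subclass with __getattr__ can turn attribute lookups into PATH lookups.

```python
import shutil
import subprocess
import sys
import types


class ProgramModule(types.ModuleType):
    """Hypothetical module type whose unknown attributes resolve to programs on PATH."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails.
        # Leave private/dunder names alone so the import machinery
        # (which probes names like __path__) behaves normally.
        if name.startswith("_"):
            raise AttributeError(name)
        path = shutil.which(name)
        if path is None:
            raise AttributeError("no program named %r found on PATH" % name)

        def runner(*args):
            # Arguments are passed as a list, so no shell is involved.
            return subprocess.check_output([path] + list(args)).decode()

        runner.__name__ = name
        return runner


# Registering an instance under a made-up name makes
# "from progs import <program>" work for any program on PATH.
sys.modules["progs"] = ProgramModule("progs")
```

With this in place, a statement such as from progs import ls would succeed on a system where ls is on the path, and ls("/etc", "-1") would return that program's output as a string. A name that is not on the path raises AttributeError, which the import statement reports as an ImportError.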

I’ve been experimenting with my own wrapper for subprocess, called sarge. The main things I need are:

  • I want to use command pipelines, but using subprocess out of the box can easily lead to deadlocks when a pipe buffer fills up and nothing is reading from it.
  • I want to use bash-style pipe syntax on Windows as well as Posix, but Windows shells don’t support some of the syntax I want to use, like &&, ||, |& and so on.
  • I want to process output from commands in a flexible way, and communicate() is not always flexible enough for my needs - for example, if I need to process output a line at a time.
  • I want to avoid shell injection problems by having the ability to quote command arguments safely, and I want to minimise the use of shell=True, which I generally have to use when using pipelined commands.
  • I don’t want to set arbitrary limits on passing data between processes, such as Envoy’s 10MB limit.
  • subprocess allows you to let stderr be the same as stdout, but not the other way around - and I sometimes need to do that.
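
To make two of the points above concrete, here is a small sketch using only the standard library. It shows what a two-stage pipeline without shell=True looks like with plain subprocess, and how arguments can be quoted when a shell command string is unavoidable. The function names pipeline_line_count and quoted_command are made up for this illustration and are not part of sarge; shlex.quote targets POSIX shells only, so the quoting half is an assumption that does not carry over to Windows shells.

```python
import shlex
import subprocess
import sys


def pipeline_line_count(n):
    """Pipe n lines from one child process into another, with no shell involved."""
    producer = subprocess.Popen(
        [sys.executable, "-c", "for i in range(%d): print(i)" % n],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy of the pipe so the producer is notified
    # if the consumer exits before reading everything.
    producer.stdout.close()
    # communicate() drains the consumer's output until it finishes,
    # so the output pipe cannot fill up and block the child.
    out, _ = consumer.communicate()
    producer.wait()
    return int(out.decode().strip())


def quoted_command(template, *args):
    """Fill str.format placeholders with POSIX-shell-quoted arguments."""
    return template.format(*(shlex.quote(str(a)) for a in args))
```

Even for two commands, the plumbing is noticeably more verbose than the equivalent shell string, which is the kind of friction a wrapper is meant to remove.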

I’ve been working on supporting these use cases, so sarge offers the following features:

  • A simple run function which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run cross-platform on Posix and Windows without cygwin:

    >>> p = run('false && echo foo')
    >>> p.commands
    [Command('false')]
    >>> p.returncodes
    [1]
    >>> p.returncode
    1
    >>> p = run('false || echo foo')
    foo
    >>> p.commands
    [Command('false'), Command('echo foo')]
    >>> p.returncodes
    [1, 0]
    >>> p.returncode
    0
  • The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks:

    >>> from sarge import shell_format
    >>> shell_format('ls {0}', '*.py')
    "ls '*.py'"
    >>> shell_format('cat {0}', 'a file name with spaces')
    "cat 'a file name with spaces'"
  • The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want:

    >>> from sarge import Capture, run
    >>> with Capture() as out:
    ...     run('echo foobarbaz', stdout=out)
    ...
    <sarge.Pipeline object at 0x175ed10>
    >>> out.read(3)
    'foo'
    >>> out.read(3)
    'bar'
    >>> out.read(3)
    'baz'
    >>> out.read(3)
    '\n'
    >>> out.read(3)
    ''

    A Capture object can capture the output from multiple commands:

    >>> from sarge import run, Capture
    >>> p = run('echo foo; echo bar; echo baz', stdout=Capture())
    >>> p.stdout.readline()
    'foo\n'
    >>> p.stdout.readline()
    'bar\n'
    >>> p.stdout.readline()
    'baz\n'
    >>> p.stdout.readline()
    ''

    Delays in commands are honoured in asynchronous calls:

    >>> from sarge import run, Capture
    >>> cmd = 'echo foo & (sleep 2; echo bar) & (sleep 1; echo baz)'
    >>> p = run(cmd, stdout=Capture(), async=True) # returns immediately
    >>> p.close() # wait for completion
    >>> p.stdout.readline()
    'foo\n'
    >>> p.stdout.readline()
    'baz\n'
    >>> p.stdout.readline()
    'bar\n'
    >>>

    Here, the sleep commands ensure that the asynchronous echo calls occur in the order foo (no delay), baz (after a delay of one second) and bar (after a delay of two seconds); the capturing works as expected.
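
For comparison, this is roughly the kind of thread-based plumbing you would otherwise write by hand to read a child's output a line at a time. It is a minimal sketch using only the standard library, not sarge's implementation: LineCapture and run_and_capture are hypothetical names, and the real Capture class may work quite differently.

```python
import queue
import subprocess
import sys
import threading


class LineCapture:
    """Hypothetical helper: collect a stream's lines on a background thread."""

    def __init__(self, stream):
        self._lines = queue.Queue()
        self._eof = False
        self._thread = threading.Thread(
            target=self._pump, args=(stream,), daemon=True
        )
        self._thread.start()

    def _pump(self, stream):
        # Read until the child closes its end of the pipe.
        for line in iter(stream.readline, b""):
            self._lines.put(line)
        stream.close()
        self._lines.put(None)  # sentinel marking end of stream

    def readline(self, timeout=None):
        """Return the next line, or b'' at end of stream or on timeout."""
        if self._eof:
            return b""
        try:
            line = self._lines.get(timeout=timeout)
        except queue.Empty:
            return b""
        if line is None:
            self._eof = True
            return b""
        return line


def run_and_capture(code):
    """Run a Python snippet in a child process and return its output lines."""
    child = subprocess.Popen(
        [sys.executable, "-u", "-c", code], stdout=subprocess.PIPE
    )
    capture = LineCapture(child.stdout)
    lines = []
    while True:
        line = capture.readline(timeout=10)
        if not line:  # end of stream, or nothing arrived within the timeout
            break
        lines.append(line.decode().rstrip("\r\n"))
    child.wait()
    return lines
```

Because the background thread keeps draining the pipe, the child never blocks on a full buffer, and the caller can consume lines whenever it is convenient.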

Sarge hasn’t been released yet, but it’s not far off being ready. It’s meant for Python >= 2.6.5 and is tested on 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux, Mac OS X, Windows XP and Windows 7 (not all versions are tested on all platforms, but the overall test coverage is comfortably over 90%).

I have released the sarge documentation on Read The Docs; I’m hoping people will read this and give some feedback about the API and feature set being proposed, so that I can fill in any gaps where possible and perhaps make it more useful to other people. Please add your comments here, or via the issue tracker on the BitBucket project for the docs.

All content on this blog is Copyright © 2012-2016 Vinay Sajip.