Last year, Mark Hammond proposed PEP 397 (Python launcher for Windows), to bring some much-needed functionality for Python on Windows. Historically, Python on Windows does not add itself to the system path; this needs to be done manually by users as a separate step. This may change in the future, but it remains the case for Python versions that are already released. With the increasing use of Python 2 and Python 3 on the same system, you can’t be sure that a particular script will run correctly with whichever version of Python happens to have been added to the system path.

Some Python distributions add the .py and .pyw extensions to the PATHEXT environment variable, and if not, they can be added manually. This tells the Windows shell to try and execute foo.py when you invoke just plain foo on the command line. The shell then uses file associations to try to locate the actual executable to be invoked with foo.py as a parameter, along with any parameters you passed when you typed foo on the command line.

File associations typically involve a two-step process: extensions are mapped to file types (also called vendor keys), and file types are mapped to executables which are used to perform operations on those files. These mappings are maintained in the registry; only one Python executable can be associated with .py files, which is clearly insufficient in environments where multiple Python versions are installed on the same machine.

On Unix, Linux, OS X and other similar systems, individual scripts can specify which executable should be used to execute them, using the first line of the script – commonly called the shebang line. This is conventionally a comment line in the scripting language, and is ignored by the scripting language interpreter, but used by the command shell. The typical line for a Python script would be #!/usr/bin/env python, which directs the shell to invoke the script using the default Python interpreter for the system (usually a 2.x version, except on Arch Linux, where it’s a 3.x version).

All of these systems allow symlinks for python3 and/or python2 pointing to a suitable Python 3.x or 2.x version, so individual scripts can easily specify which version of Python is appropriate for them.

This functionality has not been available in the Windows environment, until recently, when an implementation of a PEP 397-compliant launcher was released. This aims to provide shebang line functionality for Windows, and early adopters have indicated that the launcher is genuinely useful for Python users on Windows.

The launcher installer comes in four variants, offering different architectures and installation locations:

           Windows folder         "%ProgramFiles%\Python Launcher"
  32-bit   launchwin.msi          launcher.msi
  64-bit   launchwin.amd64.msi    launcher.amd64.msi

The main advantage of choosing the launchwin variants is that they are on the system path, so you can invoke the launcher directly (i.e. not indirectly through a script) using the commands py, py -2 or py -3. These commands launch, respectively: the default Python version (2.x or 3.x depending on configuration, described below); the latest installed Python 2.x; or the latest installed Python 3.x.

The launcher installation program associates the Python file types with the launcher executable, so that (with .py and .pyw in PATHEXT), typing e.g. foo -abc def in a command shell (where foo.py exists) will lead to Windows starting the launcher using the command line <path>\py.exe foo.py -abc def, where <path> is wherever the launcher was installed to. The launcher then opens foo.py, reads the first line, looks for a shebang line, and from it tries to determine which executable is needed to actually run the script. If it fails to do this, it exits with an error message; otherwise, it invokes that executable, passing it the same arguments it was passed.

In order for the shebang line processing to correctly identify an executable to run the script, the shebang line (the first line in the script) must begin with #! and then be followed by some optional whitespace and then an executable specifier. This can take a number of forms:

Executable specifier                           What will happen

/usr/bin/env python                            Look for an optional version specifier and arguments, and
                                               launch the appropriate version of Python with any arguments
                                               from the shebang line, the script name and the parameters
                                               passed to the script.
/usr/bin/python, /usr/local/bin/python,        As above.
python
/usr/bin/env progname                          Look in the launcher’s configuration (see below) to identify
                                               the executable for progname, and if found, launch it with any
                                               arguments from the shebang line, the script name and the
                                               parameters passed to the script.
/usr/bin/progname, /usr/local/bin/progname,    As above.
progname

The progname in the above table is an identifier other than 'python'. It is regarded as a 'customised command' (see below).

The version specifier can take one of the following forms:

Version specifier       Which Python will be run

2                       The default 2.x version of Python, as configured (see below)
3                       The default 3.x version of Python, as configured (see below)
2.x (x is a digit)      The specified 2.x version of Python, with the same architecture as the OS
3.x (x is a digit)      The specified 3.x version of Python, with the same architecture as the OS
2.x-32 (x is a digit)   The same as 2.x, except that on a 64-bit system, the 32-bit Python is run
3.x-32 (x is a digit)   The same as 3.x, except that on a 64-bit system, the 32-bit Python is run

The optional -32 suffix, if present, has no effect when processed by a 32-bit launcher executable. If a specified Python version is not installed on the system, the launcher will terminate with an error message.
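Putting the executable specifiers and version specifiers together, a script might use shebang lines like these (illustrative examples, with the launcher’s behaviour noted alongside):

```
#!/usr/bin/env python     ->  the default Python version, as configured
#!python3                 ->  the latest installed Python 3.x
#!python2.7               ->  Python 2.7, matching the OS architecture
#!python3.2-32            ->  32-bit Python 3.2, even on a 64-bit system
```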

Customised commands are treated as follows. A py.ini file is looked for in two locations: adjacent to the launcher, and in the user’s local AppData folder (non-roaming). This has a standard .ini format; the [commands] section is expected to contain a list of lines with the format progname=path. The file adjacent to the launcher is read first, then the one in the user’s profile location; these commands are read into an internal command table. If the same progname is in both configuration files, the value read from the user’s profile will overwrite the value read from the launcher-adjacent configuration. This table is searched when a progname variant of the shebang line is encountered; if a match is found, the corresponding path is assumed to be the path to the custom executable.
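As an illustration, a py.ini might look like the following; the [commands] section is as described above, and the paths here are hypothetical – adjust them to point at your own installations:

```ini
; py.ini - placed adjacent to the launcher, or in the user's local
; (non-roaming) AppData folder; entries in the user's copy override
; entries in the launcher-adjacent copy
[commands]
perl=c:\perl\bin\perl.exe
runhaskell=c:\haskell\bin\runhaskell.exe
```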

Customised commands allow the launcher to operate as a very flexible shebang line processor, not limited to launching Python scripts. For example, you can launch Perl, Ruby or Haskell scripts using the launcher (as long as the file type associations for the corresponding .pl, .rb and .hs extensions are set up to point to the launcher, and the launcher’s configuration maps the perl, ruby and runhaskell progname values to the paths of the corresponding executables). Of course, those programs already set the registry up to point to their launchers: but using the customisability of the Python launcher, you could (for example) cause ‘perl’ in a shebang line to invoke the Perl interpreter with useful default arguments such as -w (which enables many useful warnings and is recommended).

PEP 397 has been accepted, and so the launcher should be incorporated into Python 3.3 (now in beta). If you are a Windows user, please download the launcher (or the Python 3.3 beta), try it out and give feedback (if you find any bugs or have any enhancement requests, you can raise an issue here for the standalone launcher, or here for the Python 3.3 beta). The standalone launcher will remain available for use with older Python versions.


  1. When I set up xrdp on Raspbian Jessie a while ago, the keyboard layout appeared to be wrong - commonly used keys seemed to be returning US keycodes rather than UK ones. I found this post very helpful in resolving the problem, but it didn't quite fit the bill when I tried to do the same with a Raspbian Stretch instance recently. Here's what I did on Raspbian Stretch to set up xrdp to provide the correct keycodes.

First, I checked the keyboard layout was as expected:

    $ cat /etc/default/keyboard | grep LAYOUT
    XKBLAYOUT="gb"
    

    Then, I generated a keyboard mapping file using xrdp-genkeymap:

    $ xrdp-genkeymap km-00000809.ini
    

    This filename follows the current filename convention (under Jessie, it was km-0809.ini). I then copied this file into the xrdp configuration files directory:

    $ sudo cp km-00000809.ini /etc/xrdp
    

    The next step was to edit the file /etc/xrdp/xrdp_keyboard.ini, which appears to be new in Raspbian Stretch. Here are the lines I added, in an excerpt from the file:

    [default_rdp_layouts]
    rdp_layout_us=0x00000409
    rdp_layout_de=0x00000407
    < ... lines omitted ... >
    rdp_layout_br=0x00000416
    rdp_layout_pl=0x00000415
    rdp_layout_gb=0x00000809
    
    ; <rdp layout name> = <X11 keyboard layout value>
    [default_layouts_map]
    rdp_layout_us=us
    rdp_layout_de=de
    < ... lines omitted ... >
    rdp_layout_br=br(abnt2)
    rdp_layout_pl=pl
    rdp_layout_gb=gb
    

    In each case, I added the rdp_layout_gb= lines. The final step was to restart the xrdp service:

    $ sudo service xrdp restart
    

    On reconnecting, I found that the keycodes were as they should have been, and I had access to the £@'~#\| keys again in their expected places on the keyboard.


  2. The implementation of PEP 391 (Dictionary-Based Configuration for Logging) provides, under the hood, the basis for a flexible, general-purpose configuration mechanism. The class which performs the logging configuration work is DictConfigurator, and it's based on another class, BaseConfigurator. DictConfigurator knows about logging components such as Logger, Handler, Formatter, and so on, and provides some “syntax sugar” for logging configuration, but the BaseConfigurator class is generic and can be used, with a little extra work, to configure any Python object from a dictionary. The nice thing about using dictionaries is that you side-step the question of whether to use JSON, YAML, Python or some other means of storing your configuration - people can use whatever makes the most sense for them.

    Ideally, a configuration approach should allow one to do the following:

    • Construct arbitrary objects, including sub-components which are themselves arbitrary objects.
    • Allow parts of the configuration to refer to external objects which are accessible through normal import mechanisms.
    • Allow parts of the configuration to refer to other parts of the configuration, so as to avoid duplication.
    • Offer the ability to refer to sub-configurations which are held outside a given configuration (e.g. in external files). This is commonly called “include” functionality.

    I present an approach for doing these things, which is based on the existing code in the logging.config package.

    Of course, constructing completely arbitrary objects from untrusted sources can lead to problems. For example, YAML allows the construction of completely arbitrary objects, and the use of YAML by Ruby on Rails led to some security exploits not that long ago. Likewise, pickle allows the creation of arbitrary objects and thus is also vulnerable to the same sorts of security exploits.

    In the case of the configuration mechanism being discussed here, creation of objects is under the control of a configurator object, whose implementation can, in principle, use mechanisms (such as, but not limited to, whitelists and blacklists) to control what can and can't be created.

    As mentioned in the logging documentation, access to external objects which are accessible through normal import mechanisms is achieved through describing the objects using string literals in the configuration. For example, the string "ext://sys.stderr" would resolve to the object bound to sys.stderr. The resolution process calls the configurator's ext_convert method with the literal string "sys.stderr", and this method in turn calls the configurator's resolve method with the same string to find the value through the import machinery. Either of these methods can be overridden to put security mechanisms in place to prevent access to certain objects, if desired.
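    To make the resolution step concrete, here is a minimal sketch of the kind of resolve method involved. It is a simplified, standalone version for illustration only – the real implementation is BaseConfigurator.resolve in logging.config:

    ```python
    import importlib

    def resolve(dotted):
        """Resolve a dotted name such as 'sys.stderr' to the object it
        names: import the leading module, then walk the remaining
        segments with getattr, importing submodules on demand."""
        parts = dotted.split('.')
        obj = importlib.import_module(parts[0])
        for i, part in enumerate(parts[1:], 1):
            try:
                obj = getattr(obj, part)
            except AttributeError:
                # The attribute may be a submodule that isn't imported yet.
                importlib.import_module('.'.join(parts[:i + 1]))
                obj = getattr(obj, part)
        return obj
    ```

    An overridable function like this is the natural place to hook in any access-control checks mentioned above.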

    As also mentioned in the logging documentation, access to objects internal to the configuration (so that you can reference one object in the configuration from another) is also done through literal strings, of the form "cfg://path" where the path portion indicates how to get to the object from the top-level configuration dictionary. In the path specification, you can use attribute access and item access notation to pinpoint the object you want, e.g. you could use "cfg://handlers.email.toaddrs[0]" to get the first recipient’s email address in an SMTPHandler named email in a logging configuration.
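    The path-walking behind cfg:// can be sketched in a few lines. The helper name cfg_resolve is mine, and this deliberately ignores error handling and caching that a real configurator would need:

    ```python
    import re

    def cfg_resolve(root, path):
        """Walk a cfg://-style path such as 'handlers.email.toaddrs[0]'
        through a configuration, supporting dotted attribute/key access
        and [index] item access."""
        obj = root
        for step in re.findall(r'\w+|\[\w+\]', path):
            if step.startswith('['):
                # Item access: numeric strings index sequences, others
                # are used as dictionary keys.
                key = step[1:-1]
                obj = obj[int(key) if key.isdigit() else key]
            elif isinstance(obj, dict):
                obj = obj[step]
            else:
                obj = getattr(obj, step)
        return obj
    ```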

    A general purpose configurator which can build an arbitrary object from a dictionary will need:

    • A callable which will create and initialise the object. This will usually be a class, but it could be any callable.
    • Positional arguments for that callable.
    • Keyword arguments for that callable.
    • In cases where some attributes need to be set in the instance after initialisation, a dict mapping the attribute names to the values to set them to.

    Of course, the arguments and attribute values can themselves be configuration dictionaries which specify objects to be created and initialised.

    To handle sub-configurations held in external files, the configurator will also support literal strings of the type "inc://path/to/external/file". (The base implementation assumes these are JSON files, but this can be easily generalised to support other file types.)

    The following conventions are used in configuration dictionaries which are used to configure objects:

    • The "()" key, if present, identifies the dictionary as a configuration dictionary for an object. The corresponding value, if not a callable, must be a string which resolves to a callable through normal import mechanisms (e.g. the literal "logging.StreamHandler"). If this key is absent, the dictionary is treated as an ordinary dictionary.
    • The "[]" key, if present, identifies the positional arguments for the call to the callable. This should be a list of objects or dictionaries used to configure objects. If not present, the empty tuple is used for the positional arguments.
    • The "." key, if present, must have a corresponding dict as a value. Each key in this dict is an attribute name (i.e. it should be a valid Python identifier), and the corresponding value is either an object or a dictionary used to configure an object.
    • All other keys are assumed to be keyword arguments for the callable – they should all be valid Python identifiers, and the corresponding values should be either objects or dictionaries used to configure objects.

    Note that the use of special keys for callable, positional arguments and attributes means that they will never clash with keyword arguments for the callable.
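    The conventions above can be sketched as a toy recursive builder. This is not the real implementation – BaseConfigurator and its descendant in distlib also handle ext://, cfg:// and inc:// strings, which this simplified version omits:

    ```python
    import importlib

    def build(config):
        """Construct an object from a dictionary using the '()', '[]'
        and '.' conventions; anything else is returned unchanged."""
        if not isinstance(config, dict) or '()' not in config:
            # Not a configuration dictionary: use the value as-is.
            return config
        factory = config['()']
        if isinstance(factory, str):
            # Resolve a dotted name such as 'logging.StreamHandler'.
            modname, _, attr = factory.rpartition('.')
            factory = getattr(importlib.import_module(modname), attr)
        args = [build(a) for a in config.get('[]', ())]
        kwargs = {k: build(v) for k, v in config.items()
                  if k not in ('()', '[]', '.')}
        obj = factory(*args, **kwargs)
        for name, value in config.get('.', {}).items():
            # Post-initialisation attribute assignments.
            setattr(obj, name, build(value))
        return obj
    ```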

    To show how the scheme works, let's define a dummy class which just holds the objects passed to its initialiser:

    class TestContainer(object):
        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs 
    

    We can then define a configuration dictionary:

    config_dict = {
        'o': {
            '()': '__main__.TestContainer',
            '[]': [
                1, 2.0, '3', {
                    '()': '__main__.TestContainer',
                    '[]': [4, 5.0],
                    'k11': 'ext://sys.stderr',
                    'k12': 'cfg://o[k1]',
                },
            ],
            'k1': 'v1',
            'k2': {
                '()': '__main__.TestContainer',
                'k21': 'v21'
            },
            '.': {
                'p1': 'a',
                'p2': {
                    '()': '__main__.TestContainer',
                 }
            }
        }
    }
    

    We then initialise a Configurator:

    >>> cfg = Configurator(config_dict)
    

    If you're going to handle sub-configurations in external files, you'd use the following form of initialiser:

    >>> cfg = Configurator(config_dict, '/path/to/external/configs')
    

    The path to external configuration files, if not specified, defaults to the current directory. When a relative path appears in an inc:// specifier, it is resolved against this path to determine the absolute path of the external configuration file.
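    The inc:// handling can be sketched like this (the helper name load_include is mine; as noted above, the base implementation assumes JSON, with other formats pluggable):

    ```python
    import json
    import os

    def load_include(spec, base='.'):
        """Load an inc:// sub-configuration. Relative paths are resolved
        against base, the external-configs directory given to the
        configurator."""
        path = spec[len('inc://'):]
        if not os.path.isabs(path):
            path = os.path.join(base, path)
        with open(path) as f:
            return json.load(f)
    ```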

    Once the configurator has been created, we can access the configuration:

    >>> cfg['o']
    <__main__.TestContainer object at 0x7f6aa90e7d10>
    >>> o = cfg['o']
    >>> o.args
    (1, 2.0, '3', <__main__.TestContainer object at 0x7f8b8eaadd50>)
    >>> o.kwargs
    {'k2': <__main__.TestContainer object at 0x7f8b8eaadcd0>, 'k1': 'v1'}
    >>> o2 = o.args[-1]
    >>> o2.args
    (4, 5.0)
    >>> o2.kwargs
    {'k12': 'v1', 'k11': <open file '<stderr>', mode 'w' at 0x7fb245c65270>} 
    

    Notice that the cfg:// reference to the configuration and the ext:// reference to the external object have been set correctly.

    >>> o3 = o.kwargs['k2']
    >>> o3.args
    ()
    >>> o3.kwargs
    {'k21': 'v21'}
    >>> o.p1
    'a'
    >>> o.p2
    <__main__.TestContainer object at 0x7f8b8c43af50>
    >>> o4 = o.p2
    >>> o4.args
    ()
    >>> o4.kwargs
    {} 
    

    The above inspections show that the objects have been constructed as expected. The configuration dictionary can be created from either JSON or YAML files.

    To look at how “includes” work, we can create a sub-configuration to be included, in a file tests/included.json:

    {
      "foo": "bar",
      "bar": "baz"
    } 
    

    and we can refer to this in a configuration:

    >>> config_dict = {'included_value': 'inc://included.json'}
    

    Then, we instantiate a configurator:

    >>> cfg = Configurator(config_dict, 'tests')
    

    and examine the included value:

    >>> cfg['included_value']
    {'foo': 'bar', 'bar': 'baz'} 
    

    which is as expected.

    This configuration functionality will be included in the next release of distlib. A simple test script exercising the functionality is available here; you’ll need to clone the BitBucket repository for distlib to actually run it, but you should be able to see how the functionality works just by looking at the script.

    Your comments about this configuration approach are welcome, particularly regarding any missing functionality or problems you can foresee. Thanks for reading.



  4. With the acceptance of PEP 414 (Explicit Unicode Literal for Python 3.3), string literals with u prefixes will once again be valid syntax in Python 3.3, though they cause a SyntaxError in Python 3.2 (and earlier 3.x versions). The motivation behind the PEP is to make it easier to port any 2.x project which has a lot of Unicode literals to 3.3 using a single codebase strategy. That’s a strategy which avoids the need to repeatedly run 2to3 on the code, either during development or at installation time. The single codebase strategy has gained currency because the (repeated) running of 2to3 over parts of the codebase causes impedance in the development workflow. The main impact of the PEP from a porting perspective is that the diffs between ported code and unported code will not have the noise created by removing the u prefixes from all the string literals, and should therefore give project owners an easier time of it when reviewing changes. Of course it also saves the work of actually removing the literals, but that would be a one-time operation automated by 2to3, and so not really a significant part of a porting effort.

    While this PEP is fine for people who want to port their 2.x project straight to 3.3, it leaves Python 3.2 users a little bit out in the cold. The PEP does consider the question of 3.2 support, but does somewhat treat 3.2 as a second-class citizen. And for those proposing that people just move to 3.3 as the latest and greatest Python, remember there will be people who are constrained to use 3.2 because of project dependency constraints, whether they are technical or organisational in nature. For example, Ubuntu 12.04 LTS (Long Term Support) will receive 5 years of support; there are already people who have invested time and effort in projects with 3.2 as a dependency, which may not be possible to migrate to 3.3 (which, let’s remember, won’t be released for a while – the scheduled release date is 18 August 2012).

    The PEP offers to support 3.2 users by means of an installation hook which works similarly to 2to3 used at installation time. However, an installation time hook does not provide the benefits of a single codebase in terms of streamlined iterative workflow, involving making code changes interspersed with testing with multiple Python versions.

    An import hook (which was suggested during the PEP 414 discussions on the python-dev mailing list) is a much more attractive proposition. The benefits are that you can have code containing u'xxx' literals, which 3.3 will allow by virtue of PEP 414’s proposed change to Python, and also work with that code transparently in Python 3.2. How it would work is:

    • An import hook is installed.
    • When importing a module, if the compiled .pyc file exists and is up to date, it will be used. The hook will not do anything in this case.
    • If when importing, the .py file is newer than the .pyc file, the hook will load the source code, convert all string literals with u prefixes to unadorned string literals (as expected by 3.1/3.2), and then compile the converted source. The compiled code will be stored in the .pyc file, so conversion will not be performed again until the .py file’s timestamp is more recent than that of its .pyc file.
    • There is no need to integrate with editing environments – any updated source files that are imported will automatically be converted lazily, as needed.
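    The transformation step itself can be illustrated with a standalone sketch. The real uprefix hook uses lib2to3 and plugs into the import machinery with the .pyc caching described above; this simplified version just shows the prefix stripping on a source string:

    ```python
    import io
    import tokenize

    def strip_u_prefixes(source):
        """Remove u/U prefixes from string literals in Python source,
        leaving everything else untouched."""
        result = []
        for tok_type, tok_str, _, _, _ in tokenize.generate_tokens(
                io.StringIO(source).readline):
            if tok_type == tokenize.STRING and tok_str[:1] in ('u', 'U'):
                tok_str = tok_str[1:]
            result.append((tok_type, tok_str))
        # Two-tuple untokenize preserves the code, not the exact layout.
        return tokenize.untokenize(result)
    ```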

    I set out to try and implement an import hook to do the prefix removal. The initial result is uprefix, a package containing the hook and functions to register and unregister it. It’s available on PyPI, so you can try it out in a virtualenv using pip install uprefix. Or you can just download a source tarball and install it (e.g. into a virtual environment) using python setup.py install, or run the tests using python setup.py test before installing.

    Usage is easy: once you have it on your path, on Python 3.2, you can do

    >>> import uprefix; uprefix.register_hook()
    >>>

    That’s it. You should now be able to import any module containing string literals using u prefixes, as if they weren’t there.

    You can call uprefix.unregister_hook() to remove the hook from the import pipeline.

    This is a proof of concept, and uses lib2to3 to strip the prefixes. It should allow you to import 2.x code into Python 3.x without worrying about literal syntax (though other gotchas such as relative imports, exception syntax, etc. may prevent a successful import).

    The performance seems to be good enough. I couldn’t use the modified tokenize.py which is used by the PEP 414 installation hook, even though it would be faster, because it has a couple of bugs which cause it to break on real-world codebases. (These bugs were reported a month ago, but so far don’t appear to have received any love.)

    Your feedback is welcome. I’m just dipping my toes in the Python import machinery, so I might well have missed some things.


  5. The subprocess module provides some very useful functionality for working with external programs from Python applications, but is often complained about as being harder to use than it needs to be. See, for example, Kenneth Reitz’s Envoy project, which aims to provide an ease-of-use wrapper over subprocess. There’s also Andrew Moffat’s pbs project, which aims to let you do things like

    from pbs import ifconfig
    print ifconfig("eth0")

    It does this by replacing sys.modules['pbs'] with a subclass of the module type which overrides __getattr__ to look for programs on the path. This is nice, and I can see that it would be useful in some contexts, but I don’t find that wc(ls("/etc", "-1"), "-l") is more readable than call("ls /etc -1 | wc -l") in the general case.
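    The mechanism can be sketched in a few lines. This is not pbs's actual code; the fakepbs module name and its canned return value are invented purely to show the sys.modules trick:

```python
import sys
import types

class AutoModule(types.ModuleType):
    # Sketch of the pbs mechanism: unknown attribute lookups on the
    # module are synthesised on demand. pbs returns a callable that runs
    # the named program; here we just fake the result for illustration.
    def __getattr__(self, name):
        def runner(*args):
            return '%s %s' % (name, ' '.join(args))
        return runner

# Install the fake module so "import fakepbs" finds it in sys.modules
sys.modules['fakepbs'] = AutoModule('fakepbs')

from fakepbs import ifconfig
print(ifconfig('eth0'))  # prints "ifconfig eth0"
```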

    I’ve been experimenting with my own wrapper for subprocess, called sarge. The main things I need are:

    • I want to use command pipelines, but using subprocess out of the box often leads to deadlocks because pipe buffers get filled up.
    • I want to use bash-style pipe syntax on Windows as well as Posix, but Windows shells don’t support some of the syntax I want to use, like &&, ||, |& and so on.
    • I want to process output from commands in a flexible way, and communicate() is not always flexible enough for my needs - for example, if I need to process output a line at a time.
    • I want to avoid shell injection problems by having the ability to quote command arguments safely, and I want to minimise the use of shell=True, which I generally have to use when using pipelined commands.
    • I don’t want to set arbitrary limits on passing data between processes, such as Envoy’s 10MB limit.
    • subprocess allows you to let stderr be the same as stdout, but not the other way around - and I sometimes need to do that.
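    To see the kind of plumbing involved, here is a two-stage pipeline wired up manually with plain subprocess and no shell=True. The sketch uses sys.executable in place of real commands like ls and wc so that it is self-contained:

```python
import subprocess
import sys

# Producer: prints three lines (stands in for "ls /etc")
p1 = subprocess.Popen(
    [sys.executable, '-c', "print('a'); print('b'); print('c')"],
    stdout=subprocess.PIPE)
# Consumer: counts input lines (stands in for "wc -l")
p2 = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(len(sys.stdin.readlines()))'],
    stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let p1 see a broken pipe if p2 exits early
out, _ = p2.communicate()
print(out.decode().strip())  # prints "3"
```

    This is the sort of boilerplate a run('... | ...') call is meant to hide, along with the thread management needed to keep pipe buffers drained.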

    I’ve been working on supporting these use cases, so sarge offers the following features:

    • A simple run function which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run cross-platform on Posix and Windows without cygwin:

      >>> p = run('false && echo foo')
      >>> p.commands
      [Command('false')]
      >>> p.returncodes
      [1]
      >>> p.returncode
      1
      >>> p = run('false || echo foo')
      foo
      >>> p.commands
      [Command('false'), Command('echo foo')]
      >>> p.returncodes
      [1, 0]
      >>> p.returncode
      0
    • The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks:

      >>> from sarge import shell_format
      >>> shell_format('ls {0}', '*.py')
      "ls '*.py'"
      >>> shell_format('cat {0}', 'a file name with spaces')
      "cat 'a file name with spaces'"
    • The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want:

      >>> from sarge import Capture, run
      >>> with Capture() as out:
      ...     run('echo foobarbaz', stdout=out)
      ...
      <sarge.Pipeline object at 0x175ed10>
      >>> out.read(3)
      'foo'
      >>> out.read(3)
      'bar'
      >>> out.read(3)
      'baz'
      >>> out.read(3)
      '\n'
      >>> out.read(3)
      ''

      A Capture object can capture the output from multiple commands:

      >>> from sarge import run, Capture
      >>> p = run('echo foo; echo bar; echo baz', stdout=Capture())
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'bar\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      ''

      Delays in commands are honoured in asynchronous calls:

      >>> from sarge import run, Capture
      >>> cmd = 'echo foo & (sleep 2; echo bar) & (sleep 1; echo baz)'
      >>> p = run(cmd, stdout=Capture(), async=True) # returns immediately
      >>> p.close() # wait for completion
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      'bar\n'
      >>>

      Here, the sleep commands ensure that the asynchronous echo calls occur in the order foo (no delay), baz (after a delay of one second) and bar (after a delay of two seconds); the capturing works as expected.
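    As an aside, the quoting behaviour shown earlier by shell_format can be approximated with the standard library's shell quoting. The quoted_format function below is a hypothetical sketch, not sarge's implementation:

```python
try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # older Pythons keep it in pipes

def quoted_format(fmt, *args):
    # Not sarge's code - the same idea built on stdlib quoting:
    # each argument is shell-quoted before substitution.
    return fmt.format(*[quote(str(a)) for a in args])

print(quoted_format('ls {0}', '*.py'))                      # ls '*.py'
print(quoted_format('cat {0}', 'a file name with spaces'))  # cat 'a file name with spaces'
```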

    Sarge hasn’t been released yet, but it’s not far off being ready. It’s meant for Python >= 2.6.5 and is tested on 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux, Mac OS X, Windows XP and Windows 7 (not all versions are tested on all platforms, but the overall test coverage is comfortably over 90%).

    I have released the sarge documentation on Read The Docs; I’m hoping people will read this and give some feedback about the API and feature set being proposed, so that I can fill in any gaps where possible and perhaps make it more useful to other people. Please add your comments here, or via the issue tracker on the BitBucket project for the docs.


  6. Eric Holscher, one of the creators of Read The Docs, recently posted about the importance of a documentation culture in Open Source development, and about things that could be done to encourage this. He makes some good points, and Read The Docs is a very nice looking showcase for documentation. Writing good documentation is difficult enough at the best of times, and one practical problem that I face when working on Sphinx documentation is that I often feel I have to break away from composing it to building it, to see how it looks - because the look of it on the page will determine how I want to refine it.

    What I’ve tended to do is work iteratively by making some changes to the ReST sources, invoking make html and refreshing the browser to show how the built documentation looks. This is OK, but does break the flow more than a little (for me, anyway, but I can’t believe I’m the only one).

    I had the idea that it would be nice to streamline the process somewhat, so that all I would need to do is to save the changed ReST source – the building and browser refresh would be automatically done, and if I had the editor and browser windows open next to each other in tiled fashion, I could achieve a sort of WYSIWYG effect with the changes appearing in the browser a second or two after I saved any changes.

    I decided to experiment with this idea, and needed a browser which I could easily control (to get it to refresh on-demand). I decided to use Roberto Alsina’s 128-line browser, which is based on QtWebKit and PyQt. Roberto posted his browser code almost a year ago, and I knew I’d find a use for it one day :-)

    I also needed to track changes to .rst files in the documentation tree, and since I do a fair amount of my Open Source development on Linux, I decided to use inotify functionality. Although there is a Python binding for this, I decided to use the command-line interface and the subprocess module in the standard library, because I wasn’t very familiar with inotify and the command-line interface is easier to experiment with.

    The basic mechanism of the solution is that the browser watches for changes in source files in the documentation tree, invokes Sphinx to build the documentation, and then refreshes its contents. This is done in a separate thread:

    class Watcher(QtCore.QThread):
        def run(self):
            self._stop = False
            watch_command = 'inotifywait -rq -e close_write --exclude \'"*.html"\' .'.split()
            make_command = 'make html'.split()
            while not self._stop:
                # Perhaps should put notifier access in a mutex - not bothering yet
                self.notifier = subprocess.Popen(watch_command)
                self.notifier.wait()
                if self._stop:
                    break
                subprocess.call(make_command)
                # Refresh the UI ...
                self.parent().changed.emit()

        def stop(self):
            self._stop = True
            # Perhaps should put notifier access in a mutex - not bothering for now
            if self.notifier.poll() is None:  # not yet terminated ...
                self.notifier.terminate()

    The thread invokes inotifywait and waits for it to exit, which happens when a file with an extension other than .html is written in the documentation tree (typically, when a source file is edited and saved). The inotifywait command is usually available through a Linux package; on Ubuntu, for example, you can install it using sudo apt-get install inotify-tools. In the specific invocation used, the -r flag tells the program to watch a directory recursively, -q keeps output to a minimum (it’s used for trouble-shooting only), -e close_write indicates that we’re only interested in files being closed after being opened for writing, and --exclude '"*.html"' indicates that we don’t care about writes to .html files.
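    On platforms without inotify, the same contract (block until a relevant file changes) can be met by polling modification times. The wait_for_change function below is a hypothetical, less efficient stand-in for the inotifywait call, not part of the actual script:

```python
import os
import time

def wait_for_change(root='.', exclude_ext='.html', poll=0.5):
    # Hypothetical portable fallback for inotifywait: compare snapshots
    # of file mtimes under root, returning once a non-excluded file
    # appears or changes. Needs nothing outside the standard library.
    def snapshot():
        state = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                if name.endswith(exclude_ext):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    state[path] = os.stat(path).st_mtime
                except OSError:  # file vanished mid-walk
                    pass
        return state
    before = snapshot()
    while True:
        time.sleep(poll)
        if snapshot() != before:
            return
```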

    The Watcher instance’s parent is the main (browser) window. In this we create a new custom event, changed, to be emitted when we want the window to know that the HTML has changed. This is done through the following snippets of code:

    changed = QtCore.pyqtSignal()

    which is declared in the main window class, and

    self.watcher = Watcher(self)
    self.changed.connect(self.wb.reload)
    self.watcher.start()

    which are added to the window’s constructor. Here, self.wb is the QWebView component of the browser which actually holds the browser content.

    One last refinement is to save the browser window coordinates on exit and restore them on starting, so that if you have moved the window to a particular location, it will reappear there every time until you move it. First, we create a module-level QSettings instance:

    settings = QtCore.QSettings("Vinay Sajip", "DocWatch")

    and provide a couple of main window methods to load and save the settings:

    def load_settings(self):
        settings.beginGroup('mainwindow')
        pos = settings.value('pos')
        size = settings.value('size')
        if isinstance(pos, QtCore.QPoint):
            self.move(pos)
        if isinstance(size, QtCore.QSize):
            self.resize(size)
        settings.endGroup()

    def save_settings(self):
        settings.beginGroup('mainwindow')
        settings.setValue('pos', self.pos())
        settings.setValue('size', self.size())
        settings.endGroup()

    When the main window is closed, we need to stop the watcher and save the settings. (We also need to call load_settings in the main window constructor.)

    def closeEvent(self, event):
        self.save_settings()
        self.watcher.stop()

    The last thing is the code which runs when the module is invoked as a script. Note that the very simplistic sanity check here is consistent with Sphinx’s quick-start script defaults.

    if __name__ == "__main__":
        if not os.path.isdir('_build'):
            # very simplistic sanity check. Works for me, as I generally use
            # sphinx-quickstart defaults
            print('You must run this application from a Sphinx directory containing _build')
            rc = 1
        else:
            app = QtGui.QApplication(sys.argv)
            path = os.path.join('_build', 'html', 'index.html')
            url = 'file:///' + pathname2url(os.path.abspath(path))
            url = QtCore.QUrl(url)
            wb = MainWindow(url)
            wb.show()
            rc = app.exec_()
        sys.exit(rc)

    The code (MIT licensed) is available from here. As it’s a single file standalone script, I haven’t considered putting it on PyPI – it’s probably easier to download it to a $HOME/bin or similar location, then you can invoke it in the docs directory of your project, run your editor, position the browser and editor windows suitably, and you’re ready to go! Here’s a screen-shot using doc-watch and gedit:

    doc-watch

    Please feel free to try it. Comments and suggestions are welcome.

    Update: Another advantage of using the subprocess / command line approach to notification is that it’s easy to slot in a solution for a platform which doesn’t support inotify. Alternatives are available for both Windows and Mac OS X. For example, on Windows, if you have IronPython installed, the following script could be used to provide the equivalent functionality to inotifywait (for this specific application):

    import clr
    import os

    from System.IO import FileSystemWatcher, NotifyFilters

    stop = False

    def on_change(source, e):
        global stop
        if not e.Name.endswith('.html'):
            stop = True
        print('%s: %s, stop = %s' % (e.FullPath, e.ChangeType, stop))

    watcher = FileSystemWatcher(os.getcwd())
    watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
    watcher.EnableRaisingEvents = True
    watcher.IncludeSubdirectories = True
    watcher.Changed += on_change
    watcher.Created += on_change

    while not stop:
        pass

    Whereas for Mac OS X, if you install the MacFSEvents package, the following script could be used to provide the equivalent functionality to inotifywait (again, for this specific application):

    #!/usr/bin/env python

    import os

    from fsevents import Observer, Stream

    stop = False

    def on_change(e):
        global stop
        path = e.name
        if os.path.isfile(path):
            if not path.endswith('.html'):
                stop = True
        print('%s: %s, stop = %s' % (e.name, e.mask, stop))

    observer = Observer()
    observer.start()
    stream = Stream(on_change, os.getcwd(), file_events=True)
    observer.schedule(stream)
    try:
        while not stop:
            pass
    finally:
        observer.unschedule(stream)
        observer.stop()
        observer.join()


All content on this blog is Copyright © 2012-2016 Vinay Sajip.