1. Last year, Mark Hammond proposed PEP 397 (Python launcher for Windows), to bring some much-needed functionality for Python on Windows. Historically, Python on Windows does not add itself to the system path; this needs to be done manually by users as a separate step. This may change in the future, but it remains the case for Python versions that are already released. With the increasing use of Python 2 and Python 3 on the same system, you can’t be sure which version of Python a particular script will run with, even when some version has been added to the system path.

    Some Python distributions add the .py and .pyw extensions to the PATHEXT environment variable, and if not, they can be added manually. This tells the Windows shell to try and execute foo.py when you invoke just plain foo on the command line. The shell then uses file associations to try to locate the actual executable to be invoked with foo.py as a parameter, along with any parameters you passed when you typed foo on the command line.

    File associations typically involve a two-step process: extensions are mapped to file types (also called vendor keys), and file types are mapped to executables which are used to perform operations on those files. These mappings are maintained in the registry; only one Python executable can be associated with .py files, which is clearly insufficient in environments where multiple Python versions are installed on the same machine.

    On Unix, Linux, OS X and other similar systems, individual scripts can specify which executable should be used to execute them, using the first line of the script – commonly called the shebang line. This is conventionally a comment line in the scripting language, and is ignored by the scripting language interpreter, but used by the command shell. The typical line for a Python script would be #!/usr/bin/env python, which directs the shell to invoke the script using the default Python interpreter for the system (usually a 2.x version, except on Arch Linux, where it’s a 3.x version).

    All of these systems allow symlinks for python3 and/or python2 pointing to a suitable Python 3.x or 2.x version, so individual scripts can easily specify which version of Python is appropriate for them.

    This functionality has not been available in the Windows environment, until recently, when an implementation of a PEP 397-compliant launcher was released. This aims to provide shebang line functionality for Windows, and early adopters have indicated that the launcher is genuinely useful for Python users on Windows.

    The launcher installer comes in four variants, offering different architectures and installation locations:

              Windows folder        "%ProgramFiles%\Python Launcher"
    32-bit    launchwin.msi         launcher.msi
    64-bit    launchwin.amd64.msi   launcher.amd64.msi

    The main advantage of choosing the launchwin variants is that they are on the system path, so you can invoke the launcher directly (i.e. not indirectly through a script) using the commands py, py -2 or py -3. These commands launch, respectively, the default Python version (2.x or 3.x depending on configuration, described below), the latest installed Python 2.x, and the latest installed Python 3.x.

    The launcher installation program associates the Python file types with the launcher executable, so that (with .py and .pyw in PATHEXT), typing e.g. foo -abc def in a command shell (where foo.py exists) will lead to Windows starting the launcher using the command line <path>\py.exe foo.py -abc def, where <path> is wherever the launcher was installed to. The launcher then opens foo.py, reads the first line, looks for a shebang line, and from it tries to determine which executable is needed to actually run the script. If it fails to do this, it exits with an error message; otherwise, it invokes that executable, passing it the same arguments it was passed.

    In order for the shebang line processing to correctly identify an executable to run the script, the shebang line (the first line in the script) must begin with #! and then be followed by some optional whitespace and then an executable specifier. This can take a number of forms:

    Executable specifier       What will happen
    /usr/bin/env python        Look for an optional version specifier and arguments, and launch the appropriate version of Python with any arguments from the shebang line, the script name and the parameters passed to the script.
    /usr/bin/python            As above.
    /usr/local/bin/python      As above.
    python                     As above.
    /usr/bin/env progname      Look in the launcher’s configuration (see below) to identify the executable for progname, and if found, launch it with any arguments from the shebang line, the script name and the parameters passed to the script.
    /usr/bin/progname          As above.
    /usr/local/bin/progname    As above.
    progname                   As above.

    The progname in the above table is an identifier other than 'python'. It is regarded as a 'customised command' (see below).

    The version specifier can take one of the following forms:

    Version specifier        Which Python will be run
    2                        The default 2.x version of Python, as configured (see below)
    3                        The default 3.x version of Python, as configured (see below)
    2.x (x is a digit)       The specified version 2.x of Python, with the same architecture as the OS
    3.x (x is a digit)       The specified version 3.x of Python, with the same architecture as the OS
    2.x-32 (x is a digit)    The same as 2.x, except that on a 64-bit system, the 32-bit Python is run
    3.x-32 (x is a digit)    The same as 3.x, except that on a 64-bit system, the 32-bit Python is run

    The optional -32 suffix, if present, will have no effect when processed by a 32-bit launcher executable. If a specified Python version is not installed on the system, the launcher will terminate with an error message.
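
    For illustration, a hypothetical script hello.py (the name and contents are purely illustrative) relying on the launcher’s shebang processing might look like this:

    #!/usr/bin/env python3
    # The launcher reads the line above and runs this script with the latest
    # installed Python 3.x, so "py hello.py" (or just "hello", with .py in
    # PATHEXT) picks a 3.x interpreter even if a 2.x version happens to be
    # first on the system path.
    import sys
    print(sys.version)

    Changing the first line to, say, #!python2.7-32 would instead request the 32-bit Python 2.7, if it is installed.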

    Customised commands are treated as follows. A py.ini file is looked for in two locations: adjacent to the launcher, and in the user’s local AppData folder (non-roaming). This has a standard .ini format; the [commands] section is expected to contain a list of lines with the format progname=path. The file adjacent to the launcher is read first, then the one in the user’s profile location; these commands are read into an internal command table. If the same progname is in both configuration files, the value read from the user’s profile will overwrite the value read from the launcher-adjacent configuration. This table is searched when a progname variant of the shebang line is encountered; if a match is found, the corresponding path is assumed to be the path to the custom executable.

    Customised commands allow the launcher to operate as a very flexible shebang line processor, not limited to launching Python scripts. For example, you can launch Perl, Ruby or Haskell scripts using the launcher (as long as the file type associations for the corresponding .pl, .rb and .hs extensions are set up to point to the launcher, and the launcher’s configuration maps the perl, ruby and runhaskell progname values to the paths of the corresponding executables). Of course, those programs already set the registry up to point to their own interpreters: but using the customisability of the Python launcher, you could (for example) cause 'perl' in a shebang line to invoke the Perl interpreter with useful default arguments such as -w (which enables many useful warnings and is recommended).
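
    As a concrete sketch, a py.ini along the following lines would set up those mappings (the interpreter paths shown are purely hypothetical and would need to match your own installations):

    [commands]
    perl=C:\Perl\bin\perl.exe -w
    ruby=C:\Ruby193\bin\ruby.exe
    runhaskell=C:\Haskell\bin\runhaskell.exe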

    PEP 397 has been accepted, and so the launcher should be incorporated into Python 3.3 (now in beta). If you are a Windows user, please download the launcher (or the Python 3.3 beta), try it out and give feedback (if you find any bugs or have any enhancement requests, you can raise an issue here for the standalone launcher, or here for the Python 3.3 beta). The standalone launcher will remain available for use with older Python versions.


  2. With the acceptance of PEP 414 (Explicit Unicode Literal for Python 3.3), string literals with u prefixes will once again be permitted syntax in Python 3.3, though they cause a SyntaxError in Python 3.2 (and earlier 3.x versions). The motivation behind the PEP is to make it easier to port any 2.x project which has a lot of Unicode literals to 3.3 using a single codebase strategy - that is, a strategy which avoids the need to repeatedly run 2to3 on the code, either during development or at installation time. The single codebase strategy has gained currency because the repeated running of 2to3 over parts of the codebase adds friction to the development workflow. The main impact of the PEP from a porting perspective is that the diffs between ported and unported code will not have the noise created by removing the u prefixes from all the string literals, and should therefore give project owners an easier time when reviewing changes. Of course it also saves the work of actually removing the prefixes, but that would be a one-time operation automated by 2to3, and so not really a significant part of a porting effort.
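
    For example, a line such as the following is accepted by 2.x and (with PEP 414) by 3.3, but raises a SyntaxError under 3.0-3.2:

    message = u'Hello, world!'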

    While this PEP is fine for people who want to port their 2.x project straight to 3.3, it leaves Python 3.2 users a little out in the cold. The PEP does consider the question of 3.2 support, but somewhat treats 3.2 as a second-class citizen. And for those proposing that people just move to 3.3 as the latest and greatest Python, remember that there will be people constrained to stay on 3.2 because of project dependencies, whether technical or organisational in nature. For example, Ubuntu 12.04 LTS (Long Term Support) will receive 5 years of support; there are already people who have invested time and effort in projects with 3.2 as a dependency, which may not be possible to migrate to 3.3 (which, let’s remember, won’t be released for a while – the scheduled release date is 18 August 2012).

    The PEP offers to support 3.2 users by means of an installation hook which works similarly to running 2to3 at installation time. However, an installation-time hook does not provide the single-codebase benefit of a streamlined iterative workflow, in which code changes are interspersed with testing under multiple Python versions.

    An import hook (which was suggested during the PEP 414 discussions on the python-dev mailing list) is a much more attractive proposition. The benefits are that you can have code containing u'xxx' literals, which 3.3 will allow by virtue of PEP 414’s proposed change to Python, and also work with that code transparently in Python 3.2. How it would work is:

    • An import hook is installed.
    • When importing a module, if the compiled .pyc file exists and is up to date, it will be used. The hook will not do anything in this case.
    • If when importing, the .py file is newer than the .pyc file, the hook will load the source code, convert all string literals with u prefixes to unadorned string literals (as expected by 3.1/3.2), and then compile the converted source. The compiled code will be stored in the .pyc file, so conversion will not be performed again until the .py file’s timestamp is more recent than that of its .pyc file.
    • There is no need to integrate with editing environments – any updated source files that are imported will automatically be converted lazily, as needed.

    I set out to try and implement an import hook to do the prefix removal. The initial result is uprefix, a package containing the hook and functions to register and unregister it. It’s available on PyPI, so you can try it out in a virtualenv using pip install uprefix. Or you can just download a source tarball and install it (e.g. into a virtual environment) using python setup.py install, or run the tests using python setup.py test before installing.

    Usage is easy: once you have it on your path, on Python 3.2, you can do

    >>> import uprefix; uprefix.register_hook()
    >>>

    That’s it. You should now be able to import any module containing string literals using u prefixes, as if they weren’t there.

    You can call uprefix.unregister_hook() to remove the hook from the import pipeline.

    This is a proof of concept, and uses lib2to3 to strip the prefixes. It should allow you to import 2.x code under Python 3.x without worrying about literal syntax (though other gotchas such as relative imports, exception syntax etc. may prevent a successful import).
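
    To give a flavour of the approach, here is a simplified sketch (not uprefix’s actual code) of how lib2to3’s unicode fixer can be used to rewrite u-prefixed literals in a chunk of source:

    from lib2to3.refactor import RefactoringTool

    def strip_u_prefixes(source, name='<string>'):
        # fix_unicode rewrites u'...' literals; it also renames the unicode
        # and unichr builtins, which shouldn't matter for otherwise
        # 3.x-ready source.
        tool = RefactoringTool(['lib2to3.fixes.fix_unicode'])
        if not source.endswith('\n'):
            source += '\n'  # lib2to3 expects a trailing newline
        return str(tool.refactor_string(source, name))

    print(strip_u_prefixes("x = u'hello'"))  # -> x = 'hello'

    An import hook would apply a transformation like this to the decoded source of a module before compiling it, along the lines described above.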

    The performance seems to be good enough. I couldn’t use the modified tokenize.py which is used by the PEP 414 installation hook, even though it would be faster, because it has a couple of bugs which cause it to break on real-world codebases. (These bugs were reported a month ago, but so far don’t appear to have received any love.)

    Your feedback is welcome. I’m just dipping my toes in the Python import machinery, so I might well have missed some things.


  3. The subprocess module provides some very useful functionality for working with external programs from Python applications, but is often complained about as being harder to use than it needs to be. See, for example, Kenneth Reitz’s Envoy project, which aims to provide an ease-of-use wrapper over subprocess. There’s also Andrew Moffat’s pbs project, which aims to let you do things like

    from pbs import ifconfig
    print ifconfig("eth0")

    It does this by replacing sys.modules['pbs'] with a subclass of the module type which overrides __getattr__ to look for programs on the path. That’s nice, and I can see that it would be useful in some contexts, but I don’t find that wc(ls("/etc", "-1"), "-l") is more readable than call("ls /etc -1 | wc -l") in the general case.

    I’ve been experimenting with my own wrapper for subprocess, called sarge. The main things I need are:

    • I want to use command pipelines, but using subprocess out of the box often leads to deadlocks because pipe buffers get filled up.
    • I want to use bash-style pipe syntax on Windows as well as Posix, but Windows shells don’t support some of the syntax I want to use, like &&, ||, |& and so on.
    • I want to process output from commands in a flexible way, and communicate() is not always flexible enough for my needs - for example, if I need to process output a line at a time.
    • I want to avoid shell injection problems by having the ability to quote command arguments safely, and I want to minimise the use of shell=True, which I generally have to use when using pipelined commands.
    • I don’t want to set arbitrary limits on passing data between processes, such as Envoy’s 10MB limit.
    • subprocess allows you to let stderr be the same as stdout, but not the other way around - and I sometimes need to do that.

    I’ve been working on supporting these use cases, so sarge offers the following features:

    • A simple run function which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run cross-platform on Posix and Windows without cygwin:

      >>> p = run('false && echo foo')
      >>> p.commands
      [Command('false')]
      >>> p.returncodes
      [1]
      >>> p.returncode
      1
      >>> p = run('false || echo foo')
      foo
      >>> p.commands
      [Command('false'), Command('echo foo')]
      >>> p.returncodes
      [1, 0]
      >>> p.returncode
      0
    • The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks:

      >>> from sarge import shell_format
      >>> shell_format('ls {0}', '*.py')
      "ls '*.py'"
      >>> shell_format('cat {0}', 'a file name with spaces')
      "cat 'a file name with spaces'"
    • The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want:

      >>> from sarge import Capture, run
      >>> with Capture() as out:
      ...     run('echo foobarbaz', stdout=out)
      ...
      <sarge.Pipeline object at 0x175ed10>
      >>> out.read(3)
      'foo'
      >>> out.read(3)
      'bar'
      >>> out.read(3)
      'baz'
      >>> out.read(3)
      '\n'
      >>> out.read(3)
      ''

      A Capture object can capture the output from multiple commands:

      >>> from sarge import run, Capture
      >>> p = run('echo foo; echo bar; echo baz', stdout=Capture())
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'bar\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      ''

      Delays in commands are honoured in asynchronous calls:

      >>> from sarge import run, Capture
      >>> cmd = 'echo foo & (sleep 2; echo bar) & (sleep 1; echo baz)'
      >>> p = run(cmd, stdout=Capture(), async=True) # returns immediately
      >>> p.close() # wait for completion
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      'bar\n'
      >>>

      Here, the sleep commands ensure that the asynchronous echo calls occur in the order foo (no delay), baz (after a delay of one second) and bar (after a delay of two seconds); the capturing works as expected.

    Sarge hasn’t been released yet, but it’s not far off being ready. It’s meant for Python >= 2.6.5 and is tested on 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux, Mac OS X, Windows XP and Windows 7 (not all versions are tested on all platforms, but the overall test coverage is comfortably over 90%).

    I have released the sarge documentation on Read The Docs; I’m hoping people will read this and give some feedback about the API and feature set being proposed, so that I can fill in any gaps where possible and perhaps make it more useful to other people. Please add your comments here, or via the issue tracker on the BitBucket project for the docs.


  4. Eric Holscher, one of the creators of Read The Docs, recently posted about the importance of a documentation culture in Open Source development, and about things that could be done to encourage this. He makes some good points, and Read The Docs is a very nice looking showcase for documentation. Writing good documentation is difficult enough at the best of times, and one practical problem that I face when working on Sphinx documentation is that I often feel I have to break away from composing it to building it, to see how it looks - because the look of it on the page will determine how I want to refine it.

    What I’ve tended to do is work iteratively by making some changes to the ReST sources, invoking make html and refreshing the browser to show how the built documentation looks. This is OK, but does break the flow more than a little (for me, anyway, but I can’t believe I’m the only one).

    I had the idea that it would be nice to streamline the process somewhat, so that all I would need to do is to save the changed ReST source – the building and browser refresh would be automatically done, and if I had the editor and browser windows open next to each other in tiled fashion, I could achieve a sort of WYSIWYG effect with the changes appearing in the browser a second or two after I saved any changes.

    I decided to experiment with this idea, and needed a browser which I could easily control (to get it to refresh on-demand). I decided to use Roberto Alsina’s 128-line browser, which is based on QtWebKit and PyQt. Roberto posted his browser code almost a year ago, and I knew I’d find a use for it one day :-)

    I also needed to track changes to .rst files in the documentation tree, and since I do a fair amount of my Open Source development on Linux, I decided to use inotify functionality. Although there is a Python binding for this, I decided to use the command-line interface and the subprocess module in the standard library, because I wasn’t very familiar with inotify and the command-line interface is easier to experiment with.

    The basic mechanism of the solution is that the browser watches for changes in source files in the documentation tree, invokes Sphinx to build the documentation, and then refreshes its contents. This is done in a separate thread:

    class Watcher(QtCore.QThread):
        def run(self):
            self._stop = False
            watch_command = 'inotifywait -rq -e close_write --exclude \'"*.html"\' .'.split()
            make_command = 'make html'.split()
            while not self._stop:
                # Perhaps should put notifier access in a mutex - not bothering yet
                self.notifier = subprocess.Popen(watch_command)
                self.notifier.wait()
                if self._stop:
                    break
                subprocess.call(make_command)
                # Refresh the UI ...
                self.parent().changed.emit()

        def stop(self):
            self._stop = True
            # Perhaps should put notifier access in a mutex - not bothering for now
            if self.notifier.poll() is None: # not yet terminated ...
                self.notifier.terminate()

    The thread invokes inotifywait, and waits for it to exit. This happens when a file is written in the documentation tree which has an extension other than .html, and this typically happens when a source file is edited and saved. The inotifywait command is usually available through a Linux package – on Ubuntu, for example, you can install it using sudo apt-get install inotify-tools. In the specific invocation used, the -r flag tells the program to recursively watch a particular directory, -q indicates that output should not be too verbose (it’s used for trouble-shooting only), -e close_write indicates that we’re only interested in files being closed after being opened for writing, and --exclude '"*.html"' indicates that we don’t care about writes to .html files.

    The Watcher instance’s parent is the main (browser) window. In it, we declare a custom signal, changed, to be emitted when we want the window to know that the HTML has changed. This is done through the following snippets of code:

    changed = QtCore.pyqtSignal()

    which is declared in the main window class, and

    self.watcher = Watcher(self)
    self.changed.connect(self.wb.reload)
    self.watcher.start()

    which are added to the window’s constructor. Here, self.wb is the QWebView component of the browser which actually holds the browser content.

    One last refinement is to save the browser window coordinates on exit and restore them on starting, so that if you have moved the window to a particular location, it will reappear there every time until you move it. First, we create a module-level QSettings instance:

    settings = QtCore.QSettings("Vinay Sajip", "DocWatch")

    and provide a couple of main window methods to load and save the settings:

    def load_settings(self):
        settings.beginGroup('mainwindow')
        pos = settings.value('pos')
        size = settings.value('size')
        if isinstance(pos, QtCore.QPoint):
            self.move(pos)
        if isinstance(size, QtCore.QSize):
            self.resize(size)
        settings.endGroup()

    def save_settings(self):
        settings.beginGroup('mainwindow')
        settings.setValue('pos', self.pos())
        settings.setValue('size', self.size())
        settings.endGroup()

    When the main window is closed, we need to stop the watcher and save the settings. (We also need to call load_settings in the main window constructor.)

    def closeEvent(self, event):
        self.save_settings()
        self.watcher.stop()

    The last thing is the code which runs when the module is invoked as a script. Note that this very simplistic use is consistent with Sphinx’s quick-start script defaults.

    if __name__ == "__main__":
        if not os.path.isdir('_build'):
            # very simplistic sanity check. Works for me, as I generally use
            # sphinx-quickstart defaults
            print('You must run this application from a Sphinx directory containing _build')
            rc = 1
        else:
            app = QtGui.QApplication(sys.argv)
            path = os.path.join('_build', 'html', 'index.html')
            url = 'file:///' + pathname2url(os.path.abspath(path))
            url = QtCore.QUrl(url)
            wb = MainWindow(url)
            wb.show()
            rc = app.exec_()
        sys.exit(rc)

    The code (MIT licensed) is available from here. As it’s a single file standalone script, I haven’t considered putting it on PyPI – it’s probably easier to download it to a $HOME/bin or similar location, then you can invoke it in the docs directory of your project, run your editor, position the browser and editor windows suitably, and you’re ready to go! Here’s a screen-shot using doc-watch and gedit:

    [Screenshot: doc-watch showing the built documentation alongside gedit]

    Please feel free to try it. Comments and suggestions are welcome.

    Update: Another advantage of using the subprocess / command line approach to notification is that it’s easy to slot in a solution for a platform which doesn’t support inotify. Alternatives are available for both Windows and Mac OS X. For example, on Windows, if you have IronPython installed, the following script could be used to provide the equivalent functionality to inotifywait (for this specific application):

    import clr
    import os

    from System.IO import FileSystemWatcher, NotifyFilters

    stop = False

    def on_change(source, e):
        global stop
        if not e.Name.endswith('.html'):
            stop = True
        print('%s: %s, stop = %s' % (e.FullPath, e.ChangeType, stop))

    watcher = FileSystemWatcher(os.getcwd())
    watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
    watcher.EnableRaisingEvents = True
    watcher.IncludeSubdirectories = True
    watcher.Changed += on_change
    watcher.Created += on_change

    while not stop:
        pass

    Whereas for Mac OS X, if you install the MacFSEvents package, the following script could be used to provide the equivalent functionality to inotifywait (again, for this specific application):

    #!/usr/bin/env python

    import os

    from fsevents import Observer, Stream

    stop = False

    def on_change(e):
        global stop
        path = e.name
        if os.path.isfile(path):
            if not path.endswith('.html'):
                stop = True
        print('%s: %s, stop = %s' % (e.name, e.mask, stop))

    observer = Observer()
    observer.start()
    stream = Stream(on_change, os.getcwd(), file_events=True)
    observer.schedule(stream)
    try:
        while not stop:
            pass
    finally:
        observer.unschedule(stream)
        observer.stop()
        observer.join()

