With the acceptance of PEP 414 (Explicit Unicode Literal for Python 3.3), string literals with u prefixes will be permitted syntax in Python 3.3, though they cause a SyntaxError in Python 3.2 (and earlier 3.x versions). The motivation behind the PEP is to make it easier to port any 2.x project which has a lot of Unicode literals to 3.3 using a single codebase strategy, that is, one which avoids the need to repeatedly run 2to3 on the code, either during development or at installation time. The single codebase strategy has gained currency because repeatedly running 2to3 over parts of the codebase adds friction to the development workflow. From a porting perspective, the main impact of the PEP is that the diffs between ported and unported code will not contain the noise created by removing the u prefixes from all the string literals, which should make it easier for project owners to review changes. Of course, it also saves the work of actually removing the prefixes, but that would be a one-time operation automated by 2to3, and so not really a significant part of a porting effort.

While this PEP is fine for people who want to port their 2.x project straight to 3.3, it leaves Python 3.2 users a little bit out in the cold. The PEP does consider the question of 3.2 support, but treats 3.2 as something of a second-class citizen. And for those proposing that people just move to 3.3 as the latest and greatest Python, remember that some people will be constrained to use 3.2 because of project dependencies, whether technical or organisational in nature. For example, Ubuntu 12.04 LTS (Long Term Support), which ships with Python 3.2, will receive 5 years of support; there are already people who have invested time and effort in projects with 3.2 as a dependency, and it may not be possible to migrate those projects to 3.3 (which, let’s remember, won’t be released for a while: the scheduled release date is 18 August 2012).

The PEP offers to support 3.2 users by means of an installation hook which works similarly to 2to3 used at installation time. However, an installation-time hook does not provide the main benefit of a single codebase: a streamlined iterative workflow in which code changes are interspersed with testing under multiple Python versions.

An import hook (which was suggested during the PEP 414 discussions on the python-dev mailing list) is a much more attractive proposition. The benefit is that you can have code containing u'xxx' literals, which 3.3 will allow by virtue of PEP 414’s proposed change to Python, and also work with that code transparently in Python 3.2. It would work as follows:

  • An import hook is installed.
  • When importing a module, if the compiled .pyc file exists and is up to date, it will be used. The hook will not do anything in this case.
  • If, when importing, the .py file is newer than the .pyc file, the hook will load the source code, convert all string literals with u prefixes to unadorned string literals (as expected by 3.1/3.2), and then compile the converted source. The compiled code will be stored in the .pyc file, so conversion will not be performed again until the .py file’s timestamp is more recent than that of its .pyc file.
  • There is no need to integrate with editing environments – any updated source files that are imported will automatically be converted lazily, as needed.
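The conversion step in the list above can be illustrated with a short sketch. It is not the code used by the package described in this post: it assumes a Python version whose standard tokenize module already recognises u-prefixed strings as single tokens (3.3 or later), and the function name strip_u_prefixes is made up for the example.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return `source` with u/U prefixes removed from string literals.

    Hypothetical helper for illustration only. It relies on the stdlib
    tokenizer treating u'...' as one STRING token, which holds on
    Python 3.3 and later.
    """
    converted = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix character; the token keeps its original
            # start/end positions, so untokenize preserves the spacing
            # around it.
            tok = tok._replace(string=tok.string[1:])
        converted.append(tok)
    return tokenize.untokenize(converted)


if __name__ == "__main__":
    print(strip_u_prefixes("greeting = u'hello' + U\"!\"\n"), end="")
```

Working at the token level means that names such as u and string contents such as 'u' are left alone; only a prefix attached to a string literal is removed.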

I set out to try to implement an import hook to do the prefix removal. The initial result is uprefix, a package containing the hook and functions to register and unregister it. It’s available on PyPI, so you can try it out in a virtualenv using pip install uprefix. Alternatively, you can download a source tarball and install it (e.g. into a virtual environment) using python setup.py install, or run the tests first using python setup.py test.

Usage is easy. Once you have it on your path, on Python 3.2 you can do:

>>> import uprefix; uprefix.register_hook()
>>>

That’s it. You should now be able to import any module containing string literals using u prefixes, as if they weren’t there.

You can call uprefix.unregister_hook() to remove the hook from the import pipeline.
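For readers curious about what registering and unregistering such a hook involves, here is a minimal sketch using the importlib machinery found in later Python versions (3.4 and up). It is not uprefix's implementation, which targets 3.2. The names ConvertingLoader and convert_source are hypothetical, and convert_source is a placeholder that returns the source unchanged where a real hook would strip the prefixes.

```python
import importlib.machinery
import importlib.util
import sys


def convert_source(source):
    """Placeholder (hypothetical) for the prefix-stripping step."""
    return source


class ConvertingLoader(importlib.machinery.SourceFileLoader):
    """Source loader that converts the source text before compiling it."""

    def source_to_code(self, data, path, *, _optimize=-1):
        if isinstance(data, bytes):
            # Honour the file's encoding declaration, as the default loader does.
            data = importlib.util.decode_source(data)
        return super().source_to_code(
            convert_source(data), path, _optimize=_optimize
        )


_hook = None


def register_hook():
    """Put a path hook that uses ConvertingLoader at the front of the queue."""
    global _hook
    if _hook is None:
        # Extension and bytecode-only modules keep their usual loaders, so
        # directories handled by this hook can still serve them.
        details = [
            (importlib.machinery.ExtensionFileLoader,
             importlib.machinery.EXTENSION_SUFFIXES),
            (ConvertingLoader, importlib.machinery.SOURCE_SUFFIXES),
            (importlib.machinery.SourcelessFileLoader,
             importlib.machinery.BYTECODE_SUFFIXES),
        ]
        _hook = importlib.machinery.FileFinder.path_hook(*details)
        sys.path_hooks.insert(0, _hook)
        # Discard cached finders so the new hook is consulted.
        sys.path_importer_cache.clear()


def unregister_hook():
    """Remove the hook installed by register_hook(), if any."""
    global _hook
    if _hook is not None:
        sys.path_hooks.remove(_hook)
        sys.path_importer_cache.clear()
        _hook = None
```

Because the loader subclasses SourceFileLoader, it inherits the usual bytecode caching, so the conversion only runs when the source file is newer than its cached bytecode.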

This is a proof of concept, and uses lib2to3 to strip the prefixes. It should allow you to import 2.x code into Python 3.x without worrying about literal syntax (though other gotchas such as relative imports, exception syntax etc. may prevent a successful import).

The performance seems to be good enough. I couldn’t use the modified tokenize.py which is used by the PEP 414 installation hook, even though it would be faster, because it has a couple of bugs which cause it to break on real-world codebases. (These bugs were reported a month ago, but so far don’t appear to have received any love.)

Your feedback is welcome. I’m just dipping my toes in the Python import machinery, so I might well have missed some things.

All content on this blog is Copyright © 2012-2016 Vinay Sajip.