Namespaces

2 December 2009
00:12

Most programming languages get namespaces (or importing modules or files) wrong.

using System;
using System.IO;

public class HelloWorld {
   public static void Main(string[] args) {
      Console.WriteLine("Hello, World!");
   }
}

Where did that Console class come from? System? System.IO? Is it somehow implicitly available? Just by looking at the code, without knowing the .NET framework, you can't tell. (Nor can you tell that the System.IO namespace is not used anywhere, so that the corresponding using statement can be safely removed. I think.)

You shouldn't have to ask this question, and you shouldn't have to rely on an IDE or documentation to provide the answer. It should be obvious from just reading or grepping the code. Languages that violate this simple rule include Java, C#, Ruby, and C/C++. I'm sure there are lots more.

UPDATE: Wow, it's been so long since I last hacked on Java code that I forgot how its imports work. Actually, Java imports only violate the rule if you use star imports, which is the same as in Python.

Turns out the languages I'm currently most interested in get this (mostly) right: Python, Erlang, Go, and (although not a language as such) node.js with its CommonJS-based module system:

var puts = require("sys").puts;
puts("Hello, World!");

Sure, you can put Python into dirty namespace mode by using “star imports”, and you can intentionally mess up the explicit namespacing in node.js by applying the process.mixin() method to pull exports into the global namespace. So please try not to do that.
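To make the point concrete in Python, here is a minimal sketch contrasting a qualified import with a star import. It relies on the fact that `from os import *` exports `os.open`, which quietly shadows the builtin `open()`, and nothing at the call site reveals that.

```python
import builtins
import os

# Qualified access: the `os.` prefix at the call site says exactly where
# getcwd comes from, and grepping for "os.getcwd" finds every use.
cwd = os.getcwd()

# Star import ("dirty namespace mode"): every public name in os is dumped
# into this module, including os.open, which replaces the builtin open().
from os import *  # noqa: E402,F403 -- deliberately bad style

# Nothing on the next lines hints that `open` is no longer the builtin.
print(open is builtins.open)  # False
print(open is os.open)        # True
```

With the star import, a reader who sees `open(...)` further down has to know the contents of `os` to realize it is not the builtin. With the qualified import, the code itself answers the question.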

Py3k

5 December 2008
18:35

Python 3.0 has been released with some fanfare. Here's my take, as someone who writes a lot of Python code but who has admittedly never actively participated in the evolution of the language or the standard library. For some background, I was one of the early developers of Trac, and I've also written and continue to maintain a couple of open-source Python libraries such as Genshi, Babel, and CouchDB-Python.

So, Python 3.0. Technically, it's a great thing. It does away with old warts in the language design and improves consistency and simplicity. The standard library with all of its inconsistently (and sometimes awkwardly) named modules has been reorganized. Strings are unicode by default. Integers have unlimited precision. And quite a bit more.
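A few of those changes, shown in a short Python 3 sketch. The module renames are two examples out of many, and the sample values are arbitrary:

```python
# Reorganized standard library: Python 2's ConfigParser module is now
# configparser, and urllib2.urlopen moved to urllib.request.urlopen.
import configparser
from urllib.request import urlopen

# Strings are unicode text by default; encoded bytes are a separate type.
s = "na\u00efve"            # "naïve", five characters
b = s.encode("utf-8")      # "ï" takes two bytes in UTF-8
print(type(s).__name__, len(s))  # str 5
print(type(b).__name__, len(b))  # bytes 6

# int has unlimited precision; Python 2's separate long type is gone.
n = 2 ** 100
print(type(n).__name__, n)  # int 1267650600228229401496703205376
```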

But…

read on …

View Source

28 August 2008
17:27

I have written before about how I'm running my website on custom-developed Python code. Back then I said:

The code isn't publicly available at this point, although I do intend to release it when I feel it's ready.

Well, stuff like this never is truly ready, but I'm putting it out there anyway: the basis is a custom little web framework called “Diva”, the project site is here, Subversion repository here. The code for both my blog and my scratchpad site is included as examples.

read on …

The Truth About Unicode In Python

3 July 2008
10:13

The unicode support in Python is generally considered to be pretty good. And in comparison to many other languages, it's good indeed.

But compared to what is provided by the International Components for Unicode (ICU) project, there's also a lot missing, including collation, special case conversions, regular expressions, text segmentation, and bidirectional text handling. Not to mention extensive support for locale-specific formatting of dates and numbers and time calculations with different calendars.

Basically what Python does provide out of the box is “only” encoding/decoding, normalization, and some other bits such as simple case conversion and splitting on whitespace. It's the absolute minimum you need to do anything useful with unicode, but often not enough to build truly internationalized applications. (Fortunately, most applications get away without true internationalization.)
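Here is a small illustration of that baseline, written for today's Python 3, where `str` is the unicode type. It shows normalization, encoding and decoding, and whitespace splitting working as expected. It also shows what "no collation" means in practice, because `sorted()` compares raw code points rather than following any language's alphabetical order. The word list is arbitrary.

```python
import unicodedata

# Normalization: U+00E9 (precomposed "é") vs "e" + U+0301 (combining acute).
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Encoding and decoding between text and bytes.
data = composed.encode("utf-8")
print(data)                              # b'\xc3\xa9'
print(data.decode("utf-8") == composed)  # True

# Splitting on whitespace understands non-ASCII spaces such as U+3000.
print("one\u3000two".split())  # ['one', 'two']

# No locale-aware collation: "Z" (U+005A) sorts before "a" (U+0061),
# and "é" (U+00E9) sorts after "z" (U+007A).
print(sorted(["zebra", "\u00e9clair", "apple", "Zulu"]))
# ['Zulu', 'apple', 'zebra', 'éclair']
```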

In this post, I'm going to talk about a couple of the problems with unicode in Python. Please note that this is not intended as a criticism of Python's unicode support or the people who designed and implemented it. Most of those people probably know a whole lot more about unicode than I do, and the limitations discussed here are the result of a pragmatic approach to implementing unicode support, rather than due to a lack of knowledge.

read on …

Offline

15 April 2008
17:52

I'm going on vacation for three weeks starting in a couple of hours, and I'll have little to no access to the net. So I'll be even less responsive to email and all that than I am anyway.

I would've really liked to make a Genshi 0.5 release before leaving, but unfortunately that didn't work out. Just as we were closing in on the last couple of tickets, Google came out with App Engine, which Genshi currently does not work with due to various restrictions in the hosting environment. And I'd really like the 0.5 release to be usable with App Engine (some progress has been made on a branch), so the release will have to wait until I'm back.

Incubator4j

28 March 2008
08:38

It's old news by now that CouchDB is moving to the Apache Software Foundation. So what's the family here in the Incubator like, in terms of the other incubating projects?

Chart: currently incubating projects by primary language. 21 Java projects, and one project each in C, C#, Erlang, Javascript, PHP, and Ruby.

That sure is a lot of Java!

A couple of the non-Java projects look like they haven't been all that active lately (TripleSoup and XAP). Two others are just ports of existing Apache Java projects to other languages (Lucene.NET and Log4php, the first of which is apparently orphaned.) Which leaves just Buildr and CouchDB. And while Buildr is written in Ruby, it is a build system for Java applications.

Just an observation.

Python httplib Performance Problems

25 March 2008
13:50

I've run into a tricky performance problem with my CouchDB client for Python.

I recently started getting some reports about really bad performance of the client on Linux, where you apparently couldn't get more than about 25 requests per second in throughput (that number may sound familiar.) I develop on Mac OS X, and could not reproduce the problem. In fact, on OS X couchdb-python was often faster than the other methods. But when I went on to run the performance tests on this server (a Debian 4 install with Python 2.4.4, running on a Xen-powered Linode), I was seeing the very same problem.

So I've been trying to figure out where the code is going wrong. And right now I'm just puzzled. I'd be grateful if anyone could point me in the right direction.
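For anyone who wants to reproduce this kind of measurement, here is a minimal, hypothetical harness in Python 3, where `httplib` is now called `http.client`. It issues sequential GET requests over one persistent connection and reports requests per second. The local `PingHandler` server only stands in for CouchDB, so you would point `requests_per_second()` at a real server to compare platforms. For scale, 25 requests per second works out to 1 / 25 = 40 ms per request.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    # Hypothetical stand-in server. HTTP/1.1 lets the client reuse
    # one persistent connection for all requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the benchmark output quiet


def requests_per_second(host, port, path="/", count=100):
    """Issue `count` sequential GETs over one connection; return throughput."""
    conn = http.client.HTTPConnection(host, port)
    start = time.perf_counter()
    for _ in range(count):
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body before the next request
    elapsed = time.perf_counter() - start
    conn.close()
    return count / elapsed


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    rps = requests_per_second("127.0.0.1", server.server_address[1])
    print(f"{rps:.0f} requests/second")
    server.shutdown()
    server.server_close()
```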

read on …

Genshi Slot @ GSoC 2008

20 March 2008
12:07

The TurboGears project has been accepted as a mentoring organization for the Google Summer of Code program this year. The project ideas include two around Genshi: generally improved performance, and compatibility with Jython. Both tricky. I volunteered to act as mentor for the performance improvements project.

If you're a student with solid knowledge of both Python and XML/HTML, and you're looking for a GSoC project that's both interesting and challenging, start by reading my own ideas for performance work. The Genshi code makes intensive use of Python language features such as closures and generators, and then there's also AST transformation and bytecode generation going on. The thing is that much of the low-hanging fruit for optimization has already been picked. In trunk we even have a _speedups extension written in C. So making major advancements in performance will require some thinking outside the box.
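If you haven't worked with AST transformation and bytecode generation before, a toy example may help. The sketch below is hypothetical and not taken from Genshi. It parses a template-style expression and uses an `ast.NodeTransformer` to rewrite every bare name into a lookup in a `__data__` dictionary. It then compiles the transformed tree into a code object. It needs Python 3.9+, where `ast.Subscript` accepts a plain expression as its `slice`.

```python
import ast


class LookupInContext(ast.NodeTransformer):
    """Rewrite bare names `x` into `__data__["x"]` (hypothetical scheme).

    Toy only: it also rewrites local names such as comprehension variables.
    """

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):
            lookup = ast.Subscript(
                value=ast.Name(id="__data__", ctx=ast.Load()),
                slice=ast.Constant(value=node.id),
                ctx=ast.Load(),
            )
            return ast.copy_location(lookup, node)
        return node


def compile_expr(source):
    tree = ast.parse(source, mode="eval")              # source -> AST
    tree = LookupInContext().visit(tree)               # AST -> rewritten AST
    tree = ast.fix_missing_locations(tree)             # fill in line numbers
    return compile(tree, "<template>", "eval")         # AST -> bytecode


code = compile_expr("title.upper() + suffix")
print(eval(code, {}, {"__data__": {"title": "genshi", "suffix": "!"}}))  # GENSHI!
```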

If you have any questions, ask on the Genshi mailing list, and/or come visit us on our IRC channel. See also the announcement on the TurboGears mailing list and the GSoC FAQs for more information.

CouchDB, XML, and E4X

4 March 2008
18:09

Not that long ago, CouchDB moved from XML document representations and a custom query language (dubbed “Fabric”) to JSON for documents and Javascript for views. Apparently, that move attracted a lot of new people to the project, myself included.

Not long after the switch, some people started thinking about defining JSON encodings of common XML formats. Others asked about using XML in CouchDB. Simply add back the XML backend and let people choose what they prefer? Hell, no.

Turns out there’s a much better way to support XML data in CouchDB: ECMA-357, also known as “ECMAScript for XML”, also known as E4X. And Mozilla’s SpiderMonkey Javascript engine, which CouchDB uses as the default view server, conveniently implements E4X. So it’s just a matter of enabling that support. Which means that, all of a sudden, and without any changes to the core, CouchDB is pretty well positioned for storing and querying XML data in addition to JSON.

For example:

by_lang: function(doc) {
  var html = new XML(doc.content);
  map(html.@lang, {title: html.head.title.text(), …});
}

To be fair, this is already possible if you use other view servers (such as the Ruby or Python ones), where you have access to the XML support provided by the respective standard libraries. Given CouchDB’s incremental view update model, you usually don’t care so much about the performance of view functions as you care about the data they produce. So if your view function can somehow parse the XML and put some data into the view index, that's usually all you need. Actually querying the view is going to be really fast.
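For comparison, here is a rough sketch of the same view written as a Python map function using `xml.etree.ElementTree`. It rests on three assumptions made for illustration. The view server treats map functions as generators that yield key/value pairs, which is the style couchdb-python's view server uses. The markup lives in a document field named `content`. And that markup has no XML default namespace.

```python
import xml.etree.ElementTree as ET


def by_lang(doc):
    # Parse the stored markup (assumed to live in doc["content"]).
    html = ET.fromstring(doc["content"])
    # Equivalent of E4X's html.head.title.text(); None if there is no title.
    title = html.findtext("head/title")
    # Equivalent of E4X's html.@lang; the view key is the language code.
    yield html.get("lang"), {"title": title}
```

A usage example with a made-up document: `list(by_lang({"content": '<html lang="en"><head><title>Hello</title></head><body/></html>'}))` yields `[("en", {"title": "Hello"})]`.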

But E4X is an exceptionally convenient API for XML. I think using E4X is going to be a pretty good approach for those who want to use CouchDB to store and query XML content.

Notes on Writing a Blog Tool from Scratch

31 January 2008
12:24

  • At least on the Mac, there don't seem to be any good Atom-enabled publishing clients at this point. On Windows you have Microsoft's Windows Live Writer, which comes with good AtomPub support, and actually seems to be pretty nice.

    [Image: Wish I'd be using MarsEdit]

    I really hope MarsEdit will catch up in this space soon. The 2.1 version released today doesn't include generic AtomPub support, but hopefully an upcoming version will. In the meantime, I'm using Joe Gregorio's apexer command-line tool to post here, as I don't feel like adding either an HTML interface or an implementation of the MetaWeblog API.

  • Is XML-RPC really the most solid option to support some kind of linkback these days? Hey, 1999 called and wants its technology back. Can't we just use a plain HTTP POST, something like:

    POST /ping/ HTTP/1.1
    Host: example.com
    Referer: http://mysite.com/source-uri/
    Content-Type: text/plain;charset=utf-8
    Content-Length: 30
    
    http://example.com/target-uri/

    That is, the body of the request contains the URI of the resource that's being pinged, and the Referer header contains the URI of the resource that's, well, referring to it. Respond with 202 Accepted if the target URI exists and allows pingbacks; otherwise respond with 404 Not Found or 403 Forbidden, respectively. Or 409 Conflict if that referrer has already been registered. Or something along those lines.

  • Being tracked by a couple of planets really makes you watch your steps when changing things around.
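The plain-HTTP ping proposed in the notes above could be handled on the receiving end roughly as sketched below. The `PingReceiver` class, its in-memory storage, and the 400 Bad Request response for a missing Referer header are all hypothetical choices for illustration. The other status codes follow the proposal: 202, 404, 403, and 409.

```python
from http import HTTPStatus


class PingReceiver:
    """Hypothetical in-memory receiver for the plain-POST ping idea."""

    def __init__(self, targets):
        # targets: dict mapping target URI -> whether it accepts pings
        self.targets = targets
        self.pings = set()  # (source, target) pairs already registered

    def handle(self, body, referer):
        # Request body: the pinged (target) URI, as text/plain UTF-8.
        target = body.decode("utf-8").strip()
        if not referer:
            return HTTPStatus.BAD_REQUEST  # assumption: Referer is required
        if target not in self.targets:
            return HTTPStatus.NOT_FOUND
        if not self.targets[target]:
            return HTTPStatus.FORBIDDEN
        if (referer, target) in self.pings:
            return HTTPStatus.CONFLICT
        self.pings.add((referer, target))
        return HTTPStatus.ACCEPTED
```

Returning `202 Accepted` rather than `200 OK` leaves room to verify the referring page asynchronously, for example by fetching it later to confirm that it really links to the target.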