Python httplib Performance Problems

25 March 2008
13:50

I've run into a tricky performance problem with my CouchDB client for Python.

I recently started getting some reports about really bad performance of the client on Linux, where you apparently couldn't get more than about 25 requests per second in throughput (that number may sound familiar.) I develop on Mac OS X, and could not reproduce the problem. In fact, on OS X couchdb-python was often faster than the other methods. But when I went on to run the performance tests on this server (a Debian 4 install with Python 2.4.4, running on a Xen-powered Linode), I was seeing the very same problem.

So I've been trying to figure out where the code is going wrong. And right now I'm just puzzled. I'd be grateful if anyone could point me in the right direction.

The Basics

For the HTTP communication with the CouchDB server, couchdb-python uses httplib2, which in turn uses the httplib module in the Python standard library. After being able to reproduce the problem, I stripped down the example code to use just httplib2, and then httplib directly.

The basic pattern with httplib is the following:


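import httplib

# One connection object, reused for both requests
conn = httplib.HTTPConnection("127.0.0.1:5984")

conn.request('GET', '/')
resp = conn.getresponse()
resp.read()  # consume the body before sending the next request

conn.request('GET', '/')
resp = conn.getresponse()
resp.read()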

The HTTPConnection object is reused between requests. The first request goes through as expected. And then you would expect the second request to run a bit faster, right? Because CouchDB doesn't close the connection, and uses chunked encoding so we know when the entity body has been completely consumed, we should be able to just send the next request over the same socket, enabling a faster response. Right?

The Numbers

Unfortunately though, the numbers on Linux don't exactly meet the expectation here. The following table compares the numbers between creating a new HTTPConnection object for every request, and reusing the same HTTPConnection object for all the requests:

Separate connections    Persistent connection
12.09ms                 12.84ms
1.94ms                  40.46ms
1.83ms                  40.70ms
1.99ms                  40.88ms
2.26ms                  40.89ms
2.06ms                  40.98ms

No, I did not mix up the columns here: for persistent connections, subsequent requests take more than three times as long as the initial request. But when you recreate the HTTPConnection for every request, subsequent requests are actually significantly faster, for some reason I do not comprehend.

You can fetch the complete test code over on my scratchpad site.
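For readers without access to that test code, a benchmark along these lines might look like the following. This is only a sketch and not the original test script: it assumes Python 3, where httplib is called http.client, and it uses a throwaway http.server instance on a random local port in place of CouchDB. The names Hello, timed_get and bench are made up for illustration.

```python
import http.client  # Python 3 name for the httplib module discussed here
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Stand-in server (an assumption; the post tested CouchDB and CherryPy)."""
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = b"Hello World!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def timed_get(conn):
    """Return the time in milliseconds for one GET on the given connection."""
    start = time.perf_counter()
    conn.request("GET", "/")
    conn.getresponse().read()  # drain the body so the socket can be reused
    return (time.perf_counter() - start) * 1000.0


def bench(port, n=6):
    # Column 1: a fresh HTTPConnection for every request
    separate = []
    for _ in range(n):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        separate.append(timed_get(conn))
        conn.close()
    # Column 2: one HTTPConnection reused for all requests
    conn = http.client.HTTPConnection("127.0.0.1", port)
    persistent = [timed_get(conn) for _ in range(n)]
    conn.close()
    return separate, persistent


server = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: let the OS pick one
threading.Thread(target=server.serve_forever, daemon=True).start()
separate, persistent = bench(server.server_port)
server.shutdown()
server.server_close()

for s, p in zip(separate, persistent):
    print("%8.2fms %8.2fms" % (s, p))
```

The absolute numbers will depend on the operating system, the Python version and the server, so no expected output is shown.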

Profiling the example reveals that all the additional time is spent in the socket.readline() function.
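A profiling run of this kind can be sketched with the standard cProfile and pstats modules. The snippet below is an assumption about how such a measurement could be set up, not the profiling session described above: it targets Python 3 (http.client instead of httplib) and a throwaway local http.server, and the names Hello and fetch_many are illustrative. On Python 3 the blocking time will most likely show up under a differently named socket read method than socket.readline().

```python
import cProfile
import http.client  # Python 3 name for httplib
import io
import pstats
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Stand-in server (an assumption, not the CouchDB setup from the post)."""
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"Hello World!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_many(port, n=5):
    """Issue n GET requests over a single persistent connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(n):
        conn.request("GET", "/")
        conn.getresponse().read()
    conn.close()


server = HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

# cProfile only records the calling thread, i.e. the client side
profiler = cProfile.Profile()
profiler.enable()
fetch_many(server.server_port)
profiler.disable()

server.shutdown()
server.server_close()

# Print the five entries with the most time spent inside the function itself
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(5)
report = out.getvalue()
print(report)
```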

The Server

So next I suspected CouchDB (or the underlying inets code) may be implementing/using chunked encoding incorrectly. Inspection of the code and the bytes on the ether didn't reveal any problems, but to completely rule out the server, I used the CherryPy “HelloWorld” application to test against. Here's the code:


import cherrypy

class HelloWorld(object):
    def index(self):
        return "Hello World!"
    index.exposed = True

cherrypy.quickstart(HelloWorld())

Running the test code against the CherryPy server produces the exact same slowdown after the first request as before. I first tried with chunked encoding, but that doesn't actually seem to make a difference.

I've so far been unable to reproduce the performance problem with persistent connections when testing against static files served by the Apache HTTPD 2.2 server. With CouchDB and CherryPy I am seeing the slowdown, but not with Apache. All three provide keepalive connections by default, and chunked encoding doesn't seem to make a difference.

Lost

I'm currently completely lost as to what the cause of the problem may be. Note that the tests perform pretty much as expected on OS X: using persistent connections speeds things up a bit (though not as much as I'd have expected.) Google searches didn't really turn up anything related; I did find an old Python bug report about httplib using unbuffered socket I/O. But monkey patching some buffering into httplib did not change the results I was seeing in any way.

So, does anyone out there have an idea what the problem may be, or the skills needed to figure this out? Oh, and I've recently added bare bones comment functionality to my blog, so if you have any ideas, just post a comment here. Thanks!

Reactions

  1. Dave says:

    25 March 2008
    22:20

    This is weird, I set up my own much simpler benchmark and it behaved as expected (i.e., faster for persistent connection). However, when I run your benchmark, I get the results you are seeing. I'm not sure how that's possible since mine was doing pretty much the exact same thing.

  2. Colin Percival says:

    25 March 2008
    22:27

    I'd guess that you're running into the interaction of nagling and delayed ACKs. Run a tcpdump and look at the timestamps to see where things are getting held up.

  3. Dave says:

    25 March 2008
    22:47

    And I'm a doof. The reason my original benchmark was so much faster is that I was only doing one request per command. Whatever the delay is, it occurs between requests whether or not you're making another immediate request. So every individual call to the database over the persistent connection was fast, but once they're stuck in a loop, I see the same delay as you.

  4. Sam says:

    26 March 2008
    05:09

    Note that the Linux delayed ack timer is set to 40ms by default, and this also happens to be the time your persistent requests are taking.

    I suggest you follow Colin's advice and use tcpdump (or Wireshark) to take a look at the packets and see which one is being delayed. Then read up on the Nagle algorithm if you don't already know about it, and see if a combination of that and delayed acks is at fault, and if so, which end of the connection is creating the problem.

    Or just don't use persistent connections.

  5. Dave says:

    26 March 2008
    05:56

    I decided to whip up a benchmark without httplib, and unfortunately got the exact same results. This means httplib isn't the culprit after all. I also tried disabling Nagle and delayed ack, but there was no significant difference.

    My quickly and poorly coded benchmark can be found here.

  6. Dave says:

    26 March 2008
    14:24

    Found a fix on the couchdb end. Not sure if this is the proper way to do it, but it is the delayed ack problem. In the couchinets directory of the couchdb source, edit httptransport.erl. On lines 124 and 128, add {nodelay, true} to the line of options so that they read:

    sock_opt(ip_comm, Addr, [{backlog, 128},
                                          {reuseaddr,true}, {fd,Fd}, {nodelay, true}])};
    

    and

    sock_opt(ip_comm, Addr,
                          [{backlog, 128}, {reuseaddr, true}, {nodelay, true}])}
    

    Running the benchmarks on this version produces pretty much identical results in both shared and reused connections on my local machine, but only opens one socket instead of 1 for each request.

  7. Evan Jones says:

    26 March 2008
    14:28

    This issue is on the server side. The problem is the Nagle algorithm, which delays sending partial packets in order to wait for more data. This is efficient if the application generates lots of small writes, as it will combine them into a single TCP packet. This is bad if the application (in this case, CherryPy or CouchDB) does one small write: Linux waits up to 40 ms before sending it out.

    The solution: Enable TCP_NODELAY on the server. Adding this hack to the top of your CherryPy "hello world" app fixes the times:

    import socket

    # Wrap socket.socket so every new socket has TCP_NODELAY enabled
    realsocket = socket.socket
    def socketwrap(family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0):
        sockobj = realsocket(family, type, proto)
        sockobj.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return sockobj
    socket.socket = socketwrap
    

    The real fix is that CherryPy, CouchDB, and your client should probably all enable this option. This is what Apache does, and why you don't see a problem with it. See the "Nagle Interaction" section in the following page for more information:

    http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html

    This document explains in more detail:

    http://www.isi.edu/lsam/publications/phttptcpinteractions/node2.html#SECTION00023000000000000000

    I suggest you pass this reference and this bug report on to the CouchDB developers, in order to get the problem resolved.

    Evan Jones

  8. Christopher Lenz says:

    26 March 2008
    15:35

    Dave, Evan, thanks a lot for your comments! We'll see what we can do to fix this on the CouchDB side.
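
The interaction described in the comments above can be explored without any HTTP library at all. The following sketch assumes a server that answers each tiny request with two small writes (an assumption about how the servers in this post behave, not something the post confirms), and runs it once with and once without TCP_NODELAY on the server socket. It is written for Python 3; the function names serve and run and the toy request format are invented for illustration.

```python
import socket
import threading
import time


def serve(listener, nodelay, rounds):
    """Accept one client and answer each request with two small writes."""
    conn, _ = listener.accept()
    if nodelay:
        # The option suggested in the comments: disable the Nagle algorithm
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    with conn:
        for _ in range(rounds):
            conn.recv(1024)           # wait for the tiny "request"
            conn.sendall(b"header;")  # first small write
            conn.sendall(b"body\n")   # second small write; Nagle may hold it back


def run(nodelay, rounds=5):
    """Return per-request round-trip times in milliseconds."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=serve, args=(listener, nodelay, rounds))
    worker.start()

    timings = []
    with socket.create_connection(listener.getsockname()) as client:
        reader = client.makefile("rb")
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(b"GET\n")
            reader.readline()  # returns once the second write has arrived
            timings.append((time.perf_counter() - start) * 1000.0)
        reader.close()

    worker.join()
    listener.close()
    return timings


with_nagle = run(nodelay=False)
without_nagle = run(nodelay=True)
for a, b in zip(with_nagle, without_nagle):
    print("Nagle on: %8.2fms   TCP_NODELAY: %8.2fms" % (a, b))
```

Whether the first column shows a delay, and how large it is, depends on the operating system and its delayed-ACK settings, so the output is deliberately not shown here. Capturing the loopback traffic with tcpdump while this runs, as suggested in the comments, is one way to see which packet is being held back.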