Berkeley DB XML Indices
24 August 2005
16:29
As the number of builds of the Bitten project continued to grow, I started to notice a rather dramatic slow-down of the build results view. I suspected that I somehow had gotten the indices on the BDB XML database wrong: while most of the information about builds is stored in the SQLite database, the generated reports (test results, code coverage, and so on) are stored in a BDB XML database. The build results page queries the BDB XML database to find out which reports were generated for a given build step.
So I starting experimenting with the indices a bit to see how much of an effect the make on query execution time. It quickly turned out that for some reason the database on the Bitten site wasn't using indices at all. Which would explain the bad performance.
Anyway… during experimentation I wrote a quick Python script that would test the performance difference of indices for a set of queries. Here's the output for a database lacking any special indices:
Indices:
Executing query: /report[dbxml:metadata("build")=1 and \
dbxml:metadata("step")="test"]
2 results, took 0:00:03.023641
Executing query: /report[dbxml:metadata("build")=1 and \
dbxml:metadata("step")="test" and \
@type="unittest"]
1 results, took 0:00:03.259448
So that's over 3 seconds for a single query. BDB XML actually has to go through all the documents in the database – currently 140. Imagine how the number would look for a database with a couple thousand documents! After adding the indices, though, performance improves dramatically:
Indices:
{}:build "node-metadata-equality-decimal"
{}:step "node-metadata-equality-string"
{}:type "node-attribute-equality-string"
Executing query: /report[dbxml:metadata("build")=1 and \
dbxml:metadata("step")="test"]
2 results, took 0:00:00.046611
Executing query: /report[dbxml:metadata("build")=1 and \
dbxml:metadata("step")="test" and \
@type="unittest"]
1 results, took 0:00:00.013773
The times have now dropped to under 50 milliseconds. This is probably obvious to anyone who's ever used BDB XML, or even just read the documentation. But it's still quite impressive what a difference indices make even for such a relatively small number of documents.