Push-Strategy Web Templating

31 January 2005
18:57

I've long been a fan of the “push model” when it comes to template engines for web applications: basically, the web application code prepares the data that is going to be available on the generated page, and passes that data to the template engine. Now I am starting to have doubts.

As a simple example, the following code snippet demonstrates how the push strategy works for the template engine ClearSilver:

import time
import neo_cs, neo_util

# prepare the data that should be rendered
data = neo_util.HDF()
data.setValue('article.title', 'Hello world')
data.setValue('article.pubDate', time.strftime('%x %X', time.localtime()))

# apply the template
tmpl = neo_cs.CS(data)
tmpl.parseStr("""
<html>
 <head>
  <title><?cs var:article.title ?></title>
 </head>
 <body>
  <h1><?cs var:article.title ?></h1>
  <p><?cs var:article.pubDate ?></p>
 </body>
</html>
""")
out = tmpl.render()

As a general rule, this model requires that requests are first processed by application code. The template is applied as a last step after all data has been collected and computed. The syntax of templating instructions you can use in ClearSilver templates is very limited, and intentionally so. You can access and output items from the data that was pushed into the engine, you have conditionals and loops, and macros for defining reusable template blocks.

This model encourages keeping logic out of the templates, and it can be found in tons of other template engines, such as Velocity and WebMacro (both Java™) or the Perl module HTML::Template. In a way, this is also the model underlying XSLT. There are many, many others, but few make the restrictions on templates as drastic as ClearSilver does.

Problems with Pure Push-model Templating

One problem with this approach is that a lot of things that arguably do belong in the presentation layer shift into the application code: escaping HTML/XML characters in strings, or formatting dates and numbers, for example. (ClearSilver actually does have some of this, but only if you use it in combination with the neo_cgi module, which, as the name implies, only works when you're using CGI.) Any serious template engine should provide such essential features, and most do.

Another problem is less obvious: because push-model templating is basically a two-stage process, all data required while applying the template must be generated beforehand. This has two disadvantages:

  • The template might not need all the data you're generating, so you end up wasting precious cycles.

    (Why push data that isn't used in the templates? Because templates are not just a means of separating presentation from logic, but also a layer for customization in many applications. As such you can't be sure which data a customized template might need access to, so you end up stuffing everything you have into the engine.)

  • For larger data-sets, you first assemble the entire data structure in memory. In this case you're wasting memory, which becomes more critical the larger the data-set can get.

    The template will not need access to all of the available data at every single point in the process. Often, you're sequentially looping through a list of items, and you only access the current item.

For both problems, the solution is to provide on-demand generation of data, borrowing the basic concept from “pull-model” templating and integrating it to form a hybrid template engine. Instead of forcing the application code to push all the data, provide a way for it to register hooks that are called when the template requests data items.
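Here is a minimal sketch of such a hook registry in plain Python, independent of any real template engine. The names `LazyData`, `set_value`, `set_hook` and `get` are made up for illustration. A hook runs only the first time its item is requested, and the result is cached after that.

```python
class LazyData:
    """Hypothetical data set holding plain values and on-demand hooks."""

    def __init__(self):
        self._values = {}   # items that were pushed up front
        self._hooks = {}    # items that are computed when first requested

    def set_value(self, name, value):
        self._values[name] = value

    def set_hook(self, name, func):
        self._hooks[name] = func

    def get(self, name):
        # Run the hook at most once, then treat the result like a pushed value.
        if name not in self._values and name in self._hooks:
            self._values[name] = self._hooks.pop(name)()
        return self._values[name]


calls = []  # records which hooks actually ran

def expensive_stats():
    calls.append('stats')
    return 'lots of numbers'

data = LazyData()
data.set_value('article.title', 'Hello world')
data.set_hook('article.stats', expensive_stats)

# A "template" that only uses the title never triggers the hook.
page = '<h1>%s</h1>' % data.get('article.title')
```

After this runs, `calls` is still empty: the statistics were registered but never computed, because nothing asked for them.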

Hybrid Template Engines

Thinking about it, JSP 2 is such a hybrid template engine. You have tag libraries, which you can use to pull data into the page. And you can use JSP pages as templates that run after the application has processed the request and prepared data for the page to process. Other template engines can also be considered hybrids: for example, Velocity allows Java™ methods to be called from the template. XSLT allows custom extension functions that return node sets. Et cetera.

However, for all the hybrid template engines I know of, the distinction between the pull and push paradigms is explicit. The template editor needs to know what data has been pushed into the template engine, and what data needs to be pulled. And the folks responsible for the application code can easily break templates if they refactor the code so that some data must be pulled rather than pushed, or vice versa.

Ideally, a template engine would allow you to populate the data set with either “dead” values or with callbacks that would get invoked when the template requests a particular data item. Look at the following modification to the ClearSilver snippet above:

import time
import neo_cs, neo_util

# prepare the data that should be rendered
data = neo_util.HDF()
data.setValue('article.title', 'Hello world')

# invoked only when the template actually requests article.pubDate
def render_pubdate(context):
    return time.strftime('%x %X', time.localtime())
# NOTE: setCallback() is hypothetical, it illustrates the wished-for API
data.setCallback('article.pubDate', render_pubdate)

# apply the template
tmpl = neo_cs.CS(data)
tmpl.parseStr("""
<html>
 <head>
  <title><?cs var:article.title ?></title>
 </head>
 <body>
  <h1><?cs var:article.title ?></h1>
  <p><?cs var:article.pubDate ?></p>
 </body>
</html>
""")
out = tmpl.render()

Here, the only thing that changed is the way we define the pubDate data item. The way you access it from the template has remained the same. The callback system should also support iteration and conditionals, of course.
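To make the point about iteration and conditionals concrete, here is a rough sketch that uses a toy `render` function in place of a real template engine. All names (`recent_comments`, `show_comments`, `render`) are assumptions for illustration. The iteration hook is a generator, so the "template" pulls only as many items as its loop consumes.

```python
import itertools

produced = []   # records which items the hook actually generated

def recent_comments():
    # Hypothetical iteration hook: yields items one at a time, without end.
    for n in itertools.count(1):
        produced.append(n)
        yield 'comment %d' % n

def show_comments():
    # Hypothetical conditional hook: evaluated only when the template tests it.
    return True

def render(hooks, limit):
    """Toy stand-in for a template with an `if` around a bounded loop."""
    out = []
    if hooks['show_comments']():
        for text in itertools.islice(hooks['comments'](), limit):
            out.append('<li>%s</li>' % text)
    return ''.join(out)

html = render({'show_comments': show_comments, 'comments': recent_comments}, 2)
```

Even though the hook could produce comments forever, only the two items the loop asked for are ever generated.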

But Doesn't Pull Strategy Degrade to Push Strategy?

In his paper Enforcing Strict Model-View Separation in Template Engines, Terence Parr argues that items in template data often have dependencies on each other that result in a DAG:

Model data values, hereafter referred to simply as attributes, often depend on the values of other attributes. These dependencies form a DAG, a directed acyclic graph. The graph is acyclic in order to be well-defined; for example, attribute a_i cannot depend on itself. To preserve correctness (to be safe), the view may not request attributes from the model in an order that violates a dependency in the graph.

An example of such a dependency is when you want to output the number of items in a list before actually having pulled the list itself. Thus to provide the template with the correct data-set, all data needs to be computed up front:

Hence, [in the worst case] all attributes are computed a priori. Pull degenerates to push in the worst case.

While I'm not sure what this has to do with violating the separation of logic and presentation (the title of the containing section), the argument that pull tends to degenerate to push is interesting for the discussion here.

Not every attribute in the template data has a dependency on every other attribute. A lot of the data items will not have any dependency whatsoever. And while some data items are related to some other data items, that doesn't strictly result in a dependency. For example, getting the length of a list might not even require retrieving the entire list, and more importantly, definitely doesn't require the retrieval of each of the elements of the list. It may be correct that the worst case degrades to push-strategy, but the worst case assumes that all attributes in the template data-set are somehow interconnected to form a single big dependency graph. In that case you probably have a problem anyway.
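A small sketch of this argument, assuming Python 3.8+ for `functools.cached_property`. The `ChangedFiles` class and its attributes are hypothetical. Each attribute is computed on first access, and pulling one attribute computes only the attributes it really depends on.

```python
import functools

log = []   # records which attributes were actually computed

class ChangedFiles:
    """Hypothetical model object: the count is cheap, the items are not."""

    @functools.cached_property
    def count(self):
        log.append('count')          # think: a single COUNT(*) query
        return 3

    @functools.cached_property
    def items(self):
        log.append('items')          # think: loading every row
        return ['a.py', 'b.py', 'c.py']

    @functools.cached_property
    def summary(self):
        # Depends on `count` only, so pulling it never loads `items`.
        log.append('summary')
        return '%d files changed' % self.count

model = ChangedFiles()
headline = model.summary
```

Requesting `summary` computes `summary` and `count` and leaves `items` untouched, so the dependency graph is followed only along the edges that are actually needed.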

Consider an example I encountered when working on the Trac project: we have a changeset view where all changes made in a Subversion check-in are displayed as beautifully formatted diffs. The structure of the individual diffs in the data-set is pretty memory intensive and expensive to generate. And if you happen to have a very large changeset (which is pretty rare, but might happen when you merge a branch that has a ton of changes), the code needs to generate all the diffs in advance and hold them in memory, even though the template will only process one at a time! And no, we don't need to process all the diffs just to come up with the list of files that have been changed; that information is available through the database and is very cheap to compute.

(You might be thinking that we should be paging the changeset anyway in this case, and I would agree. But that isn't addressing the source of the problem… there will always be situations when you need to generate pages that are larger than your average test input. And even for your average test input, the pull-strategy might result in better performance.)
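The changeset case can be sketched roughly as follows. This is not Trac's actual code: `CHANGES`, `changed_paths`, `iter_diffs` and `render_changeset` are invented for illustration, and the standard library's `difflib` stands in for the real diff machinery. The file list comes from the cheap source, while diffs are produced one at a time as the page is rendered.

```python
import difflib

# Hypothetical stand-in for a changeset: (path, old lines, new lines).
CHANGES = [
    ('README', ['hello\n'], ['hello\n', 'world\n']),
    ('setup.py', ['version = 1\n'], ['version = 2\n']),
]

def changed_paths():
    # Cheap: no diff is computed just to list the files.
    return [path for path, old, new in CHANGES]

def iter_diffs():
    # Expensive part, produced one file at a time.
    for path, old, new in CHANGES:
        yield path, ''.join(difflib.unified_diff(old, new, path, path))

def render_changeset():
    """Toy template: yields the page in chunks, holding one diff at a time.

    HTML escaping is omitted for brevity.
    """
    yield '<ul>%s</ul>' % ''.join('<li>%s</li>' % p for p in changed_paths())
    for path, diff in iter_diffs():
        yield '<h2>%s</h2><pre>%s</pre>' % (path, diff)

chunks = render_changeset()
first = next(chunks)    # the file list is available before any diff exists
```

Because `render_changeset` is itself a generator, the same structure would also allow the response to be streamed chunk by chunk rather than assembled in full.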

In conclusion, I am starting to think that pure push-model templating is a nice approach in theory, but trades off performance for purity. As explained above, the push strategy has some serious performance disadvantages. (And we haven't even mentioned the streaming of responses yet.) Most existing template engines offer a hybrid strategy, combining the push and pull strategies in some way. But also, most of them do so in a clumsy way.