Efficient JavaScript string building

From Trephine

Jump to: navigation, search
« Alternative JavaScript worker thread API JavaScript string building benchmarks »

[subscribe] Recent blog entries

Live Demos

Efficient JavaScript string building

It's easy to write poorly performing string-building loops in any language, and JavaScript is no exception. This article showcases common sources of string-building performance problems and supplies a set of simple guidelines for avoiding them.

For the sake of demonstration, consider a function called buildList() which accepts an arbitrary number of arrays as arguments, and builds a string representation of the form "<ul><li>item1</li><li>item2</li></ul>".

For example, if we call:

buildList(
  [1, 2, 3],
  [4, 5, 6]
);

Then we would expect this output:

<ul>
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>
<li>5</li>
<li>6</li>
</ul>

Tip: The Test Driven Design thing to do at this juncture is write a test, but that's outside the scope of this article.

Now, let's consider a first stab at an implementation of buildList():

/**
 * Constructs an html string from any number of lists of data items.
 */
function buildList( /* list1, list2, ... */ ) {
  var buf = "<ul>\n";                        // initialize buffer
  for (var i=0; i<arguments.length; i++) {   // iterate over arguments
    var list = arguments[i];                 // grab next list
    for (var j=0; j<list.length; j++) {      // iterate over list
      buf += "<li>";
      buf += list[j];                        // append item to buffer
      buf += "</li>\n";
    }
  }
  buf += "</ul>"                             // finalize buffer
  return buf;
}

In the above implementation, we iterate over all the arguments passed in, and for each one iterate over the elements in the list. For each element, we append the necessary "<li>" wrapper and slap it onto the end of a string called buf, which is our buffer.

The problem with this code is that for each iteration of the inner loop, the JavaScript engine has to create three new string objects. One for the opening li tag, one for the item, and one for the closing tag. Further, each string created is just a little larger than the entire buffer was before hand. This means that as the total number of items (n) increases, the amount of total space allocated grows at rate proportional to n squared.

We can do better. The first thing that we have to do is replace the string buffer with an array buffer:

/**
 * Constructs an html string from any number of lists of data items.
 */
function buildList( /* list1, list2, ... */ ) {
  var buf = [ "<ul>\n" ];                      // initialize array buffer
  for (var i=0; i<arguments.length; i++) {
    var list = arguments[i];
    for (var j=0; j<list.length; j++) {
      buf.push( "<li>", list[j], "</li>\n" );  // append item to buffer
    }
  }
  buf.push( "</ul>" );                         // finalize array buffer
  return buf.join('');                         // concatenate and return
}

The above version is a marked improvement over the first since it avoids the massive amount of copying that a string buffer requires.

We can do still better of course, by following Opera's recommendation of replacing push() with native assignment in performance-critical code. Considering that there may be a very large number of total elements, we can consider the inner loop to be performance-critical, and speed it up accordingly:

/**
 * Constructs an html string from any number of lists of data items.
 */
function buildList( /* list1, list2, ... */ ) {
  var buf = [ "<ul>\n" ];
  for (var i=0; i<arguments.length; i++) {
    var list = arguments[i];
    for (var j=0; j<list.length; j++) {
      buf[buf.length] = "<li>";          // append <li> tags and
      buf[buf.length] = list[j];         // item to array buffer, utilizing
      buf[buf.length] = "</li>\n";       // primitive array assignment
    }
  }
  buf[buf.length] = "</ul>";
  return buf.join('');
}

This is pretty good, but we might be able to do better still by recognizing that the inner loop is basically a join() operation. Native join should be faster than incremental buffering, and so we can apply it as such:

/**
 * Constructs an html string from any number of lists of data items.
 */
function buildList( /* list1, list2, ... */ ) {
  var buf = [];
  for (var i=0; i<arguments.length; i++) {
    var list = arguments[i];
    buf[buf.length] = list.join( "</li>\n<li>" );  // concatenate list, append to buf
  }
  var last = buf.length-1;
  buf[0] = "<ul>\n<li>" + buf[0];                  // add opening <ul> prefix
  buf[last] = buf[last] + "</li>\n</ul>";          // add trailing </ul> suffix
  return buf.join( "</li>\n<li>" );                // concatenate and return
}

There is one problem though with the above code over the example before it. Compared to the previous examlpe, this latest code would end up allocating about twice as much memory - once for all the "rows" and a second time for the overall return value.

And that leads to the final implementation. The previous examples all built the buffer by dynamically extending off the end of the array one item at a time. This carries a cost in that the JavaScript engine may have to periodically re-allocate memory to the list in order to expand. This is a common problem with dynamically sized lists and hashes.

However, by determining the real total number of elements in advance, we can allocate exactly the right amount of space at the beginning and avoid ever incurring that reallocate/copy penalty:

/**
 * Constructs an html string from any number of lists of data items.
 */
function buildList( /* list1, list2, ... */ ) {
  var size = 0;                                  // initalize size counter
  for (var i=0, l=arguments.length; i<l; i++) {  // iterate over arguments
    size += arguments[i].length;                 // add up all lists lengths
  }
  if (size==0) return "<ul></ul>";               // short-circuit for zero items
  var buf = new Array( size ), pos = 0;          // initialize buffer and position
  for (var i=0, l=arguments.length; i<l; i++) {  // iterate over arguments
    var list = arguments[i];                     // grab the next list
    for (var j=0, k=list.length; j<k; j++) {     // iterate over list
      buf[pos++] = list[j];                      // set buffer position to list item
    }
  }
  buf[0] = "<ul>\n<li>" + buf[0];                // add opening <ul> prefix
  buf[size-1] += "</li>\n</ul>";                 // add trailing </ul> suffix
  return buf.join("</li>\n<li>");                // concatenate and return
}

This is by far the most complex implementation, but it avoids the pitfalls of all previous examples. It also handles the case of zero total arguments much more elegantly (the previous examples assumed that there was at least one item, whereas this implementation correctly short-circuits if there are exactly zero). And finally, it avoids re-evaluating length lookups on the arguments array and all the individual lists.

Now of course, writing code in this style is probably overkill in most situations. Premature optimization is the root of all evil after all. But the key takeaways can help you identify potential trouble spots later on:

  1. Never use a string as a buffer to build a long string - prefer an array buffer with join(),
  2. Prefer native array index assignment to push() inside potentially expensive loops,
  3. Set a variable equal to the length of the list before hand, rather than looking it up each pass,
  4. Only use native string concatenation (+) when there are a known small number of strings to add (2 or 3), and
  5. Think about whether you can quickly determine the total space requirements, and pre-allocate if possible.

If you acknowledge these guidelines while coding string-building loops, you'll end up with better performing code in the long run. Hope this helps!

Public domain declaration

Just so there's no confusion: all of the code snippets on this page are provided "AS IS", without warranty of any kind, express or implied.

All of the code snippets on this page are hereby released into the public domain by the me, the copyright holder. This applies worldwide. Or in case this is not legally possible: The copyright holder grants any entity the right to use this work for any purpose, without any conditions, unless such conditions are required by law.

If you'd feel better with a "real" license, you're free to use code snippets on this page under the MIT license as described on the about page.

Any links back to this site are always appreciated, but not required. Enjoy!

--Jim R. Wilson (jimbojw) 12:52, 28 April 2009 (UTC)
Personal tools