A question that I’ve been seeing more frequently these days is how to extract a JavaScript object from an HTML list or table, given no data or information other than the markup. It’s not ideal to work backwards from HTML, but sometimes you just don’t have a lot of choice in the matter.

Whether you’re enhancing legacy elements that have been generated on the server-side or want to parse the output of a third-party DHTML widget, there are a variety of situations where converting HTML to raw data is a legitimate need. You may have seen iterative solutions to this problem before. However, nested looping code gets messy fast, doesn’t feel much like idiomatic jQuery, and certainly isn’t as concise as you’d probably like.

Luckily, one of JavaScript’s lesser-known utility methods, along with jQuery’s implementation of it, can improve the situation quite a bit. In this post, I’m going to show you how that method works, how jQuery provides it in a cross-browser way, and how to use it to extract data objects from arbitrary HTML lists and tables.


Array.map()

It turns out that there’s a tool perfectly suited to the task of coercing one data structure into another: map.

Map is a higher-order function that allows you to transform the contents of a collection by applying a function to each item, capturing the result, and building a new collection of those results.
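To make that definition concrete, here’s a minimal sketch of what map does, written as a plain loop. The `mapArray` name is hypothetical; it isn’t part of JavaScript or jQuery, and it only illustrates the idea.

```javascript
// Hypothetical helper illustrating what "map" means: apply fn to every
//  item, capture each result, and collect the results in a new array.
function mapArray(items, fn) {
  var results = [];
  for (var i = 0; i < items.length; i++) {
    // Pass the item and its index, the same order Array.prototype.map uses.
    results.push(fn(items[i], i));
  }
  // The input array is left untouched; a new array is returned.
  return results;
}

var doubled = mapArray([1, 2, 3], function(n) {
  return n * 2;
});

console.log(doubled); // [2, 4, 6]
```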

Map is a perfect tool for translating a collection full of extraneous data into a tightly-focused collection of exactly the desired subset. Even better, JavaScript 1.6 includes a native implementation of map, which is exposed as a method on the Array prototype.

For example, this is how you could use JavaScript 1.6’s Array.map() to analyze an array of strings and create a new array containing each string’s length:

var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];
 
// For each site in the array, apply this function 
//  and build an array of the results.
var lengths = sites.map(function(site, index) {
  // Use the length of each name as its value in the new array.
  return site.length;
});
 
// This outputs: [7, 6, 7, 13]
console.log(lengths);

This is a very simple example, but you can probably already imagine applying that same technique to an array of list elements or table rows. The concise expressiveness of the map approach is great for paring away extraneous markup and extracting just the underlying data.
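As a DOM-free illustration of that idea, the sketch below treats each list item as a plain object carrying several properties you don’t care about, and maps it down to the one value you do. The object shapes are hypothetical stand-ins for DOM elements, not real element objects.

```javascript
// Hypothetical stand-ins for <li> elements: several properties,
//  only one of which (textContent) is the data we want.
var listItems = [
  { tagName: 'LI', className: 'item', hidden: false, textContent: 'Item 1' },
  { tagName: 'LI', className: 'item', hidden: false, textContent: 'Item 2' },
  { tagName: 'LI', className: 'item', hidden: false, textContent: 'Item 3' }
];

// Map each "element" to just its text, discarding everything else.
var texts = listItems.map(function(item) {
  return item.textContent;
});

console.log(texts); // ['Item 1', 'Item 2', 'Item 3']
```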

Mapping uncharted territory

Unfortunately, JavaScript 1.6 and its map implementation are not something that you can count on being available in older browsers. Notably, Internet Explorer doesn’t provide an Array.map() implementation until IE9.

Though that is disappointing, map isn’t difficult to implement manually. For example, this is a polyfill that the Mozilla Developer Center (MDC) recommends for patching Array.map() into older browsers:

if (!Array.prototype.map) {
  Array.prototype.map = function(fun /*, thisp */)  {
    "use strict";
 
    if (this === void 0 || this === null)
      throw new TypeError();
 
    var t = Object(this);
    var len = t.length >>> 0;
    if (typeof fun !== "function")
      throw new TypeError();
 
    var res = new Array(len);
    var thisp = arguments[1];
    for (var i = 0; i < len; i++)  {
      if (i in t)
        res[i] = fun.call(thisp, t[i], i, t);
    }
 
    return res;
  };
}

That’s a workable solution, but I doubt you’re very excited about the prospect of including all this code in your page. I know I wouldn’t be.
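One detail in that polyfill is worth a closer look: the `if (i in t)` check. It skips indices that were never assigned (holes in a sparse array), so the callback isn’t called for them and the result keeps the same holes. Native implementations behave the same way, as this small sketch shows:

```javascript
// A sparse array: index 1 was never assigned, so it is a "hole".
var sparse = [1, , 3];

var calls = 0;
var sparseDoubled = sparse.map(function(n) {
  calls++;
  return n * 2;
});

// The callback ran only for the two real elements...
console.log(calls);                // 2
// ...and the result keeps the hole at index 1, with the same length.
console.log(sparseDoubled.length); // 3
console.log(1 in sparseDoubled);   // false
console.log(sparseDoubled[2]);     // 6
```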

jQuery has you covered

If you’re already including jQuery in your pages, the good news is that jQuery has a built-in map implementation that works in every browser. In fact, jQuery provides two separate map methods: one that’s specially suited to working with jQuery selections and a general utility method that’s more similar to the polyfill shown above.

For working with HTML, I’m going to focus on using the former: .map().

To replicate the JavaScript 1.6 dependent example shown earlier, using jQuery’s implementation instead, the code would look like this:

var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];
 
// Same as before, using jQuery's map() implementation.
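var lengths = $(sites).map(function(index, site) {
  // Note the parameter order: jQuery passes the index first.
  // Use the length of each site name as its value in the new array.
  return site.length;
// .get() converts the jQuery object that .map() returns into a plain array.
}).get();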
 
// This outputs: [7, 6, 7, 13]
console.log(lengths);

Making “this” approach more concise

To condense the code a bit, we can take advantage of the execution context within the callback function. During each callback, this holds the value of the array item currently being operated on. So, there’s no need to bother capturing the callback’s two input parameters:

var lengths = $(sites).map(function() {
  // "this" refers to the current array element as this callback is
  //  applied to each array element.
  return this.length;
});

That isn’t a huge improvement, but every little bit helps, and I’ll be using this shorthand in the examples throughout the rest of this post. So, I wanted to make sure what’s happening there is clear.
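Since jQuery’s argument order differs from the native version, it can help to see its calling convention spelled out. The following is a rough, jQuery-free approximation of how .map() invokes your callback, based on jQuery’s documented behavior: `this` is the current item, the arguments are `(index, item)`, returning `null` or `undefined` drops the item, and a returned array is flattened into the result. The `jqueryStyleMap` name is hypothetical and this is not jQuery’s actual source. Also note that the utility method `$.map()` passes its callback the arguments the other way around, `(item, index)`.

```javascript
// Hypothetical approximation of jQuery's .map() calling convention;
//  not jQuery's real implementation.
function jqueryStyleMap(items, callback) {
  var results = [];
  for (var i = 0; i < items.length; i++) {
    // Like jQuery's .map(): "this" is the item, arguments are (index, item).
    var value = callback.call(items[i], i, items[i]);

    if (value === null || value === undefined) {
      // Returning null or undefined removes the item from the result.
      continue;
    }
    if (Array.isArray(value)) {
      // A returned array is flattened (one level) into the result.
      results.push.apply(results, value);
    } else {
      results.push(value);
    }
  }
  return results;
}

var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];

// "this" is the current site name, so this.length is its length.
var lengths = jqueryStyleMap(sites, function() {
  return this.length;
});
console.log(lengths); // [7, 6, 7, 13]

// Keep only names longer than six characters by returning null otherwise.
var longNames = jqueryStyleMap(sites, function(index, site) {
  return site.length > 6 ? site : null;
});
console.log(longNames); // ['Encosia', 'ASP.NET', 'StackOverflow']
```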

Unwrapping the result of jQuery’s .map()

The one quirk when using jQuery’s .map() method is that it returns a jQuery wrapped set rather than a plain array. Even if your mapping function returns scalar values like strings and numbers, the result of .map() is a jQuery object wrapping those values, complete with jQuery’s prototype and its internal bookkeeping properties.

That isn’t really a problem if you only intend to use that result immediately in your JavaScript code. However, the jQuery wrapper throws a wrench in the works if you try to use JSON.stringify() on the result of .map(). Since JSON serialization is such a common task when storing or transmitting JavaScript data, this quirk turns out to be a real issue.

The solution is to call jQuery’s .get() method on those wrapped results, which boils them down to plain arrays. When you see .get() tacked onto the end of the examples ahead, that’s why it’s there.
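To see why that matters without loading jQuery, consider that a jQuery object is array-like: it stores its items under numeric keys alongside a `length` property and some bookkeeping (such as `prevObject`). The plain object below is only a hypothetical stand-in for such a wrapped set; copying its items into a real array, which is essentially the job .get() does, is what makes the JSON output clean.

```javascript
// Hypothetical stand-in for a jQuery wrapped set: indexed items,
//  a length, and an extra bookkeeping property.
var wrapped = { 0: 7, 1: 6, 2: 7, 3: 13, length: 4, prevObject: {} };

// Serializing the wrapper directly includes every own property.
console.log(JSON.stringify(wrapped));
// {"0":7,"1":6,"2":7,"3":13,"length":4,"prevObject":{}}

// Copying the indexed items into a real array (roughly what .get() does)
//  leaves only the data.
var plain = Array.prototype.slice.call(wrapped);
console.log(JSON.stringify(plain)); // [7,6,7,13]
```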

Now, let’s take a look at applying .map() to HTML and using it to extract data.

Mapping the data within HTML unordered lists

Using .map() against an unordered list is one of the most straightforward examples to start with. Imagine you had this simple HTML markup:

<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
</ul>

To extract each of those items’ displayed value, you could use .map() like this:

// Returns ['Item 1', 'Item 2', 'Item 3']
$('li').map(function() {
  // For each <li> in the list, return its inner text and let .map()
  //  build an array of those values.
  return $(this).text();
}).get();

Complicating things slightly, maybe the list items also have an HTML5 data- attribute that you need to collect in addition to their values:

<ul>
  <li data-id="123">Item 1</li>
  <li data-id="456">Item 2</li>
  <li data-id="789">Item 3</li>
</ul>

Using .map() to extract that more complex data is just as easy:

// Returns [{id: 123, text: 'Item 1'}, 
//          {id: 456, text: 'Item 2'},
//          {id: 789, text: 'Item 3'}]
$('li').map(function() {
  // $(this) is used more than once; cache it for performance.
  var $item = $(this);
 
  return { 
    // Note: using .data() to read HTML5 data- attributes 
    //  requires jQuery 1.4.3+. In older versions, use .attr('data-id'),
    //  which returns the raw string rather than a number.
    id: $item.data('id'), 
    text: $item.text()
  };
}).get();
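The ids in that comment are numbers because .data() tries to convert data- attribute strings into JavaScript values, while .attr() always hands back the raw string. The sketch below approximates the conversion rules described in the current .data() documentation: booleans, null, and numbers only when converting doesn’t change the value’s string representation. It omits the JSON parsing jQuery also performs, and the `convertDataValue` name is hypothetical.

```javascript
// Hypothetical approximation of how jQuery's .data() converts the string
//  stored in a data- attribute. JSON values ("{...}", "[...]") are not
//  handled here, although jQuery does parse them.
function convertDataValue(raw) {
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  if (raw === 'null') return null;

  // Only convert to a number when it round-trips to the same string,
  //  so values like "007" or "1.50" stay strings.
  if (String(+raw) === raw) return +raw;

  return raw;
}

console.log(convertDataValue('123'));  // 123 (a number)
console.log(convertDataValue('007'));  // '007' (still a string)
console.log(convertDataValue('1.50')); // '1.50' (still a string)
console.log(convertDataValue('true')); // true
```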

As you can see, .map() is a powerful tool for concisely pulling arbitrary bits of data together into a useful structure. You could certainly do this with a temp variable and for-loop, but it’s hard to beat the clean expressiveness this approach lends your code.

There’s a great JavaScript learning opportunity in the code above, but it’s on a bit of a tangent. Rather than let this post run even longer, I wrote about that in a separate post. If you’re interested in how an innocuous change to the location of one curly brace in the preceding code can transparently break it, that post is for you.

You can find that post here: In JavaScript, curly brace placement matters: An example.

Extracting data from HTML tables

Working with the lists is good for a simple example, but what if we need to apply this technique to an HTML structure that’s more complex than an unordered list?

HTML tables are one of the most common targets for this technique. It’s not unusual to end up with a pre-rendered table that was generated off-page and to desire a client-side data structure representing that table’s data.

For example, here’s a tabular representation of the same data contained in the second list example:

<table id="myTable">
  <thead>
    <tr>
      <th>id</th>
      <th>text</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>123</td>
      <td>Item 1</td>
    </tr>
    <tr>
      <td>456</td>
      <td>Item 2</td>
    </tr>
    <tr>
      <td>789</td>
      <td>Item 3</td>
    </tr>
  </tbody>
</table>

If you wanted to boil that table down to exactly the same JavaScript object shown in the second list example, this .map() usage would do the trick:

// Returns [{id: 123, text: 'Item 1'}, 
//          {id: 456, text: 'Item 2'},
//          {id: 789, text: 'Item 3'}]
$('#myTable tbody tr').map(function() {
  // $(this) is used more than once; cache it for performance.
  var $row = $(this);
 
  // For each row that's "mapped", return an object that
  //  describes the first and second <td> in the row.
  return {
    // .text() always returns a string, so parse the id into a number.
    id: parseInt($row.find('td:nth-child(1)').text(), 10),
    text: $row.find('td:nth-child(2)').text()
  };
}).get();

The key to making this approach work is using the :nth-child selector to index into each row and retrieve the contents of the cells we’re interested in. This is very similar to how we handled the unordered list earlier, but can be applied to arbitrarily large structures such as wide HTML tables.

If you use this approach, one thing to keep in mind is that :nth-child uses one-based indexing. So, you must use :nth-child(1) to select the first cell, not :nth-child(0) as you might expect.
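If you later work with the cells as plain JavaScript arrays instead of CSS selectors, that one-based position has to shift down by one. This DOM-free sketch represents the table body as arrays of cell text (a hypothetical stand-in for the markup above) and uses a hypothetical `nthCell` helper that accepts the same one-based position :nth-child(n) would.

```javascript
// Hypothetical stand-in for the table body: one array of cell text per row.
var rows = [
  ['123', 'Item 1'],
  ['456', 'Item 2'],
  ['789', 'Item 3']
];

// Read a cell using the same one-based position :nth-child(n) would use.
function nthCell(cells, n) {
  // Arrays are zero-based, so :nth-child(1) corresponds to index 0.
  return cells[n - 1];
}

var items = rows.map(function(cells) {
  return {
    // Cell text is always a string; parse it to match the list example.
    id: parseInt(nthCell(cells, 1), 10),
    text: nthCell(cells, 2)
  };
});

console.log(items);
// [{id: 123, text: 'Item 1'}, {id: 456, text: 'Item 2'}, {id: 789, text: 'Item 3'}]
```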

A general solution for tables

Using hard-coded :nth-child selectors works well enough in simple scenarios, but it’s brittle. If the table’s column order changes, code that relies on a particular layout will break. Hard-coding a selector for each column also becomes tedious when dealing with wider tables that have many columns.

So, as you apply this technique to larger or less predictable tables, you may desire a more general solution for extracting the data. One way of doing that is using the table’s column heading cells to build a basic schema of the table’s data.

Assuming your table has a proper <thead>, this is how you could extract an array of its column headings to use as a schema for mapping the rest of the table’s data:

var columns = $('#myTable thead th').map(function() {
  // This assumes that your headings are suitable to be used as
  //  JavaScript object keys. Headings containing characters such as
  //  spaces or dashes still work with bracket notation (row['First Name']),
  //  but you may want to use a regex here to strip those characters out
  //  so that dot notation works too.
  return $(this).text();
// .get() turns the result into a plain array of heading strings.
}).get();
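The regex that comment mentions could look something like the following sketch. The `toPropertyName` helper and its exact rules are assumptions: it trims whitespace, strips everything except letters, digits, underscores, and dollar signs, and prefixes an underscore when the result would start with a digit, so the keys stay usable with dot notation.

```javascript
// Hypothetical helper: turn a column heading into a key that can be used
//  with dot notation (row.FirstName rather than row['First Name']).
function toPropertyName(heading) {
  // Trim surrounding whitespace, then drop anything that isn't a letter,
  //  digit, underscore, or dollar sign (spaces, dashes, punctuation...).
  var key = heading.trim().replace(/[^A-Za-z0-9_$]/g, '');

  // Identifiers can't start with a digit, so prefix one if needed.
  return /^[0-9]/.test(key) ? '_' + key : key;
}

console.log(toPropertyName('id'));           // 'id'
console.log(toPropertyName(' First Name ')); // 'FirstName'
console.log(toPropertyName('e-mail'));       // 'email'
console.log(toPropertyName('2nd Place'));    // '_2ndPlace'
```

Inside the heading-mapping callback, you would then return `toPropertyName($(this).text())` instead of the raw text.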

With that column list handy, we can determine which column name any cell in the table should be filed under, given nothing more than its index in the row. Now we can automate the process that previously required those :nth-child selectors:

var tableObject = $('#myTable tbody tr').map(function() {
  var row = {};

  // Find all of the table cells on this row.
  $(this).find('td').each(function(i) {
    // Determine the cell's column name by comparing its index
    //  within the row with the columns list we built previously.
    var columnName = columns[i];

    // Add a new property to the row object, using this cell's
    //  column name as the key and the cell's text as the value.
    row[columnName] = $(this).text();
  });

  // Finally, return the row's object representation, to be included
  //  in the array that .map() ultimately returns.
  return row;
 
// Don't forget .get() to convert the jQuery set to a regular array.
}).get();

That’s it.

With all the comments, that looks like more work than it actually is. Eleven lines of code for the entire ordeal isn’t bad, considering that it will handle most tables that have a proper <thead> and one cell per column, with no per-table changes.
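Stripped of jQuery and the DOM, the header-driven approach boils down to pairing each cell with the heading at the same index. Here is a minimal sketch of that logic, assuming the headings and each row’s cell text have already been read into arrays; `tableToObjects` is a hypothetical name, and the guard against extra cells is an addition of this sketch, not part of the jQuery version above.

```javascript
// Hypothetical, DOM-free version of the header-driven table mapping.
//  columns: array of heading strings; rows: array of arrays of cell text.
function tableToObjects(columns, rows) {
  return rows.map(function(cells) {
    var row = {};
    cells.forEach(function(cellText, i) {
      // Skip cells that have no matching heading instead of
      //  creating an "undefined" key.
      if (i < columns.length) {
        // The heading at the same index names this cell's property.
        row[columns[i]] = cellText;
      }
    });
    return row;
  });
}

var columns = ['id', 'text'];
var rows = [
  ['123', 'Item 1'],
  ['456', 'Item 2'],
  ['789', 'Item 3']
];

console.log(tableToObjects(columns, rows));
// [{id: '123', text: 'Item 1'}, {id: '456', text: 'Item 2'}, {id: '789', text: 'Item 3'}]
```

As with .text(), every value comes back as a string, so convert numeric columns afterward if you need numbers.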

Conclusion

I’m going to stop here, before this gets any longer. I hope that you found this helpful and/or interesting.

Even if you don’t often convert HTML markup to JavaScript objects, do keep .map() in mind when you’re working with collections of any type. The notion of map is an extremely useful aspect of JavaScript’s functional nature, but it often goes overlooked.