Use jQuery to extract data from HTML lists and tables
JavaScript, jQuery, UI. By Dave Ward; Updated March 31, 2011
A question that I’ve been seeing more frequently these days is how to extract a JavaScript object from an HTML list or table, given no data or information other than the markup. It’s not ideal to work backwards from HTML, but sometimes you just don’t have a lot of choice in the matter.
Whether you’re enhancing legacy elements that have been generated on the server-side or want to parse the output of a third-party DHTML widget, there are a variety of situations where converting HTML to raw data is a legitimate need. You may have seen iterative solutions to this problem before. However, nested looping code gets messy fast, doesn’t feel much like idiomatic jQuery, and certainly isn’t as concise as you’d probably like.
Luckily, one of JavaScript’s lesser-known utility methods and jQuery’s implementation of it can improve the situation quite a bit. In this post, I’m going to show you that method, jQuery’s cross-browser version of it, and how to use it to extract data objects from arbitrary HTML lists and tables.
Array.map()
It turns out that there’s a tool perfectly suited to the task of coercing one data structure into another: map.
Map is a higher-order function that allows you to transform the contents of a collection by applying a function to each item, capturing the result, and building a new collection of those results.
Map is a perfect tool for translating a collection full of extraneous data into a tightly-focused collection of exactly the desired subset. Better still, JavaScript 1.6 includes a native implementation of map, which is exposed as a method on the Array prototype.
For example, this is how you could use JavaScript 1.6’s Array.map() to analyze an array of strings and create a new array containing each string’s length:
    var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];

    // For each site in the array, apply this function
    //  and build an array of the results.
    var lengths = sites.map(function(site, index) {
      // Use the length of each name as its value in the new array.
      return site.length;
    });

    // This outputs: [7, 6, 7, 13]
    console.log(lengths);
This is a very simple example, but you can probably already imagine applying that same technique to an array of list elements or table rows. The concise expressiveness of the map approach is great for paring away extraneous markup and extracting just the underlying data.
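As a DOM-free sketch of that idea, the following pares an array of verbose records down to just the fields of interest. The `listItems` data and its field names are made up for illustration; they simply stand in for whatever extra baggage your real elements carry.

```javascript
// Hypothetical records standing in for list elements; the cssClass and
// visible fields play the role of "extraneous" data we don't want.
var listItems = [
  { id: 123, text: 'Item 1', cssClass: 'odd', visible: true },
  { id: 456, text: 'Item 2', cssClass: 'even', visible: true },
  { id: 789, text: 'Item 3', cssClass: 'odd', visible: false }
];

// Keep only the id and text of each record.
var data = listItems.map(function(item) {
  return { id: item.id, text: item.text };
});

console.log(JSON.stringify(data));
```

The callback decides what each output item looks like, so the same pattern works whether the input is an array of strings, objects, or (later in this post) DOM elements.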
Mapping uncharted territory
Unfortunately, JavaScript 1.6 and its map implementation are not something that you can count on being available in older browsers. Notably, Internet Explorer doesn’t provide an Array.map() implementation until IE9.
Though that is disappointing, map isn’t difficult to manually implement. For example, this is a polyfill that the MDC recommends for patching Array.map() into older browsers:
    if (!Array.prototype.map) {
      Array.prototype.map = function(fun /*, thisp */) {
        "use strict";

        if (this === void 0 || this === null)
          throw new TypeError();

        var t = Object(this);
        var len = t.length >>> 0;
        if (typeof fun !== "function")
          throw new TypeError();

        var res = new Array(len);
        var thisp = arguments[1];
        for (var i = 0; i < len; i++) {
          if (i in t)
            res[i] = fun.call(thisp, t[i], i, t);
        }

        return res;
      };
    }
That’s a workable solution, but I doubt you’re very excited about the prospect of including all this code in your page. I know I wouldn’t be.
jQuery has you covered
If you’re already including jQuery in your pages, the good news is that jQuery has a built-in map implementation that works in every browser. In fact, jQuery provides two separate map methods: one that’s specially suited to working with jQuery selections and a general utility method that’s more similar to the polyfill shown above.
For working with HTML, I’m going to focus on using the former: .map().
To replicate the JavaScript 1.6-dependent example shown earlier, using jQuery’s implementation instead, the code would look like this:
    var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];

    // Same as before, using jQuery's map() implementation.
    var lengths = $(sites).map(function(index, site) {
      // Use the length of each site name as its value in the new array.
      return site.length;
    });

    // This outputs: [7, 6, 7, 13]
    console.log(lengths);
Making “this” approach more concise
To condense the code a bit, we can take advantage of the execution context within the callback function. During each callback, this holds the value of the array item currently being operated on. So, there’s no need to bother capturing the callback’s two input parameters:
    var lengths = $(sites).map(function() {
      // "this" refers to the current array element as this callback is
      //  applied to each array element.
      return this.length;
    });
That isn’t a huge improvement, but every little bit helps and I’ll be using this in the examples throughout the rest of this post. So, I wanted to make sure what’s happening there is clear.
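If it helps to see where that `this` value comes from, here is a minimal sketch of a map helper that sets the execution context the same way. `mapWithThis` is a hypothetical stand-in written for this illustration, not jQuery’s actual source; it only mirrors the calling convention of passing (index, item) and binding `this` to the current item.

```javascript
// Simplified stand-in for illustration only; not jQuery's implementation.
function mapWithThis(items, callback) {
  var results = [];

  for (var i = 0; i < items.length; i++) {
    // .call() sets "this" to the current item, then passes the
    // arguments in (index, item) order.
    results.push(callback.call(items[i], i, items[i]));
  }

  return results;
}

var sites = ['Encosia', 'jQuery', 'ASP.NET', 'StackOverflow'];

// The callback ignores its parameters and reads the item from "this".
var lengths = mapWithThis(sites, function() {
  return this.length;
});

console.log(lengths);
```

Note the argument order: index first, item second. That is the reverse of the native Array.map() callback, which is one more reason leaning on `this` keeps things simple.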
Unwrapping the result of jQuery’s .map()
The one quirk when using jQuery’s .map() method is that it returns a jQuery-wrapped set rather than a plain array (unlike the $.map() utility method, which returns a plain array). Even if your mapping function returns scalar values like strings and numbers, the end result of .map() is an array-like jQuery object that carries the jQuery prototype and its properties along with your values.
That isn’t really a problem if you only intend to use that result immediately in your JavaScript code. However, the jQuery wrapper throws a wrench in the works if you try to use JSON.stringify() on the result of .map(), because the serializer sees a jQuery object rather than the simple array you’d expect. Since JSON serialization is such a common task when storing or transmitting JavaScript data, this quirk turns out to be a real issue.
The solution is to call jQuery’s get() method on those wrapped-array results, which boils them down to plain arrays. When you see .get() tagged onto the end of the examples ahead, that’s why it’s there.
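To see why the wrapper matters to JSON.stringify(), consider this jQuery-free analogy. The `wrapped` object below is a hypothetical stand-in for a wrapped set (array-like, with an extra bookkeeping property), and copying it with Array.prototype.slice is roughly analogous to what .get() does for you.

```javascript
// Hypothetical array-like object: numeric keys and a length, plus an
// extra property of the kind a wrapper object might carry around.
var wrapped = { 0: 'Item 1', 1: 'Item 2', length: 2, selector: 'li' };

// Serialized as-is, it comes out as an object, extras and all.
var wrappedJson = JSON.stringify(wrapped);

// Copying the indexed values into a real array leaves only the data.
var plain = Array.prototype.slice.call(wrapped);
var plainJson = JSON.stringify(plain);

console.log(wrappedJson);
console.log(plainJson);
```

Only the second form is something a server-side JSON deserializer expecting an array will be happy with.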
Now, let’s take a look at applying .map() to HTML and using it to extract data.
Mapping the data within HTML unordered lists
Using .map() against an unordered list is one of the most straightforward examples to start with. Imagine you had this simple HTML markup:
    <ul>
      <li>Item 1</li>
      <li>Item 2</li>
      <li>Item 3</li>
    </ul>
To extract each of those items’ displayed value, you could use .map() like this:
    // Returns ['Item 1', 'Item 2', 'Item 3']
    $('li').map(function() {
      // For each <li> in the list, return its inner text and let .map()
      //  build an array of those values.
      return $(this).text();
    }).get();
Complicating things slightly, maybe the list items also have an HTML5 data- attribute that you need to collect in addition to their values:
    <ul>
      <li data-id="123">Item 1</li>
      <li data-id="456">Item 2</li>
      <li data-id="789">Item 3</li>
    </ul>
Using .map() to extract that more complex data is just as easy:
    // Returns [{id: 123, text: 'Item 1'},
    //          {id: 456, text: 'Item 2'},
    //          {id: 789, text: 'Item 3'}]
    $('li').map(function() {
      // $(this) is used more than once; cache it for performance.
      var $item = $(this);

      return {
        // Note: using .data() to read HTML5 data- attributes
        //  requires jQuery 1.4.3+. Use attr() in older versions.
        id: $item.data('id'),
        text: $item.text()
      };
    }).get();
As you can see, .map() is a powerful tool for concisely pulling arbitrary bits of data together into a useful structure. You could certainly do this with a temp variable and for-loop, but it’s hard to beat the clean expressiveness this approach lends your code.
There’s a great JavaScript learning opportunity in the code above, but it’s on a bit of a tangent. Rather than let this post run even longer, I wrote about that in a separate post. If you’re interested in how an innocuous change to the location of one curly brace in the preceding code can transparently break it, that post is for you.
You can find that post here: In JavaScript, curly brace placement matters: An example.
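That post isn’t reproduced here, but the pitfall is most likely JavaScript’s automatic semicolon insertion: if the opening curly brace of a returned object literal moves to the line after `return`, the function quietly returns undefined. The two functions below are a minimal sketch of that behavior, with hypothetical names chosen for this example.

```javascript
// Brace on the same line as "return": the object literal is returned.
function works() {
  return {
    id: 123
  };
}

// Brace on the next line: a semicolon is inserted right after "return",
// so the braces that follow are parsed as an unreachable block.
function broken() {
  return
  {
    id: 123
  };
}

console.log(works());
console.log(broken());
```

No error is raised in the second case, which is what makes the mistake so easy to miss inside a .map() callback.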
Extracting data from HTML tables
Working with lists is good for a simple example, but what if we need to apply this technique to an HTML structure that’s more complex than an unordered list?
HTML tables are one of the most common targets for this technique. It’s not unusual to end up with a pre-rendered table that was generated off-page and to desire a client-side data structure representing that table’s data.
For example, here’s a tabular representation of the same data contained in the second list example:
    <table id="myTable">
      <thead>
        <tr>
          <th>id</th>
          <th>text</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>123</td>
          <td>Item 1</td>
        </tr>
        <tr>
          <td>456</td>
          <td>Item 2</td>
        </tr>
        <tr>
          <td>789</td>
          <td>Item 3</td>
        </tr>
      </tbody>
    </table>
If you wanted to boil that table down to the same array of objects shown in the second list example, this .map() usage would do the trick (note that the ids come back as strings here, because .text() always returns a string):
    // Returns [{id: '123', text: 'Item 1'},
    //          {id: '456', text: 'Item 2'},
    //          {id: '789', text: 'Item 3'}]
    $('#myTable tbody tr').map(function() {
      // $(this) is used more than once; cache it for performance.
      var $row = $(this);

      // For each row that's "mapped", return an object that
      //  describes the first and second <td> in the row.
      return {
        id: $row.find(':nth-child(1)').text(),
        text: $row.find(':nth-child(2)').text()
      };
    }).get();
The key to making this approach work is using the :nth-child selector to index into each row and retrieve the contents of the cells we’re interested in. This is very similar to how we handled the unordered list earlier, but can be applied to arbitrarily large structures such as wide HTML tables.
If you use this approach, one thing to keep in mind is that :nth-child uses one-based indexing. So, you must use :nth-child(1) to select the first cell, not :nth-child(0) as you might expect.
A general solution for tables
Using hard-coded :nth-child selectors works well enough in simple scenarios, but it’s brittle. If the table structure changes, code that relies on a particular column layout will break. Hard-coding the selectors for each column also becomes tedious when dealing with wider tables that have many columns.
So, as you apply this technique to larger or less predictable tables, you may desire a more general solution for extracting the data. One way of doing that is using the table’s column heading cells to build a basic schema of the table’s data.
Assuming your table has a proper <thead>, this is how you could extract an array of its column headings to use as a schema for mapping the rest of the table’s data:
    var columns = $('#myTable thead th').map(function() {
      // This assumes that your headings are suitable to be used as
      //  JavaScript object keys. If the headings contain characters
      //  that would be invalid, such as spaces or dashes, you should
      //  use a regex here to strip those characters out.
      return $(this).text();
    });
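The regex clean-up mentioned in that comment could look something like the sketch below. `toKey` is a hypothetical helper name, and the character whitelist is an assumption about what your backend tolerates; adjust it to suit.

```javascript
// Hypothetical helper: strip everything except letters, digits,
// underscores, and dollar signs from a heading.
function toKey(heading) {
  return heading.replace(/[^A-Za-z0-9_$]/g, '');
}

console.log(toKey('First Name'));
console.log(toKey('e-mail'));
```

Inside the mapping callback above, that would mean returning toKey($(this).text()) instead of the raw text. This sketch doesn’t handle headings that begin with a digit or that collapse to the same key after stripping, so check for those if your tables might contain them.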
With that column list handy, we can determine which column name any cell in the table should be filed under, given nothing more than its index in the row. Now we can automate the process that previously required those :nth-child selectors:
    var tableObject = $('#myTable tbody tr').map(function() {
      var row = {};

      // Find all of the table cells on this row.
      $(this).find('td').each(function(i) {
        // Determine the cell's column name by comparing its index
        //  within the row with the columns list we built previously.
        var columnName = columns[i];

        // Add a new property to the row object, using this cell's
        //  column name as the key and the cell's text as the value.
        row[columnName] = $(this).text();
      });

      // Finally, return the row's object representation, to be included
      //  in the array that .map() ultimately returns.
      return row;

    // Don't forget .get() to convert the jQuery set to a regular array.
    }).get();
That’s it.
With all the comments, that looks like more work than it actually is. Eleven lines of code for the entire ordeal isn’t bad considering that it will automatically handle the majority of tables you throw at it.
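If you’d like to experiment with the schema-driven logic away from a browser, here is a DOM-free sketch of the same idea. The `columns` and `rows` arrays are hand-written stand-ins for the headings and cell text that the jQuery selectors would normally supply.

```javascript
// Stand-in for the heading text extracted from the <thead>.
var columns = ['id', 'text'];

// Stand-in for the cell text of each <tr> in the <tbody>.
var rows = [
  ['123', 'Item 1'],
  ['456', 'Item 2'],
  ['789', 'Item 3']
];

var tableObject = rows.map(function(cells) {
  var row = {};

  // File each cell under the column name that shares its index.
  cells.forEach(function(cellText, i) {
    row[columns[i]] = cellText;
  });

  return row;
});

console.log(JSON.stringify(tableObject));
```

Because the column names come from the data rather than from the code, adding a third heading and a third cell per row requires no change to the mapping function.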
Conclusion
I’m going to stop here, before this gets any longer. I hope that you found this helpful and/or interesting.
Even if you don’t often convert HTML markup to JavaScript objects, do keep .map() in mind when you’re working with collections of any type. When you need it, the notion of map is an extremely useful aspect of JavaScript’s functional nature, but often goes overlooked.
What do you think?
You shouldn’t need to filter out “invalid” names from the table headers. So long as you’re using the [name] syntax for property access, you can use any string value (see http://jsfiddle.net/zwkYj/).
Also, on the first table example, you’ll probably want a var before $row.
You’re right about the missing var, thanks. I avoid using messy keys in the client-side objects because sending them to .NET server-side backends is such a headache (when possible at all). I suppose that wouldn’t be a problem for every type of backend though; especially if you’re using JavaScript on the server-side.
Yeah, I hadn’t thought about what happens once you get it into the JavaScript object…
Great post, thanks for the great overview!
This came to my Inbox right when I was struggling with this very thing! Thanks so much. Now to see if I can hook it into the rest of the problem. (I’m new to jQuery.)
I have to compare two tables (using ASP.NET): one is from a SQL database, the other is user input to be sorted before the comparison. If they match, then the original user input gets sent to a separate table in the database. If not, they keep redoing it until they get it right.
This solution helped me to get the generated gridview into an array I can compare to the sorted user input.
Thanks for the insight in this!
NB. the reason this works is that jQuery automatically converts DOM node lists to arrays — map() is an Array method, but node lists are NOT arrays. To do this without jQuery you’d have to convert your node lists to arrays before you can work with them like this.
But either way, you’ve gained nothing in this approach that you couldn’t have done with conventional iteration — don’t let handy shortcuts fool you into thinking that the script is doing less work!
Right. If it wasn’t clear in the post, the point of this is to use more concise/expressive code to accomplish the task, not that it’s faster today.
In the future, I wouldn’t be surprised if jQuery’s map() implementation is migrated to a polyfill that defers to JavaScript 1.6’s native method when available, so this same code would get an automatic performance upgrade at that time.
“expressive” code?
Yes!
Except that a collection of elements (such as items in a list) is a node list, not an array, and the map() method is for arrays, not node lists.
The point is, somewhere along the line — either in jQuery, or your own code if you’re not using jQuery — that node list has to be converted to an array; and that will require iterating through them. And if you’re going to iterate through them, you might as well use *those* iterators to compile the data, rather than have a secondary stage of processing.
However the functional code dices and slices this, the technique you’re suggesting will *always* be more work than the technique it replaces.
Most of the work is in the inner .find() selectors. The performance difference between using .map(), with .get() to extract an array, and building the array iteratively is a rounding error in the tests I’ve run in the past.
Here’s a jsPerf showing both approaches in context of this post’s code: http://jsperf.com/map-vs-loop
Even in IE compatibility mode, both complete in less than a millisecond. For something that should only run infrequently, I definitely prefer the more expressive .map() approach over prematurely optimizing for negligible gains.
This is a good post, Dave, and I love your articles. Have helped me greatly.
Now what if I have a table but there are a series of Drop Downs that I want to extract the selected IDs from. Is there a good clean way to go about that?
Thanks
Sure, you could use a selector something like this to find the dropdown within a table cell and return its value instead of the cell’s .text().
hello dave
i need some help as i am newbie to javascript (i dont know javascript )
what i need is
i have a html table that updates it values base on user interaction
its all ok
but it want to extract the data in table and display it in
text area box below
from what i understand is your script collect the data and store it in array but how do i display that information text area
please please help i really badly want to display table data information in
text area
i already have the script that changes value of tables based on interaction
http://jsfiddle.net/n9jSy/3/
thanks again
please reply
That’s a bit too complex for me to write the entire script for you. Since you already have the data in object form in JavaScript, there’s no need to use this extraction process. You’ll just want to build the lower area based on the data, just like you built the table. I’d recommend using jQuery Templates for both areas; it’s great at rendering HTML from JavaScript data objects.
thanks fro replying dave
that is not my script and im still in learning javascript :(
but is there any way that i can Copy table cells to the clipboard
through the click of button ?
thanks
You can’t access the clipboard directly from JavaScript. People usually use Flash for that.