JavaScript’s Dream Team: in praise of split and join

JavaScript is blessed with two remarkably powerful yet under-appreciated methods, split and join, which act as perfect counterparts. Their symmetry allows JavaScript’s array and string types to enjoy a unique coupling: arrays can easily be serialized to strings and back again, a feature we can leverage to good effect. In a moment we’ll explore some interesting applications – but first some introductions:


String.prototype.split(separator, limit)

Creates an array of substrings delimited by each occurrence of the separator. The optional limit argument sets the maximum number of members in the resulting array.

"85@@86@@53".split('@@'); //['85','86','53'];
"banana".split(); //["banana"]; //( thanks peter (-: )
"president,senate,house".split(',',2); //["president", "senate"]

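A few further behaviours of split are worth having on hand. The sketch below is illustrative only: the sample strings are invented, and the results follow from the language definition of split rather than from anything specific to this article.

```javascript
// An empty-string separator splits into individual characters
var chars = "abc".split('');                    // ["a", "b", "c"]

// The separator may also be a regular expression
var words = "one, two;three".split(/[,;]\s*/);  // ["one", "two", "three"]

// Adjacent separators produce empty-string members
var gaps = "a,,b".split(',');                   // ["a", "", "b"]

// The limit truncates the result; the remainder is discarded, not appended
var firstTwo = "a-b-c-d".split('-', 2);         // ["a", "b"]
```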
 
Array.prototype.join(separator)

Converts the elements of the array to strings, which are then concatenated into a single string using the optional separator string as glue. If no separator is supplied then a comma is used as the binding (which is essentially the same as the toString method of array).

["slugs","snails","puppy dog's tails"].join(' and '); //"slugs and snails and puppy dog's tails"
['Giants', 4, 'Rangers', 1].join(' '); //"Giants 4 Rangers 1"
[1962,1989,2002,2010].join(); //"1962,1989,2002,2010"
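
How join converts each member is worth a quick illustration, since it matters for some of the tricks that follow. This is a small sketch with made-up values; the variable names are only for demonstration.

```javascript
// null and undefined members become empty strings rather than "null"/"undefined"
var sparse = [1, null, undefined, 2].join('-');   // "1---2"

// Nested arrays are converted with their own default (comma) join
var nested = [1, [2, 3], 4].join(';');            // "1;2,3;4"

// An empty-string separator simply concatenates
var glued = ['a', 'b', 'c'].join('');             // "abc"

// split and join round-trip, provided no member contains the separator
var original = ['85', '86', '53'];
var roundTrip = original.join('@@').split('@@');  // ['85', '86', '53']
```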

 
Now let’s put them to work…

replaceAll

Here’s a simple function that, unlike the native replace method, will perform a global substring replacement without the use of regular expressions.

String.prototype.replaceAll = function(find, replaceWith) {
    return this.split(find).join(replaceWith);	 
}

"the man and the plan".replaceAll('the','a'); //"a man and a plan"

It performs slower than the native function for small strings with many single-character replacements (the trade-off is two extra function calls against a regex match), but is actually faster in Firefox when the string gets long and the regular expression runs to more than 2 or 3 characters.
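
One practical consequence of avoiding regular expressions is that the search string is always taken literally, so characters such as the dot need no escaping. The sketch below uses a hypothetical standalone helper, replaceAllLiteral, rather than a prototype extension; the name and the sample strings are assumptions for illustration.

```javascript
// Hypothetical standalone helper (not a prototype extension)
var replaceAllLiteral = function(text, find, replaceWith) {
    return text.split(find).join(replaceWith);
};

// The dot is treated as a plain character...
var viaSplit = replaceAllLiteral("a.b.c", ".", "-");        // "a-b-c"

// ...whereas an unescaped dot in a regex matches every character
var viaRegex = "a.b.c".replace(new RegExp(".", "g"), "-");  // "-----"

// Escaping the dot restores the literal behaviour
var viaEscaped = "a.b.c".replace(/\./g, "-");               // "a-b-c"
```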

occurrences

This method tallies the number of matches of a given substring. Again the implementation is straightforward and the invocation requires no regex.

String.prototype.occurrences = function(find, matchCase) {
    var text = this;
    //unless matchCase is truthy, compare lowercased copies of both strings
    matchCase || (find = find.toLowerCase(), text = text.toLowerCase());
    return text.split(find).length-1;
}

document.body.innerHTML.occurrences("div"); //google home page has 114
document.body.innerHTML.occurrences("/div"); //google home page has 57
"England engages its engineers".occurrences("eng",true); //2

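Two edge cases of the split-based count are easy to overlook: matches never overlap, and an empty search string splits the text into single characters. The sketch below uses a hypothetical standalone function, countOccurrences, and treating an empty search string as zero matches is my own assumption, not something the technique dictates.

```javascript
// Hypothetical standalone version of the counting technique
var countOccurrences = function(text, find) {
    // assumption: treat an empty search string as "no matches"
    if (find === '') { return 0; }
    return text.split(find).length - 1;
};

countOccurrences("aaaa", "aa");   // 2 (matches do not overlap; an overlapping count would be 3)
countOccurrences("banana", "an"); // 2
countOccurrences("banana", "x");  // 0
countOccurrences("abc", "");      // 0, whereas "abc".split("").length - 1 gives 2
```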
 
repeat

I stole this little gem from Prototype.js:

String.prototype.repeat = function(times) {
    return new Array(times+1).join(this);	 
}

"go ".repeat(3) + "Giants!"; //"go go go Giants!"

The beauty lies in the novel use of the join method. The focus is on the separator argument while the base array comprises only undefined member values. To illustrate the principle more clearly, let’s reproduce the above example in longhand:

[undefined,undefined,undefined,undefined].join("go ") + "Giants!";

Remember each array member is converted into a string (in this case an empty string) before being concatenated. The implementation of the repeat function is one of the few examples where defining the array via an array literal is not feasible.
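
For anyone who prefers not to extend String.prototype, the same trick can be wrapped in a plain function. This is a minimal sketch: repeatString is a hypothetical name, and the argument check is an assumption about what a caller would want.

```javascript
// Hypothetical standalone helper based on the same trick
var repeatString = function(str, times) {
    // assumption: reject anything that is not a non-negative integer
    // rather than rely on how new Array treats such lengths
    if (times < 0 || times !== Math.floor(times)) {
        throw new RangeError("times must be a non-negative integer");
    }
    // times + 1 empty slots have exactly `times` gaps between them
    return new Array(times + 1).join(str);
};

repeatString("go ", 3); // "go go go "
repeatString("ab", 1);  // "ab"
repeatString("ab", 0);  // ""
```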

Employing the limit param

I rarely use the split function’s optional limit param, but I conjured up an example that does:

var getDomain = function(url) {
    return url.split('/',3).join('/');
}

getDomain("http://www.aneventapart.com/2010/seattle/slides/"); 
//"http://www.aneventapart.com"
getDomain("https://addons.mozilla.org/en-US/firefox/bookmarks/"); 
//"https://addons.mozilla.org"

(for ‘domain’, read ‘protocol and domain’)
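
To see why a limit of 3 is the right number, it helps to look at the intermediate array. The snippet below is only an illustration; the second URL without a protocol is an invented input showing where the approach stops working.

```javascript
var url = "http://www.aneventapart.com/2010/seattle/slides/";

// The two slashes after the protocol produce an empty second member
var parts = url.split('/', 3);   // ["http:", "", "www.aneventapart.com"]

// Rejoining restores the "//" because of that empty member
var origin = parts.join('/');    // "http://www.aneventapart.com"

// Without a protocol the same limit keeps too much
var noProtocol = "www.aneventapart.com/2010/seattle/".split('/', 3).join('/');
// "www.aneventapart.com/2010/seattle"
```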

Modifying array members

If we add regex into the mix we can easily use join and split to modify the members of an array. Don’t be scared by the name of the function that follows – its task is merely to remove the given string from the front of each item in a given array.

var beheadMembers = function(arr, removeStr) {
    var regex = RegExp("[,]?" + removeStr);
    return arr.join().split(regex).slice(1);
}

//make an array containing only the numeric portion of flight numbers
beheadMembers(["ba015","ba129","ba130"],"ba"); //["015","129","130"]

 
Unfortunately this will fail in IE because it incorrectly omits the first empty member from the split. So now things get a little less pretty:

var beheadMembers = function(arr, removeStr) {
    var regex = RegExp("[,]?" + removeStr);
    var result = arr.join().split(regex);
    return result[0] && result || result.slice(1); //IE workaround
}
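
Because removeStr is concatenated straight into a regular expression, any special characters in it are interpreted as regex syntax (a "$", for instance, would be read as an end-of-input anchor). One possible refinement, sketched here with hypothetical helper names escapeRegExp and beheadMembersEscaped, is to escape the string first. It still assumes that every member starts with removeStr and that no member contains a comma.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex
var escapeRegExp = function(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
};

// Variant of beheadMembers that escapes removeStr first
var beheadMembersEscaped = function(arr, removeStr) {
    var regex = RegExp("[,]?" + escapeRegExp(removeStr));
    var result = arr.join().split(regex);
    // drop the leading empty member when the engine produces one
    return result[0] === "" ? result.slice(1) : result;
};

beheadMembersEscaped(["$10", "$25", "$99"], "$"); // ["10", "25", "99"]
```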

 
Why would we use this technique instead of simply using the array map method specified by ECMAScript 5?

["ba015","ba129","ba130"].map(function(e) {
	return e.replace('ba','')
}); //["015","129","130"] 

 
Well, in production code I’d generally use the native map implementation when it’s available (it’s not available in IE<9) – this example was mainly intended as an educational tool. But it’s also worth noting that the invocation syntax of the join/split technique is shorter and a little more direct. Most interestingly, it’s also very efficient. When the regex is pre-cached, it slightly outperforms map in FF and Safari even for very small arrays – and for larger arrays the map version is blown out of the water (in all browsers tested) because the join/split technique requires dramatically fewer function calls:

//test 1 - using join/split
var arr = [], x = 1000;
while (x--) {arr.push("ba" + x);}

var beheadMembers = function(arr, regex) {
    return arr.join().split(regex).slice(1);
}

var regex = RegExp("[,]?" + 'ba');
var timer = +new Date, y = 1000;
while(y--) {beheadMembers(arr,regex);};
+new Date - timer;

//FF 3.6     733ms
//Chrome 7   464ms
//Safari 5   701ms
//IE 8      1256ms

//test 2 - using native map function 
var arr = [], x = 1000;
while (x--) {arr.push("ba" + x);}

var timer = +new Date, y = 1000;
while(y--) {
    arr.map(function(e) {
        return e.replace('ba','')
    }); 
}
+new Date - timer;

//FF 3.6    2051ms
//Chrome 7   732ms
//Safari 5  1520ms
//IE 8      (not supported)

 
Pattern matching

Arrays require iteration to perform pattern searching, strings don’t. Regular expressions can be applied to strings, but not to arrays. The benefits of converting arrays to strings for pattern matching are potentially huge and beyond the scope of this article, but let’s at least scratch the surface with a basic example.

Suppose the results of a foot race are stored as members of an array. The intention is that the array should alternate the names of runners and their recorded times. We can verify this format with a join and a regular expression. The following code tests for accidentally omitted times by looking for two successive names.

var results = ['sunil', '23:09', 'bob', '22:09', 'carlos', 'mary', '22:59'];
var badData = results.join(',').match(/[a-zA-Z]+,[a-zA-Z]+/g);
badData; //["carlos,mary"]
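
The same idea can be turned around to validate the whole array in one go instead of hunting for a specific mistake. This is a rough sketch under stated assumptions: names are purely alphabetic, times look like mm:ss, and the wellFormed pattern and sample arrays are invented for illustration.

```javascript
// Assumption: names are purely alphabetic and times look like mm:ss
var wellFormed = /^[a-zA-Z]+,\d{2}:\d{2}(,[a-zA-Z]+,\d{2}:\d{2})*$/;

var good = ['sunil', '23:09', 'bob', '22:09', 'mary', '22:59'];
var bad  = ['sunil', '23:09', 'bob', '22:09', 'carlos', 'mary', '22:59'];

wellFormed.test(good.join(',')); // true
wellFormed.test(bad.join(','));  // false
```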

 
Wrap up

I hope I’ve demonstrated a few reasons to nominate split and join as JavaScript’s perfect couple. There are plenty of other satisfying uses for these stalwart workhorses; feel free to ping me with any favourites that I left off.

Further Reading

ECMA-262 5th Edition
15.4.4.5 Array.prototype.join
15.5.4.14 String.prototype.split

25 thoughts on “JavaScript’s Dream Team: in praise of split and join”

  1. I’ve set up some tests to compare the performance of the regular expression against split + join for your replaceAll method.
    http://jsperf.com/replace-regexp-vs-split-join

    The idea is good (actually, I’ve used it in a templating system to fix the /g option that does not work everywhere), but it seems that the regular expression still has the performance advantage over the call of 2 functions

    It would be interesting to test over longer strings and to test more of your other functions though; we might have a gem hidden somewhere :)

    • Hi JP – yes in my tests the cost of two function calls was too much – though I only tested for small strings.

      On the other hand split+join works much faster than array.map for large array mapping – though I did not test vs. simple for loops – which I suspect are faster due to many fewer function calls

  2. Minor fix… split called without parameters returns the string. Or rather:
    “If separator is undefined, then the result array contains just one String, which is the this value (converted to a String)”
    To split on character you need to give empty string as the separator…

  3. I can’t believe you left out my personal favorite : the StringBuilder

    var HTML = [];
    HTML.push("Hi");
    // more pushing of strings
    HTML.join(""); // empty separator; the default would insert commas

    For longer strings it really blows the pants off repeated concatenation.

    • @goyuix

      You’re right

      var a1 = +new Date;
      var i1 = 1000000, x1 = [];
      while (i1--) {
          x1.push(i1*i1*i1);
      }
      x1.join();
      console.log(+new Date - a1);
      //2817ms (windows/ff 3.6)
      //1285ms (mac/safari 5)
      
      var a2 = +new Date;
      var i2 = 1000000, x2 = '';
      while (i2--) {
          x2 = x2 + (i2*i2*i2);
      }
      x2;
      console.log(+new Date - a2);
      //5452ms (windows/ff 3.6)
      //1601ms (mac/safari 5)
      
      
  4. Cool stuff but your beheadMembers messes up if not all your members have the removeStr defined.

    beheadMembers(["ba015","ba129","b130"], "ba"); // ["015", "129,b130"]

    It’d be better as:
    // (didn’t test on all browsers)
    var beheadMembers = function(arr, removeStr) {
        var regex = RegExp("[,]?" + removeStr);
        var result = arr.join().split(regex).join(",").split(",");
        return result[0] && result || result.slice(1); //IE workaround
    }

    to ensure that you stripped what was there.

    • 1) In Firefox 3.6.12 split/join won by 25% in your own tests!
      2) The article never claimed it was a faster option – in fact I said from the outset it was slower – so not sure what your point is.

      When I share usage examples for language features I am not saying “please do this without thinking”. I’m not saying “its great”. I’m saying “this is an example of how it works”. (I wrote an article on RegEx too. Does that mean I’m saying RegEx is the answer to everything? ;-) )

      • I’ll answer your last question first:

        3) Since when did exploring the behavior of language features by example suggest “praises the use of .split() and .join() for just about anything in JavaScript”.

        That was just a joke, meant in a positive way — you shouldn’t take it literally. I really like this article and the other ones on this site.

        Don’t get me wrong, I’m not trying to attack you or anything; I just want to make a point and maybe have a bit of a discussion. No need to get angry, emotional or personal :)

        Now, for your other questions:

        1) In Firefox 3.6.12 split/join won by 25% in your own tests!

        In Chrome, replace is at least 150× faster. In Safari, it’s ± 3 times faster. That makes a 25% in another browser a very small difference IMHO.

        2) The article never claimed it was a faster option – in fact I said from the outset it was slower – so not sure what your point is.

        Your article says:

        [split/join] performs almost as well as the native function […]

        My point, since you asked, is that this is only the case for some browsers, and you probably should’ve been more specific not to confuse anyone.

        I realize you’re exploring the language and its features, and you don’t mean to say “please do this without thinking”, but I can imagine people who don’t know your site or who are new to JavaScript not getting that if you leave out vital information like this.

      • Mathias – good point about my wording around performance – when I wrote the article I did some very quick tests in ff only and saw a relatively small performance difference (~20%). I should have tested more before making claims

        About the other stuff – no worries – I actually edited my last comment to remove point 3 anyway – I realize it was bit harsh (the moderator has the benefit of editing his comments ;-) )

        thanks for helping with the investigation – good stuff

  5. It appears that the beheadMembers function can chop off the neck if strings have repeats of the removeStr at the beginning.

    And dismember a string into separate array elements if there are commas.

    beheadMembers( [ 'hi,x', 'hihix' ], 'hi' );
    ["", "x", "", "x"]

    Also “smartquotes” thwart my attempts at copy-pasta.

  6. Pingback: Javascript: concatenar cádenas con join() | EtnasSoft

  7. That repeat function is indeed a small gem. I immediately rewrote my number zero-padding function which used to contain a loop. Saving it here for posterity:

    // Only works correctly for positive numbers and zero.
    var zeropad = function (str, len) {
        str = str.toString();
        if (str.length < len) {
            return new Array(len - str.length).concat(str).join('0');
        } else {
            return str;
        }
    };

  8. TestComplete allows you to write test scripts in javascript but its file system helpers leave something to be desired – e.g GetContainingFolder(fullFilePath) chokes if the file path contains an ampersand. My regex game was weak today, so I turned to split-slice-join:

    .split('\\').slice(0, -1).join('\\')

    no idea on performance, but I think it’s neatly expressive.

  9. A very fresh look, you’ve got. I like it. But why do you use looping? In the global replacement scheme by way of join/split you can do without it. Here’s your code adjusted, which reduces processing time of either method, the 1st one still remaining much more effective.

    var regex = RegExp("[,]?" + 'ba');

    // prepare data
    var arr = [], x = 65535;
    while (x--) {arr.push("ba" + x);}
    console.log( arr );

    //test 1 - using join/split
    var timer = +new Date;
    var narr = arr.join().split(regex).slice(1); //beheadMembers
    console.log(+new Date - timer);

    //test 2 - using native map function
    var timer = +new Date;
    var marr = arr.map(function (e) { return e.replace('ba','') });
    console.log(+new Date - timer);
