JavaScript is blessed with two remarkably powerful yet under-appreciated methods, split and join, which act as perfect counterparts. Their symmetry allows JavaScript’s array and string types to enjoy a unique coupling: arrays can easily be serialized to strings and back again, a feature we can leverage to good effect. In a moment we’ll explore some interesting applications, but first some introductions:
String.prototype.split(separator, limit)
Creates an array of substrings delimited by each occurrence of the separator. The optional limit argument sets the maximum number of members in the resulting array.
"85@@86@@53".split('@@'); //['85','86','53']
"banana".split(); //["banana"] (thanks peter (-: )
"president,senate,house".split(',',2); //["president", "senate"]
Array.prototype.join(separator)
Converts the elements of the array to strings, which are then concatenated into a single string using the optional separator string as glue. If no separator is supplied then a comma is used as the binding (which is essentially the same as the toString method of array).
["slugs","snails","puppy dog's tails"].join(' and '); //"slugs and snails and puppy dog's tails"
['Giants', 4, 'Rangers', 1].join(' '); //"Giants 4 Rangers 1"
[1962,1989,2002,2010].join(); //"1962,1989,2002,2010"
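A few edge cases are worth keeping in mind before relying on the round trip between arrays and strings. The sketch below uses only standard split and join behaviour; the helper name roundTrip is hypothetical and exists purely for illustration.

```javascript
// An empty-string separator splits into individual characters.
var letters = "banana".split("");             // ["b","a","n","a","n","a"]

// A regular expression can serve as the separator.
var fields = "a, b ,c".split(/\s*,\s*/);      // ["a","b","c"]

// join renders null and undefined members as empty strings.
var gaps = [1, null, undefined, 2].join("-"); // "1---2"

// Hypothetical helper: serialize an array and read it back.
function roundTrip(items, separator) {
  return items.join(separator).split(separator);
}

var clean = roundTrip(["85", "86", "53"], "@@"); // ["85","86","53"]

// The trip is lossy: members come back as strings, and a member that
// contains the separator is split in two.
var lossy = roundTrip([1, "2,3"], ",");          // ["1","2","3"]
```

Choosing a separator that cannot occur in the data (as '@@' presumably does in the first example) is what keeps the round trip faithful.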
Now let’s put them to work…
replaceAll
Here’s a simple function that, unlike the native replace method, will perform a global substring replacement without the use of regular expressions.
String.prototype.replaceAll = function(find, replaceWith) {
  return this.split(find).join(replaceWith);
}
"the man and the plan".replaceAll('the','a'); //"a man and a plan"
It performs slower than the native function for small strings with many single-character replacements (the trade-off is two extra function calls against a regex match) but is actually faster in Mozilla when the string gets long and the regular expression runs to more than 2 or 3 chars.
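One practical consequence of avoiding regular expressions is that the search text is taken literally, so characters such as . or ? need no escaping. The following sketch illustrates this with a standalone function; the name replaceAllLiteral is an assumption made for this example, chosen partly to avoid overwriting the native String.prototype.replaceAll that newer engines provide.

```javascript
// Hypothetical standalone variant of the split/join replacement.
function replaceAllLiteral(text, find, replaceWith) {
  return text.split(find).join(replaceWith);
}

// "." is matched as a literal dot, not as "any character".
var viaSplit = replaceAllLiteral("1.2.3", ".", "-"); // "1-2-3"

// The equivalent global regex must escape the dot...
var viaRegex = "1.2.3".replace(/\./g, "-");          // "1-2-3"

// ...because an unescaped dot matches every character.
var unescaped = "1.2.3".replace(/./g, "-");          // "-----"
```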
occurrences
This method tallies the number of matches of a given substring. Again the implementation is straightforward and the invocation requires no regex.
String.prototype.occurrences = function(find, matchCase) {
  var text = this;
  matchCase || (find = find.toLowerCase(), text = text.toLowerCase());
  return text.split(find).length-1;
}
document.body.innerHTML.occurrences("div"); //google home page has 114
document.body.innerHTML.occurrences("/div"); //google home page has 57
"England engages its engineers".occurrences("eng",true); //2
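Note that split consumes each match as it goes, so the tally is of non-overlapping matches. A minimal sketch, using a hypothetical standalone function countOccurrences rather than a String.prototype extension:

```javascript
// Hypothetical standalone version of the split-based counter.
function countOccurrences(text, find, matchCase) {
  if (!matchCase) {
    find = find.toLowerCase();
    text = text.toLowerCase();
  }
  return text.split(find).length - 1;
}

countOccurrences("England engages its engineers", "eng");       // 3
countOccurrences("England engages its engineers", "eng", true); // 2

// Overlapping matches are not counted: "aa" fits into "aaaa" three
// times if overlaps are allowed, but split only finds two.
countOccurrences("aaaa", "aa", true);                           // 2
```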
repeat
I stole this little gem from Prototype.js:
String.prototype.repeat = function(times) {
  return new Array(times+1).join(this);
}
"go ".repeat(3) + "Giants!"; //"go go go Giants!"
The beauty lies in the novel use of the join method. The focus is on the separator argument while the base array comprises only undefined member values. To illustrate the principle more clearly, let’s reproduce the above example in longhand:
[undefined,undefined,undefined,undefined].join("go ") + "Giants!";
Remember each array member is converted into a string (in this case an empty string) before being concatenated. The implementation of the repeat function is one of the few examples where defining the array via an array literal is not feasible.
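The array-literal longhand only works for a fixed count, which is why the constructor form is needed. The sketch below wraps the same trick in a hypothetical standalone function, repeatString, and adds a guard (an addition, not part of the Prototype.js original) because new Array throws a RangeError for negative or fractional lengths.

```javascript
// Hypothetical standalone version of the join-based repeat.
function repeatString(text, times) {
  // Coerce to a non-negative integer; NaN and negatives become 0.
  var count = Math.max(0, Math.floor(times) || 0);
  // An array of count + 1 empty slots has count gaps between members,
  // and join fills every gap with the separator.
  return new Array(count + 1).join(text);
}

repeatString("go ", 3) + "Giants!"; // "go go go Giants!"
repeatString("ab", 0);              // ""
repeatString("ab", -2);             // ""
```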
Employing the limit param
I rarely use the split function’s optional limit param, but I conjured up an example that does:
var getDomain = function(url) {
  return url.split('/',3).join('/');
}
getDomain("http://www.aneventapart.com/2010/seattle/slides/"); //"http://www.aneventapart.com"
getDomain("https://addons.mozilla.org/en-US/firefox/bookmarks/"); //"https://addons.mozilla.org"
(for ‘domain’, read ‘protocol and domain’)
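One thing to be aware of is that limit truncates the result: anything after the last permitted member is discarded rather than kept as a remainder. When the remainder matters, split and join can be combined to recover it. The function splitOnce below is a hypothetical helper, sketched only to illustrate the difference.

```javascript
// limit discards the tail of the string.
var truncated = "key=a=b".split("=", 2); // ["key","a"]

// Hypothetical helper: split at the first separator only and keep
// the rest of the string intact.
function splitOnce(text, separator) {
  var parts = text.split(separator);
  if (parts.length === 1) {
    return parts; // separator not present
  }
  var head = parts.shift();
  return [head, parts.join(separator)];
}

var pair = splitOnce("key=a=b", "=");    // ["key","a=b"]
var single = splitOnce("novalue", "=");  // ["novalue"]
```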
Modifying array members
If we add regex into the mix we can easily use join and split to modify the members of an array. Don’t be scared by the name of the function that follows; its task is merely to remove the given string from the front of each item in a given array.
var beheadMembers = function(arr, removeStr) {
  var regex = RegExp("[,]?" + removeStr);
  return arr.join().split(regex).slice(1);
}
//make an array containing only the numeric portion of flight numbers
beheadMembers(["ba015","ba129","ba130"],"ba"); //["015","129","130"]
Unfortunately this will fail in IE because it incorrectly omits the first empty member from the split. So now things get a little less pretty:
var beheadMembers = function(arr, removeStr) {
  var regex = RegExp("[,]?" + removeStr);
  var result = arr.join().split(regex);
  return result[0] && result || result.slice(1); //IE workaround
}
Why would we use this technique instead of simply using the array map method specified by ECMAScript 5?
["ba015","ba129","ba130"].map(function(e) { return e.replace('ba','') }); //["015","129","130"]
Well, in production code I’d generally use the native map implementation when it’s available (it’s not available in IE<9); this example was mainly intended as an educational tool. But it’s also worth noting that the invocation syntax of the join/split technique is shorter and a little more direct. Most interestingly, it’s also very efficient. When the regex is pre-cached, it slightly outperforms map in FF and Safari even for very small arrays, and for larger arrays the map version is blown out of the water (in all browsers) because the join/split technique requires dramatically fewer function calls:
//test 1 - using join/split
var arr = [], x = 1000;
while (x--) {arr.push("ba" + x);}
var beheadMembers = function(arr, regex) {
  return arr.join().split(regex).slice(1);
}
var regex = RegExp("[,]?" + 'ba');
var timer = +new Date, y = 1000;
while(y--) {beheadMembers(arr,regex);};
+new Date - timer;
//FF 3.6 733ms
//Ch 7 464ms
//Sa 5 701ms
//IE 8 1256ms

//test 2 - using native map function
var arr = [], x = 1000;
while (x--) {arr.push("ba" + x);}
var timer = +new Date, y = 1000;
while(y--) {
  arr.map(function(e) {
    return e.replace('ba','')
  });
}
+new Date - timer;
//FF 3.6 2051ms
//Ch 7 732ms
//Sa 5 1520ms
//IE 8 (Not supported)
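The join/split version of beheadMembers assumes that every member starts with removeStr, that no member contains a comma, and that removeStr contains no regex metacharacters. The sketch below is one possible variation that relaxes those assumptions while still making a single pass over a joined string. The names stripPrefix and escapeRegExp are hypothetical, the newline separator assumes that no member contains a line break, and its performance has not been measured against the figures above.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of beheadMembers built on join/replace/split.
function stripPrefix(items, prefix) {
  if (items.length === 0) {
    return [];
  }
  // With the "m" flag, ^ matches at the start of every line, so the
  // prefix is removed at most once per member and only from the front.
  var atLineStart = new RegExp("^" + escapeRegExp(prefix), "gm");
  return items.join("\n").replace(atLineStart, "").split("\n");
}

stripPrefix(["ba015", "ba129", "b130"], "ba"); // ["015","129","b130"]
stripPrefix(["hi,x", "hihix"], "hi");          // [",x","hix"]
```

Members that lack the prefix pass through unchanged, and a repeated prefix is only removed once.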
Pattern matching
Arrays require iteration to perform pattern searching, strings don’t. Regular expressions can be applied to strings, but not to arrays. The benefits of converting arrays to strings for pattern matching are potentially huge and beyond the scope of this article, but let’s at least scratch the surface with a basic example.
Suppose the results of a foot race are stored as members of an array. The intention is that the array should alternate the names of runners and their recorded times. We can verify this format with a join and a regular expression. The following code tests for accidentally omitted times by looking for two successive names.
var results = ['sunil', '23:09', 'bob', '22:09', 'carlos', 'mary', '22:59'];
var badData = results.join(',').match(/[a-zA-Z]+,[a-zA-Z]+/g);
badData; //["carlos,mary"]
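The same idea can validate the whole array in one step instead of hunting for a specific defect. In this sketch the function name isWellFormed and the assumed time format (one or two digits, a colon, two digits) are illustrative choices, not part of the original example.

```javascript
// One name,time pair followed by any number of further pairs.
var RESULTS_PATTERN = /^[a-zA-Z]+,\d{1,2}:\d{2}(,[a-zA-Z]+,\d{1,2}:\d{2})*$/;

// Hypothetical validator: true only if names and times strictly alternate.
function isWellFormed(results) {
  return RESULTS_PATTERN.test(results.join(","));
}

isWellFormed(["sunil", "23:09", "bob", "22:09"]);                   // true
isWellFormed(["sunil", "23:09", "bob", "22:09", "carlos", "mary"]); // false
```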
Wrap up
I hope I’ve demonstrated a few reasons to nominate split and join as JavaScript’s perfect couple. There are plenty of other satisfying uses for these stalwart workhorses; feel free to ping me with any favourites that I left off.
Further Reading
ECMA-262 5th Edition
15.4.4.5 Array.prototype.join
15.5.4.14 String.prototype.split
I’ve set up some tests to compare the performance of the regular expression against split + join for your replaceAll method.
http://jsperf.com/replace-regexp-vs-split-join
The idea is good (actually, I’ve used it in a templating system to fix the /g option that does not work everywhere), but it seems that the regular expression still has the performance advantage over the call of 2 functions
It would be interesting to test over longer strings and to test more of your other functions though, we might have a gem hidden somewhere 🙂
Hi JP – yes in my tests the cost of two function calls was too much – though I only tested for small strings.
On the other hand split+join works much faster than array.map for large array mapping – though I did not test vs. simple for loops – which I suspect are faster due to many fewer function calls
Minor fix… split called without parameters returns the string. Or rather:
“If separator is undefined, then the result array contains just one String, which is the this value (converted to a String)”
To split on each character you need to give an empty string as the separator…
whoops – yeah thanks Peter
I can’t believe you left out my personal favorite : the StringBuilder
var HTML = [];
HTML.push("Hi");
// more pushing of strings
HTML.join(""); // an empty separator; join() with no argument would insert commas
For longer strings it really blows the pants off repeated concatenation.
@goyuix
You’re right
FYI: On Windows 7 (x64, Intel Core 2 Duo) using Chrome 9 (dev) and Firefox 4 (minefield) the StringBuilder Array.join() performs worse than the string concatenation.
According to High Performance JavaScript, array joining is only faster than concatenation in IE.
Cool stuff, but your beheadMembers messes up if not all of your members contain the removeStr.
beheadMembers(["ba015","ba129","b130"], "ba"); // ["015", "129,b130"]
It’d be better as:
// (didn’t test on all browsers)
var beheadMembers = function(arr, removeStr) {
  var regex = RegExp("[,]?" + removeStr);
  var result = arr.join().split(regex).join(",").split(",");
  return result[0] && result || result.slice(1); //IE workaround
}
to ensure that you stripped what was there.
@frio80 nice catch – thanks
Using split() and join() for global replaces in strings is a very bad idea performance-wise. Check out this jsPerf test case: http://jsperf.com/global-string-replacement
Here’s another test case with a 5000-char string: http://jsperf.com/global-string-replacement/3
Your test uses like 140k chars. Have you tested in other browsers than Firefox 3.6?
split/join comfortably won your test on my mozilla browser (ff3.6.12) 🙂
By the way, yes I tried Chrome and split/join got its butt kicked 😉 (which is why I failed to mention it!)
In Firefox 3.6 and Opera 10.6 and 11, split/join wins by a very small margin in that last test (with the long string). http://jsperf.com/global-string-replacement/3
.replace() is much faster than split/join in every other browser, including Firefox 4.0b8pre.
1) In Firefox 3.6.12 split/join won by 25% in your own tests!
2) The article never claimed it was a faster option – in fact I said from the outset it was slower – so not sure what your point is.
When I share usage examples for language features I am not saying “please do this without thinking”. I’m not saying “it’s great”. I’m saying “this is an example of how it works”. (I wrote an article on RegEx too. Does that mean I’m saying RegEx is the answer to everything? 😉 )
I’ll answer your last question first:
That was just a joke, meant in a positive way — you shouldn’t take it literally. I really like this article and the other ones on this site.
Don’t get me wrong, I’m not trying to attack you or anything; I just want to make a point and maybe have a bit of a discussion. No need to get angry, emotional or personal 🙂
Now, for your other questions:
In Chrome, replace is at least 150× faster. In Safari, it’s ± 3 times faster. That makes a 25% in another browser a very small difference IMHO.
Your article says:
My point, since you asked, is that this is only the case for some browsers, and you probably should’ve been more specific not to confuse anyone.
I realize you’re exploring the language and its features, and you don’t mean to say “please do this without thinking”, but I can imagine people who don’t know your site or who are new to JavaScript not getting that if you leave out vital information like this.
Mathias – good point about my wording around performance – when I wrote the article I did some very quick tests in ff only and saw a relatively small performance difference (~20%). I should have tested more before making claims
About the other stuff – no worries – I actually edited my last comment to remove point 3 anyway – I realize it was bit harsh (the moderator has the benefit of editing his comments 😉 )
thanks for helping with the investigation – good stuff
In this blog we always can find interesting ideas 🙂
It appears that the beheadMembers function can chop off the neck if strings have repeats of the removeStr at the beginning.
And dismember a string into separate array elements if there are commas.
beheadMembers( [ 'hi,x', 'hihix' ], 'hi' );
["", "x", "", "x"]
Also “smartquotes” thwart my attempts at copy-pasta.
That repeat function is indeed a small gem. I immediately rewrote my number zero-padding function which used to contain a loop. Saving it here for posterity:
// Only works correctly for positive numbers and zero.
var zeropad = function (str, len) {
  str = str.toString();
  if (str.length < len) {
    return new Array(len - str.length).concat(str).join('0');
  } else {
    return str;
  }
};
TestComplete allows you to write test scripts in javascript but its file system helpers leave something to be desired – e.g GetContainingFolder(fullFilePath) chokes if the file path contains an ampersand. My regex game was weak today, so I turned to split-slice-join:
.split('\\').slice(0, -1).join('\\')
no idea on performance, but I think it’s neatly expressive.
So i did a couple of performance tests and it appears to be dependent on the browser:
http://jsperf.com/getcontainingfolder
I’m intrigued by the opera result but it wasn’t me so i can’t comment.
A very fresh look, you’ve got. I like it. But why do you use looping? In the global replacement scheme by way of join/split you can do without it. Here’s your code adjusted, which reduces processing time of either method, the 1st one still remaining much more effective.
var regex = RegExp("[,]?" + 'ba');
// prepare data
var arr = [], x = 65535;
while (x--) {arr.push("ba" + x);}
console.log( arr );
//test 1 - using join/split
var timer = +new Date;
var narr = arr.join().split(regex).slice(1); //beheadMembers
console.log(+new Date - timer);
//test 2 - using native map function
var timer = +new Date;
var marr = arr.map(function (e) { return e.replace('ba','') });
console.log(+new Date - timer);
Which is better: replaceAll or the split/join combination?