Sometimes you want to order an array of strings with embedded numbers. But if we use the sort method from the array class we can have some big surprises (some information has been deleted for clarity):
irb>foo = ['foobar', 'a13bb32', 'a10', 'a13bb2c10',
'a2', 'a3', 'a13bb2c9', 'a13', 'a20', 'a13bb2',
'a13bb1', 'a13bb10', 'a13bb20', 'a13bb3', 'a13bb01']
irb>puts foo.sort
a10
a13
a13bb01
a13bb1
a13bb10
a13bb2
a13bb20
a13bb2c10
a13bb2c9
a13bb3
a13bb32
a2
a20
a3
foobar
The problem is that the sort method from the Array class cannot "see" the embedded numbers. It only sorts comparing each characters one at a time (usually with the ASCII numbers for the priority), so that 'a13' comes before 'a2' (because '1' comes before '2', the sort method doesn't compare '13' with '2'). Because we can open Ruby classes easily, we can add a new way to sort our strings with embedded numbers:
# --------8<--------
class Array
# Method which sort an array composed of strings with embedded numbers by
# the 'natural' representation of numbers inside a string.
def natcmp
reg_number = /\d+/
# We call the sort method of the Array class.
self.sort do |str1, str2|
# We try to find an embedded number
a = str1.match(reg_number)
b = str2.match(reg_number)
# If there is no number
if [a,b].include? nil
str1 <=> str2
else
while true
begin
# We compare strings before the number. If there
# are equal, we will have to compare the numbers
if (comp = a.pre_match <=> b.pre_match) == 0
# If the numbers are equal
comp = (a[0] == b[0]) ? comp = a[0] + a.post_match <=> b[0] + b.post_match :
comp = a[0].to_i <=> b[0].to_i
end
str1, str2 = a.post_match, b.post_match
a = str1.match(reg_number)
b = str2.match(reg_number)
rescue
break
end
end
comp
end
end
end
# Same as 'natcmp' but replace in place.
def natcmp!
self.replace(natcmp)
end
end
# --------8<--------
So now, if we try our new method, here's the result:
irb>puts foo.natcmp
a2
a3
a10
a13
a13bb1
a13bb01
a13bb2
a13bb2c9
a13bb2c10
a13bb3
a13bb10
a13bb20
a13bb32
a20
foobar
We can even use the 'natcmp!' method if we want to change the array in place. I hope this will help somebody.
Tuesday, May 1, 2007
Subscribe to:
Post Comments (Atom)
6 comments:
This is great. I need natural order sorting for filenames for a minor task automation script of mine, which I'm moving from multiple bash batch files to one Ruby program, and I was almost ready to write my own function when I found yours. This is perfect!
Thanks for writing this, and for putting it out here for everyone to benefit from!
This proves extremely helpful for a project I'm currently working on, thanks a ton!
How come something like this isn't in Ruby itself?
thanks, this is exactly what i was looking for!
Nice
But I've got two issues with this:
a) you name the method "natcmp", this is IMO a misnomer (even you call it "natural sorting" in the text)
b) putting it into Array is the wrong place. This requires the array to be of strings and makes it impossible (unreasonably hard) to sort objects that have a string attribute by it. Also it means that you have to add 2 components: comparison and sorting.
Putting the comparison into String is the better choice for those reasons IMO.
For an example see http://pastie.org/557709
I also added String#natsortkey to allow using it with sort_by.
Here is a new open source project written in ruby. Called NaturalSorter. Please take a look.
https://github.com/reiz/naturalsorter
OH. Here again as a real link:
https://github.com/reiz/naturalsorter
Post a Comment