1 2 3 4 5 6 7 8 9 10 11 12
module Enumerable def cluster sort.inject([[]]) do |out, n| if out.last.include?(n) or out.last.empty? out.last << n else out << [n] end out end end end
Refactorings
No refactoring yet !
Wolfbyte
July 18, 2008, July 18, 2008 04:27, permalink
How about this that does it iteratively. It has to go over the initial list 3 times so execution time would be 3n and it doesn't sort the output. It works in 3 steps.
The first phase gets a count of each item in the enumerable and stores it into a hash.
The second phase creates a new array. For each index in the original the hash is looked up to see how many of the items at this index were marked in phase 1 and produces an array of that many items in the new array. It then sets the count to 0 so that the rest of the elements in the original enumerable will be replaced with empty arrays when encountered. (Note that because of the way that map works these operations had to be reversed and a temporary variable introduced)
The third phase removes the empty lists from the new array.
1 2 3 4 5 6 7 8
module Enumerable def cluster counts = {} counts.default = 0 self.each { |i| counts[i] += 1 } self.map { |i| j, counts[i] = counts[i], 0; [i] * j }.reject { |i| i.empty? } end end
Joran Jessurun
July 18, 2008, July 18, 2008 08:16, permalink
This is not really different, but I think it is easyer to read.
1 2 3 4 5 6 7 8 9 10 11 12 13
module Enumerable def cluster result = [] sort.each do |value| if result.last && result.last.last == value result.last << value else result << [value] end end result end end
Emmett
July 18, 2008, July 18, 2008 18:03, permalink
One way to think about this is a hash with counts; that's probably going to be most efficient.
Another way to think about it is simply with appends and finds.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
# hash with counts module Enumerable def cluster h = Hash.new(0) each {|x| h[x] += 1} h.map {|k, count| Array.new(count, k)} end end # appends and finds module Enumerable def cluster a = Array.new each {|x| if sub = a.find {|y| y.first == x} sub << x else a << [x] end } a end end
Emmett
July 18, 2008, July 18, 2008 18:05, permalink
Oops, I just realized I can write appends-n-finds (which is natural order preserving, btw) much more elegantly
1 2 3 4 5 6 7 8 9 10 11 12 13
# appends and finds module Enumerable def cluster inject(Array.new) {|a, x| if sub = a.find {|y| y.first == x} sub << x a else a << [x] end } end end
Joran Jessurun
July 18, 2008, July 18, 2008 19:07, permalink
What about this one, it is a variation on with hash and counts, but I build the result directly into the hash.
1 2 3 4 5 6 7 8 9 10
module Enumerable def cluster result = {} each do |value| result[value] ||= [] result[value] << value end result.values end end
Joran Jessurun
July 18, 2008, July 18, 2008 19:17, permalink
Exactly the same solution as above, but with less lines of code. But I don't like it much because it is too cryptic.
1 2 3 4 5 6 7
module Enumerable def cluster result = Hash.new { |hash,key| hash[key] = [] } each { |value| result[value] << value } result.values end end
Jason Dew
July 18, 2008, July 18, 2008 23:36, permalink
If you don't need the order preserved and a hash is sufficient... If you want to keep the ordering, replace with OrderedHash in ActiveSupport (or Ruby 1.9). cluster_arrayed an implementation that returns an array of arrays.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
module Enumerable def cluster inject(Hash.new) do |all, one| all[one] = (all[one] || 0) + 1 all end end def cluster_arrayed cluster.inject(Array.new) do |all, (key, count)| all << [key] * count end end end
Jeremy Weiskotten
July 22, 2008, July 22, 2008 02:04, permalink
This is a pretty clean one-liner. It doesn't perform as well, and I would expect degradation on larger arrays.
1 2 3 4 5
module Enumerable def cluster self.uniq.sort.map { |u| self.select { |v| u == v } } end end
Wolfbyte
July 22, 2008, July 22, 2008 03:05, permalink
@Jeremy - Be aware that uniq is not a method of Enumerable. This means that if you call {:a=>'a', :b=>'b'}.cluster you will get a failure. Be sure to define a method only for classes that it will be acceptable for. This means that if you are extending the Enumerable module you can only rely on the Enumerable methods. The best bet to doing this is to build a minimal class that includes Enumerable for testing.
Not really any different from my first go but this is a 2-liner. It's practically golf now though and I think the original is clearer
1 2 3 4 5 6
module Enumerable def cluster c=inject(Hash.new(0)){|h,i|h[i]+=1;h} map{|i|j,c[i]=c[i],0;[i]*j}.reject{|i|i==[]} end end
Jeremy Weiskotten
July 22, 2008, July 22, 2008 15:16, permalink
@Wolfbyte, good catch. I suppose my method could be added to Array instead of Enumerable.
%w[app noot app mies zoo mies app zoo zo z].cluster
should result in:
[["app", "app", "app"], ["mies", "mies"], ["noot"], ["z"], ["zo"], ["zoo", "zoo"]]
Can this inject be written more naturally? I am not that good in fold and inject.
Also the result should not have to be sorted, maybe the natural order could be preserved somewhat?
What do you think?