Avatar

%w[app noot app mies zoo mies app zoo zo z].cluster

should result in:

[["app", "app", "app"], ["mies", "mies"], ["noot"], ["z"], ["zo"], ["zoo", "zoo"]]

Can this inject be written more naturally? I am not that good in fold and inject.
Also the result should not have to be sorted, maybe the natural order could be preserved somewhat?

What do you think?

1
2
3
4
5
6
7
8
9
10
11
12
module Enumerable
    def cluster 
        sort.inject([[]]) do |out, n|
            if out.last.include?(n) or out.last.empty?
                out.last << n
            else    
                out << [n]
            end     
            out     
        end     
    end
end

Refactorings

No refactoring yet !

861f311cc4a077c439099d0e5d251e73

Wolfbyte

July 18, 2008, July 18, 2008 04:27, permalink

No rating. Login to rate!

How about this that does it iteratively. It has to go over the initial list 3 times so execution time would be 3n and it doesn't sort the output. It works in 3 steps.

The first phase gets a count of each item in the enumerable and stores it into a hash.

The second phase creates a new array. For each index in the original the hash is looked up to see how many of the items at this index were marked in phase 1 and produces an array of that many items in the new array. It then sets the count to 0 so that the rest of the elements in the original enumerable will be replaced with empty arrays when encountered. (Note that because of the way that map works these operations had to be reversed and a temporary variable introduced)

The third phase removes the empty lists from the new array.

1
2
3
4
5
6
7
8
module Enumerable
  def cluster
    counts = {}
    counts.default = 0
    self.each { |i| counts[i] += 1 }
    self.map { |i| j, counts[i] = counts[i], 0; [i] * j }.reject { |i| i.empty? }
  end
end
429d95962c6ae646c107a812bfc49a9d

Joran Jessurun

July 18, 2008, July 18, 2008 08:16, permalink

1 rating. Login to rate!

This is not really different, but I think it is easyer to read.

1
2
3
4
5
6
7
8
9
10
11
12
13
module Enumerable
  def cluster 
    result = []
    sort.each do |value|
      if result.last && result.last.last == value
        result.last << value
      else
        result << [value]
      end
    end
    result    
  end
end
A603b733176e19e6f2b2dccec29294c0

Emmett

July 18, 2008, July 18, 2008 18:03, permalink

2 ratings. Login to rate!

One way to think about this is a hash with counts; that's probably going to be most efficient.

Another way to think about it is simply with appends and finds.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# hash with counts
module Enumerable
    def cluster 
        h = Hash.new(0)
        each {|x| h[x] += 1}
        h.map {|k, count| Array.new(count, k)}     
    end
end

# appends and finds
module Enumerable
    def cluster
       a = Array.new
       each {|x|
         if sub = a.find {|y| y.first == x}
           sub << x
         else
           a << [x]
         end
       }
       a
    end
end
A603b733176e19e6f2b2dccec29294c0

Emmett

July 18, 2008, July 18, 2008 18:05, permalink

1 rating. Login to rate!

Oops, I just realized I can write appends-n-finds (which is natural order preserving, btw) much more elegantly

1
2
3
4
5
6
7
8
9
10
11
12
13
# appends and finds
module Enumerable
    def cluster
       inject(Array.new) {|a, x|
         if sub = a.find {|y| y.first == x}
           sub << x
           a
         else
           a << [x]
         end
       }
    end
end
429d95962c6ae646c107a812bfc49a9d

Joran Jessurun

July 18, 2008, July 18, 2008 19:07, permalink

1 rating. Login to rate!

What about this one, it is a variation on with hash and counts, but I build the result directly into the hash.

1
2
3
4
5
6
7
8
9
10
module Enumerable
  def cluster
    result = {}
    each do |value|
      result[value] ||= []
      result[value] << value
    end
    result.values
  end
end
429d95962c6ae646c107a812bfc49a9d

Joran Jessurun

July 18, 2008, July 18, 2008 19:17, permalink

2 ratings. Login to rate!

Exactly the same solution as above, but with less lines of code. But I don't like it much because it is too cryptic.

1
2
3
4
5
6
7
module Enumerable
  def cluster
    result = Hash.new { |hash,key| hash[key] = [] }
    each { |value| result[value] << value }
    result.values
  end
end
D16d53391068ff0830269149b060789d

Jason Dew

July 18, 2008, July 18, 2008 23:36, permalink

1 rating. Login to rate!

If you don't need the order preserved and a hash is sufficient... If you want to keep the ordering, replace with OrderedHash in ActiveSupport (or Ruby 1.9). cluster_arrayed an implementation that returns an array of arrays.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
module Enumerable
  def cluster
    inject(Hash.new) do |all, one|
      all[one] = (all[one] || 0) + 1
      all
    end
  end

  def cluster_arrayed
    cluster.inject(Array.new) do |all, (key, count)|
      all << [key] * count
    end
  end
end
5170ca260dbd2cdfd5a887a4dba7636f

Jeremy

July 22, 2008, July 22, 2008 02:04, permalink

No rating. Login to rate!

This is a pretty clean one-liner. It doesn't perform as well, and I would expect degradation on larger arrays.

1
2
3
4
5
module Enumerable
  def cluster
    self.uniq.sort.map { |u| self.select { |v| u == v } }
  end
end
861f311cc4a077c439099d0e5d251e73

Wolfbyte

July 22, 2008, July 22, 2008 03:05, permalink

No rating. Login to rate!

@Jeremy - Be aware that uniq is not a method of Enumerable. This means that if you call {:a=>'a', :b=>'b'}.cluster you will get a failure. Be sure to define a method only for classes that it will be acceptable for. This means that if you are extending the Enumerable module you can only rely on the Enumerable methods. The best bet to doing this is to build a minimal class that includes Enumerable for testing.

Not really any different from my first go but this is a 2-liner. It's practically golf now though and I think the original is clearer

1
2
3
4
5
6
module Enumerable
  def cluster
    c=inject(Hash.new(0)){|h,i|h[i]+=1;h}
    map{|i|j,c[i]=c[i],0;[i]*j}.reject{|i|i==[]}
  end
end
5170ca260dbd2cdfd5a887a4dba7636f

Jeremy

July 22, 2008, July 22, 2008 15:16, permalink

No rating. Login to rate!

@Wolfbyte, good catch. I suppose my method could be added to Array instead of Enumerable.

Your refactoring





Format Copy from initial code

or Cancel