I'm looking for an efficient way to group similar ActiveRecord objects into an array of arrays. The method below works but seems highly inefficient: I have to make a full copy of the array, do the grouping, and then run uniq to eliminate the duplicates. There must be a better way to handle this. I tried Enumerable#group_by, but that returns a hash.

Thanks for your help.

grouped = []
# arr is an array of ActiveRecord objects, shown here in inspect form:
# [#<Foo id: 1, status_id: 2, profile_id: 3>,#<Foo id: 2, status_id: 2, profile_id: 3>,#<Foo id: 3, status_id: 1, profile_id: 3>]
arr_copy = arr.dup
arr.each do |a|
  # collect every object that shares a's status_id and profile_id
  grouped << arr_copy.select { |copy|
    (copy.status_id == a.status_id) &&
      (copy.profile_id == a.profile_id)
  }
end
# necessary to eliminate dupes
final_group = grouped.uniq

# end result
# [[#<Foo id: 1, status_id: 2, profile_id: 3>,#<Foo id: 2, status_id: 2, profile_id: 3>],[#<Foo id: 3, status_id: 1, profile_id: 3>]]

Refactorings


Ben Marini

March 29, 2010 15:13

#group_by does what you need; you just have to grab the values of the hash afterward.

require 'rubygems'
require 'active_support'
# Struct stands in for the ActiveRecord model
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1,2,3),Foo.new(2,2,3),Foo.new(3,1,3)]
# group_by returns a hash; .values gives the array of arrays
res = arr.group_by(&:status_id).values
p res

toro04

March 30, 2010 18:12

Ben, good point about just getting the values back from group_by. I need to group by both status_id and profile_id, so I ended up doing the following. I did some benchmarking and was surprised to see that the group_by version did not perform much better than my first post, where I dup the array and then use uniq. Performance also drops off significantly when there are lots of objects in the array. I'm still not convinced I'm doing this optimally.

require 'rubygems'
require 'active_support'
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1,2,3),Foo.new(2,2,3),Foo.new(3,1,3),Foo.new(4,2,3)]
res = arr.group_by do |obj|
  arr.select {|f| f.status_id == obj.status_id && f.profile_id == obj.profile_id}
end.values

p res

# [[#<struct Foo id=1, status_id=2, profile_id=3>, #<struct Foo id=2, status_id=2, profile_id=3>, #<struct Foo id=4, status_id=2, profile_id=3>], [#<struct Foo id=3, status_id=1, profile_id=3>]]

Ben Marini

March 31, 2010 00:27

You don't need to do a select inside the group_by. #group_by groups elements by the return value of the block you pass it, so the block only has to return the key you want to group on. Try it this way and see how it performs:

res = arr.group_by { |f| [f.status_id, f.profile_id] }.values
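
To see why this works, it can help to look at the intermediate hash before calling .values. The following is only a sketch that reuses the Foo struct and sample data from earlier in the thread; the variable name by_pair is made up for illustration. The printed order assumes Ruby 1.9 or later, where hashes keep insertion order.

```ruby
Foo = Struct.new(:id, :status_id, :profile_id)
arr = [Foo.new(1, 2, 3), Foo.new(2, 2, 3), Foo.new(3, 1, 3), Foo.new(4, 2, 3)]

# group_by calls the block once per element and uses the return value as the
# hash key. Arrays are compared by value, so equal [status_id, profile_id]
# pairs end up under the same key.
by_pair = arr.group_by { |f| [f.status_id, f.profile_id] }

by_pair.each do |key, members|
  puts "#{key.inspect} => ids #{members.map(&:id).inspect}"
end
# [2, 3] => ids [1, 2, 4]
# [1, 3] => ids [3]

# Dropping the keys leaves the array of arrays.
res = by_pair.values
```

Each element is visited once, whereas a select inside the block rescans the whole array for every element.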

toro04

March 31, 2010 18:14

Ben, thanks for showing me the correct way to use group_by! Below are the new Benchmark results comparing the two different methods. The benchmark was run with an array of 500 objects. What a major improvement! You made my day.

Thanks again!

Dup/Select/Uniq  11.380000   0.100000  11.480000 ( 11.517070)
Group_By          0.020000   0.000000   0.020000 (  0.014287)
----------------------------------------- total: 11.500000sec
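
For anyone who wants to try a similar comparison, here is a rough sketch of how it could be set up with Ruby's standard Benchmark module. It is not the script that produced the numbers above: the data set (500 Struct objects with made-up status_id and profile_id values) and the method names select_uniq and grouped are assumptions for illustration, and timings with plain structs will differ from timings with real ActiveRecord objects.

```ruby
require 'benchmark'

Foo = Struct.new(:id, :status_id, :profile_id)

# Hypothetical data: 500 objects spread over 15 status/profile combinations.
arr = (1..500).map { |i| Foo.new(i, i % 5, i % 3) }

# First approach from the thread: one full scan per element, then uniq.
def select_uniq(arr)
  copy = arr.dup
  arr.map { |a|
    copy.select { |c| c.status_id == a.status_id && c.profile_id == a.profile_id }
  }.uniq
end

# Second approach: a single pass keyed on the [status_id, profile_id] pair.
def grouped(arr)
  arr.group_by { |f| [f.status_id, f.profile_id] }.values
end

Benchmark.bm(16) do |x|
  x.report("Dup/Select/Uniq") { select_uniq(arr) }
  x.report("Group_By")        { grouped(arr) }
end
```

Both methods should return the same groups; the difference is that the first does work proportional to the square of the array size, while the second does a single pass.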