1 2 3 4 5 6 7 8 9
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} intersection = [] pools.each do |k,v| not_k = pools.dup not_k.delete_if{|nk,nv| nk == k } inter << (v & not_k.values.flatten) end intersection.flatten.uniq
Refactorings
No refactoring yet !
Jeremy
July 31, 2008, July 31, 2008 21:59, permalink
Here are a couple of options, depending on which version of Ruby you're using. Unfortunately Array#count is not available in Ruby 1.8.6 (although you could add it to the class pretty easily if you wanted to).
1 2 3 4 5 6 7 8 9 10
# Ruby 1.8.6 pools = {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]} values = pools.values.flatten intersection = values.select { |v| values.select { |w| w == v }.size > 1 }.uniq # Ruby 1.8.7+ pools = {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]} values = pools.values.flatten intersection = values.select { |v| values.count(v) > 1 }.uniq
Maciej Piechotka
July 31, 2008, July 31, 2008 22:03, permalink
May be something like that
1 2 3
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} rej = Hash.new {0} pools.values.flatten.sort.reject! {|elem| rej[elem]++ != 1}
AviFlombaum
July 31, 2008, July 31, 2008 22:14, permalink
Nice! However, the issue with your method is on large data sets that select statement requires you to iterate over each element in the array which adds significant time to the task. My method above, while not as pretty and concise, save significant processing time. I guess I should have mentioned that performance is a huge issue on this one.
badcarl
July 31, 2008, July 31, 2008 23:06, permalink
1 2 3 4 5
require "rubygems" require "facets/enumerable/probability" pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} intersection = pools.values.flatten.frequency.delete_if { |k, v| v == 1 }.keys
AviFlombaum
August 1, 2008, August 01, 2008 02:21, permalink
badcarl - thanks! That's awesome and fast - but in the end, I like my method best - requires no gems, is hella fast (uses less mem and runs faster on large sets) and I can pretty it up a bit. Thanks though, you both rock!
badcarl
August 1, 2008, August 01, 2008 04:06, permalink
Here is another way. Keep a set of numbers that have appeared once. Add things to the intersection if they've appeared already.
1 2 3 4 5 6 7
require "set" pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} intersection, appeared = Set.new, Set.new pools.values.flatten.each do |value| appeared.include?(value) ? intersection << value : appeared << value end
Maciej Piechotka
August 1, 2008, August 01, 2008 07:03, permalink
The sort is unnecessary in my code
1 2 3
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} rej = Hash.new {0} pools.values.flatten.reject! {|elem| rej[elem]++ != 1}
kmcd
August 1, 2008, August 01, 2008 10:08, permalink
Hi all,
This is refactoring is similar to badcarls set based suggestion. I haven't analysed the complexity of this solution yet. I'll do a performance test & dig into my old big O notes ... later.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
class Hash # Returns duplicate elements of values arrays. Usage: # {1 => [1,2,3], 2 => [3,4,5], 3 => [5,6,7]}.intersection # => [3,5] def intersection self.values.flatten.duplicates end end class Array # Returns an array of duplicate elements. Usage: # [1,2,3,3].duplicates # => [3] def duplicates dups, unique = [], [] self.each { |e| unique.member?(e) ? dups.push(e) : unique.push(e) } dups end end require 'test/unit' class TestArrayDuplicates < Test::Unit::TestCase def test_return_empty_with_all_uniq assert_equal [1,2,3].duplicates, [] end def test_return_duplicates_with_duplicate_elements assert_equal [1,2,3,3].duplicates, [3] end end class TestHashIntersection < Test::Unit::TestCase # Using setup for clarity. # assert_equal wont accept an unitialised hash # ie assert_equal {}, {} gives a syntax error def setup @dups_hash = {1 => [1,2], 2 => [2,3], 3 => [3,4]} @uniq_hash = {1 => [1,2,3], 2 => [4,5], 3 => [6,7]} end def test_return_empty_with_all_uniq assert_equal @uniq_hash.intersection, [] end def test_return_duplicates_with_duplicate_elements assert_equal @dups_hash.intersection, [2,3] end end
rjspotter
August 6, 2008, August 06, 2008 22:01, permalink
I'm basicly golfing at this point but it was a fun exersize
1 2 3 4 5 6 7
pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} pools.inject([]) {|m,x| m << x[1].select {|y| pools[x[0] + 1].include?(y) unless pools[x[0] + 1].nil?}}.flatten #or pools.inject([]) {|m,x| m << (x[1] & pools[x[0] + 1] unless pools[x[0] + 1].nil?)}.compact.flatten
mike
August 11, 2008, August 11, 2008 01:15, permalink
Read this thread and figured Enumerable#nonuniq from facets run on pools.values.flatten would probably be faster... it was!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
irb(main):001:0> @pools = {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} => {1=>[1, 2, 3], 2=>[3, 4, 5], 3=>[5, 6, 7]} irb(main):002:0> irb(main):003:0* def method1 irb(main):004:1> intersection = [] irb(main):005:1> @pools.each do |k,v| irb(main):006:2* not_k = @pools.dup irb(main):007:2> not_k.delete_if{|nk,nv| nk == k } irb(main):008:2> intersection << (v & not_k.values.flatten) irb(main):009:2> end irb(main):010:1> intersection.flatten.uniq irb(main):011:1> end => nil irb(main):012:0> irb(main):013:0* module Enumerable irb(main):014:1> # from facets irb(main):015:1* def nonuniq irb(main):016:2> h1 = {} irb(main):017:2> h2 = {} irb(main):018:2> each {|i| irb(main):019:3* h2[i] = true if h1[i] irb(main):020:3> h1[i] = true irb(main):021:3> } irb(main):022:2> h2.keys irb(main):023:2> end irb(main):024:1> end => nil irb(main):025:0> irb(main):026:0* require 'rubygems'; require 'benchmark' => true irb(main):027:0> Benchmark.bm do |x| irb(main):028:1* x.report("method1") { 100_000.times { method1 }} irb(main):029:1> x.report("method2") { 100_000.times { @pools.values.flatten.nonuniq }} irb(main):030:1> end user system total real method1 4.070000 0.440000 4.510000 ( 4.573123) method2 1.790000 0.250000 2.040000 ( 2.046597) => true
AviFlombaum
August 11, 2008, August 11, 2008 02:30, permalink
Can you benchmark off something like this to find all intersections?
{1 => [1..50000], 2=> [20000..40000], 3=> [30000..50000]}
leemic
August 13, 2008, August 13, 2008 02:36, permalink
Can you please clarify your example? This does not make sense how you are finding intersection. Do you mean that the element appears on all the arrays?
For example, you are given the following hash
{1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]}
This means you have three different arrays
[ 1, 2, 3 ]
[ 3, 4, 5 ]
[ 5, 6, 7 ]
There is no value that exists in all three arrays.
Your answer is [ 3, 5 ]. This means 3 came from the intersection of the first and second arrays. 5 came from the intersection of the second and third arrays.
Let's do another example
{1 => [1,2,3], 2=> [3,4,5], 3=> [1, 5,6] }
The answer should be [ 1, 3, 5 ]. However, '1' came from the intersection between the first and third arrays.
Anyway, if you need the intersection of all three arrays - here is one liner
1 2 3
require 'set' intersect = pools.values.collect{ |arr| Set.new(arr) }.inject{ |inter, set| inter.nil? ? set : inter & set }
AviFlombaum
August 13, 2008, August 13, 2008 05:43, permalink
I mean the element appears in at least one other array besides its own.
[ 1, 2, 3 ]
[ 3, 4, 5 ]
[ 5, 6, 7 ]
is [3,5], for the reason you pointed out.
In your example: {1 => [1,2,3], 2=> [3,4,5], 3=> [1, 5,6] }
[1,3,5] is correct. I'll test out your method but I was hoping to not require any gems. Does this make sense?
Find the intersection of arrays contained within a hash. So given {1 => [1,2,3], 2=> [3,4,5], 3=> [5,6,7]} return [3,5]