This is really my first attempt at creating my own Ruby classes that aren't subclassing anything from Rails, so I think I need some help. First of all, ProductScraper::Base should be (in Java terms, at least) an abstract class. It should never be instantiated. Neither should its nested class (a module, in this case): ProductScraper::Base::ProductPage. What I want to do is create a bunch of classes that subclass ProductScraper::Base and its nested class. I also want to be able to refer to the more concrete classes from ProductScraper::Base, as is done in the product_pages method.

Should ProductScraper::Base and ProductScraper::Base::ProductPage be classes or should they be modules? If they should be classes, how do I make them "abstract" classes? Are there any shortcuts to use instead of the fully qualified class names that I use here?

Thanks for the help!

# in lib/product_scraper/base.rb
class ProductScraper::Base
  def product_pages
    product_urls.collect do |url|
      print "."
      self.class::ProductPage.new(url)
    end
  end

  def update_products!
    #...
  end

  module ProductPage
    attr_reader :url

    def initialize(product_url)
      #...
    end

    #...
  end
end

# in lib/product_scraper/some_company.rb
class ProductScraper::SomeCompany < ProductScraper::Base

  def product_urls
    #...
  end

  class ProductPage
    include ProductScraper::Base::ProductPage

    #...
  end
end
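
On the question of shortcuts for the fully qualified names: one option is to open `module ProductScraper` explicitly, so that constants such as `Base` resolve lexically inside it. The following is only a sketch under assumptions, not the poster's real code: the URLs and the body of `initialize` are placeholders standing in for the elided parts.

```ruby
# Trimmed-down, hypothetical version of the classes above.
module ProductScraper
  class Base
    def product_pages
      # self.class::ProductPage looks the constant up on the concrete
      # subclass, so each scraper builds its own page type.
      product_urls.collect { |url| self.class::ProductPage.new(url) }
    end

    module ProductPage
      attr_reader :url

      def initialize(product_url)
        @url = product_url # placeholder body
      end
    end
  end

  class SomeCompany < Base
    def product_urls
      ["http://example.com/a", "http://example.com/b"] # placeholder URLs
    end

    class ProductPage
      # The short name Base resolves because we are nested
      # inside `module ProductScraper`.
      include Base::ProductPage
    end
  end
end

pages = ProductScraper::SomeCompany.new.product_pages
```

Because `ProductScraper::Base::ProductPage` is a module here, it has no `new` of its own; only the concrete `SomeCompany::ProductPage` class can be instantiated.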

Refactorings

Jordan Glasner

February 2, 2008, 18:05

I forgot you didn't want to allow Base to be instantiated, so this might not work for you. You'd need to turn Base and ProductPage into modules then include them in AmazonScraper and AmazonPage, instead of subclassing them.

BTW, a product scraper sounds like an interesting product :)

module ProductScraper
  class Base

    class << self
      # Page class this scraper instantiates; subclasses override this
      def product_page_type
        ProductPage
      end
    end

    def initialize
    end

    # Subclasses override this to return an Array of URL strings
    def product_urls
      raise NotImplementedError, "#{self.class} must implement product_urls"
    end

    # Create new ProductPage from url
    def product_page(url)
      self.class.product_page_type.new(url)
    end

    def product_pages
      product_urls.collect { |x| product_page(x) }
    end

    def update_products!
      product_pages.collect { |x| x.update! }
    end
  end

  class ProductPage
    attr_reader :url

    def initialize(url)
      @url = url
    end

    def update!
      scrape
      # save
    end

    def scrape
      # ...
    end
  end
end


module ProductScraper
  class AmazonScraper < Base
    class << self
      # Overrides Base.product_page_type
      def product_page_type
        AmazonPage
      end
    end
  end

  class AmazonPage < ProductPage
  end
end

Jordan Glasner

February 2, 2008, 19:42

OK, the above just didn't do it for me. The following is also closer to what you wanted. All of the functionality is set up in the Scrapable module. Include that module in your Page class to make it scrapable.

I did get rid of the separate Scraper class. Didn't make sense in this context.

module Scrapable

  module ClassMethods
    # Returns Array of URLs as Strings
    def urls
      # Placeholder values; a real implementation would return URL strings
      [1,2,3]
    end

    # Returns Array of pages (instances of the including class)
    def all
      urls.map { |url| self.new(url) }
    end

    # Updates all
    def update!
      all.map { |x| x.update! }
    end
  end

  # Adds methods in ClassMethods module as class methods
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Instance Methods

  def update!
    scrape
    save!
  end

  def scrape
    # scrapes ;)
  end

  def save!
    print "Saved\n"
  end

end


class MerchantPage

  include Scrapable

  def initialize(url)
  end

end
  
# Scrape specific url
MerchantPage.new('http://merchant.com/page').scrape

# List of all URLs
MerchantPage.urls

# Array of all pages
MerchantPage.all

# Update all pages
MerchantPage.update!
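
The module approach also covers the "never instantiated" requirement, since a module has no `new` method. A minimal sketch of that point, assuming a hypothetical `ExamplePage` class and placeholder URLs, and with `Scrapable` trimmed down to the parts needed here:

```ruby
module Scrapable
  module ClassMethods
    # Builds one page object per URL returned by the including class
    def all
      urls.map { |url| new(url) }
    end
  end

  # Adds ClassMethods as class methods of whatever includes Scrapable
  def self.included(base)
    base.extend(ClassMethods)
  end

  attr_reader :url

  def initialize(url)
    @url = url
  end
end

# Hypothetical page class, not part of the original thread
class ExamplePage
  include Scrapable

  def self.urls
    ["http://example.com/1", "http://example.com/2"] # placeholder URLs
  end
end

# A module cannot be instantiated: it does not respond to `new` at all
module_is_instantiable = Scrapable.respond_to?(:new)

pages = ExamplePage.all
```

Calling `Scrapable.new` would raise `NoMethodError`, so no explicit guard is needed to keep the shared behaviour "abstract".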

  
misfo.myopenid.com

February 2, 2008, 22:58

Jordan, I like your second refactoring a lot. The object model makes so much more sense using class and instance methods instead of nesting a class. Your changing the names of the classes made it clear that this was the way to do it. I wanted to make a base class that could be subclassed just like ActiveRecord::Base, as opposed to using modules. So here is what I came up with, using your changes, but with classes.

# in lib/product_page/base.rb
class ProductPage::Base

  class << self
    def new(url)
      raise TypeError, "ProductPage::Base cannot be instantiated" if self == ProductPage::Base
      super
    end
 
    def all
      urls.collect {|url| new(url) }
    end

    def save_all!
      puts "Parsing the pages"
      pages = all
      puts "\nUpdating the database"
      pages.each(&:save!)
    end
  end
  
  attr_reader :url

  def initialize(product_url)
    #...
  end

  def save!
    #...
  end
end

# in lib/product_page/some_company.rb
class ProductPage::SomeCompany < ProductPage::Base

  class << self
    def urls
      #...
    end
  end

  def name
    #...
  end

  #...
end
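
To see the guard in `new` at work, here is a self-contained sketch of the same idea. It assumes `ProductPage` is declared as a plain module, and it uses a hypothetical `ExampleCompany` subclass with a placeholder URL in place of the elided bodies.

```ruby
module ProductPage
  class Base
    class << self
      # Refuse to instantiate the base class itself; subclasses pass through
      def new(*args)
        raise TypeError, "#{name} cannot be instantiated" if self == ProductPage::Base
        super
      end

      def all
        urls.collect { |url| new(url) }
      end
    end

    attr_reader :url

    def initialize(product_url)
      @url = product_url # placeholder body
    end
  end

  # Hypothetical concrete subclass
  class ExampleCompany < Base
    def self.urls
      ["http://example.com/widget"] # placeholder URL
    end
  end
end

# Instantiating the base class raises; capture the error for inspection
error = begin
  ProductPage::Base.new("http://example.com")
  nil
rescue TypeError => e
  e
end

pages = ProductPage::ExampleCompany.all
```

Comparing `self` against `ProductPage::Base` means only the base class is blocked; every subclass reaches `super` and is constructed normally.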