1 2 3 4 5 6 7 8 9 10 11 12
xml = REXML::Document.new(data) artists = [] i = 1 xml.elements.each('//artist') do |artist| if i <= limit artists << { "name"=>artist.elements['name'].text } end i = i + 1 end
Refactorings
No refactoring yet !
Chris Hunter
July 14, 2008, July 14, 2008 20:13, permalink
It would be nice if REXML offered an each_with_index method the way that Enumerable does but since it doesn't, you can use the XPath support in each to fake it.
Here's my quick take:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
require 'rexml/document' xml = REXML::Document.new('<xml> <artist><name>Bob</name></artist> <artist><name>Bill</name></artist> <artist><name>Chris</name></artist> <artist><name>Dylan</name></artist> </xml>') artists = [] xml.elements.each('//artist[position() < 3]') do |artist| artists << { "name" => artist.elements['name'].text } end puts "artists: #{artists.inspect}"
David Calavera
July 14, 2008, July 14, 2008 22:03, permalink
you can also replace "xml.elements.to_a('//artist')" with "REXML::XPath.match(xml, '//artist')"
1 2 3 4 5 6
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[1..limit].map {|artist| {:name => artist.elements['name'].text} }
David Calavera
July 14, 2008, July 14, 2008 22:29, permalink
1 2 3 4 5
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[1..3].map {|e| {:name => e.elements['name'].text } }
Joe Grossberg
July 14, 2008, July 14, 2008 22:41, permalink
You might want to investigate <a href="http://www.ruby-doc.org/core/classes/Enumerable.html#M001164">Enumerable#each_with_index</a> and integrating that into your code.
It would get rid of lines 5 and 11.
Fabien Jakimowicz
July 14, 2008, July 14, 2008 23:38, permalink
i'm not familiar with REXML library, but in the api doc there is a function to_a, you should give it a try.
1 2 3 4 5
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[0..limit].collect do |artist| {"name" => artist.elements['name'].text end
Mike Lapinsky
July 14, 2008, July 14, 2008 23:47, permalink
I think this code might be a bit better. Document#elements can be converted to an array and can involve an XPath, so I use that to filter it, then just slice the first five.
1 2 3 4 5 6 7 8
xml = REXML::Document.new(data) artists = [] xml.elements.to_a('//artist')[1..5].each { |artist| artists << { "name" => artist.elements['name'].text }
Jason Dew
July 15, 2008, July 15, 2008 00:36, permalink
Stream parsing might be something to look into -- http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248814
Bobby Uhlenbrock
July 15, 2008, July 15, 2008 01:38, permalink
Converting the element results to an array opens up a whole world of options.
1 2 3
xml = REXML::Document.new(data) artists = xml.elements['//artists'].to_a.first(5).collect { |a| ["name", a.elements['name'].text] }
Wolfbyte
July 15, 2008, July 15, 2008 05:43, permalink
Be wary of converting it to an array. If there are 4 million artists do you really want to create an array of 4 million records and then splice off the first 3 or 4. Here's a simple example which hides the code that does the limiting into a method on REXML::Elements.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
require 'rexml/document' class REXML::Elements def each_with_limit( limit, xpath = nil ) index = 0 each( xpath ) do |elem| break if (index += 1) > limit yield elem end end end xml = REXML::Document.new( "<xml>" + %w{ Bob Bill Chris Dylan }.collect { |n| "<artist><name>#{n}</name></artist>" }.join('') + "</xml>" ) artists = [] xml.elements.each_with_limit( 3, '//artist' ) do |artist| artists << artist.elements['name'].text end puts artists
Muenster hotel
April 8, 2010, April 08, 2010 15:59, permalink
Investigation Nose,challenge member audience instrument white promote gentleman master win recover representation scale capital suppose few selection dream face training economy media ourselves necessarily accident fine paper write forget circle half call either knee reference front earn useful situation address answer maybe increased appeal photograph achievement prospect content male museum an dark sometimes consumer popular incident nearly fully lawyer share output child unemployment dream yesterday prisoner response finish announce computer response heat necessary especially condition colleague terms before plate display commercial project smile
1
Investigation Nose,challenge member audience instrument white promote gentleman master win recover representation scale capital suppose few selection dream face training economy media ourselves necessarily accident fine paper write forget circle half call either knee reference front earn useful situation address answer maybe increased appeal photograph achievement prospect content male museum an dark sometimes consumer popular incident nearly fully lawyer share output child unemployment dream yesterday prisoner response finish announce computer response heat necessary especially condition colleague terms before plate display commercial project smile
Wow! All of these refactorings were really helpful in various educational ways... Thank you!
i am doing an API call and want to find any way to speed/clean up the process - the API doesn't allow a limit parameter but i figure that limiting the number of elements that i look at might speed up the process a bit since i only need X number of items in my array...
currently i am doing xml.elements.each then within that loop i'm using an iterator but i could not find a way to do this within the first line eg xml.elements.3.times do or xml.elements.upto or xml.elements['artist][0,5]? Can anybody suggest a cleaner way of handling this?