1 2 3 4 5 6 7 8 9 10 11 12
xml = REXML::Document.new(data) artists = [] i = 1 xml.elements.each('//artist') do |artist| if i <= limit artists << { "name"=>artist.elements['name'].text } end i = i + 1 end
Refactorings
No refactoring yet !
Chris Hunter
July 14, 2008, July 14, 2008 20:13, permalink
It would be nice if REXML offered an each_with_index method the way that Enumerable does but since it doesn't, you can use the XPath support in each to fake it.
Here's my quick take:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
require 'rexml/document' xml = REXML::Document.new('<xml> <artist><name>Bob</name></artist> <artist><name>Bill</name></artist> <artist><name>Chris</name></artist> <artist><name>Dylan</name></artist> </xml>') artists = [] xml.elements.each('//artist[position() < 3]') do |artist| artists << { "name" => artist.elements['name'].text } end puts "artists: #{artists.inspect}"
David Calavera
July 14, 2008, July 14, 2008 22:03, permalink
you can also replace "xml.elements.to_a('//artist')" with "REXML::XPath.match(xml, '//artist')"
1 2 3 4 5 6
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[1..limit].map {|artist| {:name => artist.elements['name'].text} }
David Calavera
July 14, 2008, July 14, 2008 22:29, permalink
1 2 3 4 5
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[1..3].map {|e| {:name => e.elements['name'].text } }
Joe Grossberg
July 14, 2008, July 14, 2008 22:41, permalink
You might want to investigate <a href="http://www.ruby-doc.org/core/classes/Enumerable.html#M001164">Enumerable#each_with_index</a> and integrating that into your code.
It would get rid of lines 5 and 11.
Fabien Jakimowicz
July 14, 2008, July 14, 2008 23:38, permalink
i'm not familiar with REXML library, but in the api doc there is a function to_a, you should give it a try.
1 2 3 4 5
xml = REXML::Document.new(data) artists = xml.elements.to_a('//artist')[0..limit].collect do |artist| {"name" => artist.elements['name'].text end
Mike Lapinsky
July 14, 2008, July 14, 2008 23:47, permalink
I think this code might be a bit better. Document#elements can be converted to an array and can involve an XPath, so I use that to filter it, then just slice the first five.
1 2 3 4 5 6 7 8
xml = REXML::Document.new(data) artists = [] xml.elements.to_a('//artist')[1..5].each { |artist| artists << { "name" => artist.elements['name'].text }
Jason Dew
July 15, 2008, July 15, 2008 00:36, permalink
Stream parsing might be something to look into -- http://www.germane-software.com/software/rexml/docs/tutorial.html#id2248814
Bobby Uhlenbrock
July 15, 2008, July 15, 2008 01:38, permalink
Converting the element results to an array opens up a whole world of options.
1 2 3
xml = REXML::Document.new(data) artists = xml.elements['//artists'].to_a.first(5).collect { |a| ["name", a.elements['name'].text] }
Wolfbyte
July 15, 2008, July 15, 2008 05:43, permalink
Be wary of converting it to an array. If there are 4 million artists do you really want to create an array of 4 million records and then splice off the first 3 or 4. Here's a simple example which hides the code that does the limiting into a method on REXML::Elements.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
require 'rexml/document' class REXML::Elements def each_with_limit( limit, xpath = nil ) index = 0 each( xpath ) do |elem| break if (index += 1) > limit yield elem end end end xml = REXML::Document.new( "<xml>" + %w{ Bob Bill Chris Dylan }.collect { |n| "<artist><name>#{n}</name></artist>" }.join('') + "</xml>" ) artists = [] xml.elements.each_with_limit( 3, '//artist' ) do |artist| artists << artist.elements['name'].text end puts artists
Wow! All of these refactorings were really helpful in various educational ways... Thank you!
i am doing an API call and want to find any way to speed/clean up the process - the API doesn't allow a limit parameter but i figure that limiting the number of elements that i look at might speed up the process a bit since i only need X number of items in my array...
currently i am doing xml.elements.each then within that loop i'm using an iterator but i could not find a way to do this within the first line eg xml.elements.3.times do or xml.elements.upto or xml.elements['artist][0,5]? Can anybody suggest a cleaner way of handling this?