def format_xml(xml)
  formatted = ""
  # Add Newlines
  xml = xml.gsub( /(>)(<)(\/*)/ ) { "#$1\n#$2#$3" }
  # Add Indents
  pad = 0
  xml.split( "\n" ).each do |node|
    # check various tag states
    indent = 0
    if node.match( /.+<\/\w[^>]*>$/ ) # open and closing on the same line
      indent = 0
    elsif node.match( /^<\/\w/ ) # closing tag
      pad -= 1 unless pad == 0
    elsif node.match( /^<\w[^>]*[^\/]>.*$/ ) # opening tag
      indent = 1
    else
      indent = 0
    end
    formatted << "\t" * pad + node + "\n"
    pad += indent
  end
  formatted
end
Refactorings
slaskis
November 23, 2007 15:36
Some example XML/XHTML I used to test with:
x1 = "<xml testsdf=\"dsfsdf\"><test/><node><content attr='fgsd' /></node><node id='2'><content /></node></xml>"
x2 = <<END
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><title>blog</title><script type="text/javascript" src="/blog/static/jquery-1.1.3.pack.js"></script><script type="text/javascript" src="/blog/static/blog.js"></script><link type="text/css" rel="stylesheet" href="/blog/styles.css" media="screen"/></head><body><h1 class="header"><a href="/blog/">blog</a></h1><div class="content"><h1 class="post_head"><a href="/blog/view/2">Hej</a><a class="edit_link" href="/blog/edit/2">edit</a></h1><p>ASDJlksdjfsld </p><h2 class="comment_head"><a href="javascript:getComments(2)">See comments (2)</a></h2><div id="comments"><h3>tyysen</h3><p>miljoooner</p><h3>hej</h3><p>bulan</p></div><h2 class="add_comment_head"><a href="#comment_form">Add comment</a></h2><form method="post" name="comment_form" id="comment_form" action="/blog/comment"><label for="post_username">Name</label><br/><input type="text" name="post_username"/><br/><label for="post_body">Comment</label><br/><textarea name="post_body"></textarea><br/><input type="hidden" name="post_id" value="2"/><input type="submit" value="Add comment"/></form></div></body></html>
END

puts format_xml(x1)
puts format_xml(x2)
Remco
November 23, 2007 15:43
Using case/when instead of if-elsif would be nice.
def format_xml(xml)
  formatted = ""
  # Add Newlines
  xml = xml.gsub( /(>)(<)(\/*)/ ) { "#$1\n#$2#$3" }
  # Add Indents
  pad = 0
  xml.split( "\n" ).each do |node|
    # check various tag states
    indent = 0
    case node
    when /.+<\/\w[^>]*>$/ then indent = 0 # open and closing on the same line
    when /^<\/\w/ then pad -= 1 unless pad == 0 # closing tag
    when /^<\w[^>]*[^\/]>.*$/ then indent = 1 # opening tag
    else indent = 0
    end
    formatted << "\t" * pad + node + "\n"
    pad += indent
  end
  formatted
end
andrew
November 26, 2007 02:15
parse it
require 'libxml'

#===== just use a parser...
parser = XML::Parser.new
parser.string = xml
doc = parser.parse
nice_xml = doc.to_s
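If libxml isn't available, the same "just parse it" idea can be sketched with REXML, which ships with Ruby (as a bundled gem since 3.0). This swaps in a different parser from the one andrew used, so treat it as an illustration only: `pretty_xml` is a hypothetical helper name, and the two-space indent is an arbitrary choice.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper: parse the string, then let REXML's pretty
# formatter write the document back out with indentation.
def pretty_xml(xml, indent = 2)
  doc = REXML::Document.new(xml)
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true # keep text-only elements on a single line
  out = String.new         # mutable buffer the formatter appends to
  formatter.write(doc, out)
  out
end

puts pretty_xml("<root><item>text</item><empty/></root>")
```

Unlike the regex version, a real parser rejects malformed input instead of guessing at it, which may or may not be what you want for a quick cleanup.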
Ransford
July 11, 2008 16:12
You are a genius. I searched everywhere for something like this.
function spaces(len)
{
    var s = '';
    var indent = len * 4;
    // declare the counter locally so it does not leak into the global scope
    for (var i = 0; i < indent; i++) { s += " "; }
    return s;
}

function format_xml(str)
{
    var xml = '';

    // add newlines
    str = str.replace(/(>)(<)(\/*)/g, "$1\r$2$3");

    // add indents
    var pad = 0;
    var indent;
    var node;

    // split the string
    var strArr = str.split("\r");

    // check the various tag states
    for (var i = 0; i < strArr.length; i++) {
        indent = 0;
        node = strArr[i];

        if (node.match(/.+<\/\w[^>]*>$/)) { // open and closing in the same line
            indent = 0;
        } else if (node.match(/^<\/\w/)) { // closing tag
            if (pad > 0) { pad -= 1; }
        } else if (node.match(/^<\w[^>]*[^\/]>.*$/)) { // opening tag
            indent = 1;
        } else {
            indent = 0;
        }

        xml += spaces(pad) + node + "\r";
        pad += indent;
    }

    return xml;
}
Sometimes the XML is all on one line or, worse, broken across illogical lines. This little snippet cleans it up a bit by indenting it "properly". It doesn't care about semantics; it just adds newlines and indentation.
I challenge you to refactor it! ;)
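To make "just adds newlines and indentation" concrete, here is a small self-contained run. It repeats the original Ruby routine with two cosmetic changes of mine (`String.new` instead of a `""` literal so the buffer stays mutable under frozen string literals, and `#{$1}` instead of `#$1`); the sample document is made up.

```ruby
# Condensed copy of the original routine, repeated so the example runs alone.
def format_xml(xml)
  formatted = String.new
  # Add newlines between adjacent tags
  xml = xml.gsub(/(>)(<)(\/*)/) { "#{$1}\n#{$2}#{$3}" }
  pad = 0
  xml.split("\n").each do |node|
    indent = 0
    if node.match(/.+<\/\w[^>]*>$/)        # open and closing on the same line
      indent = 0
    elsif node.match(/^<\/\w/)             # closing tag: step back out
      pad -= 1 unless pad == 0
    elsif node.match(/^<\w[^>]*[^\/]>.*$/) # opening tag: indent what follows
      indent = 1
    end
    formatted << "\t" * pad + node + "\n"
    pad += indent
  end
  formatted
end

sample = "<root><item>text</item><empty/></root>" # made-up input
p format_xml(sample)
# => "<root>\n\t<item>text</item>\n\t<empty/>\n</root>\n"
```

One thing to watch for: the opening-tag pattern needs at least two characters between `<` and `>`, so a bare single-letter tag such as `<p>` on its own line is not recognised as an opening tag, and its children stay unindented.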