Sometimes the XML is all on one line or, worse, broken across illogical lines. This little snippet cleans it up a bit by indenting it "properly". It doesn't care about semantics; it just adds newlines and indentation.

I challenge you to refactor it! ;)

def format_xml(xml)
  formatted = ""
  # Add Newlines
  xml = xml.gsub( /(>)(<)(\/*)/ ) { "#$1\n#$2#$3" }
  # Add Indents
  pad = 0
  xml.split( "\n" ).each do |node|
    # check various tag states
    indent = 0
    if node.match( /.+<\/\w[^>]*>$/ ) # open and closing on the same line
      indent = 0
    elsif node.match( /^<\/\w/ ) # closing tag
      pad -= 1 unless pad == 0
    elsif node.match( /^<\w([^>]*[^\/])?>.*$/ ) # opening tag (the optional group lets single-letter tags such as <p> match too)
      indent = 1
    else
      indent = 0
    end
    formatted << "\t" * pad + node + "\n"
    pad += indent
  end
  formatted
end
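
To see what the two passes are doing, here is a minimal sketch that runs the newline step on a made-up sample and labels each resulting line with the branch it would hit. The `classify` helper and the sample string are hypothetical, added only for illustration; the opening-tag pattern is written with an optional group so that single-letter tags such as `<p>` count as opening tags.

```ruby
# Step 1: insert a newline between adjacent tags, then split into lines.
sample = "<root><item>x</item><leaf/></root>"
lines = sample.gsub(/(>)(<)(\/*)/) { "#{$1}\n#{$2}#{$3}" }.split("\n")

# Step 2: decide which kind of line each one is (hypothetical helper).
def classify(node)
  case node
  when /.+<\/\w[^>]*>$/        then :open_and_close  # e.g. <item>x</item>
  when /^<\/\w/                then :closing         # e.g. </root>
  when /^<\w([^>]*[^\/])?>.*$/ then :opening         # e.g. <root>
  else                              :other           # self-closing tags, <?xml ...?>, plain text
  end
end

labels = lines.map { |line| [line, classify(line)] }
labels.each { |line, label| puts "#{label}: #{line}" }
```

Only `:opening` lines increase the padding for the lines that follow, and only `:closing` lines decrease it; everything else is printed at the current depth.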

Refactorings


slaskis

November 23, 2007 15:36

Some example XML/XHTML I used to test with:

x1 = "<xml testsdf=\"dsfsdf\"><test/><node><content attr='fgsd' /></node><node id='2'><content /></node></xml>"
x2 = <<END
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><title>blog</title><script type="text/javascript" src="/blog/static/jquery-1.1.3.pack.js"></script><script type="text/javascript" src="/blog/static/blog.js"></script><link type="text/css" rel="stylesheet" href="/blog/styles.css" media="screen"/></head><body><h1 class="header"><a href="/blog/">blog</a></h1><div class="content"><h1 class="post_head"><a href="/blog/view/2">Hej</a><a class="edit_link" href="/blog/edit/2">edit</a></h1><p>ASDJlksdjfsld
</p><h2 class="comment_head"><a href="javascript:getComments(2)">See comments (2)</a></h2><div id="comments"><h3>tyysen</h3><p>miljoooner</p><h3>hej</h3><p>bulan</p></div><h2 class="add_comment_head"><a href="#comment_form">Add comment</a></h2><form method="post" name="comment_form" id="comment_form" action="/blog/comment"><label for="post_username">Name</label><br/><input type="text" name="post_username"/><br/><label for="post_body">Comment</label><br/><textarea name="post_body"></textarea><br/><input type="hidden" name="post_id" value="2"/><input type="submit" value="Add comment"/></form></div></body></html>
END
puts format_xml(x1)
puts format_xml(x2)
Remco

November 23, 2007 15:43

Using case/when instead of if-elsif would be nice.

def format_xml(xml)
  formatted = ""
  # Add Newlines
  xml = xml.gsub( /(>)(<)(\/*)/ ) { "#$1\n#$2#$3" }
  # Add Indents
  pad = 0
  xml.split( "\n" ).each do |node|
    # check various tag states
    indent = 0

    case node
    when /.+<\/\w[^>]*>$/        then indent = 0                # open and closing on the same line
    when /^<\/\w/                then pad -= 1 unless pad == 0  # closing tag
    when /^<\w([^>]*[^\/])?>.*$/ then indent = 1                # opening tag, including single-letter tags such as <p>
    else
      indent = 0
    end

    formatted << "\t" * pad + node + "\n"
    pad += indent
  end
  
  formatted
end
andrew

November 26, 2007 02:15

Parse it.

require 'libxml'

#===== just use a parser...
parser = XML::Parser.new
parser.string = xml
doc = parser.parse
nice_xml = doc.to_s
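
If libxml isn't installed, the same "just use a parser" idea can be sketched with REXML, which is distributed with Ruby (as a bundled gem in recent versions). This is only a sketch under that assumption: `pretty_xml` is a hypothetical helper name, and unlike the regex version it raises on input that is not well-formed.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical helper: parse the string and let REXML's pretty
# formatter handle the newlines and indentation.
def pretty_xml(xml, indent = 2)
  doc = REXML::Document.new(xml)              # raises REXML::ParseException on malformed input
  formatter = REXML::Formatters::Pretty.new(indent)
  formatter.compact = true                    # keep short text-only elements on one line
  out = +""                                   # unfrozen string used as the output buffer
  formatter.write(doc, out)
  out
end

puts pretty_xml("<xml><test/><node><content attr='fgsd'/></node></xml>")
```

The trade-off is the one already implied above: a parser understands the document structure, but it will refuse the half-broken markup that the regex approach happily indents.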
Ransford

July 11, 2008 16:12

You are a genius; I searched everywhere for something like this. Here is a JavaScript port:

function spaces(len)
{
	var s = '';
	var indent = len*4;
	for (var i = 0; i < indent; i++) { s += " "; } // declare i locally instead of leaking a global
	
	return s;
}

function format_xml(str)
{
	var xml = '';

	// add newlines
	str = str.replace(/(>)(<)(\/*)/g,"$1\r$2$3");

	// add indents
	var pad = 0;
	var indent;
	var node;

	// split the string
	var strArr = str.split("\r");

	// check the various tag states
	for (var i = 0; i < strArr.length; i++) {
		indent = 0;
		node = strArr[i];

		if(node.match(/.+<\/\w[^>]*>$/)){ //open and closing in the same line
			indent = 0;
		} else if(node.match(/^<\/\w/)){ // closing tag
			if (pad > 0){pad -= 1;}
		} else if (node.match(/^<\w([^>]*[^\/])?>.*$/)) { // opening tag, including single-letter tags such as <p>
			indent = 1;
		} else {
			indent = 0;
		}

		xml += spaces(pad) + node + "\r";
		pad += indent;
	}

	return xml;
}