<?xml version="1.0" encoding="UTF-8"?>
<codes type="array">
  <code>
    <code># Initial attributes Hash
# 
attributes = {
  "recs"=&gt;nil, 
  "location"=&gt;nil,
  "uname"=&gt;"b", 
  "svc_tw_id"=&gt;99991212, 
  "lang"=&gt;nil,
  "hash"=&gt;"8421ce0c2d9eabba44b9c5bvhr8u672ecc", 
  "id"=&gt;33, 
  "has_p"=&gt;nil, 
  "full_name"=&gt;"Ben Roos", 
  "time_zone"=&gt;nil, 
  "referrals"=&gt;nil, 
  "bio"=&gt;nil, 
  "svc_fb_id"=&gt;12129999, 
  "email"=&gt;"email@domain.tld"
}

# Array I need to extract from the hash:
#
ary_end_result = ["fb.12129999", "tw.99991212"]


# My newbie solution:
# Greps for matching keys, then collects them by removing "svc_" and "_id" and setting value to key of hash
# Would it be more elegant/efficient to use a regex object inside a block?
#
ary_end_result = attributes.keys.grep(/^svc_(\w{2})_id$/).collect do |key|      
  key.sub("svc_", "").sub("_id", "") + ".#{attributes[key]}"
end</code>
    <comment>I'm looking for a better way to do this using a block structure and the regex backreference inside the block (unused here). I'm not really fond of the dual sub().sub() I used to get at the pertinent bit of the hash key.
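
A minimal sketch of how the backreference idea might look: the regex is matched once per key and capture group 1 is reused, so the two sub calls go away. The hash is a hypothetical, trimmed-down copy of the one in the post (built from pairs with to_h), and the final sort is only there to reproduce the order shown in ary_end_result.

```ruby
# Hypothetical trimmed-down version of the attributes hash from the post.
attributes = [
  ["uname", "b"],
  ["svc_tw_id", 99991212],
  ["svc_fb_id", 12129999],
  ["email", "email@domain.tld"]
].to_h

# Match each key once and reuse capture group 1 instead of chaining sub calls.
pairs = attributes.map do |key, value|
  match = /\Asvc_(\w{2})_id\z/.match(key)
  match ? "#{match[1]}.#{value}" : nil
end

# Drop the keys that did not match; sort so "fb" comes before "tw".
ary_end_result = pairs.compact.sort
puts ary_end_result.inspect
```

Whether this is more elegant than grep plus collect is a matter of taste; it mainly avoids touching the key string twice.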

Given a hash with some keys in the form "svc_XX_id", where XX is a two letter code, extract an array with the form ["XX.&lt;value&gt;", "XX.&lt;value&gt;"].</comment>
    <created-at type="datetime">2010-02-24T23:13:25+00:00</created-at>
    <id type="integer">1187</id>
    <language>Ruby</language>
    <permalink>ruby-hash-extraction-using-regex-on-key</permalink>
    <refactors-count type="integer">7</refactors-count>
    <title>Ruby Hash Extraction Using Regex on Key</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2010-03-08T20:53:13+00:00</updated-at>
    <user-id type="integer">1950</user-id>
    <user>
      <id type="integer">1950</id>
      <identity-url>http://notbrain.myopenid.com</identity-url>
      <name>notbrain.myopenid.com</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website nil="true"></website>
    </user>
  </code>
  <code>
    <code>$outputstr =  preg_replace('/[-]{2,}/','-',trim(strtolower(str_replace(' ','-',preg_replace('/[^A-Z0-9- ]+/i','',$inputstr))),'-'));
</code>
    <comment>I'm trying to strip every character from an input string except alphanumerics, hyphens and spaces. Spaces should then become hyphens, any hyphens at the start and end should be removed, runs of more than one hyphen should collapse to a single hyphen, and the whole thing should be lower case, e.g.:

-!! This is a "**Test-  -string**" !!!!-

results to :

this-is-a-test-string

The code works fine, but the way I've done it seems a bit long-winded. Can it be refactored at all?
</comment>
    <created-at type="datetime">2010-02-05T16:07:00+00:00</created-at>
    <id type="integer">1169</id>
    <language>PHP</language>
    <permalink>trimming-excess-from-string</permalink>
    <refactors-count type="integer">2</refactors-count>
    <title>Trimming excess from string</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2010-03-04T16:04:53+00:00</updated-at>
    <user-id type="integer">1718</user-id>
    <user>
      <id type="integer">1718</id>
      <identity-url>http://paulswansea.myopenid.com</identity-url>
      <name>paulswansea.myopenid.com</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website></website>
    </user>
  </code>
  <code>
    <code>String className = file.getAbsolutePath();

/* TODO, replace this with regex if possible */
className = className.substring(0, className.length() - 6);       // chop off ".class"
className = className.replace('/', '.');
className = className.replace('\\', '.');
className = className.substring(className.indexOf(CLASSES) + CLASSES.length());</code>
    <comment>Can this be refactored to use REGEX to make the format and replacement changes?

The goal is to turn 

c:\dev\svnlocal\project\target\classes\com\company\project\File.class 

into

com.company.project.File

It should also handle forward slashes.</comment>
    <created-at type="datetime">2009-08-07T13:34:53+00:00</created-at>
    <id type="integer">990</id>
    <language>Java</language>
    <permalink>replace-with-regex</permalink>
    <refactors-count type="integer">4</refactors-count>
    <title>replace with regex</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-08-12T03:10:10+00:00</updated-at>
    <user-id type="integer">1474</user-id>
    <user>
      <id type="integer">1474</id>
      <identity-url>http://mikenereson.blogspot.com</identity-url>
      <name>mikenereson.blogspot.com</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">3</refactors-count>
      <website>architecturerules.org</website>
    </user>
  </code>
  <code>
    <code>#!/usr/bin/perl

use warnings;
use strict;

my $correct_usage = $ARGV[1];

my $option = $ARGV[0];
my $message = $ARGV[1];

$correct_usage &amp;&amp;= ($option eq "-d" || $option eq "-q");

if (!$correct_usage) {
    print(
"Usage: dvorak_cypher.pl &lt;option&gt; &lt;message&gt;\n".
"Options:\n".
"      -d        Converts &lt;message&gt; from a QWERTY layout to a Dvorak layout\n".
"      -q        Converts &lt;message&gt; from a Dvorak layout to a QWERTY layout\n"
    );
    exit(0);
}

my $qwerty = quotemeta(
'~!@#$%^&amp;*()_+`1234567890-='.
'QWERTYUIOP{}|qwertyuiop[]\\'.
'ASDFGHJKL:"asdfghjkl;\''.
'ZXCVBNM&lt;&gt;?zxcvbnm,./'
);

my $dvorak = quotemeta(
'~!@#$%^&amp;*(){}`1234567890[]'.
'"&lt;&gt;PYFGCRL?+|\',.pyfgcrl/=\\'.
'AOEUIDHTNS_aoeuidhtns-'.
':QJKXBMWVZ;qjkxbmwvz'
);

if ($option eq "-d") {
   $message =~ tr/$qwerty/$dvorak/;
} elsif ($option eq "-q") {
   $message =~ tr/$dvorak/$qwerty/;
}

print $message."\n";</code>
    <comment>I can't quite get the last bit to work. How can I get tr/// to interpolate the variables?

Meanwhile, is there a better way to parse the arguments?</comment>
    <created-at type="datetime">2009-08-06T03:32:38+00:00</created-at>
    <id type="integer">986</id>
    <language>Perl</language>
    <permalink>a-dvorak-keyboard-layout-cypher</permalink>
    <refactors-count type="integer">2</refactors-count>
    <title>A Dvorak keyboard layout cypher</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-08-20T16:50:36+00:00</updated-at>
    <user-id type="integer">1159</user-id>
    <user>
      <id type="integer">1159</id>
      <identity-url>http://lordzoner.myopenid.com</identity-url>
      <name>lordzoner.myopenid.com</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">4</refactors-count>
      <website></website>
    </user>
  </code>
  <code>
    <code>class PropertyGroups
  def self.from_xml(xml)
    document = Document.new(xml)
    properties = Hash.new
    document.elements.each("Project/PropertyGroup/*") do |element|
      properties[element.name] = element.text.strip
    end
    return PropertyGroups.new(properties)
  end
 
  def initialize(elements)
    variable_regexp = /\$\([\w]+\)/
    elements.each_pair do |key, value|
      PropertyGroups.class_eval do
        define_method key.to_sym do ||
          replaced_value = value
          while variable_regexp.match(replaced_value)
            match = variable_regexp.match(replaced_value)
            full_var = match[0]
            var_name = full_var[2..-2]
            if elements.key? var_name
              replaced_value = replaced_value.sub(full_var, elements[var_name])
            else
              replaced_value = replaced_value.sub(full_var, "")
            end
          end
          return replaced_value
        end
      end
    end
  end
end</code>
    <comment>The code is part of a library that parses .net msbuild project files containing property groups and is up at http://github.com/brunomlopes/ironbuildrake/ , together with tests.
Each property has a value that I want to access, and the value can be defined in terms of other variables. A variable has the format $(varname).
This code is rather straightforward, but seems rather non-idiomatic.
The use of define_method in a class eval to avoid method_missing is a personal preference in order to avoid nasty recursive bugs and to allow for better runtime inspection of available methods.
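
For comparison, a minimal sketch of the same $(varname) expansion written with gsub and a block. PROPERTIES and expand are hypothetical names used only for this illustration and are not part of the library. As in the code above, unknown variables expand to an empty string, and a variable that refers to itself would loop forever.

```ruby
# Hypothetical property table; names and values are made up for illustration.
PROPERTIES = [
  ["RootDir", "C:/build"],
  ["OutDir", "$(RootDir)/out"],
  ["LogFile", "$(OutDir)/$(Missing)build.log"]
].to_h

# Expand $(name) references repeatedly until none remain.
# Unknown names expand to an empty string.
def expand(value, properties)
  pattern = /\$\((\w+)\)/
  result = value
  # Each pass replaces every reference found in the current string.
  result = result.gsub(pattern) { properties.fetch($1, "") } while result =~ pattern
  result
end

puts expand(PROPERTIES["LogFile"], PROPERTIES)
```

The block form of gsub handles all references in one pass, so the outer loop only repeats when a substituted value itself contains further references.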

</comment>
    <created-at type="datetime">2009-07-31T13:25:15+00:00</created-at>
    <id type="integer">978</id>
    <language>Ruby</language>
    <permalink>ruby-text-replacement-done-by-a-net-guy</permalink>
    <refactors-count type="integer">1</refactors-count>
    <title>Ruby text replacement done by a .net guy</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-08-08T04:03:47+00:00</updated-at>
    <user-id type="integer">1623</user-id>
    <user>
      <id type="integer">1623</id>
      <identity-url>http://blog.brunomlopes.com</identity-url>
      <name>Bruno Lopes</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website>http://www.brunomlopes.com</website>
    </user>
  </code>
  <code>
    <code>  def check_user_agent
    if api_access?
      regex = Regexp.new("^([^/[:space:]]*)(/([^[:space:]]*))?([[:space:]]*\[[a-zA-Z][a-zA-Z]\])?[[:space:]]*(\\((([^()]|(\\([^()]*\\)))*)\\))?[[:space:]]*")
      regex_match = regex.match(request.user_agent)

      raise ForbiddenRequestError unless regex_match &amp;&amp; USER_AGENTS.include?(regex_match[1])
    end
  end</code>
    <comment>Hola guys.. I am somewhat lost here about what to escape and what not. Basically this little method takes the regular User-Agent strings you find in HTTP headers and checks whether the client is an allowed one. It only looks at the actual application name (e.g. for 'someUserAgent/v1.0 (compatible: yadda)' it would only check someUserAgent). The idea behind it is simply to make sure only allowed client apps access the web service. api_access? only checks whether the request is coming in via JSON/XML.

However, the problem is that I see the warning from the title all over the place. The regex works, but the warnings fill up the log (in my case the apache2 error_log, since it's being used with mod_rails).

Any ideas/suggestions.. maybe major improvements?
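
A minimal sketch of one likely cause and a way around it, assuming the warning comes from the double-quoted string: inside double quotes \[ loses its backslash, so Regexp.new receives [[a-zA-Z][a-zA-Z]] instead of an escaped bracket pair. A %r{} literal keeps the escapes as written. PRODUCT, ALLOWED and allowed_agent? are hypothetical names, and the pattern is shortened so that only the product token is captured.

```ruby
# Inside double quotes an unknown escape such as \[ loses its backslash.
puts "\[" == "["

# Hypothetical, shortened pattern written as a %r{} literal: escapes survive
# and the slash needs no escaping. Only the leading product token is captured.
PRODUCT = %r{\A([^/[:space:]]+)(?:/[^[:space:]]*)?(?:[[:space:]]*\[[a-zA-Z]{2}\])?}

# Hypothetical allow-list standing in for USER_AGENTS from the post.
ALLOWED = ["someUserAgent"]

# True only when the product token of the header is on the allow-list.
def allowed_agent?(user_agent)
  match = PRODUCT.match(user_agent.to_s)
  return false if match.nil?
  ALLOWED.include?(match[1])
end

puts allowed_agent?("someUserAgent/v1.0 (compatible: yadda)")
puts allowed_agent?("curl/7.0")
```

Keeping the pattern in a constant also avoids recompiling it on every request.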

Thanks,
-J :)</comment>
    <created-at type="datetime">2009-02-16T01:27:33+00:00</created-at>
    <id type="integer">759</id>
    <language>Ruby</language>
    <permalink>warning-character-class-has-without-escape</permalink>
    <refactors-count type="integer">2</refactors-count>
    <title>warning: character class has `['/`]' without escape</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-11-29T18:56:59+00:00</updated-at>
    <user-id type="integer">1351</user-id>
    <user>
      <id type="integer">1351</id>
      <identity-url>http://claimid.com/joergbattermann</identity-url>
      <name>J&#246;rg</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website>http://joergbattermann.com</website>
    </user>
  </code>
  <code>
    <code>function formatCurrency(num) {
    num = num.toString().replace(/\$|,/g,'');
    if (isNaN(num)) num = '0';
    sign = (num == (num = Math.abs(num)));
    num = Math.floor(num*100+0.50000000001);
    cents = num%100;
    num = Math.floor(num/100).toString();
    if (cents &lt; 10) cents = '0' + cents;
    
    for (var i = 0; i &lt; Math.floor((num.length-(1+i))/3); i++)
        num = num.substring(0,num.length-(4*i+3))+','+ num.substring(num.length-(4*i+3));
        
    return (((sign)?'':'-') + '$' + num + '.' + cents);
}

</code>
    <comment>During a code review we found the following function buried deep in the web UI written by someone no longer here.  There has to be a better way.  One of our requirements is that prices be displayed with four decimal places. (ex: 0.0001)</comment>
    <created-at type="datetime">2009-01-20T20:01:51+00:00</created-at>
    <id type="integer">709</id>
    <language>JavaScript</language>
    <permalink>format-currency</permalink>
    <refactors-count type="integer">7</refactors-count>
    <title>Format Currency</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-03-02T19:14:38+00:00</updated-at>
    <user-id type="integer">1046</user-id>
    <user>
      <id type="integer">1046</id>
      <identity-url>http://sfusco.myopenid.com</identity-url>
      <name>slf</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">7</refactors-count>
      <website></website>
    </user>
  </code>
  <code>
    <code>category_id = category_name.scan(/category-(\d+)/)[0][0]</code>
    <comment>I'm trying to extract an id from a string. The string will look like "category-1337" and I want my result to be 1337. Sure, I could use string slicing, but sometimes I need more power.</comment>
    <created-at type="datetime">2008-12-18T12:39:32+00:00</created-at>
    <id type="integer">674</id>
    <language>Ruby</language>
    <permalink>ugly-regex-accessor</permalink>
    <refactors-count type="integer">9</refactors-count>
    <title>Ugly RegEx Accessor</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2008-12-23T06:59:31+00:00</updated-at>
    <user-id type="integer">1228</user-id>
    <user>
      <id type="integer">1228</id>
      <identity-url>http://sebastian.deutsch.myopenid.com</identity-url>
      <name>sebastian.deutsch.myopenid.com</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website>http://www.9elements.com</website>
    </user>
  </code>
  <code>
    <code>/(?&lt;part&gt;(w(?!ww)|w(?=www)|w(?=[a-z0-9]+ww)|ww(?=[a-z0-9]+w)|www[a-z0-9]+|[a-vx-z0-9])[a-z0-9]*)\.example\.com/</code>
    <comment>I have written this regular expression which matches any subdomain (consisting of only letters and digits) right below example.com (including the domain, e.g. test.example.com) but not www.example.com. It works, but it is not very elegant.</comment>
    <created-at type="datetime">2008-11-19T18:48:58+00:00</created-at>
    <id type="integer">610</id>
    <language>Ruby</language>
    <permalink>shortest-regular-expression-for-matching-a-subdomain</permalink>
    <refactors-count type="integer">3</refactors-count>
    <title>Shortest regular expression for matching a subdomain.</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2008-11-21T12:17:15+00:00</updated-at>
    <user-id type="integer">1179</user-id>
    <user>
      <id type="integer">1179</id>
      <identity-url>https://troethom.loginbuzz.com/</identity-url>
      <name>troethom</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">1</refactors-count>
      <website></website>
    </user>
  </code>
  <code>
    <code>/^(?:(\d)[ \-\.]?)?(?:\(?(\d{3})\)?[ \-\.])?(\d{3})[ \-\.](\d{4})(?: ?x?(\d+))?$/

# expanded version w/ comments

/^
  (?:
    (\d)           (?# prefix digit)
    [ \-\.]?       (?# optional separator)
  )?
  (?:
    \(?(\d{3})\)?  (?# area code)
    [ \-\.]        (?# separator)
  )?
  (\d{3})          (?# trunk)
  [ \-\.]          (?# separator)
  (\d{4})          (?# line)
  (?:\ ?x?         (?# optional space or 'x')
    (\d+)          (?# extension)
  )?
$/x</code>
    <comment>I wrote this regex for parsing phone numbers. It's PCRE compatible so it will work with Perl, Ruby, Javascript and many other languages, but I like Ruby best so it's classified as Ruby. I'm sure it can be made tighter. What do you think?
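
A short usage sketch, assuming Ruby: the pattern is the same as the expanded version except that it uses \A and \z anchors, Ruby # comments, and escaped spaces inside the character classes. PHONE and parse_phone are hypothetical names used only for this illustration.

```ruby
# Same structure as the expanded version, with \A and \z anchors and # comments.
PHONE = /\A
  (?:(\d)[\ \-.]?)?           # optional prefix digit and separator
  (?:\(?(\d{3})\)?[\ \-.])?   # optional area code and separator
  (\d{3})[\ \-.]              # trunk and separator
  (\d{4})                     # line
  (?:\ ?x?(\d+))?             # optional extension
\z/x

# Hypothetical helper: returns the five captures, or nil when nothing matches.
def parse_phone(text)
  match = PHONE.match(text)
  match ? match.captures : nil
end

p parse_phone("1 (555) 123-4567 x89")
p parse_phone("123-4567")
p parse_phone("not a number")
```

Parts that are absent from the input come back as nil in the captures array.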

EDIT: Left out a necessary space</comment>
    <created-at type="datetime">2008-10-31T16:00:42+00:00</created-at>
    <id type="integer">573</id>
    <language>Ruby</language>
    <permalink>phone-number-regex</permalink>
    <refactors-count type="integer">3</refactors-count>
    <title>Phone number regex</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2008-10-31T20:56:43+00:00</updated-at>
    <user-id type="integer">1137</user-id>
    <user>
      <id type="integer">1137</id>
      <identity-url>http://lexmachina.wordpress.com</identity-url>
      <name>Lex</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">1</refactors-count>
      <website nil="true"></website>
    </user>
  </code>
  <code>
    <code>package ircbot;

/**
 * JavaBot (version 1.2)
 * 
 * MIT License, dydx (Josh Sandlin) &lt;dydx@thenullbyte.org&gt;
 *  
 */

import java.io.*;
import java.net.*;
import java.util.regex.*;
import java.util.Date;

public class Main
{   
    public static void main( String[] args ) throws IOException
    {
        //connection variables
        String server = "irc.snappeh.com";
        String nick = "JavaBot";
        String login = "JavaBot";
        String channel = "#bots";
        int port = 6667;
        
        //for security
        String owner = "dydx";
        
        try {
            
            //our socket we're connected with
            Socket irc = new Socket( server, port );
            //our output stream
            BufferedWriter bw = new BufferedWriter( new OutputStreamWriter( irc.getOutputStream() ) );
            //our input stream
            BufferedReader br = new BufferedReader( new InputStreamReader( irc.getInputStream() ) );
            
            //authenticate with the server
            bw.write( "NICK " + nick + "\n" );
            bw.write( "USER " + login + " thenullbyte.org JB: Java Bot\n" );
            bw.flush();
            
            //join a channel
            bw.write( "JOIN " + channel + "\n" );
            bw.write( "PRIVMSG " + channel + " :Whats up everybody?\n" );
            bw.flush();
            System.out.println( "Successfully connected to IRC" );
            
            String currLine = null;
            while( ( currLine = br.readLine() ) != null )
            {
                //checks for PING, if one is found; return a PONG
                Pattern pingRegex = Pattern.compile( "^PING", Pattern.CASE_INSENSITIVE ); 
                Matcher ping = pingRegex.matcher( currLine );
                if( ping.find() )
                {
                    bw.write( "PONG " + channel + "\n" );
                    bw.flush();
                }
                
                
                //check for ownership
                Pattern checkOwner = Pattern.compile( "^:"+owner, Pattern.CASE_INSENSITIVE );
                Matcher ownership = checkOwner.matcher( currLine );
                
                //!exit - quit current irc room
                Pattern exitRegex = Pattern.compile( "!exit", Pattern.CASE_INSENSITIVE );
                Matcher exit = exitRegex.matcher( currLine );
                if( exit.find() &amp;&amp; ownership.find() )
                {
                    bw.write( "PRIVMSG " + channel + " :Bye Bye\n" );
                    bw.write( "PART " + channel + "\n" );
                    bw.flush();
                    irc.close();
                }
                
                //!time - return current time
                Pattern timeRegex = Pattern.compile( "!time", Pattern.CASE_INSENSITIVE );
                Matcher time = timeRegex.matcher( currLine );
                if( time.find()  &amp;&amp; ownership.find() )
                {
                    Date d = new Date();
                    bw.write( "PRIVMSG " + channel + " :" + d +"\n" );
                    bw.flush();
                }
                
                //!sayhi - shows a little message saying hello
                Pattern helloRegex = Pattern.compile( "!sayhi", Pattern.CASE_INSENSITIVE );
                Matcher hello = helloRegex.matcher( currLine );
                if( hello.find()  &amp;&amp; ownership.find() )
                {
                    bw.write( "PRIVMSG " + channel + " :Hello, I'm a JavaBot. I was coded by dydx in Java!\n");
                    bw.flush();
                }
                
                //!join &lt;room&gt; - changes to a new room and sets the variables accordingly
                Pattern joinRegex = Pattern.compile( "!join", Pattern.CASE_INSENSITIVE );
                Matcher join = joinRegex.matcher( currLine );
                if( join.find()  &amp;&amp; ownership.find() )
                {
                    String[] token = currLine.split( " " );
                    bw.write( "PRIVMSG " + channel + " :Im going over to " + token[4] + "\n" );
                    bw.write( "PART " + channel + "\n" );
                    channel = token[4];
                    bw.write( "JOIN " + channel + "\n" );
                    bw.flush();
                }
            }
        } catch ( UnknownHostException e ) {
            System.err.println( "No such host" );
        } catch ( IOException e ) {
            System.err.println( "There was an error connecting to the host" );
        } 
    }
}</code>
    <comment>I've been working on this for a few days, and I finally have it to where it actually works like I want it to. I'm new to Java, so I know this isn't the best way I could have coded it, and I'm curious to see how anyone would improve upon it.</comment>
    <created-at type="datetime">2008-09-13T06:04:15+00:00</created-at>
    <id type="integer">490</id>
    <language>Java</language>
    <permalink>java-irc-bot</permalink>
    <refactors-count type="integer">4</refactors-count>
    <title>Java IRC Bot</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2008-11-04T11:46:39+00:00</updated-at>
    <user-id type="integer">707</user-id>
    <user>
      <id type="integer">707</id>
      <identity-url>http://joshua.sandlin.myopenid.com</identity-url>
      <name>Ishkur</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">14</refactors-count>
      <website>http://jsandlin.org</website>
    </user>
  </code>
  <code>
    <code>require 'net/http'
require 'uri'
require 'strscan'

html = Net::HTTP.get URI.parse("http://www.apple.com")

s = StringScanner.new(html)

while true 
  txt = s.scan_until(/\.jpg/)
  if not s.matched?
    break
  end
  p /src=.(.*jpg)/.match(txt)[1]
end
</code>
    <comment>This code displays the JPG image URLs found on a web site.
How can we make it smarter and better?
Thanks !
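
A minimal sketch of the same extraction with String#scan instead of the StringScanner loop. The html string is a hypothetical, hard-coded stand-in for the downloaded page (angle brackets left out, since only the src attributes matter to the pattern), so the example runs without network access.

```ruby
# Hypothetical stand-in for the page body; in the original it comes from Net::HTTP.get.
html = 'img src="/images/hero.jpg" alt="" img src="/images/logo.png" img src="/promo/new.JPG"'

# scan returns one array of captures per match; flatten turns that into a flat list.
jpg_urls = html.scan(/src=["']([^"']+\.jpg)["']/i).flatten

jpg_urls.each { |url| puts url }
```

Excluding quote characters from the captured part keeps one match from running across several attributes, which the greedy .* in the original pattern can do.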
</comment>
    <created-at type="datetime">2008-01-30T04:47:55+00:00</created-at>
    <id type="integer">223</id>
    <language>Ruby</language>
    <permalink>display-jpg-image-url-from-a-website</permalink>
    <refactors-count type="integer">5</refactors-count>
    <title>Display jpg image url from a website</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2010-01-03T14:40:58+00:00</updated-at>
    <user-id type="integer">485</user-id>
    <user>
      <id type="integer">485</id>
      <identity-url>https://www.google.com/accounts/o8/id</identity-url>
      <name>Gregory Barborini</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">2</refactors-count>
      <website></website>
    </user>
  </code>
  <code>
    <code>&lt;?php

function fetch($url,$start,$end){
	$page = file_get_contents($url);
 	$s1=explode($start, $page);
 	$s2=explode($end, $page);
 	$page=str_replace($s1[0], '', $page);
 	$page=str_replace($s2[1], '', $page);
 	return $page;
}

$query = $_GET['query'];

if($_GET['lang'] != '')
{
	$lang = $_GET['lang'];
}
else
{
	$lang = 'it';
}

$xml = fetch("http://".$lang.".wikipedia.org/w/api.php?action=query&amp;prop=revisions&amp;titles=".$query."&amp;rvprop=content&amp;format=xml","&lt;rev&gt;","&lt;/rev&gt;");

/* THIS FUNCTION WAS IN PARSER.PHP */
function doHeadings($text)
{
	for ( $i = 6; $i &gt;= 1; --$i ) 
	{
		$h = str_repeat( '=', $i );
		$text = preg_replace( "/^{$h}(.+){$h}\\s*$/m","&lt;h{$i}&gt;\\1&lt;/h{$i}&gt;", $text );
	}
		
	return $text;
}
/* THIS FUNCTION WAS IN PARSER.PHP */
function doAllQuotes($text)
{
	$outtext = '';
	$lines = explode( "\n", $text );
	foreach ( $lines as $line ) {
		$outtext .= doQuotes ( $line ) . "\n";
	}
	$outtext = substr($outtext, 0,-1);

	return $outtext;
}
/* THIS FUNCTION WAS IN PARSER.PHP */
function doQuotes( $text ) {
		$arr = preg_split( "/(''+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE );
		if ( count( $arr ) == 1 )
			return $text;
		else
		{
			# First, do some preliminary work. This may shift some apostrophes from
			# being mark-up to being text. It also counts the number of occurrences
			# of bold and italics mark-ups.
			$i = 0;
			$numbold = 0;
			$numitalics = 0;
			foreach ( $arr as $r )
			{
				if ( ( $i % 2 ) == 1 )
				{
					# If there are ever four apostrophes, assume the first is supposed to
					# be text, and the remaining three constitute mark-up for bold text.
					if ( strlen( $arr[$i] ) == 4 )
					{
						$arr[$i-1] .= "'";
						$arr[$i] = "'''";
					}
					# If there are more than 5 apostrophes in a row, assume they're all
					# text except for the last 5.
					else if ( strlen( $arr[$i] ) &gt; 5 )
					{
						$arr[$i-1] .= str_repeat( "'", strlen( $arr[$i] ) - 5 );
						$arr[$i] = "'''''";
					}
					# Count the number of occurrences of bold and italics mark-ups.
					# We are not counting sequences of five apostrophes.
					if ( strlen( $arr[$i] ) == 2 )      { $numitalics++;             }
					else if ( strlen( $arr[$i] ) == 3 ) { $numbold++;                }
					else if ( strlen( $arr[$i] ) == 5 ) { $numitalics++; $numbold++; }
				}
				$i++;
			}

			# If there is an odd number of both bold and italics, it is likely
			# that one of the bold ones was meant to be an apostrophe followed
			# by italics. Which one we cannot know for certain, but it is more
			# likely to be one that has a single-letter word before it.
			if ( ( $numbold % 2 == 1 ) &amp;&amp; ( $numitalics % 2 == 1 ) )
			{
				$i = 0;
				$firstsingleletterword = -1;
				$firstmultiletterword = -1;
				$firstspace = -1;
				foreach ( $arr as $r )
				{
					if ( ( $i % 2 == 1 ) and ( strlen( $r ) == 3 ) )
					{
						$x1 = substr ($arr[$i-1], -1);
						$x2 = substr ($arr[$i-1], -2, 1);
						if ($x1 == ' ') {
							if ($firstspace == -1) $firstspace = $i;
						} else if ($x2 == ' ') {
							if ($firstsingleletterword == -1) $firstsingleletterword = $i;
						} else {
							if ($firstmultiletterword == -1) $firstmultiletterword = $i;
						}
					}
					$i++;
				}

				# If there is a single-letter word, use it!
				if ($firstsingleletterword &gt; -1)
				{
					$arr [ $firstsingleletterword ] = "''";
					$arr [ $firstsingleletterword-1 ] .= "'";
				}
				# If not, but there's a multi-letter word, use that one.
				else if ($firstmultiletterword &gt; -1)
				{
					$arr [ $firstmultiletterword ] = "''";
					$arr [ $firstmultiletterword-1 ] .= "'";
				}
				# ... otherwise use the first one that has neither.
				# (notice that it is possible for all three to be -1 if, for example,
				# there is only one pentuple-apostrophe in the line)
				else if ($firstspace &gt; -1)
				{
					$arr [ $firstspace ] = "''";
					$arr [ $firstspace-1 ] .= "'";
				}
			}

			# Now let's actually convert our apostrophic mush to HTML!
			$output = '';
			$buffer = '';
			$state = '';
			$i = 0;
			foreach ($arr as $r)
			{
				if (($i % 2) == 0)
				{
					if ($state == 'both')
						$buffer .= $r;
					else
						$output .= $r;
				}
				else
				{
					if (strlen ($r) == 2)
					{
						if ($state == 'i')
						{ $output .= '&lt;/i&gt;'; $state = ''; }
						else if ($state == 'bi')
						{ $output .= '&lt;/i&gt;'; $state = 'b'; }
						else if ($state == 'ib')
						{ $output .= '&lt;/b&gt;&lt;/i&gt;&lt;b&gt;'; $state = 'b'; }
						else if ($state == 'both')
						{ $output .= '&lt;b&gt;&lt;i&gt;'.$buffer.'&lt;/i&gt;'; $state = 'b'; }
						else # $state can be 'b' or ''
						{ $output .= '&lt;i&gt;'; $state .= 'i'; }
					}
					else if (strlen ($r) == 3)
					{
						if ($state == 'b')
						{ $output .= '&lt;/b&gt;'; $state = ''; }
						else if ($state == 'bi')
						{ $output .= '&lt;/i&gt;&lt;/b&gt;&lt;i&gt;'; $state = 'i'; }
						else if ($state == 'ib')
						{ $output .= '&lt;/b&gt;'; $state = 'i'; }
						else if ($state == 'both')
						{ $output .= '&lt;i&gt;&lt;b&gt;'.$buffer.'&lt;/b&gt;'; $state = 'i'; }
						else # $state can be 'i' or ''
						{ $output .= '&lt;b&gt;'; $state .= 'b'; }
					}
					else if (strlen ($r) == 5)
					{
						if ($state == 'b')
						{ $output .= '&lt;/b&gt;&lt;i&gt;'; $state = 'i'; }
						else if ($state == 'i')
						{ $output .= '&lt;/i&gt;&lt;b&gt;'; $state = 'b'; }
						else if ($state == 'bi')
						{ $output .= '&lt;/i&gt;&lt;/b&gt;'; $state = ''; }
						else if ($state == 'ib')
						{ $output .= '&lt;/b&gt;&lt;/i&gt;'; $state = ''; }
						else if ($state == 'both')
						{ $output .= '&lt;i&gt;&lt;b&gt;'.$buffer.'&lt;/b&gt;&lt;/i&gt;'; $state = ''; }
						else # ($state == '')
						{ $buffer = ''; $state = 'both'; }
					}
				}
				$i++;
			}
			# Now close all remaining tags.  Notice that the order is important.
			if ($state == 'b' || $state == 'ib')
				$output .= '&lt;/b&gt;';
			if ($state == 'i' || $state == 'bi' || $state == 'ib')
				$output .= '&lt;/i&gt;';
			if ($state == 'bi')
				$output .= '&lt;/b&gt;';
			# There might be lonely ''''', so make sure we have a buffer
			if ($state == 'both' &amp;&amp; $buffer)
				$output .= '&lt;b&gt;&lt;i&gt;'.$buffer.'&lt;/i&gt;&lt;/b&gt;';
			return $output;
		}
	}
	
$xml = doHeadings($xml);
$xml = doAllQuotes($xml);
$xml = str_replace('[[','',$xml); /* WIKIMEDIA IDENTIFY THE CONTENT BETWEEN [[ and ]] AS A LINK AND I REMOVE IT. */
$xml = str_replace(']]','',$xml);

echo $xml;exit;
?&gt;</code>
    <comment>I need to add information from Wikipedia to my website. I looked at the Wikimedia documentation for API support and found this: http://www.mediawiki.org/wiki/API. The problem is that the text I receive in the XML is formatted with Wikimedia markup and I need to convert it to plain HTML. In the Wikimedia package I found a file, Parser.php, that includes some functions that help with part of the conversion.

A demo of the script is here: http://www.federicopepe.com/test/test2.php?query=Metallica&amp;lang=en

I need to delete or format the content between {{ and }} and between &lt;ref&gt; and &lt;/ref&gt;... Maybe with a regex?</comment>
    <created-at type="datetime">2008-01-09T18:37:19+00:00</created-at>
    <id type="integer">208</id>
    <language>PHP</language>
    <permalink>wikipedia-api-and-text-formatting</permalink>
    <refactors-count type="integer">1</refactors-count>
    <title>Wikipedia API and text formatting</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2009-08-05T00:22:28+00:00</updated-at>
    <user-id type="integer">456</user-id>
    <user>
      <id type="integer">456</id>
      <identity-url>http://z3ro.myopenid.com</identity-url>
      <name>z3ro</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">0</refactors-count>
      <website>http://www.federicopepe.com</website>
    </user>
  </code>
  <code>
    <code>## Base function [javascript]
//function that loops through the elements
function __validateForm(theform){
	numEle=theform.elements.length;
	for(i=0;i&lt;numEle;i++){
		elem=theform.elements[i];
		if(elem.alt){
			parts=elem.alt.split("::");
			regex=new RegExp(parts[0],'i');
			if(!regex.test(elem.value)){
				alert(parts[1]);
				return false;
			}
		}
	}
			
	return true;
}

//init function to bind the validation to the onsubmit
//can take an argument, that is a function for extra validation
function validateForm(){
	var vaditional;
	if(!arguments.length){
		f=function(){
			return __validateForm(this);
		}
	}else{
		vaditional=arguments[0];
		f=function(){
			ret=__validateForm(this);
			if(ret)
				return vaditional(this);
			return false;
		}
	}
	return f;
}

##case of use 1 [html]
&lt;html&gt;
&lt;script&gt;
function init(){
   document.forms.registro.onsubmit=validateForm();
}
window.onload=init;
&lt;/script&gt;
&lt;body&gt;
&lt;form name="registro" id="registro" action="procesa_login.html" method="post"&gt;
					&lt;fieldset&gt;
						&lt;legend&gt;Login&lt;/legend&gt;
						&lt;p&gt;Username:&lt;/p&gt;
						&lt;input type="text" name="usuario" alt="^[a-z0-9_-]{4,12}$::Specify a username between 4 and 12 alphanumeric characters. No whitespace allowed" /&gt;
						&lt;p&gt;Password:&lt;/p&gt;
						&lt;input type="password" name="password" alt="^.{4,12}$::Introduce a password between 4 and 12 characters" /&gt;
					&lt;/fieldset&gt;
					&lt;input type="submit" name="submit" value="Enviar" /&gt;
				&lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;

##case of use 2 [html]
&lt;html&gt;
&lt;script&gt;
//in this case we use a function to compare the two passwords fields
function comparePasswords(theform){
   if(theform.password.value!=theform.repassword.value){
		alert('The two passwords are not equal');
		return false;
   }
   return true;
}

function init(){
    //we attach comparePasswords to validateForm
   document.forms.registro.onsubmit=validateForm(comparePasswords);
}
window.onload=init;
&lt;/script&gt;
&lt;body&gt;
&lt;form name="registro" id="registro" action="procesa_login.html" method="post"&gt;
					&lt;fieldset&gt;
						&lt;legend&gt;Login&lt;/legend&gt;
						&lt;p&gt;Username:&lt;/p&gt;
						&lt;input type="text" name="usuario" alt="^[a-z0-9_-]{4,12}$::Specify a username between 4 and 12 alphanumeric characters. No whitespace allowed" /&gt;
						&lt;p&gt;Password:&lt;/p&gt;
						&lt;input type="password" name="password" alt="^.{4,12}$::Introduce a password between 4 and 12 characters" /&gt;
                                               &lt;p&gt;Reenter Password:&lt;/p&gt;
						&lt;input type="password" name="repassword" /&gt;
					&lt;/fieldset&gt;
					&lt;input type="submit" name="submit" value="Enviar" /&gt;
				&lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;</code>
    <comment>This is a really simple form validation function. It loops through all form elements and, if an element has its alt attribute set, splits the value on :: and uses the first part as a RegEx and the second part as the error message.</comment>
    <created-at type="datetime">2007-11-11T10:32:46+00:00</created-at>
    <id type="integer">149</id>
    <language>JavaScript</language>
    <permalink>simple-form-validator-with-regex</permalink>
    <refactors-count type="integer">5</refactors-count>
    <title>Simple form validator with RegEx</title>
    <trackback-url></trackback-url>
    <updated-at type="datetime">2010-01-11T20:29:18+00:00</updated-at>
    <user-id type="integer">293</user-id>
    <user>
      <id type="integer">293</id>
      <identity-url>http://jdeveloper.myopenid.com</identity-url>
      <name>eljota</name>
      <rating type="float">0.0</rating>
      <refactors-count type="integer">2</refactors-count>
      <website></website>
    </user>
  </code>
</codes>
