Is this a good way to extract a pattern match from a string?

import re

def process_html(str):
    pattern = re.compile(r'<object ([\w="\d+"]|\s)+>([\x20-\x7E\s])+</object>')
    match   = pattern.match(str)
    return match.group()

Refactorings


nicerobot

March 17, 2010 12:46

1. Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses http://wiki.tcl.tk/4164
2. Your code refers to the match object as 'm' (line 6) but names it 'match' (line 5).
3. Groups are referenced by their (one-based) group number; group() with no argument, like group(0), returns the entire match.
Note: I didn't change your regex; I only changed lines 5 and 6.

import re

def process_html(str):
    pattern = re.compile(r'<object ([\w="\d+"]|\s)+>([\x20-\x7E\s])+</object>')
    m = pattern.match(str)
    return m.group(1)  # group 1 is repeated by '+', so this is only its last repetition
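To illustrate point 3: because the first group in this pattern is repeated with `+`, Python's `re` module keeps only its last repetition, so `group(1)` comes back as a single character, while `group()` (the same as `group(0)`) is the whole matched element. A minimal check, where `sample` is a made-up, shortened embed string used only for illustration:

```python
import re

# Same regex as above, written as a raw string so the backslashes reach re unchanged.
pattern = re.compile(r'<object ([\w="\d+"]|\s)+>([\x20-\x7E\s])+</object>')

# Hypothetical, shortened sample input (not the real Vimeo embed code).
sample = '<object width="400" height="300"><param name="movie" value="x" /></object><p>tail</p>'

m = pattern.match(sample)
if m is None:
    # pattern.match only looks at the start of the string and returns None on failure
    raise ValueError("no <object> element at the start of the string")

whole = m.group()    # the entire match, same as m.group(0)
first = m.group(1)   # only the LAST repetition of the repeated first group
print(whole)
print(first)
```

Here `whole` is the `<object ...>...</object>` element and `first` is just the closing quote of `height="300"`, which is why the question's original `group()` is usually what you want for this goal.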

rullon.myopenid.com

March 17, 2010 13:33

nicerobot, thanks for the reply!
The goal was to clean up the embed player code from Vimeo (or any other service), so I decided not to parse anything.

we have: 
--------

<object width="400" height="300"><param name="allowfullscreen" value="true" />
    <param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=9851483&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" />
    <embed src="http://vimeo.com/moogaloop.swf?clip_id=9851483&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed>
</object>
<p><a href="http://vimeo.com/9851483">Gorillaz - Stylo</a> from <a href="http://vimeo.com/uccimaru">mario ucci</a> on <a href="http://vimeo.com">Vimeo</a>.</p>

we want:
--------

<object width="400" height="300"><param name="allowfullscreen" value="true" />
    <param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=9851483&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" />
    <embed src="http://vimeo.com/moogaloop.swf?clip_id=9851483&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed>
</object>
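One way to get from "we have" to "we want" without real HTML parsing is to search for the first `<object>` element non-greedily and keep only that. A minimal sketch, assuming the embed code contains exactly one, non-nested `<object>...</object>` element; `extract_object` and `OBJECT_RE` are hypothetical names, and `re.DOTALL` lets `.` cross the newlines inside the element:

```python
import re

# Non-greedy '.*?' stops at the first closing tag; DOTALL lets '.' match newlines.
OBJECT_RE = re.compile(r'<object\b.*?</object>', re.DOTALL)

def extract_object(html):
    """Return the first <object>...</object> element in html, or None if absent."""
    match = OBJECT_RE.search(html)
    return match.group() if match else None

# Usage with a shortened version of the "we have" input.
embed = ('<object width="400" height="300"><param name="allowfullscreen" value="true" />\n'
         '</object>\n'
         '<p><a href="http://vimeo.com/9851483">Gorillaz - Stylo</a></p>')
print(extract_object(embed))
```

Unlike `pattern.match`, `search` does not require the element to sit at the very start of the string, and returning `None` avoids the `AttributeError` that calling `.group()` on a failed match would raise. Nested `<object>` tags would break this approach.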

nicerobot

March 17, 2010 17:28

If that's the entire file every time, I'd replace "<p><a.*" with "".
And I'd do it with Perl!

perl -pi -e 's|<p><a.*||s'
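For comparison, roughly the same clean-up in Python, the language of the question. `strip_caption` is a hypothetical name. Note one difference: `perl -p` works line by line, whereas this sketch substitutes over the whole string at once, so with `re.DOTALL` the `.*` runs to the end of the text and drops the `<p><a ...` caption plus anything after it. For the embed code above, where the caption is the last line, the result is the same.

```python
import re

def strip_caption(html):
    """Drop everything from the first '<p><a' to the end of the string."""
    # DOTALL lets '.*' run across newlines, like the /s flag in the Perl version.
    return re.sub(r'<p><a.*', '', html, flags=re.DOTALL)

embed = ('<object width="400" height="300">\n'
         '</object>\n'
         '<p><a href="http://vimeo.com/9851483">Gorillaz - Stylo</a></p>\n')
cleaned = strip_caption(embed)
print(cleaned)
```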
