curl http://www.cnn.com | perl -ne 'm/>([^<].*?[^>])<\// && print$1."\n"'
Refactorings
V
November 13, 2007 22:57
This version replaces [^<].*?[^>] with [^<>]*?, so the captured text cannot contain a tag. I didn't test it, so this could be wrong.
curl http://www.cnn.com | perl -ne 'm/>([^<>]*?)<\// && print$1."\n"'
griflet
November 20, 2007 18:00
Tested, and it works. I also added a sed command to remove blank lines. Does anyone care to fold that into the perl one-liner, for sport?
curl http://www.cnn.com | perl -ne 'm/>([^<>]*?)<\// && print$1."\n"' | sed -e '/^$/d'
pascal.charest
February 5, 2008 21:19
Here is another version.
The curl -s flag enables silent mode, so curl doesn't show its progress meter in your terminal.
Using +? instead of *? gets rid of the many empty lines produced by a tag immediately followed by a closing tag (e.g. <td></td>), since the capture must now contain at least one character.
curl -s http://www.cnn.com | perl -ne 'm/>([^<>]+?)<\// && print$1."\n"'
Hello,
I'm a web-scraping enthusiast, and I write short bash one-liners using sed, awk, perl, grep, tail, head, tr, and similar programs. Here's a really cool perl one-liner that extracts the text from XML (or HTML) tags. You should try it. Can you make it shorter, or more powerful?
Cheers,
Guillaume