scraping websites...
Jun 26, 2007 · code, tools
many times, i find myself having to scrape a website for one reason or another. nowadays, if i needed to do it, i’d probably use some version of mechanize or an html parser (WWW::Mechanize in perl, hpricot in ruby, etc). when i was hunting for a bug in one of the scrapers i’d written a long time ago, what took me by surprise was that i’d written a lexer to do it.
i guess this was shortly after i’d taken “languages and interpreters” in college, in which we used lex/flex and yacc/bison. i just find it interesting that, after working with a technology, we look for ways to put it to use. it’s not necessarily a bad way to do things, i just would do things differently now…
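
for anyone wondering what scraping with a lexer even looks like, here is a rough sketch in python. it is not the old scraper (that one was written with flex), and the rule names, `tokenize` and `extract_links` are all made up for illustration. the idea is the same though: a list of patterns tried in order at the current position, each producing a token, plus a little state machine that picks out the tokens it cares about.

```python
import re

# token rules, tried in order at the current position,
# much like the pattern list in a flex file (names are hypothetical)
TOKEN_RULES = [
    ("LINK_OPEN", re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>', re.I)),
    ("LINK_CLOSE", re.compile(r'</a\s*>', re.I)),
    ("TAG", re.compile(r'<[^>]*>')),
    ("TEXT", re.compile(r'[^<]+')),
]


def tokenize(html):
    """yield (token_name, value) pairs by scanning html left to right."""
    pos = 0
    while pos < len(html):
        for name, pattern in TOKEN_RULES:
            m = pattern.match(html, pos)
            if m:
                # rules with a capture group yield the captured part (the href)
                value = m.group(1) if m.groups() else m.group(0)
                yield name, value
                pos = m.end()
                break
        else:
            # a stray '<' that fits no rule: skip one character and move on
            pos += 1


def extract_links(html):
    """collect (href, link text) pairs from the token stream."""
    links = []
    href = None
    text = []
    for name, value in tokenize(html):
        if name == "LINK_OPEN":
            href, text = value, []
        elif name == "TEXT" and href is not None:
            text.append(value)
        elif name == "LINK_CLOSE" and href is not None:
            links.append((href, "".join(text).strip()))
            href = None
    return links


page = '<p>see <a href="/one">first <b>page</b></a> and <a class="x" href="/two">second</a></p>'
print(extract_links(page))
```

it only handles double-quoted href attributes and knows nothing about comments, scripts or broken markup, which is a big part of why a real parser is the easier route these days.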