Dapper is a web-based application for generating XML for website content. You create “Dapps” (web services) by using Dapper’s virtual browser to grab content from web pages. You train Dapper by feeding it several example URLs for pages that hold the kind of content you’re interested in. Dapper looks at the similarities between the pages to guess which content is important to pull from each page. After Dapper has analyzed the pages, you can narrow down the fields you want to track.
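To make the "compare several example pages" idea concrete, here is a rough sketch in Python. It is not Dapper's actual algorithm, which isn't described here; it only assumes that pages built from the same template share tag paths, and that text which changes from page to page at a shared path is probably the interesting content. The names `PathTextParser`, `texts_by_path`, and `guess_fields`, and the two sample pages, are made up for illustration.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathTextParser(HTMLParser):
    """Record each non-empty text node under the tag path that leads to it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.texts = {}   # "html/body/div/p" -> list of text snippets

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(self.stack)
            self.texts.setdefault(path, []).append(text)


def texts_by_path(html):
    """Parse one page and return its text grouped by tag path."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


def guess_fields(pages):
    """Return tag paths present on every page whose text differs between pages."""
    parsed = [texts_by_path(page) for page in pages]
    shared_paths = set(parsed[0]).intersection(*parsed[1:])
    fields = []
    for path in sorted(shared_paths):
        values = {tuple(page_texts[path]) for page_texts in parsed}
        if len(values) > 1:  # text varies, so it is likely content, not template
            fields.append(path)
    return fields


# Two hypothetical example pages that share a template.
page_a = ("<html><body><h1>Site News</h1>"
          "<div><h2>Cats</h2><p>Story about cats.</p></div></body></html>")
page_b = ("<html><body><h1>Site News</h1>"
          "<div><h2>Dogs</h2><p>Story about dogs.</p></div></body></html>")

candidate_fields = guess_fields([page_a, page_b])
print(candidate_fields)
```

With these two sample pages, the shared `h1` banner is treated as template text and skipped, while the `h2` and `p` paths are reported as candidate fields. A real tool would then let you narrow that list down by hand, which matches the last step described above.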
Wednesday, March 7, 2007
I haven't had time to play with most of these, but Dapper seems like a strong candidate for making web page scraping easier: