Crawling through a site: simulating a browser as closely as possible
I'm writing a piece of software that crawls through several pages and collects ALL the links on each page.
Hence my title: I need to pick up every link on a page, at a decent speed, just as a browser does.
My questions are:
1. Can anyone recommend a parser for this job? I have already tried the Jericho HTML Parser, and it is pretty slow (the JavaScript part in particular); a simplified sketch of my current approach is below the list.
2. What kinds of links and technologies should I expect to encounter (besides HTML and JavaScript)? I need a parser that handles all of them in an efficient (and easy!) way.
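For context, here is roughly what my Jericho-based extraction looks like right now (simplified; the URL and class name are just placeholders). It grabs the href of every <a> element, but since Jericho only parses static markup and doesn't execute scripts, any links generated by JavaScript never show up:

```java
import java.net.URL;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

public class LinkExtractor {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page (placeholder URL for illustration).
        Source source = new Source(new URL("http://example.com"));

        // Parse the whole document up front instead of on demand.
        source.fullSequentialParse();

        // Collect the href attribute of every <a> element on the page.
        List<Element> anchors = source.getAllElements(HTMLElementName.A);
        for (Element anchor : anchors) {
            String href = anchor.getAttributeValue("href");
            if (href != null) {
                System.out.println(href);
            }
        }
    }
}
```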
Thanks in advance!