Web Crawling is too Slow!
Hello Sir,
My code (pasted below) works properly but very slowly. Can you please suggest any changes (if any) on the programming side to make it run faster? Thanks in advance.
Code java:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Supplies the proxy credentials whenever the proxy asks for authentication.
class ProxyAuthenticator extends Authenticator {
    private final String user, password;

    public ProxyAuthenticator(String user, String password) {
        this.user = user;
        this.password = password;
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(user, password.toCharArray());
    }
}

public class CrawlWeb1 {
    public static void main(String[] args) {
        try {
            // Route the request through the authenticating HTTP proxy.
            Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(<IPAddr>, <Port>));
            Authenticator.setDefault(new ProxyAuthenticator(<username>, <password>));
            String surl = "http://www.timesofindia.com/";
            URLConnection yc = new URL(surl).openConnection(proxy);
            // try-with-resources closes both streams even if an exception is thrown
            try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
                 BufferedWriter out = new BufferedWriter(new FileWriter("timeofindia.html"))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    out.write(inputLine);
                    out.newLine(); // readLine() strips line terminators; put them back
                }
            }
            System.out.println("Crawling Done...! URL : " + surl);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Re: Web Crawling is too Slow!
That is essentially as fast as that code is going to be (other than possibly a few insignificant optimizations).
The only way I can see this code running faster is if your network speed gets faster, or if the website you're connecting to decides to allocate more bandwidth for you. Whichever is the slower of the two will limit what your actual download speed is.
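One way to check whether bandwidth really is the limit is to measure the transfer rate you actually get. The sketch below is a hypothetical helper (`ThroughputMeter`, `copyCounting` and `kilobytesPerSecond` are illustrative names, not part of any library). It copies the response in raw 8 KB chunks rather than line by line and converts the byte count into KB/s. If the figure roughly matches what your connection normally delivers, the code is not the bottleneck.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream in raw chunks and reports the rate.
// Copying bytes directly also skips decoding the page into lines.
class ThroughputMeter {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copyCounting(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Converts a byte count and elapsed nanoseconds into kilobytes per second.
    static double kilobytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return Double.POSITIVE_INFINITY; // too fast to measure
        }
        return (bytes / 1024.0) / (elapsedNanos / 1_000_000_000.0);
    }
}
```

To use it with the original program, record `System.nanoTime()` before calling `copyCounting` with `yc.getInputStream()` and a `FileOutputStream` for `timeofindia.html`, then pass the elapsed nanoseconds to `kilobytesPerSecond`.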
Re: Web Crawling is too Slow!
In what way is your crawl slow? Where exactly does the code pause?
Another possible reason for a slow URL fetch is that the server you are connecting to is slow to handle requests.
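To find out where the code pauses, you can time the two phases of a fetch separately. This is a sketch under assumptions: `FetchTiming` and `measure` are hypothetical names, and it expects a connection that has not been connected yet (for example `new URL(surl).openConnection(proxy)`). For HTTP, `getInputStream()` returns only after the request has been sent and the response headers received, so a large wait time points at connection setup or a slow server, while a large transfer time points at bandwidth.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical helper: splits a fetch into "waiting" time (connection setup,
// request, response headers) and "transfer" time (reading the body).
class FetchTiming {
    final long waitMillis;     // from connect() until the input stream is available
    final long transferMillis; // from then until the end of the body
    final long bytes;          // body size actually read

    private FetchTiming(long waitMillis, long transferMillis, long bytes) {
        this.waitMillis = waitMillis;
        this.transferMillis = transferMillis;
        this.bytes = bytes;
    }

    // Assumes conn has not been connected yet.
    static FetchTiming measure(URLConnection conn) throws IOException {
        long t0 = System.nanoTime();
        conn.connect();
        try (InputStream in = conn.getInputStream()) {
            long t1 = System.nanoTime();
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n; // body is discarded; only its size is kept
            }
            long t2 = System.nanoTime();
            return new FetchTiming((t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, total);
        }
    }

    @Override
    public String toString() {
        return "wait=" + waitMillis + " ms, transfer=" + transferMillis + " ms, bytes=" + bytes;
    }
}
```

Printing `FetchTiming.measure(new URL(surl).openConnection(proxy))` once per run shows which phase dominates.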
// Json
Re: Web Crawling is too Slow!
Thank you both. My network is very slow nowadays! I just want to confirm that nothing can be done on the programming side.
Re: Web Crawling is too Slow!
I don't really think much can be done on the programming side. The code looks OK to me. I think the speed will depend entirely on the internet connection and the server's response time.
Re: Web Crawling is too Slow!
Another thing that can be a MAJOR factor is the DNS lookup, but from a programming perspective there's probably not much you can do without going down to a lower level.
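You can at least see how much the lookup costs by timing it directly. In this hypothetical `DnsTimer` sketch, the host name defaults to the one from the original post and `InetAddress.getByName` does the resolution. The JVM caches successful lookups in `InetAddress` (controlled by the `networkaddress.cache.ttl` security property), so within one run the second lookup for the same host is typically much cheaper than the first. Behind an HTTP proxy, the target host name may be resolved by the proxy rather than by your machine, so this measures only local resolution.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: times how long the JVM takes to resolve a host name.
class DnsTimer {

    // Resolves host and returns the elapsed time in milliseconds.
    static long lookupMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getByName(host); // result discarded; only the timing matters
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "www.timesofindia.com";
        // The second call is usually served from InetAddress's cache.
        System.out.println("first lookup:  " + lookupMillis(host) + " ms");
        System.out.println("second lookup: " + lookupMillis(host) + " ms");
    }
}
```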
// Json