
Thread: URLConnection inconsistency + SSCCE

  1. #1
    Forum VIP
    Join Date
    Oct 2010
    Posts
    275
    My Mood
    Cool
    Thanks
    32
    Thanked 54 Times in 47 Posts
    Blog Entries
    2

    Default URLConnection inconsistency + SSCCE

    So I decided to try out URL connections and requests, and I didn't want to use any third-party libraries for my first time. I am quite confused about why the results are so inconsistent. The idea was to use Wiktionary's API as a dictionary for a small application. However, after the first call or so, InputStream.available() on the connection's stream will seemingly randomly return 0; sometimes I then get two more useful connections before it goes back to 0. This might be clearer with an SSCCE to demonstrate the problem.

    import java.net.*;
    import java.io.*;
    /**
     * This class is an SSCCE to demonstrate the inconsistency
     * in the URLConnection class
     */
    public class URLConnectionTest
    {
      public static void main(String[] args) throws Exception //No error handling to keep it short and sweet
      {
        for(int i = 0; i < 15; i++)
        {
          HttpURLConnection connection = (HttpURLConnection)new URL("http://en.wiktionary.org/wiki/amo?action=raw").openConnection();
          InputStream in = connection.getInputStream();
          System.out.println((i+1)+". Expected bytes = "+in.available());
          in.close();
          connection.disconnect();
        }
      }
    }

    Average case:
    1. Expected bytes = 2896
    2. Expected bytes = 0
    3. Expected bytes = 0
    4. Expected bytes = 0
    5. Expected bytes = 0
    6. Expected bytes = 0
    7. Expected bytes = 0
    8. Expected bytes = 0
    9. Expected bytes = 0
    10. Expected bytes = 0
    11. Expected bytes = 0
    12. Expected bytes = 0
    13. Expected bytes = 0
    14. Expected bytes = 0
    15. Expected bytes = 0

    Edit:
    The response code is always 200. [HTTP_OK]
    Edit 2:
    The first time it takes around 330 ms, the rest take around 220 ms most of the time, but every once in a while the first time is faster. Also, no amount of delay/forcing the gc makes any difference.
    Last edited by Tjstretch; March 14th, 2012 at 04:59 PM.


  2. #2
    Super Moderator Sean4u's Avatar
    Join Date
    Jul 2011
    Location
    Tavistock, UK
    Posts
    637
    Thanks
    5
    Thanked 103 Times in 93 Posts

    Default Re: URLConnection inconsistency + SSCCE

    You have to remember that the program that's sending the data at the other end of that stream is running asynchronously. You've only just opened the connection: chances are all that has been sent is headers and they might have been enough to fill a network packet with little left over for content. Read the API doc for InputStream.available. It sounds awkward because it doesn't give you much: the number of bytes you can read without blocking. You're doing web requests, so expect plenty of blocking reads. Put a BufferedReader or something on that InputStream and invoke readLine() until you get to the end of stream. You'll see something much better.
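
    A minimal sketch of that suggestion, not a definitive implementation: wrap the stream in a BufferedReader and call readLine() until it returns null. The class name ReadToEnd, the UTF-8 charset and the in-memory stand-in for connection.getInputStream() are all assumptions made so the example runs without a network; it also uses try-with-resources, which needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadToEnd
{
  /** Reads every line until end of stream and returns the whole text. */
  static String readAll(InputStream in) throws IOException
  {
    StringBuilder text = new StringBuilder();
    // UTF-8 is an assumption; a real client should check the Content-Type header.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)))
    {
      String line;
      while ((line = reader.readLine()) != null) // null means end of stream
      {
        text.append(line).append('\n');
      }
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException
  {
    // Stand-in for connection.getInputStream(), so the sketch runs offline.
    InputStream fake = new ByteArrayInputStream("==Latin==\n===Verb===\n".getBytes(StandardCharsets.UTF_8));
    System.out.print(readAll(fake));
  }
}
```

    With a real connection you would pass connection.getInputStream() to readAll(); each readLine() call simply blocks until more of the response arrives.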

  3. The Following User Says Thank You to Sean4u For This Useful Post:

    Tjstretch (March 15th, 2012)

  4. #3
    Administrator copeg's Avatar
    Join Date
    Oct 2009
    Location
    US
    Posts
    5,320
    Thanks
    181
    Thanked 833 Times in 772 Posts
    Blog Entries
    5

    Default Re: URLConnection inconsistency + SSCCE

    Also note you are bombarding the server with 15 requests in an extraordinarily short time span, which could be recognized as a denial-of-service attack (in other words, the responses you are getting could be a preventative measure against this sort of attack).

  5. The Following User Says Thank You to copeg For This Useful Post:

    Tjstretch (March 15th, 2012)

  6. #4
    Super Moderator Sean4u's Avatar
    Join Date
    Jul 2011
    Location
    Tavistock, UK
    Posts
    637
    Thanks
    5
    Thanked 103 Times in 93 Posts

    Default Re: URLConnection inconsistency + SSCCE

    I just had another look at your code. When you're doing stuff with URLConnection bear in mind that Java uses persistent connections:

    HTTP Persistent Connections

    And that entity you requested is longer than 2896 (first value printed) bytes:

    sean@bulldozer:~$ curl -s -v http://en.wiktionary.org/wiki/amo?action=raw
    * About to connect() to en.wiktionary.org port 80 (#0)
    *   Trying 91.198.174.226... connected
    * Connected to en.wiktionary.org (91.198.174.226) port 80 (#0)
    > GET /wiki/amo?action=raw HTTP/1.1
    > User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 OpenSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
    > Host: en.wiktionary.org
    > Accept: */*
    > 
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 14 Mar 2012 23:41:39 GMT
    < Server: Apache
    < X-Content-Type-Options: nosniff
    < Cache-Control: public, s-maxage=0, max-age=2678400
    < Last-Modified: Thu, 08 Mar 2012 16:36:45 GMT
    < Vary: Accept-Encoding
    < Content-Length: 4115

    So it looks as though what's happening is that wiktionary has transferred nearly 3,000 bytes to you, but there's still another 1,000-odd to be transferred before the response is completed. Your next request will not be serviced because it's 'behind' the first one. Read each request completely and you won't get this output.
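
    To illustrate "read each request completely", here is a rough sketch that keeps calling read() until it returns -1 and counts what arrived. The class name DrainStream and the fake stream are hypothetical: the stand-in holds 4115 bytes but reports at most 2896 from available(), mimicking the numbers above without touching the network.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class DrainStream
{
  /** Reads until read() reports -1 and returns how many bytes arrived in total. */
  static long countBytes(InputStream in) throws IOException
  {
    byte[] buffer = new byte[1024];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1)
    {
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException
  {
    // Hypothetical stand-in for a 4115-byte response of which only 2896 bytes
    // are available without blocking when available() is first asked.
    InputStream fake = new ByteArrayInputStream(new byte[4115])
    {
      @Override
      public synchronized int available()
      {
        return Math.min(2896, super.available());
      }
    };
    System.out.println("available() = " + fake.available()); // prints available() = 2896
    System.out.println("bytes read  = " + countBytes(fake)); // prints bytes read  = 4115
  }
}
```

    Against a real connection, the total from countBytes(connection.getInputStream()) can be compared with connection.getContentLength() when the server sends a Content-Length header.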

    copeg makes a good point: if you're writing robots you should attempt to adhere to the Robots Exclusion Standard. Wiktionary.org's /robots.txt hasn't anything in it to bother you, but you should at least be naming your bot. Set the User-Agent header: by default it'll be something like "Java/1.6.0", and many sites Disallow and occasionally block robots whose authors can't be arsed to name them. Rate-limiting isn't in the standard, but Crawl-delay is a common declaration - so some people do care about too-rapid requests!

    The Web Robots Pages
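
    A sketch of what naming your bot and spacing out requests might look like. The bot name, contact address, one-second delay and five-second timeouts are all made-up values for illustration, and PoliteFetcher is a hypothetical class name. Note that openConnection() only prepares the request; nothing is sent until the response is actually asked for, so this example runs without contacting the server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class PoliteFetcher
{
  // Hypothetical bot name and contact; use something that identifies your own application.
  static final String USER_AGENT = "ExampleDictionaryBot/0.1 (admin@example.org)";
  // Assumed pause between requests; a site's robots.txt may declare its own Crawl-delay.
  static final long DELAY_MILLIS = 1000;

  /** Prepares (but does not yet send) a GET request that names the bot. */
  static HttpURLConnection open(String address) throws IOException
  {
    HttpURLConnection connection = (HttpURLConnection) URI.create(address).toURL().openConnection();
    connection.setRequestProperty("User-Agent", USER_AGENT);
    connection.setConnectTimeout(5000); // assumed values, in milliseconds
    connection.setReadTimeout(5000);
    return connection;
  }

  public static void main(String[] args) throws Exception
  {
    String[] words = { "amo", "amas", "amat" };
    for (String word : words)
    {
      HttpURLConnection connection = open("http://en.wiktionary.org/wiki/" + word + "?action=raw");
      System.out.println(connection.getURL() + " as " + connection.getRequestProperty("User-Agent"));
      // ... read the response completely here, then disconnect ...
      Thread.sleep(DELAY_MILLIS); // space the requests out
    }
  }
}
```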

  7. The Following User Says Thank You to Sean4u For This Useful Post:

    Tjstretch (March 15th, 2012)

  8. #5
    Forum VIP
    Join Date
    Oct 2010
    Posts
    275
    My Mood
    Cool
    Thanks
    32
    Thanked 54 Times in 47 Posts
    Blog Entries
    2

    Default Re: URLConnection inconsistency + SSCCE

    Wow, thank you for all of the help! I managed to get the SSCCE to consistently return 4115 bytes and an appropriate response.

    Again, thank you!

    Working program if anyone is curious
     
    import java.net.*;
    import java.io.*;
    /**
     * This class is a test of the URLConnection class
     * 
     * @author Timothy Moore
     */
    public class URLConnectionTest
    {
      public static void main(String[] args) throws Exception //No error handling to keep it short and sweet
      {
        long start = System.currentTimeMillis();
        HttpURLConnection connection = (HttpURLConnection)new URL("http://en.wiktionary.org/wiki/amo?action=raw").openConnection();
        connection.setRequestProperty("User-Agent", "Timothy Moore");
        InputStream in = connection.getInputStream();
        System.out.println("   Response Code = "+connection.getResponseCode());
        DataInputStream dis = new DataInputStream(in);
        int counter = 0;
        try
        {
          while(true)
          {
            dis.readByte();
            counter++;
          }
        }catch(EOFException exc)
        {
          System.out.println("End of file at "+counter+" bytes");
        }
        dis.close();
        connection.disconnect();
        long time = System.currentTimeMillis()-start;
        System.out.println("   Time: "+time+"ms");
      }
    }
    Last edited by Tjstretch; March 15th, 2012 at 05:37 PM.

  9. #6
    Super Moderator Sean4u's Avatar
    Join Date
    Jul 2011
    Location
    Tavistock, UK
    Posts
    637
    Thanks
    5
    Thanked 103 Times in 93 Posts

    Default Re: URLConnection inconsistency + SSCCE

    I'm glad you got your code working. All you need to do now is to edit your comment. URLConnection is working the way it's supposed to - even though it might seem a bit odd at first. Wiktionary.org is likely to give you an over-optimistic impression of how URLConnection works. If you do a lot of automated HTTP fetching, you'll discover a lot of variability in how quickly / reliably servers respond! If you're planning to write an application based on someone's API, remember to code in some caching, to set your User-Agent header, and to test how well your code works when you pull your network cable out!
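
    One possible shape for the caching and pulled-cable advice, offered as a sketch rather than the way to do it. CachedLookup and its Fetcher interface are hypothetical names; the fetcher in main() is a fake that simulates a network failure for the word "offline" so the example runs without a connection.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class CachedLookup
{
  /** Hypothetical abstraction over "fetch the raw page for this word". */
  interface Fetcher
  {
    String fetch(String word) throws IOException;
  }

  private final Map<String, String> cache = new HashMap<String, String>();
  private final Fetcher fetcher;

  CachedLookup(Fetcher fetcher)
  {
    this.fetcher = fetcher;
  }

  /** Returns the cached page if present, otherwise fetches it; null if the network fails. */
  String lookup(String word)
  {
    String page = cache.get(word);
    if (page != null)
    {
      return page;
    }
    try
    {
      page = fetcher.fetch(word);
      cache.put(word, page);
      return page;
    }
    catch (IOException e)
    {
      // Cable pulled, timeout, server down: report it and let the caller carry on.
      System.err.println("Lookup failed for " + word + ": " + e.getMessage());
      return null;
    }
  }

  public static void main(String[] args)
  {
    final int[] calls = { 0 };
    CachedLookup lookup = new CachedLookup(new Fetcher()
    {
      public String fetch(String word) throws IOException
      {
        calls[0]++;
        if (word.equals("offline")) throw new IOException("simulated network failure");
        return "raw page for " + word;
      }
    });
    System.out.println(lookup.lookup("amo"));
    System.out.println(lookup.lookup("amo"));     // served from the cache
    System.out.println(lookup.lookup("offline")); // prints null
    System.out.println("fetches = " + calls[0]);  // prints fetches = 2
  }
}
```

    In a real application the Fetcher would wrap the URLConnection code, with connect and read timeouts set so a dead network fails quickly instead of hanging.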

  10. The Following User Says Thank You to Sean4u For This Useful Post:

    Tjstretch (March 15th, 2012)

  11. #7
    Forum VIP
    Join Date
    Oct 2010
    Posts
    275
    My Mood
    Cool
    Thanks
    32
    Thanked 54 Times in 47 Posts
    Blog Entries
    2

    Default Re: URLConnection inconsistency + SSCCE

    Yeah, I finished the little application that I had needed the URLConnection for. It seems to take anywhere from 250 to 1000 ms for a page to come back and be parsed. It ended up parsing a raw page from the wiktionary [the wiktionary's version of creating a program-readable version of a page]. Now that the URLConnection is working, it works perfectly. It usually only makes a call every few minutes at most, and it's only shared in-house, so I don't have to worry about getting yelled at for flooding the server.

    Again, thanks for all your help!