Welcome to the Java Programming Forums



Thread: ChunkedInputStream Error? Help Please

  1. #1
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    ChunkedInputStream Error? Help Please

    Hey guys,
    I have made an application that reads a website and parses its HTML. However, when I go through more than one page or so (maybe 2-3), it throws an error. This only happens when I parse lots of pages at once...

    This is the error:


    java.io.IOException: Premature EOF
        at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
        at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
        at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
        at sun.nio.cs.StreamDecoder.read(Unknown Source)
        at java.io.InputStreamReader.read(Unknown Source)
        at java.io.BufferedReader.fill(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at Youtube1.main(Youtube1.java:40)



    Line 40 is:


    If you know how to help and would like to, I can PM you the full code.


    Thanks for the help, guys!

    PS.
    I'm kinda new to Java and this is my first project.
    Last edited by onlyhereonce; July 2nd, 2012 at 06:52 AM.


  2. #2
    Member
    Join Date
    Apr 2012
    Location
    Superior, CO, USA
    Posts
    80
    Thanks
    0
    Thanked 14 Times in 14 Posts

    Re: ChunkedInputStream Error? Help Please

    We need to see more code. A "Premature EOF" (end of file) means that the file ended (for example, the sender stopped sending it) before the receiver (your code) expected it to.

    Can you show an SSCCE?
    Need Java help? Check out the HotJoe Java Help forums!
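
Since a premature EOF means the sender stopped before the response was complete, one common way to cope on the client side is to retry the request a few times with a growing pause. The following is only a minimal sketch under that assumption: `RetrySketch` and `withRetries` are made-up names, the failing task in `main` is simulated, and retrying does nothing about a server that is deliberately cutting the client off.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the program discussed in this thread.
class RetrySketch {
    // Runs the task, retrying on IOException up to maxAttempts times in total
    // and sleeping pauseMillis * attemptNumber between attempts.
    static <T> T withRetries(Callable<T> task, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e; // e.g. "Premature EOF"; remember it and try again
                if (attempt < maxAttempts) Thread.sleep(pauseMillis * attempt);
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = withRetries(new Callable<String>() {
            public String call() throws IOException {
                // Simulated download that fails on the first two attempts.
                if (++calls[0] < 3) throw new IOException("Premature EOF");
                return "page body";
            }
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real program the `Callable` would wrap the code that downloads and reads one page, so that a half-read page is thrown away and fetched again as a whole.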

  3. #3
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    I'll post the whole code, here:

    import java.util.*;
    import java.net.*;
    import java.io.*;
    import javax.swing.JOptionPane;

    public class Youtube {
        public static void main(String[] args) {

            String video;
            String comment;
            int number = 0;
            InputStream is = null;
            String line;
            URL page;
            boolean check = false;
            int namepos1;
            int namepos2;
            int reply;
            List<String> names = new ArrayList<String>();

            video = JOptionPane.showInputDialog("Please Enter The First Youtube Comment Page URL:", "http://www.youtube.com/all_comments?v=******");

            String Scomments = JOptionPane.showInputDialog("Please Enter The Total Number Of Comments:", "500");

            double comments = Double.parseDouble(Scomments);

            int pages = (int) Math.ceil(comments / 500d);
            number++;
            comment = video + "&page=" + number;

            try {
                while (number <= pages) {
                    comment = video + "&page=" + number;
                    number++;
                    page = new URL(comment);
                    is = page.openStream();
                    BufferedReader br = new BufferedReader(new InputStreamReader(is));

                    while ((line = br.readLine()) != null) {
                        if (check) {
                            namepos1 = line.indexOf("yt-user-name ");
                            namepos2 = line.lastIndexOf("<");
                            reply = line.indexOf("in reply to");

                            if (namepos1 > 0 && namepos2 > 0 && reply < 0) {
                                int back = line.lastIndexOf("<");
                                int front = line.indexOf("yt-user-name ") + 25;
                                String aaa = line.substring(front, back);
                                if (!names.contains(aaa)) names.add(aaa);
                                System.out.println("Found Username");
                            }

                            check = false;
                        }
                        if (line.indexOf("author ") != -1) check = true;

                    }
                    is.close();
                }
                System.out.println("Done. Loading Unique UserNames...");

            } catch (MalformedURLException mue) {
                mue.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }

            System.out.println("\n \n");
            System.out.println("List Of Unique Commenter Usernames:");
            for (String name : names) {
                System.out.println(name);
            }


            Random r = new Random();
            int winner = r.nextInt(names.size());
            JOptionPane.showMessageDialog(null, "The Random Commenter Username Is:   " + names.get(winner), "Random Commenter", JOptionPane.PLAIN_MESSAGE);
            System.out.println("\n\n\nThe Random Commenter Username Is:   " + names.get(winner));




        }

    }

    Basically, you enter a YouTube comment page URL and tell the program how many comments you want it to go through.

    It then grabs all the usernames and adds them to a list, skipping duplicates, and picks a random winner.

    It's for YouTube giveaways.

    There are 500 comments per page, so I can run it up to 1000, sometimes 1500, comments, but after that it pauses for ages and gives me an error.
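
The bookkeeping described above (pages to fetch, unique names, random winner) can be sketched separately from the download code. This is only an illustration: `GiveawaySketch`, `pageCount` and `pickWinner` are made-up names, and the figure of 500 comments per page is taken from the post. The sketch uses integer ceiling division instead of `Math.ceil` on doubles, a `LinkedHashSet` so duplicates are dropped without calling `contains` on a list, and it guards against an empty result, where `Random.nextInt(0)` would throw an `IllegalArgumentException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of the program posted above.
class GiveawaySketch {
    static final int COMMENTS_PER_PAGE = 500; // figure taken from the post

    // Integer ceiling division: 1..500 -> 1 page, 501..1000 -> 2 pages, ...
    static int pageCount(int totalComments) {
        if (totalComments <= 0) return 0;
        return (totalComments + COMMENTS_PER_PAGE - 1) / COMMENTS_PER_PAGE;
    }

    // Returns a random element, or null when no names were collected.
    static String pickWinner(Set<String> names, Random random) {
        if (names.isEmpty()) return null;
        List<String> asList = new ArrayList<String>(names);
        return asList.get(random.nextInt(asList.size()));
    }

    public static void main(String[] args) {
        // A LinkedHashSet ignores duplicates and keeps insertion order.
        Set<String> names = new LinkedHashSet<String>(
                Arrays.asList("alice", "bob", "alice", "carol"));
        System.out.println(pageCount(1500) + " pages, " + names.size() + " unique names");
        System.out.println("Winner: " + pickWinner(names, new Random()));
    }
}
```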
    Last edited by onlyhereonce; July 2nd, 2012 at 11:33 AM.

  4. #4
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    I did some research on chunked transfer encoding and someone said the error is related to it, but I can't fit it together. (See "Chunked transfer encoding" on Wikipedia.)

  5. #5
    Member
    Join Date
    Apr 2012
    Location
    Superior, CO, USA
    Posts
    80
    Thanks
    0
    Thanked 14 Times in 14 Posts

    Re: ChunkedInputStream Error? Help Please

    Geez, if I was YouTube I'd kick your program out too. I'm surprised you got this far. The problem is that you're not talking HTTP - you're thinking in terms of a raw Socket stream. You can't just keep requesting data and expect the server to give it to you. You've got to follow the protocol. Additionally, you're making your life horrible by trying to parse HTML through code.

    Lastly, you're breaking the YouTube terms of service:

    "You agree not to access Content through any technology or means other than the video playback pages of the Service itself, the Embeddable Player, or other explicitly authorized means YouTube may designate."

    So, where do you go from here? Your best bet is to use the Google Java client library that includes a small YouTube client to access the data. This hides many of the details of dealing with the HTTP protocol and doesn't violate any terms of service.

    In short - I'd rethink your design. Ultimately I'm guessing that YouTube has something to govern the kind of abuse you're throwing at their servers and/or there is something HTTP is telling you that you're ignoring. This may be a Cookie, a redirect, or something else.
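
One way to see what HTTP is "telling you" is to look at the status code and a few response headers before reading the body. The sketch below is a minimal example of that idea, not a fix for the problem in this thread. `HttpCheckSketch`, `classify`, `describe` and the User-Agent string are hypothetical names and values, and treating 429 and 503 as "throttled" is a convention assumed here, not something any particular site is known to do. Only standard `HttpURLConnection` methods are used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the program discussed in this thread.
class HttpCheckSketch {
    // Rough grouping of HTTP status codes; the labels are made up for this sketch.
    static String classify(int status) {
        if (status >= 200 && status < 300) return "ok";
        if (status >= 300 && status < 400) return "redirect";
        if (status == 429 || status == 503) return "throttled";
        return "error";
    }

    // Opens an http(s) URL and reports status and headers without reading the body.
    static String describe(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try {
            con.setConnectTimeout(10000);           // milliseconds
            con.setReadTimeout(30000);
            con.setInstanceFollowRedirects(false);  // make redirects visible
            con.setRequestProperty("User-Agent", "example-client/0.1"); // placeholder value
            int status = con.getResponseCode();
            return classify(status) + " (" + status + ")"
                    + ", Location=" + con.getHeaderField("Location")
                    + ", Set-Cookie=" + con.getHeaderField("Set-Cookie")
                    + ", Retry-After=" + con.getHeaderField("Retry-After");
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(describe(new URL(args[0])));
        } else {
            System.out.println(classify(200) + " " + classify(302) + " " + classify(429));
        }
    }
}
```

Run without arguments, `main` only exercises `classify`; run with a URL, it prints what the server answered for that URL.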

  6. #6
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    Firstly, YouTube wouldn't take any action against me if I made this public. If they really cared, we wouldn't see any of these MP3 scrapers.
    Plus, if they did, they would probably give me notice to take it down first, which I would do.

    Additionally, this just replicates what a human could do, but makes it faster and easier.

    Also, I'm not really interested in looking into the YouTube API, but are you saying it is the only way to do this because it's the correct 'protocol'?
    If not, what is the correct protocol, and how can I adapt my current code to follow it?

    Sorry, I'm new, and thanks for your help.

  7. #7
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    Hmmm, I looked into the API. Just one question: it only displays the comments from the first video page, and using the video ID I can only get these comments. How would I display all the comments, or navigate to each page?



    Thanks.
    Last edited by onlyhereonce; July 2nd, 2012 at 01:52 PM.

  8. #8
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    The message
    "java.io.IOException: Premature EOF
    at sun.net.www.http.ChunkedInputStream.r...ocking(Unknown Source)
    ...."
    is thrown if the input stream is not a well-formed chunked stream, for example when it ends before a complete chunk has been read.

    A chunked stream always begins with a hex string giving the size of the incoming data (the chunk). When a ChunkedInputStream is initialized, this chunk size is read; if an EOF (i.e. -1) is encountered before the string \r\n has been read, Premature EOF is thrown.

    The format is: nnnn\r\ndata

    nnnn: hex value (ASCII, variable length)
    data: the incoming data (a chunk of the given size nnnn)
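
To make the wire format concrete, here is a minimal sketch that decodes a complete chunked body held in memory and fails with "Premature EOF" when the input is cut short. `ChunkDecodeSketch`, `readLine` and `decode` are made-up names; the sketch assumes each chunk's data is followed by CRLF and that a zero-size chunk ends the body, and it ignores chunk extensions and trailer headers. Judging by the stack trace in the first post, `HttpURLConnection` already performs this decoding internally, so a decoder like this only makes sense on a raw, undecoded stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
class ChunkDecodeSketch {
    // Reads one line terminated by CRLF (or a bare LF) and returns it without the terminator.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new IOException("Premature EOF");
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes a complete chunked body into the bytes it carries.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');            // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size < 0) throw new IOException("Negative chunk size");
            if (size == 0) return out.toByteArray();     // last chunk
            for (int i = 0; i < size; i++) {
                int b = in.read();
                if (b == -1) throw new IOException("Premature EOF");
                out.write(b);
            }
            readLine(in);                                // CRLF that follows the chunk data
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n".getBytes("US-ASCII");
        System.out.println(new String(decode(new ByteArrayInputStream(wire)), "US-ASCII"));
    }
}
```

Feeding the decoder a truncated input such as `"5\r\nHel"` makes it throw, which is the same situation the original exception reports: the size line promised more bytes than ever arrived.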

  10. #9
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    Quote: Originally posted by onlyhereonce
    I did some research on chunked transfer encoding and someone said the error is related to it, but I can't fit it together
    It fits because large amounts of data (not only from YouTube) are usually sent in chunks (groups of data), so that a browser can cache them and avoid hogging the traffic. If you want to avoid this problem, you have to look at the HTTP response headers for the field "Transfer-Encoding: xxxx". If it says chunked, the incoming data stream is chunked:
    size\r\ndata <pause or so>size\r\ndata <pause>....EOF
    where:
    size: n hex digits
    data: any format
    EOF: -1
    You can implement ChunkedInputStream.java yourself with one method, read().
    For example:
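    import java.io.IOException;
    import java.io.InputStream;

    // Decodes an HTTP chunked body read from a raw, not yet decoded stream.
    public class ChunkedInputStream extends InputStream {
        private boolean isahead = false;  // true if 'ahead' holds a byte that was read too early
        private StringBuffer buf;         // characters of the current chunk-size line
        private InputStream in;           // underlying raw stream
        private int ahead = -1;           // byte read while looking for '\n' after '\r'
        private int clen, ch;             // bytes left in the current chunk; scratch byte
        private boolean eof;              // true once the zero-size last chunk has been seen
        //
        public ChunkedInputStream(InputStream in) throws IOException {
            this.in = in;
            getChunk();
        }
        // Reads one chunk-size line; returns true if it announces the last (empty) chunk.
        protected boolean getChunk() throws IOException {
            buf = new StringBuffer();
            while (true) {
                ch = in.read();
                if (ch == -1) throw new IOException("Premature end of chunked stream.");
                else if (ch > 0x20 && ch < 0x7F) buf.append((char)ch);
                else if (ch == '\n') break;
                else if (ch == '\r') {
                    if ((ch = in.read()) != '\n') {
                        ahead = ch;
                        isahead = true;
                    }
                    break;
                }
            }
            // Parse the buffer content as a hex number, starting from the last digit:
            char c;
            clen = 0;
            int fac = 1;
            int len = buf.length();
            while (--len >= 0) {
                c = buf.charAt(len);
                ch = (c > '9') ? 9 + (c & 0x0F) : (c & 0x0F);
                if (ch > 15) break;
                clen += (fac * ch);
                fac *= 16;
            }
            eof = (clen == 0);
            return eof;
        }
        // Skips the CRLF that ends the previous chunk's data, then reads the next size line.
        protected boolean nextChunk() throws IOException {
            if (eof) return true;
            ch = in.read();     // '\r'
            ch = in.read();     // '\n'
            return getChunk();
        }
        // Returns the next data byte, or -1 after the last chunk.
        public int read() throws IOException {
            if (clen == 0 && nextChunk()) return -1;
            --clen;
            if (isahead) {      // hand out the byte that getChunk() read ahead
                isahead = false;
                return ahead;
            }
            int b = in.read();
            if (b == -1) throw new IOException("Premature end of chunked stream.");
            return b;
        }
    }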

  12. #10
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    Thanks, Voodoo! Check your PM, please.

  13. #11
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    onlyhereonce
    For the sake of the community, I'll explain the ChunkedInputStream code here.
    * ChunkedInputStream extends InputStream, so it inherits every method that is not overridden (e.g. skip(), read(byte[]), etc.) from InputStream.
    * Now to the code: a chunk has the format nnnnCRLFdataCRLF (CRLF = carriage return, line feed).
    Method getChunk(): the chunk-size line is read byte by byte, and only printable ASCII characters are stored in buf: if (ch > 0x20 && ch < 0x7F). Reading stops at the end of the line (terminated by \r\n, i.e. CRLF); if a byte other than \n follows the \r, it is kept in ahead and isahead is set to true, so that read() can return it first. If -1 is encountered before the line ends, a "premature end" IOException is thrown. The last part converts the size from ASCII hex to an int. eof is set if clen == 0 (i.e. no more data).
    protected boolean getChunk() throws IOException {
            buf = new StringBuffer();
            while (true) {
                ch = in.read();
                if (ch == -1) throw new IOException("Premature end of chunked stream.");
                else if (ch > 0x20 && ch < 0x7F) buf.append((char)ch);
                else if (ch == '\n') break;
                else if (ch == '\r') {
                    if ((ch = in.read()) != '\n') {
                        ahead = ch;
                        isahead = true;
                    }
                    break;
                }
            }
            // Parse the buffer content as a hex number, starting from the last digit:
            char c;
            clen = 0;
            int fac = 1;
            int len = buf.length();
            while (--len >= 0) {
                c = buf.charAt(len);
                ch = (c > '9') ? 9 + (c & 0x0F) : (c & 0x0F);
                if (ch > 15) break;
                clen += (fac * ch);
                fac *= 16;
            }
            eof = (clen == 0);
            return eof;
        }
        //
    The HTTP response header Transfer-Encoding: chunked tells you when to expect a chunked data stream, so that you can "switch" your reading method accordingly. For more, consult the HttpURLConnection class.
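
The hex conversion at the end of getChunk() walks the digits from right to left with a growing factor. An alternative way to write the same step, shown here only as a sketch, goes left to right and rejects anything that is not a hex digit. `ChunkSizeSketch` and `parseChunkSize` are made-up names, the handling of a `;extension` suffix is an assumption about inputs the posted class does not deal with, and overflow for absurdly long size lines is not checked.

```java
// Hypothetical helper for illustration only.
class ChunkSizeSketch {
    // Parses a chunk-size line such as "1a" or "1A;name=value" (without CRLF) into an int.
    static int parseChunkSize(String line) {
        int semi = line.indexOf(';');
        String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
        if (hex.isEmpty()) throw new NumberFormatException("empty chunk size");
        int size = 0;
        for (int i = 0; i < hex.length(); i++) {
            int digit = Character.digit(hex.charAt(i), 16); // -1 if not a hex digit
            if (digit < 0) throw new NumberFormatException("bad chunk size: " + line);
            size = size * 16 + digit;                       // shift left by one hex digit
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(parseChunkSize("1a"));            // 1*16 + 10
        System.out.println(parseChunkSize("FF;name=value")); // extension is ignored
        System.out.println(parseChunkSize("0"));             // the last chunk
    }
}
```

For plain hex strings, `Integer.parseInt(hex, 16)` from the standard library does the same conversion; the explicit loop is spelled out only to mirror the step described above.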

  14. #12
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    ...sorry, I was out running some errands.
    Back to the method getChunk(). This method just reads the ASCII-hex digits nnnn...nn in order to determine how big the expected chunk will be. The buf collects every character after the blank (0x20) up to and including the tilde (~, 0x7E), i.e. everything below 0x7F. Theoretically you could accept only 0...9, A...F and a...f, as follows:
    if (ch >= '0' && ch <= '9' || ch >= 'A' && ch <= 'F' || ch >= 'a' && ch <= 'f') buf.append((char)ch);
    * The method nextChunk() reads the next chunk size. It is invoked internally by the basic method int read(); the first chunk size is read by getChunk() in the constructor.
    protected boolean nextChunk() throws IOException {
            if (eof) return true;
            ch = in.read();     // '\r'
            ch = in.read();     // '\n'
            return getChunk();
     }
    * The boolean isahead tells the read() method that a byte has already been read and is waiting in ahead. Otherwise read() returns the "original" in.read(), and the remaining chunk size clen is decremented.
    * The HTTP response headers can be read with the method public String getHeaderField(int n) of class HttpURLConnection (inherited from URLConnection).

    Hope that's enough info for you to go on.

  15. #13
    Junior Member
    Join Date
    Jul 2012
    Posts
    7
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Re: ChunkedInputStream Error? Help Please

    Thanks for the help, Voodoo. I kinda understand it more now.


    Quote: Originally posted by Voodoo
    public class ChunkedInputStream extends InputStream { ... } (the full class from post #9, quoted unchanged)
    That's the ChunkedInputStream.java, and this is my Youtube.java:

    import java.util.*;
    import java.net.*;
    import java.io.*;
    import javax.swing.JOptionPane;

    public class Youtube {
        public static void main(String[] args) {

            String video;
            String comment;
            int number = 0;
            InputStream is = null;
            String line;
            URL page;
            boolean check = false;
            int namepos1;
            int namepos2;
            int reply;
            List<String> names = new ArrayList<String>();

            //user inputs
            video = JOptionPane.showInputDialog("Please Enter The First Youtube Comment Page URL:", "http://www.youtube.com/all_comments?v=******");

            String Scomments = JOptionPane.showInputDialog("Please Enter The Total Number Of Comments:", "500");

            //calculates number of comment pages
            double comments = Double.parseDouble(Scomments);

            int pages = (int) Math.ceil(comments / 500d);
            number++;
            comment = video + "&page=" + number;

            //goes through each page grabbing the user names and adding unique names to list
            try {
                while (number <= pages) {
                    comment = video + "&page=" + number;
                    number++;
                    page = new URL(comment);
                    is = page.openStream();
                    BufferedReader br = new BufferedReader(new InputStreamReader(is));


                    while ((line = br.readLine()) != null) {
                        if (check) {
                            namepos1 = line.indexOf("yt-user-name ");
                            namepos2 = line.lastIndexOf("<");
                            reply = line.indexOf("in reply to");

                            if (namepos1 > 0 && namepos2 > 0 && reply < 0) {
                                int back = line.lastIndexOf("<");
                                int front = line.indexOf("yt-user-name ") + 25;
                                String aaa = line.substring(front, back);
                                if (!names.contains(aaa)) names.add(aaa);
                                System.out.println("Found User Name");
                            }

                            check = false;
                        }
                        if (line.indexOf("author ") != -1) check = true;

                    }
                    is.close();
                }

                //finished finding
                System.out.println("Done. Loading Unique User Names...");

            } catch (MalformedURLException mue) {
                mue.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
            //prints list
            System.out.println("\n \n");
            System.out.println("List Of Unique Commenter User Names:");
            for (String name : names) {
                System.out.println(name);
            }

            //generates winner
            Random r = new Random();
            int winner = r.nextInt(names.size());
            JOptionPane.showMessageDialog(null, "The Random Commenter User Name Is:   " + names.get(winner), "Random Commenter", JOptionPane.PLAIN_MESSAGE);
            System.out.println("\n\n\nThe Random Commenter User Name Is:   " + names.get(winner));


        }

    }

    How and where do I implement the read() method, and do the input streams have to be the same?
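
On the "how and where" part of the question: a class that extends `InputStream` only has to implement `read()`, and it is put to use by wrapping the original stream with it before handing the result to `InputStreamReader`. The sketch below shows just that plumbing with a deliberately trivial wrapper. `WrapperStreamSketch` and `CountingInputStream` are made-up names, the input is an in-memory byte array rather than a network stream, and the wrapper only counts bytes instead of decoding anything.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical example classes for illustration only.
class WrapperStreamSketch {
    // A do-nothing decorator: it only counts bytes, but a decoding stream plugs in the same way.
    static class CountingInputStream extends InputStream {
        private final InputStream in;
        private long count;

        CountingInputStream(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            int b = in.read();      // delegate to the wrapped stream
            if (b != -1) count++;
            return b;
        }

        @Override
        public void close() throws IOException { in.close(); }

        long getCount() { return count; }
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("line one\nline two\n".getBytes("US-ASCII"));
        CountingInputStream counting = new CountingInputStream(source);
        // The reader never calls read() directly; the inherited read(byte[], int, int) does.
        BufferedReader br = new BufferedReader(new InputStreamReader(counting, "US-ASCII"));
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        br.close();
        System.out.println(counting.getCount() + " bytes passed through read()");
    }
}
```

The wrapped stream and the wrapper are two different objects: the wrapper is the one passed to `InputStreamReader`, and it pulls its bytes from the stream given to its constructor.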
    Last edited by onlyhereonce; July 8th, 2012 at 07:40 AM.

  16. #14
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    onlyhereonce
    Gosh, you've got questions...
    Your code is incomplete: this piece here works only with displayable text content. It immediately runs into trouble if the content is zipped or chunked data. Reason: br.readLine() may bump into a large chunk of garbage (which can make your computer behave like a maniac), because it keeps reading until it happens to get two consecutive bytes \r, \n. Otherwise... a weird exception comes out of nowhere...
    page = new URL(comment);
    is = page.openStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(is));
    must be "scrutinized" and expanded. You need to do two things:
    * implement the public class ChunkedInputStream (see the code I gave you)
    * modify the above-mentioned piece, as in this example:
    page = new URL(comment); 
    URLConnection ucon = page.openConnection();
    String xtype = null;
    for(int i=0;;++i) {
          xtype = ucon.getHeaderField(i);
          if (xtype == null || xtype.toLowerCase().indexOf("transfer-encoding:") > 0) break;
    }
    // Reuse the connection that is already open instead of calling page.openStream() again.
    if (xtype == null || xtype.toLowerCase().indexOf("chunked") == -1) is = ucon.getInputStream();
    else is = new ChunkedInputStream(ucon.getInputStream());
    BufferedReader br = new BufferedReader(new InputStreamReader(is));
    Just an idea. I haven't checked its validity.

  17. #15
    Member
    Join Date
    Jul 2012
    Posts
    119
    Thanks
    0
    Thanked 19 Times in 19 Posts

    Re: ChunkedInputStream Error? Help Please

    ...sorry for a mistake:
    instead of
    if (xtype == null || xtype.toLowerCase().indexOf("transfer-encoding:") > 0) break;
    it should be
    if (xtype == null || xtype.toLowerCase().indexOf("chunked") >= 0) break;
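
A note on the header check: `getHeaderField(int n)` returns only the value of the n-th header, not the "Name: value" pair, so a more direct way to ask the same question is to look the field up by name, either with `getHeaderField("Transfer-Encoding")` on a live connection or in the map returned by `getHeaderFields()`. The sketch below does the latter on a hand-built map so it can run without a network. `TransferEncodingSketch` and `isChunked` are made-up names and the sample headers are invented. Also keep in mind that, judging by the stack trace in the first post, the stream handed out by `HttpURLConnection` has already been de-chunked, so wrapping it in a second chunk decoder may well corrupt the data rather than fix it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
class TransferEncodingSketch {
    // 'headers' has the shape returned by URLConnection.getHeaderFields():
    // field name -> list of values, with the status line stored under a null key.
    static boolean isChunked(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();
            if (name == null || !name.equalsIgnoreCase("Transfer-Encoding")) continue;
            for (String value : e.getValue()) {
                if (value != null && value.toLowerCase(Locale.ROOT).contains("chunked")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented sample headers, standing in for con.getHeaderFields().
        Map<String, List<String>> headers = new HashMap<String, List<String>>();
        headers.put(null, Arrays.asList("HTTP/1.1 200 OK"));
        headers.put("Content-Type", Arrays.asList("text/html; charset=utf-8"));
        headers.put("transfer-encoding", Arrays.asList("chunked"));
        System.out.println("chunked: " + isChunked(headers));
    }
}
```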
