ChunkedInputStream Error? Help Please
Hey guys,
I have made an application that reads a website and parses its HTML. However, when I go through more than a page or so (maybe 2-3), it throws an error. This only happens when I parse lots of pages at once...
This is the Error:
Code Java:
java.io.IOException: Premature EOF
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at Youtube1.main(Youtube1.java:40)
Line 40 is:
If you know how to help and would like to, I can PM you the full code.
Thanks for the help guys!
PS.
I'm kinda new to Java and this is my first project.
Re: ChunkedInputStream Error? Help Please
We need to see more code. A "Premature EOF" (end of file) means that the stream ended (for example, the sender stopped sending) before the receiver (your code) expected it to.
Can you show an SSCCE?
Re: ChunkedInputStream Error? Help Please
I'll post the whole code, here:
Code java:
import java.util.*;
import java.net.*;
import java.io.*;
import javax.swing.JOptionPane;
public class Youtube{
    public static void main(String args[]){
        String video;
        String comment;
        int number = 0;
        InputStream is = null;
        String line;
        URL page;
        boolean check = false;
        int namepos1;
        int namepos2;
        int reply;
        List<String> names = new ArrayList<String>();
        video = JOptionPane.showInputDialog("Please Enter The First Youtube Comment Page URL:", "http://www.youtube.com/all_comments?v=******");
        String Scomments = JOptionPane.showInputDialog("Please Enter The Total Number Of Comments:", "500");
        double comments = Double.parseDouble(Scomments);
        int pages = (int) Math.ceil( comments / 500d );
        number++;
        comment = video + "&page=" + number;
        try{
            while(number <= pages){
                comment = video + "&page=" + number;
                number++;
                page = new URL(comment);
                is = page.openStream();
                BufferedReader br = new BufferedReader(new InputStreamReader(is));
                while ((line = br.readLine()) != null){
                    if(check){
                        namepos1 = line.indexOf("yt-user-name ");
                        namepos2 = line.lastIndexOf("<");
                        reply = line.indexOf("in reply to");
                        if(namepos1 > 0 && namepos2 > 0 && reply < 0){
                            int back=line.lastIndexOf("<");
                            int front=line.indexOf("yt-user-name ") + 25;
                            String aaa=line.substring(front , back);
                            if(!names.contains(aaa)) names.add(aaa);
                            System.out.println("Found Username");
                        }else{}
                        check = false;
                    }
                    if(line.indexOf("author ") != -1) check = true;
                }
                is.close();
            }
            System.out.println("Done. Loading Unique UserNames...");
        }catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }catch(Exception e){
            e.printStackTrace();
        }
        System.out.println("\n \n");
        System.out.println("List Of Unique Commenter Usernames:");
        for(String name: names){
            System.out.println(name);
        }
        Random r = new Random();
        int winner = r.nextInt(names.size());
        JOptionPane.showMessageDialog(null, "The Random Commenter Username Is: " + names.get(winner), "Random Commenter", JOptionPane.PLAIN_MESSAGE );
        System.out.println("\n\n\nThe Random Commenter Username Is: " + names.get(winner));
    }
}
Basically, you enter a YouTube comment page URL and tell the program how many comments you want it to go through.
Then it grabs all the usernames and adds them to a list, skipping duplicates, and picks a random winner.
- for YouTube giveaways.
There are 500 comments per page, so I can run it up to 1000, sometimes 1500 comments, but after that it pauses for ages and gives me an error.
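One edge case in the final step: if no usernames were collected (for example because every page failed to load), names.size() is 0 and r.nextInt(0) throws an IllegalArgumentException. Below is a minimal sketch of the de-duplicate-and-pick step with that guard. The class WinnerPicker and its method names are made up for illustration and are not part of the program above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical helper, not part of the original program.
class WinnerPicker {
    // Keeps each username once, in the order it was first seen.
    static List<String> unique(List<String> names) {
        return new ArrayList<>(new LinkedHashSet<>(names));
    }

    // Returns a random entry, or an empty Optional if there is nothing to pick from.
    static Optional<String> pick(List<String> names, Random random) {
        if (names.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(names.get(random.nextInt(names.size())));
    }

    public static void main(String[] args) {
        List<String> names = unique(Arrays.asList("alice", "bob", "alice", "carol"));
        System.out.println(names);
        System.out.println(pick(names, new Random()).orElse("no commenters found"));
    }
}
```

Using a LinkedHashSet also avoids the repeated names.contains(...) scan over the whole list for every comment.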
Re: ChunkedInputStream Error? Help Please
I did some research on chunked transfer encoding and someone said the error is related to it, but I can't fit it together. (See "Chunked transfer encoding" on Wikipedia.)
Re: ChunkedInputStream Error? Help Please
Geez, if I was YouTube I'd kick your program out too. I'm surprised you got this far. The problem is that you're not talking HTTP - you're thinking in terms of a raw Socket stream. You can't just keep requesting data and expect the server to give it to you. You've got to follow the protocol. Additionally, you're making your life horrible by trying to parse HTML through code.
Lastly, you're breaking the YouTube terms of service:
Quote:
You agree not to access Content through any technology or means other than the video playback pages of the Service itself, the Embeddable Player, or other explicitly authorized means YouTube may designate.
So, where do you go from here? Your best bet is to use the Google Java client library that includes a small YouTube client to access the data. This hides many of the details of dealing with the HTTP protocol and doesn't violate any terms of service.
In short - I'd rethink your design. Ultimately I'm guessing that YouTube has something to govern the kind of abuse you're throwing at their servers and/or there is something HTTP is telling you that you're ignoring. This may be a Cookie, a redirect, or something else.
Re: ChunkedInputStream Error? Help Please
Firstly, YouTube wouldn't take any action against me if I made this public. If they really cared, we wouldn't see any of these MP3 scrapers.
Plus, if they did, they would probably give me notice to take it down first, which I would do.
Additionally, this just replicates what a human could do, but faster and easier.
Also, I'm not really keen on looking into the YouTube API, but are you saying it is the only way to do this because it's the correct 'protocol'?
If not, what is the correct protocol, and how can I adapt my current code to follow it?
Sorry, I'm new, and thanks for your help.
Re: ChunkedInputStream Error? Help Please
Hmmm, I looked into the API. Just one question: it only returns the comments from the first page of the video, and using the video ID I can only get those comments. How would I get all the comments, or navigate to each page?
Thanks.
Re: ChunkedInputStream Error? Help Please
The message
"java.io.IOException: Premature EOF
at sun.net.www.http.ChunkedInputStream.r...ocking(Unknown Source)
...."
is thrown when the input is not a complete, well-formed chunked stream.
A chunked stream always begins with a hex string giving the size of the incoming data (the chunk). When a ChunkedInputStream is initialized, this chunk size is read; if an EOF (i.e. -1) is encountered before the terminating \r\n has been read, Premature EOF is thrown. The same happens if the connection is closed partway through a chunk.
The format is: nnnn\r\ndata\r\n
nnnn: hex value (ASCII, variable length)
data: the incoming data (a chunk of the given size nnnn)
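To make the format concrete, here is a small sketch that decodes a chunked body held in memory and shows how a truncated body produces a premature end of file. The class name ChunkedBodyDemo and its helpers are hypothetical, trailers after the last chunk are ignored, and this is not the JDK's own implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes "size CRLF data CRLF ... 0 CRLF CRLF".
class ChunkedBodyDemo {
    // Reads up to the next LF and returns the line without CR/LF.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') return sb.toString();
            if (b != '\r') sb.append((char) b);
        }
        throw new EOFException("Premature EOF while reading a chunk size line");
    }

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';');          // optional chunk extension
            if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) return out.toByteArray();        // zero-size chunk ends the body
            for (int i = 0; i < size; i++) {
                int b = in.read();
                if (b == -1) throw new EOFException("Premature EOF inside a chunk");
                out.write(b);
            }
            readLine(in);                                   // CRLF that follows the chunk data
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] complete = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(new ByteArrayInputStream(complete)), StandardCharsets.US_ASCII));

        byte[] truncated = "5\r\nhel".getBytes(StandardCharsets.US_ASCII); // sender stopped early
        try {
            decode(new ByteArrayInputStream(truncated));
        } catch (EOFException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

The second input mimics a server that closes the connection in the middle of a chunk, which is the situation the exception message describes.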
Re: ChunkedInputStream Error? Help Please
Quote:
Originally Posted by
onlyhereonce
I did some research on chunked Transfer encoding and some one said the error is related to it, but i cant fit it together
It fits because large or dynamically generated responses (not only YouTube's) are usually sent in chunks (groups of data), so the server can start sending before it knows the total length. If you want to handle this yourself, read the HTTP response headers and look for "Transfer-Encoding: xxxx". If it says chunked, the incoming data stream is chunked:
size\r\ndata\r\n <pause or so> size\r\ndata\r\n <pause> .... 0\r\n\r\n
where:
size: n hex digits
data: any format, exactly size bytes
end: a chunk of size 0; after it, read() returns -1 (EOF)
You can implement ChunkedInputStream.java yourself with one method, read().
For example:
Code java:
import java.io.IOException;
import java.io.InputStream;

public class ChunkedInputStream extends InputStream {
    private boolean isahead = false; // true if a byte was read ahead into 'ahead'
    private StringBuffer buf;        // collects the characters of the size line
    private InputStream in;          // the raw (still chunked) stream
    private int ahead = -1;          // the byte read ahead, if any
    private int clen, ch;            // bytes left in the current chunk; last byte read
    private boolean eof;             // set once a chunk of size 0 is seen
    //
    public ChunkedInputStream(InputStream in) throws IOException {
        this.in = in;
        getChunk();
    }
    // Reads one size line and returns true if it announces the end (size 0).
    protected boolean getChunk() throws IOException {
        buf = new StringBuffer();
        while (true) {
            ch = in.read();
            if (ch == -1) throw new IOException("Premature end of chunked stream.");
            else if (ch > 0x20 && ch < 0x7F) buf.append((char)ch);
            else if (ch == '\n') break;
            else if (ch == '\r') {
                if ((ch = in.read()) != '\n') {
                    ahead = ch;
                    isahead = true;
                }
                break;
            }
        }
        // Parse the buffer content as a hex number, last digit first:
        char c;
        clen = 0;
        int fac = 1;
        int len = buf.length();
        while (--len >= 0) {
            c = buf.charAt(len);
            ch = (c > '9') ? 9 + (c & 0x0F) : (c & 0x0F);
            if (ch > 15) break;
            clen += (fac * ch);
            fac *= 16;
        }
        eof = (clen == 0);
        return eof;
    }
    // Skips the CRLF after the previous chunk's data, then reads the next size line.
    protected boolean nextChunk() throws IOException {
        if (eof) return true;
        ch = in.read(); // '\r'
        ch = in.read(); // '\n'
        return getChunk();
    }
    // Returns the next data byte, or -1 after the last chunk.
    public int read() throws IOException {
        if (clen == 0) {
            if (nextChunk()) return -1;
            if ( isahead ) {
                --clen;
                isahead = false;
                return ahead;
            }
        }
        --clen;
        return in.read();
    }
}
Re: ChunkedInputStream Error? Help Please
Thanks Voodoo! Check your pm please :)
Re: ChunkedInputStream Error? Help Please
onlyoncehere
For the sake of the community, I'll explain the ChunkedInputStream code here.
* ChunkedInputStream extends InputStream, so it inherits every method it does not override (e.g. skip(), read(byte[]), etc.).
* Now to the code: a chunk has the format nnnnCRLFdataCRLF (CRLF = carriage return + line feed).
Method getChunk(): the chunk size line is read byte by byte, and only the printable ASCII characters are stored in buf: if (ch > 0x20 && ch < 0x7F). Reading stops at the end of the size line (terminated by \r\n, i.e. CRLF). If a \r is followed by a byte other than \n, that byte is kept in ahead and isahead is set to true, so that read() can return it later. If -1 is encountered before the end of the line, PREMATURE EOF is thrown. The last part converts the size from ASCII hex to an int. EOF is set if clen == 0 (i.e. no more data).
Code java:
protected boolean getChunk() throws IOException {
    buf = new StringBuffer();
    while (true) {
        ch = in.read();
        if (ch == -1) throw new IOException("Premature end of chunked stream.");
        else if (ch > 0x20 && ch < 0x7F) buf.append((char)ch);
        else if (ch == '\n') break;
        else if (ch == '\r') {
            if ((ch = in.read()) != '\n') {
                ahead = ch;
                isahead = true;
            }
            break;
        }
    }
    // Parse the buffer content as a hex number:
    char c;
    clen = 0;
    int fac = 1;
    int len = buf.length();
    while (--len >= 0) {
        c = buf.charAt(len);
        ch = (c > '9') ? 9 + (c & 0x0F) : (c & 0x0F);
        if (ch > 15) break;
        clen += (fac * ch);
        fac *= 16;
    }
    eof = (clen == 0);
    return eof;
}
//
The HTTP response header Transfer-Encoding: chunked tells you when to expect a chunked data stream, so that you can "switch" your reading method accordingly. For more, consult the HttpURLConnection class.
Re: ChunkedInputStream Error? Help Please
...sorry, I was out running some errands.
Back to the method getChunk(). This method just reads the ASCII hex digits nnnn...nn in order to determine how big the next chunk will be. buf collects every character above the blank (0x20) and below DEL (0x7F), i.e. up to and including the tilde (~, 0x7E). In principle you could accept only the hex digits 0...9, A...F and a...f, as follows:
Code java:
if (ch >= '0' && ch <= '9' || ch >= 'A' && ch <= 'F' || ch >= 'a' && ch <= 'f') buf.append((char)ch);
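If hand-rolling the conversion is not a goal, the size line can probably be parsed more simply with Integer.parseInt and radix 16, which accepts both uppercase and lowercase hex digits. A sketch, assuming a hypothetical helper ChunkSize.parse that receives the size line without its trailing CRLF; it also drops an optional chunk extension (anything after a ';'), which the chunked format permits after the size:

```java
// Hypothetical helper, not part of the ChunkedInputStream class above.
class ChunkSize {
    // Parses a size line such as "1a" or "1A;name=value" into a byte count.
    static int parse(String sizeLine) {
        int semicolon = sizeLine.indexOf(';');
        String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
        return Integer.parseInt(hex, 16); // NumberFormatException if the digits are not hex
    }

    public static void main(String[] args) {
        System.out.println(parse("1a"));          // prints 26
        System.out.println(parse("1A;name=val")); // prints 26
        System.out.println(parse("0"));           // prints 0, the end-of-body marker
    }
}
```

A malformed size then fails loudly with a NumberFormatException instead of silently producing a length of 0.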
* The method nextChunk() skips the CRLF that ends the previous chunk's data and then reads the next chunk size. It is invoked internally by the basic method int read().
Code java:
protected boolean nextChunk() throws IOException {
    if (eof) return true;
    ch = in.read(); // '\r'
    ch = in.read(); // '\n'
    return getChunk();
}
* The boolean isahead tells the read() method that a byte has already been read (it is stored in ahead). Otherwise read() returns the "original" in.read(), and the chunk size clen is decremented.
* The HTTP response headers can be extracted with the method public String getHeaderField(int n) of class HttpURLConnection (inherited from URLConnection).
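Note that getHeaderField(int n) returns only the header value; the matching name comes from getHeaderFieldKey(int n), and getHeaderFields() returns all headers as a map. A minimal sketch of the check against such a map follows. The class TransferEncodingCheck and the method isChunked are hypothetical names, and the map in main is hand-built sample data rather than a real response. Keep in mind that HttpURLConnection normally decodes a chunked body itself (the sun.net.www.http.ChunkedInputStream frames in the stack trace at the top of the thread are that decoder), so the check matters mostly when reading from a raw socket.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
class TransferEncodingCheck {
    // 'headers' has the shape returned by URLConnection.getHeaderFields():
    // header name -> list of values, with the status line stored under a null key.
    static boolean isChunked(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey();
            if (name == null || !name.equalsIgnoreCase("Transfer-Encoding")) {
                continue; // skip the status line and unrelated headers
            }
            for (String value : entry.getValue()) {
                if (value.toLowerCase(Locale.ROOT).contains("chunked")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample data standing in for connection.getHeaderFields().
        Map<String, List<String>> headers = new HashMap<>();
        headers.put(null, Arrays.asList("HTTP/1.1 200 OK"));
        headers.put("Transfer-Encoding", Arrays.asList("chunked"));
        System.out.println(isChunked(headers));
    }
}
```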
Hope that gives you enough info to go on.
Re: ChunkedInputStream Error? Help Please
Thanks for the help Voodoo, I kinda understand it more now.
That's the ChunkedInputStream.java you posted above, and this is my Youtube.java:
Code java:
import java.util.*;
import java.net.*;
import java.io.*;
import javax.swing.JOptionPane;
public class Youtube{
    public static void main(String args[]){
        String video;
        String comment;
        int number = 0;
        InputStream is = null;
        String line;
        URL page;
        boolean check = false;
        int namepos1;
        int namepos2;
        int reply;
        List<String> names = new ArrayList<String>();
        //user inputs
        video = JOptionPane.showInputDialog("Please Enter The First Youtube Comment Page URL:", "http://www.youtube.com/all_comments?v=******");
        String Scomments = JOptionPane.showInputDialog("Please Enter The Total Number Of Comments:", "500");
        //calculates number of comment pages
        double comments = Double.parseDouble(Scomments);
        int pages = (int) Math.ceil( comments / 500d );
        number++;
        comment = video + "&page=" + number;
        //goes through each page grabbing the user names and adding unique names to list
        try{
            while(number <= pages){
                comment = video + "&page=" + number;
                number++;
                page = new URL(comment);
                is = page.openStream();
                BufferedReader br = new BufferedReader(new InputStreamReader(is));
                while ((line = br.readLine()) != null){
                    if(check){
                        namepos1 = line.indexOf("yt-user-name ");
                        namepos2 = line.lastIndexOf("<");
                        reply = line.indexOf("in reply to");
                        if(namepos1 > 0 && namepos2 > 0 && reply < 0){
                            int back=line.lastIndexOf("<");
                            int front=line.indexOf("yt-user-name ") + 25;
                            String aaa=line.substring(front , back);
                            if(!names.contains(aaa)) names.add(aaa);
                            System.out.println("Found User Name");
                        }else{}
                        check = false;
                    }
                    if(line.indexOf("author ") != -1) check = true;
                }
                is.close();
            }
            //finished finding
            System.out.println("Done. Loading Unique User Names...");
        }catch (MalformedURLException mue) {
            mue.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }catch(Exception e){
            e.printStackTrace();
        }
        //prints list
        System.out.println("\n \n");
        System.out.println("List Of Unique Commenter User Names:");
        for(String name: names){
            System.out.println(name);
        }
        //generates winner
        Random r = new Random();
        int winner = r.nextInt(names.size());
        JOptionPane.showMessageDialog(null, "The Random Commenter User Name Is: " + names.get(winner), "Random Commenter", JOptionPane.PLAIN_MESSAGE );
        System.out.println("\n\n\nThe Random Commenter User Name Is: " + names.get(winner));
    }
}
How and where do I implement the read() method, and do the input streams have to be the same?
Re: ChunkedInputStream Error? Help Please
Onlyoncehere
Gosh, you have questions...
Your code is incomplete: this piece only works with displayable text content. It runs into trouble immediately if the content is zipped or chunked data. Reason: br.readLine() may hit a large chunk of garbage (which can make your computer behave like a maniac ;)) and only returns once it meets a line terminator (\n, \r, or \r\n). Otherwise... a weird exception comes out of nowhere...
It must be "scrutinized" and expanded. You need two pieces:
* public class ChunkedInputStream (see the code I gave you)
* your above-mentioned piece, modified along the lines of this example
Code java:
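page = new URL(comment);
URLConnection ucon = page.openConnection();
String xtype = null;
// Walk the response header values until the list ends or the wanted one is found.
for(int i=0;;++i) {
    xtype = ucon.getHeaderField(i);
    if (xtype == null || xtype.toLowerCase().indexOf("transfer-encoding:") > 0) break;
}
// Reuse the connection that is already open instead of calling page.openStream() again.
// Caution: HttpURLConnection usually decodes chunked bodies itself, so only wrap a stream that is still raw.
if (xtype == null || xtype.toLowerCase().indexOf("chunked") == -1) is = ucon.getInputStream();
else is = new ChunkedInputStream(ucon.getInputStream());
BufferedReader br = new BufferedReader(new InputStreamReader(is));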
Just an idea. I haven't checked it for validity.
Re: ChunkedInputStream Error? Help Please
...sorry, one correction.
Instead of
Code java:
if (xtype == null || xtype.toLowerCase().indexOf("transfer-encoding:") > 0) break;
it should be
Code java:
if (xtype == null || xtype.toLowerCase().indexOf("chunked") != -1) break;
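An alternative that avoids a hand-written decoder: let HttpURLConnection decode the body, check the status code, set timeouts, and retry a page when an IOException such as Premature EOF occurs. The sketch below is only an illustration under assumptions: PageFetcher, IoTask, fetch and withRetry are made-up names, the timeout and retry values are arbitrary, UTF-8 is assumed for the page, and nothing here guarantees the server will stop cutting connections.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names and values are assumptions for illustration.
class PageFetcher {
    interface IoTask<T> {
        T run() throws IOException;
    }

    // Downloads one page as text; the connection decodes chunked bodies itself.
    static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        con.setConnectTimeout(10_000); // assumed values, tune as needed
        con.setReadTimeout(30_000);
        try {
            int status = con.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + address);
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        } finally {
            con.disconnect();
        }
    }

    // Runs the task up to maxAttempts times, pausing between failed attempts.
    static <T> T withRetry(IoTask<T> task, int maxAttempts, long pauseMillis) throws IOException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be at least 1");
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.run();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        String html = withRetry(() -> fetch(args[0]), 3, 2000);
        System.out.println("Read " + html.length() + " characters");
    }
}
```

In the comment-page loop, each page would then be requested as withRetry(() -> fetch(comment), 3, 2000), so one dropped connection no longer aborts the whole run.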