#1
07-11-2009, 08:58 PM
zort (Junior Member)
using Scanner for 75mb file

Hello,

I am trying to parse a 75 MB .list/.txt file, printing it to the screen first and eventually loading it into a database once that works.

I am using Scanner, and it stops reading after 9-10 lines. The application then finishes normally: no crash, no error. The file I am trying to read is the 100,000-line IMDB rating.list file.

Should I do some memory management or something?

Thanks in advance.

Part of my code:
Java Code
...
try {
    // Skip ahead to the first occurrence of "Title"; a horizon of 0 means no search limit.
    scanner.findWithinHorizon("Title", 0);
    while (scanner.hasNextLine()) {
        String nextLine = scanner.nextLine();
        System.out.println(nextLine);
        if (!isValid(nextLine)) {
            continue;
        }
        processLine(nextLine);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    scanner.close();
}
...
// For each line I use another scanner
private void processLine(String aLine) {
    lineScanner = new Scanner(aLine);

    lineScanner.useDelimiter("\\s{2,}");

    // process the line...
}




#2
07-11-2009, 09:12 PM
helloworld922 (Super Moderator)
Re: using Scanner for 75mb file

For such a long file, I'd recommend NOT displaying everything you parse to the screen.

For example, try this:

Java Code
for (int i = 0; i < 1000000; i++)
{
     for (int j = 0; j < 1000000; j++)
     {
          // Cast to long so the product does not overflow int.
          System.out.println((long) i * j);
     }
}
vs. this:

Java Code
for (int i = 0; i < 1000000; i++)
{
     for (int j = 0; j < 1000000; j++)
     {
          // Same multiplication, but nothing is printed.
          long product = (long) i * j;
     }
}
The second version will finish many times faster because printing to the console is extremely slow.

To test whether your algorithm works, I'd recommend taking the first 10 lines or so of your file and running against that (with the screen output in place). If that works, remove the screen output code and process the larger file.

If your computer was made in the 2000s or later, you'll probably have roughly 512 MB to 3 GB of memory, plenty to deal with your file (I once tried to allocate an array of size 2000000 and it succeeded).
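
One way to run the 10-line test without hand-editing the file is to read only the first few lines. This is a minimal sketch, not code from the thread: the class name FirstLines and the method head are made up for illustration, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class FirstLines {
    // Returns at most maxLines lines from the source, in order.
    // The caller owns the source and is responsible for closing it.
    static List<String> head(Reader source, int maxLines) {
        BufferedReader reader = new BufferedReader(source);
        // lines() is lazy, so limit() stops reading once enough lines were seen.
        return reader.lines().limit(maxLines).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample input; swap in a FileReader for the real file.
        String sample = "line 1\nline 2\nline 3\nline 4\n";
        for (String line : head(new StringReader(sample), 2)) {
            System.out.println(line);
        }
    }
}
```

Feeding the returned lines through the same parsing code, with the screen output left in, keeps the test small enough to check by eye.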
#3
07-11-2009, 09:39 PM
copeg (Moderator)
Re: using Scanner for 75mb file

Depending on whether and how you are reading the data into memory, you may also need to raise the maximum JVM heap size (although, based on your description, this may not be the problem: you would see an OutOfMemoryError). Just add something like -Xmx512m or -Xmx1g on the command line to safeguard against running out of memory.
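
To see how much heap the JVM is actually allowed to use, a small check like the following may help. It is only a sketch: the class name HeapCheck and the helper toMiB are hypothetical, while Runtime.maxMemory(), totalMemory() and freeMemory() are standard methods.

```java
class HeapCheck {
    // Converts a byte count to whole mebibytes (hypothetical helper).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit; totalMemory() is what is reserved right now.
        System.out.println("Max heap:  " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("Allocated: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free:      " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as java -Xmx512m HeapCheck should report a maximum close to 512 MiB; the exact figure can vary a little between JVMs.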

#4
08-11-2009, 09:04 AM
zort (Junior Member)
Re: using Scanner for 75mb file

Thank you for your quick replies. I'll try to explain my problem further.

When I run the code on testfile.txt, it reads all 90 of its lines.

Movie title : The Shawshank Redemption
Movie title : The Godfather
Movie title : The Godfather: Part II
Movie title : Il buono, il brutto, il cattivo.
Movie title : Pulp Fiction
Movie title : Schindler's List
...
Movie title : Who Made the Potatoe Salad?
Movie title : Who Makes Movies?
BUILD SUCCESSFUL (total time: 0 seconds)


If I try it with the IMDB file (100000 lines), it stops reading after 10 lines.

...
0000000124 335002 8.7 Fight Club (1999)
0000000124 63810 8.7 C'era una volta il West (1968)

BUILD SUCCESSFUL (total time: 0 seconds)
...



And on certain occasions it actually stops reading in the middle of a long line.
...
0000000124 335002 8.7 Fight Club (1999)
0000000124 349139 8.7 The Lord of the Rings: The Fellowshi
java.lang.ArrayIndexOutOfBoundsException: 1
// The out-of-bounds exception occurs when it tries to process an incomplete line.
...





I understand the screen printing issue, but I would understand it better if the program crashed while trying to print those numerous lines.
I suspect it has something to do with the file being huge. How can I at least make it crash?
#5
08-11-2009, 03:15 PM
copeg (Moderator)
Re: using Scanner for 75mb file

That last exception, java.lang.ArrayIndexOutOfBoundsException, says a lot, especially if those exceptions are not being caught. Something in your processLine method is possibly the culprit.
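
The body of processLine is not shown in the thread, so the following is only a guess at the kind of guard that would avoid indexing past the end of a short line. It assumes each data line holds four fields (distribution, votes, rating, title) separated by runs of two or more spaces, matching the \\s{2,} delimiter from the first post. The names RatingLineParser, fields and titleOrNull are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class RatingLineParser {
    // Splits a line on runs of two or more whitespace characters,
    // the same delimiter used in the first post.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        try (Scanner lineScanner = new Scanner(line)) {
            lineScanner.useDelimiter("\\s{2,}");
            while (lineScanner.hasNext()) {
                out.add(lineScanner.next());
            }
        }
        return out;
    }

    // Returns the title (assumed to be the fourth field), or null when the
    // line has fewer fields than expected, instead of throwing.
    static String titleOrNull(String line) {
        List<String> f = fields(line);
        return f.size() >= 4 ? f.get(3) : null;
    }

    public static void main(String[] args) {
        // Sample lines modelled on the output quoted in the thread; the spacing is assumed.
        String complete = "0000000124  335002   8.7  Fight Club (1999)";
        String truncated = "0000000124  349139";
        System.out.println(titleOrNull(complete));
        System.out.println(titleOrNull(truncated) == null ? "skipped short line" : "parsed");
    }
}
```

Checking the field count before indexing means one malformed line is skipped rather than throwing an exception that ends the whole read loop.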
#6
09-11-2009, 08:20 AM
Json (Super Moderator)
Re: using Scanner for 75mb file

So are you reading each line and passing that line to the scanner?

// Json