Thread: using Scanner for 75mb file

  1. #1
    Junior Member

    using Scanner for 75mb file

    Hello,

    I am trying to parse a 75 MB .list/.txt file, printing it to the screen first and eventually loading it into a database, if it ever works.

    I am using Scanner, and it stops reading after 9-10 lines. The application then finishes normally: no crash, no error. The file I'm trying to read is the 100,000-line IMDB rating.list file.

    Should I do some memory management or something?

    Thanks in advance.

    Part of my code:
    ...
    try {
        // Skip ahead to the first occurrence of "Title" (a horizon of 0 searches without bound)
        scanner.findWithinHorizon("Title", 0);
        while (scanner.hasNextLine()) {
            String nextLine = scanner.nextLine();
            System.out.println(nextLine);
            if (!isValid(nextLine)) {
                continue;
            }
            processLine(nextLine);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        scanner.close();
    }
    ...
    // For each line I use another scanner
    private void processLine(String aLine) {
        lineScanner = new Scanner(aLine);

        // Fields are separated by two or more whitespace characters
        lineScanner.useDelimiter("\\s{2,}");

        // process the line...

    }
    Last edited by helloworld922; November 7th, 2009 at 04:01 PM.


  2. #2
    helloworld922 (Super Moderator)

    Re: using Scanner for 75mb file

    For such a long file, I'd recommend NOT displaying everything you parse to the screen.

    For example, try this:

    for (int i = 0; i < 1000000; i++)
    {
         for (int j = 0; j < 1000000; j++)
         {
              System.out.println(i*j);
         }
    }

    vs. this:

    for (int i = 0; i < 1000000; i++)
    {
         for (int j = 0; j < 1000000; j++)
         {
              int product = i * j;
         }
    }

    The second version will finish many times faster because printing to the console is extremely slow.

    To test whether your algorithm works, I'd recommend taking the first 10 lines or so of your file and testing with those (with the screen output in place). If that works, remove the screen output code and process the larger file.

    If your computer was made in the 2000s or later, you'll probably have roughly 512 MB to 3 GB of memory, plenty to deal with your file (I once allocated an array of 2,000,000 elements and it succeeded).
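
The "test with the first 10 lines" suggestion above can be sketched without editing the large file by hand. This is only a minimal illustration: the class name `FirstLines` and the method `firstLines` are hypothetical and not part of the original poster's code.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FirstLines {
    // Reads at most `limit` lines from the source and returns them in order.
    // Closing the Scanner also closes the underlying source if it is Closeable.
    static List<String> firstLines(Readable source, int limit) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (lines.size() < limit && scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Hypothetical usage: the path to the large file is the first argument.
        for (String line : firstLines(new FileReader(args[0]), 10)) {
            System.out.println(line);
        }
    }
}
```

Printing ten lines is cheap, so the screen output can stay in place for this small test and be removed for the full run.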

  3. The Following User Says Thank You to helloworld922 For This Useful Post:

    zort (November 7th, 2009)

  4. #3
    copeg (Super Moderator)

    Re: using Scanner for 75mb file

    Depending on whether and how you are reading the data into memory, you may also need to raise the maximum JVM heap size (although, based on your description, this may not be the problem: you would see an OutOfMemoryError). Just add something like -Xmx512m or -Xmx1g on the command line to safeguard against running out of memory.
    Last edited by copeg; November 7th, 2009 at 04:41 PM.
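
To see which heap limit the JVM is actually running with (for example, to confirm that an -Xmx setting took effect), something like the following may help. It is a small sketch; the class name `HeapInfo` is made up for illustration.

```java
public class HeapInfo {
    // Maximum amount of memory the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it once with and once without an -Xmx option (for example `java -Xmx512m HeapInfo`) should show the difference.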

  5. #4
    Junior Member

    Re: using Scanner for 75mb file

    Thank you for your quick replies. I'll try to explain my problem further.

    When I run the code on testfile.txt, it reads all 90 lines.

    Movie title : The Shawshank Redemption
    Movie title : The Godfather
    Movie title : The Godfather: Part II
    Movie title : Il buono, il brutto, il cattivo.
    Movie title : Pulp Fiction
    Movie title : Schindler's List
    ...
    Movie title : Who Made the Potatoe Salad?
    Movie title : Who Makes Movies?
    BUILD SUCCESSFUL (total time: 0 seconds)


    If I try it with the imdb file (100000 lines), it stops reading after 10 lines.

    ...
    0000000124 335002 8.7 Fight Club (1999)
    0000000124 63810 8.7 C'era una volta il West (1968)

    BUILD SUCCESSFUL (total time: 0 seconds)
    ...



    And on some occasions it actually stops reading in the middle of a long line.
    ...
    0000000124 335002 8.7 Fight Club (1999)
    0000000124 349139 8.7 The Lord of the Rings: The Fellowshi
    java.lang.ArrayIndexOutOfBoundsException: 1
    // out of bounds occurs when it tries to process an incomplete line.
    ...





    I understand the screen printing issue, but I would understand it better if it crashed while trying to print those numerous lines.
    I suspect it has something to do with the file being huge. How can I make it crash at least?

  6. #5
    copeg (Super Moderator)

    Re: using Scanner for 75mb file

    That last exception, "java.lang.ArrayIndexOutOfBoundsException", says a lot, especially if it is not being caught. Something in your processLine function is possibly the culprit.
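
One way to keep a truncated line from triggering that exception is to check the number of fields before indexing into them. The sketch below reuses the `\\s{2,}` delimiter from the first post. The class name `RatingLineParser`, the method `splitFields`, and the assumption of four fields per line (distribution, votes, rating, title) are illustrative guesses, not something the thread confirms.

```java
public class RatingLineParser {
    // Splits a line on runs of two or more whitespace characters.
    // Returns null when the line has fewer fields than expected,
    // so the caller can skip or log it instead of indexing out of bounds.
    static String[] splitFields(String line, int minFields) {
        String[] fields = line.trim().split("\\s{2,}");
        return fields.length >= minFields ? fields : null;
    }

    public static void main(String[] args) {
        String[] fields = splitFields("0000000124  335002   8.7  Fight Club (1999)", 4);
        if (fields == null) {
            System.err.println("Skipping incomplete line");
        } else {
            System.out.println("Movie title : " + fields[3]);
        }
    }
}
```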

  7. #6
    Json (Super Moderator)

    Re: using Scanner for 75mb file

    So are you reading each line and passing that line to the scanner?

    // Json
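
On the question of how to make the program fail loudly: Scanner does not throw when the underlying read fails. It treats the failure as end of input and records the exception, which can be retrieved with `ioException()`. Whether that is what happens with this particular file is only a guess, but checking costs little. A minimal sketch, with the hypothetical names `ScannerErrorCheck` and `countLines`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Scanner;

public class ScannerErrorCheck {
    // Counts the lines in the source, and throws if the Scanner stopped
    // because the underlying read failed rather than because the input ended.
    static int countLines(Readable source) {
        int count = 0;
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                count++;
            }
            // Scanner swallows IOExceptions; ask for the last one explicitly.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw new UncheckedIOException(failure);
            }
        }
        return count;
    }
}
```

If the count for the large file comes back far below 100,000 without an exception, the cause more likely lies in how the lines are validated or processed than in the read itself.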
