
Thread: Does the Scanner class use an internal buffer?

  1. #1
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    This is actually a pretty simple question that I'm surprised I'm unable to find the answer to. Recently, I was advised to use a Scanner to read a large file in tokens, rather than using a BufferedReader and then splitting the Strings it puts out. While this worked wonders for my code, I've found the Scanner to work much more slowly, and I suspect the problem is the lack of a buffer, which it may or may not have. I've found conflicting evidence on the 'net.

    So here's my question: Does Scanner use a buffer, if so how large and how can I mess with it? If not, is there any way to get the performance of a BufferedReader with the per-token reading of a Scanner? Because at this point, it seems faster to use a BufferedReader and do String.split than it is to use the Scanner for basically the same task.

    For context, the code loading the file looks like this:

    private void LoadComparisonList(File comparisonFile)
    	{
    		if(!comparisonFile.exists() || !comparisonFile.isFile() || !comparisonFile.canRead()
    				|| comparisonFile.length() == 0)
    		{
    			this.errorcode = 2;
    			return;
    		}
     
    		try
    		{
    			Scanner scanner = new Scanner(comparisonFile);
    			// Read and discard the header line (column names).
    			scanner.nextLine();
     
    			// Test before reading, so a header-only file does not throw NoSuchElementException.
    			while(scanner.hasNext())
    			{
    				// Allocate a new array per row; reusing a single array would make
    				// every list entry point at the data of the last row read.
    				String[] readArray = new String[2];
    				readArray[0] = scanner.next();
    				readArray[1] = scanner.nextLine();
    				this.comparisonList.add(readArray);
    			}
     
    			scanner.close();
    		}
    		catch (IOException exception)
    		{
    			this.errorcode = 2;
    		}
    	}
    }

    This reads and discards the file's header line, which is just column names. For each remaining line it then reads the first token to save as a header and everything else (another five tokens) as a footer, saving each pair into an ArrayList<String[]>. It works just fine, it just takes ~27 seconds for a 90 MB, ~5,000,000-line file, whereas virtually the same action done through a BufferedReader completes in around 4-5 seconds, if I remember correctly.


  2. #2
    Super Moderator curmudgeon's Avatar
    Join Date
    Aug 2012
    Posts
    1,130
    My Mood
    Cynical
    Thanks
    64
    Thanked 139 Times in 134 Posts

    I don't know the answer, but if it isn't specified in the API or JLS, it may be undefined and JVM dependent.

    Edit:
    Google has given me more info:

    Does a Java Scanner implicitly create a buffer even if you do not pass it one? - Stack Overflow

  3. #3
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,180
    My Mood
    Hungover
    Thanks
    141
    Thanked 602 Times in 517 Posts

    Your jdk folder should contain src.zip. Check that out, find the Scanner class, and you can see for yourself exactly what it does. I'd be curious to see what you find.
    Useful links: How to Ask Questions the Smart Way | Use Code Tags | Java Tutorials
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  4. #4
    Member

    I can't seem to find that on my system, I'm afraid. I checked the Java folder and the Eclipse folder but I can't find a source file. I'm sure it has to exist somewhere for my system to know what's in the individual classes, I just can't find it. Even Eclipse doesn't seem able to find it, since every time it uses one of the base classes when debugging, I just get a "Source code not available" screen. It's why I had to weed out those steps when debugging. I'm not sure I'd be able to understand what the classes do even if I did find the source, to be honest. I'm not that good at Java.

  5. #5
    Crazy Cat Lady KevinWorkman's Avatar

    It depends on how your system is set up, but for example my JDK folder is:

    C:\Program Files\Java\jdk1.7.0_07

    In that directory, I have a src.zip, and inside that I have a Scanner.java file (inside java/util within the zip).

    But I will say that the short answer to your question, from reading the source, is that Scanner uses a CharBuffer internally.

  6. #6
    Member

    Yeah, I checked my Java folder, but all it has is JRE folders for Java 6 and 7 (I'm still using 6, by the way, legacy stuff). No JDK. As far as Windows 7 can be trusted, I ran a search just in case I wasn't aware of where my Java folders were, but no src.zip turned up.

  7. #7
    Crazy Cat Lady KevinWorkman's Avatar

    Strange. What jdk are you using?

    Either way, you can download the source either as part of the full JDK or standalone from this page: Java SE Downloads

  8. #8
    Member

    At the risk of revealing that I'm not entirely certain I know what a Java Development Kit is, I use Eclipse for code-writing purposes and that's about the extent of it. Beyond that, my only other Java-related software are the JRE packages, and even then I only have 6 and 7.

    Either way, thank you for the link. I'll look into it.

  9. #9
    Crazy Cat Lady KevinWorkman's Avatar

    Okay, gotcha. A JDK is simply the set of tools that compile your Java code into bytecode (namely the javac tool). Compare that to the JRE, which is the set of tools that run compiled bytecode. Eclipse ships with its own built-in Java compiler, which is why it can build your code with only a JRE installed. Most people also install a JDK like I have above so that they can compile from the command prompt. The source comes with the full JDK, which is another reason it's a good thing to have.

  10. #10
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    As far as I know the Scanner class is a convenient Regex-wrapper for any incoming stream. What stream you pass to it will determine if the file is buffered in memory or not.

    Scanner buffed_reader = new Scanner(new BufferedReader(new FileReader("file.txt"))); // file gets buffered into memory, usually faster
    Scanner unbuffed_reader = new Scanner(new FileReader("file.txt")); // file isn't buffered into memory

    That being said, I don't know what the behavior is if you pass a file to the Scanner object directly (though it sounds like it isn't).

    It's also possible that it's the Regex side which is slowing your application down. If all you need is basic read-line stuff, Scanners may be overkill. Scanners work best when you're parsing the data while you're reading it in (such as reading in a file of numbers).
    Last edited by helloworld922; October 9th, 2012 at 12:28 PM.

  11. #11
    Crazy Cat Lady KevinWorkman's Avatar

    From the source, the Scanner(File) constructor wraps the File in a FileInputStream:
        public Scanner(File source) throws FileNotFoundException {
            this((ReadableByteChannel)(new FileInputStream(source).getChannel()));
        }

    Which is converted into a Readable:
        public Scanner(ReadableByteChannel source) {
            this(makeReadable(Objects.requireNonNull(source, "source")),
                 WHITESPACE_PATTERN);
        }
    And finally passed into the main constructor:
        private Scanner(Readable source, Pattern pattern) {
            assert source != null : "source should not be null";
            assert pattern != null : "pattern should not be null";
            this.source = source;
            delimPattern = pattern;
            buf = CharBuffer.allocate(BUFFER_SIZE);
            buf.limit(0);
            matcher = delimPattern.matcher(buf);
            matcher.useTransparentBounds(true);
            matcher.useAnchoringBounds(false);
            useLocale(Locale.getDefault(Locale.Category.FORMAT));
        }
    ...which creates the CharBuffer. The CharBuffer is used in the base method for reading input:

        private void readInput() {
            if (buf.limit() == buf.capacity())
                makeSpace();
     
            // Prepare to receive data
            int p = buf.position();
            buf.position(buf.limit());
            buf.limit(buf.capacity());
     
            int n = 0;
            try {
                n = source.read(buf);
            } catch (IOException ioe) {
                lastException = ioe;
                n = -1;
            }
     
            if (n == -1) {
                sourceClosed = true;
                needInput = false;
            }
     
            if (n > 0)
                needInput = false;
     
            // Restore current position and limit for reading
            buf.limit(buf.position());
            buf.position(p);
        }
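
    The buffering can also be observed from the outside, without reading the source. The following is only a sketch: CountingReadable and probe are hypothetical names made up for this illustration, and the test input is generated in memory rather than read from a file. It wraps the input in a Readable that records how many times Scanner calls read(CharBuffer) and how much free space Scanner offers on the first call. The value of BUFFER_SIZE is not shown in the excerpt above, so the program simply prints whatever the running JDK requests.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.CharBuffer;
import java.util.Scanner;

public class ScannerBufferProbe {

    /** Hypothetical helper: forwards reads and records how Scanner asks for data. */
    static final class CountingReadable implements Readable {
        private final Readable delegate;
        int calls = 0;              // number of read(CharBuffer) calls made by Scanner
        int firstRequestSize = -1;  // free space Scanner offered on its first read

        CountingReadable(Readable delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(CharBuffer target) throws IOException {
            if (calls == 0) {
                firstRequestSize = target.remaining();
            }
            calls++;
            return delegate.read(target);
        }
    }

    /** Scans tokenCount whitespace-separated tokens; returns {tokens seen, read calls, first request size}. */
    static int[] probe(int tokenCount) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokenCount; i++) {
            text.append("tok").append(i).append(' ');
        }
        CountingReadable source = new CountingReadable(new StringReader(text.toString()));
        Scanner scanner = new Scanner(source);
        int seen = 0;
        while (scanner.hasNext()) {
            scanner.next();
            seen++;
        }
        scanner.close();
        return new int[] { seen, source.calls, source.firstRequestSize };
    }

    public static void main(String[] args) {
        int[] result = probe(10000);
        System.out.println("tokens read:        " + result[0]);
        System.out.println("read() calls:       " + result[1]);
        System.out.println("first request size: " + result[2]);
    }
}
```

    If Scanner pulled one character or one token at a time, the number of read() calls would be at least the number of tokens. With an internal CharBuffer it should be far smaller, since each call refills a whole chunk.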

  13. #12
    Super Moderator helloworld922's Avatar

    I suppose I should have performed benchmarks before making assumptions.

    import java.io.BufferedOutputStream;
    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintStream;
    import java.util.Scanner;
     
    public class StreamTest
    {
    	public static void gen_file(String file_name, int lines) throws FileNotFoundException
    	{
    		PrintStream out = new PrintStream(new BufferedOutputStream(new FileOutputStream(file_name)));
    		for (int i = 0; i < lines; ++i)
    		{
    			out.print("header,");
    			for (int j = 0; j < 5; ++j)
    			{
    				out.print(i * 0.33f + j + ",");
    			}
    			out.println();
    		}
    		out.close();
    	}
     
    	public static void main(String[] args) throws IOException
    	{
    		String file_name = "data.txt";
    		gen_file(file_name, 100000);
    		Scanner scan;
    		BufferedReader read;
     
    		int times = 10;
    		long scan_times[] = new long[times];
    		long buff_times[] = new long[times];
    		System.out.println("scanner test");
    		for (int i = 0; i < times; ++i)
    		{
    			scan = new Scanner(new BufferedReader(new FileReader(file_name)));
    			scan.useDelimiter(",");
    			long start_time = System.currentTimeMillis();
    			while (scan.hasNextLine())
    			{
    				// skip header
    				scan.next();
    				// read 5 doubles
    				scan.nextDouble();
    				scan.nextDouble();
    				scan.nextDouble();
    				scan.nextDouble();
    				scan.nextDouble();
    				// finish line
    				scan.nextLine();
    			}
    			long end_time = System.currentTimeMillis();
    			scan_times[i] = end_time - start_time;
    			System.out.println(scan_times[i]);
    			scan.close();
    		}
     
    		System.out.println("buffered reader test");
    		for (int i = 0; i < times; ++i)
    		{
    			read = new BufferedReader(new FileReader(file_name));
    			long start_time = System.currentTimeMillis();
    			String line;
    			while ((line = read.readLine()) != null)
    			{
    				String[] split = line.split(",");
    				// convert items to doubles
    				Double.parseDouble(split[1]);
    				Double.parseDouble(split[2]);
    				Double.parseDouble(split[3]);
    				Double.parseDouble(split[4]);
    				Double.parseDouble(split[5]);
    			}
    			long end_time = System.currentTimeMillis();
    			buff_times[i] = end_time - start_time;
    			System.out.println(buff_times[i]);
    			read.close();
    		}
    	}
    }

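    This code generates a test file, then times reading it back line by line, where each line consists of a header and 5 doubles, all separated by commas. It does so first with a Scanner wrapped around a BufferedReader, and then with a plain BufferedReader plus String.split.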

    Results (milliseconds per run):

     
    scanner test
    3162
    2855
    2787
    2800
    2799
    2797
    2760
    2755
    2779
    2756
    buffered reader test
    209
    128
    112
    108
    110
    108
    108
    108
    107
    107




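    So it looks like Scanners are indeed significantly slower than BufferedReaders, even when the Scanner reads from a BufferedReader: roughly 2.8 seconds versus 0.11 seconds per run for 100,000 lines, about a 25x difference.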

  14. #13
    Member

    Quote Originally Posted by KevinWorkman View Post
    Okay, gotcha. A JDK is simply the set of tools that compile your Java code into bytecode (namely the javac tool). Compare that to the JRE, which is the set of tools that run compiled bytecode. Eclipse ships with its own built-in Java compiler, which is why it can build your code with only a JRE installed. Most people also install a JDK like I have above so that they can compile from the command prompt. The source comes with the full JDK, which is another reason it's a good thing to have.
    Thank you. I suspected, but it's been a while since I've used precise terminology (I'm not actually a native English speaker) so I'm not sure if I was ever clear on this. I'll see about getting the actual JDK, though I still prefer Eclipse. The ease of use of the editor's functions is too good to pass up. At this point I'm hooked on code-complete as a means of avoiding spelling errors (sloppy typist here).

    Quote Originally Posted by helloworld922 View Post
    As far as I know the Scanner class is a convenient Regex-wrapper for any incoming stream. What stream you pass to it will determine if the file is buffered in memory or not.

    Scanner buffed_reader = new Scanner(new BufferedReader(new FileReader("file.txt"))); // file gets buffered into memory, usually faster
    Scanner unbuffed_reader = new Scanner(new FileReader("file.txt")); // file isn't buffered into memory

    That being said, I don't know what the behavior is if you pass a file to the Scanner object directly (though it sounds like it isn't).
    Wait, I can pass a BufferedReader to a Scanner? But I thought its constructor expected a Stream child, not a Reader child? Yes, if I can pass a BufferedReader, then I'd expect the Scanner's behaviour to build on that of the underlying reader object, so I'd expect it to use said reader's buffer. I didn't know I could do that, though.
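
    For reference, Scanner has several constructors, including Scanner(Readable), Scanner(InputStream) and Scanner(File). Reader implements Readable, which is why a BufferedReader is accepted. The sketch below is only an illustration: the class name ScannerSources, the helper tokens and the sample text are made up, and in-memory sources stand in for a real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSources {

    /** Hypothetical helper: drains a Scanner into a list of whitespace-separated tokens. */
    static List<String> tokens(Scanner scanner) {
        List<String> out = new ArrayList<String>();
        while (scanner.hasNext()) {
            out.add(scanner.next());
        }
        scanner.close();
        return out;
    }

    public static void main(String[] args) {
        String text = "header 1 2 3";

        // Scanner(Readable): every Reader is a Readable, so a BufferedReader fits here.
        Scanner fromReader = new Scanner(new BufferedReader(new StringReader(text)));

        // Scanner(InputStream): bytes are decoded with the platform default charset,
        // which matches getBytes() without arguments.
        Scanner fromStream = new Scanner(new ByteArrayInputStream(text.getBytes()));

        System.out.println(tokens(fromReader));
        System.out.println(tokens(fromStream));
    }
}
```

    Both Scanners should yield the same tokens; only the way the characters reach the Scanner differs.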

    Quote Originally Posted by helloworld922 View Post
    It's also possible that it's the Regex side which is slowing your application down. If all you need is basic read-line stuff, Scanners may be overkill. Scanners work best when you're parsing the data while you're reading it in (such as reading in a file of numbers).
    I'm not just reading the data, I need to split it into tokens. Each row comes in six parts: one header and five number columns. Originally, I was using a BufferedReader and then splitting and converting the incoming Strings afterwards, which really isn't a good idea. Having my post moderated in another thread gave me the idea of using a StringTokenizer instead. I haven't read up enough on it to know for certain, but this seems like an easier way to get tokens out of a file reader than a Scanner, which does indeed seem to want to parse my data. I suspect that might be faster, though I don't know whether the StringTokenizer itself will reintroduce some slowdown.
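
    A minimal sketch of that idea, assuming whitespace-separated columns and a single header line as described earlier in the thread. The class name TokenizedLineReader and the method load are hypothetical, and the code sticks to Java 6 syntax since that is the version in use here. Whether it is actually faster than the Scanner version has not been measured in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizedLineReader {

    /** Reads rows of whitespace-separated tokens, skipping the first (column name) line. */
    static List<String[]> load(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        BufferedReader reader = new BufferedReader(source);
        try {
            reader.readLine(); // discard the header line
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(line);
                if (!tokenizer.hasMoreTokens()) {
                    continue; // skip blank lines
                }
                // A fresh array per row, sized to the number of tokens on this line.
                String[] row = new String[tokenizer.countTokens()];
                for (int i = 0; i < row.length; i++) {
                    row[i] = tokenizer.nextToken();
                }
                rows.add(row);
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: one header line, then a name and five numbers per row.
        String sample = "name a b c d e\nrow1 1 2 3 4 5\nrow2 6 7 8 9 10\n";
        for (String[] row : load(new StringReader(sample))) {
            System.out.println(row[0] + " has " + (row.length - 1) + " values");
        }
    }
}
```

    For a real file, the Reader passed in would be a FileReader; the BufferedReader inside load supplies the buffering, and StringTokenizer splits on whitespace without going through regular expressions.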

    In any event, the previous memory problem and the Scanner's slowdown forced me to admit defeat and move the software to our server so it can use more operating memory. It's either that or take ages to accomplish anything.
