
Thread: Downloading files efficiently

  1. #1
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Downloading files efficiently

    I'm working on an application whose task is to download and display pictures. I'm planning to track the download progress and display it. Hence, I've written the following code:
            // java.net.URL imageUrl initialization here
            URLConnection connection = imageUrl.openConnection();
            try (BufferedInputStream is = new BufferedInputStream(connection.getInputStream())) {
                int size = connection.getContentLength(); // Total size of file, in bytes
                int chunk = size / 100;                   // We divide file in 100 parts
     
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); // Stream to write bytes to
                BufferedOutputStream os = new BufferedOutputStream(outputStream); // Wrapper to increase efficiency
     
                // Standard process of reading from input stream and writing to output stream
                byte[] buffer = new byte[chunk];
                int read;
                while ((read = is.read(buffer)) != -1) {
                    os.write(buffer, 0, read);
     
                    // Parts for tracking progress, it does not influence the download process
                    if(model != null) model.setOperationProgress(model.getOperationProgress() + 1);
                }
     
                // Converting the downloaded bytes to BufferedImage and then returning it
                ByteArrayInputStream stream = new ByteArrayInputStream(outputStream.toByteArray());
                BufferedImage image = null;
                while (image == null) image = ImageIO.read(stream);

    As you can see, I divide the whole download into 100 chunks, process them sequentially, and notify the model along the way (the model is another part of the application; it's not related to the question, so the line with the model can be ignored).

    It works, but I wonder whether it's the most efficient way to do things. Does it lead to significant overhead during downloading? This approach seems pretty "brute-forcy", so I have some doubts about it. Are there better solutions?


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Are you sure your code works? I can't get it to produce an actual output image.

  3. #3
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    public static void main(String[] args) throws Exception {
            URL imageUrl = new URL("https://si0.twimg.com/profile_images/3453931082/64673e967c9a34e24ded46fd7124d37d.jpeg");
     
            URLConnection connection = imageUrl.openConnection();
            try (BufferedInputStream is = new BufferedInputStream(connection.getInputStream())) {
                int size = connection.getContentLength(); // Total size of file, in bytes
                int chunk = size / 100;                   // We divide file in 100 parts
     
                System.out.println(size);
     
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); // Stream to write bytes to
                BufferedOutputStream os = new BufferedOutputStream(outputStream); // Wrapper to increase efficiency
     
                // Standard process of reading from input stream and writing to output stream
                byte[] buffer = new byte[chunk + 1];
                int read;
                while ((read = is.read(buffer)) != -1) {
                    os.write(buffer, 0, read);
                }
     
                // Converting the downloaded bytes to BufferedImage and then returning it
                ByteArrayInputStream stream = new ByteArrayInputStream(outputStream.toByteArray());
                BufferedImage image = null;
                while (image == null) image = ImageIO.read(stream);
     
                ImageIcon icon = new ImageIcon(image);
                JOptionPane.showMessageDialog(null, icon);
            }
     
        }
    This worked for me... But I have a gray stripe at the bottom of the image. It seems that some bytes are systematically missing.

  4. #4
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Why not do something like this:

    // java.net.URL imageUrl initialization here
    URLConnection connection = imageUrl.openConnection();
    InputStream is = connection.getInputStream();
    int size = connection.getContentLength(); // Total size of file, in bytes
    int chunk = Math.min(size, 8192);
     
    byte[] buffer = new byte[size];
    int read;
    int offset = 0;
    do
    {
        read = is.read(buffer, offset, Math.min(size - offset, chunk));
        if (read == -1)
        {
            break; // stream ended early; without this check we could loop forever
        }
        offset += read;
        // update GUI display if desired
    } while (offset < size);
    ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, size);
    BufferedImage image = ImageIO.read(bais);

    A few notes on design decisions:

    1. There's no reason to go URLStream->ByteArrayOutputStream->ByteArrayInputStream->BufferedImage if you already have a byte array. Buffered streams won't speed up access to data already in RAM.
    2. There's no need to wrap the URLStream with a BufferedStream if you're reading in large enough chunks already. It just creates an extra buffer on top of the buffer already being used.
    3. You don't need the last while (image == null) loop you have. If ImageIO.read can decode the data, image is initialized on the first attempt; if it can't, looping won't change that.
    4. I chose fixed 8 kbyte read chunks. The reason is that there's no point updating the GUI 100 times within a few milliseconds when reading really small images, and 8 kbytes keeps a typical disk drive effectively loaded (it's also the default size buffered streams use). You can try experimenting with larger read chunks, too.
    5. We already know the size of the incoming image, so there's no need for a dynamically sized ByteArrayOutputStream to capture it. It'll be slower because it has to create larger storage buffers as you add to it (amortized O(1), but still some overhead), and at the end you still have to copy the full buffer into a byte[] for the ByteArrayInputStream.

    On my localhost test server I was able to achieve ~17 MB/sec throughput using a ~12 MB test image (~28 MB/sec if I don't create the BufferedImage object). This is pretty close to my disk drive's theoretical bandwidth; I suspect the remaining overhead comes from the server software.

    If you don't need discrete divisions to update a GUI, you can just use this:

    BufferedImage image = ImageIO.read(new BufferedInputStream(imageUrl.openStream()));

    This performs similarly to the above code, and is a simple one-liner.

  5. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 8th, 2013)

  6. #5
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    So BufferedInputStreams are only needed when we don't read into an array? If we read into an array, then there's no use for buffered streams?
    I've looped the ImageIO part because the documentation says that it can return null in certain circumstances.
    Yes, it turns out that this method is efficient enough with 8 kbyte chunks: I loaded an image from the internet at my connection's maximum speed. But why 8 kbytes and not some other value?

  7. #6
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    The reason for buffered streams (and buffers in general) is to mitigate slow access operations, for example reading from a hard drive or across the network. The time it takes to read a byte from these slow media is significantly larger than the time it takes to read a byte from memory.

    The way buffered streams work is that they have some sort of storage buffer (say, an internal array), and every read/write operation operates on this buffer. The key feature is that this buffer lives in RAM, so you can operate on that data very quickly. The buffer object decides when it needs to update itself / the underlying medium.

    These buffer objects also rely on the underlying medium being able to do "bulk read/write" operations. Hard drives need to be read in large chunks (I believe 4 kbytes, though I don't know the actual value), and network reads/writes are effectively done in bulk so that the client doesn't need to make x requests to get x bytes; only 1 request needs to be made.

    However, once the data is already in memory there's no need to create an additional buffer, because now we can compare apples to apples. Reading a byte from the buffer is just as fast as reading a byte from anywhere else in memory (there are a few caveats with cache memory, but I'm going to ignore this for now because it doesn't really matter here).
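
    In sketch form, a buffered stream's single-byte read looks roughly like this (an illustrative sketch only -- the class name and fields are made up, and the real BufferedInputStream also handles mark/reset, bulk reads into user arrays, and so on):

    import java.io.IOException;
    import java.io.InputStream;
     
    // Illustrative sketch of the idea behind BufferedInputStream.read()
    class SketchBufferedStream {
        private final InputStream in;              // the slow underlying stream
        private final byte[] buf = new byte[8192]; // in-RAM storage buffer
        private int pos = 0;                       // next byte to hand out
        private int count = 0;                     // valid bytes currently in buf
     
        SketchBufferedStream(InputStream in) { this.in = in; }
     
        public int read() throws IOException {
            if (pos >= count) {                      // buffer exhausted
                count = in.read(buf, 0, buf.length); // one bulk read refills it
                pos = 0;
                if (count == -1) return -1;          // underlying stream is done
            }
            return buf[pos++] & 0xFF;                // served straight from RAM
        }
    }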

    There's not really any good reason why 8 kbytes performs best (at least none that's generally applicable); people have tested different sizes, and 8 kbytes just happens to work best.
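
    If you want to experiment yourself, both the chunk size in my loop above and a BufferedInputStream's internal buffer are easy to change; the two-argument constructor picks the buffer size (16 kbytes below is just an example value):

    InputStream is = new BufferedInputStream(connection.getInputStream(), 16 * 1024);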

    ImageIO.read will only return null if it can't decode the data. In that case, no matter how many times you call ImageIO.read with the same data, it will always return null. The function might as well throw an exception when it can't decode the data, but it doesn't.
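
    If you'd rather fail fast, check for null yourself. A minimal sketch (the exception and message are my choice, not something ImageIO mandates):

    BufferedImage image = ImageIO.read(stream);
    if (image == null) {
        // no registered reader could decode the bytes; retrying the same data won't help
        throw new IOException("Could not decode image data");
    }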

  8. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 8th, 2013)

  9. #7
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    So when we invoke the read() method to get just a single byte from a BufferedInputStream, one of 2 things happens:
    1) if the internal buffer is empty, it reads 8 kbytes (or whatever the buffer size is) from the underlying stream in bulk;
    2) otherwise, it returns a byte from the non-empty internal array.

    But why are bulk operations faster than non-bulk ones? Does an operation need something like an initialization phase and a finalization phase, so that if we do a bulk operation, many bytes at a time can share those phases?

  10. #8
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Yep, more or less that's how a buffer works.

    Yes, basically there's some fixed cost associated with performing an operation, plus some dynamic cost depending on the amount of data being requested in a single operation.

    Say we have a fixed cost s for invoking a read() function, plus a dynamic cost d for each byte that is read.

    The cost of reading n bytes using n read operations is:

    n * (s + d)

    On the other hand, a bulk operation would take:

    s + n * d

    This is a very simplified view, because s and d aren't necessarily constant and there may be other factors, but it does explain in essence why bulk operations can be much faster.
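
    To put made-up but plausible numbers on it (s and d below are pure illustration, not measurements):

    public class BulkCostDemo {
        public static void main(String[] args) {
            double s = 10e-6; // fixed cost per read() call: 10 microseconds (assumed)
            double d = 1e-9;  // dynamic cost per byte: 1 nanosecond (assumed)
            int n = 8192;     // bytes to read
     
            System.out.println("n single-byte reads: " + (n * (s + d)) + " s"); // ~0.082 s
            System.out.println("one bulk read:       " + (s + n * d) + " s");   // ~0.0000182 s
        }
    }

    With those numbers the bulk read is roughly 4500 times cheaper, which is the whole argument for buffering in one line of arithmetic.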

  11. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 9th, 2013)

  12. #9
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    Thank you! Now I understand.
