
Thread: Downloading files efficiently

  1. #1
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Downloading files efficiently

    I'm working on an application whose task is to download and display pictures. I'm planning to track the download progress and display it. Hence, I've written the following code:
            // java.net.URL imageUrl initialization here
            URLConnection connection = imageUrl.openConnection();
            try (BufferedInputStream is = new BufferedInputStream(connection.getInputStream())) {
                int size = connection.getContentLength(); // Total size of file, in bytes
                int chunk = size / 100;                   // We divide file in 100 parts
     
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); // Stream to write bytes to
                BufferedOutputStream os = new BufferedOutputStream(outputStream); // Wrapper to increase efficiency
     
                // Standard process of reading from input stream and writing to output stream
                byte[] buffer = new byte[chunk];
                int read;
                while ((read = is.read(buffer)) != -1) {
                    os.write(buffer, 0, read);
     
                    // Parts for tracking progress, it does not influence the download process
                    if(model != null) model.setOperationProgress(model.getOperationProgress() + 1);
                }
     
                // Converting the downloaded bytes to BufferedImage and then returning it
                ByteArrayInputStream stream = new ByteArrayInputStream(outputStream.toByteArray());
                BufferedImage image = null;
                while (image == null) image = ImageIO.read(stream);

    As you can see, I divide the whole download into 100 chunks, process them sequentially, and notify the model along the way (the model is another part of the application; it's not related to the question, so the line with the model can be ignored).

    It works, but I wonder whether it's the most efficient way to do things. Does it lead to significant overhead during downloading? This approach seems pretty "brute-forcy", so I have some doubts about it. Are there better solutions?


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Are you sure your code works? I can't get it to produce an actual output image.

  3. #3
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    public static void main(String[] args) throws Exception {
            URL imageUrl = new URL("https://si0.twimg.com/profile_images/3453931082/64673e967c9a34e24ded46fd7124d37d.jpeg");
     
            URLConnection connection = imageUrl.openConnection();
            try (BufferedInputStream is = new BufferedInputStream(connection.getInputStream())) {
                int size = connection.getContentLength(); // Total size of file, in bytes
                int chunk = size / 100;                   // We divide file in 100 parts
     
                System.out.println(size);
     
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); // Stream to write bytes to
                BufferedOutputStream os = new BufferedOutputStream(outputStream); // Wrapper to increase efficiency
     
                // Standard process of reading from input stream and writing to output stream
                byte[] buffer = new byte[chunk + 1];
                int read;
                while ((read = is.read(buffer)) != -1) {
                    os.write(buffer, 0, read);
                }
     
                // Converting the downloaded bytes to BufferedImage and then returning it
                ByteArrayInputStream stream = new ByteArrayInputStream(outputStream.toByteArray());
                BufferedImage image = null;
                while (image == null) image = ImageIO.read(stream);
     
                ImageIcon icon = new ImageIcon(image);
                JOptionPane.showMessageDialog(null, icon);
            }
     
        }
    This worked for me... But I have a gray stripe at the bottom of the image. It seems that some bytes are systematically missing.

  4. #4
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Why not do something like this:

    // java.net.URL imageUrl initialization here
    URLConnection connection = imageUrl.openConnection();
    InputStream is = connection.getInputStream();
    int size = connection.getContentLength(); // Total size of file, in bytes
    int chunk = Math.min(size, 8192);
     
    byte[] buffer = new byte[size];
    int read;
    int offset = 0;
    do
    {
        read = is.read(buffer, offset, Math.min(size - offset, chunk));
        if (read == -1)
        {
            break; // stream ended early; without this check we could loop forever
        }
        offset += read;
        // update GUI display if desired
    } while (offset < size);
    ByteArrayInputStream bais = new ByteArrayInputStream(buffer, 0, size);
    BufferedImage image = ImageIO.read(bais);

    A few notes on design decisions:

    1. There's no reason to go URLStream->ByteArrayOutputStream->ByteArrayInputStream->BufferedImage if you already have a byte array. Buffered streams won't speed up access to data already in RAM.
    2. There's no need to wrap the URLStream with a BufferedStream if you're reading in large enough chunks already. It just creates an extra buffer on top of the buffer already being used.
    3. You don't need the last while (image == null) loop you have. If ImageIO.read can decode the data, image is initialized on the first attempt; if it can't, looping won't change that.
    4. I chose fixed 8 kbyte read chunks. The reason is that there's no point updating the GUI 100 times within a few milliseconds when reading really small images, and 8 kbytes keeps a typical disk drive effectively loaded (it's also the default size buffered streams use). You can try experimenting with larger read chunks, too.
    5. We already know the size of the incoming image, so there's no need for a dynamically sized ByteArrayOutputStream to capture it. It'll be slower because it has to create larger storage buffers as you add to it (amortized O(1), but still some overhead), and at the end you still have to copy the full buffer into a byte[] for the ByteArrayInputStream.

    On my localhost test server I was able to achieve ~17 MB/sec throughput using a ~12 MB test image (~28 MB/sec if I don't create the BufferedImage object). This is pretty close to my disk drive's theoretical bandwidth; I suspect the remaining overhead comes from the server software.

    If you don't need discrete divisions to update a GUI, you can just use this:

    BufferedImage image = ImageIO.read(new BufferedInputStream(imageUrl.openStream()));

    This performs similarly to the above code, and is a simple one-liner.

  5. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 8th, 2013)

  6. #5
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    So BufferedInputStreams are only needed when we don't read into an array? If we read into an array, then there's no use for buffered streams?
    I've looped the ImageIO part because the documentation says that it can return null in certain circumstances.
    Yes, it turns out that this method is efficient enough with 8 kbyte chunks: I loaded an image from the internet at my connection's maximum speed. But why 8 kbytes and not some other value?

  7. #6
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    The reason for buffered streams (and buffers in general) is to mitigate slow access operations, for example reading from a hard drive or across the network. The time it takes to read a byte from these slow media is significantly larger than the time it takes to read a byte from memory.

    The way buffered streams work is that they have some sort of storage buffer (say, an internal array), and every read/write operation operates on this buffer. The key feature is that this buffer lives in RAM, so you can operate on that data very quickly. The buffer object decides when it needs to update itself / the underlying medium.

    These buffer objects also rely on the underlying medium being able to do "bulk read/write" operations. Hard drives need to be read in large chunks (I believe 4 kbytes, though I don't know the actual value), and network reads/writes are effectively done in bulk so that the client doesn't need to make x requests to get x bytes; only 1 request needs to be made.

    However, once the data is already in memory there's no need to create an additional buffer, because now we can compare apples to apples. Reading a byte from the buffer is just as fast as reading a byte from anywhere else in memory (there are a few caveats with cache memory, but I'm going to ignore this for now because it doesn't really matter here).
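
    In sketch form, a buffered stream's single-byte read looks roughly like this (an illustrative sketch only -- the class name and fields are made up, and the real BufferedInputStream also handles mark/reset, bulk reads into user arrays, and so on):

    import java.io.IOException;
    import java.io.InputStream;
     
    // Illustrative sketch of the idea behind BufferedInputStream.read()
    class SketchBufferedStream {
        private final InputStream in;              // the slow underlying stream
        private final byte[] buf = new byte[8192]; // in-RAM storage buffer
        private int pos = 0;                       // next byte to hand out
        private int count = 0;                     // valid bytes currently in buf
     
        SketchBufferedStream(InputStream in) { this.in = in; }
     
        public int read() throws IOException {
            if (pos >= count) {                      // buffer exhausted
                count = in.read(buf, 0, buf.length); // one bulk read refills it
                pos = 0;
                if (count == -1) return -1;          // underlying stream is done
            }
            return buf[pos++] & 0xFF;                // served straight from RAM
        }
    }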

    There's not really any good reason why 8 kbytes performs best (at least none that's generally applicable); people have tested different sizes, and 8 kbytes just happens to work best.
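
    If you want to experiment yourself, both the chunk size in my loop above and a BufferedInputStream's internal buffer are easy to change; the two-argument constructor picks the buffer size (16 kbytes below is just an example value):

    InputStream is = new BufferedInputStream(connection.getInputStream(), 16 * 1024);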

    ImageIO.read will only return null if it can't decode the data. In that case, no matter how many times you call ImageIO.read with the same data, it will always return null. The function might as well throw an exception when it can't decode the data, but it doesn't.
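
    If you'd rather fail fast, check for null yourself. A minimal sketch (the exception and message are my choice, not something ImageIO mandates):

    BufferedImage image = ImageIO.read(stream);
    if (image == null) {
        // no registered reader could decode the bytes; retrying the same data won't help
        throw new IOException("Could not decode image data");
    }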

  8. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 8th, 2013)

  9. #7
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    So when we invoke the read() method to get just a single byte from a BufferedInputStream, one of 2 things happens:
    1) if the internal buffer is empty, it reads 8 kbytes (or whatever the buffer size is) from the underlying stream in bulk;
    2) otherwise, it returns a byte from the non-empty internal array.

    But why are bulk operations faster than non-bulk ones? Does an operation need something like an initialization phase and a finalization phase, so that if we do a bulk operation, many bytes at a time can share those phases?

  10. #8
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Downloading files efficiently

    Yep, more or less that's how a buffer works.

    Yes, basically there's some fixed cost associated with performing an operation, plus some dynamic cost depending on the amount of data being requested in a single operation.

    Say we have a fixed cost s for invoking a read() function, plus a dynamic cost d for each byte that is read.

    The cost of reading n bytes using n read operations is:

    n * (s + d)

    On the other hand, a bulk operation would take:

    s + n * d

    This is a very simplified view, because s and d aren't necessarily constant and there may be other factors, but it does explain in essence why bulk operations can be much faster.
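
    To put made-up but plausible numbers on it (s and d below are pure illustration, not measurements):

    public class BulkCostDemo {
        public static void main(String[] args) {
            double s = 10e-6; // fixed cost per read() call: 10 microseconds (assumed)
            double d = 1e-9;  // dynamic cost per byte: 1 nanosecond (assumed)
            int n = 8192;     // bytes to read
     
            System.out.println("n single-byte reads: " + (n * (s + d)) + " s"); // ~0.082 s
            System.out.println("one bulk read:       " + (s + n * d) + " s");   // ~0.0000182 s
        }
    }

    With those numbers the bulk read is roughly 4500 times cheaper, which is the whole argument for buffering in one line of arithmetic.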

  11. The Following User Says Thank You to helloworld922 For This Useful Post:

    angstrem (June 9th, 2013)

  12. #9
    Member angstrem's Avatar
    Join Date
    Mar 2013
    Location
    Ukraine
    Posts
    200
    My Mood
    Happy
    Thanks
    9
    Thanked 31 Times in 29 Posts

    Default Re: Downloading files efficiently

    Thank you! Now I understand.
