
Thread: ArrayList memory bloat that I cannot comprehend

  1. #1
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default ArrayList memory bloat that I cannot comprehend

    I apologise for jumping right into a question, but I ran into the sort of behaviour that just makes no sense to me. It's like looking into the face of Great Cthulhu - it's something that should not be, or at least I don't get why it is.

    Here's the situation: I've been given what should be an embarrassingly basic task - load two files into memory. My plan was to load them in as ArrayLists, but I started running out of memory for the second file. I was attempting to load tab-delimited rows and split them into arrays, constructing an ArrayList of actual arrays. Probably inefficient, yes, but I didn't expect to run out of a GB of memory for a 100MB file. Switching to simply loading the file's rows as basic Strings cut the memory usage down to a mere fraction of what it was before. OK, so lesson learned, I thought. A String split into an array takes up more memory than a basic String.

    Except then I did the same to the first file and that started using MORE memory. For the first file, I was loading two rows from the hard drive at a time and combining them into a single array of two elements, then adding them to the ArrayList. So I changed this to instead concatenate the two rows into a single String, and that bloated both my memory used and the time taken by almost double. Processing time I get - concatenation messes with Strings, so it presumably takes longer. Fair enough. But why does switching from an array to a String now cost me MORE physical memory?

    My actual working code is kind of spaghetti, so I compiled a small example that's actually very telling:

    import java.util.ArrayList;
     
    public class Test
    {
    	public static void main(String[] args)
    	{		
    		String string1 = "cat";
    		String string2 = "dog";
    		String string3 = "rabbit";
    		String string4 = "mouse";
    		String string5 = "eagle";
    		String line = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
     
    		ArrayList<String> list = new ArrayList<String>();
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list.add(line);
    		}
    		System.out.println("Total used memory so far:" + (Runtime.getRuntime().totalMemory()/1048576) + "MB");
     
    		ArrayList<String> list2 = new ArrayList<String>();
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list2.add(string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5);
    		}
    		System.out.println("Total used memory so far:" + (Runtime.getRuntime().totalMemory()/1048576) + "MB");
    	}
     
    }

    Run together, this produced the output of:
    Total used memory so far:15MB
    Total used memory so far:155MB

    Just the first cycle creates a footprint of 15MB, which I believe is within the footprint of the JVM, while just the second cycle produces a footprint of 137MB.

    But why is this happening? Both cycles do the exact same thing, with the only difference being how they do it. The first cycle uses the concatenated variable "line" that's built from the five previous variables while the second cycle uses the five variables directly and concatenates them in the process of creating the list. The result is two identical lists with the same number of elements and the same element contents, yet the second one is ten times as big as the first one, and I do not know why. I don't know enough about Java "under the hood," but I can speculate some kind of garbage builds up as this is happening. I scoured the Web and it seems like I can't call the Garbage Collector. I can suggest it run, and running it through System does take down a good 20MB off the total pile, but it doesn't take down the 100+MB that make up the bulk of the difference.

    Why do two ArrayLists with IDENTICAL contents differ in size so massively, and what can I do to avoid it? At this point, I've pieced it together that directly loading strings takes less memory than loading arrays, which takes less memory than loading concatenated strings and this just makes no sense to me. I'm sorry if this doesn't seem to make sense, but I failed my saving throw upon seeing the above example, lost 10 Sanity and now have a phobia of formatted text. I cannot understand why this is happening, and if I can't understand why it's happening, I don't know what I can do to prevent my programme blowing through a GB of memory for something which should not take this much.

    *edit*
    I do have a few more tests I can post, but I'd like to wait for a response so I know I'm not just going crazy.
    Last edited by Fazan; October 5th, 2012 at 07:15 AM.


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: ArrayList memory bloat that I cannot comprehend

    They may have identical contents but when you concatenate strings in Java it needs to create a new string to store the data. Strings in Java can also be interned, meaning that strings with identical contents are cached as a single copy. I don't think concatenated strings are automatically interned, but this isn't something I've paid too close attention to. Java also utilizes a Garbage Collector which means that the runtime can decide when it wants to reclaim memory. Almost always this is not immediately when an object is no longer referred to.
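    A minimal sketch of the point about concatenation and interning. The class name InternDemo and the helper join are made up for illustration; the only library call used is String.intern(). It shows that two run-time concatenations with equal contents are separate objects unless they are interned explicitly.

```java
public class InternDemo {
    // Hypothetical helper: concatenates at run time, so each call
    // returns a freshly allocated String.
    static String join(String a, String b) {
        return a + " " + b;
    }

    public static void main(String[] args) {
        String first = join("cat", "dog");
        String second = join("cat", "dog");

        // Same characters, so equals() is true.
        System.out.println(first.equals(second));               // true
        // Two separate objects on the heap, so == is false.
        System.out.println(first == second);                    // false
        // intern() returns one canonical copy for equal contents.
        System.out.println(first.intern() == second.intern());  // true
    }
}
```

    Interning a million identical rows would collapse them into one object, but it does nothing for rows that really differ, which is the usual case when reading a file.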

    Lastly (and probably most importantly), the Javadoc for totalMemory says that this method returns the amount of memory currently available to the JVM. This is memory which is currently being used for objects, memory being used for objects waiting to be garbage collected, and memory currently not in use (free memory). The JVM will allocate memory lazily (i.e. it will increase the amount of available memory if it thinks it needs to). To get the amount of memory in use subtract the amount of free memory (see: What is the exact meaning of Runtime.getRuntime().totalMemory() and freeMemory()). Note that this calculation still counts memory occupied by objects waiting to be garbage collected. You can try to mitigate this by calling System.gc() to suggest a garbage collection, but no guarantees (see: Why is it a bad practice to call System.gc()).
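    The subtraction described above can be sketched like this. The class name UsedMemory and the helper usedBytes are illustrative names, not part of any library; the Runtime methods are the standard ones. The figure is only an estimate, since it includes garbage that has not been collected yet.

```java
public class UsedMemory {
    // Bytes of heap currently occupied: live objects plus any garbage
    // the collector has not reclaimed yet.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory: the ceiling the heap may grow to (e.g. set with -Xmx)
        System.out.println("max   = " + rt.maxMemory() / mb + " MB");
        // totalMemory: what the JVM has reserved from the OS so far
        System.out.println("total = " + rt.totalMemory() / mb + " MB");
        // total - free: what is actually occupied right now
        System.out.println("used  = " + usedBytes() / mb + " MB");
    }
}
```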
    Last edited by helloworld922; October 5th, 2012 at 11:21 AM.

  3. #3
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    There are a couple misunderstandings here.

    The value returned by Runtime.totalMemory() isn't what your program is currently using- it's how much total memory it has had to use. Java allocates memory lazily, and the total memory is simply how much of the max it has allocated. You can also compare it to Runtime.freeMemory() to get an idea of how much is currently in use.

    But even that isn't really telling the whole story- even if you use System.gc(), you don't have control over when garbage collection occurs. So even if you see that you're using a bunch of memory at a given time, that could just mean that the garbage collector hasn't run yet.

    Also, you're using String concatenation in a loop, which is a pretty big memory no-no. Under the hood, Java is instantiating a StringBuilder for each concatenation. The better way to do this is instantiate your own StringBuilder and use that single instance instead of creating a million instances.

    Hope that helps.
    Last edited by KevinWorkman; October 5th, 2012 at 01:49 PM.
    Useful links: How to Ask Questions the Smart Way | Use Code Tags | Java Tutorials
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  4. #4
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Quote Originally Posted by helloworld922 View Post
    Lastly (and probably most importantly), the Javadoc for totalMemory says that this method returns the amount of memory currently available to the JVM. This is memory which is currently being used for objects, memory being used for objects waiting to be garbage collected, and memory currently not in use (free memory). The JVM will allocate memory lazily (i.e. it will increase the amount of available memory if it thinks it needs to). To get the amount of memory in use subtract the amount of free memory (see: What is the exact meaning of Runtime.getRuntime().totalMemory() and freeMemory()). Note that this calculation still counts memory occupied by objects waiting to be garbage collected. You can try to mitigate this by calling System.gc() to suggest a garbage collection, but no guarantees (see: Why is it a bad practice to call System.gc()).
    I actually considered that maybe totalMemory is reporting false results and would have chalked it up to that, but my strongest evidence for the discrepancy is that the second loop will run out of memory A LOT faster than the first. Run out of memory and exit the programme via exception, I mean. The first loop can run for 10 million and over iterations before the list exceeds my maximum memory, while the second will not run many more than a million, all for the creation of the same amount of data. Additionally, I don't start seeing memory bloat until somewhere over 500 thousand repetitions. Until then, totalMemory reports the same for both loops. Once 500 thousand iterations is exceeded, memory use cascades at an alarming rate.

    Additionally, I'm seeing severe memory bloat from using string splitting in a loop to create an ArrayList of primitive arrays. I can avoid this by simply not splitting and putting off the splits for the actual data processing, but the difference is shocking. The file I've been asked to process is 5+ million lines long, around 90MB, and each line splits into 6 columns, so I can see that might create severe overhead, but the difference still seems too much to me. The entire file loaded as an ArrayList of Strings pushes my used memory up to ~280MB, which isn't that much. I'm reserving a GB for the programme and I can afford to. The same exact file loaded as an ArrayList of arrays bottoms out my memory pool and crashes the programme with an OutOfMemoryException at around line 2.5 million. It just seems like there has to be something more to it than arrays just being that much more memory-intensive than Strings as to turn a 250MB list into one that bottoms out a GB of memory half-way through the process. I know I'm doing something wrong, I just don't know what that is.

    I didn't snag an example of this and I'm away from my work machine so I'll try to fabricate one when time permits, but my problem was trying to load a file into an ArrayList line by line, splitting every line by the \t character and saving this into an array. My machine was physically incapable of doing this. Am I not supposed to do that?

    Quote Originally Posted by KevinWorkman View Post
    Also, you're using String concatenation in a loop, which is a pretty big memory no-no. Under the hood, Java is instantiating a StringBuilder for each concatenation. The better way to do this is instantiate your own StringBuilder and use that single instance instead of creating a million instances.
    I was about to say I tried this and it didn't work, but I just remembered making a VERY basic mistake. I did try using a StringBuilder, but rebuilt a new object over the old reference on every step, which is... A beginner's mistake, I'll freely admit. I'd have to check the API, but I'm pretty sure the StringBuilder should have a method to clear its contents without having to create a new object out of it on every step. If I can do that, it would save me much heartache, but again - I don't have access to my work machine now.
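    For reference, StringBuilder can be emptied with setLength(0), which keeps the internal buffer for reuse. A minimal sketch follows; the class ReuseBuilder and the method build are hypothetical names chosen for illustration. Note that toString() still allocates one new String per row, so reuse saves the builder allocations, not the result Strings.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseBuilder {
    // Builds n rows by joining the given words with spaces,
    // reusing a single StringBuilder for every row.
    static List<String> build(int n, String... words) {
        List<String> out = new ArrayList<String>(n);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.setLength(0);              // clear contents, keep the buffer
            for (int w = 0; w < words.length; w++) {
                if (w > 0) {
                    sb.append(' ');
                }
                sb.append(words[w]);
            }
            out.add(sb.toString());       // one new String per row
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = build(3, "cat", "dog", "rabbit");
        System.out.println(rows.get(0));                 // cat dog rabbit
        System.out.println(rows.get(0) == rows.get(1));  // false
    }
}
```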

    P.S. I apologise for putting this in the wrong forum. I looked through the available ones and didn't see where else it could fit. I guess I thought my problem was more exotic than it really was.

  5. #5
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Correction - that OutOfMemoryException is actually a java.lang.OutOfMemoryError: Java heap space. What caused it is the following, and this will probably make me look really unprofessional and hackneyed for doing it (I assume, based on my goof with using concatenation in a loop), but here it goes:

    import java.util.ArrayList;
     
    public class Test
    {
    	public static void main(String[] args)
    	{		
    		String string1 = "cat";
    		String string2 = "dog";
    		String string3 = "rabbit";
    		String string4 = "mouse";
    		String string5 = "eagle";
    		String line = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
     
    		ArrayList<String> list = new ArrayList<String>();
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list.add(line);
    		}
    		System.out.println("Total used memory so far:" + (Runtime.getRuntime().totalMemory()/1048576) + "MB");
     
    		ArrayList<String[]> list2 = new ArrayList<String[]>();
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list2.add(line.split(" "));
    		}
    		System.out.println("Total used memory so far:" + (Runtime.getRuntime().totalMemory()/1048576) + "MB");
    	} 
    }

    This produces:

    Total used memory so far:15MB
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    	at java.lang.String.split(Unknown Source)
    	at java.lang.String.split(Unknown Source)
    	at test.Test.main(Test.java:26)

    I don't know that arrays take that much more memory than Strings, but I can accept that pretty much as a guess. I don't know how they can cost THAT much more memory, however. The first finishes the whole thing in 15MB, most of which is, I think the JVM. The second goes up to iteration 935719 and runs itself out of memory.

    What I'm saying is I get this is not a memory-efficient thing to do, but I don't know WHY it isn't. Is it the array causing this, or is it the split method? I read up on Split in the API, but I'm not sure I follow exactly how it works. I infer the split(regex) method works by calling the split(regex,limit) method as many times as is necessary, which in this case is five, so that would make for a LOOOT of split calls, but that should bog my system down to a crawl before it eats through my memory. I surmise something is creating junk before the Garbage Collector can clean it up and running out of memory before the thing can even run, but I don't know what that is. Is Split creating an array for every step in the splitting process, leaving me with 5x1000000 orphaned arrays eating up my physical memory? Or is there something about the arrays themselves which is problematic?

    A co-worker of mine insists I use HashMaps instead of ArrayLists, which I'd need to look into before I know whether I want to or not, but if I do, I still need to run a series of splits to pluck the row title out of the title+data clump, and if splits kill my memory, I can't do that. If it's the zillions of arrays doing it, or if limiting my splits to a limit of 2 will help, I can do that, as well, but I need to know what I'm doing wrong before I know what I need to do right.
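    On the question of how split works: per the Javadoc, split(regex) is equivalent to a single call to split(regex, 0), not repeated calls. Each call returns one new array plus one new String per piece. A minimal sketch of the limit-of-2 idea mentioned above, assuming a tab-delimited row whose first column is the title (the class SplitLimit and the method titleAndData are illustrative names):

```java
public class SplitLimit {
    // Splits a row at the first tab only: element 0 is the title,
    // element 1 is the rest of the row, left unsplit.
    static String[] titleAndData(String row) {
        return row.split("\t", 2);
    }

    public static void main(String[] args) {
        String row = "title\tcol1\tcol2\tcol3";
        String[] parts = titleAndData(row);
        System.out.println(parts.length);              // 2
        System.out.println(parts[0]);                  // title
        System.out.println(parts[1].split("\t").length); // 3
    }
}
```

    With a limit of 2, each row costs one two-element array and two Strings instead of an array and a String per column, and the remaining columns can be split later, when they are actually processed.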

  6. #6
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Well, you're doing two very different things here: in the first loop, you're adding the same instance of String to a List over and over again. That's not going to take up much memory. In the second loop, you're creating a new instance of a String array for each iteration. That's a million different instances versus only one instance in the first loop.

  7. #7
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    To show what I mean, try this as your first loop:

    ArrayList<String> list = new ArrayList<String>();
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list.add(new String(line));
    		}

    And this as your second loop:

    ArrayList<String[]> list2 = new ArrayList<String[]>();
    		String[] arr = line.split(" ");
    		for(int i = 0; i <= 1000000; i++)
    		{
    			list2.add(arr);
    		}

  8. #8
    Member
    Join Date
    Jun 2012
    Location
    Left Coast, USA
    Posts
    451
    My Mood
    Mellow
    Thanks
    1
    Thanked 97 Times in 88 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    I would like to offer some musings and opinions and some data.

    First of all, the OP was not asking about how to write efficient programs, but how to understand the results of using Runtime methods in determining memory requirements for a particular type of object, an ArrayList of Strings.

    The setup for the experiment that I am reporting here is:
    32-bit Java version 1.6.0_22 running on a 32-bit Centos 5.8 Linux workstation.

    Here's my assumption. I welcome any enlightenment by anyone who has anything to offer.

    I think that the difference between Runtime.totalMemory() and Runtime.freeMemory() can be calculated before and after creating an object and the difference would be an estimate of the amount of memory taken up by that object. It's only approximate, since the JVM is doing lots of things over which we have precious little direct control. (Garbage collection with occasional heap compaction, etc.)

    Anyhow...

    I put the two different types of loops into separate functions so that I could change them easily without accidentally screwing anything up in the main program.

    I created the ArrayList in each case by using a constructor that designates the initial size of the array list rather than risking fragmentation of the ArrayList itself as I added the Strings.

    For the first type of loop, I defined the common String in the main() method and passed it as a parameter to the function.
    For the second type of loop, I created the String on the fly inside the function.

    So: Here's my test program.
    // Populate ArrayList<String> two different ways:
    //
    // loop1 uses a single common String for each element that is added
    //
    // loop2 adds a String created on the fly. The String itself
    // is identical to the common String used in loop 1
    //
    //   Zaphod_b
     
    import java.util.*;
    import java.lang.*;
     
    public class Z
    {
        public static void main(String[] args) throws Exception
        {       
            long total, free, size0, size1, size2;
     
            int number = 1000000;
     
            String string1 = "cat";
            String string2 = "dog";
            String string3 = "rabbit";
            String string4 = "mouse";
            String string5 = "eagle";
            String line = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;
     
            // Show memory parameters before the fun begins
            total = Runtime.getRuntime().totalMemory();
            free  = Runtime.getRuntime().freeMemory();
            size0 = total-free;
            System.out.printf("Initially   : totalMemory = %10d Bytes\n", total);
            System.out.printf("              freeMemory  = %10d Bytes\n", free);
            System.out.printf("              difference  = %10d Bytes\n", size0);
            System.out.println();
     
            System.out.printf("Creating ArrayList<String> objects with %d elements\n", number);
            System.out.printf("Length of string = %d\n\n", line.length());
     
            // First method: will put the "line" String in all of the elements
            ArrayList<String> list = loop1(number, line);
     
            total = Runtime.getRuntime().totalMemory();
            free  = Runtime.getRuntime().freeMemory();
            size1 = total-free;
     
            System.out.printf("After loop1 : totalMemory = %10d Bytes\n", total);
            System.out.printf("              freeMemory  = %10d Bytes\n", free);
            System.out.printf("               difference = %10d Bytes\n", size1);
     
            // How many bytes used in creating the ArrayList with 'number' elements
            System.out.printf("                    delta = %10d Bytes\n", size1-size0);
     
            // Is this meaningful?  How many bytes memory for each element.  I think so.
            System.out.printf("                delta / n = %10.6f\n", (double)(size1-size0)/number);
            System.out.println();
     
            // loop2 puts in the same string, but creates it on the fly
            // for each one.
            ArrayList<String> list2 = loop2(number);
     
            total = Runtime.getRuntime().totalMemory(); // Bytes
            free  = Runtime.getRuntime().freeMemory(); // Bytes
            size2 = total-free;
     
            // Summary for this one
            System.out.printf("After loop2 : totalMemory = %10d Bytes\n", total);
            System.out.printf("              freeMemory  = %10d Bytes\n", free);
            System.out.printf("              difference  = %10d Bytes\n", size2);
            System.out.printf("                    delta = %10d Bytes\n", size2-size1);
            System.out.printf("                delta / n = %10.6f\n", (double)(size2-size1)/number);
            System.out.println();
        }
     
        // Add a fixed String each time
        static ArrayList<String> loop1(int n, String line) {
            long total = Runtime.getRuntime().totalMemory();
            long free  = Runtime.getRuntime().freeMemory();
            long size0 = total-free;
            System.out.printf("Entry into Loop1 : totalMemory = %10d Bytes\n", total);
            System.out.printf("                   freeMemory  = %10d Bytes\n", free);
            System.out.printf("                   difference  = %10d Bytes\n", size0);
            System.out.println();
     
            // Allocate storage for length n
            ArrayList<String> list = new ArrayList<String>(n);
     
            total = Runtime.getRuntime().totalMemory();
            free  = Runtime.getRuntime().freeMemory();
            long size1 = total-free;
            System.out.printf("After allocation : totalMemory = %10d Bytes\n", total);
            System.out.printf("                   freeMemory  = %10d Bytes\n", free);
            System.out.printf("                   difference  = %10d Bytes\n", size1);
            System.out.printf("                         delta = %10d Bytes\n", size1-size0);
            System.out.println();
     
     
            for(int i = 0; i < n; i++)
            {
                list.add(line);
            }
            return list;
        }
     
        // Add a temporary String each time
        static ArrayList<String> loop2(int n) throws Exception {
            String string1 = "cat";
            String string2 = "dog";
            String string3 = "rabbit";
            String string4 = "mouse";
            String string5 = "eagle";
     
            long total = Runtime.getRuntime().totalMemory();
            long free  = Runtime.getRuntime().freeMemory();
            long size0 = total-free;
            System.out.printf("Entry into Loop2 : totalMemory = %10d Bytes\n", total);
            System.out.printf("                   freeMemory  = %10d Bytes\n", free);
            System.out.printf("                   difference  = %10d Bytes\n", size0);
            System.out.println();
     
            // Allocate storage for length n
            ArrayList<String> list = new ArrayList<String>(n);
     
            total = Runtime.getRuntime().totalMemory();
            free  = Runtime.getRuntime().freeMemory();
            long size1 = total-free;
            System.out.printf("After allocation : totalMemory = %10d Bytes\n", total);
            System.out.printf("                   freeMemory  = %10d Bytes\n", free);
            System.out.printf("                   difference  = %10d Bytes\n", size1);
            System.out.printf("                         delta = %10d Bytes\n", size1-size0);
            System.out.println();
     
            for(int i = 0; i < n; i++)
            {
                list.add(string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5);
            }
            return list;
        }
    }

    Output on my system:

    Initially   : totalMemory =    47841280 Bytes
                  freeMemory  =    47590920 Bytes
                  difference  =      250360 Bytes
     
    Creating ArrayList<String> objects with 1000000 elements
    Length of string = 26
     
    Entry into Loop1 : totalMemory =    47841280 Bytes
                       freeMemory  =    47590920 Bytes
                       difference  =      250360 Bytes
     
    After allocation : totalMemory =    47841280 Bytes
                       freeMemory  =    43590904 Bytes
                       difference  =     4250376 Bytes
                             delta =     4000016 Bytes
     
    After loop1 : totalMemory =    47841280 Bytes
                  freeMemory  =    43590904 Bytes
                   difference =     4250376 Bytes
                        delta =     4000016 Bytes
                    delta / n =   4.000016
     
    Entry into Loop2 : totalMemory =    47841280 Bytes
                       freeMemory  =    43340480 Bytes
                       difference  =     4500800 Bytes
     
    After allocation : totalMemory =    47841280 Bytes
                       freeMemory  =    39340464 Bytes
                       difference  =     8500816 Bytes
                             delta =     4000016 Bytes
     
    After loop2 : totalMemory =   232128512 Bytes
                  freeMemory  =   134134640 Bytes
                  difference  =    97993872 Bytes
                        delta =    93743496 Bytes
                    delta / n =  93.743496


    Conclusion: For one million elements we see that the first loop allocates about four million bytes for the ArrayList. Then, after the ArrayList is populated, the memory usage reported back in the main() method hasn't changed. Still four million bytes. I conclude that the elements are actually pointers to the common String that they all refer to. In other words, I am seeing that it doesn't actually copy the Strings into the ArrayList object. (Maybe this is documented somewhere, but I wanted to test it on my system.)

    If I just look at the bottom line for the second loop, the amount of memory after populating the second ArrayList is a lot more. I think this is because, inside the loop, a "temporary String" is allocated each time through the loop and each ArrayList member is set to point to a (different) "temporary String." The "temporary Strings" must remain in memory after the loop2() function returns to main(), since references to them exist in the ArrayList. (A really smart optimization might recognize that the Strings are actually all the same so that it wouldn't need a million separate "temporary Strings," only one. You know, the kind of optimization you get with GNU g++ when you give it an -O3 command-line switch.) I'm kind of glad that it didn't optimize them down into one String, since that wouldn't have been so dramatic, right? I place no importance on the number of bytes per string in the results of loop2, but I printed them just the same. (Change number from a million to, say, one hundred thousand, and the results change.)

    One further experiment:
    After repeating the above experiment, set all of the elements of list2 to null. Then all of the "temporary Strings" have nothing referring to them, so the Garbage Collector can reclaim their memory.

    So, add the following at the end of main():
            // Now, set list2 elements to "null" so that the Strings that they refer to
            // can be garbage-collected.
            for (int i = 0; i < list2.size(); i++) {
                list2.set(i, null);
            }
            // Ask for a garbage collection (System.gc() is only a request)
            System.gc();
            Thread.sleep(1000); // Give Garbage Collection a little time to work
            total = Runtime.getRuntime().totalMemory(); // Bytes
            free  = Runtime.getRuntime().freeMemory(); // Bytes
            long size3 = total-free;
            // Summary for this one
            System.out.printf("Grand finale: totalMemory = %11d Bytes\n", total);
            System.out.printf("              freeMemory  = %11d Bytes\n", free);
            System.out.printf("              difference  = %11d Bytes\n", size3);
            System.out.printf("                    delta = %11d Bytes\n", size3-size2);
            System.out.println();

    The following appears at the end of the output on my system:
    Grand finale: totalMemory =   289472512 Bytes
                  freeMemory  =   287075336 Bytes
                  difference  =     2397176 Bytes
                        delta =   -95596696 Bytes

    So, indeed, it did reclaim the storage no longer needed as "temporary Strings." I didn't actually calculate the difference per String or anything else...

    Warnings: The numbers may vary from run to run. Depending on what else is going on in my system, the JVM may be performing its tasks interleaved with other system operations. In particular, Garbage Collection/Heap Compaction may be going on at different rates, so, for example the allocation in loop2 might end up taking more memory. (Believe it or not.) Putting gc() and a second's delay before each allocation made my numbers more consistent when I ran this on a really old, really, really slow machine.

    The real bottom line for me is that the test program is interesting because it made me examine some things that I, as a newcomer to Java, hadn't thought about (much) before. I mean, the results from the "loop1" test have no practical application as far as I can see, and, in fact, if we are trying to estimate system resource requirements before tackling a big project, simple little "tests" like loop1 can be very misleading. (I mean, in a "real" program what would be the point of creating an ArrayList with identical elements that would never change?)

    Furthermore, as far as I can determine...

    Figuring out the actual storage requirements for a particular object (especially something involving, say, Lists) may not be as obvious as "simple logical reasoning" and naive testing might suggest.

    Cheers!

    Z
    Last edited by Zaphod_b; October 5th, 2012 at 01:53 PM.

  9. #9
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    The whole point is that the second loop is allocating a lot more instances, either of String or of a String array depending on which version of the code we're looking at. The first loop is only pointing a bunch of references to the *same* instance.

    Let's say the ArrayList is something like a card catalog in a library. The books it references are on the shelves. The first loop is simply making a million copies of a card catalog reference for the same book. So you have references to a million books, right? Nope. You have a million references to one book. Now the second loop does something completely different: it creates a million books and a card catalog entry for each. So even though the card catalog in each library contains the same number of entries, the shelves in the first library (which only contains a single book) are going to weigh a lot less than the second library (which contains a million books). I hope that metaphor didn't get away from me, haha.
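
    To put the card catalog picture into code, here is a minimal sketch. The class and method names (CardCatalog, sameBook, manyBooks, distinctInstances) are invented for this illustration; the IdentityHashMap is only there to count objects by identity rather than by equals().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class CardCatalog {
    /** Counts how many different objects (not equal values) the list refers to. */
    static int distinctInstances(List<String> list) {
        Map<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (String s : list) {
            seen.put(s, Boolean.TRUE);   // keyed by identity (==), not by equals()
        }
        return seen.size();
    }

    /** One "book", many catalog cards pointing at it. */
    static List<String> sameBook(int n) {
        String book = new String("cat dog rabbit mouse eagle");
        List<String> catalog = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            catalog.add(book);           // adds another reference, not another String
        }
        return catalog;
    }

    /** A new "book" for every catalog card. */
    static List<String> manyBooks(int n) {
        String text = "cat dog rabbit mouse eagle";
        List<String> catalog = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            catalog.add(new String(text)); // a separate String object each time
        }
        return catalog;
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(sameBook(1000)));   // prints 1
        System.out.println(distinctInstances(manyBooks(1000)));  // prints 1000
    }
}
```

    Both lists report the same size(), which is why looking at the list alone tells you very little about how much is actually sitting on the shelves.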
    Useful links: How to Ask Questions the Smart Way | Use Code Tags | Java Tutorials
    Static Void Games - Play indie games, learn from game tutorials and source code, upload your own games!

  10. #10
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    OK, hands up - that I was simply creating more references to the SAME String completely slipped my mind, and that would indeed explain a lot of the differences. I'll need to come up with a more comprehensive test that creates a brand new String for each repetition so that we can better imitate the behaviour of a real programme. Specifically, so we can better imitate the behaviour of a programme reading from a file, where a new String object would be fed in by a BufferedReader on every step, thus ensuring I have unique objects.

    My question stands, though, as what spawned it was more proper than the "test" I was using to try it out. My bad. I'll see if I can't come up with something better. Will update when I can, and see if I have more insight.

    *edit*
    OK, I found a bit more of a realistic example. Again, I apologise for the messy code. It's midnight and I'm trying to run fast experiments and not fuss over neatness until I know I'm doing the right thing so I can spend more effort on making it read well. Here's what I have:

    package test;

    import java.util.ArrayList;

    public class Test
    {
    	public static void main(String[] args)
    	{
    		String string1 = "cat";
    		String string2 = "dog";
    		String string3 = "rabbit";
    		String string4 = "mouse";
    		String string5 = "eagle";
    		String line = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;

    		// Loop 1: a separate String object per element, each copy-constructed from the same line.
    		ArrayList<String> list = new ArrayList<String>();
    		for(int i = 0; i < 1000000; i++)
    		{
    			String testLine = new String(line);
    			list.add(testLine);
    		}
    		// Used heap = total - free; no gc() is requested first, so garbage is included.
    		long memory = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    		System.out.println("Total used memory so far:" + memory/1048576 + "MB");

    		// Loop 2: concatenation inside the loop builds a brand new String on every pass.
    		ArrayList<String> list2 = new ArrayList<String>();
    		for(int i = 0; i < 1000000; i++)
    		{
    			list2.add(string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5);
    		}
    		long memory2 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    		System.out.println("Total used memory so far:" + memory2/1048576 + "MB");

    		// Loop 3: one StringBuilder reused for every element, reset with setLength(0).
    		// Note: no " " separators are appended here, so these Strings are four characters shorter than in loops 1 and 2.
    		ArrayList <String> list3 = new ArrayList <String>();
    		StringBuilder builder = new StringBuilder();
    		for(int i = 0; i < 1000000; i++)
    		{
    			builder.append(string1);
    			builder.append(string2);
    			builder.append(string3);
    			builder.append(string4);
    			builder.append(string5);
    			list3.add(builder.toString());
    			builder.setLength(0);
    		}
    		long memory3 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    		System.out.println("Total used memory so far:" + memory3/1048576 + "MB");
    	}
    }

    This runs three loops. I made sure the first loop created a separate String object for each iteration by explicitly invoking the String constructor each time. They construct with the same contents, but invoking the String constructor should ensure they're still separate instances. Running JUST that gave me 27MB of used memory, which is much more significant than what it was using when just copying references. No surprise there. I'm also dividing to get MB because this absorbs some of the "signal noise" from the various bits here and there used by JVM components. The 27MB number does not change.

    The second loop is what I had before - concatenations inside a loop. As before, this bloats significantly, ending up with a used memory amount of 103MB, several times more. I can see why putting concatenation in a loop is a bad idea. I just wish I'd known about this before.

    The third loop is the new one, using a single StringBuilder that I "reset" on each iteration. I'm not entirely sure how to clear a StringBuilder, but I looked online and setting its length to 0 is what was suggested, so I did that. This uses up 86MB when it's done. That's less than direct concatenation but still considerably more than straight up adding strings, so something is clearly still leaving a footprint, and I still don't know what that is. You were right - using a single StringBuilder drops memory usage considerably and I'll definitely be using that, but I have to wonder why I'm still seeing bloat. Is the StringBuilder itself leaving garbage behind? Will that balloon badly when I go up to, say, 5 000 000 lines of text?

    *edit*
    And KevinWorkman, no, your metaphor didn't run away from you. I actually rather like it, and I think I'm going to use it in the future. Thank you.

    Plus, I'm glad I'm not the only one who puts opening curly brackets on a new line these days.
    Last edited by Fazan; October 5th, 2012 at 04:12 PM.

  11. #11
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Your new loop is still allocating a new instance of String each iteration. It has less overhead because it's only using a single instance of StringBuilder, but you're still creating a new String each iteration. In your first loop, you're still only creating a single instance and simply pointing a bunch of references to it, which doesn't take up much memory. The program behavior you described is pretty much what I would expect to see.

    You could probably get around this by interning the Strings, which only works because you happen to be dealing with the same String over and over again. I'm not sure how well that translates to your actual problem.
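
    A rough sketch of what interning would look like here, assuming the lines are built at run time the way the StringBuilder loop does it. InternDemo and build are names invented for this example; String.intern() itself is standard and returns the pooled, canonical copy of an equal String.

```java
import java.util.ArrayList;
import java.util.List;

final class InternDemo {
    /** Builds n equal lines at run time, optionally interning each one before storing it. */
    static List<String> build(int n, boolean intern) {
        List<String> list = new ArrayList<String>(n);
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; i++) {
            builder.append("cat").append(' ').append("dog");
            String line = builder.toString();        // always a fresh String object
            list.add(intern ? line.intern() : line); // intern() hands back the pooled copy
            builder.setLength(0);
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> plain = build(3, false);
        List<String> pooled = build(3, true);
        System.out.println(plain.get(0) == plain.get(1));   // false: two separate objects
        System.out.println(pooled.get(0) == pooled.get(1)); // true: one shared object
    }
}
```

    With interning the temporary Strings still get created, they just become garbage straight away instead of staying in the list. It only pays off when many rows really are equal, which is unlikely to be true of real data files.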

  12. #12
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    *sigh*
    Some days I pine for C++ and its EXPLICIT pointer invocations... But then I remember how much of a tangled mess I made of those and resign myself to just try and anticipate what Java does. So invoking the "new String(String string)" constructor just re-references a previous String. Should have known. Would it work if, instead of building from a previous String, I built from text entered directly into the constructor, or simply doing a line = "text" call? I am a bit confused at this point. My actual programme's on my work computer, as are the data files I'm working with. That means the earliest I can get you an actual code sample is Monday when I come into work. At this point, I'm thinking I should abandon ArrayLists entirely and go with HashMaps, but that's really not relevant to the problem of performance.

  13. #13
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Sorry, I actually read it wrong- I didn't realize you were calling new String(string). That actually does create a new instance of String. You can test this by using the == operator on the original String and the newly created one.
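
    The == check mentioned above looks roughly like this. CopyIdentity and sameObject are names made up for the sketch; everything else is plain java.lang.String.

```java
final class CopyIdentity {
    /** True only when a and b are the very same object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String line = "cat dog rabbit mouse eagle";
        String copy = new String(line);   // explicitly asks for a new String object
        String alias = line;              // just another reference to the same object

        System.out.println(sameObject(line, copy));   // false
        System.out.println(line.equals(copy));        // true
        System.out.println(sameObject(line, alias));  // true
    }
}
```

    One possible explanation for the gap between the first and third loops, offered as an assumption rather than a measurement: in the OpenJDK String sources I am aware of, new String(String) can reuse the original's internal character array, so each copy costs little more than an object header, whereas every StringBuilder.toString() call copies the characters into a fresh array. That is an implementation detail the API documentation does not promise, so treat it as a hint about where to look, not as a guarantee.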

    But I think another major issue here is that your test doesn't really allow for garbage collection. For example, try adding a System.gc() before each memory calculation.

  14. #14
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Quote Originally Posted by KevinWorkman View Post
    I think another major issue here is that your test doesn't really allow for garbage collection. For example, try adding a System.gc() before each memory calculation.
    I did try that, though I didn't mention it for fear of making an even bigger fool of myself. In nearly all cases, Garbage Collection does shave around 20MB off the used memory, so it's clearly doing something, it's just not doing as much as I think it should, meaning there's something there that the GC doesn't dump which exists with concatenation (even with a single StringBuilder) but not with pre-built strings.

    As well, I realise the programme ends fairly quickly, within around 10 seconds on the machine I'm running it on, but that's more or less what my active programme does, as well. I can load 10 million lines into a single ArrayList<String[]>, two rows to a single list element (I put them both in the same array) within 3-4 seconds on my work machine, and this produces fairly little bloat. Trying to concatenate them two-by-two into the same String and using an ArrayList<String> bloats processing time to 7-8 seconds and memory use to several times as much. I'll have to get to my workstation to check just how much using a single StringBuilder will help.

  15. #15
    Crazy Cat Lady KevinWorkman's Avatar
    Join Date
    Oct 2010
    Location
    Washington, DC
    Posts
    5,166
    My Mood
    Hungover
    Thanks
    141
    Thanked 597 Times in 512 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    I suspect that you're experiencing a slightly different problem- even with the worst coding practices, using String concatenation in a loop shouldn't give you an OutOfMemoryError, it'll just use up more memory over time and be slower. I'd be curious to see an example that repeats the OutOfMemory problem, because that almost definitely shouldn't happen.

    Oh, and using a HashMap won't fix any of these problems for you- in fact, they use more memory than a List. The only reason to use one is if you need the functionality that a Map gives you (looking things up by key instead of by index). I'm not sure why your coworker was suggesting to use them over a List, but they won't fix the memory problems.

  16. #16
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Quote Originally Posted by KevinWorkman View Post
    I suspect that you're experiencing a slightly different problem- even with the worst coding practices, using String concatenation in a loop shouldn't give you an OutOfMemoryError, it'll just use up more memory over time and be slower. I'd be curious to see an example that repeats the OutOfMemory problem, because that almost definitely shouldn't happen.
    *edit*
    4 AM, and my brain is slowing down

    What's causing the OutOfMemoryError is the code I listed in this post, specifically the second loop. It's a mirror of what I was doing at work, and that depletes whatever the default memory allocation is before the loop ends. JUUUST before it ends, mind you, at step ~900 000 or some such, but before the end nevertheless.

    I assume that if you fill the heap with enough crap, the whole JVM will eventually tap out and stop prematurely. It has an upper memory limit, after all.

    Quote Originally Posted by KevinWorkman View Post
    Oh, and using a HashMap won't fix any of these problems for you- in fact, they use more memory than a List. The only reason to use one is if you need the functionality that a Map gives you (looking things up by key instead of by index). I'm not sure why your coworker was suggesting to use them over a List, but they won't fix the memory problems.
    His suggestion wasn't really for efficiency so much as for ease of coding. He's a Perl programmer and REALLY loves using hashes, to the point where we've had arguments over what he should use in his programmes, so that's his prerogative. And to be fair, the programme I'm loading this data in memory for will benefit from using hashes, since I need to compare two unsorted files and pair rows with matching headers, among other things. I'm told a Hash Map searches faster than an ArrayList, but either should do, I think.

    So a HashMap is more resource-intensive than an ArrayList? That's news to me, thank you for letting me know. I'm being asked to work with some sizeable files so I need to cut corners wherever I can. I'd rather it were clunkier to write if it worked better.

    *edit*
    Speaking of OutOfMemoryError, I just managed to construct another example. This one:

    import java.util.ArrayList;

    public class Test
    {
    	public static void main(String[] args)
    	{
    		String string1 = "cat";
    		String string2 = "dog";
    		String string3 = "rabbit";
    		String string4 = "mouse";
    		String string5 = "eagle";
    		String line = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5;

    		// Every split() call returns a new String[] holding five new Strings,
    		// and all of them stay reachable through list4.
    		ArrayList<String[]> list4 = new ArrayList <String[]>();
    		for(int i = 0; i < 1000000; i++)
    		{
    			list4.add(line.split(" "));
    		}
    		// Used heap = total - free; only reached if the loop above completes.
    		long memory3 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    		System.out.println("Total used memory so far:" + memory3/1048576 + " MB");
    	}
    }

    This one doesn't finish. Instead, it ends in this:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    	at java.util.ArrayList$SubList.listIterator(Unknown Source)
    	at java.util.AbstractList.listIterator(Unknown Source)
    	at java.util.ArrayList$SubList.iterator(Unknown Source)
    	at java.util.AbstractCollection.toArray(Unknown Source)
    	at java.lang.String.split(Unknown Source)
    	at java.lang.String.split(Unknown Source)
    	at test.Test.main(Test.java:51)

    This is the other big one I wanted to ask about. I presume it's a big no-no to use split inside a loop for a similar reason to concatenation, but this one is actually by far the worst offender. The OutOfMemoryError test from before I could finish at work when I permitted Java more memory, but this one I couldn't finish anywhere. I'm assuming that string splitting produces some kind of cascading clutter, possibly a zillion arrays that it bifurcates into smaller and smaller pairs, but something about the whole process just kills my programme. Literally, in this case - it caused the runtime to crash. That's not even an exception, I've never seen OutOfMemoryError before.
    Last edited by Fazan; October 5th, 2012 at 08:18 PM.

  17. #17
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: ArrayList memory bloat that I cannot comprehend

    HashMaps take up more memory but they are significantly faster at searching. However, in many cases I doubt the size of the base HashMap structure will be significant compared to the size of the data you're trying to put into it.

    If you have a sorted array it takes O(log(n)) to find something using a binary search. If it's unsorted it would take O(n) by iterating through the whole list.

    HashMaps have an amortized O(1) search, depending on how many collisions there are and how you're handling collisions. The most common way to mitigate collisions is to have a large underlying memory footprint (much larger than the number of elements being stored by the hash).
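
    To make the three search costs concrete, here is a small sketch. LookupDemo and the header values ("seq1" and so on) are invented for illustration; Collections.binarySearch and HashMap are the standard library pieces. The initial capacity passed to HashMap is just one way of trading memory for fewer collisions and is an arbitrary choice here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LookupDemo {
    /** O(n): walk the unsorted list until the header matches. */
    static int linearSearch(List<String> headers, String wanted) {
        for (int i = 0; i < headers.size(); i++) {
            if (headers.get(i).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    /** O(log n), but only valid if the list is already sorted. */
    static int binarySearch(List<String> sortedHeaders, String wanted) {
        int idx = Collections.binarySearch(sortedHeaders, wanted);
        return idx >= 0 ? idx : -1;
    }

    /** Builds header -> position once, so each later lookup is amortized O(1). */
    static Map<String, Integer> index(List<String> headers) {
        Map<String, Integer> byHeader = new HashMap<String, Integer>(headers.size() * 2);
        for (int i = 0; i < headers.size(); i++) {
            byHeader.put(headers.get(i), i);
        }
        return byHeader;
    }

    public static void main(String[] args) {
        List<String> headers = new ArrayList<String>();
        Collections.addAll(headers, "seq3", "seq1", "seq2");
        System.out.println(linearSearch(headers, "seq2"));   // 2
        System.out.println(index(headers).get("seq1"));      // 1

        List<String> sorted = new ArrayList<String>(headers);
        Collections.sort(sorted);
        System.out.println(binarySearch(sorted, "seq3"));    // 2
    }
}
```

    Building the index costs one pass over the data plus the extra memory for the map, so it is worth it when there will be many lookups, as there would be when matching two large unsorted files.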

    Note that ArrayLists aren't completely free from wasting memory space, either. The way ArrayLists work is they allocate an array with an initial capacity and have a separate counter for the number of items in that Array. As you add items to the ArrayList they simply get put at the location of the "virtual end". When the virtual end reaches the actual capacity a new array with a larger capacity is allocated and all the items are copied over. A common implementation is to double the capacity each time the array fills up.
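
    If you know roughly how many rows are coming, you can sidestep the repeated grow-and-copy by giving the ArrayList its capacity up front. A minimal sketch follows; PresizedList and load are invented names, while the ArrayList(int) constructor and trimToSize() are standard. As an aside, I believe OpenJDK's ArrayList grows by about half its current capacity rather than doubling, but that is an implementation detail and may differ between versions.

```java
import java.util.ArrayList;

final class PresizedList {
    /** Loads n rows into a list whose backing array is allocated once, up front. */
    static ArrayList<String> load(int n) {
        ArrayList<String> rows = new ArrayList<String>(n); // capacity n: no regrowth while filling
        for (int i = 0; i < n; i++) {
            rows.add("row" + i);
        }
        rows.trimToSize(); // drops spare capacity; nothing to trim here since the list is exactly full
        return rows;
    }

    public static void main(String[] args) {
        ArrayList<String> rows = load(1000000);
        System.out.println(rows.size());  // 1000000
    }
}
```

    When the final size is unknown, calling trimToSize() after loading is the part that matters, since it releases whatever slack the last growth step left behind.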

    This may sound silly, but perhaps you simply just need more memory. Have you tried increasing the maximum amount of memory you allow the JVM to allocate? How to increase the JVM maximum heap memory:

    java -Xmx1g ...

    This increases the maximum heap size to 1GB. Note that to get larger than 2GB of memory you need a 64-bit JVM.
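
    A quick way to confirm that the setting actually took effect is to ask the runtime for its limit. HeapLimit and maxHeapMb are names invented for this sketch; Runtime.maxMemory() is the standard call.

```java
final class HeapLimit {
    /** Maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

    Running it as "java -Xmx1g HeapLimit" should report a figure in the neighbourhood of 1024 MB; the exact value can come out somewhat lower depending on the JVM and garbage collector in use.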

    Quote Originally Posted by Fazan View Post
    This is the other big one I wanted to ask about. I presume it's a big no-no to use split inside a loop for a similar reason to concatenation, but this one is actually by far the worst offender. The OutOfMemoryError test from before I could finish at work when I permitted Java more memory, but this one I couldn't finish anywhere. I'm assuming that string splitting produces some kind of cascading clutter, possibly a zillion arrays that it bifurcates into smaller and smaller pairs, but something about the whole process just kills my programme. Literally, in this case - it caused the runtime to crash. That's not even an exception, I've never seen OutOfMemoryError before.
    If you're storing all the same data then this is definitely not the way to go because as you said you're just creating tons of new objects. However, for different data there really aren't a lot of ways around it.
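
    To show where the "tons of new objects" come from: each call to split hands back a new array filled with new String objects, so a million five-field rows means a million arrays plus five million Strings, all kept alive by the list. The sketch below demonstrates that, and adds one possible alternative for the case where only the first field is needed. SplitCost and firstField are invented names, and the indexOf approach is my own suggestion, not something proposed earlier in the thread.

```java
final class SplitCost {
    /** Returns only the first space-delimited field, without building the full array. */
    static String firstField(String line) {
        int end = line.indexOf(' ');
        return end < 0 ? line : line.substring(0, end);
    }

    public static void main(String[] args) {
        String line = "cat dog rabbit mouse eagle";

        String[] a = line.split(" ");
        String[] b = line.split(" ");
        System.out.println(a.length);         // 5
        System.out.println(a == b);           // false: a new array on every call
        System.out.println(a[0] == b[0]);     // false: and new String objects inside it
        System.out.println(firstField(line)); // cat
    }
}
```

    There is no cascade of ever-smaller arrays involved; it is simply six long-lived objects per row instead of one, plus whatever temporary list split uses internally (the ArrayList frames in the stack trace above suggest it builds one).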

    I suspect the fundamental problem might be associated with how you're trying to solve the bigger problem. What problem are you trying to solve? You mentioned something about pairing rows with matching headers.
    Last edited by helloworld922; October 5th, 2012 at 09:18 PM.

  18. #18
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Quote Originally Posted by helloworld922 View Post
    I suspect the fundamental problem might be associated with how you're trying to solve the bigger problem. What problem are you trying to solve? You mentioned something about pairing rows with matching headers.
    It would normally be a pretty simple problem, but I'm trying to devise a software package that will work with a wider range of data than JUST the two files I'm given in case my project manager wants to publish the broader research project or we need to sort other types of data. What I'm trying to do is the following:

    I have two files. One has each data entry on two consecutive rows, with the first row being the data header and the second row being data I don't care about for the purposes of the programme. The other has each data entry on the same row, with a header and five columns of numerical data that I DO care about tab-delimited. I need to match entries from the first file to the second file, find out if the numerical data is non-zero, then jam the numbers into the header with an underscore delimiter, and delete the entry if it IS zero, which many of those are. Then, I need to print the results in the format of the first file - header and data on separate rows, with the header now also carrying the data from the second file.

    I chose to do this by loading the whole thing into memory because I'll need to run searches. My project manager was adamant that I treat both files as unsorted and isn't letting me mess with the files themselves, plus we're not sure if all data entries from the second file have counterparts in the first. I wouldn't want to do looping file reads for files this large, hence why I want to load them into memory. I suspect it's best to either use HashMaps to simplify searches or, as was my original intent: trace through the source ArrayList, then go element-by-element through the comparison ArrayList, save the "jammed-together" line into a third output list and remove both corresponding entries from both lists. This ought to simplify searching as more are found since the lists will shrink progressively, but I suspect it has severe problems of its own, such as corner cases when my source list is much smaller than the comparison list.

  19. #19
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: ArrayList memory bloat that I cannot comprehend

    You say you don't need to write out the zero data to the new file, but do you need to keep it in memory? If you don't, as you read in the data file you can do a check first to see if you should keep that data or get rid of it.

    Also, if you read in the data first then you don't have to keep every header item from the first file, only those you find in the second file which are valid. To make searches much faster this would be a good place to use HashMaps if the data is indeed unsorted.
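
    Put together, the two suggestions above might look something like the sketch below. Several things here are assumptions on my part and not taken from your description: a row counts as "zero" only when all of its values are zero, entries with no counterpart in the data file are dropped, the numbers are appended to the header exactly as they appear in the file, and the rows are passed in as Lists purely to keep the example self-contained (in the real programme they would come from a BufferedReader, one line at a time). HeaderJoin, indexNonZero and join are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HeaderJoin {
    /** Indexes the tab-delimited data rows by header, skipping rows whose values are all zero. */
    static Map<String, String> indexNonZero(List<String> dataRows) {
        Map<String, String> suffixByHeader = new HashMap<String, String>();
        for (String row : dataRows) {
            String[] fields = row.split("\t");   // temporary; only header and suffix are kept
            boolean allZero = true;
            StringBuilder suffix = new StringBuilder();
            for (int i = 1; i < fields.length; i++) {
                if (Double.parseDouble(fields[i]) != 0.0) {
                    allZero = false;
                }
                suffix.append('_').append(fields[i]);
            }
            if (!allZero) {
                suffixByHeader.put(fields[0], suffix.toString());
            }
        }
        return suffixByHeader;
    }

    /** Walks the two-line entries (header, payload) and emits matched ones with the extended header. */
    static List<String> join(List<String> twoLineEntries, Map<String, String> suffixByHeader) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < twoLineEntries.size(); i += 2) {
            String header = twoLineEntries.get(i);
            String suffix = suffixByHeader.get(header);
            if (suffix != null) {                 // unmatched or all-zero entries are dropped
                out.add(header + suffix);
                out.add(twoLineEntries.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<String>();
        data.add("h1\t1\t0\t2\t0\t3");
        data.add("h2\t0\t0\t0\t0\t0");
        List<String> entries = new ArrayList<String>();
        entries.add("h1"); entries.add("payload one");
        entries.add("h2"); entries.add("payload two");
        System.out.println(join(entries, indexNonZero(data))); // [h1_1_0_2_0_3, payload one]
    }
}
```

    Done this way, only the non-zero data rows are ever held in memory, and the two-line file can be streamed through once and written straight to the output without being stored at all.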

    Lastly I would suggest you change the way you read in data. Use a Scanner object so you can read in tokens directly rather than trying to read in a line and split it into the appropriate data. Read in numerical data as a float or double, unless you're not allowed to. These data types usually take up less memory than a String representation of the number.
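
    As a rough illustration of the Scanner approach, assuming each row is a header token followed by five numeric columns and that headers contain no whitespace (otherwise the default whitespace delimiter would split them). RowParser and readValues are names made up for the sketch, and the in-memory input string stands in for the real file.

```java
import java.util.Locale;
import java.util.Scanner;

final class RowParser {
    /** Reads the next `columns` numeric tokens from the scanner into a double array. */
    static double[] readValues(Scanner in, int columns) {
        double[] values = new double[columns];
        for (int i = 0; i < columns; i++) {
            values[i] = in.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the data file; tabs and newlines both count as token delimiters.
        Scanner in = new Scanner("h1\t1.5\t0\t2\t0\t3\nh2\t0\t0\t0\t0\t0\n");
        in.useLocale(Locale.US);   // make sure '.' is treated as the decimal separator
        while (in.hasNext()) {
            String header = in.next();
            double[] values = readValues(in, 5);
            System.out.println(header + " -> " + values.length + " values, first = " + values[0]);
        }
    }
}
```

    Each double takes 8 bytes in the array, whereas each String is a full object with its own header and character storage, so for five numeric columns per row the saving adds up quickly.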

  20. #20
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    Quote Originally Posted by helloworld922 View Post
    You say you don't need to write out the zero data to the new file, but do you need to keep it in memory? If you don't, as you read in the data file you can do a check first to see if you should keep that data or get rid of it.
    I don't need the zero data at all, and I intend to remove rows from both lists as I match them up. As a point of fact, I may not even need a third list. It should be simple enough to just update the source list, because all that's really needed is that I alter the headers. That should save some memory.

    Quote Originally Posted by helloworld922 View Post
    Lastly I would suggest you change the way you read in data. Use a Scanner object so you can read in tokens directly rather than trying to read in a line and split it into the appropriate data. Read in numerical data as a float or double, unless you're not allowed to. These data types usually take up less memory than a String representation of the number.
    That's not a bad idea. I've never really used Scanners for reading files before, but it should be straightforward enough. If I can read the data in pieces to begin with, that'll do the trick. Save memory and processing, at the very least. Thank you kindly for your help.

  21. #21
    Member
    Join Date
    Oct 2012
    Posts
    32
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Re: ArrayList memory bloat that I cannot comprehend

    I think I have all I need at this point, and I'm willing to call this problem solved. I know why the memory bloat was occurring and what I can do to prevent it. Thank you kindly, everybody
