intern() is an interesting function in the java.lang.String class. It eliminates duplicate string objects from the application and has the potential to reduce your application’s overall memory consumption. In this post, let’s learn more about the intern() function.

1. How does the String intern() function work?
A pool of string objects is maintained in the Java heap memory. When you invoke the intern() function on a String object, the JVM checks whether a string with the same content already exists in the pool. If it does, that pooled object is returned to the invoker. If it doesn’t, this string object is added to the pool and the newly added string object is returned to the invoker.
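
The check-then-add behaviour described above can be sketched with an ordinary map. This is only a conceptual model: the class name SimplePool and its methods are hypothetical and are not how the JVM implements its pool internally.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the string pool; NOT the JVM's real implementation.
public class SimplePool {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the pooled instance whose content equals s, adding s if none exists yet.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SimplePool p = new SimplePool();
        String a = p.intern(new String("yCrash")); // not in the pool yet, so it is added
        String b = p.intern(new String("yCrash")); // already pooled, so the first object comes back
        System.out.println(a == b);   // true
        System.out.println(p.size()); // 1
    }
}
```

The map is keyed by content (equals/hashCode), which is why two distinct String objects with the same characters resolve to a single pooled instance.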

It’s always easier to learn through examples and pictures. Let’s look at the code snippet below:

1: String s1 = new String("yCrash").intern();
2: String s2 = new String("yCrash").intern();



Fig: JVM heap memory when launched initially

All the objects that your application creates are stored in the JVM’s heap memory. This heap memory internally contains a string intern pool. For this illustration, assume that when the program is launched the heap contains no string objects yet.



Fig: JVM heap memory when ‘String s1 = new String(“yCrash”).intern();’ is executed

When the first statement ‘String s1 = new String(“yCrash”).intern();’ is executed, the JVM checks whether a “yCrash” string object is present in the intern string pool. In this simplified picture it is not, so the “yCrash” string is added to the intern string pool and a reference to the pooled object is returned to s1.



Fig: JVM heap memory when ‘String s2 = new String(“yCrash”).intern();’ is executed

When the second statement ‘String s2 = new String(“yCrash”).intern();’ is executed, the JVM once again checks whether a “yCrash” string object is present in the intern string pool. This time it is, because it was added when statement #1 was executed, so the existing string object’s reference is returned to s2. Both s1 and s2 now point to the same “yCrash” string object. The duplicate “yCrash” object created by ‘new String’ in statement #2 is no longer referenced and becomes eligible for garbage collection.

2. How does String work without the intern() function?

1: String s3 = new String("yCrash");
2: String s4 = new String("yCrash");



Fig: JVM heap memory when ‘String s3 = new String(“yCrash”);’ is executed

When the first statement ‘String s3 = new String(“yCrash”);’ is executed, the JVM creates a “yCrash” string object in the heap memory, but not within the intern string pool.



Fig: JVM heap memory when ‘String s4 = new String(“yCrash”);’ is executed

When the second statement ‘String s4 = new String(“yCrash”);’ is executed, the JVM creates another “yCrash” string object in the heap memory, so a duplicate “yCrash” now exists in memory. If your application creates n “yCrash” strings this way without invoking intern(), n separate “yCrash” string objects will exist in memory. This can waste a considerable amount of memory.
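
To make the “n objects versus one object” point concrete, here is a small sketch that counts distinct String objects by identity rather than by content. The class and method names (DuplicateCount, distinctObjects) are made up for this illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class DuplicateCount {
    // Creates n "yCrash" strings and counts how many distinct objects (by identity) result.
    static int distinctObjects(int n, boolean useIntern) {
        // An identity-based set treats two strings as different unless they are the same object.
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < n; i++) {
            String s = new String("yCrash");
            seen.add(useIntern ? s.intern() : s);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // 1000
        System.out.println(distinctObjects(1000, true));  // 1
    }
}
```

Without intern() every ‘new String’ call yields its own object; with intern() all of them collapse to the single pooled instance.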

3. How do intern() and == work?
Since s1 and s2 point to the same “yCrash” string object, comparing s1 and s2 with the ‘==’ operator, as shown below, gives ‘true’.

// true will be printed
System.out.println(s1 == s2);

Since s3 and s4 point to two different “yCrash” string objects, comparing s3 and s4 with the ‘==’ operator, as shown below, gives ‘false’.

// false will be printed
System.out.println(s3 == s4);
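
The two comparisons above can be put together in one runnable sketch. The class name InternComparison and the helper compare() are assumptions made for this example; the equals() line is added only as a reminder that content comparison does not depend on interning.

```java
public class InternComparison {
    // Returns { s1 == s2, s3 == s4, s3.equals(s4), s3.intern() == s1 }.
    static boolean[] compare() {
        String s1 = new String("yCrash").intern();
        String s2 = new String("yCrash").intern();
        String s3 = new String("yCrash");
        String s4 = new String("yCrash");
        return new boolean[] {
            s1 == s2,          // same pooled object
            s3 == s4,          // two separate heap objects
            s3.equals(s4),     // same content
            s3.intern() == s1  // interning s3 returns the pooled object
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("s1 == s2: " + r[0]);          // true
        System.out.println("s3 == s4: " + r[1]);          // false
        System.out.println("s3.equals(s4): " + r[2]);     // true
        System.out.println("s3.intern() == s1: " + r[3]); // true
    }
}
```

In application code, equals() remains the safe way to compare string content; ‘==’ only tells you whether two references point to the same object.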

4. In which JVM memory region is the intern string pool stored?
The JVM memory has the following regions:

a. Heap region (i.e. Young Generation + Old Generation)

b. Metaspace

c. Other regions
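
If you want to see how these regions show up in a running JVM, the standard java.lang.management API can list the memory pools and whether each one counts as heap or non-heap. This is a minimal sketch; the class name MemoryRegions is made up, and the pool names printed depend on the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryRegions {
    // Returns one line per memory pool, tagged as "heap" or "non-heap".
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            lines.add(kind + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Output varies by JVM and GC algorithm, so no sample output is shown here.
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```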

To learn about these JVM memory regions, you may refer to this video clip. In earlier versions of Java, from 1 through 6, the string intern pool was stored in the Perm Generation. Starting from Java 7, the string intern pool is stored in the JVM’s heap memory. To confirm it, we conducted this simple experiment.

5. Is it better to use intern() or -XX:+UseStringDeduplication?
When you pass the ‘-XX:+UseStringDeduplication’ JVM argument during application startup, the JVM tries to eliminate duplicate strings as part of the garbage collection process. During garbage collection, the JVM inspects the objects in memory, identifies duplicate strings among them, and tries to eliminate them. However, the ‘-XX:+UseStringDeduplication’ JVM argument has certain limitations. Among others, it was originally supported only by the G1 GC algorithm (newer JDK releases extend it to other collectors), and it eliminates duplicates only among long-living string objects. To learn more about this argument you can refer to this post. Here is an interesting case study of a major application which tried to use the ‘-XX:+UseStringDeduplication’ JVM argument.

On the other hand, the intern() function can be used with any GC algorithm and on both short-lived and long-lived objects. However, the intern() function might impact application response time more than ‘-XX:+UseStringDeduplication’. For more details, refer to this blog post.
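
Before comparing the two approaches in your own environment, it can help to confirm whether string deduplication is actually switched on in the running JVM. The sketch below assumes a HotSpot-based JVM, because it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name DedupFlagCheck and the “unknown” fallback are choices made for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class DedupFlagCheck {
    // Returns "true" or "false" for the UseStringDeduplication flag,
    // or "unknown" if the running JVM does not expose that option.
    static String dedupSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            return bean.getVMOption("UseStringDeduplication").getValue();
        } catch (RuntimeException e) {
            // Thrown, for example, when the option does not exist on this JVM.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseStringDeduplication = " + dedupSetting());
    }
}
```

Running it once with and once without ‘-XX:+UseStringDeduplication’ on the command line should show the reported value change.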

6. What is the performance impact of intern() function?
Based on this post, you might have understood that invoking the intern() function on string objects has the potential to eliminate duplicate strings from memory, thus reducing overall memory utilization. However, it can take a toll on response time and CPU utilization. To understand the performance impact of using the intern() function, you may refer to this post.
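
For a rough first impression of that cost, you could time a loop that builds many strings with and without intern(). This is only a crude sketch, not a rigorous benchmark: the class name InternTiming, the loop size and the number of distinct values are arbitrary assumptions, and the numbers will vary from run to run and machine to machine. A benchmarking harness such as JMH would be needed for trustworthy measurements.

```java
public class InternTiming {
    // Builds n strings (only 100 distinct values) and returns the elapsed time in nanoseconds.
    static long timeNanos(int n, boolean useIntern) {
        String[] keep = new String[n]; // hold references so the strings stay reachable
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            String s = "value-" + (i % 100); // concatenation creates a new String at run time
            keep[i] = useIntern ? s.intern() : s;
        }
        long elapsed = System.nanoTime() - start;
        if (keep.length != n) {
            throw new IllegalStateException("unexpected array length");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up runs so the timed runs are less dominated by JIT compilation.
        timeNanos(n, false);
        timeNanos(n, true);
        System.out.println("without intern(): " + timeNanos(n, false) / 1_000_000 + " ms");
        System.out.println("with intern():    " + timeNanos(n, true) / 1_000_000 + " ms");
    }
}
```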

Video

https://youtu.be/HiL2634pZaA