Sunday, 27 July 2014

Stuff about String class

In this post we will discuss some important concepts of the String class in Java.

Memory leak issue


 Have you tried creating substrings from a String object? Do you know the internals of substring in Java, and how they can create memory leaks?



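Substrings in Java are created using the method substring(int beginIndex) and its overloaded form substring(int beginIndex, int endIndex). In older JDKs (Java 6 and earlier), these methods create a new String object that shares the original char array and only sets its own offset and count fields. From Java 7 update 6 onward, substring copies the required characters instead, so the issue described here applies to the older implementation.

The original value[] is unchanged. Thus if you create a string with 10000 chars and create 100 substrings with 5-10 chars each, all 101 objects share the same 10000-char array. The sharing becomes a leak when the original string is no longer needed but even one small substring is still referenced, because the whole array stays reachable.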
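To put rough numbers on this, here is a minimal sketch of the arithmetic. It counts only character data (a Java char is 2 bytes) and ignores object headers; the class and method names are hypothetical and exist only for illustration.

```java
class SharedArrayEstimate {
    static final int BYTES_PER_CHAR = 2; // a Java char is 16 bits

    // Character data kept reachable when a substring shares the parent's array.
    static long retainedWhenShared(int parentLength) {
        return (long) parentLength * BYTES_PER_CHAR;
    }

    // Character data needed when the substring owns a trimmed copy.
    static long retainedWhenCopied(int substringLength) {
        return (long) substringLength * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // One 10-char substring of a 10000-char string, parent discarded.
        System.out.println(retainedWhenShared(10000)); // 20000
        System.out.println(retainedWhenCopied(10));    // 20
    }
}
```

So a single surviving 10-char substring can keep about 20000 bytes alive where 20 would do.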

Let's see this using a program:


import java.lang.reflect.Field;
import java.util.Arrays;

public class SubStringTest {
    public static void main(String[] args) throws Exception
    {
        //Our main String
        String mainString = "i_love_java";
        //Substring holds value 'java'
        String subString = mainString.substring(7);

        System.out.println(mainString);
        System.out.println(subString);

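        //Let's see what's inside mainString. This reads the private 'value' field
        //and relies on the Java 6 layout, where 'value' is a shared char[].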
        Field innerCharArray = String.class.getDeclaredField("value");
        innerCharArray.setAccessible(true);
        char[] chars = (char[]) innerCharArray.get(mainString);
        System.out.println(Arrays.toString(chars));

        //Now peek inside subString
        chars = (char[]) innerCharArray.get(subString);
        System.out.println(Arrays.toString(chars));
    }
}

Output:

i_love_java
java
[i, _, l, o, v, e, _, j, a, v, a]
[i, _, l, o, v, e, _, j, a, v, a]


Clearly, both objects hold the same char array, while subString needs only four characters.

Let's solve this issue using our own code:


import java.lang.reflect.Field;
import java.util.Arrays;

public class SubStringTest
{
    public static void main(String[] args) throws Exception
    {
        //Our main String
        String mainString = "i_love_java";
        //Substring holds value 'java'
        String subString = fancySubstring(7, mainString);

        System.out.println(mainString);
        System.out.println(subString);

        //Let's see what's inside mainString
        Field innerCharArray = String.class.getDeclaredField("value");
        innerCharArray.setAccessible(true);
        char[] chars = (char[]) innerCharArray.get(mainString);
        System.out.println(Arrays.toString(chars));

        //Now peek inside subString
        chars = (char[]) innerCharArray.get(subString);
        System.out.println(Arrays.toString(chars));
    }

    //Our new method prevents memory leakage
    public static String fancySubstring(int beginIndex, String original)
    {
        return new String(original.substring(beginIndex));
    }
}

Output:

i_love_java
java
[i, _, l, o, v, e, _, j, a, v, a]
[j, a, v, a]

Now the substring holds only the characters it needs, and the intermediate string used to create it can be garbage collected, leaving no extra memory footprint.



Why are strings immutable?


We all know that strings in Java are immutable. If you want to know what immutability is and how it is achieved, follow this post: How to make a java class immutable?

Here the question is WHY? Why immutable? Let's analyze.

1) The very first reason I can think of is performance. The Java language was developed to speed up application development, which was not that fast in previous languages. The JVM designers must have been smart enough to recognize that real-world applications consist mostly of strings, in the form of labels, messages, configuration, output and numerous other uses.

Seeing such heavy use, they imagined how costly improper use of strings could be. So they came up with the concept of the String pool (next section). The String pool is nothing but a collection of unique strings. The basic idea behind the String pool is to reuse a string once it is created. This way, if a particular string literal appears 20 times in code, the application ends up having only one instance. Sharing one instance like this is only safe because strings are immutable: no holder of the reference can change the value that the others see.
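The reuse can be observed with a small sketch that counts instances by identity rather than by equals(). The class, the label() helper and the "Submit" text are hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

class StringPoolDemo {
    // Every call evaluates the same string literal.
    static String label() {
        return "Submit";
    }

    // Counts distinct String objects (by reference) produced by n calls.
    static int distinct(int n, Supplier<String> source) {
        Set<String> seen =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < n; i++) {
            seen.add(source.get());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // The literal is served from the pool: one instance for 20 uses.
        System.out.println(distinct(20, StringPoolDemo::label));        // 1
        // new String(...) always allocates, so 20 separate objects.
        System.out.println(distinct(20, () -> new String("Submit")));   // 20
    }
}
```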

2) The second reason I see is security. Strings are the most used parameter type in every aspect of Java programming. Be it loading a driver or opening a URL connection, you need to pass the information as a parameter in the form of a string. If strings were not immutable, they would have opened up a Pandora's box of security issues.
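One way to picture the security point is a value that changes after it has been checked. The sketch below is only an illustration: isAllowed() and the file names are hypothetical, and StringBuilder stands in for what a mutable string type would permit.

```java
class ImmutabilityDemo {
    // Hypothetical check: accepts only names under "public/".
    static boolean isAllowed(CharSequence name) {
        return name.toString().startsWith("public/");
    }

    public static void main(String[] args) {
        // A mutable value can be altered after it passed the check.
        StringBuilder mutable = new StringBuilder("public/readme.txt");
        boolean checked = isAllowed(mutable);
        mutable.replace(0, mutable.length(), "secret/keys.txt");
        System.out.println(checked + " -> " + mutable); // true -> secret/keys.txt

        // A String cannot be altered; replace() returns a new object.
        String fixed = "public/readme.txt";
        String changed = fixed.replace("public", "secret");
        System.out.println(fixed);   // public/readme.txt
        System.out.println(changed); // secret/readme.txt
    }
}
```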


Method ‘intern’ usage


This is best described by the Java docs:

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

String str = new String("abc");

//intern() returns the pooled instance; str itself is not changed
String pooled = str.intern();

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true. This means that if s and t are different String objects with the same character sequence, calling intern() on both returns the same pooled string instance.
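A minimal sketch of that rule follows. The class and method names are hypothetical.

```java
class InternDemo {
    // True when both strings resolve to the same pooled instance.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String s = new String("abc");
        String t = new String("abc");

        System.out.println(s == t);                // false: two heap objects
        System.out.println(s.equals(t));           // true: same characters
        System.out.println(sameAfterIntern(s, t)); // true: one pooled instance
        System.out.println(s.intern() == "abc");   // true: literals live in the pool
    }
}
```

Note that the value returned by intern() has to be used; calling it does not change the object it is called on.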



String comparison


There are generally two ways to compare objects:

    Using == operator
    Using equals() method

The == operator compares object references, i.e. memory address equality. So if two string variables refer to the same literal in the string pool or to the same String object in the heap, then s == t returns true, else false.

The equals() method is overridden in the String class and compares the char sequences held by the String objects. If they store the same char sequence, then s.equals(t) returns true, else false.
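The difference shows up as soon as a string is built at runtime instead of written as a literal. A small sketch, with hypothetical class and method names:

```java
class CompareDemo {
    static boolean sameReference(String s, String t) {
        return s == t;
    }

    static boolean sameContent(String s, String t) {
        return s.equals(t);
    }

    public static void main(String[] args) {
        String literal = "java";
        String other = "java";
        String prefix = "ja";
        String built = prefix + "va"; // concatenated at runtime, not pooled

        System.out.println(sameReference(literal, other)); // true: same pooled literal
        System.out.println(sameReference(literal, built)); // false: different objects
        System.out.println(sameContent(literal, built));   // true: same characters
    }
}
```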


Matching Regular expressions


A not-so-secret but useful feature, if you have not explored it yet. You must have seen Pattern and Matcher used for regular expression matching. The String class provides its own shortcut, matches(String regex), which you can use directly. Internally this method calls Pattern.matches(). Note that it returns true only if the entire string matches the expression.


String str = new String("abc");

str.matches("<regex>");
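As a sketch of how the shortcut behaves next to Pattern.matches(), with hypothetical class and method names and sample expressions picked for illustration:

```java
import java.util.regex.Pattern;

class MatchesDemo {
    static boolean viaString(String input, String regex) {
        return input.matches(regex);
    }

    static boolean viaPattern(String input, String regex) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(viaString("abc", "[a-c]+"));  // true
        System.out.println(viaPattern("abc", "[a-c]+")); // true
        // The whole string must match, not just a part of it.
        System.out.println(viaString("abc", "b"));       // false
        System.out.println(viaString("abc", ".*b.*"));   // true
    }
}
```

If the same expression is applied many times, compiling it once with Pattern.compile() and reusing the Pattern avoids recompiling it on every call.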

