Things you Should do with Strings While Your Coworkers are on Holiday and No One is Checking the Production Code Branch

Hi everyone! This is part of the really cool new C# Advent Calendar run by Matthew Groves! Go check out all the really great articles by everyone!

Fairly often, interviewers ask the question, “What is a string?” and they are looking for a quick answer similar to, “It is an immutable reference type.” This normally sparks follow-up questions such as, “Explain what immutable means in this scenario,” or “So are there any examples where you can change a string?” The most common answer is, “No,” and with good reason. Adding two strings together creates a new third string. Calling a method like ToUpper() doesn’t modify the string being operated on; it creates a new one. And although a string can be read like an array of characters, the compiler prevents modifying the characters at specific positions.
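To make those three points concrete, here is a minimal sketch. The class and method names are my own and are not taken from the post's repository; each method returns true when the behavior described above holds.

```csharp
using System;

public static class StringBasics
{
    // Concatenation builds a third string; neither operand changes.
    public static bool ConcatLeavesOperandsAlone()
    {
        string bah = "Bah ";
        string humbug = "Humbug";
        string both = bah + humbug;
        return bah == "Bah " && humbug == "Humbug" && both == "Bah Humbug";
    }

    // ToUpper() hands back a different instance and leaves the original as it was.
    public static bool ToUpperReturnsNewInstance()
    {
        string original = "Bah Humbug";
        string upper = original.ToUpper();
        return original == "Bah Humbug"
            && upper == "BAH HUMBUG"
            && !object.ReferenceEquals(original, upper);
    }

    // Reading a character by index is allowed. Writing is not:
    // original[0] = 'X'; fails to compile because the string indexer is read-only.
    public static char FirstCharacter(string original)
    {
        return original[0];
    }
}
```

Calling `StringBasics.ToUpperReturnsNewInstance()` from a test or a console app is enough to see that the original string survives untouched.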

Technically, the more correct answer is, “It depends.” Under most circumstances, changing a string is not possible by design, and rightfully so: several guarantees around efficiency and predictability rely on this fundamental idea. That design, however, doesn’t account for the “allow unsafe code” compiler option. Using it is in a sense cheating, as it goes against established ideas of how most .NET applications work, but with it, it is possible to mutate a string using the fixed statement, and exploring that exposes some interesting behaviors of the .NET runtime.

To elucidate this, I created an assembly project and a unit test project to show various scenarios using the fixed statement and what happens. In these examples, the unit tests don’t actually test for validity. They merely bootstrap the test methods and print the results.

So what is happening with this code? The first necessity is to understand what the fixed statement does. According to the C# Language Reference:

The fixed statement sets a pointer to a managed variable and “pins” that variable during the execution of the statement. Without fixed, pointers to movable managed variables would be of little use since garbage collection could relocate the variables unpredictably. The C# compiler only lets you assign a pointer to a managed variable in a fixed statement.

With the fixed statement, it is possible to change a string in place, which breaks its concept of immutability. The unit test:

  • prints the public readonly string “Bah Humbug!!!!!”
  • runs the method which alters that string
  • prints the same string which is now “Happy Holidays!”
  • shows the output of the unit test.
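For readers who want to try the in-place mutation, here is a hedged sketch of the technique. It is not the code from the post's repository: the class and method names are hypothetical, and it deliberately mutates a string built at run time rather than the readonly literal, so it stays clear of the intern pool. It assumes the project is compiled with unsafe code allowed, and it relies on “Happy Holidays!” having the same length as “Bah Humbug!!!!!”.

```csharp
using System;

public static class StringMutationSketch
{
    // Overwrites the characters of target with those of replacement, in place.
    // The lengths must match because a string's length is set at allocation.
    public static unsafe void OverwriteInPlace(string target, string replacement)
    {
        if (target.Length != replacement.Length)
            throw new ArgumentException("Both strings must have the same length.");

        // fixed pins the string so the garbage collector cannot move it
        // while we hold a raw pointer to its first character.
        fixed (char* p = target)
        {
            for (int i = 0; i < replacement.Length; i++)
                p[i] = replacement[i];
        }
    }

    // Builds the greeting from a char array so it is a fresh heap instance,
    // not an interned literal, then mutates that instance.
    public static string Demo()
    {
        string greeting = new string("Bah Humbug!!!!!".ToCharArray());
        OverwriteInPlace(greeting, "Happy Holidays!");
        return greeting; // same reference as before, different contents
    }
}
```

Writing through the pointer is outside what the runtime promises to support, so treat this strictly as an experiment and not something to ship.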

    Now what happens when we modify a local string whose value is exactly the same as the class level string?

    At first glance, the local string (localSeasonsGreetings) should be modified, and the class level string (SeasonsGreetings) should be unchanged.

    In this example, the unit test runs the method which prints out the values of the local string and the class level string, and then the unit test prints out the value of the class level string.

    [Image: copy of string results]

    The local string is modified, and the class level string is also changed. Why did this happen? The answer lies in String Interning. When a literal string is loaded by the program, it is checked against the intern pool (a table which holds a single instance of each literal string, plus any strings that have been added programmatically). If the literal already exists within that table, a reference to the string in the table is returned instead of creating a new instance. Since the two string literals in the example are the same (Bah Humbug!!!!!), the runtime creates one instance and hands both variables a reference to it; hence, when the string is modified through one variable, the other sees the change.
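    The shared instance can be observed without any unsafe code. The following is a small sketch, assuming both literals live in the same assembly; the class and method names are mine, while SeasonsGreetings and localSeasonsGreetings mirror the names used in the post.

    ```csharp
    using System;

    public static class InterningSketch
    {
        public static readonly string SeasonsGreetings = "Bah Humbug!!!!!";

        // Two identical literals in one assembly resolve to the same object,
        // so a reference comparison (not just a value comparison) succeeds.
        public static bool LiteralsShareOneInstance()
        {
            string localSeasonsGreetings = "Bah Humbug!!!!!";
            return object.ReferenceEquals(SeasonsGreetings, localSeasonsGreetings);
        }
    }
    ```

    Because both names point at one object, any in-place change made through one of them is visible through the other.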

    So what happens if we piece together the string at runtime from two constants?

    Notice in the example code above, the localSeasonsGreetings literal is changed to:

    [Image: mutating the pieced-together copy of the string shows different results]

    Since the local variable instance of Bah Humbug!!!!! was created when the method was run (and is not a literal), the CLR created a new instance of this string. When this local instance was modified, the class level variable instance was not, unlike in the previous example.
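    Here is a hedged sketch of that difference, again without unsafe code. The class and method names are hypothetical. Note one assumption: the exclamations part is deliberately not a compile-time constant, because the C# compiler folds a concatenation of two constants into a single literal, which would then be interned like any other.

    ```csharp
    using System;

    public static class RuntimeStringSketch
    {
        public static readonly string SeasonsGreetings = "Bah Humbug!!!!!";

        // A string concatenated at run time has the same value as the literal
        // but lives in its own instance, so mutating one cannot touch the other.
        public static bool RuntimeBuiltStringIsSeparateInstance()
        {
            string exclamations = new string('!', 5); // built at run time, not a constant
            string localSeasonsGreetings = "Bah Humbug" + exclamations;

            return localSeasonsGreetings == SeasonsGreetings
                && !object.ReferenceEquals(localSeasonsGreetings, SeasonsGreetings);
        }
    }
    ```

    Equal values with unequal references is exactly the situation in which the class level string stays intact.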

    What happens when the same string value is in different assemblies?

    [Image: mutating a local copy of the Bah Humbug value in the test assembly; it is modified]

    Based on the previous examples, it works how you would expect it to. Since String Interning is handled by the CLR at run time rather than at compile time, it doesn’t matter which assembly the string is located in. All literals loaded into memory are added to the same pool, so modifying the value in one assembly affects every other reference to that literal in the entire application.

    Up until this point, we’ve only seen the effects of String Interning on instances of a string. What happens if we return a literal from a static method? To test this, I added a method to return “Bah Humbug!!!!!” to the ImmutableStringsExample.

    [Image: static method, single call: no change]

    The static method was called after the modification method ran, and the value it returned was unchanged. We could assume that, because the static method first ran after we modified the interned “Bah Humbug!!!!!” string, the runtime couldn’t find a matching entry and created a new instance. Now the question is, “Is this method deterministic?” Will this method always return a new instance of “Bah Humbug!!!!!”?

    [Image: static method called twice: change]

    Clearly the answer is no. The point at which the application first calls the static method determines its behavior. Now what happens with a non-static method? Does the same method behave the same way on different objects?

    [Image: second instantiated object: different]

    Non-static methods work the same as static ones in this regard. Once the method has run, the CLR will return a reference to the same object on every later call, so any updates made to that object show up there too.

    With the above examples, we see that Strings in .NET are really a lot more complicated than they initially let on. The runtime handles a lot of complicated optimizations, and there is a lot of work that goes on behind the scenes to ensure that efficiency. With those efficiencies come certain restrictions, such as immutability, but in the whole scope, those small restrictions can be managed and used to benefit the application.

    The code for this post can be found on GitHub

    10 thoughts on “Things you Should do with Strings While Your Coworkers are on Holiday and No One is Checking the Production Code Branch”

    1. One interesting thing that wasn’t mentioned here: in the case when you build the string at runtime, you still have the opportunity to intern it manually, which could be useful in situations where you dynamically build strings that could be duplicates and need to strongly conserve memory (at the cost of performance – not recommended as a general solution to your memory issues):

      string localSeasonsGreetings = "Bah Humbug" + exclamations;
      localSeasonsGreetings = String.Intern(localSeasonsGreetings); // Plug this line in and localSeasonsGreetings will now be the same reference as the class instance.

      1. Thanks for catching that! I suppose this means, never come up with a title when you’re sleep deprived 🙂

          1. Just to be clear, I know. I figure I had to make a joke about my stupid mistake, and that was clearly the one I had to make :).
