Article: String interning experiment

Home Page


Consultancy

  • Service Vouchers
  • Escrow Service

Shop



Programming
  • Articles
  • Tools
  • Links

Search

 

Contact

 

PHPinfo


$_SERVER







String interning can save storage memory and increase string comparison performance.

category 'experiment', language C#, created 03-Sep-2009, version V1.0, by Luc Pattyn


License: The author hereby grants you a worldwide, non-exclusive license to use and redistribute the files and the source code in the article in any way you see fit, provided you keep the copyright notice in place; when code modifications are applied, the notice must reflect that. The author retains copyright to the article, you may not republish or otherwise make available the article, in whole or in part, without the prior written consent of the author.

Disclaimer: This work is provided as is, without any express or implied warranties or conditions or guarantees. You, the user, assume all risk in its use. In no event will the author be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of this license or the use of the work or otherwise.


In .NET strings get interned sometimes in order to save memory space and/or CPU cycles. I decided to investigate this a bit.

Some theory

The concept of string interning is wide spread; it is documented everywhere, including here. The basic idea is to avoid storing duplicate strings, saving memory for storing the strings, and maybe also cycles when string comparison starts with comparing the references.

I did not really know what is triggering the interning; is it applied all the time (which would have its cost) or under some circumstances only, or under user control? The answers as far as I gathered so far seem to be:

  • string literals are interned by the compiler;
  • run-time generated strings are not interned by default;
  • strings can be interned explicitly by calling the static method string.Intern(string).

Some code peeking with Reflector also showed that a string comparison such as if (string1==string2)... gets compiled exactly the same as if (string.Equals(string1, string2)) ... so they both rely on the methods provided by the string class. And both String.Compare() and String.Equals() start by comparing the string reference: equal references are bound to result in equal values.

A little experiment

I decided to create a little program that generated, listed and compared a lot of strings, while measuring both the memory footprint of those strings and the execution time. The memory footprint is obtained by comparing Process.PeakWorkingSet64 both before and after the operation. Warning: this number can never decrease (hence the "peak" in its name), so each run of the program can only generate one reliable result. The elapsed time is measured using a Stopwatch instance.

The program

The environment used is the Microsoft .NET Framework (version 2.0 or above) and the C# programming language. The code is pretty simple:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace StringInterning {
	class Program {
		static void Main(string[] args) {
			int differences=1000;
			int repeats=1000;
			int action=0;
			if (args.Length==1) int.TryParse(args[0].ToLower(), out action);
			switch (action) {
				case 1:
					testStrings(differences*repeats, 1, false);
					break;
				case 2:
					testStrings(differences, repeats, false);
					break;
				case 3:
					testStrings(differences*repeats, 1, true);
					break;
				case 4:
					testStrings(differences, repeats, true);
					break;
				case 5:
					// a secret way to test all four at once (with inaccurate WS result though)
					testStrings(differences, repeats, true);
					testStrings(differences*repeats, 1, false);
					testStrings(differences, repeats, false);
					testStrings(differences*repeats, 1, true);
					break;
				default:
					log("StringInterning needs one numeric argument in the range [1,4] to select the test case");
					break;
			}
			Console.Write("Hit ENTER to terminate...");
			Console.ReadLine();
		}

		private static void log(string s) {
			Console.WriteLine(s);
		}

		private static void testStrings(int differences, int repeats, bool interning) {
			GC.Collect();
			Process process=Process.GetCurrentProcess();
			process.Refresh();
			long WS=process.PeakWorkingSet64;
			string title="Generating "+differences.ToString("#,###")+" different strings";
			if (repeats<=1) title+=" once"; else title+=" "+repeats.ToString("#,###")+" times";
			title+=interning?" with interning":" without interning";
			log(title);
			List<string> list=new List<string>(differences*repeats);
			Stopwatch stopwatch=new Stopwatch();
			stopwatch.Start();
			for (int repeat=0; repeat<repeats; repeat++) {
				for (int count=0; count<differences; count++) {
					string s=string.Format("text line {0}", count);
					if (interning) s=string.Intern(s);
					list.Add(s);
				}
			}
			stopwatch.Stop();
			GC.Collect();
			process.Refresh();
			WS=(process.PeakWorkingSet64-WS)/1024/1024;
			log("took "+stopwatch.Elapsed.TotalMilliseconds.ToString("#,###")+" msec and required "+WS+" MB");
			stopwatch.Reset();
			stopwatch.Start();
			long matches=0;
			int listCount=list.Count;
			for (int i1=0; i1<listCount; i1+=100) {
				string s1=list[i1];
				for (int i2=0; i2<listCount; i2+=100) {
					string s2=list[i2];
					if (s1==s2) matches++;
				}
			}
			stopwatch.Stop();
			log("Comparing 1% of them took "+stopwatch.Elapsed.TotalMilliseconds.ToString("#,###")+
				" msec, and found "+matches.ToString("#,###")+" matches");
		}
	}
}

And the results were (always generating 1 million strings and comparing 1% of them):

action  differences repeats generationTime  storage comparisonTime    matches
  1          1,000    1000      1.3 sec      68 MB      12 sec          10,000
  2      1,000,000       1      1.3 sec      68 MB      12 sec      10,000,000
  3          1,000    1000      2.4 sec      96 MB      12 sec          10,000
  4      1,000,000       1      1.0 sec       5 MB       2 sec      10,000,000

Conclusion

String interning works: it makes sense to call string.Intern() explicitly provided there will be enough duplicates to amortize the effort; the storage can go down and so can the comparison time, provided enough matches are present in the set.



Perceler

Copyright © 2012, Luc Pattyn

Last Modified 21-May-2025