Article: Growing collections

Home Page


Consultancy

  • Service Vouchers
  • Escrow Service

Shop



Programming
  • Articles
  • Tools
  • Links

Search

 

Contact

 

PHPinfo


$_SERVER







.NET collection classes (including StringBuilder) grow by doubling the size of their internal array.

category 'experiment', language C#, created 15-Jul-2009, version V1.1 (20-Jul-2009), by Luc Pattyn


License: The author hereby grants you a worldwide, non-exclusive license to use and redistribute the files and the source code in the article in any way you see fit, provided you keep the copyright notice in place; when code modifications are applied, the notice must reflect that. The author retains copyright to the article, you may not republish or otherwise make available the article, in whole or in part, without the prior written consent of the author.

Disclaimer: This work is provided as is, without any express or implied warranties or conditions or guarantees. You, the user, assume all risk in its use. In no event will the author be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of this license or the use of the work or otherwise.


The .NET Framework offers lots of classes that collect something, offering the comfort that no initial or maximal size must be estimated beforehand. Examples of such classes are StringBuilder and List<Type>.

The implementation of these classes relies on an internal array, which results in good performance for a lot of operations, such as indexing, i.e. reading or writing an element at a specified position. Being based on arrays creates a capacity problem: what should be the size of such array? The .NET Framework has chosen a scheme where the initial size of the array can be selected by the user, and gets a small default value when unspecified (16 chars for a StringBuilder, 4 elements for a List). The capacity is available through the Capacity property, so it can be read and set at all times.

What interested me most is how those arrays grow when elements get added if and when doing so exceeds the current capacity. The experiment proves the array grows exponentially by doubling its size each time insufficient space is available.

Growing a collection

The experiment consists of a simple Console application, and does not need any clarification.

using System;
using System.Collections.Generic;
using System.Text;

namespace CapacityDoubles {
	class Program {
		static int MAXCAP=100*1024*1024;

		static void Main(string[] args) {
			testStringBuilder(0);
			testStringBuilder(7);
			testList(0);
			testList(7);
			log("Hit any key to terminate.");
			Console.ReadKey();
		}

		static void log(string s) {
			Console.WriteLine(s);
			File.AppendAllText("CapacityDoubles.log", s+Environment.NewLine);
		}

		static void testStringBuilder(int initialCapacity) {
			try {
				log("StringBuilder("+initialCapacity+")");
				string s64="0123456789012345678901234567890123456789012345678901234567890123";
				StringBuilder sb;
				if (initialCapacity==0) sb=new StringBuilder();
				else sb=new StringBuilder(initialCapacity);
				sb.Append("a");
				int oldCapacity=1;
				for (; ; ) {
					int capacity=sb.Capacity;
					if (capacity!=oldCapacity) {
						double gf=capacity/oldCapacity;
						log("capacity="+capacity+"  growthFactor="+gf);
						oldCapacity=capacity;
						if (capacity>MAXCAP) break;
					}
					int appendCount=capacity/16;
					if (appendCount<1) appendCount=1;
					string appendString="a";
					if (appendCount>=64) {
						appendString=s64;
						appendCount/=64;
					}
					for (int i=0; i<appendCount; i++) sb.Append(appendString);
				}
			} catch (Exception exc) {
				log(exc.ToString());
			}
			log("===========================================");
		}

		static void testList(int initialCapacity) {
			try {
				log("List<string>("+initialCapacity+")");
				string s1="0123456789";
				List<string> list;
				if (initialCapacity==0) list=new List<string>();
				else list=new List<string>(initialCapacity);
				list.Add(s1);
				int oldCapacity=1;
				for (; ; ) {
					int capacity=list.Capacity;
					if (capacity!=oldCapacity) {
						double gf=capacity/oldCapacity;
						log("capacity="+capacity+"  growthFactor="+gf);
						oldCapacity=capacity;
						if (capacity>MAXCAP) break;
					}
					int addCount=capacity/16;
					if (addCount<1) addCount=1;
					for (int i=0; i<addCount; i++) list.Add(s1);
				}
			} catch (Exception exc) {
				log(exc.ToString());
			}
			log("===========================================");
		}
	}
}

This is part of the output it generates:

...
StringBuilder(7)
capacity=7  growthFactor=7
capacity=14  growthFactor=2
capacity=28  growthFactor=2
capacity=56  growthFactor=2
capacity=112  growthFactor=2
capacity=224  growthFactor=2
capacity=448  growthFactor=2
capacity=896  growthFactor=2
capacity=1792  growthFactor=2
capacity=3584  growthFactor=2
capacity=7168  growthFactor=2
capacity=14336  growthFactor=2
...
List<string>(7)
capacity=7  growthFactor=7
capacity=14  growthFactor=2
capacity=28  growthFactor=2
capacity=56  growthFactor=2
capacity=112  growthFactor=2
capacity=224  growthFactor=2
capacity=448  growthFactor=2
capacity=896  growthFactor=2
capacity=1792  growthFactor=2
capacity=3584  growthFactor=2
capacity=7168  growthFactor=2
capacity=14336  growthFactor=2
...

Shrinking a collection

You can reduce the capacity of a collection by assigning a new and smaller value to its Capacity property; if the new value is less than the current Count the data would not fit and an ArgumentOutOfRangeException would be thrown.

For any collection the MSDN documentation on the Clear() method states "Removes all elements" and you can take that literally. Code inspection with Reflector shows the internal data structures are emptied but not discarded nor resized, so the arrays continue to occupy the memory they had before; on the positive side that means the collections can grow back to their earlier Capacity without reallocating and copying arrays.

The following method tests the above statements:

		static void testReduceClear() {
			List<string> list=new List<string>();
			for (int i=0; i<200; i++) list.Add("aha");
			log("before trimming: count="+list.Count+"  capacity="+list.Capacity);
			list.Capacity=list.Count;
			log("after trimming:  count="+list.Count+"  capacity="+list.Capacity);
			list.Clear();
			log("after clearing:  count=  "+list.Count+"  capacity="+list.Capacity);
		}

It generates the following output:

before trimming: count=200  capacity=256
after trimming:  count=200  capacity=200
after clearing:  count=  0  capacity=200

Conclusions

When there is a need to grow, the capacity gets doubled each time.

As a result, a lot of memory could be wasted: when a 1.1 million character StringBuilder is required, its actual footprint would suffice to hold 2 million characters. Furthermore, each growth causes the allocation of a new array (as .NET can't really grow an object) and a full copy of the existing data.

As a result there is a double recommendation when big collections are involved:

  1. Try to set a reasonable initial capacity; this will prevent a lot of doubling and copying;
  2. Consider trimming a collection to its actual size if the collection is large, stops being added to, and is intended to stay around for a long time.


Perceler

Copyright © 2012, Luc Pattyn

Last Modified 21-May-2025