.NET collection classes (including StringBuilder) grow by doubling the size of their internal array. category 'experiment', language C#, created 15-Jul-2009, version V1.1 (20-Jul-2009), by Luc Pattyn |
License: The author hereby grants you a worldwide, non-exclusive license to use and redistribute the files and the source code in the article in any way you see fit, provided you keep the copyright notice in place; when code modifications are applied, the notice must reflect that. The author retains copyright to the article, you may not republish or otherwise make available the article, in whole or in part, without the prior written consent of the author. Disclaimer: This work is provided |
The .NET Framework offers lots of classes that collect something, offering the comfort that no initial or maximal
size must be estimated beforehand.
Examples of such classes are StringBuilder
and List<Type>
.
The implementation of these classes relies on an internal array, which results in good performance for
a lot of operations, such as indexing, i.e. reading or writing an element at a specified position. Being based on
arrays creates a capacity problem: what should be the size of such array?
The .NET Framework has chosen a scheme where the initial size of the array can be selected by the user,
and gets a small default value when unspecified (16 chars for a StringBuilder, 4 elements for a List).
The capacity is available through the Capacity
property, so it can be read and set at all times.
What interested me most is how those arrays grow when elements get added if and when doing so exceeds the current capacity. The experiment proves the array grows exponentially by doubling its size each time insufficient space is available.
The experiment consists of a simple Console application, and does not need any clarification.
using System;
using System.Collections.Generic;
using System.Text;
namespace CapacityDoubles {
class Program {
static int MAXCAP=100*1024*1024;
static void Main(string[] args) {
testStringBuilder(0);
testStringBuilder(7);
testList(0);
testList(7);
log("Hit any key to terminate.");
Console.ReadKey();
}
static void log(string s) {
Console.WriteLine(s);
File.AppendAllText("CapacityDoubles.log", s+Environment.NewLine);
}
static void testStringBuilder(int initialCapacity) {
try {
log("StringBuilder("+initialCapacity+")");
string s64="0123456789012345678901234567890123456789012345678901234567890123";
StringBuilder sb;
if (initialCapacity==0) sb=new StringBuilder();
else sb=new StringBuilder(initialCapacity);
sb.Append("a");
int oldCapacity=1;
for (; ; ) {
int capacity=sb.Capacity;
if (capacity!=oldCapacity) {
double gf=capacity/oldCapacity;
log("capacity="+capacity+" growthFactor="+gf);
oldCapacity=capacity;
if (capacity>MAXCAP) break;
}
int appendCount=capacity/16;
if (appendCount<1) appendCount=1;
string appendString="a";
if (appendCount>=64) {
appendString=s64;
appendCount/=64;
}
for (int i=0; i<appendCount; i++) sb.Append(appendString);
}
} catch (Exception exc) {
log(exc.ToString());
}
log("===========================================");
}
static void testList(int initialCapacity) {
try {
log("List<string>("+initialCapacity+")");
string s1="0123456789";
List<string> list;
if (initialCapacity==0) list=new List<string>();
else list=new List<string>(initialCapacity);
list.Add(s1);
int oldCapacity=1;
for (; ; ) {
int capacity=list.Capacity;
if (capacity!=oldCapacity) {
double gf=capacity/oldCapacity;
log("capacity="+capacity+" growthFactor="+gf);
oldCapacity=capacity;
if (capacity>MAXCAP) break;
}
int addCount=capacity/16;
if (addCount<1) addCount=1;
for (int i=0; i<addCount; i++) list.Add(s1);
}
} catch (Exception exc) {
log(exc.ToString());
}
log("===========================================");
}
}
}
This is part of the output it generates:
...
StringBuilder(7)
capacity=7 growthFactor=7
capacity=14 growthFactor=2
capacity=28 growthFactor=2
capacity=56 growthFactor=2
capacity=112 growthFactor=2
capacity=224 growthFactor=2
capacity=448 growthFactor=2
capacity=896 growthFactor=2
capacity=1792 growthFactor=2
capacity=3584 growthFactor=2
capacity=7168 growthFactor=2
capacity=14336 growthFactor=2
...
List<string>(7)
capacity=7 growthFactor=7
capacity=14 growthFactor=2
capacity=28 growthFactor=2
capacity=56 growthFactor=2
capacity=112 growthFactor=2
capacity=224 growthFactor=2
capacity=448 growthFactor=2
capacity=896 growthFactor=2
capacity=1792 growthFactor=2
capacity=3584 growthFactor=2
capacity=7168 growthFactor=2
capacity=14336 growthFactor=2
...
You can reduce the capacity of a collection by assigning a new and smaller value
to its Capacity
property; if the new value is less than the current Count
the data would not fit and an ArgumentOutOfRangeException
would be thrown.
For any collection the MSDN documentation on the Clear()
method states "Removes all elements"
and you can take that literally. Code inspection with Reflector shows the internal data structures are emptied
but not discarded nor resized, so the arrays continue to occupy the memory they had before; on the positive side
that means the collections can grow back to their earlier Capacity
without reallocating and copying arrays.
The following method tests the above statements:
static void testReduceClear() {
List<string> list=new List<string>();
for (int i=0; i<200; i++) list.Add("aha");
log("before trimming: count="+list.Count+" capacity="+list.Capacity);
list.Capacity=list.Count;
log("after trimming: count="+list.Count+" capacity="+list.Capacity);
list.Clear();
log("after clearing: count= "+list.Count+" capacity="+list.Capacity);
}
It generates the following output:
before trimming: count=200 capacity=256
after trimming: count=200 capacity=200
after clearing: count= 0 capacity=200
When there is a need to grow, the capacity gets doubled each time.
As a result, a lot of memory could be wasted: when a 1.1 million character StringBuilder is required, its actual footprint would suffice to hold 2 million characters. Furthermore, each growth causes the allocation of a new array (as .NET can't really grow an object) and a full copy of the existing data.
As a result there is a double recommendation when big collections are involved:
Perceler |
Copyright © 2012, Luc Pattyn |
Last Modified 20-Dec-2024 |