Virtual collections hold elements without storing them in memory; the yield retun keywords allow them to be produced one by one and just-in-time. category 'KB', language C#, created 12-Dec-2010, version V1.0, by Luc Pattyn |
License: The author hereby grants you a worldwide, non-exclusive license to use and redistribute the files and the source code in the article in any way you see fit, provided you keep the copyright notice in place; when code modifications are applied, the notice must reflect that. The author retains copyright to the article, you may not republish or otherwise make available the article, in whole or in part, without the prior written consent of the author. Disclaimer: This work is provided “as is”, without any express or implied warranties or conditions or guarantees. You, the user, assume all risk in its use. In no event will the author be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of this license or the use of the work or otherwise. |
Virtual collections can be enumerated using the yield return
keywords.
When a number of data items need to be produced and consumed, things may be simple, as in:
for(int i=0; i<10; i++) {
int square=i*i;
log("square="+square);
}
where log(string s)
is a simple logging method, it could be Console.WriteLine(s)
.
In this example, production (the loop and the multiplication) and consumption (the log statement) are intertwined,
and the data items aren't stored in memory at all, they are generated just-in-time.
One often wants to separate production and consumption of such data; the standard approach would be using
a buffer, a generic List
might come in handy, so it could look like this:
// producer
private static IEnumerable<int> GetSquareEnumerator0(int start, int length) {
List<int> list=new List<int>();
for (int i=start; i<start+length; i++) list.Add(i*i);
return list;
}
// consumer
foreach (int square0 in GetSquareEnumerator0(0, 10)) {
log("square0="+square0);
}
The separation of responsibilities is clear; the cost is the buffer that is holding the data. For large values of length, this may become prohibitive.
The combined yield return
keywords support the construction of virtual enumerators, i.e. enumerators
that enumerate the elements of a collection that does not exist as data in memory, but merely consists of the code
required to generate the data, one item at a time. The remainder of this article holds three examples.
The equivalent code of the above is this:
// producer
private static IEnumerable<int> GetSquareEnumerator1(int start, int length) {
for (int i=start; i<start+length; i++) yield return i*i;
}
// consumer
foreach (int square1 in GetSquareEnumerator1(0, 10)) {
log("square1="+square1);
}
We see the consumer hasn't really changed. The producer however is fundamentally different:
there is no list, instead the loop returns one value on each iteration, and it magically continues
where it has left of before. It behaves as if a separate processor (or another thread) is executing the
producer code and passing data items to the consumer one by one, as if yield return
would be
a Queue.Enqueue()
operation.
Note: when the producer is to produce elements of type T (int in the example), it's yield return
statements do return type T, however its declaration states a return type of IEnumerable<T>
.
Things get a bit tricky when the producer needs recursion and the data items are produced at different nesting levels in the recursion tree. The solution consists of passing the items explicitly from one nesting level to the next.
Example 2 searches the ways a number can be split into its divisors. For example, one of the ways to split 12
would be 3*2*2
. Here is the output we hope to get:
number=12
=2*2*3
=2*3*2
=2*6
=3*2*2
=3*4
=4*3
=6*2
=12
number=14
=2*7
=7*2
=14
number=16
=2*2*2*2
=2*2*4
=2*4*2
=2*8
=4*2*2
=4*4
=8*2
=16
The consumer is simple as always:
for (int number=12; number<=16; number+=2) {
log("number="+number);
foreach (string divisors in GetDivisors(number, "")) {
log(" ="+divisors.Substring(1));
}
}
The producer is recursive as the number of divisors is variable, it depends on earlier divisors. Here is the code:
public static IEnumerable<string> GetDivisors(int number, string more) {
if (number==1) {
yield return more;
} else {
for (int div=2; div<=number; div++) {
if (number%div==0) {
int quotient=number/div;
foreach (string div2 in GetDivisors(quotient, more+"*"+div)) {
yield return div2;
}
}
}
}
}
Note 1: Each nesting level of the recursive method passes its results to a foreach in the calling level, which "yield returns" the elements one by one. So there would be as many nested virtual enumerators as the recursion is deep.
Note 2: The algorithm is kept as simple as possible for illustrative purposes only, it isn't optimized at all.
Quite often one wants to operate on all TextBoxes, all Labels, or any other kind of Controls on a Form, no matter how exactly they are organized hierarchically in Containers such as TabPages, Panels, or GroupBoxes. Here is a simple way of enumerating all TextBoxes:
// producer
public static IEnumerable<T> GetNestedControls<T>(Control control) where T : Control {
foreach (Control c in control.Controls) {
T t=c as T;
if (t!=null) yield return c;
if (c.Controls.Count > 0) {
foreach (T t2 in GetNestedControls<T>(c)) {
yield return t2;
}
}
}
}
// consumer
Form myForm=new Form();
myForm.Controls.Add(...);
...
foreach (TextBox tb in GetNestedControls<TextBox>(myForm)) {
log(tb.Name);
}
It works fine; recursive methods create a virtual enumeration and need a foreach-ed yield return
at each nesting level.
Perceler |
Copyright © 2012, Luc Pattyn |
Last Modified 02-Sep-2013 |