Introduction
This article is about coding collection and list classes. It will walk you through the particulars of implementing IEnumerable<T>
, IEnumerator<T>
, ICollection<T>
, and IList<T>
over custom data structures.
The reason for this article is the topic isn't well covered and there are several pitfalls when working with these interfaces that aren't clearly documented by Microsoft. For that reason, we will be breaking down the development of such a class into manageable steps.
Background
List classes in .NET provide foreach
enumeration, indexed addressing, and general storage for an arbitrary, variable count of items. They work much like an array
in .NET except they are intrinsically resizable. In .NET, arrays
themselves implement IList<T>
.
The primary .NET class for using lists is List<T>
. It provides a full featured list over an internal array, which is resized as needed.
Often, this is efficient and appropriate, but in some cases, it may not be. Microsoft provides an alternative in its LinkedList<T>
class which stores values as a linked list instead of an array.
We'll be implementing our own linked list. While using Microsoft's highly optimized class is preferable, this class will serve us well as an illustration. In doing so, we'll cover all of the major components and pitfalls of implementing a list so that your real world lists can work well and efficiently in practice.
Coding this Mess
I've broken up the major interfaces into several partial classes in order to keep things clear.
Let's start with our data structure itself in LinkedList.cs.
public partial class LinkedList<T>
{
private class _Node
{
public T Value = default(T);
public _Node Next = null;
}
_Node _root=null;
}
By itself, this isn't a list at all! It implements none of the relevant interfaces, but it does contain a nested class which holds our node structure. This node is a single node in the linked list. It holds a value of type T
and the next node as fields. The outer class also contains a field that holds the first node in our list, or null
if the list is empty.
I like to start coding the list interfaces by implementing IEnumerable<T>
/IEnumerator<T>
which enables foreach
because it's such a fundamental operation, and it doesn't have any code dependencies (usually) except for the data structure itself. Let's visit LinkedList.Enumerable.cs:
partial class LinkedList<T> : IEnumerable<T>
{
int _version = 0;
public IEnumerator<T> GetEnumerator()
{
return new Enumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
=> GetEnumerator();
struct Enumerator : IEnumerator<T>
{
LinkedList<T> _outer;
int _version;
int _state;
_Node _current;
public Enumerator(LinkedList<T> outer)
{
_outer = outer;
_version = outer._version;
_current = null;
_state = -1;
}
public void Reset()
{
_CheckDisposed();
_current = null;
_state = -1;
}
void IDisposable.Dispose()
{
_state = -3;
}
public bool MoveNext()
{
_CheckDisposed();
switch (_state)
{
case -1:
_CheckVersion();
_current = _outer._root;
if (null == _current)
{
_state = -2;
return false;
}
_state = 0;
break;
case -2:
return false;
case 0:
_CheckVersion();
_current = _current.Next;
if (null == _current)
{
_state = -2;
return false;
}
break;
}
return true;
}
public T Current {
get {
switch (_state)
{
case -1:
throw new InvalidOperationException
("The enumeration is before the start position.");
case -2:
throw new InvalidOperationException
("The enumeration is past the end position.");
case -3:
_CheckDisposed();
break;
}
_CheckVersion();
return _current.Value;
}
}
object System.Collections.IEnumerator.Current => Current;
void _CheckDisposed()
{
if (-3 == _state)
throw new ObjectDisposedException(GetType().FullName);
}
void _CheckVersion()
{
if (_version != _outer._version)
throw new InvalidOperationException("The enumeration has changed.");
}
}
}
This is a lot to chew on, and you might be wondering why we don't simply use C# iterators via the yield
syntax, but there's a very good reason, and that reason is versioning and resetting.
To provide a proper implementation of IEnumerator<T>
over a collection, one must keep track of the collection to make sure it hasn't changed while we're enumerating items. This is done using the _version
fields. Each time the collection is changed, the version is incremented. This field is compared with the enumerator's copy to see if they are the same. If they are not, the enumerator will throw. C# iterators do not provide this functionality, but it is necessary for a complete, proper, and robust implementation of a list.
In addition, iterators do not support Reset()
. While this is not used by foreach
, it can be used by other consumers, and generally, it is expected to work.
If you really don't want to implement all of this, you can use C# iterator support but keep in mind, that's not a canonical nor complete implementation of a list's enumerator.
Note the use of the _state
field. Its primary purpose is for error handling, so we can distinguish between being at the beginning, the end, or disposed. Note also that we're always checking the version and checking for disposal on virtually every operation. This is important to make the enumerator robust.
As with all enumerator implementations, MoveNext()
is the heart of the enumerator and its job is to advance the cursor by one. It returns true
if there are more items to read, or false
to indicate that there are no more items. In this routine, we're simply iterating through the linked list, and updating the state as appropriate. Meanwhile, Current
, Reset()
, and Dispose()
are all straightforward.
Next, we have the ICollection<T>
which essentially provides a basic interface for storing and retrieving items in our class.
partial class LinkedList<T> : ICollection<T>
{
int _count=0;
public int Count {
get {
return _count;
}
}
public bool Contains(T value)
{
var current = _root;
while(null!=current)
{
if (Equals(value, current.Value))
return true;
current = current.Next;
}
return false;
}
public void CopyTo(T[] array,int index)
{
var i = _count;
if (null == array)
throw new ArgumentNullException(nameof(array));
if (1 != array.Rank || 0 != array.GetLowerBound(0))
throw new ArgumentException("The array is not an SZArray", nameof(array));
if (0 > index)
throw new ArgumentOutOfRangeException(nameof(index),
"The index cannot be less than zero.");
if (array.Length<=index)
throw new ArgumentOutOfRangeException(nameof(index),
"The index cannot be greater than the length of the array.");
if (i > array.Length + index)
throw new ArgumentException
("The array is not big enough to hold the collection entries.", nameof(array));
i = 0;
var current = _root;
while (null != current)
{
array[i + index] = current.Value;
++i;
current = current.Next;
}
}
bool ICollection<T>.IsReadOnly => false;
public void Add(T value)
{
_Node prev = null;
var current = _root;
while (null != current)
{
prev = current;
current = current.Next;
}
if (null == prev)
{
_root = new _Node();
_root.Value = value;
}
else
{
var n = new _Node();
n.Value = value;
prev.Next = n;
}
++_count;
++_version;
}
public void Clear()
{
_root = null;
_count = 0;
++_version;
}
public bool Remove(T value)
{
_Node prev = null;
var current = _root;
while (null != current)
{
if(Equals(value,current.Value))
{
if(null!=prev)
prev.Next = current.Next;
else
_root = current.Next;
--_count;
++_version;
return true;
}
prev = current;
current = current.Next;
}
return false;
}
}
Note the addition of the _count
field. Collection consumers will expect the Count
property to be very fast. However, in order to retrieve the count of items in a linked list, you'd normally have to traverse each item in the list in order to find it. That's not desirable for this scenario, so each time we add and remove items from the list, we update the _count
field accordingly. That way, we avoid unnecessary performance degradation. This is a very common pattern when implementing custom collections.
The rest of the members are self explanatory, either at face value, or through the comments above. Note the judicious use of error handling, and the careful bookkeeping of the _count
and _version
fields whenever the collection is updated. This is critical. The one thing to be aware of here is in the CopyTo()
method, the index refers to the index
in array
to start copying - not the index of the collection. That is, the index
is the destination index.
Now on to the implementation of IList<T>
in LinkedList.List.cs:
public T this[int index] {
get {
if (0 > index || index >= _count)
throw new IndexOutOfRangeException();
var current = _root;
for (var i = 0;i<index;++i)
current = current.Next;
return current.Value;
}
set {
if (0 > index || index >= _count)
throw new IndexOutOfRangeException();
var current = _root;
for (var i = 0; i < index; ++i)
current = current.Next;
current.Value=value;
++_version;
}
}
public int IndexOf(T value)
{
var i = 0;
var current = _root;
while (null!=current)
{
if (Equals(current.Value, value))
return i;
++i;
current = current.Next;
}
return -1;
}
public void Insert(int index,T value)
{
if (0 > index || index > _count)
throw new ArgumentOutOfRangeException(nameof(index));
if (0 == index)
{
_Node n = new _Node();
n.Next = _root;
n.Value = value;
_root = n;
}
else if (_count == index)
{
Add(value);
return;
}
else
{
var current = _root;
_Node prev = null;
for (var i = 0; i < index; ++i)
{
prev = current;
current = current.Next;
}
var n = new _Node();
n.Value = value;
n.Next = current;
prev.Next = n;
}
++_version;
++_count;
}
public void RemoveAt(int index)
{
if (0 > index || index >= _count)
throw new ArgumentOutOfRangeException(nameof(index));
var current = _root;
_Node prev = null;
if (1 == _count)
_root = null;
else
{
for (var i = 0; i < index; ++i)
{
prev = current;
current = current.Next;
}
if (null == prev)
_root = _root.Next;
else
prev.Next = current.Next;
}
--_count;
++_version;
}
}
Much of the code here is self explanatory. Just be careful to update _version
on the this[int index].set
property! Insert()
is only as complicated as it is because of how linked lists work. The only caveats with Insert()
are that index
refers to the index
after the new item and can go as high as _count
rather than _count-1
. So to insert should be conceptually thought of as "insert before index."
One thing to note about IList<T>
: It isn't always appropriate to implement. In fact, because of the way a linked list works, it doesn't really support direct indexing, unlike an array. Sure we've emulated it, but it's not really efficient because we have to iterate. Because of this, it's not necessarily good practice to implement this interface on top of a linked list. We've done so here simply for demonstration purposes. In the real world, you will choose the collection interfaces that best suit your data structure. Here, ICollection<T>
might have been appropriate without implementing IList<T>
.
Finally, onto our constructors which can be found in LinkedList.Constructors.cs:
partial class LinkedList<T>
{
public LinkedList(IEnumerable<T> collection)
{
if (null == collection)
throw new ArgumentNullException(nameof(collection));
int c = 0;
_Node current = null;
foreach(var item in collection)
{
if(null==current)
{
_root = new _Node();
_root.Value = item;
current = _root;
} else
{
var n = new _Node();
n.Value = item;
current.Next = n;
current = n;
}
++c;
}
_count = c;
}
public LinkedList() {
}
}
Here, we've provided a default constructor and a constructor that copies from an existing collection
or array
. These are really the minimum constructors you want to provide. Others might be appropriate depending on your internal data structure. For example, if it was array based, you might have a capacity
, while a B-tree might have an order
. In this case, the constructor that takes a collection is optimized, but you'll generally want to provide it even if all it does is call Add()
on itself in a loop.
And that's it! Now you have a robust list implementation that handles errors properly, is efficient, and handles the various pitfalls of implementing lists properly. That being said, do not use this linked list in production. Microsoft's implementation is probably more optimized, and certainly more tested. This however, should provide you a path forward for implementing your own data structures.
History
- 17th November, 2019 - Initial submission
Just a shiny lil monster. Casts spells in C++. Mostly harmless.