Going Random on an Enumerable
I have not published any programming articles recently and I know that a throng of readers eagerly await them so I decided to share an interesting task that was brought up by a guy on IRC. You have an enumerable sequence and you are required to write a method that selects a random element from the sequence with just one iteration over the sequence. This means that you are not allowed to call methods like Count because it will iterate through the sequence. You are not allowed to save the elements in another collection. Of course an even distribution of the results is required (i.e. every element should have equal chance to be selected).
Here is my solution. It is in C# but it is essentially the same in every language.
public static T SelectRandom<T>(IEnumerable<T> sequence)
{
//NOTE! If you are using this code in a real world application
//you need to pull the Random initialization outside this method
//and use the same Random object in every method call
//otherwise you risk to get the same seed
//if the method is called several times in a short time
Random r = new Random();
int count = 1;
T selected = default(T);
foreach (T element in sequence)
{
if (r.Next(count++) == 0)
{
//Select the current element with 1/count probability
selected = element;
}
}
return selected;
}
Now lets try to prove it. Essentially we are looking for a way to select each element with probability 1/M where M is the total number of elements in the sequence. Using my method the probability for the element N to be selected is 1/N. However this probability is reduced by the probability of selecting the subsequent elements. For element 1 the probability is 1/1 if the sequence has one element but if it has two elements the probability is 1/1*1/2 = 1/2. For a sequence of three elements the probability is (1*1/2)*2/3 = 1/3... you get the pattern. Every element reduces the chance for the previous elements to be selected by introducing 1/N chance to replace them. The Nth element has probability of being selected equal to
(1/N) * (N/(N+1)) * ((N+1)/(N+2)) * ... * ((M-1)/M)
The initial 1/N in this formula is the probability that the element is selected at all represented by the code if (r.Next(count++) == 0) then there is the probability that the next element is selected which is 1/(N+1) so we are left N/(N+1) probability that our element stays and so on until we reach the end of the sequence. It turns out that this probability is 1/M for every N and that is exactly what we're looking for.