Going Random on an Enumerable

   I have not published any programming articles recently and I know that a throng of readers eagerly await them so I decided to share an interesting task that was brought up by a guy on IRC. You have an enumerable sequence and you are required to write a method that selects a random element from the sequence with just one iteration over the sequence. This means that you are not allowed to call methods like Count because it will iterate through the sequence. You are not allowed to save the elements in another collection. Of course an even distribution of the results is required (i.e. every element should have equal chance to be selected).

   Here is my solution. It is in C# but it is essentially the same in every language.

       public static T SelectRandom<T>(IEnumerable<T> sequence)
       {
           //NOTE! If you are using this code in a real world application
           //you need to pull the Random initialization outside this method
           //and use the same Random object in every method call
           //otherwise you risk to get the same seed
           //if the method is called several times in a short time
           Random r = new Random();
           int count = 1;
           T selected = default(T);

           foreach (T element in sequence)
           {
               if (r.Next(count++) == 0)
               {
                   //Select the current element with 1/count probability
                   selected = element;
               }
           }

           return selected;
       }

   Now lets try to prove it. Essentially we are looking for a way to select each element with probability 1/M where M is the total number of elements in the sequence. Using my method the probability for the element N to be selected is 1/N. However this probability is reduced by the probability of selecting the subsequent elements. For element 1 the probability is 1/1 if the sequence has one element but if it has two elements the probability is 1/1*1/2 = 1/2. For a sequence of three elements the probability is (1*1/2)*2/3 = 1/3... you get the pattern. Every element reduces the chance for the previous elements to be selected by introducing 1/N chance to replace them. The Nth element has probability of being selected equal to

   (1/N) * (N/(N+1)) * ((N+1)/(N+2)) * ... * ((M-1)/M)

The initial 1/N in this formula is the probability that the element is selected at all represented by the code if (r.Next(count++) == 0) then there is the probability that the next element is selected which is 1/(N+1) so we are left N/(N+1) probability that our element stays and so on until we reach the end of the sequence. It turns out that this probability is 1/M for every N and that is exactly what we're looking for.
Tags:   english programming 
Posted by:   Stilgar
00:37 26.01.2009

Comments:

First Previous 1 Next Last 

Posted by   ___XXX_X_XXX___   on   20:25 27.01.2009

Ех значи бе, все си ги измисляш едни...

На ти и едно от мен:

public static T SelectRandom<T>(this IEnumerable<T> instance, Random rnd)
{
   // Look mom, there is a magic trick...
   Array whatTheFuck = (Array)instance;
   return (T)whatTheFuck.GetValue(rnd.Next(whatTheFuck.Length));
}


Тук даже няма да има нужда да се обхождат всички елементи (ако изобщо GetValue обхожда елементите - мързи ме да гледам в Reflector-а). Лошото е, че не знам дали идеята ще може да се приложи в ситуация с deferred loading при LINQ to SQL или подобно... :)

Posted by   Stilgar   on   20:33 27.01.2009

E tva protivorechi na uslovieto za tva che ne moje da go puhash v druga kolekcia

Predstavi si che toia enumerable e 50tina MB source code ili niakakuv potencialno bezkraen i baven network stream koito kato svurshi triabva da se kaje chak kuv e rezultata.

Obache tva che ima definiran explicit cast ot IEnumerable kum Array ne go znaeh. Iako!

Posted by   ___XXX_X_XXX___   on   20:42 27.01.2009

Array whatTheFuck = (Array)instance;

-НЕ- създава нова колекция. :P

Просто whatTheFuck сочи към instance обаче имаш методите, които взимат дължината на array-а от паметта, защото е Array:

IEnumerable<Class1> c1 = new[] { new Class1 { ID = 1 }, new Class1 { ID = 2 }, new Class1 { ID = 3 }, new Class1 { ID = 4 }, new Class1 { ID = 555 } };
Array c2 = (Array)c1;

И на c1 и на c2 пойнтерите в момента сочат към 0x0135fad8, а ето и съдържанието там:
0x0135FAD8  bc 40 30 79 05 00 00 00 f0 9d 94 00 fc fa 35 01 08 fb 35 01 14 fb 35 01 20 fb 35 01 2c fb 35

Дължината на sequence е във втория int = "05 00 00 00". Array.Length само чете от паметта (не минава през items както Count()).

Posted by   Stilgar   on   20:50 27.01.2009

e da ama ako imash DataReader ili FileStream naish kva kolekcia shte vidish...

Posted by   Stilgar   on   00:58 29.01.2009

Abe ti toia tvoia kod probval li si go izobshto?
takuv explicit cast ne e definiran dori ot List<int>

           List<int> list = new List<int>();
           IEnumerable<int> enumerable = (IEnumerable<int>)list;
           Array arr = (Array)enumerable;

si pliushti InvalidCastException sus zverska sila

Posted by   ___XXX_X_XXX___   on   01:43 29.01.2009

Хахахах :)
Ми пробвал съм го с:

IEnumerable<int> aaa = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Random rnd = new Random();
int suckMyDick = aaa.SelectRandom(rnd);


Пфффф :)

Posted by   Stilgar   on   01:48 29.01.2009

mnogo smisul ima da go probvash s masiv mdam:)

Posted by   Aleks (Unregistered)   on   23:43 03.08.2017

Това е частен случей на един по стандартен въпрос който е да намериш random subset в IEnumerable/Stream/LinkedList

First Previous 1 Next Last 


Post as:



Post a comment: