It is always good to be careful when using random number generators: things might not be as random as they seem. I came across an inconsistency when using parfor loops in Matlab. I wanted to generate random permutations of indices so that I could pick out a random training set and test set from the data, and I also wanted to use Matlab's parallelization tools, that is, a parfor loop. Here is what the code looked like:
```matlab
% Loads variable X
[N_samples,N_dims] = size(X);
N_trials = 1000;
N_training = 100;
N_test = 50;
parfor i_trial = 1:N_trials
    % Seed the random number generator based on the current time so that it outputs different sequences each time.
    rng('shuffle');
    % Returns a row vector containing a random permutation of the integers from 1 to N_samples.
    randIdx = randperm(N_samples);
    % Assign training and test indices
    trIdx = randIdx(1:N_training);
    teIdx = randIdx(N_training+1:N_training+N_test);
    % Do calculations
end
```
With this code, I was getting curious results. The output seemed to be the same across trials, even though each trial was supposedly given a different random part of the data for training and testing. When I replaced the parfor loop with an ordinary for loop, the results looked much more reasonable.
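A quick way to check for this symptom is to keep every permutation produced inside the loop and count how many are distinct. The sketch below assumes hypothetical sizes rather than the real data. One possible cause worth testing this way, which I have not confirmed for this setup: rng('shuffle') seeds from the current time, so workers that reseed at almost the same moment could end up with identical seeds and therefore identical permutations.

```matlab
% Diagnostic sketch: record each permutation generated inside parfor
% and count the distinct ones. Sizes are hypothetical placeholders.
N_samples = 500;
N_trials  = 1000;

allIdx = zeros(N_trials, N_samples);      % one permutation per row (sliced output)
parfor i_trial = 1:N_trials
    rng('shuffle');                       % reseed from the clock every iteration, as in the loop above
    allIdx(i_trial, :) = randperm(N_samples);
end

% unique(...,'rows') keeps one copy of each distinct permutation
nDistinct = size(unique(allIdx, 'rows'), 1);
fprintf('%d of %d permutations are distinct\n', nDistinct, N_trials);
```

If nDistinct comes out noticeably smaller than N_trials, some trials are reusing the same training/test split.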
I have a 4-core laptop, and when I looked at the randomly generated indices, it turned out they were the same in groups of four. I am not sure my assessment is right, but it seems as if the random number generator works globally: in a parfor loop it only reshuffles once all the workers have finished the iteration they were assigned. With four workers, there are four identical runs, then a reshuffle, and so on. My solution was to generate the permutations beforehand in an ordinary for loop, store them in a matrix, and then pick them out inside the parfor loop.
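The workaround can be sketched roughly as follows. X is random placeholder data and the per-trial calculation (a difference of means) is a hypothetical stand-in for the real one; only the structure, generating all permutations serially and then indexing them inside parfor, follows the approach described.

```matlab
% Sketch: precompute all permutations serially, then only read them in parfor.
X = rand(500, 10);                 % hypothetical placeholder data: 500 samples, 10 dims
N_samples  = size(X, 1);
N_trials   = 1000;
N_training = 100;
N_test     = 50;

rng('shuffle');                    % seed once, on the client, before the parallel loop
allIdx = zeros(N_trials, N_samples);
for i_trial = 1:N_trials
    allIdx(i_trial, :) = randperm(N_samples);   % each trial gets its own permutation
end

results = zeros(N_trials, 1);      % hypothetical per-trial output
parfor i_trial = 1:N_trials
    randIdx = allIdx(i_trial, :);  % sliced input: a worker only needs its own rows
    trIdx = randIdx(1:N_training);
    teIdx = randIdx(N_training+1:N_training+N_test);
    % Placeholder calculation, standing in for the real training/testing step
    results(i_trial) = mean(X(teIdx, 1)) - mean(X(trIdx, 1));
end
```

Because the permutations are fixed before the parallel loop starts, the number of workers no longer influences which split a trial receives. If the matrix gets large, randperm(N_samples, N_training+N_test) returns only the indices each trial actually uses.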
Lesson learned: when working with random number generators, check that they actually produce different numbers where you expect them to, and when combining them with parallel computing, make sure the parallelism does not affect the results of any computation.
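A different approach, not the one I used, is MATLAB's RandStream with substreams, which as far as I know is what the MathWorks documentation on repeating random numbers in parfor-loops suggests. The sketch below gives each trial its own substream of an mrg32k3a generator, so trial i_trial should get the same permutation regardless of which worker runs it or how many workers there are. The sizes and the seed 42 are arbitrary assumptions.

```matlab
% Sketch: one substream per trial, so results do not depend on worker scheduling.
N_samples  = 500;      % hypothetical data size
N_trials   = 1000;
N_training = 100;
N_test     = 50;

stream = RandStream('mrg32k3a', 'Seed', 42);   % mrg32k3a supports substreams; seed is arbitrary
trainIdx = zeros(N_trials, N_training);
testIdx  = zeros(N_trials, N_test);
parfor i_trial = 1:N_trials
    % Jump to the start of substream i_trial (set(...) form as in the MathWorks parfor example)
    set(stream, 'Substream', i_trial);
    randIdx = randperm(stream, N_samples);     % draw from this stream, not the global one
    trainIdx(i_trial, :) = randIdx(1:N_training);
    testIdx(i_trial, :)  = randIdx(N_training+1:N_training+N_test);
end
```

Since every trial resets to its own substream, rerunning the script (with or without a pool) should reproduce the same splits, which also makes it easy to compare parfor and for-loop results directly.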