Permutations quickly

Written by ettolrach, 2025-08-17.

Recently, I was writing a Haskell program which used the permutations function. I wanted to convert my code to Rust for fun, but I realised that Rust's standard library didn't come with such a funciton. That makes sense, since it's not commonly used for Rust programs, especially since a non-lazy version only works for small lists.

Still, I wanted to write it myself. So I decided to implement it using the classic algorithm which you may be familiar with from interview prep. We loop through each element in the list and permute the rest. Then, for each permutation, we prepend the element we kept a hold of. This is what you'll probably do if you're asked to write out the permutations of a list on paper.

fn permutations_recurse<X: Clone>(xs: impl IntoIterator<Item = X>) -> Option<Vec<Vec<X>>> {
    fn go<X: Clone>(v: Vec<X>) -> Vec<Vec<X>> {
        if v.len() == 0 {
            return Vec::new();
        }
        if v.len() == 1 {
            return vec![v];
        }
        let mut to_return = Vec::new();
        for i in 0..v.len() {
            let mut remaining = v.clone();
            _ = remaining.remove(i);
            let perms = go(remaining);
            // Append the element we held on to to the end.
            // It doesn't matter whether we pre- or append the element, the resulting
            // vector will just have a different order.
            to_return.extend(perms.into_iter().map(|mut perm| {
                perm.push(v[i].clone());
                perm
            }));
        }
        to_return
    }

    let v: Vec<X> = xs.into_iter().collect();
    if v.len() > 12 {
        None
    } else {
        Some(go(v))
    }
}

But this isn't the only method. We can use the factorial number system. It's a number base (or radix) where each place value n represents n!, similar to how in binary, each place value n represents 2ⁿ. We call this representation the factoradic of a number, and we can calculate it using the algorithm below. The resulting vector is in reverse order when read left-to-right, the 0th element is the 0th place value.

fn factoradic(n: u64) -> Vec<u64> {
    let mut to_return = Vec::new();
    let mut place_value = 1;
    let mut dividend = n;
    while dividend != 0 {
        to_return.push(dividend % place_value);
        dividend /= place_value;
        place_value += 1;
    }
    to_return
}

// For efficiency, have a version where we pre-allocate the vector.
fn padded_factoradic(n: u64, v: &mut [u64]) {
    let mut place_value: usize = 1;
    let mut dividend = n;
    while dividend != 0 {
        // Assume we aren't running on a 128-bit machine.
        v[place_value - 1] = dividend % (place_value as u64);
        dividend /= place_value as u64;
        place_value += 1;
    }
}

We can get the n^th permutation by calculating the factoradic of n, and for each place value k of n (starting from the smallest), we remove the k^th element from the original list and append it to the output list. This method has the benefit of being able to be written as an iterator.

fn factorial(n: usize) -> Option<usize> {
    // Hardcode elements up to and including 12, return `None` if n >= 13.
}

struct Permutations<X: Clone> {
    v: Vec<X>,
    current: usize,
    largest_fac: usize,
}

impl<X: Clone> Permutations<X> {
    fn new(v: Vec<X>) -> Option<Self> {
        let largest_fac = factorial(v.len())?;
        Some(Self {
            v,
            current: 0,
            largest_fac,
        })
    }
}

impl<X: Clone> Iterator for Permutations<X> {
    type Item = Vec<X>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current >= self.largest_fac {
            return None;
        }
        let mut n_factoriadic = vec![0; self.v.len()];
        let mut elems = self.v.clone();
        let mut to_return = Vec::with_capacity(self.v.len());
        padded_factoradic(self.current as u64, &mut n_factoriadic);
        for index in n_factoriadic.into_iter().rev() {
            to_return.push(elems.remove(index as usize));
        }
        self.current += 1;
        Some(to_return)
    }
}

fn permutations<X: Clone>(
    xs: impl IntoIterator<Item = X>,
) -> Option<impl Iterator<Item = Vec<X>>> {
    Permutations::new(xs.into_iter().collect())
}

At first glance, I would think that the recursive solution is faster, since we don't have to make any divisions or modulo operations. But whenever you talk about performance, you should really test it. After all, naively, one would say that they both are in the set O(l!) where l is the length of the list. So, using Criterion.rs, I measured a small list of length 8, and another of length 10 (remember, 8! = 40320, but 10! = 3628800). Each time, the list was the same, a Vec<char> of the letters ['A', 'B', 'C' ...].

Performance of aforementioned algorithms after 100 iterations
Algorithm	Mean (ms)	Median (ms)	σ (ms)
permute_recurse 8	5.6038	5.6039	0.0092405
permute 8	3.1287	3.1276	0.011452
permute_recurse 10	678.73	678.33	3.0358
permute 10	016.992	016.988	0.000000035559

I was wrong! The recursive solution isn't faster after all.

Another thought would be to try multithreading. Although starting a new thread for every factoradic would probably not help due to the overhead in creating threads, what if we make two threads and assign one thread half of the factoradics and the other the remaining half?

use std::{sync::mpsc, thread};

fn permutations_multithreaded<X: Clone + Send + 'static>(xs: impl IntoIterator<Item = X>) -> Option<Vec<Vec<X>>> {
    let v: Vec<X> = xs.into_iter().collect();
    let len = v.len();
    let fac_len = factorial(len)?;
    let (tx1, rx) = mpsc::channel();
    let tx2 = tx1.clone();

    {
        let elems = v.clone();
        thread::spawn(move || {
            let mut to_return: Vec<Vec<X>> = Vec::with_capacity(fac_len / 2);
            for n in 0..(fac_len / 2) {
                let mut this_elems = elems.clone();
                let mut n_factoradic = vec![0; len];
                padded_factoradic(n as u64, &mut n_factoradic);
                to_return.push(n_factoradic.into_iter().rev().map(|index| this_elems.remove(index as usize)).collect());
            }
            tx1.send(to_return).unwrap();
        });

    }
    
    {
        let elems = v.clone();
        thread::spawn(move || {
            let mut to_return: Vec<Vec<X>> = Vec::with_capacity(fac_len / 2);
            for n in (fac_len / 2)..fac_len {
                let mut this_elems = elems.clone();
                let mut n_factoradic = vec![0; len];
                padded_factoradic(n as u64, &mut n_factoradic);
                to_return.push(n_factoradic.into_iter().rev().map(|index| this_elems.remove(index as usize)).collect());
            }
            tx2.send(to_return).unwrap();
        });
    }

    Some(rx.iter().flatten().collect())
}

Performance of aforementioned algorithms after 100 iterations
Algorithm	Mean (ms)	Median (ms)	σ (ms)
permute_recurse 8	5.6038	5.6039	0.0092405
permute 8	3.1287	3.1276	0.011452
permute_multithreaded 8	1.9952	1.9930	0.014901
permute_recurse 10	678.73	678.33	3.0358
permute 10	016.992	016.988	0.000000035559
permute_multithreaded 10	217.57	214.12	9.2134

Once again, I was surprised to find that it was slower! I presume that the thread overhead is still too large, even for a very big list.

But what I was most surprised at was how fast this factoradic method is. I'll definitely remember it should I ever need to calculate permutations in the future.

As an aside, how does Haskell do it? Well, the code itself is quite cryptic at first glance.

permutations :: [a] -> [[a]]
permutations xs0 = xs0 : perms xs0 []
  where
    -- | @perms ts is@ is equivalent to
    --
    -- > concat
    -- >   [ interleave {(ts!!n)} {(drop (n+1) ts)} xs []
    -- >   | n <- [0..length ts - 1]
    -- >   , xs <- permutations (reverse (take n ts) ++ is)
    -- >   ]
    --
    -- @{(ts!!n)}@ and @{(drop (n+1) ts)}@ denote the values of variables @t@ and @ts@
    -- when they appear free in the definition of @interleave@ and @interleave'@.
    perms :: forall a. [a] -> [a] -> [[a]]
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where
        -- @interleave {t} {ts} xs r@ is equivalent to
        --
        -- > [ insertAt n t xs ++ ts | n <- [0..length xs - 1] ] ++ r
        --
        -- where
        --
        -- > insertAt n y xs = take n xs ++ y : drop n xs
        --
        interleave :: [a] -> [[a]] -> [[a]]
        interleave xs r = let (_,zs) = interleave' id xs r in zs

        -- @interleave' {t} {ts} f ys r@ is equivalent to
        --
        -- > ( ys ++ ts
        -- > , [ f (insertAt n t ys ++ ts) | n <- [0..length ys - 1] ] ++ r
        -- > )
        --
        interleave' :: ([a] -> b) -> [a] -> [b] -> ([a], [b])
        interleave' _ []     r = (ts, r)
        interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                 in  (y:us, f (t:y:us) : zs)

But as the full source code points out, there is an excellent walkthrough of the function in a Stack Overflow answer given by Twan van Laarhoven. The goal is to maximise laziness as in a language like Haskell, this function is used a lot more than it is in Rust. I highly recommend reading the Stack Overflow answer, it's a really clever algorithm that deserves your time.

Comments

Please be respectful. Any comment which I find objectionable will be removed. By submitting a comment, you agree to the privacy policy.

No comments yet.