Bits and pieces of Linux RNG

August 21st, 2019

Before doing any sort of surgery on Linux RNG, it makes sense to get acquainted with its source, and this is what we will be doing right now. Most of the source is available in two files: include/linux/random.h and drivers/char/random.c. First, let's have a look at the header file to get an overview of the RNG interface exported to the rest of the kernel:

/*
 * include/linux/random.h
 *
 * Include file for the random number generator.
 */
#ifndef _LINUX_RANDOM_H
#define _LINUX_RANDOM_H

#include <linux/list.h>
#include <linux/once.h>

#include <uapi/linux/random.h>

struct random_ready_callback {
	struct list_head list;
	void (*func)(struct random_ready_callback *rdy);
	struct module *owner;
};

Other drivers and kernel subsystems may want to use random numbers, but the RNG[1] may not be initialized yet. Such a subsystem can register a callback that fires once the RNG is ready. The only user is the DRBG (Deterministic Random Bits Generator) in crypto/drbg.c, which is in turn used by some files in the crypto/ directory.
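
A minimal sketch of how a consumer might use this (hypothetical module code, loosely modeled on what crypto/drbg.c does; my_seed_when_ready is a made-up name):

static void my_seed_when_ready(struct random_ready_callback *rdy)
{
	/* invoked once the CRNG is seeded; safe to pull key material now */
}

static struct random_ready_callback my_rdy = {
	.func  = my_seed_when_ready,
	.owner = THIS_MODULE,
};

static int my_init(void)
{
	int err = add_random_ready_callback(&my_rdy);

	if (err == -EALREADY)		/* RNG was already seeded */
		my_seed_when_ready(&my_rdy);
	else if (err)
		return err;
	return 0;
}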

extern void add_device_randomness(const void *, unsigned int);

#if defined(CONFIG_GCC_PLUGIN_LATENT_ENTROPY) && !defined(__CHECKER__)
static inline void add_latent_entropy(void)
{
	add_device_randomness( (const void *)&latent_entropy,
			      sizeof(latent_entropy));
}
#else
static inline void add_latent_entropy(void) {}
#endif

From time to time, Linux mixes "entropy" from devices into the randomness pool, but that is not what add_device_randomness does. Instead, add_device_randomness mixes device-identifying information into the randomness pool during boot, so that computers with e.g. different MAC addresses start with different initial pool states.

Latent entropy is a poor man's entropy source (and at early boot the kernel is poor in entropy). Because some services may run, and even generate a cryptographic key or two, before a sufficient amount of entropy has accumulated, the LATENT_ENTROPY GCC plugin was developed. It (1) initializes marked variables with random data at compile time, and (2) instruments marked functions with hash-like computations whose result is later added to the entropy pool. The functions and variables to instrument are marked in the kernel with the __latent_entropy attribute. add_latent_entropy must be called to actually transfer the computed hash to the entropy pool. If the plugin is disabled, this function is empty and optimized away.
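
For illustration, a sketch of how the attribute is applied (hypothetical snippet; with the plugin enabled, __latent_entropy expands to __attribute__((latent_entropy))):

/* a marked variable is initialized with random data at compile time */
static u32 boot_noise[4] __latent_entropy;

/* a marked function gets hash-like computations instrumented into its
 * control flow; the accumulated per-CPU value later reaches the pool */
static void __latent_entropy early_worker(void)
{
	add_latent_entropy();	/* transfer the accumulated hash */
}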

extern void add_input_randomness(unsigned int type, unsigned int code,
				 unsigned int value) __latent_entropy;
extern void add_interrupt_randomness(int irq, int irq_flags) __latent_entropy;

extern void get_random_bytes(void *buf, int nbytes);
extern int add_random_ready_callback(struct random_ready_callback *rdy);
extern void del_random_ready_callback(struct random_ready_callback *rdy);
extern void get_random_bytes_arch(void *buf, int nbytes);

Here we can see functions for adding entropy into the entropy pool from input devices, and from interrupts.

get_random_bytes fills the buffer with nbytes random bytes; get_random_bytes_arch does the same, but using the architecture-specific RNG (e.g. RDRAND on x86).
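
Typical in-kernel usage is as simple as it gets (a sketch; session_key is a made-up name):

u8 session_key[32];

get_random_bytes(session_key, sizeof(session_key));
/* note: this succeeds even before the RNG is fully seeded,
 * which is why the ready-callback machinery above exists */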

{add,del}_random_ready_callback are clearly service functions for registering and removing the aforementioned callbacks.

#ifndef MODULE
extern const struct file_operations random_fops, urandom_fops;
#endif

Apparently, in case the RNG is not built as a module, the structures with the /dev/random and /dev/urandom device callbacks are exposed to the rest of the kernel. I don't know yet if this is strictly required.

unsigned int get_random_int(void);
unsigned long get_random_long(void);
unsigned long randomize_page(unsigned long start, unsigned long range);

These seem to be convenience wrappers around get_random_bytes[2].

u32 prandom_u32(void);
void prandom_bytes(void *buf, size_t nbytes);
void prandom_seed(u32 seed);
void prandom_reseed_late(void);

struct rnd_state {
	__u32 s1, s2, s3, s4;
};

u32 prandom_u32_state(struct rnd_state *state);
void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);

#define prandom_init_once(pcpu_state)			\
	DO_ONCE(prandom_seed_full_state, (pcpu_state))
/**
 * prandom_u32_max - returns a pseudo-random number in interval [0, ep_ro)
 * @ep_ro: right open interval endpoint
 *
 * Returns a pseudo-random number that is in interval [0, ep_ro). Note
 * that the result depends on PRNG being well distributed in [0, ~0U]
 * u32 space. Here we use maximally equidistributed combined Tausworthe
 * generator, that is, prandom_u32(). This is useful when requesting a
 * random index of an array containing ep_ro elements, for example.
 *
 * Returns: pseudo-random number in interval [0, ep_ro)
 */
static inline u32 prandom_u32_max(u32 ep_ro)
{
	return (u32)( ( (u64) prandom_u32() * ep_ro) >> 32);
}

/*
 * Handle minimum values for seeds
 */
static inline u32 __seed(u32 x, u32 m)
{
	return (x < m) ? x + m : x;
}

/**
 * prandom_seed_state - set seed for prandom_u32_state().
 * @state: pointer to state structure to receive the seed.
 * @seed: arbitrary 64-bit value to use as a seed.
 */
static inline void prandom_seed_state(struct rnd_state *state, u64 seed)
{
	u32 i = (seed >> 32) ^ (seed << 10) ^ seed;

	state->s1 = __seed(i,   2U);
	state->s2 = __seed(i,   8U);
	state->s3 = __seed(i,  16U);
	state->s4 = __seed(i, 128U);
}

Apparently, there is a no-pretense simple PRNG in the kernel. It is used by:

  • Test infrastructure;
  • Some network drivers use it to pick random timeouts/RSS hash keys of NICs;
  • Pseudorandom packet distribution over bonded interfaces;
  • TCP uses it in several places (congestion control, etc.).

i.e. all the use cases where fast and cheap low-quality randomness is required.
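
A sketch of the typical pattern: picking a random array index with prandom_u32_max, which is cheaper than prandom_u32() % n since it needs no division (pick_slave and n are made-up names):

/* pick a (pseudo)random element of a table with n entries */
static struct list_head *pick_slave(struct list_head **slaves, u32 n)
{
	return slaves[prandom_u32_max(n)];	/* index in [0, n) */
}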

#ifdef CONFIG_ARCH_RANDOM
# include <asm/archrandom.h>
#else
static inline bool arch_get_random_long(unsigned long *v)
{
	return 0;
}
static inline bool arch_get_random_int(unsigned int *v)
{
	return 0;
}
static inline bool arch_has_random(void)
{
	return 0;
}
static inline bool arch_get_random_seed_long(unsigned long *v)
{
	return 0;
}
static inline bool arch_get_random_seed_int(unsigned int *v)
{
	return 0;
}
static inline bool arch_has_random_seed(void)
{
	return 0;
}
#endif

In case the platform does not expose any built-in RNG, the arch_*_random_* functions are stubbed out.

/* Pseudo random number generator from numerical recipes. */
static inline u32 next_pseudo_random32(u32 seed)
{
	return seed * 1664525 + 1013904223;
}

#endif /* _LINUX_RANDOM_H */

Another bit of fast PRNG code: a one-line linear congruential generator with constants from Numerical Recipes.
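
For the record, one step of this LCG (all arithmetic mod 2^32):

u32 x = 1;

x = next_pseudo_random32(x);	/* 1 * 1664525 + 1013904223 = 1015568748 */
x = next_pseudo_random32(x);	/* and so on; the low bits of such LCGs
				   are notoriously weak */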


Now, random.c. It begins with the following comment:

This routine gathers environmental noise from device drivers, etc.,
and returns good random numbers, suitable for cryptographic use.
Besides the obvious cryptographic uses, these numbers are also good
for seeding TCP sequence numbers, and other places where it is
desirable to have numbers which are not only random, but hard to
predict by an attacker.

Theory of operation
===================

Computers are very predictable devices. Hence it is extremely hard
to produce truly random numbers on a computer --- as opposed to
pseudo-random numbers, which can easily generated by using a
algorithm. Unfortunately, it is very easy for attackers to guess
the sequence of pseudo-random number generators, and for some
applications this is not acceptable. So instead, we must try to
gather "environmental noise" from the computer's environment, which
must be hard for outside attackers to observe, and use that to
generate random numbers. In a Unix environment, this is best done
from inside the kernel.

Sources of randomness from the environment include inter-keyboard
timings, inter-interrupt timings from some interrupts, and other
events which are both (a) non-deterministic and (b) hard for an
outside observer to measure. Randomness from these sources are
added to an "entropy pool", which is mixed using a CRC-like function.
This is not cryptographically strong, but it is adequate assuming
the randomness is not chosen maliciously, and it is fast enough that
the overhead of doing it on every interrupt is very reasonable.
As random bytes are mixed into the entropy pool, the routines keep
an *estimate* of how many bits of randomness have been stored into
the random number generator's internal state.

When random bytes are desired, they are obtained by taking the SHA
hash of the contents of the "entropy pool". The SHA hash avoids
exposing the internal state of the entropy pool. It is believed to
be computationally infeasible to derive any useful information
about the input of SHA from its output. Even if it is possible to
analyze SHA in some clever way, as long as the amount of data
returned from the generator is less than the inherent entropy in
the pool, the output data is totally unpredictable. For this
reason, the routine decreases its internal estimate of how many
bits of "true randomness" are contained in the entropy pool as it
outputs random numbers.

If this estimate goes to zero, the routine can still generate
random numbers; however, an attacker may (at least in theory) be
able to infer the future output of the generator from prior
outputs. This requires successful cryptanalysis of SHA, which is
not believed to be feasible, but there is a remote possibility.
Nonetheless, these numbers should be useful for the vast majority
of purposes.

Exported interfaces ---- output
===============================

There are three exported interfaces; the first is one designed to
be used from within the kernel:

	void get_random_bytes(void *buf, int nbytes);

This interface will return the requested number of random bytes,
and place it in the requested buffer.

The two other interfaces are two character devices /dev/random and
/dev/urandom. /dev/random is suitable for use when very high
quality randomness is desired (for example, for key generation or
one-time pads), as it will only return a maximum of the number of
bits of randomness (as estimated by the random number generator)
contained in the entropy pool.

The /dev/urandom device does not have this limit, and will return
as many bytes as are requested. As more and more random bytes are
requested without giving time for the entropy pool to recharge,
this will result in random numbers that are merely cryptographically
strong. For many applications, however, this is acceptable.

Exported interfaces ---- input
==============================

The current exported interfaces for gathering environmental noise
from the devices are:

	void add_device_randomness(const void *buf, unsigned int size);
	void add_input_randomness(unsigned int type, unsigned int code,
                               unsigned int value);
	void add_interrupt_randomness(int irq, int irq_flags);
	void add_disk_randomness(struct gendisk *disk);

add_device_randomness() is for adding data to the random pool that
is likely to differ between two devices (or possibly even per boot).
This would be things like MAC addresses or serial numbers, or the
read-out of the RTC. This does *not* add any actual entropy to the
pool, but it initializes the pool to different values for devices
that might otherwise be identical and have very little entropy
available to them (particularly common in the embedded world).

add_input_randomness() uses the input layer interrupt timing, as well as
the event type information from the hardware.

add_interrupt_randomness() uses the interrupt timing as random
inputs to the entropy pool. Using the cycle counters and the irq source
as inputs, it feeds the randomness roughly once a second.

add_disk_randomness() uses what amounts to the seek time of block
layer request events, on a per-disk_devt basis, as input to the
entropy pool. Note that high-speed solid state drives with very low
seek times do not make for good sources of entropy, as their seek
times are usually fairly consistent.

All of these routines try to estimate how many bits of randomness a
particular randomness source. They do this by keeping track of the
first and second order deltas of the event timings.

Ensuring unpredictability at system startup
============================================

When any operating system starts up, it will go through a sequence
of actions that are fairly predictable by an adversary, especially
if the start-up does not involve interaction with a human operator.
This reduces the actual number of bits of unpredictability in the
entropy pool below the value in entropy_count. In order to
counteract this effect, it helps to carry information in the
entropy pool across shut-downs and start-ups. To do this, put the
following lines an appropriate script which is run during the boot
sequence:

	echo "Initializing random number generator..."
	random_seed=/var/run/random-seed
	# Carry a random seed from start-up to start-up
	# Load and then save the whole entropy pool
	if [ -f $random_seed ]; then
		cat $random_seed >/dev/urandom
	else
		touch $random_seed
	fi
	chmod 600 $random_seed
	dd if=/dev/urandom of=$random_seed count=1 bs=512

and the following lines in an appropriate script which is run as
the system is shutdown:

	# Carry a random seed from shut-down to start-up
	# Save the whole entropy pool
	echo "Saving random seed..."
	random_seed=/var/run/random-seed
	touch $random_seed
	chmod 600 $random_seed
	dd if=/dev/urandom of=$random_seed count=1 bs=512

For example, on most modern systems using the System V init
scripts, such code fragments would be found in
/etc/rc.d/init.d/random. On older Linux systems, the correct script
location might be in /etc/rcb.d/rc.local or /etc/rc.d/rc.0.

Effectively, these commands cause the contents of the entropy pool
to be saved at shut-down time and reloaded into the entropy pool at
start-up. (The 'dd' in the addition to the bootup script is to
make sure that /etc/random-seed is different for every start-up,
even if the system crashes without executing rc.0.) Even with
complete knowledge of the start-up activities, predicting the state
of the entropy pool requires knowledge of the previous history of
the system.

Configuring the /dev/random driver under Linux
==============================================

The /dev/random driver under Linux uses minor numbers 8 and 9 of
the /dev/mem major number (#1). So if your system does not have
/dev/random and /dev/urandom created already, they can be created
by using the commands:

	mknod /dev/random c 1 8
	mknod /dev/urandom c 1 9

Acknowledgements:
=================

Ideas for constructing this random number generator were derived
from Pretty Good Privacy's random number generator, and from private
discussions with Phil Karn. Colin Plumb provided a faster random
number generator, which speed up the mixing function of the entropy
pool, taken from PGPfone. Dale Worley has also contributed many
useful ideas and suggestions to improve this driver.

Any flaws in the design are solely my responsibility, and should
not be attributed to the Phil, Colin, or any of authors of PGP.

Further background information on this topic may be obtained from
RFC 1750, "Randomness Recommendations for Security", by Donald
Eastlake, Steve Crocker, and Jeff Schiller.

Next, lets skip some headers, and get to:

#define INPUT_POOL_SHIFT	12
#define INPUT_POOL_WORDS	(1 << (INPUT_POOL_SHIFT-5))
#define OUTPUT_POOL_SHIFT	10
#define OUTPUT_POOL_WORDS	(1 << (OUTPUT_POOL_SHIFT-5))
#define SEC_XFER_SIZE		512
#define EXTRACT_SIZE		10

Here, the entropy pool sizes are defined for two pools: the input pool, into which all entropy is added, and the output (or blocking) pool, which backs /dev/random. The blocking pool, like the other RNG components, is fed from the input pool, but they are not fed from each other. Data directly derived from the input pool is never handed to user applications.

/dev/urandom, as well as the in-kernel randomness interfaces, uses the 'crng' infrastructure, which we will see later.

The pool sizes in 32-bit words: 128 for the input pool and 32 for the output pool (i.e. 4096 and 1024 bits respectively).

#define LONGS(x) ( ( (x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))

The number of 'long'-typed array elements needed to hold x bytes of data; e.g. LONGS(20) is 3 on a 64-bit machine (and 5 on a 32-bit one).

/*
 * To allow fractional bits to be tracked, the entropy_count field is
 * denominated in units of 1/8th bits.
 *
 * 2*(ENTROPY_SHIFT + log2(poolbits)) must <= 31, or the multiply in
 * credit_entropy_bits() needs to be 64 bits wide.
 */
#define ENTROPY_SHIFT 3
#define ENTROPY_BITS(r) ( (r)->entropy_count >> ENTROPY_SHIFT)

Why track fractional bits? Presumably so that the asymptotic crediting logic in credit_entropy_bits (seen below) can account for contributions smaller than one bit instead of rounding them away.

/*
 * The minimum number of bits of entropy before we wake up a read on
 * /dev/random.  Should be enough to do a significant reseed.
 */
static int random_read_wakeup_bits = 64;

/*
 * If the entropy count falls under this number of bits, then we
 * should wake up processes which are selecting or polling on write
 * access to /dev/random.
 */
static int random_write_wakeup_bits = 28 * OUTPUT_POOL_WORDS;

/*
 * The minimum number of seconds between urandom pool reseeding.  We
 * do this to limit the amount of entropy that can be drained from the
 * input pool even if there are heavy demands on /dev/urandom.
 */
static int random_min_urandom_seed = 60;

Obligatory magical constants (random_write_wakeup_bits works out to 28 * 32 = 896 bits).

/*
 * Originally, we used a primitive polynomial of degree .poolwords
 * over GF(2).  The taps for various sizes are defined below.  They
 * were chosen to be evenly spaced except for the last tap, which is 1
 * to get the twisting happening as fast as possible.
 *
 * For the purposes of better mixing, we use the CRC-32 polynomial as
 * well to make a (modified) twisted Generalized Feedback Shift
 * Register.  (See M. Matsumoto & Y. Kurita, 1992.  Twisted GFSR
 * generators.  ACM Transactions on Modeling and Computer Simulation
 * 2(3):179-194.  Also see M. Matsumoto & Y. Kurita, 1994.  Twisted
 * GFSR generators II.  ACM Transactions on Modeling and Computer
 * Simulation 4:254-266)
 *
 * Thanks to Colin Plumb for suggesting this.
 *
 * The mixing operation is much less sensitive than the output hash,
 * where we use SHA-1.  All that we want of mixing operation is that
 * it be a good non-cryptographic hash; i.e. it not produce collisions
 * when fed "random" data of the sort we expect to see.  As long as
 * the pool state differs for different inputs, we have preserved the
 * input entropy and done a good job.  The fact that an intelligent
 * attacker can construct inputs that will produce controlled
 * alterations to the pool's state is not important because we don't
 * consider such inputs to contribute any randomness.  The only
 * property we need with respect to them is that the attacker can't
 * increase his/her knowledge of the pool's state.  Since all
 * additions are reversible (knowing the final state and the input,
 * you can reconstruct the initial state), if an attacker has any
 * uncertainty about the initial state, he/she can only shuffle that
 * uncertainty about, but never cause any collisions (which would
 * decrease the uncertainty).
 *
 * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
 * Videau in their paper, "The Linux Pseudorandom Number Generator
 * Revisited" (see: http://eprint.iacr.org/2012/251.pdf).  In their
 * paper, they point out that we are not using a true Twisted GFSR,
 * since Matsumoto & Kurita used a trinomial feedback polynomial (that
 * is, with only three taps, instead of the six that we are using).
 * As a result, the resulting polynomial is neither primitive nor
 * irreducible, and hence does not have a maximal period over
 * GF(2**32).  They suggest a slight change to the generator
 * polynomial which improves the resulting TGFSR polynomial to be
 * irreducible, which we have made here.
 */
static struct poolinfo {
  int poolbitshift, poolwords, poolbytes, poolbits, poolfracbits;
#define S(x) ilog2(x)+5, (x), (x)*4, (x)*32, (x) << (ENTROPY_SHIFT+5)
  int tap1, tap2, tap3, tap4, tap5;
} poolinfo_table[] = {
  /* was: x^128 + x^103 + x^76 + x^51 +x^25 + x + 1 */
  /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
  { S(128),	104,	76,	51,	25,	1 },
  /* was: x^32 + x^26 + x^20 + x^14 + x^7 + x + 1 */
  /* x^32 + x^26 + x^19 + x^14 + x^7 + x + 1 */
  { S(32),	26,	19,	14,	7,	1 },
#if 0
  /* x^2048 + x^1638 + x^1231 + x^819 + x^411 + x + 1  -- 115 */
  { S(2048),	1638,	1231,	819,	411,	1 },

  /* x^1024 + x^817 + x^615 + x^412 + x^204 + x + 1 -- 290 */
  { S(1024),	817,	615,	412,	204,	1 },

  /* x^1024 + x^819 + x^616 + x^410 + x^207 + x^2 + 1 -- 115 */
  { S(1024),	819,	616,	410,	207,	2 },

  /* x^512 + x^411 + x^308 + x^208 + x^104 + x + 1 -- 225 */
  { S(512),	411,	308,	208,	104,	1 },

  /* x^512 + x^409 + x^307 + x^206 + x^102 + x^2 + 1 -- 95 */
  { S(512),	409,	307,	206,	102,	2 },
  /* x^512 + x^409 + x^309 + x^205 + x^103 + x^2 + 1 -- 95 */
  { S(512),	409,	309,	205,	103,	2 },

  /* x^256 + x^205 + x^155 + x^101 + x^52 + x + 1 -- 125 */
  { S(256),	205,	155,	101,	52,	1 },

  /* x^128 + x^103 + x^78 + x^51 + x^27 + x^2 + 1 -- 70 */
  { S(128),	103,	78,	51,	27,	2 },

  /* x^64 + x^52 + x^39 + x^26 + x^14 + x + 1 -- 15 */
  { S(64),	52,	39,	26,	14,	1 },
#endif
};

These structures describe useful constants for working with the pool (poolbitshift, poolwords, poolbytes, poolbits, poolfracbits), and constants for the mixing functions.
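
To make the S() macro concrete, here is what the input pool entry expands to, given ENTROPY_SHIFT = 3:

/* S(128), i.e. the input pool: */
.poolbitshift = 12,	/* ilog2(128) + 5 */
.poolwords    = 128,	/* pool size in 32-bit words */
.poolbytes    = 512,	/* 128 * 4 */
.poolbits     = 4096,	/* 128 * 32 */
.poolfracbits = 32768,	/* 128 << (3 + 5), i.e. poolbits in 1/8th bits */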

/*
 * Static global variables
 */
static DECLARE_WAIT_QUEUE_HEAD(random_read_wait);
static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
static DECLARE_WAIT_QUEUE_HEAD(urandom_init_wait);
static struct fasync_struct *fasync;

static DEFINE_SPINLOCK(random_ready_list_lock);
static LIST_HEAD(random_ready_list);

These structures allow waiting for /dev/random to become readable/writable, and for /dev/urandom to initialize. fasync_struct is needed for asynchronous notification about data availability; random_ready_list we have covered earlier.

struct crng_state {
  __u32		state[16];
  unsigned long	init_time;
  spinlock_t	lock;
};

struct crng_state primary_crng = {
  .lock = __SPIN_LOCK_UNLOCKED(primary_crng.lock),
};

struct crng_state is another pool, the one backing /dev/urandom and the in-kernel random number interfaces. primary_crng is the instance used by default on non-NUMA machines (NUMA machines get per-node instances).

/*
 * crng_init =  0 --> Uninitialized
 *		1 --> Initialized
 *		2 --> Initialized from input_pool
 *
 * crng_init is protected by primary_crng->lock, and only increases
 * its value (from 0->1->2).
 */
static int crng_init = 0;
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
#define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE)
static void _extract_crng(struct crng_state *crng,
			  __u8 out[CHACHA20_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used);
static void process_random_ready_list(void);

The crng is initialized in two stages: first directly from interrupts/HWRNGs as a performance optimization (crng_init = 1), then from the data accumulated in the input pool (crng_init = 2).

static struct ratelimit_state unseeded_warning =
  RATELIMIT_STATE_INIT("warn_unseeded_randomness", HZ, 3);
static struct ratelimit_state urandom_warning =
  RATELIMIT_STATE_INIT("warn_urandom_randomness", HZ, 3);

static int ratelimit_disable __read_mostly;

module_param_named(ratelimit_disable, ratelimit_disable, int, 0644);
MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");

Rate limiting for warning messages.

/**********************************************************************
 *
 * OS independent entropy store.   Here are the functions which handle
 * storing entropy in an entropy pool.
 *
 **********************************************************************/

struct entropy_store;
struct entropy_store {
  /* read-only data: */
  const struct poolinfo *poolinfo;
  __u32 *pool;
  const char *name;
  struct entropy_store *pull;
  struct work_struct push_work;

  /* read-write data: */
  unsigned long last_pulled;
  spinlock_t lock;
  unsigned short add_ptr;
  unsigned short input_rotate;
  int entropy_count;
  int entropy_total;
  unsigned int initialized:1;
  unsigned int limit:1;
  unsigned int last_data_init:1;
  __u8 last_data[EXTRACT_SIZE];
};

This structure describes the input and output pools. poolinfo was defined before; it holds pool size constants and mixing function parameters. pool points to the actual data storage of the pool. pull is a pointer to the 'parent' of the current pool. push_work is used to schedule a refill of the blocking pool from the input pool. entropy_count is measured in fractional bits, entropy_total in bits. last_data is used in 'FIPS' mode to check whether the previously generated block equals the current draw from the generator.

static ssize_t extract_entropy(struct entropy_store *r, void *buf,
			       size_t nbytes, int min, int rsvd);
static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
				size_t nbytes, int fips);

These functions (the former wraps the latter) get data out of an entropy pool. Details near the actual implementation.

static void crng_reseed(struct crng_state *crng, struct entropy_store *r);
static void push_to_pool(struct work_struct *work);
static __u32 input_pool_data[INPUT_POOL_WORDS] __latent_entropy;
static __u32 blocking_pool_data[OUTPUT_POOL_WORDS] __latent_entropy;

  • crng_reseed - is clear;
  • push_to_pool - is used to schedule an entropy transfer from a parent pool to its child;
  • input_pool_data, blocking_pool_data - storage areas for the entropy pools; they can be initialized with random data at compile time.

static struct entropy_store input_pool = {
  .poolinfo = &poolinfo_table[0],
  .name = "input",
  .limit = 1,
  .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
  .pool = input_pool_data
};

static struct entropy_store blocking_pool = {
  .poolinfo = &poolinfo_table[1],
  .name = "blocking",
  .limit = 1,
  .pull = &input_pool,
  .lock = __SPIN_LOCK_UNLOCKED(blocking_pool.lock),
  .pool = blocking_pool_data,
  .push_work = __WORK_INITIALIZER(blocking_pool.push_work,
                                  push_to_pool),
};

The actual entropy pools: the input one and the blocking one. The blocking pool uses the input pool as its parent, hence it also has the work item for refills.

static __u32 const twist_table[8] = {
  0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
  0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };

/*
 * This function adds bytes into the entropy "pool".  It does not
 * update the entropy estimate.  The caller should call
 * credit_entropy_bits if this is appropriate.
 *
 * The pool is stirred with a primitive polynomial of the appropriate
 * degree, and then twisted.  We twist by three bits at a time because
 * it's cheap to do so and helps slightly in the expected case where
 * the entropy is concentrated in the low-order bits.
 */
static void _mix_pool_bytes(struct entropy_store *r, const void *in,
			    int nbytes)
{
  unsigned long i, tap1, tap2, tap3, tap4, tap5;
  int input_rotate;
  int wordmask = r->poolinfo->poolwords - 1;
  const char *bytes = in;
  __u32 w;

  tap1 = r->poolinfo->tap1;
  tap2 = r->poolinfo->tap2;
  tap3 = r->poolinfo->tap3;
  tap4 = r->poolinfo->tap4;
  tap5 = r->poolinfo->tap5;

  input_rotate = r->input_rotate;
  i = r->add_ptr;

  /* mix one byte at a time to simplify size handling and churn faster */
  while (nbytes--) {
    w = rol32(*bytes++, input_rotate);
    i = (i - 1) & wordmask;

    /* XOR in the various taps */
    w ^= r->pool[i];
    w ^= r->pool[(i + tap1) & wordmask];
    w ^= r->pool[(i + tap2) & wordmask];
    w ^= r->pool[(i + tap3) & wordmask];
    w ^= r->pool[(i + tap4) & wordmask];
    w ^= r->pool[(i + tap5) & wordmask];

    /* Mix the result back in with a twist */
    r->pool[i] = (w >> 3) ^ twist_table[w & 7];

    /*
     * Normally, we add 7 bits of rotation to the pool.
     * At the beginning of the pool, add an extra 7 bits
     * rotation, so that successive passes spread the
     * input bits across the pool evenly.
     */
     input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
  }

  r->input_rotate = input_rotate;
  r->add_ptr = i;
}

The comments say it all. The function adds random bytes to the pool using the constants defined in poolinfo.
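
The 'twist' is worth a second look: three bits are shifted out of the 32-bit word and folded back in through twist_table, whose entries appear to be the GF(2) combinations of those three bits with the bit-reversed CRC-32 polynomial (note twist_table[4] = 0xedb88320, and every other entry is the XOR of the entries for its set bits). This folding is what makes the generator a *twisted* GFSR rather than a plain one.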

static void __mix_pool_bytes(struct entropy_store *r, const void *in,
			     int nbytes)
{
	trace_mix_pool_bytes_nolock(r->name, nbytes, _RET_IP_);
	_mix_pool_bytes(r, in, nbytes);
}

A wrapper for tracing infrastructure.


static void mix_pool_bytes(struct entropy_store *r, const void *in,
			   int nbytes)
{
	unsigned long flags;

	trace_mix_pool_bytes(r->name, nbytes, _RET_IP_);
	spin_lock_irqsave(&r->lock, flags);
	_mix_pool_bytes(r, in, nbytes);
	spin_unlock_irqrestore(&r->lock, flags);
}

A wrapper for locking. This C style would benefit from Forth's hyperstatic scopes.

struct fast_pool {
	__u32		pool[4];
	unsigned long	last;
	unsigned short	reg_idx;
	unsigned char	count;
};

The fast pool uses a simpler mixing function and is used for adding entropy from interrupts, which must be served as fast as possible.

/*
 * This is a fast mixing routine used by the interrupt randomness
 * collector.  It's hardcoded for an 128 bit pool and assumes that any
 * locks that might be needed are taken by the caller.
 */
static void fast_mix(struct fast_pool *f)
{
	__u32 a = f->pool[0],	b = f->pool[1];
	__u32 c = f->pool[2],	d = f->pool[3];

	a += b;			c += d;
	b = rol32(b, 6);	d = rol32(d, 27);
	d ^= a;			b ^= c;

	a += b;			c += d;
	b = rol32(b, 16);	d = rol32(d, 14);
	d ^= a;			b ^= c;

	a += b;			c += d;
	b = rol32(b, 6);	d = rol32(d, 27);
	d ^= a;			b ^= c;

	a += b;			c += d;
	b = rol32(b, 16);	d = rol32(d, 14);
	d ^= a;			b ^= c;

	f->pool[0] = a;  f->pool[1] = b;
	f->pool[2] = c;  f->pool[3] = d;
	f->count++;
}

This is the function for mixing up the contents of the fast pool.

static void process_random_ready_list(void)
{
	unsigned long flags;
	struct random_ready_callback *rdy, *tmp;

	spin_lock_irqsave(&random_ready_list_lock, flags);
	list_for_each_entry_safe(rdy, tmp, &random_ready_list, list) {
		struct module *owner = rdy->owner;

		list_del_init(&rdy->list);
		rdy->func(rdy);
		module_put(owner);
	}
	spin_unlock_irqrestore(&random_ready_list_lock, flags);
}

process_random_ready_list is called once the CRNG becomes properly seeded for the first time: it fires the 'wait until RNG is ready' callbacks of the modules interested in random numbers early in the boot process.

/*
 * Credit (or debit) the entropy store with n bits of entropy.
 * Use credit_entropy_bits_safe() if the value comes from userspace
 * or otherwise should be checked for extreme values.
 */
static void credit_entropy_bits(struct entropy_store *r, int nbits)
{
	int entropy_count, orig;
	const int pool_size = r->poolinfo->poolfracbits;
	int nfrac = nbits << ENTROPY_SHIFT;

	if (!nbits)
		return;

retry:
	entropy_count = orig = ACCESS_ONCE(r->entropy_count);
	if (nfrac < 0) {
		/* Debit */
		entropy_count += nfrac;
	} else {
		/*
		 * Credit: we have to account for the possibility of
		 * overwriting already present entropy.	 Even in the
		 * ideal case of pure Shannon entropy, new contributions
		 * approach the full value asymptotically:
		 *
		 * entropy <- entropy + (pool_size - entropy) *
		 *	(1 - exp(-add_entropy/pool_size))
		 *
		 * For add_entropy <= pool_size/2 then
		 * (1 - exp(-add_entropy/pool_size)) >=
		 *    (add_entropy/pool_size)*0.7869...
		 * so we can approximate the exponential with
		 * 3/4*add_entropy/pool_size and still be on the
		 * safe side by adding at most pool_size/2 at a time.
		 *
		 * The use of pool_size-2 in the while statement is to
		 * prevent rounding artifacts from making the loop
		 * arbitrarily long; this limits the loop to log2(pool_size)*2
		 * turns no matter how large nbits is.
		 */
		int pnfrac = nfrac;
		const int s = r->poolinfo->poolbitshift + ENTROPY_SHIFT + 2;
		/* The +2 corresponds to the /4 in the denominator */

		do {
			unsigned int anfrac = min(pnfrac, pool_size/2);
			unsigned int add =
				( (pool_size - entropy_count)*anfrac*3) >> s;

			entropy_count += add;
			pnfrac -= anfrac;
		} while (unlikely(entropy_count < pool_size-2 && pnfrac));
	}

	if (unlikely(entropy_count < 0)) {
		pr_warn("random: negative entropy/overflow: pool %s count %d\n",
			r->name, entropy_count);
		WARN_ON(1);
		entropy_count = 0;
	} else if (entropy_count > pool_size)
		entropy_count = pool_size;
	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
		goto retry;

	r->entropy_total += nbits;
	if (!r->initialized && r->entropy_total > 128) {
		r->initialized = 1;
		r->entropy_total = 0;
	}

	trace_credit_entropy_bits(r->name, nbits,
				  entropy_count >> ENTROPY_SHIFT,
				  r->entropy_total, _RET_IP_);

	if (r == &input_pool) {
		int entropy_bits = entropy_count >> ENTROPY_SHIFT;

		if (crng_init < 2 && entropy_bits >= 128) {
			crng_reseed(&primary_crng, r);
			entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
		}

		/* should we wake readers? */
		if (entropy_bits >= random_read_wakeup_bits) {
			wake_up_interruptible(&random_read_wait);
			kill_fasync(&fasync, SIGIO, POLL_IN);
		}
		/* If the input pool is getting full, send some
		 * entropy to the blocking pool until it is 75% full.
		 */
		if (entropy_bits > random_write_wakeup_bits &&
		    r->initialized &&
		    r->entropy_total >= 2*random_read_wakeup_bits) {
			struct entropy_store *other = &blocking_pool;

			if (other->entropy_count <=
			    3 * other->poolinfo->poolfracbits / 4) {
				schedule_work(&other->push_work);
				r->entropy_total = 0;
			}
		}
	}
}

This function does the voodoo calculations of the amount of entropy in the pool. Interestingly, the 'debit' branch seems to be totally unused; the actual debit happens in the 'account' function. Otherwise, the function features an approximation of an information-theoretic computation, and a CAS loop. There is also a special case for the input pool: once it accumulates more than 128 bits of entropy, this function seeds the crng, activating urandom; and if enough entropy remains after that, the readers are woken up[3].
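
A quick worked example of the crediting math: for the input pool, pool_size = 32768 fractional bits and s = 12 + 3 + 2 = 17. Crediting 64 bits (nfrac = 512) to an empty pool gives add = (32768 - 0) * 512 * 3 >> 17 = 384 fractional bits, i.e. only 48 bits, matching the 3/4-of-the-exponential approximation for a nearly empty pool.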

static int credit_entropy_bits_safe(struct entropy_store *r, int nbits)
{
	const int nbits_max = r->poolinfo->poolwords * 32;

	if (nbits < 0)
		return -EINVAL;

	/* Cap the value to avoid overflows */
	nbits = min(nbits,  nbits_max);

	credit_entropy_bits(r, nbits);
	return 0;
}

This is a safety wrapper around credit_entropy_bits. It exists to sanity-check values coming from userspace into the kernel.


/*********************************************************************
 *
 * CRNG using CHACHA20
 *
 *********************************************************************/

#define CRNG_RESEED_INTERVAL (300*HZ)

static DECLARE_WAIT_QUEUE_HEAD(crng_init_wait);

#ifdef CONFIG_NUMA
/*
 * Hack to deal with crazy userspace progams when they are all trying
 * to access /dev/urandom in parallel.  The programs are almost
 * certainly doing something terribly wrong, but we'll work around
 * their brain damage.
 */
static struct crng_state **crng_node_pool __read_mostly;
#endif

The crng is reseeded from the input pool at most once every 5 minutes; crng_node_pool holds per-NUMA-node crng state structures, since cross-node shared memory traffic is too slow on such machines.

static void crng_initialize(struct crng_state *crng)
{
	int		i;
	unsigned long	rv;

	memcpy(&crng->state[0], "expand 32-byte k", 16);
	if (crng == &primary_crng)
		_extract_entropy(&input_pool, &crng->state[4],
				 sizeof(__u32) * 12, 0);
	else
		get_random_bytes(&crng->state[4], sizeof(__u32) * 12);
	for (i = 4; i < 16; i++) {
		if (!arch_get_random_seed_long(&rv) &&
		    !arch_get_random_long(&rv))
			rv = random_get_entropy();
		crng->state[i] ^= rv;
	}
	crng->init_time = jiffies - CRNG_RESEED_INTERVAL - 1;
}

The primary crng instance is initialized from the input pool; all the rest from the primary crng (via get_random_bytes). Additionally, RDRANDisms are mixed into the state where available. And of course, the obligatory voodoo value "expand 32-byte k" is involved.
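
For reference, "expand 32-byte k" is the standard ChaCha20 input constant, and the 16-word state is laid out as in DJB's original definition (a sketch):

/* crng->state layout (16 x 32-bit words):
 *	state[0..3]	the constant "expand 32-byte k"
 *	state[4..11]	256-bit key (what crng_reseed/crng_fast_load touch)
 *	state[12..13]	64-bit block counter
 *	state[14..15]	nonce
 */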

static int crng_fast_load(const char *cp, size_t len)
{
	unsigned long flags;
	char *p;

	if (!spin_trylock_irqsave(&primary_crng.lock, flags))
		return 0;
	if (crng_init != 0) {
		spin_unlock_irqrestore(&primary_crng.lock, flags);
		return 0;
	}
	p = (unsigned char *) &primary_crng.state[4];
	while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) {
		p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp;
		cp++; crng_init_cnt++; len--;
	}
	if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
		crng_init = 1;
		wake_up_interruptible(&crng_init_wait);
		pr_notice("random: fast init done\n");
	}
	spin_unlock_irqrestore(&primary_crng.lock, flags);
	return 1;
}

This function seeds the crng from hardware random number generators and from interrupt context. The input is XORed into the middle 32 bytes (CHACHA20_KEY_SIZE) of the 64-byte state (16 4-byte words), i.e. the ChaCha20 key, and does not touch the other words. Accumulating CRNG_INIT_CNT_THRESH = 64 such bytes is enough to consider the crng usably initialized (crng_init = 1).

#ifdef CONFIG_NUMA
static void do_numa_crng_init(struct work_struct *work)
{
	int i;
	struct crng_state *crng;
	struct crng_state **pool;

	pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL|__GFP_NOFAIL);
	for_each_online_node(i) {
		crng = kmalloc_node(sizeof(struct crng_state),
				    GFP_KERNEL | __GFP_NOFAIL, i);
		spin_lock_init(&crng->lock);
		crng_initialize(crng);
		pool[i] = crng;
	}
	mb();
	if (cmpxchg(&crng_node_pool, NULL, pool)) {
		for_each_node(i)
			kfree(pool[i]);
		kfree(pool);
	}
}

static DECLARE_WORK(numa_crng_init_work, do_numa_crng_init);

static void numa_crng_init(void)
{
	schedule_work(&numa_crng_init_work);
}
#else
static void numa_crng_init(void) {}
#endif

On NUMA systems there is, in addition to primary_crng, one crng instance per NUMA node. These need to be initialized separately; the initialization is invoked when the crng is reseeded for the first time (i.e. initialized from the input pool).

On normal machines, this is a noop.

static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
{
	unsigned long	flags;
	int		i, num;
	union {
		__u8	block[CHACHA20_BLOCK_SIZE];
		__u32	key[8];
	} buf;

	if (r) {
		num = extract_entropy(r, &buf, 32, 16, 0);
		if (num == 0)
			return;
	} else {
		_extract_crng(&primary_crng, buf.block);
		_crng_backtrack_protect(&primary_crng, buf.block,
					CHACHA20_KEY_SIZE);
	}
	spin_lock_irqsave(&crng->lock, flags);
	for (i = 0; i < 8; i++) {
		unsigned long	rv;
		if (!arch_get_random_seed_long(&rv) &&
		    !arch_get_random_long(&rv))
			rv = random_get_entropy();
		crng->state[i+4] ^= buf.key[i] ^ rv;
	}
	memzero_explicit(&buf, sizeof(buf));
	crng->init_time = jiffies;
	if (crng == &primary_crng && crng_init < 2) {
		numa_crng_init();
		crng_init = 2;
		process_random_ready_list();
		wake_up_interruptible(&crng_init_wait);
		pr_notice("random: crng init done\n");
		if (unseeded_warning.missed) {
			pr_notice("random: %d get_random_xx warning(s) missed "
				  "due to ratelimiting\n",
				  unseeded_warning.missed);
			unseeded_warning.missed = 0;
		}
		if (urandom_warning.missed) {
			pr_notice("random: %d urandom warning(s) missed "
				  "due to ratelimiting\n",
				  urandom_warning.missed);
			urandom_warning.missed = 0;
		}
	}
	spin_unlock_irqrestore(&crng->lock, flags);
}

The CRNG is reseeded on attempts to read from it, or via explicit reseed ioctls: if the last reseed happened more than CRNG_RESEED_INTERVAL (5 minutes) ago, a new reseed is done. primary_crng is reseeded from input_pool, the per-node crngs from the primary crng[4]. Additional RDRANDisms are XORed into the crng state as well.

If it is the primary_crng that just got reseeded for the first time, additional bookkeeping is triggered: initializing the per-node crngs and waking processes waiting for urandom to become usable.

static inline void maybe_reseed_primary_crng(void)
{
	if (crng_init > 2 &&
	    time_after(jiffies, primary_crng.init_time + CRNG_RESEED_INTERVAL))
		crng_reseed(&primary_crng, &input_pool);
}

This function is actually unused.

static inline void crng_wait_ready(void)
{
	wait_event_interruptible(crng_init_wait, crng_ready());
}

Veeeery thin convenience wrapper.

static void _extract_crng(struct crng_state *crng,
			  __u8 out[CHACHA20_BLOCK_SIZE])
{
	unsigned long v, flags;

	if (crng_ready() &&
	    (time_after(crng_global_init_time, crng->init_time) ||
	     time_after(jiffies, crng->init_time + CRNG_RESEED_INTERVAL)))
		crng_reseed(crng, crng == &primary_crng ? &input_pool : NULL);
	spin_lock_irqsave(&crng->lock, flags);
	if (arch_get_random_long(&v))
		crng->state[14] ^= v;
	chacha20_block(&crng->state[0], out);
	if (crng->state[12] == 0)
		crng->state[13]++;
	spin_unlock_irqrestore(&crng->lock, flags);
}

First, there is a check for whether the crng is due for a reseed. Then the state is XORed with an RDRAND value, and a ChaCha20 block is derived from the state. The fixup after that is apparently a carry: chacha20_block increments the 32-bit block counter in state[12], and when it wraps to zero, the overflow is propagated into state[13]. This function is only called from its wrapper, shown below:

static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
{
	struct crng_state *crng = NULL;

#ifdef CONFIG_NUMA
	if (crng_node_pool)
		crng = crng_node_pool[numa_node_id()];
	if (crng == NULL)
#endif
		crng = &primary_crng;
	_extract_crng(crng, out);
}

Now, this function is the one actually used: it extracts output from the right crng, the node-local one on NUMA machines, the primary one on normal ones.

/*
 * Use the leftover bytes from the CRNG block output (if there is
 * enough) to mutate the CRNG key to provide backtracking protection.
 */
static void _crng_backtrack_protect(struct crng_state *crng,
				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used)
{
	unsigned long	flags;
	__u32		*s, *d;
	int		i;

	used = round_up(used, sizeof(__u32));
	if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
		extract_crng(tmp);
		used = 0;
	}
	spin_lock_irqsave(&crng->lock, flags);
	s = (__u32 *) &tmp[used];
	d = &crng->state[4];
	for (i=0; i < 8; i++)
		*d++ ^= *s++;
	spin_unlock_irqrestore(&crng->lock, flags);
}

The unused portion of the extracted data block is XORed into the crng key. If not enough unused bytes remain in the block, a fresh block is extracted first. The only direct caller is crng_reseed, when reseeding the per-node crng_states.
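
A concrete example: CHACHA20_BLOCK_SIZE is 64 and CHACHA20_KEY_SIZE is 32, so if the caller consumed 16 bytes of the block, bytes 16..47 are XORed into the key words; if it consumed 40, then 40 + 32 > 64, a fresh block is generated, and its first 32 bytes are used for the key update.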

static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
{
	struct crng_state *crng = NULL;

#ifdef CONFIG_NUMA
	if (crng_node_pool)
		crng = crng_node_pool[numa_node_id()];
	if (crng == NULL)
#endif
		crng = &primary_crng;
	_crng_backtrack_protect(crng, tmp, used);
}

Wrapper a la extract_crng - automatically selects the right crng to mix up.

static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
{
	ssize_t ret = 0, i = CHACHA20_BLOCK_SIZE;
	__u8 tmp[CHACHA20_BLOCK_SIZE];
	int large_request = (nbytes > 256);

	while (nbytes) {
		if (large_request && need_resched()) {
			if (signal_pending(current)) {
				if (ret == 0)
					ret = -ERESTARTSYS;
				break;
			}
			schedule();
		}

		extract_crng(tmp);
		i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE);
		if (copy_to_user(buf, tmp, i)) {
			ret = -EFAULT;
			break;
		}

		nbytes -= i;
		buf += i;
		ret += i;
	}
	crng_backtrack_protect(tmp, i);

	/* Wipe data just written to memory */
	memzero_explicit(tmp, sizeof(tmp));

	return ret;
}

In case the user requested more data than fits into one ChaCha20 block, the output is generated block by block in a loop. Afterwards, the crng state is mixed with the leftover bytes of the last block (backtrack protection).


/*********************************************************************
 *
 * Entropy input management
 *
 *********************************************************************/

/* There is one of these per entropy source */
struct timer_rand_state {
	cycles_t last_time;
	long last_delta, last_delta2;
	unsigned dont_count_entropy:1;
};

#define INIT_TIMER_RAND_STATE { INITIAL_JIFFIES, };

This structure is used to calculate the amount of entropy that each source contributes.

/*
 * Add device- or boot-specific data to the input pool to help
 * initialize it.
 *
 * None of this adds any entropy; it is meant to avoid the problem of
 * the entropy pool having similar initial state across largely
 * identical devices.
 */
void add_device_randomness(const void *buf, unsigned int size)
{
	unsigned long time = random_get_entropy() ^ jiffies;
	unsigned long flags;

	trace_add_device_randomness(size, _RET_IP_);
	spin_lock_irqsave(&input_pool.lock, flags);
	_mix_pool_bytes(&input_pool, buf, size);
	_mix_pool_bytes(&input_pool, &time, sizeof(time));
	spin_unlock_irqrestore(&input_pool.lock, flags);
}
EXPORT_SYMBOL(add_device_randomness);

As mentioned before, this function mixes some device specific information into the input entropy pool at boot time.

static struct timer_rand_state input_timer_state = INIT_TIMER_RAND_STATE;
/*
 * This function adds entropy to the entropy "pool" by using timing
 * delays.  It uses the timer_rand_state structure to make an estimate
 * of how many bits of entropy this call has added to the pool.
 *
 * The number "num" is also added to the pool - it should somehow describe
 * the type of event which just happened.  This is currently 0-255 for
 * keyboard scan codes, and 256 upwards for interrupts.
 *
 */
static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
{
	struct entropy_store	*r;
	struct {
		long jiffies;
		unsigned cycles;
		unsigned num;
	} sample;
	long delta, delta2, delta3;

	preempt_disable();

	sample.jiffies = jiffies;
	sample.cycles = random_get_entropy();
	sample.num = num;
	r = &input_pool;
	mix_pool_bytes(r, &sample, sizeof(sample));

	/*
	 * Calculate number of bits of randomness we probably added.
	 * We take into account the first, second and third-order deltas
	 * in order to make our estimate.
	 */

	if (!state->dont_count_entropy) {
		delta = sample.jiffies - state->last_time;
		state->last_time = sample.jiffies;

		delta2 = delta - state->last_delta;
		state->last_delta = delta;

		delta3 = delta2 - state->last_delta2;
		state->last_delta2 = delta2;

		if (delta < 0)
			delta = -delta;
		if (delta2 < 0)
			delta2 = -delta2;
		if (delta3 < 0)
			delta3 = -delta3;
		if (delta > delta2)
			delta = delta2;
		if (delta > delta3)
			delta = delta3;

		/*
		 * delta is now minimum absolute delta.
		 * Round down by 1 bit on general principles,
		 * and limit entropy entimate to 12 bits.
		 */
		credit_entropy_bits(r, min_t(int, fls(delta>>1), 11));
	}
	preempt_enable();
}

The information mixed into the pool is: jiffies, the in-kernel timekeeping value incremented on timer interrupts; cycles, aka random_get_entropy(), aka rdtsc on x86[5]; and num, the event code. To estimate how many bits of entropy were added as a result, deltas of up to third order are computed, and roughly the base-2 logarithm of the minimum absolute delta is credited.
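
For example, if the minimum of the three absolute deltas comes out as 5 jiffies, the credit is fls(5 >> 1) = fls(2) = 2 bits; the min_t() cap keeps a single event from ever claiming more than 11 bits (despite the comment saying 12).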

void add_input_randomness(unsigned int type, unsigned int code,
				 unsigned int value)
{
	static unsigned char last_value;

	/* ignore autorepeat and the like */
	if (value == last_value)
		return;

	last_value = value;
	add_timer_randomness(&input_timer_state,
			     (type << 4) ^ code ^ (code >> 4) ^ value);
	trace_add_input_randomness(ENTROPY_BITS(&input_pool));
}
EXPORT_SYMBOL_GPL(add_input_randomness);

Skips repeated values (autorepeat and the like), then calls add_timer_randomness with a lightly hashed event code.

static DEFINE_PER_CPU(struct fast_pool, irq_randomness);

#ifdef ADD_INTERRUPT_BENCH
static unsigned long avg_cycles, avg_deviation;

#define AVG_SHIFT 8     /* Exponential average factor k=1/256 */
#define FIXED_1_2 (1 << (AVG_SHIFT-1))

static void add_interrupt_bench(cycles_t start)
{
        long delta = random_get_entropy() - start;

        /* Use a weighted moving average */
        delta = delta - ((avg_cycles + FIXED_1_2) >> AVG_SHIFT);
        avg_cycles += delta;
        /* And average deviation */
        delta = abs(delta) - ((avg_deviation + FIXED_1_2) >> AVG_SHIFT);
        avg_deviation += delta;
}
#else
#define add_interrupt_bench(x)
#endif

This function measures how much time adding entropy from interrupt context takes (as an exponentially weighted moving average, plus the average deviation). The irq_randomness fast pool is where each CPU accumulates entropy before it is mixed into the input pool.

static __u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
{
	__u32 *ptr = (__u32 *) regs;
	unsigned int idx;

	if (regs == NULL)
		return 0;
	idx = READ_ONCE(f->reg_idx);
	if (idx >= sizeof(struct pt_regs) / sizeof(__u32))
		idx = 0;
	ptr += idx++;
	WRITE_ONCE(f->reg_idx, idx);
	return *ptr;
}

get_reg returns the value of a register in a round-robin (across different calls) fashion.

void add_interrupt_randomness(int irq, int irq_flags)
{
	struct entropy_store	*r;
	struct fast_pool	*fast_pool = this_cpu_ptr(&irq_randomness);
	struct pt_regs		*regs = get_irq_regs();
	unsigned long		now = jiffies;
	cycles_t		cycles = random_get_entropy();
	__u32			c_high, j_high;
	__u64			ip;
	unsigned long		seed;
	int			credit = 0;

	if (cycles == 0)
		cycles = get_reg(fast_pool, regs);
	c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0;
	j_high = (sizeof(now) > 4) ? now >> 32 : 0;
	fast_pool->pool[0] ^= cycles ^ j_high ^ irq;
	fast_pool->pool[1] ^= now ^ c_high;
	ip = regs ? instruction_pointer(regs) : _RET_IP_;
	fast_pool->pool[2] ^= ip;
	fast_pool->pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
		get_reg(fast_pool, regs);

	fast_mix(fast_pool);
	add_interrupt_bench(cycles);

	if (unlikely(crng_init == 0)) {
		if ((fast_pool->count >= 64) &&
		    crng_fast_load((char *) fast_pool->pool,
				   sizeof(fast_pool->pool))) {
			fast_pool->count = 0;
			fast_pool->last = now;
		}
		return;
	}

	if ( (fast_pool->count < 64) &&
	    !time_after(now, fast_pool->last + HZ))
		return;

	r = &input_pool;
	if (!spin_trylock(&r->lock))
		return;

	fast_pool->last = now;
	__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));

	/*
	 * If we have architectural seed generator, produce a seed and
	 * add it to the pool.  For the sake of paranoia don't let the
	 * architectural seed generator dominate the input from the
	 * interrupt noise.
	 */
	if (arch_get_random_seed_long(&seed)) {
		__mix_pool_bytes(r, &seed, sizeof(seed));
		credit = 1;
	}
	spin_unlock(&r->lock);

	fast_pool->count = 0;

	/* award one bit for the contents of the fast pool */
	credit_entropy_bits(r, credit + 1);
}
EXPORT_SYMBOL_GPL(add_interrupt_randomness);

OK, so this function mixes the cycle counter (or, lacking one, a register value), jiffies, the irq number and the instruction pointer into the per-CPU fast_pool (this is the part that add_interrupt_bench measures). If the crng is not initialized yet and the fast pool has accumulated enough events, the crng is fast-loaded from the pool's contents. The next group of lines returns early unless at least 64 events have accumulated or more than a second has passed since the previous spill; otherwise the fast pool contents are mixed into the input pool. Finally, a CPU-provided seed, if available, is mixed into the input pool as well, and one or two bits of entropy are credited.

#ifdef CONFIG_BLOCK
void add_disk_randomness(struct gendisk *disk)
{
	if (!disk || !disk->random)
		return;
	/* first major is 1, so we get >= 0x200 here */
	add_timer_randomness(disk->random, 0x100 + disk_devt(disk));
	trace_add_disk_randomness(disk_devt(disk), ENTROPY_BITS(&input_pool));
}
EXPORT_SYMBOL_GPL(add_disk_randomness);
#endif

add_disk_randomness is just a wrapper around add_timer_randomness.


/*********************************************************************
 *
 * Entropy extraction routines
 *
 *********************************************************************/

/*
 * This utility inline function is responsible for transferring entropy
 * from the primary pool to the secondary extraction pool. We make
 * sure we pull enough for a 'catastrophic reseed'.
 */
static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes);
static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
{
	if (!r->pull ||
	    r->entropy_count >= (nbytes << (ENTROPY_SHIFT + 3)) ||
	    r->entropy_count > r->poolinfo->poolfracbits)
		return;

	if (r->limit == 0 && random_min_urandom_seed) {
		unsigned long now = jiffies;

		if (time_before(now,
				r->last_pulled + random_min_urandom_seed * HZ))
			return;
		r->last_pulled = now;
	}

	_xfer_secondary_pool(r, nbytes);
}

This is a wrapper around _xfer_secondary_pool; r here is the destination pool. The function just exits when the destination pool already has enough entropy (more than the caller asked for); the second branch looks like dead code to me, as limit is always set to 1 for both pools.

static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
{
	__u32	tmp[OUTPUT_POOL_WORDS];

	/* For /dev/random's pool, always leave two wakeups' worth */
	int rsvd_bytes = r->limit ? 0 : random_read_wakeup_bits / 4;
	int bytes = nbytes;

	/* pull at least as much as a wakeup */
	bytes = max_t(int, bytes, random_read_wakeup_bits / 8);
	/* but never more than the buffer size */
	bytes = min_t(int, bytes, sizeof(tmp));

	trace_xfer_secondary_pool(r->name, bytes * 8, nbytes * 8,
				  ENTROPY_BITS(r), ENTROPY_BITS(r->pull));
	bytes = extract_entropy(r->pull, tmp, bytes,
				random_read_wakeup_bits / 8, rsvd_bytes);
	mix_pool_bytes(r, tmp, bytes);
	credit_entropy_bits(r, bytes*8);
}

This function actually pulls data from the input pool into an output pool. Per my reading, rsvd_bytes is always 0 here, and the comment is totally outdated. This undocumented r->limit field is a WTF to me even now. Anyhow, the amount of data to transfer is reasonably limited, then that amount is extracted from the parent pool, mixed into the child pool, and credited as entropy.

/*
 * Used as a workqueue function so that when the input pool is getting
 * full, we can "spill over" some entropy to the output pools.  That
 * way the output pools can store some of the excess entropy instead
 * of letting it go to waste.
 */
static void push_to_pool(struct work_struct *work)
{
	struct entropy_store *r = container_of(work, struct entropy_store,
					      push_work);
	BUG_ON(!r);
	_xfer_secondary_pool(r, random_read_wakeup_bits/8);
	trace_push_to_pool(r->name, r->entropy_count >> ENTROPY_SHIFT,
			   r->pull->entropy_count >> ENTROPY_SHIFT);
}

This is a wrapper for the entropy_pool->push_work work item. It is scheduled for execution by credit_entropy_bits when the input pool is getting full.

/*
 * This function decides how many bytes to actually take from the
 * given pool, and also debits the entropy count accordingly.
 */
static size_t account(struct entropy_store *r, size_t nbytes, int min,
		      int reserved)
{
	int entropy_count, orig;
	size_t ibytes, nfrac;

	BUG_ON(r->entropy_count > r->poolinfo->poolfracbits);

	/* Can we pull enough? */
retry:
	entropy_count = orig = ACCESS_ONCE(r->entropy_count);
	ibytes = nbytes;
	/* If limited, never pull more than available */
	if (r->limit) {
		int have_bytes = entropy_count >> (ENTROPY_SHIFT + 3);

		if ( (have_bytes -= reserved) < 0)
			have_bytes = 0;
		ibytes = min_t(size_t, ibytes, have_bytes);
	}
	if (ibytes < min)
		ibytes = 0;

	if (unlikely(entropy_count < 0)) {
		pr_warn("random: negative entropy count: pool %s count %d\n",
			r->name, entropy_count);
		WARN_ON(1);
		entropy_count = 0;
	}
	nfrac = ibytes << (ENTROPY_SHIFT + 3);
	if ( (size_t) entropy_count > nfrac )
		entropy_count -= nfrac;
	else
		entropy_count = 0;

	if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
		goto retry;

	trace_debit_entropy(r->name, 8 * ibytes);
	if (ibytes &&
	    (r->entropy_count >> ENTROPY_SHIFT) < random_write_wakeup_bits) {
		wake_up_interruptible(&random_write_wait);
		kill_fasync(&fasync, SIGIO, POLL_OUT);
	}

	return ibytes;
}

This function calculates how much data can be pulled, taking the reserved and minimum amounts into account. If the pull is possible, the corresponding number of fractional bits is subtracted from the entropy count. The code is wrapped in a CAS loop; on exit from the loop, the function wakes up applications waiting to write to /dev/random (if the pool has drained below random_write_wakeup_bits).
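
Worked example: with 64 bits in the pool (entropy_count = 512 fractional bits), a request of nbytes = 16 with min = 8 and reserved = 0 gives have_bytes = 512 >> 6 = 8, so ibytes = min(16, 8) = 8; that passes the minimum, 8 << 6 = 512 fractional bits are debited, and the pool drops to zero.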

/*
 * This function does the actual extraction for extract_entropy and
 * extract_entropy_user.
 *
 * Note: we assume that .poolwords is a multiple of 16 words.
 */
static void extract_buf(struct entropy_store *r, __u8 *out)
{
	int i;
	union {
		__u32 w[5];
		unsigned long l[LONGS(20)];
	} hash;
	__u32 workspace[SHA_WORKSPACE_WORDS];
	unsigned long flags;

	/*
	 * If we have an architectural hardware random number
	 * generator, use it for SHA's initial vector
	 */
	sha_init(hash.w);
	for (i = 0; i < LONGS(20); i++) {
		unsigned long v;
		if (!arch_get_random_long(&v))
			break;
		hash.l[i] = v;
	}

	/* Generate a hash across the pool, 16 words (512 bits) at a time */
	spin_lock_irqsave(&r->lock, flags);
	for (i = 0; i < r->poolinfo->poolwords; i += 16)
		sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);

	/*
	 * We mix the hash back into the pool to prevent backtracking
	 * attacks (where the attacker knows the state of the pool
	 * plus the current outputs, and attempts to find previous
	 * ouputs), unless the hash function can be inverted. By
	 * mixing at least a SHA1 worth of hash data back, we make
	 * brute-forcing the feedback as hard as brute-forcing the
	 * hash.
	 */
	__mix_pool_bytes(r, hash.w, sizeof(hash.w));
	spin_unlock_irqrestore(&r->lock, flags);

	memzero_explicit(workspace, sizeof(workspace));

	/*
	 * In case the hash function has some recognizable output
	 * pattern, we fold it in half. Thus, we always feed back
	 * twice as much data as we output.
	 */
	hash.w[0] ^= hash.w[3];
	hash.w[1] ^= hash.w[4];
	hash.w[2] ^= rol32(hash.w[2], 16);

	memcpy(out, &hash, EXTRACT_SIZE);
	memzero_explicit(&hash, sizeof(hash));
}

The function is well-commented, though the argument about a recognizable hash pattern is strange: how does folding the hash help if the output pattern is recognizable?

static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
				size_t nbytes, int fips)
{
	ssize_t ret = 0, i;
	__u8 tmp[EXTRACT_SIZE];
	unsigned long flags;

	while (nbytes) {
		extract_buf(r, tmp);

		if (fips) {
			spin_lock_irqsave(&r->lock, flags);
			if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
				panic("Hardware RNG duplicated output!\n");
			memcpy(r->last_data, tmp, EXTRACT_SIZE);
			spin_unlock_irqrestore(&r->lock, flags);
		}
		i = min_t(int, nbytes, EXTRACT_SIZE);
		memcpy(buf, tmp, i);
		nbytes -= i;
		buf += i;
		ret += i;
	}

	/* Wipe data just returned from memory */
	memzero_explicit(tmp, sizeof(tmp));

	return ret;
}

_extract_entropy just calls extract_buf in a loop; there is also a 'sanity check'6 required by something called 'FIPS': the generator must never output the same block twice in a row.

/*
 * This function extracts randomness from the "entropy pool", and
 * returns it in a buffer.
 *
 * The min parameter specifies the minimum amount we can pull before
 * failing to avoid races that defeat catastrophic reseeding while the
 * reserved parameter indicates how much entropy we must leave in the
 * pool after each pull to avoid starving other readers.
 */
static ssize_t extract_entropy(struct entropy_store *r, void *buf,
				 size_t nbytes, int min, int reserved)
{
	__u8 tmp[EXTRACT_SIZE];
	unsigned long flags;

	/* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */
	if (fips_enabled) {
		spin_lock_irqsave(&r->lock, flags);
		if (!r->last_data_init) {
			r->last_data_init = 1;
			spin_unlock_irqrestore(&r->lock, flags);
			trace_extract_entropy(r->name, EXTRACT_SIZE,
					      ENTROPY_BITS(r), _RET_IP_);
			xfer_secondary_pool(r, EXTRACT_SIZE);
			extract_buf(r, tmp);
			spin_lock_irqsave(&r->lock, flags);
			memcpy(r->last_data, tmp, EXTRACT_SIZE);
		}
		spin_unlock_irqrestore(&r->lock, flags);
	}

	trace_extract_entropy(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
	xfer_secondary_pool(r, nbytes);
	nbytes = account(r, nbytes, min, reserved);

	return _extract_entropy(r, buf, nbytes, fips_enabled);
}

A wrapper around _extract_entropy. When FIPS mode is enabled, it primes the last_data field of the entropy_store; it then refills the pool from its parent, debits the bits to be consumed, and extracts that amount.

/*
 * This function extracts randomness from the "entropy pool", and
 * returns it in a userspace buffer.
 */
static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
				    size_t nbytes)
{
	ssize_t ret = 0, i;
	__u8 tmp[EXTRACT_SIZE];
	int large_request = (nbytes > 256);

	trace_extract_entropy_user(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
	xfer_secondary_pool(r, nbytes);
	nbytes = account(r, nbytes, 0, 0);

	while (nbytes) {
		if (large_request && need_resched()) {
			if (signal_pending(current)) {
				if (ret == 0)
					ret = -ERESTARTSYS;
				break;
			}
			schedule();
		}

		extract_buf(r, tmp);
		i = min_t(int, nbytes, EXTRACT_SIZE);
		if (copy_to_user(buf, tmp, i)) {
			ret = -EFAULT;
			break;
		}

		nbytes -= i;
		buf += i;
		ret += i;
	}

	/* Wipe data just returned from memory */
	memzero_explicit(tmp, sizeof(tmp));

	return ret;
}

This function handles extraction of entropy to userspace. It calls extract_buf in a loop until the user-provided buffer becomes full, while checking if the CPU should be yielded or a signal is pending. It is used in the implementation of read(2) from /dev/random.

/*
 * This function is the exported kernel interface.  It returns some
 * number of good random numbers, suitable for key generation, seeding
 * TCP sequence numbers, etc.  It does not rely on the hardware random
 * number generator.  For random bytes direct from the hardware RNG
 * (when available), use get_random_bytes_arch().
 */
void get_random_bytes(void *buf, int nbytes)
{
	__u8 tmp[CHACHA20_BLOCK_SIZE];

#if DEBUG_RANDOM_BOOT > 0
	if (!crng_ready())
		printk(KERN_NOTICE "random: %pF get_random_bytes called "
		       "with crng_init = %d\n", (void *) _RET_IP_, crng_init);
#endif
	trace_get_random_bytes(nbytes, _RET_IP_);

	while (nbytes >= CHACHA20_BLOCK_SIZE) {
		extract_crng(buf);
		buf += CHACHA20_BLOCK_SIZE;
		nbytes -= CHACHA20_BLOCK_SIZE;
	}

	if (nbytes > 0) {
		extract_crng(tmp);
		memcpy(buf, tmp, nbytes);
		crng_backtrack_protect(tmp, nbytes);
	} else
		crng_backtrack_protect(tmp, CHACHA20_BLOCK_SIZE);
	memzero_explicit(tmp, sizeof(tmp));
}
EXPORT_SYMBOL(get_random_bytes);

This function is available to the rest of the kernel. It calls extract_crng in a loop until the provided buffer is filled, then uses the leftover bytes to re-key the crng state as backtrack protection.
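
For illustration, a consumer elsewhere in the kernel would look roughly like the sketch below; the function and key size are invented, only the get_random_bytes call is the real interface:

#include <linux/types.h>
#include <linux/random.h>

#define MY_KEY_SIZE 32	/* hypothetical */

static u8 my_session_key[MY_KEY_SIZE];

static void my_generate_session_key(void)
{
	/* Never fails; but early in boot the crng may not be fully seeded
	 * yet, which is what the DEBUG_RANDOM_BOOT warning above is about. */
	get_random_bytes(my_session_key, sizeof(my_session_key));
}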

/*
 * Add a callback function that will be invoked when the nonblocking
 * pool is initialised.
 *
 * returns: 0 if callback is successfully added
 *	    -EALREADY if pool is already initialised (callback not called)
 *	    -ENOENT if module for callback is not alive
 */
int add_random_ready_callback(struct random_ready_callback *rdy)
{
	struct module *owner;
	unsigned long flags;
	int err = -EALREADY;

	if (crng_ready())
		return err;

	owner = rdy->owner;
	if (!try_module_get(owner))
		return -ENOENT;

	spin_lock_irqsave(&random_ready_list_lock, flags);
	if (crng_ready())
		goto out;

	owner = NULL;

	list_add(&rdy->list, &random_ready_list);
	err = 0;

out:
	spin_unlock_irqrestore(&random_ready_list_lock, flags);

	module_put(owner);

	return err;
}
EXPORT_SYMBOL(add_random_ready_callback);

/*
 * Delete a previously registered readiness callback function.
 */
void del_random_ready_callback(struct random_ready_callback *rdy)
{
	unsigned long flags;
	struct module *owner = NULL;

	spin_lock_irqsave(&random_ready_list_lock, flags);
	if (!list_empty(&rdy->list)) {
		list_del_init(&rdy->list);
		owner = rdy->owner;
	}
	spin_unlock_irqrestore(&random_ready_list_lock, flags);

	module_put(owner);
}
EXPORT_SYMBOL(del_random_ready_callback);

These two functions are nothing too interesting: they add or remove an element from random_ready_list, pinning the owning module while the callback is registered.
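
A registration would look roughly like this, modeled on the DRBG usage in crypto/drbg.c; my_rng_ready and the module boilerplate are made up, the callback API is real:

#include <linux/module.h>
#include <linux/random.h>

static void my_rng_ready(struct random_ready_callback *rdy)
{
	pr_info("crng is ready, reseeding\n");
}

static struct random_ready_callback my_ready_cb = {
	.func	= my_rng_ready,
	.owner	= THIS_MODULE,
};

static int __init my_init(void)
{
	int err = add_random_ready_callback(&my_ready_cb);

	if (err == -EALREADY)
		my_rng_ready(&my_ready_cb);	/* crng already seeded */
	else if (err)
		return err;
	return 0;
}

static void __exit my_exit(void)
{
	del_random_ready_callback(&my_ready_cb);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");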

/*
 * This function will use the architecture-specific hardware random
 * number generator if it is available.  The arch-specific hw RNG will
 * almost certainly be faster than what we can do in software, but it
 * is impossible to verify that it is implemented securely (as
 * opposed, to, say, the AES encryption of a sequence number using a
 * key known by the NSA).  So it's useful if we need the speed, but
 * only if we're willing to trust the hardware manufacturer not to
 * have put in a back door.
 */
void get_random_bytes_arch(void *buf, int nbytes)
{
	char *p = buf;

	trace_get_random_bytes_arch(nbytes, _RET_IP_);
	while (nbytes) {
		unsigned long v;
		int chunk = min(nbytes, (int)sizeof(unsigned long));

		if (!arch_get_random_long(&v))
			break;

		memcpy(p, &v, chunk);
		p += chunk;
		nbytes -= chunk;
	}

	if (nbytes)
		get_random_bytes(p, nbytes);
}
EXPORT_SYMBOL(get_random_bytes_arch);

get_random_bytes_arch fills the buffer with CPU-provided random numbers until the buffer is full or the CPU-provided source is exhausted; in the latter case, it fills the rest of the buffer with crng-extracted data.

/*
 * init_std_data - initialize pool with system data
 *
 * @r: pool to initialize
 *
 * This function clears the pool's entropy count and mixes some system
 * data into the pool to prepare it for use. The pool is not cleared
 * as that can only decrease the entropy in the pool.
 */
static void init_std_data(struct entropy_store *r)
{
	int i;
	ktime_t now = ktime_get_real();
	unsigned long rv;

	r->last_pulled = jiffies;
	mix_pool_bytes(r, &now, sizeof(now));
	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
		if (!arch_get_random_seed_long(&rv) &&
		    !arch_get_random_long(&rv))
			rv = random_get_entropy();
		mix_pool_bytes(r, &rv, sizeof(rv));
	}
	mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
}

Mixes the current system time, enough CPU-provided random numbers to cover the whole pool (falling back to timestamps if the architectural RNG is unavailable), and the utsname structure (hostname and friends) into the entropy pool.

/*
 * Note that setup_arch() may call add_device_randomness()
 * long before we get here. This allows seeding of the pools
 * with some platform dependent data very early in the boot
 * process. But it limits our options here. We must use
 * statically allocated structures that already have all
 * initializations complete at compile time. We should also
 * take care not to overwrite the precious per platform data
 * we were given.
 */
static int rand_initialize(void)
{
	init_std_data(&input_pool);
	init_std_data(&blocking_pool);
	crng_initialize(&primary_crng);
	crng_global_init_time = jiffies;
	if (ratelimit_disable) {
		urandom_warning.interval = 0;
		unseeded_warning.interval = 0;
	}
	return 0;
}
early_initcall(rand_initialize);

Initializes both pools with init_std_data, initializes the crng, and configures the rate limiting of warnings. early_initcall schedules the function to run early in boot, before the regular initcall levels.

#ifdef CONFIG_BLOCK
void rand_initialize_disk(struct gendisk *disk)
{
	struct timer_rand_state *state;

	/*
	 * If kzalloc returns null, we just won't use that entropy
	 * source.
	 */
	state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
	if (state) {
		state->last_time = INITIAL_JIFFIES;
		disk->random = state;
	}
}
#endif

Sets up per-disk entropy collection state. Called from the block device subsystem.

static ssize_t
_random_read(int nonblock, char __user *buf, size_t nbytes)
{
	ssize_t n;

	if (nbytes == 0)
		return 0;

	nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
	while (1) {
		n = extract_entropy_user(&blocking_pool, buf, nbytes);
		if (n < 0)
			return n;
		trace_random_read(n*8, (nbytes-n)*8,
				  ENTROPY_BITS(&blocking_pool),
				  ENTROPY_BITS(&input_pool));
		if (n > 0)
			return n;

		/* Pool is (near) empty.  Maybe wait and retry. */
		if (nonblock)
			return -EAGAIN;

		wait_event_interruptible(random_read_wait,
			ENTROPY_BITS(&input_pool) >=
			random_read_wakeup_bits);
		if (signal_pending(current))
			return -ERESTARTSYS;
	}
}

This is the actual implementation of the read(2) syscall for /dev/random. It extracts entropy from the blocking pool, limiting the maximum number of bytes extracted to SEC_XFER_SIZE. If extraction cannot proceed and O_NONBLOCK is not set, it puts the reader to sleep until enough entropy accumulates in the input pool.
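
To make the -EAGAIN and short-read behavior concrete, a userspace sketch (error handling trimmed):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	unsigned char buf[16];
	int fd = open("/dev/random", O_RDONLY | O_NONBLOCK);
	ssize_t n;

	if (fd < 0)
		return 1;
	n = read(fd, buf, sizeof(buf));
	if (n < 0 && errno == EAGAIN)
		printf("blocking pool is empty, would have slept\n");
	else
		printf("got %zd bytes\n", n);	/* may be a short read */
	close(fd);
	return 0;
}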

static ssize_t
random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
	return _random_read(file->f_flags & O_NONBLOCK, buf, nbytes);
}

A wrapper that adapts to the expected signature of read in file_operations structure (i.e., as expected in the VFS).

static ssize_t
urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
{
	unsigned long flags;
	static int maxwarn = 10;
	int ret;

	if (!crng_ready() && maxwarn > 0) {
		maxwarn--;
		if (__ratelimit(&urandom_warning))
			printk(KERN_NOTICE "random: %s: uninitialized "
			       "urandom read (%zd bytes read)\n",
			       current->comm, nbytes);
		spin_lock_irqsave(&primary_crng.lock, flags);
		crng_init_cnt = 0;
		spin_unlock_irqrestore(&primary_crng.lock, flags);
	}
	nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
	ret = extract_crng_user(buf, nbytes);
	trace_urandom_read(8 * nbytes, 0, ENTROPY_BITS(&input_pool));
	return ret;
}

A urandom read, OTOH, extracts the entropy from the crng. Reads from urandom succeed even if the crng is still not initialized; in that case a warning is printed, up to ten times.

static unsigned int
random_poll(struct file *file, poll_table * wait)
{
	unsigned int mask;

	poll_wait(file, &random_read_wait, wait);
	poll_wait(file, &random_write_wait, wait);
	mask = 0;
	if (ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits)
		mask |= POLLIN | POLLRDNORM;
	if (ENTROPY_BITS(&input_pool) < random_write_wakeup_bits)
		mask |= POLLOUT | POLLWRNORM;
	return mask;
}

The implementation of poll(2) on /dev/random. The device is readable once the input pool holds at least random_read_wakeup_bits of entropy, and writable once it drops below random_write_wakeup_bits; only the POLLIN/POLLRDNORM and POLLOUT/POLLWRNORM events are reported.
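
A minimal userspace use of this, waiting until /dev/random becomes readable (a sketch):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct pollfd pfd;

	pfd.fd = open("/dev/random", O_RDONLY);
	if (pfd.fd < 0)
		return 1;
	pfd.events = POLLIN;

	/* Returns once ENTROPY_BITS(&input_pool) >= random_read_wakeup_bits */
	if (poll(&pfd, 1, -1) == 1 && (pfd.revents & POLLIN))
		printf("enough entropy to read without blocking\n");
	close(pfd.fd);
	return 0;
}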

static int
write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
{
	size_t bytes;
	__u32 t, buf[16];
	const char __user *p = buffer;

	while (count > 0) {
		int b, i = 0;

		bytes = min(count, sizeof(buf));
		if (copy_from_user(&buf, p, bytes))
			return -EFAULT;

		for (b = bytes ; b > 0 ; b -= sizeof(__u32), i++) {
			if (!arch_get_random_int(&t))
				break;
			buf[i] ^= t;
		}

		count -= bytes;
		p += bytes;

		mix_pool_bytes(r, buf, bytes);
		cond_resched();
	}

	return 0;
}

Mixes bytes from a user buffer into the entropy pool, first XORing them with random numbers from the CPU.

static ssize_t random_write(struct file *file, const char __user *buffer,
			    size_t count, loff_t *ppos)
{
	size_t ret;

	ret = write_pool(&input_pool, buffer, count);
	if (ret)
		return ret;

	return (ssize_t)count;
}

A thin wrapper around write_pool that implements the write(2) syscall on /dev/random and /dev/urandom. Both feed the user buffer into the input pool, without crediting any entropy.

static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
	int size, ent_count;
	int __user *p = (int __user *)arg;
	int retval;

	switch (cmd) {
	case RNDGETENTCNT:
		/* inherently racy, no point locking */
		ent_count = ENTROPY_BITS(&input_pool);
		if (put_user(ent_count, p))
			return -EFAULT;
		return 0;
	case RNDADDTOENTCNT:
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		if (get_user(ent_count, p))
			return -EFAULT;
		return credit_entropy_bits_safe(&input_pool, ent_count);
	case RNDADDENTROPY:
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		if (get_user(ent_count, p++))
			return -EFAULT;
		if (ent_count < 0)
			return -EINVAL;
		if (get_user(size, p++))
			return -EFAULT;
		retval = write_pool(&input_pool, (const char __user *)p,
				    size);
		if (retval < 0)
			return retval;
		return credit_entropy_bits_safe(&input_pool, ent_count);
	case RNDZAPENTCNT:
	case RNDCLEARPOOL:
		/*
		 * Clear the entropy pool counters. We no longer clear
		 * the entropy pool, as that's silly.
		 */
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		input_pool.entropy_count = 0;
		blocking_pool.entropy_count = 0;
		return 0;
	case RNDRESEEDCRNG:
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		if (crng_init < 2)
			return -ENODATA;
		crng_reseed(&primary_crng, NULL);
		crng_global_init_time = jiffies - 1;
		return 0;
	default:
		return -EINVAL;
	}
}

Defines the ioctl operations on /dev/random and /dev/urandom (a hypothetical RNDADDENTROPY invocation is sketched after the list). The following operations are supported:

  • RNDGETENTCNT - returns the estimated amount of entropy in the input pool, in bits;
  • RNDADDTOENTCNT - adjusts the entropy count of the input pool without adding any bytes;
  • RNDADDENTROPY - mixes user-provided bytes into the input pool and credits the entropy count accordingly;
  • RNDZAPENTCNT/RNDCLEARPOOL - clear the entropy counts of both pools (the pool contents are left intact);
  • RNDRESEEDCRNG - reseeds the global crng.
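
For concreteness, a hypothetical RNDADDENTROPY invocation could look like the sketch below; struct rand_pool_info and the ioctl number come from the UAPI header <linux/random.h>, the seed buffer and entropy estimate are made up:

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/random.h>

int main(void)
{
	unsigned char seed[32] = { 0 };	/* stand-in for bytes from a TRNG */
	struct rand_pool_info *info;
	int fd = open("/dev/random", O_WRONLY);

	if (fd < 0)
		return 1;
	info = malloc(sizeof(*info) + sizeof(seed));
	if (!info)
		return 1;
	info->entropy_count = 8 * sizeof(seed);	/* credited amount, in bits */
	info->buf_size = sizeof(seed);
	memcpy(info->buf, seed, sizeof(seed));

	/* Unlike a plain write(2), this both mixes the bytes in and credits
	 * the entropy count; requires CAP_SYS_ADMIN. */
	if (ioctl(fd, RNDADDENTROPY, info) < 0)
		return 1;
	free(info);
	close(fd);
	return 0;
}
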
static int random_fasync(int fd, struct file *filp, int on)
{
	return fasync_helper(fd, filp, on, &fasync);
}

Initializes data structures for signal-driven IO. This is near-generic code repeated in many drivers.

const struct file_operations random_fops = {
	.read  = random_read,
	.write = random_write,
	.poll  = random_poll,
	.unlocked_ioctl = random_ioctl,
	.fasync = random_fasync,
	.llseek = noop_llseek,
};

const struct file_operations urandom_fops = {
	.read  = urandom_read,
	.write = random_write,
	.unlocked_ioctl = random_ioctl,
	.fasync = random_fasync,
	.llseek = noop_llseek,
};

These data structures plug the functions we've seen before into the VFS. We can see that /dev/random and /dev/urandom differ only in the read operation and in poll support (urandom_fops has no poll callback); writes and ioctls of both operate on the same data structures.

SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
		unsigned int, flags)
{
	if (flags & ~(GRND_NONBLOCK|GRND_RANDOM))
		return -EINVAL;

	if (count > INT_MAX)
		count = INT_MAX;

	if (flags & GRND_RANDOM)
		return _random_read(flags & GRND_NONBLOCK, buf, count);

	if (!crng_ready()) {
		if (flags & GRND_NONBLOCK)
			return -EAGAIN;
		crng_wait_ready();
		if (signal_pending(current))
			return -ERESTARTSYS;
	}
	return urandom_read(NULL, buf, count, NULL);
}

This is a syscall for getting random numbers without opening device nodes in /dev. With GRND_RANDOM it behaves like a read from /dev/random; otherwise it behaves like a read from /dev/urandom, except that it blocks (unless GRND_NONBLOCK is set) until the crng is initialized.
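
A minimal caller, invoking the raw syscall (this assumes headers that know SYS_getrandom; newer libcs also provide a getrandom() wrapper):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	unsigned char key[32];
	long n = syscall(SYS_getrandom, key, sizeof(key), 0);

	/* flags == 0: urandom-like, but blocks until the crng is seeded */
	if (n != (long)sizeof(key)) {
		perror("getrandom");
		return 1;
	}
	printf("got %ld random bytes\n", n);
	return 0;
}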


/********************************************************************
 *
 * Sysctl interface
 *
 ********************************************************************/

#ifdef CONFIG_SYSCTL

#include <linux/sysctl.h>

static int min_read_thresh = 8, min_write_thresh;
static int max_read_thresh = OUTPUT_POOL_WORDS * 32;
static int max_write_thresh = INPUT_POOL_WORDS * 32;
static char sysctl_bootid[16];

/*
 * This function is used to return both the bootid UUID, and random
 * UUID.  The difference is in whether table->data is NULL; if it is,
 * then a new UUID is generated and returned to the user.
 *
 * If the user accesses this via the proc interface, the UUID will be
 * returned as an ASCII string in the standard UUID format; if via the
 * sysctl system call, as 16 bytes of binary data.
 */
static int proc_do_uuid(struct ctl_table *table, int write,
			void __user *buffer, size_t *lenp, loff_t *ppos)
{
	struct ctl_table fake_table;
	unsigned char buf[64], tmp_uuid[16], *uuid;

	uuid = table->data;
	if (!uuid) {
		uuid = tmp_uuid;
		generate_random_uuid(uuid);
	} else {
		static DEFINE_SPINLOCK(bootid_spinlock);

		spin_lock(&bootid_spinlock);
		if (!uuid[8])
			generate_random_uuid(uuid);
		spin_unlock(&bootid_spinlock);
	}

	sprintf(buf, "%pU", uuid);

	fake_table.data = buf;
	fake_table.maxlen = sizeof(buf);

	return proc_dostring(&fake_table, write, buffer, lenp, ppos);
}

This function backs both the boot_id and uuid files in procfs under /proc/sys/kernel/random/: for boot_id it generates the UUID once and caches it in table->data, while for uuid (where table->data is NULL) it generates a fresh one on every read.

/*
 * Return entropy available scaled to integral bits
 */
static int proc_do_entropy(struct ctl_table *table, int write,
			   void __user *buffer, size_t *lenp, loff_t *ppos)
{
	struct ctl_table fake_table;
	int entropy_count;

	entropy_count = *(int *)table->data >> ENTROPY_SHIFT;

	fake_table.data = &entropy_count;
	fake_table.maxlen = sizeof(entropy_count);

	return proc_dointvec(&fake_table, write, buffer, lenp, ppos);
}

The comment explains it all: the available entropy estimate, scaled to integral bits, is exposed in the /proc/sys tree.

static int sysctl_poolsize = INPUT_POOL_WORDS * 32;
extern struct ctl_table random_table[];
struct ctl_table random_table[] = {
	{
		.procname	= "poolsize",
		.data		= &sysctl_poolsize,
		.maxlen		= sizeof(int),
		.mode		= 0444,
		.proc_handler	= proc_dointvec,
	},
	{
		.procname	= "entropy_avail",
		.maxlen		= sizeof(int),
		.mode		= 0444,
		.proc_handler	= proc_do_entropy,
		.data		= &input_pool.entropy_count,
	},
	{
		.procname	= "read_wakeup_threshold",
		.data		= &random_read_wakeup_bits,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= &min_read_thresh,
		.extra2		= &max_read_thresh,
	},
	{
		.procname	= "write_wakeup_threshold",
		.data		= &random_write_wakeup_bits,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= &min_write_thresh,
		.extra2		= &max_write_thresh,
	},
	{
		.procname	= "urandom_min_reseed_secs",
		.data		= &random_min_urandom_seed,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{
		.procname	= "boot_id",
		.data		= &sysctl_bootid,
		.maxlen		= 16,
		.mode		= 0444,
		.proc_handler	= proc_do_uuid,
	},
	{
		.procname	= "uuid",
		.maxlen		= 16,
		.mode		= 0444,
		.proc_handler	= proc_do_uuid,
	},
#ifdef ADD_INTERRUPT_BENCH
	{
		.procname	= "add_interrupt_avg_cycles",
		.data		= &avg_cycles,
		.maxlen		= sizeof(avg_cycles),
		.mode		= 0444,
		.proc_handler	= proc_doulongvec_minmax,
	},
	{
		.procname	= "add_interrupt_avg_deviation",
		.data		= &avg_deviation,
		.maxlen		= sizeof(avg_deviation),
		.mode		= 0444,
		.proc_handler	= proc_doulongvec_minmax,
	},
#endif
	{ }
};
#endif 	/* CONFIG_SYSCTL */

Other variables are available in the /proc/sys tree as well: the input pool size, the thresholds for waking up readers and writers of /dev/random, the reseed interval for /dev/urandom, the system boot uuid, a generator of random uuids, and two knobs of the interrupt entropy collector benchmark.
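
E.g., the entropy estimate can be read back like this (a trivial sketch):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
	int bits;

	if (!f)
		return 1;
	if (fscanf(f, "%d", &bits) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);
	printf("input pool holds an estimated %d bits\n", bits);
	return 0;
}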

struct batched_entropy {
	union {
		unsigned long entropy_long[CHACHA20_BLOCK_SIZE / sizeof(unsigned long)];
		unsigned int entropy_int[CHACHA20_BLOCK_SIZE / sizeof(unsigned int)];
	};
	unsigned int position;
};

This data structure amortizes the cost of generating random long and int values: a whole ChaCha20 block is extracted at once and then handed out 4-8 bytes at a time, instead of paying for a full extraction per call.

/*
 * Get a random word for internal kernel use only. The quality of the random
 * number is either as good as RDRAND or as good as /dev/urandom, with the
 * goal of being quite fast and not depleting entropy.
 */
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_long);
unsigned long get_random_long(void)
{
	unsigned long ret;
	struct batched_entropy *batch;

	if (arch_get_random_long(&ret))
		return ret;

	batch = &get_cpu_var(batched_entropy_long);
	if (batch->position % ARRAY_SIZE(batch->entropy_long) == 0) {
		extract_crng ( (u8 *)batch->entropy_long );
		batch->position = 0;
	}
	ret = batch->entropy_long[batch->position++];
	put_cpu_var(batched_entropy_long);
	return ret;
}
EXPORT_SYMBOL(get_random_long);

The batched_entropy structure has an instance per CPU; it is bypassed entirely if a CPU-provided random number is available. Otherwise, the function takes one long value out of the buffer and advances the index; when the index wraps around the buffer size, the buffer is refilled from the crng.
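
A single-threaded userspace model of the batching logic; fake_extract stands in for extract_crng, and the per-CPU handling (get_cpu_var/put_cpu_var) is omitted:

#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 64	/* CHACHA20_BLOCK_SIZE */

struct batched_entropy {
	unsigned long entropy_long[BLOCK_SIZE / sizeof(unsigned long)];
	unsigned int position;
};

/* stand-in for extract_crng() */
static void fake_extract(unsigned char *out)
{
	for (int i = 0; i < BLOCK_SIZE; i++)
		out[i] = rand() & 0xff;
}

static unsigned long model_get_random_long(struct batched_entropy *b)
{
	size_t n = sizeof(b->entropy_long) / sizeof(b->entropy_long[0]);

	/* position == 0 on the first call, or after wrapping: refill */
	if (b->position % n == 0) {
		fake_extract((unsigned char *)b->entropy_long);
		b->position = 0;
	}
	return b->entropy_long[b->position++];
}

int main(void)
{
	struct batched_entropy b = { .position = 0 };

	for (int i = 0; i < 10; i++)
		printf("%lx\n", model_get_random_long(&b));
	return 0;
}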

#if BITS_PER_LONG == 32
unsigned int get_random_int(void)
{
	return get_random_long();
}
#else
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_int);
unsigned int get_random_int(void)
{
	unsigned int ret;
	struct batched_entropy *batch;

	if (arch_get_random_int(&ret))
		return ret;

	batch = &get_cpu_var(batched_entropy_int);
	if (batch->position % ARRAY_SIZE(batch->entropy_int) == 0) {
		extract_crng( (u8 *)batch->entropy_int );
		batch->position = 0;
	}
	ret = batch->entropy_int[batch->position++];
	put_cpu_var(batched_entropy_int);
	return ret;
}
#endif
EXPORT_SYMBOL(get_random_int);

Works the same way for ints as for longs; on 32-bit systems an int and a long have the same size, so get_random_int simply calls get_random_long.

/**
 * randomize_page - Generate a random, page aligned address
 * @start:	The smallest acceptable address the caller will take.
 * @range:	The size of the area, starting at @start, within which the
 *		random address must fall.
 *
 * If @start + @range would overflow, @range is capped.
 *
 * NOTE: Historical use of randomize_range, which this replaces, presumed that
 * @start was already page aligned.  We now align it regardless.
 *
 * Return: A page aligned address within [start, start + range).  On error,
 * @start is returned.
 */
unsigned long
randomize_page(unsigned long start, unsigned long range)
{
	if (!PAGE_ALIGNED(start)) {
		range -= PAGE_ALIGN(start) - start;
		start = PAGE_ALIGN(start);
	}

	if (start > ULONG_MAX - range)
		range = ULONG_MAX - start;

	range >>= PAGE_SHIFT;

	if (range == 0)
		return start;

	return start + (get_random_long() % range << PAGE_SHIFT);
}

Generates a random page-aligned address in a given range, clipping the range if start + range would overflow.
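
The same arithmetic, modeled in userspace; the PAGE_SHIFT value and the random() stand-in are my assumptions, and as in the kernel, the range is assumed to exceed the alignment gap:

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static unsigned long model_randomize_page(unsigned long start,
					  unsigned long range)
{
	unsigned long aligned = (start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

	range -= aligned - start;	/* shrink range by the alignment gap */
	start = aligned;

	if (start > ~0UL - range)	/* clip range on overflow */
		range = ~0UL - start;

	range >>= PAGE_SHIFT;		/* count whole pages in the range */
	if (range == 0)
		return start;

	/* random() stands in for get_random_long() in this model */
	return start + (((unsigned long)random() % range) << PAGE_SHIFT);
}

int main(void)
{
	printf("%#lx\n", model_randomize_page(0x7f0000001000UL, 1UL << 28));
	return 0;
}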

/* Interface for in-kernel drivers of true hardware RNGs.
 * Those devices may produce endless random bits and will be throttled
 * when our pool is full.
 */
void add_hwgenerator_randomness(const char *buffer, size_t count,
				size_t entropy)
{
	struct entropy_store *poolp = &input_pool;

	if (unlikely(crng_init == 0)) {
		crng_fast_load(buffer, count);
		return;
	}

	/* Suspend writing if we're above the trickle threshold.
	 * We'll be woken up again once below random_write_wakeup_thresh,
	 * or when the calling thread is about to terminate.
	 */
	wait_event_interruptible(random_write_wait, kthread_should_stop() ||
			ENTROPY_BITS(&input_pool) <= random_write_wakeup_bits);
	mix_pool_bytes(poolp, buffer, count);
	credit_entropy_bits(poolp, entropy);
}
EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);

In-kernel HWRNGs feed the crng directly until it is initialized; after that, the hardware-provided bytes are mixed into the input pool and credited, with the writer throttled while the pool is above the write-wakeup threshold.
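
A hypothetical feeder, in the style of the hw_random core; my_hw_read and the entropy estimate are made up, add_hwgenerator_randomness is the real interface:

#include <linux/kthread.h>
#include <linux/random.h>

extern ssize_t my_hw_read(char *buf, size_t len);	/* hypothetical driver callback */

static int my_hwrng_fill(void *unused)
{
	char buf[32];

	while (!kthread_should_stop()) {
		ssize_t n = my_hw_read(buf, sizeof(buf));

		if (n <= 0)
			break;
		/* Credit 4 bits of entropy per byte, a conservative 50%
		 * estimate; blocks while the input pool is above the
		 * write-wakeup threshold. */
		add_hwgenerator_randomness(buf, n, n * 4);
	}
	return 0;
}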

This is it for the Linux RNG source. I expect that the code will be simplified a lot when FG is used as the underlying entropy provider: first, the input and output pools can be replaced by a single ring buffer. The crng may still stay, unless the system works acceptably without it. The entropy collection routines will go away. All of this should sum up to a nice readability improvement and size reduction7.

  1. I will not try to give a definition of what is called 'entropy' in Linux, given the amount of crap that gets mixed to derive this entropy. []
  2. In fact, they use the same randomness source, but they are not implemented as wrappers. []
  3. The readers here are userspace applications that attempt to read /dev/random -- even though they actually read data from the blocking pool. []
  4. Whitening primary crng state in the process. []
  5. Lol. []
  6. Not ent, though. []
  7. Of course, at the cost of working with FG inside the kernel, which currently cannot be done easily, and, depending on the kernel version chosen as genesis, may require a backport from a rather recent kernel. []

6 Responses to “Bits and pieces of Linux RNG”

  1. spyked says:
    1

    IMHO it would be worth trying out (similarly to how Mircea Popescu points out) the following experiments:

    * Removing the /dev/random node and replacing it with a hard-link to /dev/fg (or /dev/ttyUSB0, assuming your FG is connected via USB); and
    * Trying to disable "randomness" altogether in the Linux kernel and seeing what effect this has on a. kernel compilation; and b. user space

    The second experiment might take a lot of work, since from what I see there's no Kconfig "CONFIG_RANDOM" knob, as you'd expect -- after all, /dev/random is not even a POSIXism, so the impossibility to build a kernel without it is mind-boggling.

  2. bvt says:
    2

    I think we now have a plan to follow. Re. experiments:
    1. This would affect only userspace applications; the in-kernel entropy users would still use the old infrastructure. Other than that, and modulo exotica a la haveged/rngd, this should work.
    2. This is certainly interesting: which other interfaces that the kernel provides to userspace malfunction without entropy? Though this change may indeed touch too many places in the kernel, if done properly. The easiest hacky way to achieve this would seem to be disabling the entropy collection functions; but the crng that backs in-kernel users is designed to be available even when zero entropy is available in the entropy pools, so no, this strawman does not work. (And funnily enough, urandom_init_wait is totally unused.)

  3. @bvt / 2 : currently I suspect that nuking the PRNG will expose -- at the very least -- interesting bugs in the interrupt handlers. (Why do these demand randomization to begin with?)

  4. bvt says:
    4

    @Stanislav,
    The interrupt handlers should not require randomization (i.e. they are not entropy consumers), it's their timing (in jiffies and CPU cycles) that is used as a randomness source.

  5. Jacob Welsh says:
    5

    Removing the /dev/random node and replacing it with a hard-link to /dev/fg (or /dev/ttyUSB0, assuming your FG is connected via USB);

    I've tried this experiment (as mentioned elsewhere) for feeding GPG, which reads from both random and urandom. It works, but there's a pitfall for using with arbitrary programs, per random(4):

    When calling read(2) for the device /dev/urandom, reads of up to 256 bytes will return as many bytes as are requested and will not be interrupted by a signal handler.

    E.g., a technically compliant program might not check the returned byte count, and pass off uninitialized bufferolade as entropy if fed from a tty. And in further sad,

    Reads from this device do not block (i.e., the CPU is not yielded)

    which would seem impossible to satisfy with FG, though I'm not sure of the implications.

  6. bvt says:
    6

    When calling read(2) for the device /dev/urandom, reads of up to 256 bytes will return as many bytes as are requested and will not be interrupted by a signal handler.

    E.g., a technically compliant program might not check the returned byte count, and pass off uninitialized bufferolade as entropy if fed from a tty. And in further sad,

    Yes, this is a sad assumption, and I will take this into account in the new RNG.

    Reads from this device do not block (i.e., the CPU is not yielded).

    It indeed does not block, but does yield the CPU -- the call to schedule() in extract_crng_user allows other processes to run on the core. In this context, the link from /dev/ttyUSBx to /dev/urandom will still work unless the process does something braindamaged - I can't even come up with a case where the application would care whether it blocked or not when reading from the device (for which there is explicit documentation that reads do not block). The latency is not an issue, because FG continuously supplies fresh random bits.

    And anyhow, the above applies only to urandom, for which the new design also uses a CPRNG.
