BIP39: Mnemonics for Recording Long Keys

by Steve Marx on August 26, 2020

If you’ve worked with a blockchain like Bitcoin or Ethereum, you’ve probably been given a “seed phrase” or “mnemonic”, a phrase of a dozen or more words like this:

[Image: a handwritten mnemonic phrase. Caption: Yes, that’s my handwriting. No, that’s not my mnemonic phrase.]

This phrase can be used to recover your blockchain account, so you might imagine it’s some sort of encoding of your private key. This is pretty much right, but there’s a little more to it.

This type of seed phrase is described by Bitcoin Improvement Proposal 39 (BIP39), and it has a number of desirable properties for its intended use case: securing a private key, particularly using offline storage.

Algorithm

The BIP39 algorithm to generate a new seed is as follows:

1. Generate the desired number of bits of entropy.
2. Append a checksum.
3. Encode using a standard word list.
4. Use PBKDF2 with an optional password to derive a seed.
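
To make step 1 concrete, here is a minimal sketch of generating entropy and working out how many words it will produce. The function names `newEntropy` and `wordCount` are mine, not from BIP39 or the full source code; the size limits (128 to 256 bits, in multiples of 32) and the one-checksum-bit-per-32-bits rule come from the BIP39 specification.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
)

// newEntropy returns bits/8 random bytes. BIP39 allows 128 to 256 bits
// of entropy, in multiples of 32. (Hypothetical helper name.)
func newEntropy(bits int) ([]byte, error) {
	if bits < 128 || bits > 256 || bits%32 != 0 {
		return nil, errors.New("entropy must be 128-256 bits, in multiples of 32")
	}
	entropy := make([]byte, bits/8)
	if _, err := rand.Read(entropy); err != nil {
		return nil, err
	}
	return entropy, nil
}

// wordCount returns how many words encode the given entropy size:
// one checksum bit per 32 bits of entropy, then 11 bits per word.
func wordCount(entropyBits int) int {
	checksumBits := entropyBits / 32
	return (entropyBits + checksumBits) / 11
}

func main() {
	for bits := 128; bits <= 256; bits += 32 {
		fmt.Printf("%d bits of entropy -> %d words\n", bits, wordCount(bits))
	}
}
```

The familiar 12-word and 24-word phrases correspond to 128 and 256 bits of entropy.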

The seed can then be used directly as a private key, but in typical blockchain usage, it’s actually the seed for BIP32: Hierarchical Deterministic Wallets. BIP32 is cool in its own right, but I’m not going to describe it here.

Why all the steps?

Before I get into the details of implementing BIP39, I want to take a moment to explain what these steps are doing in terms of user benefits:

Checksum

BIP39 was conceived as a way for users to keep their cryptocurrency holdings safe. The checksum helps in two ways:

- It protects against mistakes. Typing an incorrect word is unlikely to yield a valid seed phrase because the checksum wouldn’t match. (The carefully chosen word list also helps prevent typos.)
- It helps with recovery. If you write down your seed phrase on a napkin but then immediately wipe mustard off of your mustache, obscuring one of the words, don’t despair! The checksum rules out most of the 2048 possible words for the missing spot, leaving a short list of candidates to try.
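
Here is a rough sketch of what that recovery could look like. It works on word-list indices (0 to 2047) instead of the words themselves so that it doesn't need the word list file, and the names `validIndices` and `candidates` are made up for this example. It simply tries every possible word in the missing position and keeps the ones whose checksum matches.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// validIndices reports whether a mnemonic, given as 11-bit word-list
// indices, carries a correct checksum. (Hypothetical helper name.)
func validIndices(indices []int) bool {
	// Pack the indices back into one big number, 11 bits per word.
	packed := new(big.Int)
	for _, idx := range indices {
		packed.Lsh(packed, 11)
		packed.Or(packed, big.NewInt(int64(idx)))
	}

	totalBits := uint(len(indices) * 11)
	checksumBits := totalBits / 33 // 1 checksum bit per 32 entropy bits
	entropyBits := totalBits - checksumBits

	// The low bits are the checksum; everything above them is the entropy.
	mask := new(big.Int).Lsh(big.NewInt(1), checksumBits)
	mask.Sub(mask, big.NewInt(1))
	got := new(big.Int).And(packed, mask)
	entropy := new(big.Int).Rsh(packed, checksumBits).FillBytes(make([]byte, entropyBits/8))

	// Recompute the checksum from the entropy and compare.
	hash := sha256.Sum256(entropy)
	want := new(big.Int).SetBytes(hash[:])
	want.Rsh(want, 256-checksumBits)
	return got.Cmp(want) == 0
}

// candidates tries every word index in the missing position and keeps
// the ones that produce a valid checksum.
func candidates(indices []int, missing int) []int {
	trial := append([]int(nil), indices...)
	var found []int
	for w := 0; w < 2048; w++ {
		trial[missing] = w
		if validIndices(trial) {
			found = append(found, w)
		}
	}
	return found
}

func main() {
	// All-zero 128-bit entropy encodes to eleven 0s and a final 3.
	phrase := []int{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3}
	fmt.Println(len(candidates(phrase, 11)), "words fit the last position")
}
```

With a 12-word phrase there are only 4 checksum bits, so roughly one in sixteen guesses passes. You would still need another way to pick the right candidate, such as checking which one leads to an account you recognize.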

A good word list

The encoding step involves translating a big long number into words. This is because it’s easier to write down or remember a phrase in your native language.

The word list is carefully selected to help avoid mistakes. Similar words (like “bean” and “bear”) are avoided so sloppy handwriting isn’t an issue, and the first four letters of each word are enough to identify it uniquely.
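
The four-letter property is easy to check for yourself. The sketch below is an illustration with made-up helper names (`prefix` and `clashes`), and it assumes plain ASCII words like those in the English list. It reports any pair of words that share the same first n letters.

```go
package main

import "fmt"

// prefix returns the first n letters of a word, or the whole word if
// it is shorter. Assumes ASCII words, as in the English word list.
func prefix(word string, n int) string {
	if len(word) < n {
		return word
	}
	return word[:n]
}

// clashes returns pairs of words that share the same first n letters.
func clashes(words []string, n int) [][2]string {
	seen := make(map[string]string)
	var out [][2]string
	for _, w := range words {
		p := prefix(w, n)
		if prev, ok := seen[p]; ok {
			out = append(out, [2]string{prev, w})
			continue
		}
		seen[p] = w
	}
	return out
}

func main() {
	// The first five words of the English list.
	sample := []string{"abandon", "ability", "able", "about", "above"}
	fmt.Println(clashes(sample, 4))

	// "bean" and "bear" only differ in the fourth letter.
	fmt.Println(clashes([]string{"bean", "bear"}, 3))
}
```

Running `clashes` over the full 2048-word list with n set to 4 should come back empty.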

Key derivation function

The final step gives the user the option to add a password. This protects the secret even if the seed phrase is compromised.

This step is also computationally expensive (as all password hashing should be) to slow down attackers trying to guess weak passwords.

Implementation details

I outlined the algorithm in broad strokes. In this section, I’ll give the details of the algorithm along with working Go code that implements it.

Checksum

The checksum is the first few bits of the SHA256 hash of the entropy: one checksum bit for every 32 bits of entropy. This ensures that the total number of bits in entropy + checksum is a multiple of 11. We’ll see why in the next section.

In Go:

// Requires "crypto/sha256" and "math/big".
func computeChecksum(entropy []byte) *big.Int {
	// One checksum bit per 32 bits of entropy: (len * 8) / 32 == len / 4.
	checksumBits := uint(len(entropy) / 4)

	hash := sha256.Sum256(entropy)

	checksum := new(big.Int).SetBytes(hash[:])
	// Right-shift until only the first checksumBits bits remain.
	checksum.Rsh(checksum, uint(len(hash)*8)-checksumBits)

	return checksum
}

Encoding

With the checksum appended, the whole sequence is encoded in base-2048. This just means each group of 11 bits (2¹¹ = 2048) is replaced with a word from the word list, which is why the total number of bits has to be a multiple of 11.

In Go:

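// bytes is a *big.Int holding the entropy with the checksum appended.
wordList := loadWords("english-wordlist.txt")

mnemonic := make([]string, mnemonicLength)
// We're taking bits from the right each time, so our loop has to fill
// the array backwards.
for i := int(mnemonicLength - 1); i >= 0; i-- {
    // Split off the lowest 11 bits: bytes becomes bytes / 2048 and m
    // receives the remainder, which is the index of the next word.
    m := new(big.Int)
    bytes.DivMod(bytes, big.NewInt(2048), m)

    mnemonic[i] = wordList[m.Uint64()]
}

return strings.Join(mnemonic, " ")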
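
If you want to try the checksum and encoding steps without the word list file, the following self-contained sketch combines them and stops at the word indices. The name `toIndices` is my own for this example and is not part of the full source code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// toIndices appends the checksum to the entropy and splits the result
// into 11-bit word-list indices. (Hypothetical helper name.)
func toIndices(entropy []byte) []int {
	// One checksum bit per 32 bits of entropy.
	checksumBits := uint(len(entropy) / 4)
	hash := sha256.Sum256(entropy)
	checksum := new(big.Int).SetBytes(hash[:])
	checksum.Rsh(checksum, 256-checksumBits)

	// Shift the entropy left to make room, then OR in the checksum.
	data := new(big.Int).SetBytes(entropy)
	data.Lsh(data, checksumBits)
	data.Or(data, checksum)

	// Peel off 11 bits at a time from the right, filling backwards.
	count := (len(entropy)*8 + int(checksumBits)) / 11
	indices := make([]int, count)
	base := big.NewInt(2048)
	rem := new(big.Int)
	for i := count - 1; i >= 0; i-- {
		data.DivMod(data, base, rem)
		indices[i] = int(rem.Int64())
	}
	return indices
}

func main() {
	// 16 zero bytes is the first entry in the BIP39 test vectors.
	fmt.Println(toIndices(make([]byte, 16)))
}
```

For all-zero entropy, every index is 0 except the last one, which carries the checksum. That matches the well-known test phrase made of “abandon” (word 0) eleven times followed by “about” (word 3).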

Key derivation with PBKDF2

PBKDF2 is used to derive the final seed. The mnemonic phrase is used as the password, and the salt is the string “mnemonic” with an optional password appended. 2048 rounds of HMAC-SHA512 are applied, and the derived key is 64 bytes long.

In Go:

func deriveSeed(mnemonic, password string) []byte {
	return pbkdf2.Key(
		[]byte(mnemonic),            // password
		[]byte("mnemonic"+password), // salt
		2048,                        // iterations
		64,                          // key length
		sha512.New)                  // hash function
}

Full source code

You can find the complete source code here: https://github.com/smarx/bip39.

Let’s use BIP39 more

BIP39 does a lot of great things:

- It makes a long seed easy to record, remember, and communicate, especially on paper.
- It helps prevent errors and recover from the ones that do occur.
- It safeguards the secret against brute-force cracking.

I haven’t seen it used outside of blockchain applications yet, but I’d like to see it used in other cryptography contexts, like securing PGP private keys and verifying fingerprints (e.g. Signal’s “safety numbers”).
