TOTP: How Most 2FA Apps Work
by Steve Marx on
If you’ve ever used a two-factor auth (2FA) app like Google Authenticator, you’ve probably used the “time-based one-time password” (TOTP) algorithm. It’s a conceptually simple (and awesome!) algorithm, but it’s harder to implement than you’d expect.
Read on, and I’ll show you how it works.
Just to make sure I got all of this right, give the demo below a whirl. It should work with whatever 2FA app you have on your phone.
What problem is TOTP solving?
TOTP is most commonly used as a second factor in authentication. Your password is the first factor, “something you know”, and your device is the second factor, “something you have”. You prove that you have the device by generating a one-time password on it. As the name suggests, a one-time password won’t work later. This means somebody who wants to break into your account really needs to steal your device—or at least the secret stored on it.
A further design requirement for TOTP1 was that it produce short, numeric codes. These are easy for a user to type.
The TOTP algorithm
The basic idea behind TOTP is to use a shared secret to produce digital signatures2 of timestamps. At a high level, the algorithm is as follows:
- The secret is produced on the server and communicated to the 2FA client. This is usually done via a QR code, but you can also just type it in.
- To authenticate, both the server and client use that secret to produce a signature of the current time. An exact time would be hard to match, so the timestamp is (typically) at the granularity of 30 second intervals.
- The user enters the signature produced on the client. If it matches the value the server produced, then the user has been successfully authenticated.
The signature method used is a hash-based message authentication code (HMAC), specifically HMAC-SHA1. A secure message authentication codes (MAC) is unforgeable, so producing a valid MAC is proof that you are in possession of the correct secret key.
In pseudocode, producing a one-time password looks like this:
truncate( HMAC-SHA1( secret, uint64(time() / INTERVAL) ) ) mod 10**DIGITS
time() yields the number of seconds since the Unix epoch,
INTERVAL is the number of seconds each one-time password should be valid, and
DIGITS is the length of the code.
The only part I haven’t explained is the
truncate() function, and that’s because it’s awful. (But I’ll explain it in the next section.)
truncate(). It uses the jsSHA library:
// compute a random secret const secret = new Uint8Array(20); crypto.getRandomValues(secret); // HMAC-SHA1 with our generated secret const hmac = new jsSHA("SHA-1", "HEX"); hmac.setHMACKey(secret, "UINT8ARRAY"); // current time, in seconds since the Unix epoch const time = Math.floor((new Date()).getTime() / 1000); // how many 30-second intervals const interval = Math.floor(time / 30); // pad to 8 bytes = 16 hex characters hmac.update(interval.toString(16).padStart(16, "0")); const digest = hmac.getHMAC("UINT8ARRAY");
The silliness of TOTP’s truncate()
At this point, we have
digest, the 20-byte signature from HMAC-SHA1. We need to get to a short numeric code that a user can type in. This is typically a 6-digit number.
You might be thinking to yourself, “That seems easy. Let’s just treat the digest as a number and take the last 6 digits.” If you were thinking that, congratulations! You just invented a better algorithm than TOTP.
- Grab the last 4 bits of the digest. Call that
- Starting at byte
offset, extract 32 bits of the digest.
- Clear the first bit of that result.
For some reason, this process is called “truncation” in TOTP, even though it has nothing to do with truncation.
Here’s some code that does it:
// "truncate" to 31 bits const offset = digest & 0xF; // last 4 bits // take 4 bytes starting at the specified byte offset, // but chop off the first bit so we have only 31 bits const v = ((digest[offset] & 0x7F) << 24) + // 0x7F = 01111111 (digest[offset + 1] << 16) + (digest[offset + 2] << 8) + digest[offset + 3];
This is probably the most difficult part of the implementation to get right. If you’re reading this because you’re struggling with your own TOTP implementation, I hope the preceding code is helpful.
With this “truncated” value in hand, we can finally extract our 6-digit code:
// take the last 6 digits, with leading 0s as necessary const codeString = (v % 10**6).toString(10).padStart(6, "0");
What about the QR code?
To me, one of the most magical parts of using TOTP is the QR code. Just point your phone at the screen, and you’re set! The QR code saves the user the trouble of having to type in the secret that was generated on the server.
The content of the QR code is simply this:
LABEL is just a friendly name that shows up in your 2FA app. In my demo above, the label is always “smarx-totp-demo”.
SECRET is the shared secret that the server generated. It’s encoded in base32, which I’ll describe in the next section.
Other parameters are possible, such as the number of digits the code should be, but they seem to be rarely used in the wild.
Here’s the part of my demo code that displays the QR code, using the QRCode.js library:
new QRCode( document.getElementById("qrcode"), "otpauth://totp/smarx-totp-demo?secret=" + base32.encode(secret));
Secret encoding: base32
A cryptographic secret is just a bunch of random bytes. If such a secret is to be displayed to the user, or transmitted through a URL, it needs to be encoded somehow. The most typical encodings you’ll see are hexadecimal, which is straightforward, and base64, which is more compact.
TOTP uses base32 as something of a compromise. It isn’t as efficient as base64—secrets will take more characters to encode—but it avoids some visual ambiguities of base64. For example, a base64-encoded value might include the sequence “Il1O0”. Depending on the font, you might not be sure what each character is.
There’s no official standard for what alphabet is used for base32 encoding, but apps that use TOTP seem to have standardized around the following from RFC 4648:
It’s just the English alphabet with the digits 2-7 added to the end.
Tip: Cryptography and encoding
Getting encoding and decoding right is one of the most frustrating parts of implementing cryptography.
A good rule of thumb is to, as much as possible, work directly with bytes. Treat hexadecimal, base64, etc., as serialization formats. You should use them to transport or store values, e.g. in a URL or a JSON configuration file. But internally, you want to use bytes.
This keeps things simpler and helps to avoid mistakes like hashing a hexadecimal string when you meant to hash the actual bytes it represents.
Universal 2nd Factor (U2F) is better
Aside from the silly truncation process, TOTP suffers from a genuine security drawback: it’s based on symmetric cryptography.
Symmetric cryptography just means there’s a single secret that both the server and the client know. If a hacker can read this secret from the server’s database, they can generate 2FA codes themselves.
Asymmetric (or “public key”) cryptography would be better. In that model, the user generates a private key and shares the corresponding public key with the server at the time of sign-up. From then on, the server can always authenticate the user by issuing a challenge (e.g. “sign this random string”) that can only be solved by knowing the private key. Because the private key stays on the local device, there’s no opportunity for it to be stolen by a hacker.
This is precisely how Universal 2nd Factor (U2F) works. With the addition of hardware dongles that do the typing for you, this solution can be both more convenient and more secure.
Let me know if you’d be interested in a follow-up post digging into U2F!
- RFC 4226: HOTP spec
- RFC 6238: TOTP spec
- RFC 4648: the base16, base32, and base64 data encodings
- Why You Should Never Use Google Authenticator Again
Technically, TOTP was designed to use the HMAC-based one-time password (HOTP) algorithm, which in turn had this design goal. ↩︎
I’m being a little sloppy with terminology here. At least under some definitions, a “digital signature” refers only to the use of public key cryptography. This would more accurately be called a message authentication code (MAC), but I think “signature” is easier to understand. ↩︎
Again, technically this is inherited from HOTP. ↩︎