PMAC - A Parallelizable MAC - Background

PMAC: Background

What is PMAC?

It's a Parallelizable MAC.

So what's a MAC?

The acronym stands for message authentication code. A (deterministic) message authentication code is a function of two strings. The first we'll call the key and the second we'll call the message. The value Tag= MAC(K,M) is a string of some fixed length (e.g., 64 bits). A MAC is usually used like this. A sender and a receiver share a key K. When the sender wants to send a message M to the receiver, he sends along with M the string Tag=MAC(K,M). When the receiver receives a message M* and its tag Tag*, the receiver computes the "expected" tag Tag**=MAC(K,M*) for M*. If the expected tag is identical to the received tag then the receiver assumes that the received message M* really did originate with the sender. If the tags are different then the receiver regards the message M* as a forgery. In that case the receiver may want to ignore the message, or close down the connection, or take some other action.

What's the traditional way to make a MAC?

The most well known way to MAC uses a block cipher, say AES, and is called the CBC MAC. (The CBC stands for cipher block chaining.) It's defined, for example, in ISO 9797, ANSI X9.9, and ANSI X9.19. In what I'll call the basic CBC MAC, the message M must have a number of bits which is a positive multiple of the block length (128 bits for AES). The message is regarded as a sequence of 128-bit words, M=M[1]M[2]...M[m]. The MAC, let us write CBC(K,M), is defined as C[m], where C[i]=AES(K,C[i-1] xor M[i]) for i>0 and and C[0]=ZERO (the block of 128 zero-bits).

The basic CBC MAC is usually "embellished" a bit. First, the domain is extended to arbitrary bits strings, usually by padding with a 1-bit and then the minimal number of 0-bits to reach a multiple of 128 bits. Second, a particular security problem is fixed. The basic CBC MAC works correctly when all the strings you want to MAC have one fixed length (this is a well-known result of [Bellare, Kilian, and Rogaway])) but it does not work correctly when message lengths may vary. (For example, given the MAC (under some unknown key) of ZERO, do you see how to forge the string ZERO||ZERO?) To fix this problem extra processing is often done to C[m] (as it was defined above), such as enciphering it under a separate key. Third, many standards allow you to take a prefix of the full n-bit MAC (where n is the block length). The three changes above represent one way to fix up the basic CBC MAC. A more recent suggestion due to [Black, Rogaway] more efficiently deals with padding and length-variability.

Are there ways to MAC besides the CBC MAC?

Yes, numerous ways. The most well-known construction is HMAC, which is due to [Bellare, Canetti, Krawczyk]. It has been adopted in IPSec, and was recently made a draft FIPS standard. Another MAC is UMAC, which is due to [Black, Halevi, Krawczyk, Krovetz, Rogaway]. It's the reigning speed champion for modern CPUs.

Why was PMAC invented?

PMAC addresses a particular problem of the CBC MAC: that it is inherently sequential. HMAC is the same way. Going back to the earlier description of the basic CBC MAC, note that you can't compute C[i] until you've finished computing C[i-1]. This "bottleneck" can be a problem if you want to MAC at very high speeds. Dedicated hardware will be limited in speed by the latency of the underlying cryptographic primitive. And modern processors which offer up lots of registers and instruction-dispatch units will be limited by the amount of parallelism there is to exploit in the underlying block cipher. PMAC was designed to address this issue, being fully parallelizable. In saying that PMAC is fully parallelizable we mean that you are effectively unlimited in how much parallelism you can extract.

We didn't want to gain parallel efficiency at the cost of serial efficiency, and PMAC is about as efficient as the CBC MAC in serial environments. In making a modern MAC, we were also careful to fix the limitations of the basic CBC MAC, giving a construction that works correctly across arbitrary bit strings.

PMAC was designed largely in response to NIST's modes-of-operation activities. In particular, we thought it would be nice to provide NIST with a parallelizable MAC that looked reasonably conventional. In particular, one route towards making a parallelizable MAC is the "Carter-Wegman approach" (which you can read about in the paper on UMAC, for example). But good MACs from the Carter-Wegman approach can be complicated to spec or implement well, and they don't look like conventional modes at all.

From where does PMAC spring?

PMAC follows several related papers. [Bellare, Guerin, Rogaway] was the first to draw attention to the issue of parallelizability in MAC constructions and to offer up a fully parallelizable MAC. PMAC is a descendent of their XOR MACs. Two further constructions build on the XOR MAC, and PMAC, in turn, builds on them. [Bernstein] ("How to stretch random functions") gives a deterministic version of the XOR MAC. [Gligor, Donescu], in their XECB construction, demonstrate an alternative to concatenation-style encoding of message-blocks and message-indices. Finally, papers by [Petrank, Rackoff] and [Black, Rogaway] focussed attention on dealing nicely with strings of varying and arbitrary bit lengths.

Who invented PMAC?

I did (Phil Rogaway). But I worked with John Black, and the academic paper on PMAC will authored by the two of us.

What are some of PMAC's properties?

PMAC has been designed to simultaneously have lots of desirable properties. These include the following.

PMAC is actually more than a MAC: its a pseudorandom function. Roughly said, this means that if K is random and unknown to the adversary, f(x)=PMAC(K,x) will look like a random function.
PMAC is fully parallelizable, as we've explained.
PMAC works on any bit string. In particular, you don't need to think about padding or issues of length-variability; it's been taken care of for you.
PMAC works with any block cipher.
PMAC provides a tag with as many bits as you want, up to the block length n of the underlying block cipher.
PMAC makes an optimal number of block-cipher calls for a conventional mode of operation, namely, roundup(|M|/n). (Actually, the formula is wrong for the empty string, which takes one block cipher call.)
PMAC is deterministic: it uses no counter/IV/nonce and no random bits. This has been true in all traditional MACs.
PMAC uses only a single block-cipher key, and every invocation of the block cipher is keyed by that one key.
PMAC only use the forward direction of the underlying block cipher.
PMAC key setup is very cheap: one block-cipher call and, if desired, a few XORs and conditional (128-bit) shifts.
PMAC generates a sequence of offsets, which it does in a very cheap way. Each offset is computed from the previous one either by xoring the previous offset by a value looked up in a small table or by doing a few shifts and xors.
PMAC is nearly endian-neutral: the scheme can be implemented just as fast on a big-endian machine or on a little-endian machine.
PMAC doesn't need much memory to run (and a memory-minimal implementation doesn't give up much speed).
PMAC avoids 128-bit addition (which is endian-biased and can be expensive in software or dedicated hardware). It uses xors instead.
PMAC is incremental, in the sense of [Bellare, Goldreich, and Goldwasser], with respect to several common operations.
PMAC is simple to understand and implement. It uses GF(2^128) arithmetic and a Gray code, but it all comes down to some xors and shifts. One doesn't have to understand the mathematics to implement the scheme.
PMAC is provably secure. It provably meets its goals, as long as the underlying block cipher meet a standard cryptographic assumption.

How do you compare OCB and PMAC?

These are completely different beasts. OCB provides privacy+authenticity, whereas PMAC gets you authenticity alone. You don't need PMAC if you're using OCB.

Please describe how PMAC works

Assume you're using AES (with whatever key length you want). First a little notation. Where L is a 128-bit string I'll write L<<1 for the left shift of L by 1 bit (where the first bit vanishes and a 0 comes into the last bit). Write L>>1 for the the right shift of L by 1 bit (where the last bit vanishes and a 0 comes into the first bit). For a nonzero number i, let ntz(i) be the number of trailing 0-bits in the binary representation of i (so, for example, ntz(52)=2, because 52 is 110100 in binary, which ends in two zeros). Let ZERO be a block of 128 zero-bits. Where x is a string of at most 128 bits, let pad(x) be x if x has exactly 128 bits, and let it be x followed by a 1-bit followed by 127-|x| 0-bits otherwise.

Let M be the message you want to MAC, and let K be the PMAC key, which is just an AES key. Let L be AES(K, ZERO). Let L(0) be L and, for i>0, let L(i) be L<<1 if the first bit of L(i-1) is 0, and let L(i) be (L(i-1)<<1) xor 0x00000000000000000000000000000087 otherwise. Let L(-1) be L>>1 if the last bit of L is 0, and let L(-1) be L>>1 xor 0x80000000000000000000000000000043 otherwise. Break the message M into m≥1 blocks M[1]M[2]...M[m] where each block except the last one has 128 bits. The last one may have fewer than 128 bits. Now we MAC as follows.

    
    function pmac (K, M)
    begin
        Offset = ZERO
        Σ = ZERO
        for i = 1 to m-1 do begin
            Offset = Offset xor L(ntz(i))
            Σ = Σ xor AES(K, Offset xor M[i])
        end
        Σ = Σ xor pad(M[m])
        if |M[m]|=n then Σ = Σ xor L(-1)
        FullTag = AES(K, Σ)
        return a prefix of FullTag (of the desired length)
    end

Might the algorithm be changed?

No; the PMAC algorithm is in its final form. The writeup may continue to evolve here and there, but only to improve the exposition. The algorithm itself will not be changed.

What about patents?

I abandoned all patent filings pertaining to PMAC many years ago. As far as I know, there is no IP relevant to using PMAC — nothing owned by me or by anyone else.

Where can I learn more?

Read the paper, of course!

Back to the PMAC homepage.