MGFs, power series, and pushing the expectation through


I got stuck on what I thought was a simple point while working through a stats textbook! I figured it out, I think, but was this the most straightforward path to understanding? You tell me! In this note, we explore some oft-used facts about moment-generating functions, and explain a connection with tail and concentration inequalities.

I am currently reading through Martin Wainwright’s High-dimensional statistics: A non-asymptotic viewpoint, to fill some gaps in my stat theory knowledge. While I’m only partway through, I can definitely already recommend this book for those wanting to get familiar with some of the modern theoretical machinery needed to validate statistical methods that we use on difficult problems!

The book starts off relatively lightly in Chapter 2 with an overview of fundamental tail and concentration bounds. A simple but powerful idea is that of Chernoff bounding, which combines the moment generating function of a random variable (when it exists) with Markov's inequality to tightly bound tail probabilities. In reading through this chapter, one quickly comes across the sub-exponential class of distributions: a random variable $X$ is sub-exponential if $\mathbb{E}(|X|) < \infty$ and there exist constants $\nu > 0$ and $\alpha \in [0,\infty)$ such that

$$\mathbb{E}(e^{t (X-\mu)}) \leq e^{\frac{\nu^2 t^2}{2}}, \quad \text{for all } t \in \left(-\frac{1}{\alpha}, \frac{1}{\alpha}\right),$$

where $\mu \coloneqq \mathbb{E}(X)$ and we use the convention that $\frac{1}{0} = \infty$.
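To make the definition concrete, here is a quick numerical sanity check of my own (not from the book), assuming NumPy is available: for a Gaussian $X \sim N(\mu, \sigma^2)$, we have $\mathbb{E}(e^{t(X-\mu)}) = e^{\sigma^2 t^2 / 2}$ exactly, so the definition is met with $\nu = \sigma$ and $\alpha = 0$.

```python
import numpy as np

# Monte Carlo check that a Gaussian is sub-exponential with nu = sigma, alpha = 0:
# for X ~ N(mu, sigma^2), E[exp(t(X - mu))] = exp(sigma^2 t^2 / 2) exactly.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

for t in [-1.0, -0.1, 0.1, 1.0]:
    empirical = np.exp(t * (x - mu)).mean()
    bound = np.exp(sigma**2 * t**2 / 2)
    print(f"t={t:+.1f}  empirical MGF ~ {empirical:.4f}  bound = {bound:.4f}")
```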

Following this introduction, we are then exposed to a sufficient condition for $X$ to be sub-exponential, the so-called Bernstein condition. In particular, $X$ is said to be $b$-Bernstein, for some $b > 0$, if $\mathbb{E}(|X|^2) < \infty$ and, letting $\mu \coloneqq \mathbb{E}(X)$ and $\sigma^2 \coloneqq \mathrm{Var}(X)$, it holds that

$$\left|\mathbb{E}[(X - \mu)^k]\right| \leq \frac{1}{2}k!\, \sigma^2 b^{k-2}, \quad \text{for all } k \in \mathbb{N} \setminus \{1\}.$$

In proving that this condition implies that $X$ is sub-exponential, the author employs the "power series decomposition" $\mathbb{E}(e^{t(X-\mu)}) = \sum_{n = 0}^\infty \frac{t^n}{n!} \mathbb{E}[(X-\mu)^n]$, for $t$ non-zero but sufficiently small. That said, I couldn't find anywhere in the book where this equality was justified.
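As a sanity check on the Bernstein condition itself (my own sketch, not the book's): any variable with $|X - \mu| \leq b$ almost surely is $b$-Bernstein, since $|\mathbb{E}[(X-\mu)^k]| \leq \sigma^2 b^{k-2} \leq \frac{1}{2}k!\,\sigma^2 b^{k-2}$ for $k \geq 2$. The snippet below verifies the inequality directly for a Bernoulli example.

```python
import numpy as np
from math import factorial

# Verify the Bernstein condition for X ~ Bernoulli(0.3), which is bounded with
# |X - mu| <= b = 0.7, so it should be 0.7-Bernstein.
vals, probs = np.array([0.0, 1.0]), np.array([0.7, 0.3])
mu = float(vals @ probs)
sigma2 = float(((vals - mu) ** 2) @ probs)
b = max(abs(vals - mu))  # = 0.7
for k in range(2, 11):
    central = abs(float(((vals - mu) ** k) @ probs))
    bernstein = 0.5 * factorial(k) * sigma2 * b ** (k - 2)
    assert central <= bernstein
    print(f"k={k:2d}  |E(X-mu)^k| = {central:.6f}  <=  {bernstein:.4f}")
```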

At first glance, this equality might seem obvious, as $e^{z} = \sum_{n=0}^\infty \frac{z^n}{n!}$ for all $z \in \mathbb{R}$, so one should just be able to push the expectation through this power series representation. But usually this sort of statement is justified by the dominated convergence theorem (DCT): if $Y_n \to Y$, and the sequence $Y_n$ is dominated by $Z \in L_1$, i.e., $|Y_n| \leq Z$ for all $n \in \mathbb{N}$ and $\mathbb{E}(Z) < \infty$, then $\mathbb{E}(|Y_n|), \mathbb{E}(|Y|) < \infty$ and $\mathbb{E}(Y) = \lim_{n \to \infty} \mathbb{E}(Y_n)$. In our problem, can we just directly apply the DCT? Well, we would like to say that, for small but non-zero $t$,

$$\mathbb{E}\left(\lim_{n \to \infty}\sum_{k=0}^n \frac{t^k}{k!}(X-\mu)^k\right) = \lim_{n \to \infty}\mathbb{E}\left(\sum_{k=0}^n \frac{t^k}{k!}(X-\mu)^k\right),$$

i.e., $Y_n = \sum_{k=0}^n \frac{t^k}{k!}(X-\mu)^k$ and $Y = e^{t(X-\mu)}$. But what should be the dominating variable $Z$? We can easily see that $|Y_n| \leq \sum_{k=0}^n \frac{|t|^k}{k!} |X-\mu|^k \leq e^{|t| |X-\mu|}$, but do we know that the latter is integrable? It wasn't obvious to me, so let's dive into the details.
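Before diving in, here is a numerical illustration of the interchange we are after (my own sketch, in the standard normal case, where everything is known in closed form): the partial sums $\sum_{n \leq N} \frac{t^n}{n!}\mathbb{E}(X^n)$ visibly converge to $\mathbb{E}(e^{tX}) = e^{t^2/2}$.

```python
import numpy as np
from math import factorial

def std_normal_moment(n):
    # E[X^n] for X ~ N(0,1): zero for odd n, (n-1)!! = n!/(2^(n/2) (n/2)!) for even n.
    if n % 2 == 1:
        return 0.0
    return factorial(n) / (2 ** (n // 2) * factorial(n // 2))

t = 0.5
target = np.exp(t ** 2 / 2)  # the MGF of N(0,1) at t, in closed form
partial = 0.0
for n in range(21):
    partial += t ** n / factorial(n) * std_normal_moment(n)
    if n % 4 == 0:
        print(f"n={n:2d}  partial sum = {partial:.10f}  target = {target:.10f}")
```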

Let $X$ be a random variable, and define $\Psi_X(t) = \mathbb{E}(e^{tX})$, for $t \in D$, where $D = \{t \in \mathbb{R}: \mathbb{E}(e^{tX}) < \infty\}$. Note that $e^{tX} \geq 0$, so that its expectation is always well-defined in $[0, \infty]$. Moreover, $0 \in D$ no matter what, as $\mathbb{E}(e^{0X}) = 1 < \infty$ — this could possibly be the only element of $D$. The first interesting fact about $\Psi_X$ is that its domain $D$ is always an interval including zero.
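For example, if $X \sim \mathrm{Exp}(1)$, then $\Psi_X(t) = \frac{1}{1-t}$ for $t < 1$ and $\Psi_X(t) = \infty$ for $t \geq 1$, so $D = (-\infty, 1)$; for a standard Cauchy random variable, on the other hand, $\mathbb{E}(e^{tX}) = \infty$ for every $t \neq 0$, so $D = \{0\}$ really does happen.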

Lemma 1: If $t \in D \cap (0, \infty)$, then $[0, t] \subseteq D$. Similarly, if $t \in D \cap (-\infty, 0)$, then $[t, 0] \subseteq D$.

Proof: Suppose $t > 0$, $\mathbb{E}(e^{tX}) < \infty$, and let $s \in [0, t]$. Observe that

$$\mathbb{E}(e^{sX}) = \mathbb{E}(e^{sX} \mathbf{1}_{[0, \infty)}(X)) + \mathbb{E}(e^{sX} \mathbf{1}_{(-\infty, 0)}(X)) \leq \mathbb{E}(e^{tX} \mathbf{1}_{[0,\infty)}(X)) + P[X < 0] \leq \Psi_{X}(t) + 1 < \infty,$$

so $s \in D$. Similarly, if $t < 0$ and $\mathbb{E}(e^{tX}) < \infty$, then let $s \in [t, 0]$ and note that

$$\mathbb{E}(e^{sX}) = \mathbb{E}(e^{sX} \mathbf{1}_{(0, \infty)}(X)) + \mathbb{E}(e^{sX} \mathbf{1}_{(-\infty, 0]}(X)) \leq P[X > 0] + \mathbb{E}(e^{tX} \mathbf{1}_{(-\infty, 0]}(X)) \leq 1 + \Psi_X(t),$$

so $s \in D$. $\blacksquare$


MGFs are useful when they exist on an open interval around zero — it is in this case that they actually generate moments! This is formalized in the next result.

Proposition 1: Suppose $t^* > 0$ is such that $(-t^*, t^*) \subseteq D$. Then $\mathbb{E}(|X|^n) < \infty$ for all $n \in \mathbb{N}$, and for all $t \in (-t^*, t^*)$, $\Psi_X(t) = \sum_{n = 0}^\infty \frac{t^n}{n!} \mathbb{E}(X^n)$, with the series converging absolutely. Consequently, $\mathbb{E}(X^n) = \left. \frac{d^n}{dt^n} \Psi_X(t) \right|_{t = 0}$.

Proof: Let $t \in (0, t^*)$. The key observation is that $e^{t|x|} \leq e^{-tx} + e^{tx}$ for all $x \in \mathbb{R}$. Thus,

$$\mathbb{E}(e^{t|X|}) \leq \Psi_X(-t) + \Psi_X(t) < \infty,$$

given that $-t, t \in D$. Then, by the power series expansion of the exponential,

$$\mathbb{E}(e^{t|X|}) = \mathbb{E}\left(\sum_{n=0}^{\infty} \frac{t^n}{n!} |X|^n\right) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbb{E}(|X|^n) < \infty,$$

where the interchange of expectation and summation is permitted by the monotone convergence theorem (MCT), as the partial sums are non-negative and non-decreasing. Immediately, we can deduce that $\mathbb{E}(|X|^n) < \infty$ for all $n \in \mathbb{N}$. We can further see by the DCT that $\Psi_X(t) = \sum_{n=0}^\infty \frac{t^n}{n!} \mathbb{E}(X^n)$ for any $t \in (-t^*, t^*)$, as the partial sums $\sum_{n=0}^N \frac{t^n}{n!} X^n$, for $N \in \mathbb{N}$, are uniformly bounded by $e^{|t| \cdot |X|} \in L_1$.

Notice that we have shown that the power series $\sum_{n=0}^{\infty} \frac{t^n}{n!} \mathbb{E}(X^n)$ converges absolutely on $(-t^*, t^*)$, i.e., its radius of convergence is at least $t^*$. By basic facts about power series, $\Psi_X$ is infinitely differentiable on $(-t^*, t^*)$, with derivatives obtained by differentiating term by term, and so $\mathbb{E}(X^n) = \left. \frac{d^n}{dt^n} \Psi_X(t) \right|_{t = 0}$. $\blacksquare$
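As a quick illustration of Proposition 1 (a sketch of mine using SymPy, not from the book), we can recover the moments of a standard Gaussian by differentiating its MGF $\Psi_X(t) = e^{t^2/2}$ at zero:

```python
import sympy as sp

# Differentiate the N(0,1) MGF Psi(t) = exp(t^2/2) at t = 0 to generate moments;
# we should see E(X^n) = 0, 1, 0, 3, 0, 15 for n = 1, ..., 6.
t = sp.symbols('t')
Psi = sp.exp(t ** 2 / 2)
for n in range(1, 7):
    print(f"E(X^{n}) =", sp.diff(Psi, t, n).subs(t, 0))
```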

Inspecting the proof of Proposition 1, we can see that the real power (no pun intended) comes from $e^{|tX|}$ being integrable. The crux of the argument was that $\{-t, t\} \subseteq D$ is equivalent to $\mathbb{E}(e^{|tX|}) < \infty$, and the latter readily establishes that $\Psi_X$ permits the power series representation $\Psi_X(s) = \sum_{n=0}^\infty \frac{s^n}{n!}\mathbb{E}(X^n)$, for $s \in \{-t, t\}$. With Lemma 1, we pretty much immediately obtain the following nice chain of equivalences.

Theorem: For any $t \in \mathbb{R}$:

$$\begin{align*} &\{-t, t\} \subseteq D \text{ (or say } [-|t|, |t|] \subseteq D \text{ if you like)}\\ \iff\ &\mathbb{E}(e^{|sX|}) < \infty \text{ for all } s \in [-|t|, |t|] \\ \iff\ &\sum_{n=0}^\infty \frac{s^n}{n!} \mathbb{E}(X^n) \text{ absolutely converges and equals } \Psi_X(s) \text{ for all } s \in [-|t|, |t|]. \end{align*}$$

[Notice that this is useful only when $t \neq 0$.]
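As a concrete illustration, take $X$ to be standard lognormal. For $t \leq 0$ we have $e^{tX} \leq 1$, so $\mathbb{E}(e^{tX}) < \infty$, while $\mathbb{E}(e^{tX}) = \infty$ for every $t > 0$; hence $D = (-\infty, 0]$, which contains no pair $\{-t, t\}$ with $t \neq 0$. By the theorem, the power series representation must then fail near zero, even though the lognormal famously has finite moments of all orders.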

At this point, we are equipped to understand a more general version of the problem we started out with (side note: are the necessary facts about MGFs just supposed to be common knowledge!?). Namely, we will verify that $X$ is sub-exponential if and only if the domain $D$ of $\Psi_X$ contains an open interval around zero [for those reading the Wainwright text, this is part of Theorem 2.13]. Clearly, if $\nu > 0$ and $\alpha \in [0,\infty)$ satisfy $\mathbb{E}(e^{t(X-\mu)}) \leq e^{\frac{\nu^2 t^2}{2}}$ (notice in particular that this is finite) for $t \in (-\alpha^{-1}, \alpha^{-1})$, then $D$ contains any pair $\{-t, t\}$ for $t \in (0, \alpha^{-1})$, and we see that $(-\alpha^{-1}, \alpha^{-1}) \subseteq D$. The following says that the converse result also holds.

Proposition 2: The MGF of $X$ is defined on $D \supseteq (-c, c)$ for some $c > 0$ if and only if $X$ is sub-exponential.

Proof: We have just argued the "if" direction. For the converse, suppose $c > 0$ and $(-c,c) \subseteq D$. By our theorem, it is kosher to use the power series expansion of $\Psi_{X - \mu}(t)$ for any $t \in (-c,c)$ — note, by the way, that the domain $D$ of $\Psi_X$ is equal to that of the MGF of the centered variable, $\Psi_{X - \mu}$, as the two differ by the deterministic, finite multiplicative factor $e^{-t\mu}$. Further, as the radius of convergence of this power series is at least $c$, we can differentiate term by term on $(-c,c)$ to obtain the derivatives of $\Psi_{X - \mu}$. Consequently, by Taylor's theorem with Peano remainder,

$$\Psi_{X - \mu}(t) = \Psi_{X-\mu}(0) + \Psi^{(1)}_{X - \mu}(0)\,t + \frac{\Psi^{(2)}_{X - \mu}(0)}{2}\,t^2 + o(t^2) = 1 + \frac{\sigma^2 t^2}{2} + o(t^2), \text{ as } t \to 0,$$

where the first-order term vanishes because $\Psi^{(1)}_{X-\mu}(0) = \mathbb{E}(X - \mu) = 0$, and $\sigma^2 = \mathbb{E}[(X - \mu)^2]$; recall that all moments are finite by Proposition 1. Similarly, for any $\nu > 0$,

$$\begin{align*} \exp\left\{\frac{\nu^2 t^2}{2}\right\} &= \left[1 + \left\{\nu^2 s \exp\left\{\frac{\nu^2 s^2}{2}\right\}\right\} t + \frac{1}{2} \left\{\nu^2 \exp\left\{\frac{\nu^2 s^2}{2}\right\}\left(1 + \nu^2 s^2\right)\right\} t^2\right]_{s=0} + o(t^2) \\ &= 1 + \frac{\nu^2 t^2}{2} + o(t^2), \text{ as } t \to 0, \end{align*}$$

as $e^x$ is smooth. Combining these facts, we obtain, for any $\nu > 0$,

$$\exp\left\{\frac{\nu^2 t^2}{2}\right\} - \mathbb{E}(e^{t(X - \mu)}) = \frac{t^2}{2}(\nu^2 - \sigma^2) + o(t^2), \text{ as } t \to 0.$$

To make sure this exceeds zero, at least for small $t$, we will choose an arbitrary $\nu > \sigma$. Meanwhile, choose $\delta > 0$ (without loss of generality, $\delta \leq c$) such that whenever $0 < |t| < \delta$,

$$\left|\frac{1}{t^2}\left( \exp\left\{\frac{\nu^2 t^2}{2}\right\} - \mathbb{E}(e^{t(X - \mu)})\right) - \frac{\nu^2 - \sigma^2}{2}\right| < \epsilon,$$

where we choose $\epsilon = \frac{\nu^2 - \sigma^2}{4} > 0$. We then can see that

$$\epsilon t^2 + \mathbb{E}(e^{t(X - \mu)}) \leq \exp\left\{\frac{\nu^2 t^2}{2}\right\} \text{ for all } t \in (-\delta, \delta),$$

so $X$ is $(\nu, \frac{1}{\delta})$-sub-exponential. $\blacksquare$
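To see Proposition 2 in action, here is a small numerical check of my own (not from the book) for $X \sim \mathrm{Exp}(1)$, where $\mu = \sigma = 1$ and $\mathbb{E}(e^{t(X-\mu)}) = \frac{e^{-t}}{1-t}$ for $t < 1$: any $\nu > \sigma$ (we take $\nu = 1.5$) dominates the MGF for $|t|$ small enough.

```python
import numpy as np

# For X ~ Exp(1), the centered MGF is e^{-t}/(1 - t) on t < 1. With nu = 1.5 > sigma = 1,
# the bound exp(nu^2 t^2 / 2) should dominate near zero, as Proposition 2 predicts.
nu = 1.5
for t in np.linspace(-0.4, 0.4, 9):
    mgf = np.exp(-t) / (1 - t)
    bound = np.exp(nu**2 * t**2 / 2)
    print(f"t={t:+.2f}  MGF = {mgf:.4f}  bound = {bound:.4f}  ok = {mgf <= bound}")
```

Pushing $t$ toward $1$ (e.g., $t = 0.9$) eventually breaks this particular bound, which is consistent with the definition allowing the pair $(\nu, \alpha)$ to restrict $t$ to a bounded interval.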

Viewing the formula for the MGF as a power series, we can obtain one more useful characterization of its existence in an open interval around zero.

Proposition 3: The MGF of $X$ is defined on $D \supseteq (-c, c)$ for some $c > 0$, if and only if

$$\gamma \coloneqq \limsup_{n \to \infty} \left(\frac{\left|\mathbb{E}(X^n)\right|}{n!}\right)^{1/n} < \infty.$$

Proof: By our theorem, $(-c, c) \subseteq D$ for some $c > 0$ if and only if $X^n \in L_1$ for all $n \in \mathbb{N}$ and the radius of convergence $R$ of the power series $\sum_{n=0}^\infty \frac{t^n}{n!} \mathbb{E}(X^n)$ is positive. (For the "if" direction here, note that for $0 < t < R$, we have $\mathbb{E}(e^{|tX|}) \leq 2\,\mathbb{E}(\cosh(tX)) = 2\sum_{k=0}^\infty \frac{t^{2k}}{(2k)!}\mathbb{E}(X^{2k}) < \infty$ by the MCT, so $\{-t, t\} \subseteq D$.) But $R = \frac{1}{\gamma}$ by standard power series facts, so we conclude that $R > 0$ if and only if $\gamma < \infty$. $\blacksquare$
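Proposition 3 also gives a clean way to see why the lognormal from earlier fails (a quick numeric sketch of mine, using the standard fact that $\mathbb{E}(X^n) = e^{n^2/2}$ for the standard lognormal):

```python
from math import exp, lgamma

# log of (E(X^n)/n!)^(1/n) = n/2 - log(n!)/n for the standard lognormal; this
# diverges, so gamma = infinity and the MGF exists on no open interval around zero.
for n in [1, 5, 10, 20, 40, 80]:
    log_ratio = n / 2 - lgamma(n + 1) / n
    print(f"n={n:3d}  (E(X^n)/n!)^(1/n) ~ {exp(log_ratio):.3e}")
```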

With this result in hand, we can more easily see that the Bernstein condition implies the sub-exponential property. Indeed, if $X$ is $b$-Bernstein for some $b > 0$, then

$$\limsup_{n \to \infty} \left(\frac{\left|\mathbb{E}[(X - \mu)^n]\right|}{n!}\right)^{1/n} \leq b \limsup_{n \to \infty} \left(\frac{\sigma^2}{2b^2}\right)^{1/n} = b < \infty.$$

Thus, applying Proposition 3 to $X - \mu$, whose series has radius of convergence $\frac{1}{\gamma} \geq \frac{1}{b}$, we get $\left(-\frac{1}{b}, \frac{1}{b}\right) \subseteq D$, again using that the domains of $\Psi_X$ and $\Psi_{X - \mu}$ are both $D$. By Proposition 2, we are done!

In talking through this with some peers at CMU, I learned (thanks to Tiger Zeng) that we could also directly verify that the Bernstein condition implies $\mathbb{E}(e^{|t| \cdot |X - \mu|}) < \infty$ for $t \in \left(-\frac{1}{b}, \frac{1}{b}\right)$. At that point, we could just use the DCT to justify the interchange of expectation and sum. Since the idea was pretty clever, I've decided to write it out here.

Suppose $X$ is $b$-Bernstein for $b > 0$. We wish to show that $\mathbb{E}(e^{|t| \cdot |X-\mu|}) < \infty$ for any $t \in \left(-\frac{1}{b}, \frac{1}{b}\right)$, and recall that by the Taylor series for $e^x$ and the MCT, this expectation always equals $\sum_{n=0}^{\infty} \frac{|t|^n}{n!}\mathbb{E}(|X-\mu|^n)$. The difficulty comes from the fact that the Bernstein condition does not speak directly to $\mathbb{E}(|X-\mu|^n)$, but rather to $\left|\mathbb{E}[(X-\mu)^n]\right|$. For $n$ even, these coincide, but for $n$ odd, some care is needed.

Explicitly separating the even and odd terms, we have

$$\mathbb{E}(e^{|t| \cdot |X-\mu|}) = \sum_{k=0}^\infty \frac{|t|^{2k}}{(2k)!}\mathbb{E}(|X-\mu|^{2k}) + \sum_{k=0}^\infty \frac{|t|^{2k + 1}}{(2k + 1)!}\mathbb{E}(|X-\mu|^{2k + 1}).$$

We show these sums are both finite in turn. Starting with the even terms, since $X$ is $b$-Bernstein,

$$\sum_{k=0}^\infty \frac{t^{2k}}{(2k)!}\mathbb{E}[(X-\mu)^{2k}] = 1 + \sum_{k=1}^\infty \frac{t^{2k}}{(2k)!}\mathbb{E}[(X-\mu)^{2k}] \leq 1 + \frac{\sigma^2}{2} \sum_{k=1}^{\infty}t^{2k}b^{2k - 2} = 1 + \frac{\sigma^2 t^2 / 2}{1 - t^2b^2},$$

by summing the geometric series, noting that $t^2 b^2 = (|t|b)^2 < 1$ by assumption. For the odd terms, the trick is to forcefully introduce even moments using Cauchy-Schwarz: for any $k \in \mathbb{N}$,

$$\mathbb{E}(|X-\mu|^{2k + 1}) = \mathbb{E}(|X - \mu|^{k + 1} |X - \mu|^k) \leq \left\{\mathbb{E}[(X - \mu)^{2k + 2}]\,\mathbb{E}[(X - \mu)^{2k}]\right\}^{1/2}.$$

Thus, as $X$ is $b$-Bernstein,

$$\mathbb{E}(|X-\mu|^{2k + 1}) \leq \frac{\sigma^2 b^{2k-1}}{2} \sqrt{(2k)! (2k+2)!} \leq (2k + 1)!\, \sigma^2 b^{2k - 1},$$

since $\sqrt{(2k)! (2k+2)!} = (2k + 1)! \sqrt{\frac{2k+2}{2k + 1}} < 2 (2k+1)!$ for all $k \in \mathbb{N}$. Therefore,

$$\sum_{k=0}^\infty \frac{|t|^{2k+1}}{(2k+1)!}\mathbb{E}(|X-\mu|^{2k+1}) \leq |t|\,\mathbb{E}(|X - \mu|) + \sigma^2\sum_{k=1}^{\infty}|t|^{2k+1}b^{2k-1} = |t|\,\mathbb{E}(|X - \mu|) + \frac{b \sigma^2 |t|^3}{1 - t^2 b^2}.$$

Putting everything together, we have $\mathbb{E}(e^{|t| \cdot |X-\mu|}) < \infty$. Phew!
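For completeness, here is one last numerical check of mine (not from the book) that the even and odd bounds above really do cap $\mathbb{E}(e^{|t| \cdot |X-\mu|})$, using the Bernoulli(0.3) example from earlier, which is $b$-Bernstein with $b = 0.7$:

```python
import numpy as np

# For X ~ Bernoulli(0.3): mu = 0.3, sigma^2 = 0.21, E|X - mu| = 0.42, b = 0.7.
# Compare E[e^{|t||X - mu|}], computed exactly, against the even + odd bounds,
# which are valid for |t| < 1/b (about 1.43).
mu, sigma2, abs_mean, b = 0.3, 0.21, 0.42, 0.7
for t in [0.25, 0.5, 1.0, 1.4]:
    exact = 0.3 * np.exp(0.7 * t) + 0.7 * np.exp(0.3 * t)
    even = 1 + (sigma2 * t**2 / 2) / (1 - t**2 * b**2)
    odd = t * abs_mean + b * sigma2 * t**3 / (1 - t**2 * b**2)
    print(f"t={t:.2f}  exact = {exact:.4f}  bound = {even + odd:.4f}")
```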