
7.1 Integrals of Sums and Products

Sums

Suppose that $f(x) = a g(x) + b h(x)$ for some constants $a$ and $b$, and some functions $g(x)$ and $h(x)$. What is $\sum_{\text{all } x} f(x)$? What about $\int_{\text{all } x} f(x) \, dx$?

Integrals and sums are linear operations. That means that, like the expectation, the integral of a sum is the sum of the integrals, the integral of a linear function is the linear function of the integral, and integrals of linear combinations are linear combinations of integrals. The same rules apply for summations. In all cases, the area under the sum of two curves is the sum of the area under each curve.
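As a quick numerical sanity check, summing a linear combination termwise matches combining the separate sums. This is only a sketch; the functions $g$, $h$ and constants $a$, $b$ below are arbitrary choices for illustration, not from the text:

```python
import numpy as np

# Arbitrary example functions and constants (chosen for illustration only).
x = np.linspace(0.0, 1.0, 101)
g = np.sin(x)
h = x**2
a, b = 3.0, -2.0

# Linearity: the sum of a*g + b*h equals the same linear combination of the sums.
lhs = np.sum(a * g + b * h)
rhs = a * np.sum(g) + b * np.sum(h)
print(np.isclose(lhs, rhs))  # True
```

The same check works for integrals by replacing the sums with any fixed quadrature rule, since quadrature is itself a weighted sum.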

Notice the similarity to linearity of expectation. Expectations are linear since they are weighted averages, and averages are sums/integrals.

Products

Any time we evaluate an expectation, we need to compute a sum or integral of a product of two functions (typically, the function of $X$ whose expectation we are interested in, and the mass or density function of $X$).

So, suppose $f(x) = g(x) \times h(x)$. What is $\sum_{\text{all } x} f(x)$? What about $\int_{\text{all } x} f(x) \, dx$?

We’ll state the general rule for integrals, provide some examples, then illustrate an analogous rule for sums.

Integration by Parts

Integration by parts swaps which factor is differentiated. It replaces $h'$ with $h$ inside the integral, and replaces $g$ with $g'$. In essence, we are moving the derivative from $h$ to $g$. To move the derivative we will need to know $h$, which is the anti-derivative of $h'$. In probability, $h'$ is often a PDF, so $h$ is often a CDF.

To apply integration by parts, follow the same steps every time:

  1. Write $f$ as a product of two functions: $f(x) = g(x) h'(x)$. Look for a pair of factors where:

    • $g'(x)$ is a simpler function than $g(x)$

    • $h'(x)$ is easy to integrate to recover $h(x)$. If we can’t find the anti-derivative, then we won’t be able to integrate by parts.

  2. Find $g'(x) = \frac{d}{dx} g(x)$.

  3. Find $h(x) = \int^x h'(s) \, ds$.

  4. Try to evaluate $\int_{a}^b g'(x) h(x) \, dx$.

  5. Combine terms using the integration by parts formula: $\int_a^b g(x) h'(x) \, dx = g(x) h(x) \Big|_a^b - \int_a^b g'(x) h(x) \, dx$.
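The steps above can be walked through numerically. In this sketch, the integrand $f(x) = x e^{-x}$ and the interval $[0, 2]$ are my own illustrative choices; we take $g(x) = x$ and $h'(x) = e^{-x}$, then compare the by-parts answer to a direct Riemann sum:

```python
import numpy as np

# Step 1: f(x) = x * exp(-x), with g(x) = x and h'(x) = exp(-x).
# Step 2: g'(x) = 1.
# Step 3: h(x) = -exp(-x) is an anti-derivative of h'.
# Steps 4-5: integral = g(x)h(x) at the endpoints, minus the integral of g'(x)h(x).
b = 2.0
boundary = b * -np.exp(-b) - 0.0 * -np.exp(0.0)  # g(x)h(x) evaluated from 0 to b
remainder = np.exp(-b) - 1.0                     # integral of g'(x)h(x) = -exp(-x) on [0, b]
by_parts = boundary - remainder                  # equals 1 - (1 + b) * exp(-b)

# Direct numerical integral of f(x) = x * exp(-x) for comparison.
x = np.linspace(0.0, b, 100_001)
direct = np.sum(x * np.exp(-x)) * (x[1] - x[0])  # simple Riemann sum

print(abs(by_parts - direct) < 1e-3)  # True
```

Here $g' = 1$ is simpler than $g$, and $h' = e^{-x}$ is easy to anti-differentiate, exactly the pairing step 1 asks for.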

Tail Integrals

Any time we ask for the expectation of a continuous, nonnegative random variable, we need to evaluate the integral $\int_{x = 0}^{b} x \, \text{PDF}(x) \, dx$, where $b$ is the upper endpoint of the support. In this case, integration by parts produces an alternate formula for the expectation using the area under the survival function.

Setting $g(x) = x$ gives $g'(x) = 1$. Setting $h'(x) = \text{PDF}(x)$ gives $h(x) = \text{CDF}(x)$. Then, integrating by parts:

$$\begin{aligned} \int_{x = 0}^b x \, \text{PDF}(x) \, dx & = x \, \text{CDF}(x) \Big|_{0}^b - \int_{x = 0}^b \text{CDF}(x) \, dx \\ & = [b \times \text{CDF}(b) - 0 \times \text{CDF}(0)] - \int_{x = 0}^b \text{CDF}(x) \, dx \\ & = [b \times 1 - 0 \times 0] - \int_{x = 0}^b \text{CDF}(x) \, dx \\ & = b - \int_{x = 0}^b \text{CDF}(x) \, dx \\ & = \int_{x = 0}^b 1 - \text{CDF}(x) \, dx \\ & = \int_{x = 0}^b 1 - \text{Pr}(X \leq x) \, dx \\ & = \int_{x = 0}^b \text{Pr}(X > x) \, dx. \end{aligned}$$

This formula holds for any $b$, so it holds for any nonnegative random variable. We can extend the upper bound of the integral to infinity, since $\text{Pr}(X > x) = 0$ when $x$ is above the upper bound of the support.

For example, if $X \sim \text{Exponential}(\lambda)$, then $\text{Pr}(X > x) = e^{-\lambda x}$, so $\mathbb{E}[X] = \int_{x = 0}^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda}$.
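This example can be checked two ways: the area under the survival function, and a Monte Carlo average of simulated draws. The rate $\lambda = 2$, the truncation point, and the sample sizes below are all arbitrary choices of mine:

```python
import numpy as np

lam = 2.0
rng = np.random.default_rng(0)

# Monte Carlo estimate of E[X] for X ~ Exponential(lam); NumPy parameterizes
# the exponential by its mean, so scale = 1 / lam.
mc_mean = rng.exponential(scale=1.0 / lam, size=200_000).mean()

# Area under the survival function Pr(X > x) = exp(-lam * x), truncated far
# into the tail where the survival function is numerically zero.
x = np.linspace(0.0, 20.0 / lam, 200_001)
tail_area = np.sum(np.exp(-lam * x)) * (x[1] - x[0])

# Both should land on 1 / lam = 0.5.
print(abs(mc_mean - 1.0 / lam) < 0.02, abs(tail_area - 1.0 / lam) < 1e-3)
```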

Summation by Parts

Most integral rules match a rule for sums. The integration by parts strategy can also be used for sums of products.

Consider a sum of the form

$$\sum_{x = 1}^n g(x) h(x)$$

for $n$ possibly infinite.

The discrete analog to a derivative is a difference. Let:

$$g'(x) = g(x) - g(x - 1), \quad g(0) = 0.$$

Then, just as derivatives are inverted by integrals, differences are inverted by running sums:

$$g(x) = (g(x) - g(x - 1)) + (g(x - 1) - g(x - 2)) + \dots + (g(2) - g(1)) + g(1) = \sum_{y=1}^x g'(y).$$
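In code, the claim is that a running (cumulative) sum undoes the difference operation, just as an integral undoes a derivative. The particular sequence below is an arbitrary choice for illustration:

```python
import numpy as np

g = np.array([2.0, 5.0, 4.0, 9.0])            # g(1), ..., g(4); arbitrary values
g_diff = np.diff(np.concatenate(([0.0], g)))  # g'(x) = g(x) - g(x-1), with g(0) = 0
print(np.allclose(np.cumsum(g_diff), g))      # True: running sums invert differences
```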

So, we can write our original sum:

$$\sum_{x = 1}^n g(x) h(x) = \sum_{x = 1}^n \left[\sum_{y = 1}^x g'(y) \right] h(x) = \sum_{x = 1}^n \sum_{y=1}^x g'(y) h(x).$$

To recover the discrete analog to integration by parts, reverse the order of the sum. First, notice that the sum runs over all pairs $x, y$ such that $1 \leq y \leq x \leq n$. We can write this sum:

$$\sum_{x = 1}^n \sum_{y=1}^x g'(y) h(x) = \sum_{1 \leq y \leq x \leq n} g'(y) h(x) = \sum_{y = 1}^n \sum_{x = y}^n g'(y) h(x).$$

Since $g'(y)$ does not depend on $x$:

$$\sum_{y = 1}^n \sum_{x = y}^n g'(y) h(x) = \sum_{y = 1}^n g'(y) \left[\sum_{x = y}^n h(x) \right].$$

The sum $\sum_{x = y}^n h(x)$ is analogous to an anti-derivative. To make the analogy explicit, let $H(y) = \sum_{x = 1}^{y-1} h(x)$. Then $\sum_{x = y}^n h(x) = H(n+1) - H(y)$, so:

$$\begin{aligned} \sum_{y = 1}^n g'(y) \left[\sum_{x = y}^n h(x) \right] & = \sum_{y = 1}^n g'(y) \, [H(n+1) - H(y)] \\ & = H(n+1) \left( \sum_{y = 1}^n g'(y) \right) - \sum_{y=1}^n g'(y) H(y). \end{aligned}$$

Finally, since $\sum_{y=1}^n g'(y) = g(n) - g(0) = g(n)$:

$$\sum_{x = 1}^n g(x) h(x) = g(n) H(n+1) - \sum_{y=1}^n g'(y) H(y).$$
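The identity can be checked numerically for arbitrary sequences. The particular $g$, $h$, and $n$ below are my own choices; any sequences would do:

```python
import numpy as np

n = 50
xs = np.arange(1, n + 1)
g = np.sqrt(xs)      # g(1), ..., g(n); by convention g(0) = 0
h = 1.0 / xs**2      # h(1), ..., h(n)

# Left-hand side: the direct sum of the products.
lhs = np.sum(g * h)

# g'(x) = g(x) - g(x-1), and H(y) = sum of h(x) over x < y, stored so that
# H_vals[y - 1] = H(y) for y = 1, ..., n + 1 (so H(1) = 0).
g_diff = np.diff(np.concatenate(([0.0], g)))
H_vals = np.concatenate(([0.0], np.cumsum(h)))

# Summation by parts: sum g h = g(n) H(n+1) - sum over y of g'(y) H(y).
rhs = g[-1] * H_vals[n] - np.sum(g_diff * H_vals[:n])
print(np.isclose(lhs, rhs))  # True
```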

The summation by parts formula is perfectly analogous to the integration by parts formula. You’ll prove integration by parts on your homework using the same steps, but replacing differencing with derivatives, and summation with integration.

Tail Sums

Summation by parts is most often used to find the expectation of count-valued random variables.

Suppose that $X \in \{0, 1, 2, 3, \dots, n\}$. Then:

$$\mathbb{E}[X] = \sum_{x = 0}^{n} x \, \text{PMF}(x)$$

where $n$ is possibly infinite.

Then, let:

$$g(x) = x, \quad h(x) = \text{PMF}(x).$$

Then:

$$g'(x) = g(x) - g(x - 1) = x - (x - 1) = 1$$

and:

$$H(y) = \sum_{x < y} h(x) = \sum_{x < y} \text{PMF}(x) = \text{CDF}(y-1).$$

It follows that:

$$\begin{aligned} \mathbb{E}[X] & = n \times \text{CDF}(n) - \sum_{y=1}^n 1 \times \text{CDF}(y - 1) \\ & = n \times 1 - \sum_{y=1}^n \text{CDF}(y-1) = n - \sum_{y=1}^n \text{CDF}(y-1) \\ & = \sum_{y = 1}^n 1 - \text{CDF}(y - 1) = \sum_{y = 1}^{n} (1 - \text{Pr}(X < y)) \\ & = \sum_{y = 1}^{n} \text{Pr}(X \geq y). \end{aligned}$$
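As a concrete check of the tail sum formula, take a fair six-sided die (my own example, with $X \in \{1, \dots, 6\}$): the direct expectation and the sum of the tail probabilities agree:

```python
import numpy as np

n = 6
pmf = np.zeros(n + 1)   # index x = 0, ..., n
pmf[1:] = 1.0 / n       # fair die: Pr(X = x) = 1/6 for x = 1, ..., 6

direct = sum(x * pmf[x] for x in range(n + 1))      # sum of x * PMF(x) = 3.5
tail = sum(pmf[y:].sum() for y in range(1, n + 1))  # sum of Pr(X >= y)
print(np.isclose(direct, tail))  # True
```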

The tail sum formula is often proved using a different algebraic argument. Consider the sum:

$$\mathbb{E}[X] = \sum_{x = 0}^{n} x \, \text{PMF}(x) = \sum_{x = 1}^{n} x \, \text{PMF}(x).$$

For concision, let $p(x) = \text{PMF}(x)$. Then the sum has the form:

$$\mathbb{E}[X] = 1 \times p(1) + 2 \times p(2) + 3 \times p(3) + \dots + n \times p(n).$$

We can organize this sum into rows, where row $x$ contains $x$ copies of $p(x)$:

$$\begin{aligned} \mathbb{E}[X] = \; & p(1) + \dots \\ & p(2) + p(2) + \dots \\ & p(3) + p(3) + p(3) + \dots \\ & \vdots \\ & p(n) + p(n) + p(n) + p(n) + \dots + p(n) \end{aligned}$$

Summing down the columns instead of across the rows returns the tail sum formula:

$$\begin{aligned} \mathbb{E}[X] & = \left(\sum_{x = 1}^n p(x) \right) + \left(\sum_{x = 2}^n p(x) \right) + \left(\sum_{x = 3}^n p(x) \right) + \dots + \left( p(n) \right) \\ & = \text{Pr}(X \geq 1) + \text{Pr}(X \geq 2) + \dots + \text{Pr}(X \geq n) = \sum_{y=1}^n \text{Pr}(X \geq y). \end{aligned}$$

This proof is less general, but more transparent.