In this level 0 post, we look at the discrete Fundamental Theorem of Calculus and see an application that brings together combinatorics and sums of cubes (and higher powers).
For quite some time now, I have thought of mathematics as a subject that gets better as you learn it. Starting as a child, it’s all learning that there are these numbers that exist in a particular order (counting), then you learn different ways of combining numbers together (arithmetic), eventually working your way up to considering variables and the rules of algebra. It’s all rules to memorize until you get to the good stuff: proofs and theorems.
One of the main transition points in the experience of math students is calculus. In my experience, calculus was when I started to feel like I was doing math instead of merely learning it. This difference can largely be attributed to the ideas in the Fundamental Theorem of Calculus (FTC). Now, it is not necessary to know calculus in order to follow this post; basic algebra should work just fine. But the essence of the FTC is that basic information about how a function behaves within some interval is encoded on the boundary of that interval. To see more precisely what I mean, here is the FTC:
$$\int_a^b f'(x)\,dx = f(b) - f(a)$$
You don’t have to understand what all the symbols mean to understand the essence. On the left-hand side, $f'(x)$ is the slope of the function $f$, and the rest of what is written amounts to considering how the slope behaves for all the values of $x$ in the region of the number line between $a$ and $b$. On the right-hand side, you see the function evaluated just at the boundary (i.e. the end points) of the region. So information about how the function behaves within a region is encoded at the endpoints.
There are many generalizations of the FTC to regions in the plane or on some funky surface, but recently I learned about the discrete FTC, that is, the version that applies on the whole numbers. The idea is the same, but this time we will take a deeper dive into what everything means. First, we should see that a function on the whole numbers is a sequence of numbers, which we will write as $a_n$, where $n = 0, 1, 2, \ldots$ Next, we define the forward difference of a sequence to be
$$\Delta a_n = a_{n+1} - a_n$$
If you have some experience with calculus, you might recognize that this looks a lot like the definition of the derivative, but with the increment (usually called $h$) equal to 1. This is no coincidence. Notice also that the forward difference of a sequence is itself a sequence. We will care a lot about sums of sequences, as this will form the basis of the discrete FTC. Actually, we can just prove it right now. Let’s see what the sum of the forward difference of a sequence looks like.
$$\sum_{n=0}^{N} \Delta a_n = \Delta a_0 + \Delta a_1 + \cdots + \Delta a_N$$
We can now use the definition of the forward difference to see some simplification.
$$\sum_{n=0}^{N} \Delta a_n = (a_1 - a_0) + (a_2 - a_1) + \cdots + (a_{N+1} - a_N)$$
Notice that there is an $a_1$ and a $-a_1$, which will cancel, and if we wrote down another term, we would see that there is a $-a_2$ to cancel the $a_2$, and this carries on all the way up until the last term. In the second-to-last term, there will be an $a_N$ to cancel the $-a_N$ in the last term. We call this a telescoping sum because all the internal pieces cancel, and we are left with only the endpoints (sound familiar?).
$$\sum_{n=0}^{N} \Delta a_n = a_{N+1} - a_0$$
That concludes the proof! We have shown that the information about the forward difference of a sequence, for all values of $n$ between 0 and $N$, is encoded in the original sequence at either end of the range of $n$.
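The telescoping argument is easy to check numerically. Here is a minimal Python sketch, assuming the sum runs over $n = 0, 1, \ldots, N$ so that the endpoints are $a_0$ and $a_{N+1}$. The function names and the example sequence are made up for illustration.

```python
def forward_difference(a, n):
    """Return a(n + 1) - a(n) for a sequence given as a function a."""
    return a(n + 1) - a(n)


def sum_of_differences(a, N):
    """Add up the forward differences of a for n = 0, 1, ..., N."""
    return sum(forward_difference(a, n) for n in range(N + 1))


def example_sequence(n):
    # An arbitrary example sequence; any integer-valued function works.
    return n**3 + 7


N = 10
lhs = sum_of_differences(example_sequence, N)
rhs = example_sequence(N + 1) - example_sequence(0)
print(lhs, rhs)  # both are 1331
```

Swapping in any other sequence for `example_sequence` should give two matching numbers, since only the endpoints survive the cancellation.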
This is all well and good, but let’s get to using the discrete FTC. I have one application in mind, but we will need one more bit of information first. Let’s find the forward difference of $n^2$.
$$\Delta(n^2) = (n+1)^2 - n^2 = 2n + 1$$
This should be looking familiar to those who know calculus, but something’s not quite right. To make this look more like ordinary calculus, we don’t really want the extra +1 at the end. Luckily, there is a relatively easy way of getting rid of it. Enter the falling power (in this case, the falling square). It is defined to be $n^{\underline{2}} = n(n-1)$. Let’s try the forward difference now.
$$\Delta n^{\underline{2}} = (n+1)n - n(n-1) = 2n = 2n^{\underline{1}}$$
That looks a lot better, and more reminiscent of what you would see in a first calculus course. You might notice that instead of $n$, I wrote $n^{\underline{1}}$, the first falling power of $n$. These are the same, but I wrote it like that to suggest the rule for the more general falling power. The falling kth power is $n^{\underline{k}} = n(n-1)(n-2)\cdots(n-k+1)$, and the rule for the forward difference is
$$\Delta n^{\underline{k}} = k\,n^{\underline{k-1}}$$
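If you don’t believe me, try it out for a few small powers, or even try to prove it. We can combine this with the discrete FTC, and then we will be ready for our application. Apply the telescoping argument to the sequence $n^{\underline{k+1}}$, starting the sum at $n = 1$ (the cancellation works from any starting index), and divide by $k+1$:
$$\sum_{n=1}^{N} n^{\underline{k}} = \frac{(N+1)^{\underline{k+1}}}{k+1}$$
In general, the right-hand side would also contain the lower endpoint term $-1^{\underline{k+1}}/(k+1)$. For $k \geq 1$, the (k+1)st falling power of 1 vanishes because you have a factor of 0, so I left it out.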
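If you would rather let a computer try out the small powers, here is a short Python sketch. It assumes the falling power is $n(n-1)\cdots(n-k+1)$ (with the empty product equal to 1 when $k = 0$) and that the sum starts at $n = 1$. The helper names are hypothetical.

```python
def falling_power(n, k):
    """Return n(n-1)...(n-k+1), the kth falling power of n (1 when k = 0)."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


def check_difference_rule(k, n_max=20):
    """Check that the forward difference of the kth falling power is k times the (k-1)st."""
    return all(
        falling_power(n + 1, k) - falling_power(n, k) == k * falling_power(n, k - 1)
        for n in range(n_max + 1)
    )


def check_sum_rule(k, N):
    """Check that the kth falling powers of 1..N add up to the (k+1)st falling power of N+1 over k+1."""
    total = sum(falling_power(n, k) for n in range(1, N + 1))
    return total * (k + 1) == falling_power(N + 1, k + 1)


print(all(check_difference_rule(k) for k in range(1, 7)))                     # True
print(all(check_sum_rule(k, N) for k in range(1, 7) for N in range(1, 30)))  # True
```

A finite check like this is not a proof, but it is a quick way to catch a mistake in the rule before trying to prove it.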
Combinatorics and Cubes
There are many wonderful visual proofs, or proofs without words. Some of the more famous ones involve sums of the first $N$ numbers raised to some power $k$, for example, the sum of the first $N$ whole numbers:
$$1 + 2 + \cdots + N = \frac{N(N+1)}{2}$$
These are fun to look at and construct, but these particular proofs without words are limited by our own brains. The best we can do is go up to a sum of cubes, because you can easily draw a cube, and people will understand what you mean. Any higher powers and you are trying to draw 4-dimensional hypercubes, and no one wants to parse a visual proof like that (and certainly not a proof with 5- or 6-dimensional hypercubes!). Beyond cubes, we are forced to rely on symbols instead of pictures. But we have just developed a tool that looks a lot like summing the first $N$ numbers to the kth power. Let’s use the discrete FTC with falling powers!
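The kth falling power is not the same thing as the kth power, so we will have to dot some i’s and cross some t’s before we get our general result. Let’s see what the difference between the kth falling power and the kth power is.
$$n^{\underline{k}} = n(n-1)(n-2)\cdots(n-k+1) = \sum_{j=1}^{k} s(k,j)\,n^j$$
This is a polynomial in $n$, with some coefficients that I have labelled $s(k,j)$. Let’s see the polynomial for the first few values of $k$.
$$\begin{aligned} n^{\underline{1}} &= n \\ n^{\underline{2}} &= n^2 - n \\ n^{\underline{3}} &= n^3 - 3n^2 + 2n \\ n^{\underline{4}} &= n^4 - 6n^3 + 11n^2 - 6n \end{aligned}$$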
These coefficients are useful in combinatorics. They even have a name. The coefficient of $n^j$ in the expansion of the kth falling power of $n$ is called the Stirling number of the first kind, or $s(k,j)$ for short. The Stirling numbers are useful in combinatorics for counting the number of ways that you can group numbers into cycles: up to sign, $s(k,j)$ counts the ways to arrange $k$ numbers into $j$ cycles. A cycle is a group of numbers which is cyclic in the sense that if you cycle the numbers in the front to the back, you consider that the same cycle. For example, $(1\,2\,3)$ is the same as $(2\,3\,1)$ or $(3\,1\,2)$ because to get from one to the other, you just take the number in front and put it on the end (cycling them). The cycle $(1\,2\,3)$ is not the same as the cycle $(1\,3\,2)$ because you cannot cycle numbers to get from one to the other.
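The coefficients can be generated by multiplying out $n(n-1)\cdots(n-k+1)$ one factor at a time. The Python sketch below does this with plain coefficient lists (lowest degree first). The function name is made up for illustration, and the signs are kept exactly as they appear in the expansion.

```python
def falling_power_coefficients(k):
    """Coefficients of n^0, n^1, ..., n^k in n(n-1)...(n-k+1)."""
    coeffs = [1]  # the constant polynomial 1 (the empty product)
    for i in range(k):
        # Multiply the current polynomial p(n) by (n - i).
        shifted = [0] + coeffs                   # n * p(n)
        scaled = [-i * c for c in coeffs] + [0]  # -i * p(n)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


for k in range(1, 5):
    print(k, falling_power_coefficients(k))
# 1 [0, 1]
# 2 [0, -1, 1]
# 3 [0, 2, -3, 1]
# 4 [0, -6, 11, -6, 1]
```

Reading the last row from the right gives $n^4 - 6n^3 + 11n^2 - 6n$, and taking absolute values gives the cycle counts.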
In order to use our discrete FTC to do these sums, we need to express $n^k$ in terms of falling powers of $n$. This is not too difficult for a small power, like 3. Clearly, the 3rd falling power provides the $n^3$ term, then to get rid of any stray $n^2$ terms, you use the 2nd falling power, and finally the 1st falling power cleans up the $n$ terms (there is no constant term). In this example, we see that
$$n^3 = n^{\underline{3}} + 3n^{\underline{2}} + n^{\underline{1}}$$
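Then, we can use the discrete FTC to add up the first $N$ cubes, and we find
$$\sum_{n=1}^{N} n^3 = \frac{(N+1)^{\underline{4}}}{4} + (N+1)^{\underline{3}} + \frac{(N+1)^{\underline{2}}}{2}$$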
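As a sanity check, the falling-power expression for the sum of cubes can be compared against a brute-force sum and against the familiar closed form $\left(N(N+1)/2\right)^2$. This is a minimal Python sketch with hypothetical function names, assuming the decomposition $n^3 = n^{\underline{3}} + 3n^{\underline{2}} + n^{\underline{1}}$.

```python
def falling_power(n, k):
    """Return n(n-1)...(n-k+1), the kth falling power of n."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


def sum_of_cubes(N):
    """Sum 1^3 + ... + N^3 using n^3 = n^(3 falling) + 3 n^(2 falling) + n^(1 falling)."""
    m = N + 1
    # Each division is exact: a product of j consecutive integers is divisible by j.
    return (
        falling_power(m, 4) // 4
        + falling_power(m, 3)       # 3 * (N+1)^(3 falling) / 3
        + falling_power(m, 2) // 2
    )


for N in range(1, 11):
    brute_force = sum(n**3 for n in range(1, N + 1))
    assert sum_of_cubes(N) == brute_force == (N * (N + 1) // 2) ** 2

print(sum_of_cubes(10))  # 3025
```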
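We now have the tools to perform whichever sums we want, so long as we can find the proper way to add up the falling powers to get $n^k$. For larger $k$, this quickly becomes quite a daunting task. If only there were an easier way! Well, let’s look just a bit harder. I wrote a little code that generates the coefficients of the $j$th falling powers of $n$ which together add up to $n^k$. Here they are for the first 7 values of $k$:
$$\begin{array}{c|ccccccc}
k \backslash j & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
1 & 1 \\
2 & 1 & 1 \\
3 & 1 & 3 & 1 \\
4 & 1 & 7 & 6 & 1 \\
5 & 1 & 15 & 25 & 10 & 1 \\
6 & 1 & 31 & 90 & 65 & 15 & 1 \\
7 & 1 & 63 & 301 & 350 & 140 & 21 & 1
\end{array}$$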
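The original code is not shown here, so what follows is only one possible way to generate such coefficients in Python, following the "clean up the stray terms" idea used for $n^3$: match the leading term with the highest falling power, subtract, and repeat with the next power down. All function names are made up for illustration.

```python
def falling_power_coefficients(k):
    """Coefficients of n^0, ..., n^k in n(n-1)...(n-k+1), lowest degree first."""
    coeffs = [1]
    for i in range(k):
        shifted = [0] + coeffs                   # n * p(n)
        scaled = [-i * c for c in coeffs] + [0]  # -i * p(n)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def power_in_falling_powers(k):
    """Return [c_0, ..., c_k] such that n^k equals the sum of c_j times the jth falling power."""
    remainder = [0] * k + [1]  # the polynomial n^k, lowest degree first
    result = [0] * (k + 1)
    for j in range(k, -1, -1):
        c = remainder[j]       # leading coefficient still to be matched
        result[j] = c
        # The jth falling power is monic of degree j, so subtracting
        # c copies of it removes the n^j term from the remainder.
        for i, b in enumerate(falling_power_coefficients(j)):
            remainder[i] -= c * b
    return result


for k in range(1, 8):
    print(k, power_in_falling_powers(k)[1:])  # skip the j = 0 entry, which is 0 here
```

For $k = 3$ this prints `3 [1, 3, 1]`, matching $n^3 = n^{\underline{3}} + 3n^{\underline{2}} + n^{\underline{1}}$.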
There are some patterns in the numbers, for sure. It looks like for $j = 1$, the coefficient is $1$ and for $j = k$, the coefficient is $1$. There isn’t an obvious overall pattern, though, so I will let you in on a little secret to finding patterns in a list of integers: copy and paste them into the On-Line Encyclopedia of Integer Sequences (OEIS). If you paste the columns of the table into OEIS, you will find that these are the Stirling numbers of the second kind, $S(k,j)$. When I saw this, I was more excited than I care to admit. What a nice relation to discover* between two important sets of numbers.
The Stirling numbers of the second kind are also useful in combinatorics. They count the number of ways to group $k$ objects into $j$ non-empty sets. We can then express $n^k$ in terms of $n^{\underline{j}}$.
$$n^k = \sum_{j=1}^{k} S(k,j)\,n^{\underline{j}}$$
This finally gets us to a general formula for the sums we are after.
$$\sum_{n=1}^{N} n^k = \sum_{j=1}^{k} S(k,j)\,\frac{(N+1)^{\underline{j+1}}}{j+1}$$
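To see the general formula in action, here is a Python sketch. It assumes $k \geq 1$, a sum starting at $n = 1$, and the standard recurrence $S(k,j) = j\,S(k-1,j) + S(k-1,j-1)$ for the Stirling numbers of the second kind. That recurrence is not derived in this post, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(k, j):
    """Stirling number of the second kind via the standard recurrence."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)


def falling_power(n, k):
    """Return n(n-1)...(n-k+1), the kth falling power of n."""
    result = 1
    for i in range(k):
        result *= n - i
    return result


def power_sum(N, k):
    """Return 1^k + 2^k + ... + N^k for k >= 1 using falling powers."""
    total = 0
    for j in range(1, k + 1):
        # A product of j + 1 consecutive integers is divisible by j + 1,
        # so the integer division is exact.
        total += stirling2(k, j) * (falling_power(N + 1, j + 1) // (j + 1))
    return total


for k in range(1, 8):
    for N in range(1, 20):
        assert power_sum(N, k) == sum(n**k for n in range(1, N + 1))

print(power_sum(10, 3))  # 3025
```

The loop over $j$ has only $k$ terms no matter how large $N$ is, which is the whole point of trading ordinary powers for falling powers.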
Before I end, it should be noted that none of this is new. Mathematicians have known about this formula for centuries (known as Faulhaber’s formula, and typically expressed in terms of the Bernoulli numbers). Before researching this post, I wasn’t aware of Stirling numbers or Faulhaber’s formula, which I think is a good thing. The feeling of discovery, even of something which has been known to other people for centuries, is a wonderful feeling.