In this final part of our level 2 mathematical modelling series, we move to a different way of solving our street crossing problem: numerics. We go over discretization of the problem, how to simulate it, and compare with results from part 2.
Quick Recap
In the first part of this series, we outlined a mathematical model describing a person (the walker) crossing a rectangular street. Cars drive down the street at random, with an average rate $\lambda$. To get the full details of the model, see part 1. In this part, we will only need one more thing from the first part: the expected distance traveled by the walker, $D$, which is the functional we will be minimizing (its full form is given in part 1).
In part 2, we discussed some analytic results that one could obtain from this very difficult problem. We showed or argued that there were two transformations that mapped an optimal path on one road to an optimal path on a different road. Then, we used perturbation theory to approximately solve the problem in the case where traffic on the road was light. We will want to compare the results of this part to the perturbative results, so let me recall their form here. We expanded the function representing the path to second order in the traffic rate, $y(x) = y_0(x) + \lambda\, y_1(x) + \lambda^2 y_2(x) + \cdots$, and then used the equation of motion (which can be found in part 2) to find the corrections $y_1$ and $y_2$.
With this information, we can now discuss our final attempt at solving this problem, putting it onto a computer. We will go over the process of how to reformulate the problem so that a computer can actually handle it, and then we will compare these non-perturbative solutions to those we obtained with perturbation theory in the last part.
Discretization – Overview
Our problem is based in calculus: we want to minimize a functional (the expected distance $D$) over a space of functions. This all sounds very continuous, but computers cannot handle anything continuous, because any continuum contains an infinite amount of information. Even just the number $\pi$ cannot be represented exactly on a computer, because it has an infinite number of digits with no repeating pattern. Therefore, on a computer, we can only get approximate answers, as we did in perturbation theory. Unlike perturbation theory, however, we are not restricted to small parameter values.
Before we get into the how of discretization, we should discuss the what; that is, what do we want to discretize? In part 2, we saw the expected distance functional and the equation of motion. Depending on taste, we could use either one of these to solve the problem. In this post, we will be discretizing the functional; as someone who isn’t too comfortable with more sophisticated computational methods, I find this the easier option. The goal is then to calculate the expected distance for some path, and then to try to deform the path so that we are always decreasing the expected distance.
We want to discretize a functional. How do we do it? The fundamental object in our model is the path. The path is continuous, which is bad for a computer, as we described above. Therefore, we will have to discretize the path in order for the computer to be able to calculate with it. We will see that once the path is discretized, everything else will be calculable on the computer.
Discretization in Action
We want to discretize the path. In other words, we want to chop up the path into a finite number of simple bits so that the computer will be able to handle each one. Let’s take each bit of the path to be a straight line. We will see that this will make computations easier than other possible options.
[Figure: a path discretized into straight line segments between evenly spaced points.]
We will say that there are $N$ points (including the endpoints), each separated horizontally by a distance $\Delta x$. The straight lines connecting these points represent the path of the walker. By increasing $N$, we increase the accuracy of our calculation (by making the path look smoother and more continuous).
Once we have discretized the path, the rest is easy! Everything else we need in order to calculate is just integrals of fairly simple functions. The arclength is just the length of a bunch of straight lines (which doesn’t even require an integral), and the remaining integrals are just those of exponentials, and of exponentials times linear functions, both of which can simply be written down. So, given a path, we can write a sum of elementary functions that produces the expected distance of that path.
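To make this concrete, here is a minimal Python sketch of the arclength piece, assuming the path is stored as an array of $N$ heights on a uniform grid (an illustration rather than my original code; the exponential integrals can be written out in closed form in much the same way):

```python
import numpy as np

def arclength(y, dx):
    """Total length of the straight-line path through the points
    (0, y[0]), (dx, y[1]), ..., ((N-1)*dx, y[N-1])."""
    dy = np.diff(y)                         # vertical run of each segment
    return np.sum(np.sqrt(dx**2 + dy**2))   # sum of segment lengths

# Example: a straight diagonal path across a 1 x 2 road, with N = 5 points.
N, b = 5, 2.0
dx = b / (N - 1)
y = np.linspace(0.0, 1.0, N)   # endpoints fixed at origin and destination
print(arclength(y, dx))        # ~ sqrt(5), the straight-line distance
```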
Then what? If we want to find the minimum expected distance, we are going to want to deform the path in such a way that the expected distance decreases. There are different methods of doing this (I personally implemented two different ones), but we are going to focus on gradient descent. We can imagine the space of discretized paths as living in an $N$-dimensional space (specifically the unit cube in that space, since we are only considering functions from $[0, b]$ to $[0, 1]$), where each dimension corresponds to one of the points we use to construct our path. We can then calculate the (approximate) partial derivative of the expected distance function in each “direction” to obtain the gradient. To get the partial derivative in the $i$-th direction, we just move the $i$-th path point by a small amount, $\varepsilon$, and recalculate the expected distance. The partial derivative is then approximately given by $(D' - D)/\varepsilon$, where $D'$ is the recalculated value of the expected distance and $D$ is the original one. Once we have the gradient, we know the direction in our $N$-dimensional space that corresponds to going uphill. We want to go downhill, so we take a step in the opposite direction. We keep doing this until we think we are at the lowest expected distance value.
Because we can’t know for sure whether we are at the minimum, we just say that since the gradient disappears at the minimum, we should keep descending until the gradient becomes very small (where we have to decide what “very small” means). That way, when we stop, we should be near a minimum. There are some subtleties here about local vs. global minima, but we will ignore those in this post.
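Put together, the descent loop looks roughly like the sketch below. Here `expected_distance` is a stand-in for the discretized functional described above, and the step size, nudge $\varepsilon$, and tolerance are illustrative values, not the ones I actually used:

```python
import numpy as np

def gradient(expected_distance, y, eps=1e-6):
    """Approximate the gradient of the expected distance by nudging
    each interior path point (the endpoints stay fixed)."""
    base = expected_distance(y)
    grad = np.zeros_like(y)
    for i in range(1, len(y) - 1):           # skip the fixed endpoints
        y_shifted = y.copy()
        y_shifted[i] += eps                  # move the i-th point a little
        grad[i] = (expected_distance(y_shifted) - base) / eps
    return grad

def descend(expected_distance, y, step=1e-2, tol=1e-5, max_steps=100_000):
    """Walk downhill until the gradient is 'very small'."""
    for _ in range(max_steps):
        g = gradient(expected_distance, y)
        if np.linalg.norm(g) < tol:          # close enough to a minimum
            break
        y = np.clip(y - step * g, 0.0, 1.0)  # step downhill, stay in [0, 1]
    return y
```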
As you might expect, the code runs more quickly with smaller values of $N$, so as a little optimization trick, I wrote the code such that it begins with $N = 3$ (one free parameter, since the endpoints are fixed), and then, after it optimized that path, I divided each line segment of the path in half, roughly doubling $N$, and began the next gradient descent from that position. This allows the code to get through the large-scale deformations of the path quickly, then work its way down to finer and finer detail. The code stops once the path at the finest resolution is considered optimized. All paths begin as straight lines from origin to destination, before the gradient descent begins.
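Schematically, the coarse-to-fine loop looks like this (again a sketch, reusing `descend` from above; `expected_distance_at` is a hypothetical helper that evaluates the functional at a given grid spacing):

```python
import numpy as np

def refine(y):
    """Split every line segment in half by inserting midpoints,
    taking N points to 2N - 1 points."""
    out = np.empty(2 * len(y) - 1)
    out[0::2] = y                            # keep the old points
    out[1::2] = 0.5 * (y[:-1] + y[1:])       # insert the midpoints
    return out

def solve(expected_distance_at, b, levels=6):
    """Start from a straight line with one free point (N = 3) and
    alternate optimizing and refining, doubling the resolution."""
    y = np.linspace(0.0, 1.0, 3)             # straight line, endpoints fixed
    for level in range(levels):
        if level > 0:
            y = refine(y)                    # halve each segment
        dx = b / (len(y) - 1)                # horizontal spacing for this N
        y = descend(lambda p: expected_distance_at(p, dx), y)
    return y
```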
Results
Let’s dive into the results of the computational scheme outlined above. We will do a brief survey of solutions, stepping through different values of the road length $b$ and the traffic rate $\lambda$. Then, we will briefly test the analytic claims from part 2: the mirror symmetry, the self-similarity, and the results from perturbation theory. This will also let us see how far perturbation theory can take us.
[Animations: optimal paths for roads of length 0.5, 1.5, and 2.0; each frame corresponds to a different traffic rate.]
The above animations display the optimal solutions for three different road lengths, 0.5, 1.5, and 2.0, and each frame of the animation represents a different traffic rate. There are some qualitative features that we should make note of, which seem to hold in general. For roads of length less than 1, the optimal path is concave up, whereas for roads of length greater than 1, the optimal path is concave down. This fits with the mirror symmetry property we looked at in part 2. The other major qualitative feature appears when $b$ and $\lambda$ get to be large enough: the optimal path reaches the other side of the road before reaching the destination. This is an interesting feature that we would not get out of perturbation theory, since it fails at these large values of $\lambda$.
Mirror Symmetry
In part 2, we made the claim, with a rough idea of a proof, that there is a transformation that maps solutions on a $1 \times b$ road with traffic rate $\lambda$ to solutions on a road of inverted length, $1 \times 1/b$, with a correspondingly re-scaled traffic rate. The transformation was simple: just take the inverse of the path function and re-scale distances appropriately so that the path starts and ends at the right points. I simulated this for one such pair of dual roads. Taking the dual path, flipping it across the line $y = x$, and re-scaling, we see beautiful agreement between the two solutions.
[Figure: an optimal path overlaid with its flipped and re-scaled dual, showing close agreement.]
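In case it is useful, here is roughly how the predicted dual can be built from a computed path (a sketch under the assumptions stated above: the dual road has length $1/b$, all distances shrink by $1/b$, and the path is monotone so the flip across $y = x$ is still a function; the matching re-scaling of the traffic rate is not shown):

```python
import numpy as np

def mirror_dual(y, b, n_out=101):
    """Predicted dual path: flip a computed path across y = x and
    shrink all distances by 1/b, mapping a 1 x b road onto a
    1 x (1/b) road.  Assumes the path is monotone increasing."""
    x = np.linspace(0.0, b, len(y))          # horizontal grid of the path
    x_dual, y_dual = y / b, x / b            # swap coordinates, re-scale
    x_grid = np.linspace(0.0, 1.0 / b, n_out)
    return x_grid, np.interp(x_grid, x_dual, y_dual)  # uniform grid again
```

One can then run the optimizer directly on the $1 \times 1/b$ road and overlay the two curves, which is how the plot above was produced.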
There is one difficulty in producing these sorts of plots, which exposes a downside of using this particular algorithm. When using particularly large values of $\lambda$ on a long road ($b > 1$), the slope of the path becomes very small near the destination. When considering the dual path, this corresponds to a very large slope. However, the slope is limited by the range of values that the discrete path points may take. For a short road ($b < 1$), like in the dual path above, slopes are limited to $1/\Delta x$, and that is only if a point is at 0 with the next point at 1. This constraint leads to issues at these larger values of $\lambda$, in the region where the slope must grow to be very large.
Self-similarity
The self-similarity property was also laid out in part 2, though it was only an argument, without even a suggestion of a proof outline. We will test that claim now. The proposal was that, starting with the optimal path, one could remove some portion of the beginning of the path, then translate and re-scale everything so that we had a new, valid set-up. This would produce another optimal path on a different road with a different traffic rate.
In more detail, say we have an optimal path on a $1 \times b$ road with traffic rate $\lambda$. If we cut off the path between the origin and some intermediate point, then we translate the path back so that it starts at the origin, and re-scale distances and times so that the new path is optimal on a new, smaller road with an appropriately re-scaled traffic rate.
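Here is a sketch of the geometric part of this transformation on a discretized path (illustrative only; `k` indexes the cut point, and the accompanying re-scaling of the traffic rate from part 2 is left out):

```python
import numpy as np

def cut_and_rescale(y, b, k):
    """Drop the first k points of a discretized path, translate the
    cut point to the origin, and shrink all distances so the road
    width is 1 again.  Returns the new road length and the new path."""
    x = np.linspace(0.0, b, len(y))
    scale = 1.0 - y[k]               # remaining width before re-scaling
    x_new = (x[k:] - x[k]) / scale   # translate, then shrink
    y_new = (y[k:] - y[k]) / scale   # path now ends at height 1
    return x_new[-1], x_new, y_new   # x_new[-1] is the new road length
```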
To test this, I took the optimal path on one road, cut off an initial portion of it, and used the transformation above to work out the dimensions and traffic rate of the corresponding new road. Here is a plot of the optimal paths of each of these roads, superimposed and restricted to the region where they overlap:
[Figure: the original optimal path and the optimal path on the transformed road, superimposed on their overlapping region.]
I was very confident when laying out the mirror symmetry property in part 2 because I had done the proof myself. When it came to the self-similarity property, I was less sure, though I felt the argument was very persuasive. When I published part 2, I hadn’t yet checked whether the self-similarity property held using my code. This was a bit nerve-wracking, but I am pleased that it turned out alright.
Comparing to Perturbation Theory
Let’s compare the results from the code to those that we obtained using perturbation theory in part 2. Let’s quickly recall the perturbative results: the path is expanded in powers of the traffic rate, $y(x) = y_0(x) + \lambda\, y_1(x) + \lambda^2 y_2(x) + \cdots$.
We are going to ignore the second-order term, $y_2$, except to say that it actually makes the approximation worse. This may be remedied at higher order, or it may be indicative of a divergent series, but we don’t need to get into that here. Below, we will show a couple of different possible roads with different values of $\lambda$, comparing the results from the algorithm to perturbation theory.
[Figures: numerically optimized paths versus first-order perturbation theory at several traffic rates, for two different road lengths.]
These show how good the approximation is as $\lambda$ gets larger. At larger values of $\lambda$ and $b$ (both equal to 2, say), the first-order approximation hits an issue, where it takes values above 1. This is nonsensical in the context of our problem, since the walker would be leaving the road, so there are specific points where we know that first-order perturbation theory ceases to work. However, as expected, the first-order approximation works just fine for smaller values of $\lambda$. It also appears as though the scale of $\lambda$ at which this approximation breaks down decreases as the length of the road gets further from 1.
Conclusion
That concludes this series. We looked at a model of a pedestrian crossing a road, with limited prior knowledge of when cars would appear. Then we tried to optimize a path across the road which would take the least amount of time on average. We used symmetry arguments to obtain non-perturbative information about optimal paths, and we used perturbation theory to obtain approximately optimal paths, which were valid for small enough traffic rates. In this post, we placed the optimization problem on a computer by discretizing the path and implementing a gradient descent algorithm to find the smallest expected distance. Then, we used this code to verify the symmetries of the problem and test the robustness of the first order perturbative solution.
Are there any applications of this to the real world? To be honest, I don’t really care about that; I just wanted to answer a question that I’ve been thinking about for a number of years. However, I could see some weird future where Amazon warehouses are staffed by robots wheeling around to relocate items. These results could be used to build an algorithm that helps the robots avoid crashing into each other while still moving quickly. If Amazon wanted to use it, I hope they’d at least give me some money for it.