On Lanchester’s Differential Equations and WWII: Modeling the Iwo Jima battle

Almost at the end of World War II, a battle to the death between US and Japanese forces took place off Japan in the island of Iwo Jima.  We can definitely apply a Lanchesterian analysis to this encounter; in fact, it has already been done by Martin Braun in Differential Equations and their Applications, 2nd ed. (New York: Springer and Verlag, 1975).  If  x represents the number of US soldiers and  y represents the number of Japanese troops, Braun estimates that  a = 0.05 and  b = 0.01 approximately.  Assuming then that the initial strength of the US forces was hovering around 54,000, and that of the Japanese was 21,500, we’d be interested in knowing the trajectory of the battle and see who wins.  I’ve used the method described in my previous post to obtain estimates for the proportion of  soldiers alive in  X = US and in  Y = Japan , namely the Markovization or discretization (or Eulerization) method, and this is what we come up with.  We assume the battle is fought to the death and there are no reinforcements from either side:

 L = \left[ \begin{array}{ccc} 1 - 0.05 \cdot \frac{p_{alive,y}}{p_{alive,x}} & 0 & 0.05 \cdot \frac{p_{alive,y}}{p_{alive,x}} \\ 0 & 1 - 0.01 \cdot \frac{p_{alive,x}}{p_{alive,y}} & 0.01 \cdot \frac{p_{alive,x}}{p_{alive,y}} \\ 0 & 0 & 1 \end{array} \right]

The attrition constants are within the range so that all entries are less than one as required by a Markov transition matrix provided the excess ratio of one army population to another is not too large (also, each row sums to 1).  The matrix is time-dependent, however, because each time the proportion of alives in any army changes according to the strength of the other army (proportional to both numerical superiority and technology, as can be seen in the matrix above, thus, from the Markov chain vantage point, the probability of death at each time step changes), as modeled by Lanchester’s continuous differential equations.  Here are the proportion trajectories I came up with, computing at each time step the above matrix and pulling the original vector through:

Iwo Jima battle

It’s pretty clear that, despite having the technology advantage, the Japanese lose because of numerical inferiority after about 59 time-steps.  The approximate proportion of US soldiers at the end of the battle is 0.35 (of the total soldier population), or about 26,276 soldiers, a HUGE decline from the initial 54,000 (roughly 0.72 of the total soldier population).  The Japanese army essentially obliterated half the soldier population of the US, a testament to their ferocity in battle.

My Calculus book (Deborah Hughes-Hallet, Andrew M. Gleason, et al., New York, John Wiley & Sons, Inc., 1994, p.551) suggests that the US in fact would have had 19,000 soldiers that could have stepped in as reinforcements: this does not matter really if the reinforcements come at the end of the battle, the battle has already been won with 54,000 soldiers, but it does alter the outcome significantly if they are added to the original soldiers at the beginning of battle (namely the US has 73,000 troops, or 0.78 of the total soldier population): the Japanese army would lose in roughly 34 time-steps (to the death), and the number of US soldiers left standing would be about 55,420, or 0.59 of the total soldier population: the Japanese obliterate only about 25% of the US troops.

In my next post, I want to discuss Dan’s (from Giant Battling Robots) pdf document that he provided, essentially looking at a different stochastic model related to warfare and battle.  It’s pretty neat, although it counts each potential outcome as a Markovian state (and each state is linked “downward” by a transition probability so that a given probability governs the death of one soldier in either army).  Thanks Dan!

Posted in Combinatorics and Probability, Differential Equations, Linear Algebra, Markov Chains, Mathematics | Tagged , , , , , , , | 1 Comment

On Stochastic Processes

Why can’t I shake the feeling that a stochastic process really… isn’t? In previous posts we have been able to frame a deterministic system in terms of its Markov transform matrix, and going backward really doesn’t seem like a problem. So what’s going on, I can’t help but wonder?  Are deterministic systems a subclass of random processes? Are they isomorphic spaces, or convergent at the limit as we take smaller pieces of time?  Was Einstein right, that it all converges to a determined state, and “God does not roll dice?”  What do you think?

Posted in Combinatorics and Probability, Differential Equations, Linear Algebra, Markov Chains | Tagged , , , , | Leave a comment

On Lanchester’s Differential Equations and their Transform into a Markov Transition Matrix (or, the Markovization of the Lanchester Equations)

I have been very curious as to whether the Markovization of differential equations is applicable to all systems, linear or not.  In the case of the SIR differential equations, Markovization was possible even when the system is both non-linear and coupled.  I think that perhaps there is an algorithm to frame the differential equations thusly… I haven’t fleshed it out completely (other than by using quite a bit of intuition, as in the SIR case).  Markovization, I think, is distinct from putting differential equations in state-space form; the vector pulled through the state-space matrix represents the trajectories of a function and its derivatives, not the trajectories of a set of functions related in that their sum is one: probabilities or proportions.  Who knows, though, a happy mix may be possible.

I was examining a Calculus book, specifically by Deborah Hughes-Hallet, Andrew M. Gleason, et al. (New York, John Wiley & Sons, Inc., 1994), and came across this very interesting problem that models how armies or soldiers fight it out… and tries to predict who wins.  I hope you don’t judge: even after having read the most complicated books in mathematics, I honestly believe that revisiting more basic books can bring about insights that one didn’t think of before.  Maybe even ground the complicated knowledge one acquired, or refine it.  Such has been my own experience… so I think it’s a useful exercise, for all of those who have resisted “going back to the basics.” If you are one, perhaps you are missing the opportunity to bridge the gaps you may (or not) still have!  :-\

The problem on page 550 says that, if  x represents the number of soldiers from army  X and  y represents the number of soldiers from army  Y , the rate at which soldiers in one army are injured or die can be thought of as proportional to the number of “firepower” the other army can concentrate on them.  The model was first thought by F.W. Lanchester:

 \frac{dx}{dt} = -ay

 \frac{dy}{dt} = -bx with  a, b > 0 .

In order to Markovize Lanchester’s differential equations, we need to flesh out what the states are: clearly, you can be alive and belong to army  X , or army  Y , or be dead (from a different vantage point, we could have “Dead from  X ” and “Dead from  Y ” if we wanted to count each individually).  The Markov transition matrix should look something like this, then:

 L = \left[ \begin{array}{ccc} something & 0 & something \\ 0 & something & something \\ 0 & 0 & 1 \end{array} \right]

where  L_{1,1} represents the probability of being alive in army  X (if you are from army  X ) after a time step (or the proportion of alive soldiers in  X after a time step),  L_{1,2} represents the probability of “defecting” to army  Y if you are in army  X (I assume such doesn’t happen),  L_{1,3} represents the probability of dying if you are in army  X .  Then  L_{2,1} represents the probability of defecting to army  X if you are in army  Y (I also assume this doesn’t happen),  L_{2,2} represents the probability of being alive in army  Y , if you are from army  Y , and  L_{2,3} is the probability of dying if you are in army  Y .  The third row of  L simply represents the fact that one cannot come back from the dead once dead:  there are no zombies (such may be in fact possible in videogames, for example, hence why I make a point of saying this).

The next order of business is to note that the total population does not change (alives and deads).  Thus we would have that:

 p_{alive,x} = x/N

 p_{alive,y} = y/N , and

 p_{alive,x} + p_{alive,y} + p_{dead} = 1 .  Rewriting Lanchester’s equations in terms of proportions, we would have that:

 \frac{d p_{alive,x}}{dt} = -a p_{alive,y}

 \frac{d p_{alive,y}}{dt} = -b p_{alive,x}

To be complete, we must model what the rate of change of the dead is, which we can figure out by differentiating in time the equation  p_{alive,x} + p_{alive,y} + p_{dead} = 1 .  In effect, what I’m saying is that, Lanchester’s equations are incomplete – they are missing the key information that:

 \frac{d p_{dead}}{dt} = a p_{alive,y} + b p_{alive,x}

This equation basically states that the dead increases as the rate at which soldiers in army  X are put down and the rate at which soldiers in army  Y are put down.

Euler’s method suggests that to write the approximation to these differential equations, we need think of the derivative not at the limit, but taking tiny time steps.  So in the following equation

 lim_{\Delta t \rightarrow \infty} \frac{\Delta p_{alive,x}}{\Delta t} = -a p_{alive,y}

let’s forget about the limit for a second, and solve thus:

 \Delta p_{alive,x} \approx -a p_{alive,y} \Delta t

and

 p_{alive,x} (t+\Delta t) \approx p_{alive,x} (t) - a p_{alive,y} \Delta t .  Writing all differential equations thus:

 p_{alive,y} (t+\Delta t) \approx p_{alive,y} (t) -b p_{alive,x} \Delta t and

 p_{dead} (t+\Delta t) \approx p_{dead} (t) + a p_{alive,y} \Delta t + b p_{alive,x} \Delta t

Euler’s method gives us just what we need for our transition matrix construct.  If the approximations are in fact equalities, from the point-of-view of Markov chain theory, the left-hand side is the next-step vector, and the right-hand side explains how this vector would be obtained by a transformation on the at-step vector.  In terms of Markov dynamics:

 \textbf{p(t +} \Delta t \textbf{)} = \textbf{p(t)} \cdot L

Playing a bit with the matrix  L , it should look like:

 L = \left[ \begin{array}{ccc} 1 - a \cdot \frac{p_{alive,y}}{p_{alive,x}} \Delta t & 0 & a \cdot \frac{p_{alive,y}}{p_{alive,x}} \Delta t \\ 0 & 1 - b \cdot \frac{p_{alive,x}}{p_{alive,y}} \Delta t & b \cdot \frac{p_{alive,x}}{p_{alive,y}} \Delta t \\ 0 & 0 & 1 \end{array} \right]

Letting  \Delta t be unity,

 \textbf{p(t +1)} = \textbf{p(t)} \cdot L

with

 L = \left[ \begin{array}{ccc} 1 - a \cdot \frac{p_{alive,y}}{p_{alive,x}} & 0 & a \cdot \frac{p_{alive,y}}{p_{alive,x}} \\ 0 & 1 - b \cdot \frac{p_{alive,x}}{p_{alive,y}} & b \cdot \frac{p_{alive,x}}{p_{alive,y}} \\ 0 & 0 & 1 \end{array} \right]

and, I think, we have successfully Markovized the differential equations, in effect discretizing them at unit time steps.  We should be cautious though, because now both  a, b have additional restrictions in order that the entries of  L remain below or equal to 1 (and specifically, the sum of each row must be equal to one).

Euler’s method in effect is the link here between the continuous differential equations and the Markov transition matrix.  In the SIR discretization of my previous post, I went backwards, showing how we can write continuous differential equations from the Markov transition matrix.

Several interesting things to note about the matrix  L : the probability of death, if one is in army  X is proportional to the attrition  a and by how many soldiers army  Y exceeds  X at the time.  A similar argument works for the death probability of soldiers in  Y , with attrition  b and excedent of  X to  Y .

As with the SIR equations, since  L is really changing every time step, it is  L(t) .  So, from initial conditions, the amount of soldiers alive in army  X , army  Y , and dead at time step  n can be calculated using the discretized matrix by pulling the original vector through, as:

 \textbf{p(t +} n \Delta t \textbf{)} = \textbf{p(t)} \cdot L(t) \cdot L(t + \Delta t) \cdot \ldots \cdot L(t + (n-1) \Delta t)

In my next post, I want to model, as in the Calculus book I mentioned, the Iwo Jima encounter during World War II, the fight between Japan and the US, and solving the trajectory numerically using the discretization above.

An additional note, from examining our assumptions of matrix  L , we realize now simple ways in which we could extend Lanchester’s differential equations, by filling in “probabilities that a soldier will defect to the other army.”  We have indeed now enriched the model in the sense that we can now take into account other complicated dynamics (the least of which is the “creation of zombies”).  I think this is very useful in any model, because it allows us to contemplate transitions into states we wouldn’t have considered otherwise, and, more powerfully, allows us to transform our mindset to the realm of differential equations, where we can have DE-related insights, and back again, where we can have Markovian insights.

Posted in Combinatorics and Probability, Differential Equations, Linear Algebra, Markov Chains, Mathematics | Tagged , , , , , , , , | 7 Comments

On the Swine Flu (a Year-and-a-Half Later)

This week I have been revisiting the swine flu scare of April 2009.  The Mexican government then was particularly anxious: it suspended regular classes, and many kids were just thrilled that they found themselves suddenly on vacation.  Some schools even advised kids that if they felt ill, they should excuse themselves from attending classes for at least a week.  Many took the slightest symptom as an excuse to take a short vacation… to the beach, no less.

A year ago in April I attempted to simulate, a grosso modo, not the time progress of the disease, but yes some of its generalities.  I estimated, for example, that if the swine flu were concentrated in Mexico City, with its sizable population of 20 million, at the outbreak rate of infection and, by clever deduction of the SIR differential equation constants of proportionality  a (and a wild guess of  b ), the total of infected individuals at one time would come to about 500,000, or 2.5% of the city’s population.

One thing that struck me from Mexico’s Secretaria de Salud infomercials at the time was that they graphed an exponential progression of the disease (which, after government intervention, supposedly decreased exponentially again). It was unrealistic and, whether the authorities minded or not, misinforming.

With the SIR model I recently calculated what could have been the time progress of the disease: I used the method that I developed in my previous post to facilitate the calculation, after I realized that my (Markovian) matrix formulation was equivalent to Euler’s method: expanding out the matrix one obtains, indeed, Euler’s estimation formulas (which we thought of, from the point-of-view of a Markov chain, as the one-step forward transformation of a vector pulled through or propagated by the Markov matrix).  Interestingly, then, one can say that a Markov transition matrix really what it does is use Euler’s method to estimate the next step.  This is kind of an intriguing insight to me, because I never understood a transition matrix in this way.

The matrix formulation therefore also acquires an error as Euler’s method does: picking a point away form the initial condition, if the number of steps used to get there is  n , the error (the difference between the approximate value and the true value) is approximately proportional to  1/n .

The matrix formulation, then, required that the differential equations of the SIR model be recast not in absolute terms but in relative proportions, as Duke University’s website proposes from the beginning.  I didn’t do this originally, and so my calculation of the constant  a must be modified.  I said that  dS/dt = -400 .  In terms of  s(t) , which is  S(t) / N where  N is the total population,  ds/dt = -400/20,000,000 .  The constant of proportionality  a , then, can be calculated as  a = \frac{-ds/dt}{s \cdot i} \approx 0.25 if, from initial conditions,  s = \frac{19,998,400}{20,000,000} and  i = \frac{1,600}{20,000,000} .  My calculation of the constant of proportionality  b remains unmodified, it was a guesstimate based in part on CDC records of disease recovery, if I remember correctly.  Recall that this  b , from the POV of the Markov chain matrix, is the transition probability of going from infected state to recovered state.  The initial conditions  S_0 = 19,998,400 ,  I_0 = 1,600 and  R_0 = 0 need to be recast in terms of proportions, too, as  s_0, i_0 above and  r_0 = 0 .  Strictly speaking, we had data at the time that deaths were about 149 people (which I attempted to use to estimate the constant  b ).  The initial condition vector would have to be modified to reflect this forgotten piece of information, but the truth is that the difference in trajectory is negligible, so, to remain consistent with the analysis of a year ago, I will leave  s and  r as they are.

Here’s the timed graph I came up with.  It states that the time at which the number of infecteds is a maximum (about 500,000 as per the post of April of 2009; new numerical estimate using the matrix transform: 440,000) is approximately 145 days after the outbreak (April 27ish): maybe around September 19, 2009.  It also states that, at the end of the disease, roughly 8,000,000  people recover (or die).  This I know was not the case (not even thousands of people died, e.g.), largely in part because of medication and excellent medical care and, admittedly, competent government response (but it puts things in perspective, doesn’t it? No wonder it was a “scare.”).

Swine Flu SIR April 27, 2009

Posted in Combinatorics and Probability, Differential Equations, Linear Algebra, Markov Chains, Mathematics | Tagged , , , , , , , , , | Leave a comment

On the SIR Equations and Their Transform into a Markov Transition Matrix (or, the Markovization of the SIR Model)

It’s been a while since I’ve had the time to write here, mostly for personal reasons.  However this period has not been without very exciting “discoveries,” at least in my mathematics understanding.

Ever since last year in April with the Swine flu scare I attempted, at the very breakout, to model the progression of the disease directly using the differential equations and data provided by online articles and publications.  I’m not too sure if the raw data at the beginning of the scare was refined enough or accurate enough, but it was definitely enough to estimate some numbers and obtain a rough idea of the magnitude of the problem.  I suggested that, if anyone was interested, we could attempt to use Euler’s method to complete a time evolution of the disease based on the coupled differential equations the model suggests.

Having read the excellent SIR explanation from Duke University, which differed slightly from another Calculus book in my possession in that it suggested using the proportions of susceptibles, infecteds, and recovereds as the basis for the calculations (than actual raw numbers), I began to think that, indeed, the SIR coupled-differential equation formulation was suspiciously very much like a Markovian formulation, and that, perhaps, there was an interesting link between the two.  If we think about it this way, a susceptible can (only) transition to being infected or stay as a susceptible, an infected can only transition to recovery or stay as an infected, and, once recovered, one can stay such.  In other words, S, I, and R are in effect Markovian “states.”

Imagine for a moment that indeed we could somehow create an equivalence between a state-transition matrix and the actual differential equations, which, apparently we can, as I will explain shortly.  I couldn’t at this point help but remember the Schrodinger vs. Heisenberg debate, in which Schrodinger’s differential-equation formulation of quantum mechanics (which I know nothing about, by the way) was shown equivalent to Heisenberg’s discrete-matrix approach (matrix mechanics).  At this point I began to wonder: can a system described by continuous differential equations be (always) transformed into matrix-form?  I’m not sure the answer is “yes” always, but I think I have succeeded transforming the SIR equations into matrix form, through Markovian reasoning.  Not that this is the way that matrix mechanics and wave mechanics were shown to be equivalent, that is definitely not my claim, nor do I know how they were shown equivalent either.

I started out reasoning that perhaps I could model the dynamics of susceptibles, infecteds, and recovereds by building a transition matrix as follows:

 A = \left[ \begin{array}{ccc} (1-a) & a & 0 \\ 0 & (1-b) & b \\ 0 & 0 & 1 \end{array} \right]

where  a represents the probability (or, from another POV, the proportion) of susceptibles becoming infecteds, and  b represents the probability (proportion) of infecteds becoming recovereds.  From the Markovian point-of-view, the entry  A_{1,1} is the probability/proportion of susceptibles who stay susceptibles (at the next step),  A_{1,2} the proportion of susceptibles who become infecteds (at the next step), and, naturally,  A_{1,3} the proportion of susceptibles who, at the next step, become recovereds: zero.  Then  A_{2,1} is the proportion of infecteds who become susceptibles (zero),  A_{2,2} is the proportion of infecteds that stay infected in the next time period, and  A_{2,3} is the proportion of  infecteds recovering.  Finally,  A_{3,1} and  A_{3,2} represent the proportion of recovereds that become susceptible or infected (zero) respectively, and  A_{3,3} represents the proportion of recovereds who stay recovered.  In other words, the transition matrix represents a birth process of susceptibles into infecteds and infecteds into recovereds.

Following the typical Markov chain understanding, let the vector

 \textbf{p(0)} = \left[ \begin{array}{ccc} p_s(0) & p_i(0) & p_r(0) \end{array} \right]

represent the initial proportions of susceptibles, infecteds, and recovereds.  Then the one-step forward proportion vector,  \textbf{p(1)} , is given by

 \textbf{p(1)} = \textbf{p(0)} \cdot A

or, following the argument through,

 \textbf{p(t+1)} = \textbf{p(t)} \cdot A

where we necessarily assume that  t \in \mathbb{Z_+} \cup \{0\} , or that time is discrete, which means we’d need or want to know or can calculate the proportion vector at discrete time intervals. Such is not necessarily the case (we may want to know the proportion vector at times on the real continuum), but we can resolve this objection (presently), by taking a hint from Queueing Theory.

At any rate, by writing the equations explicitly, we have:

 p_s(t+1) = p_s(t) \cdot (1-a) = p_s(t) - a \cdot p_s(t)

 p_i(t+1) = a \cdot p_s(t) + (1-b) \cdot p_i(t) = a \cdot p_s(t) + p_i(t) - b \cdot p_i(t)

 p_r(t+1) = b \cdot p_i(t) + p_r(t)

Here’s the amazing thing, we can write these as difference equations that are VERY much like the SIR differential equations (a discretized version I guess).  This would have been a first step to obtain the steady state of the transition matrix (assuming the transition matrix had such: regularity, and other conditions).

 p_s(t+1) - p_s(t) = - a \cdot p_s(t)

 p_i(t+1) - p_i(t) = a \cdot p_s(t) - b \cdot p_i(t)

 p_r(t+1) - p_r(t) = b \cdot p_i(t)

Notice then the similarity (from Duke’s website look that the SIR differential equations are formulated in terms of proportions, and from my previous post take notice of notation so that the analogy is clearer):

 \frac{ds}{dt} = -a s(t) i(t)

 \frac{di}{dt} = a s(t) i(t) - b i(t)

 \frac{dr}{dt} = b i(t)

They are not exactly the same, of course: one is continuous, the other is not, and then the rate of change in the SIR equations of susceptibles also depends on the proportion of infecteds at the time, which in turn affects the rate of change of the infecteds.

Before taking on these objections, setting the difference equations equal to zero, in the usual Markov-chain context, gives the steady-state probability or proportion vector.  Thus, in the very long-run, it must be true that

 0 = - a \cdot p_s(\infty) and, with  a nonzero,  p_s(\infty) = 0

 0 = a \cdot p_s(\infty) - b \cdot p_i(\infty) \Rightarrow p_i(\infty) = 0 and

 0 = b \cdot p_i(t) \Rightarrow p_i(\infty) = 0 . This last bit is redundant, but the proportion  p_r(\infty) must be one (because the sum of all proportions must be one).

Now let’s take on the objections: the SIR differential equations suggests that how susceptibles change in time is proportional to both the susceptibles and the infecteds at the time; this furthermore affects the rate of change of the infecteds.  Taking a clue, we can manipulate the “transition matrix” to incorporate this term.  Thus:

 B = \left[ \begin{array}{ccc} (1 - a \cdot p_i(t)) & a \cdot p_i(t) & 0 \\ 0 & (1-b) & b \\ 0 & 0 & 1 \end{array} \right]

Notice that every row sums to 1 (provided appropriate conditions on  a, b ), as it should if this is a Markov chain transition matrix! This strongly suggests that we can follow a Markov chain treatment, and everything Markovian applies (particularly steady-state proportions). And so, the next-step equation is given by the usual Markov context as:

 \textbf{p(t+1)} = \textbf{p(t)} \cdot B

or, explicitly, as

 p_s(t+1) = p_s(t) \cdot (1 - a \cdot p_i(t)) = p_s(t) - a \cdot p_i(t) \cdot p_s(t)

 p_i(t+1) = a \cdot p_s(t) \cdot p_i(t) + (1-b) \cdot p_i(t) = a \cdot p_s(t) \cdot p_i(t) + p_i(t) - b \cdot p_i(t)

 p_r(t+1) = b \cdot p_i(t) + p_r(t)

with difference equations (now analogous to SIR differential equations)

 p_s(t+1) - p_s(t) = - a \cdot p_i(t) \cdot p_s(t)

 p_i(t+1) - p_i(t) = a \cdot p_s(t) \cdot p_i(t) - b \cdot p_i(t)

 p_r(t+1) - p_r(t) = b \cdot p_i(t)

and steady state proportions

 0 = - a \cdot p_i(\infty) \cdot p_s(\infty)

 0 = a \cdot p_s(\infty) \cdot p_i(\infty) - b \cdot p_i(\infty)

 0 = b \cdot p_i(\infty)

From the third equation it follows  p_i(\infty) = 0 , which now implies that both  p_s(\infty) and  p_r(\infty) can be anything at steady-state (as long, of course, that  p_s(\infty) + p_r(\infty) = 1 ).

Examining the Duke University link to a SIR progression graph, this vector tendency can be seen pretty clearly at large  t . Notice, for example, how  i(t) goes to zero while both  s(t), r(t) are anything.  We have in effect, proved that under the SIR model, the proportion of infecteds is asymptotic to 0. (Neat-o).

On to the continuity question: using the difference equations we are in effect assuming that the state vector  \textbf{p(t)} is known at discrete times.  In Queueing Theory when discussing birth-death processes, it is apparently conventional (in the standard treatment) to think that the probability in a given unit time interval tends to accumulate uniformly. Thus, we can rewrite the transition matrix as:

 C = \left[ \begin{array}{ccc} (1 - a \cdot p_i(t) \cdot \Delta t) & a \cdot p_i(t) \cdot \Delta t & 0 \\ 0 & (1-b \cdot \Delta t) & b \cdot \Delta t \\ 0 & 0 & 1 \end{array} \right]

We want to write out the explicit equations of

 \textbf{p(t+} \Delta t \textbf{)} = \textbf{p(t)} \cdot C

as

 p_s(t + \Delta t) = p_s(t) - a \cdot p_i(t) \cdot p_s(t) \cdot \Delta t

 p_i(t + \Delta t) = a \cdot p_s(t) \cdot p_i(t) \cdot \Delta t + p_i(t) - b \cdot p_i(t) \cdot \Delta t

 p_r(t + \Delta t) = b \cdot p_i(t) \cdot \Delta t + p_r(t)

By further algebraic manipulation,

 \frac{p_s(t + \Delta t) - p_s(t)}{\Delta t} = - a \cdot p_i(t) \cdot p_s(t)

 \frac{p_i(t + \Delta t) - p_i(t)}{\Delta t} = a \cdot p_s(t) \cdot p_i(t) - b \cdot p_i(t)

 \frac{p_r(t + \Delta t) - p_r(t)}{\Delta t} = b \cdot p_i(t)

Taking the limit as  \Delta t \rightarrow 0 , the difference quotient describes the very definition of the derivative, and we obtain the continuous SIR differential equations.  We have assumed, in effect, the vector  p(t) is known at all times  t \in \mathbb{R}^+ \cup \{0\} .

Perhaps an obvious thing to note, yet worth mentioning, now that we have shown the equivalence between the SIR model and the Markov transition matrix, is that, in the SIR model the proporitonality constant  a is now shown to be related to the probability of transitioning from the susceptible state to the infected state, and the proportionality constant  b is now shown to be the probability of transitioning from the infected state to the recovered state.  This, in my opinion, is an amazing insight!

Before, since the SIR differential equations could not be solved explicitly, we would rely on Euler’s method to get a numerical approximation of the solution curves  s(t), i(t), r(t) .  The present analysis suggests we need only calculate powers of matrix  C to obtain a numerical sample of the all three solution curves concurrently (spaced as  \Delta t ), since, in the Markov chain context, the power of the matrix represents the  n \cdot \Delta t -step forward evolution of the initial (or current) state vector (in the context of a time-dependent matrix  C(t) , I actually mean by “power” consecutive matrix multiplications of  C , as by

 p(t + n \Delta t) = p(t) \cdot C(t) \cdot C(t + \Delta t) \cdot C(t + 2 \Delta t) \cdot \ldots \cdot C(t + (n-1) \Delta t) ,

not actual powers of  C(t) ).  Higher-rate samplings (higher-resolution solution curves) can be achieved by reducing  \Delta t .  It may be worth exploring the link between Euler’s method and this sampling-method (see my next post), in terms of computational requirements, or in terms of a mathematical equivalence.

Needless to say, this method can probably be extended to “Markovizate” (Markovize?) other differential equations, such as SIRD or Lotka-Volterra.  I can only speculate as to the limits of this process, but I am certainly cognizant of two things: several, if not all, regular Markov chains can be formulated as continuous differential equations, and, several, if not all continuous differential equations can be formulated as Markov transition matrices.  I’d love to explore more such, and will probably in the future.

Posted in Combinatorics and Probability, Differential Equations, Linear Algebra, Markov Chains, Mathematics | Tagged , , , , , , , , , , | Leave a comment

On Markov Chain Music

Ever wonder what music produced by a Markov chain might sound like?  Yeah, me too! …and… wonder no more!  So I went to where the piano is and grabbed the more regular, simple score I could find.  As it turns out it was Mozart’s Twinkle Twinkle variations (Zwolf Variationen uber “Ah, vous dirai-je, Maman” KV 265).  Because it is written in C major and the notes are mostly regularly spaced this was great!  I used the first variation (more notes) to produce a transition probability matrix.  Perhaps an important thing to note (pun intended) is that the range is from B to E (about an octave and a half).  The little song produced is about 30 measures and has no real tune to it, although in some parts it does resemble the original melody.  Also, I added three non-random notes at the end to make it end gracefully.

MCM1

Of course, arranging it a bit it can sound marginally better;  here’s how I’ve done it, adding accompaniment and instruments, using Garageband’s licks and prefabricated transitions (to save time).  Enjoy!

MCM2

(I hope Mozart is not writhing in his grave!)

Posted in Combinatorics and Probability, Markov Chains, Mathematics, Music | Tagged , , , | 1 Comment

On the Beale Cipher, Part III

In this post I’m including a compressed archive of some results I have obtained by following through on my previous post:  I used the Declaration of Independence transition probability matrix I calculated based on a text I found online, and also used the standard frequencies of letters or of first words of the English language in this “augmented frequency analysis,” as prior beliefs.  One of the crucial things to notice is that, regardless of prior beliefs, our posterior belief significantly changes once we take into account the proportions of transitions in the actual purported key (the Declaration of Independence), which may be why some cryptographers, by following a simple statistical argument, may think the cipher is written in a language other than English.

DOITPM.xls.zip

The whole idea is that, with the information as provided in this .xls file, one can begin making educated guesses as to particular cipher numbers (as an example, “I think 5 is an ‘e’” [not that I do, by the way]), which implies modifying the probability distribution, and propagating such beliefs through the whole cipher, furthermore uncovering particular patterns, and going again.  You need not be absolutely certain of your educated guess: you can surmise two or three letters fit a particular cipher number, and propagate such belief to see where it takes you.  As I have said, processing time is currently 20 minutes per run (propagating throughout the whole cipher) until I (or someone else) enhances on my coding and develops a more efficient method.

Let me know your discoveries (as will I)!  It think working a little bit on this may be fun and interesting and may uncover interesting mathematical relationships.

Posted in Combinatorics and Probability, Cryptography, Markov Chains, Mathematics | Tagged , , , , , , , , | Leave a comment

On the Beale Cipher, Part II (and Other Book Ciphers)

Last time I talked about extending the usual frequency analysis on the (first) Beale cipher to augment our understanding of the composition of the individual letters the numbers may represent.  I have said before that Markov chains are immensely applicable everywhere; the Beale cipher seems no exception.  The idea came to me last month, a couple years after reading Singh’s book, and while I was happily lazy watching a couple seagulls fish out of Manzanillo’s ocean.  I had also read Snell’s book and a particular description of how Markov himself thought about Markov chains intrigued me: apparently he had counted the transitions of vowels to consonants and consonants to vowels in a book, thought out his theory, and showed that the long-term fractions of vowels and consonants in the book (using a chain) stabilized to the actual ratio.

This same idea can be applied, I think, on the Beale cipher.  Suppose we know a priori how transitions between the encoding letters of the key, which we surmise is the first letter.  Say for example the key is the Declaration of Independence (which in fact is the key for Beale cipher 2), as

“When in the Course of human events it becomes necessary…” and each first letter encodes for a particular number.   We can see that W transitions to I, I to T, T to C, and so:

W->I->T->C->O->H->E->  etc.  By finding the proportion of time any letter, say, W, transitions to any of the other letters, we’ve got ourselves a transition probability matrix.  In this abbreviated example we see that W transitions to I one hundred percent of the time; its transition probability vector would be represented by a 1 in the position of the letter I and 0 otherwise.  Thus, in effect, we are assuming that a random variable can take states represented by the letters of the alphabet, and it can transition to any other letter or stay where it is with a given probability.

Of course, the longer the encoding key is the better; most of the 26×26 transition probability matrix can be filled without a row of zeros, which is in effect what happens with the letter A above, since we count no letter toward which it transitions.  In counting the whole of the transitions in the Declaration of Independence, the only letters that don’t transition are X, Y, Z; to bypass the difficulty of a transition probability matrix whose rows do not sum to 1, I have made it so that X, Y, and Z’s rows do sum to 1 by assuming that the states X, Y, and Z transition to any other letter of the alphabet in equal proportion.

Writing the Declaration of Independence transition probability matrix thusly, it becomes a regular transition probability matrix with stable nth power.  In other words, we can take powers of this matrix up to an arbitrary number, such as 3000, 5000, without the fear of it diverging or giving weird numbers.  In fact, in particular, the Declaration of Independence transition probability matrix (DOITPM for short) stabilizes to the fourth decimal digit after about 15 powers.  The reason we care about the powers of the TPM is that such new matrix represents the probability of transitioning to a particular letter after n transitions!  Thus, where the DOITPM alone represents the probability of, say, W transitioning to I at the first step (the next cipher number down), DOITPM to the 3rd power represents the probability of (in particular) W jumping to another letter after three steps (the next three cipher numbers down).

Of course, calculating the DOITPM to any power is taxing or impossible if done by hand: I downloaded Octave (which I earnestly recommend if, like me, you don’t have the 5000+ dollars for a Matlab license) and built a few script files for the purpose of doing all this.

The very cool thing about this approach is that virtually all points 1-6 that I commented on in my last post are taken care of.  But in particular, if one has a prior belief about the probability that a given number in the cipher is a particular letter (a probability vector), such can be propagated forward by using the TPM and its powers.  As an example, we may have the prior belief that the cipher’s 1 is a first letter with probability vector equal to the frequencies of first letters in the English language.  Such a vector (P) times the DOITPM gives us a probability vector for the cipher’s 2 (one-step forward), P*DOITPM^2 power gives the probability vector of the cipher’s 3 (two steps forward), etc.

On the other hand, we may surmise that, say, the cipher’s 64 has the standard frequencies of letters of the English language.  We can propagate such belief forward to 65, 66, 67, etc., until about 15 letters after our original (since the DOITPM stabilizes and all steps after about the fifteenth are the same).

We need not use the above frequencies, although they do seem reasonable “first-guess” beliefs.  But if we suspect for any reason that a particular letter, say T, is the cipher’s 1, we can propagate such belief forward and get good results for cipher’s 2, 3, and even 4.  After 4 things begin to stabilize toward the long-term proportions, and this is only natural as our certainty of the next letters increases.  If we are good at crosswords and a little bit lucky, we can determine that 2 is a particular letter.  We can then modify the probability vector of such and propagate such belief forward… and so on until the whole text is deciphered.

We can also propagate partial beliefs: if we suspect the cipher’s 1 is either a T or a W with equal probability, but there is a slight chance it can be any other letter, our probability vector for that number can be something like 0.004 for all letters except T and W which are 0.45.  As always this belief can be propagated forward and with luck we can determine more letters based on frequency guesses.

Computationally, in the very inefficient script files I have built (I freely admit I am no programmer, and I wrote the script files somewhat quickly in order that the ideas not die before they could see the light of day, so there are a lot of unnecessary steps and redundancies) and my 2 core, it takes something like 20 minutes to process all the cipher and propagate all number’s initial probability vectors by proceeding in layers.  An example of an inefficiency is that, despite the fact the DOITPM stabilizes at about the 15th power, I got ambitious to squeeze even the slightest changes and am calculating it sometimes up to the 2000+ power, rather than caching the 15th power and using that.  Thus, every time I want to modify a probability vector for a particular number on the cipher, I have to wait 20+ minutes for it to finish recalculating. It would be nice to have a gazillion computers working in a grid though.

Nevertheless, as I have mentioned, one will have to proceed in layers; by this I mean the following.  Say you have an initial belief about the cipher’s 1.  We can propagate this belief all the way up to the cipher’s 2000.  But say you also have an initial probability vector of the ciphers 6.  At 6 the probability “waves” begin to collide.  It would be ideal if there were a manner to combine the probability (discrete) waves to obtain a more certain probability vector (kind of like a Kalman filter for discrete distributions): I could not think of any (although I want this to be the subject of a next post), or at least not any that would give me a more certain wave, so I opted to choose the wave that was more entropically close to 0.  Since we have 26 “boxes” with different proportions of “balls” in them (the respective probability of being a particular letter), we can use information entropy to opt for the one vector that carries the most “certain” belief, or the one with the least entropy.  Thus, if 6′s prior contains less entropy than 1′s propagated vector, I would opt to stick with 6′s vector, and vice versa.  I’d do this with all vectors at every single position (thus why it takes about 20 minutes to process, too).

Of course, if Beale number 1 was encoded with a key with similar transition probabilities as the DOI, and this would be a reasonable assumption if we think Beale encoded the cipher using a text from his time (similar stylistically, etc.), then we can use the DOITPM to attempt a decoding of it.  If instead we believe that he used a text that he himself wrote, an analysis of Beale number 2 could yield a TPM that could crack it.  Lastly, if we believe that Beale number 1 was encoded with a key very much like any other book in the world, we could again examine such a TPM and attempt a decipherment.

In my next post or possibly upon revision of this one, I will post an .xls file containing the DOITPM and the probability vectors of each number in the cipher of Beale 1, having assumed that the cipher’s 1 and 71 have the frequencies associated with the first letter of the English language, and all the rest have the standard frequencies of the letters of the English language as prior beliefs.  I also intend to post an .xls file except containing a TPM of a typical book.

The new probability vectors obtained in this fashion for each number of the cipher differ markedly from the typical letter frequencies: this makes sense, because the frequencies now depend on the transitions of the letters in the key itself.  This may be the reason why some cryptographers have disputed that the cipher is written in English (based on statistical arguments); I think it is written in English, only the proportions of the letters in the cipher changes because of the particular proportions of the first letters of words in the key.

If anyone is interested in the script files, I may post them too: they are .m files, so you can run them on Octave or Matlab both.  Leave a message below.

Otherwise if anyone is interested in hooking up computers for the processing of this (or donating money so I can buy several :)), also leave a message below or contact me using the contact form.  Thanks!

Posted in Combinatorics and Probability, Cryptography, Markov Chains, Mathematics | Tagged , , , , , , , , , | Leave a comment

On the Beale Cipher, Part I

Ugh, so right now there is construction going on behind my house; an abandoned house got put down and I can only imagine Telecable extending it’s dominions over.  The street is zoned to be residential, not commercial, but the owner seems to be politically well-positioned: he gets to do whatever he wants.  In Mexico, political power is concentrated amongst a few, and not necessarily the smartest (quite the contrary in fact, the majority seems pretty dumb); I have often wondered why it is not requisite to have those whose ambition is a political post (in the legislature, in the judiciary and executive branches) take an IQ or aptitude test (… and have the smartest be selected): I’d much rather have smart crooks than stupids.

At any rate, with all that pounding in the background, I thought I’d talk a little bit about my thoughts on the Beale cipher.  I started studying it a bit enthusiastic about the prospect of deciphering it completely (in the beginning of April actually), but I have become convinced this is not a task I can accomplish too easily and without any support — in other words, it’s difficult to do on my own.  Besides, I am terrible at crossword puzzles, a skill that seems necessary to be able to guess certain words.  I can’t but remember my friend Seth K who could complete the NY Times Sunday crossword in like five minutes.  Anyway: my motivator was the challenge, or the mathematical aspect of it, not the 40 million in US dollars that it is purported to be about (the first cipher).  For some background and history, I earnestly recommend Simon Singh’s book “The Code Book.” It is extremely entertaining — and it quickly has became one of my favorite books.  Of course there’s also the Wikipedia article which is much more succinct.

When I first looked at it under the chapter of Singh’s “Le Chiffre Indechiffrable,” I suspected then as I do now that it is not entirely undecipherable.  The reasons for this:

*1. The Beale cipher is not a pad cipher, so it may be “breakable” without the specific requirement of the key.

*2.  The fact that certain numbers repeat themselves (some 8 times (“18″), others only once) would suggest that the Beale cipher is susceptible to a form of frequency analysis.

*3.  Certain sequences, for example, 64 following 18 in two different parts (first Beale cipher letters 124-125 and 187-188) may be important.

*4.  The distance between numbers may be significant.  It would seem the closer the difference the stronger the relationship between the letters; thus, the closer they are together the more information we could potentially squeeze out of them.  Sequence repetitions may also help a lot (64 following 18 twice, for example), and adds credence to number 1 and 2.

5.  The letter encoded by 1 has a different probability distribution of being a particular letter than do the others (see Wikipedia article).  The same goes for 71, which is the first letter of the first Beale cipher.  This may be true of other letters but it’s not exactly clear which, because we don’t know where a word ends and another begins.  We can surely use this to augment our frequency analysis.

6.  It cannot escape us then that if the Beale cipher is a book cipher, then each number in the code likely represents the first letter of a word from the key.  However, it’s not that knowing the first-letter frequencies of the key is likely to be any help.  Although there does seem to be another bit of information of the key that would be extremely useful (and we can somewhat extract), which I will explain in my next post.

Later, I will describe how I have incorporated all these pieces of information into an analysis of frequencies, or an augmented form of the usual frequency analysis.  For now, suffice it to say that starred statements I realized fairly early; unstarred statements I noticed only after developing the analysis a bit.

Posted in Combinatorics and Probability, Cryptography, Markov Chains, Mathematics | Tagged , , , , , , , , , | Leave a comment

On revolutionizing the whole of Linear Algebra, Markov Chain Theory, Group Theory… and all of Mathematics

I have been so remiss about writing here lately!  I’m so sorry!  There are several good reasons for this, believe me.  Among them: (1) I have been enthralled with deciphering a two hundred-year old code, the Beale cipher part I, with no substantial results except several good ideas that I may yet pursue and expound on soon here.  But this post is not intended to be about that.  (2) My computer died around December and I got a new one and I hadn’t downloaded TEX; I used this as an excuse not to write proofs from Munkres’s Topology chapter 1, and so, I have added none.  I slap myself for this (some of the problems are really boring, although they are enlightening in some ways, I have to admit, part of the reason why I began doing them in the first place). (3) The drudgery of day to day work, which is soooo utterly boring that it leaves me little time for “fun,” or math stuff, and my attention being constantly hogged by every possible distraction, at home, etc.  Anyway.

For a few months now I have been reading a lot on Markov chains because they have captured my fancy recently (they are so cool), and in fact they tie in to a couple projects I’ve been having or been thinking about.  I even wrote J. Laurie Snell because a chapter on his book was excellent (the one on Markov chains) with plenty of amazing exercises that I really enjoyed.  In looking over that book and a Schaum’s outline, a couple questions came to my head and I just couldn’t let go of these thoughts; I even sort of had to invent a concept that I want to describe here.

So in my interpretation of what a Markov chain is, and really with zero rigor, consider you have  n < \infty states, position yourself at  i .  In the next time period, you are allowed to change state if you want, and you will jump to another state  j (possibly  i ) with probability  p_ij (starting from  i ).  These probabilities can be neatly summarized in a finite  n \times n matrix, with each row being a discrete distribution of your jumping probabilities, and therefore each row sums to 1 in totality.  I think it was Kolmogorov who extended the idea to an infinite matrix, but we must be careful with the word “infinite,”  as the number of states are still countable, and so they are summarized by an  \infty \times \infty countably infinite matrix.  Being keen that you are, dear reader, you know I’m setting this question up:  What would an uncountably infinite transition probability matrix look like?  No one seems to be thinking about this, or at least I couldn’t find any literature on the subject.  So here are my thoughts:

The easiest answer is to consider a state  i to be any of the real numbers in an interval, say  [0,1] , and to imagine such a state can change to any other state on such a real interval (that is isomorphic to any other connected closed interval of the same type, as we may know from analysis).  This is summarized by a continuous probability distribution on  [0,1] , whose sum is again 1; a good candidate is a beta function, such as  6 x (1-x) , with parameters (2,2).  I think we can “collect” such probability distributions continuously on  [0,1] \times [0,1] : a transition probability patch, as I’ve been calling it.   It turns out that it becomes important, if patches are going to be of any use in the theory, to be able to raise the patch to powers (akin to raising matrixes to powers), to multiply patches by (function) vectors and other tensors, and to extend the common matrix algebra to conform to patches; but this is merely a mechanical problem, as I describe in the following pdf.  (Comments are very welcome, preferably here on the site!).

CSCIMCR

As you may be able to tell, I’ve managed to go quite a long ways with this, so that patches conform reasonably to a number discrete Markov chain concepts, including a patch version of the Chapman-Kolmogorov equations; but having created patches, there is no reason why we cannot extend the idea to “patchixes” or continuous matrixes on  [0,1] \times [0,1] without the restriction that each row cross-section sum to 1; in fact it seems possible to define identity patchixes (patches), and, in further work (hopefully I’ll be involved in it), kernels, images, eigenvalues and eigenvectors of patchixes, commuting patchixes, commutator patchixes, and a slew of group theoretical concepts.

Having defined a patchix, if we think of the values of the patchix as the coefficients in front of, say, a polynomial, can we not imagine a new “polynomial” object that runs through exponents of say  x continuously between  [0,1] with each term being “added” to another? (Consider for example something like  \sum_i g(i)x^i, i \in [0,1] ?)  I think these are questions worth asking, even if they are a little bit crazy, and I do intend to explore them some, even if it later turns out it’s a waste of time.

Posted in Combinatorics and Probability, Group Theory, Linear Algebra, Markov Chains | Tagged , , , , , , , , , , , | Leave a comment