Introduction to Stochastic Calculus

jiha-kim.github.io

444 points by ibobev 5 months ago

markisus 5 months ago

Here is a corresponding introduction I found very useful, for readers with advanced undergraduate / graduate level math knowledge.

https://almostsuremath.com/stochastic-calculus/

hrududuu 5 months ago

Great resource. This was my area of graduate study, and I would say this material is quite hard, in the beginner to advanced PhD range.
And this inspiring textbook I think has high overlap with these topics: https://www.amazon.com/Stochastic-Integration-Differential-E...
- krick 5 months ago
  
  Can perhaps someone suggest some resources that are, uh, less advanced undergraduate? Is this possible? Or perhaps just the resources for the prerequisites themselves? Like, what's the route from "not advanced undergraduate"?
  - hrududu 5 months ago
    
    The links above are for studying this as a pure mathematician would. If you want to study it that way, you would take most of the core classes in the undergrad curriculum:
    Calculus (without proofs), Linear Algebra, Real Analysis (proofs of calculus), Measure Theory.
    There are also higher level courses that are worth taking, because they motivated a lot of this theory. They would be imo, Functional Analysis (real analysis applied to spaces of functions), and Partial Differential Equations.
    If you've knocked off some of the undergrad prereqs and feel good about proofs, this could be the right book for you: https://www.amazon.com/Probability-Martingales-Cambridge-Mat.... Another gem of a book.
- 3abiton 5 months ago
  
  I was traumatised by a fluid dynamics course back in the day, before YouTube tutorials were a thing and we had to rely on a good teacher to explain some concepts.
- markisus 5 months ago
  
  Yes, by advanced undergraduate, I meant very advanced undergraduate. But when I was in undergrad I always heard about some students like this who were off in the graduate classes. And then in grad school, there was even a high school student in my Algebra course who managed to correct the professor on some technical issue of group theory. So I don't assume you have to be a PhD to work through this material.

Daniel_Van_Zant 5 months ago

Is stochastic calculus something that requires a computer to simulate many possible unfolding of events, or is there a more elegant mathematical way to solve for some of the important final outputs and probability distributions if you know the distribution of dW? This is an awesome article. I've seen stochastic calculus before but this is the first time I really felt like I started to grok it.

sfpotter 5 months ago

In case the other responses to your question are a little difficult to parse, and to answer your question a little more directly:
- Usually, you will only get analytic answers for simple questions about simple distributions.
- For more complicated problems (either because the question is complicated, or the distribution is complicated, or both), you will need to use numerical methods.
- This doesn't necessarily mean you'll need to do many simulations, as in a Monte Carlo method, although that can be a very reasonable (albeit expensive) approach.
More direct questions about certain probabilities can be answered without using a Monte Carlo method. The Fokker-Planck equation is a partial differential equation which can be solved using a variety of non-Monte Carlo approaches. The quasipotential and committor functions are interesting objects which come up in the simulation of rare events that can also be computed "directly" (i.e., without using a Monte Carlo approach). The crux of the problem is that applying standard numerical methods to the computation of these objects faces the curse of dimensionality. Finding good ways to compute these things in the high-dimensional case (or even the infinite-dimensional case) is a very hot area of research in applied mathematics. Personally, I think unless you have a very clear physical application where the mathematics map cleanly onto what you're doing, all this stuff is probably a bit of a waste of time...
- Daniel_Van_Zant 5 months ago
  
  Thanks for the explanation, this was very helpful. You've given me a whole new list of stuff to Google. The quasipotential/committor functions especially seem quite interesting, although I'm having a bit of trouble finding good resources on them.
  - sfpotter 5 months ago
    
    They are pretty advanced and pretty esoteric. They will be very difficult to get into without a solid graduate background in some of this stuff, or unless you're willing to roll up your sleeves and do some serious learning. The book "Applied Stochastic Analysis" by Weinan E, Tiejun Li, and Eric Vanden-Eijnden is probably a decent place to start. I took a look at this book a while ago, and it's probably decent enough to get a foothold on the literature in order to figure out if this stuff will be useful for you. These guys are all monsters in the field.
kkylin 5 months ago

It depends a bit on exactly what you want to calculate, but in general, quantities like the probability density function of the solution of a stochastic differential equation (SDE) at time t satisfy a partial differential equation (PDE) that is first order in time and second order in space [0]. (This PDE is known to physicists as the Fokker-Planck equation and to mathematicians as the Kolmogorov forward equation.) Except in special examples, the PDE will not have exact analytical solutions, and a numerical solution is needed. Such a numerical solution will be very expensive in high dimensions, however, so in high-dimensional problems it is cheaper to solve the SDE and do Monte Carlo sampling, rather than try to solve the PDE.
Edit: sometimes people are interested in other types of questions, for example the solution when certain random events occur. Analogous comments apply. Also, while stochastic calculus is very useful for working with SDEs, if your interest is other types of Markov (or even non-Markov) processes you may need other tools.
Edit again: as another commenter mentioned, in special cases the SDE itself may also have exact solutions, but in general not.
[0] This statement is specific to stochastic differential equations, i.e., a differential equation with (Gaussian) white noise forcing. For other types of stochastic processes, e.g., Markov jump processes, the evolution equation for distributions has a different form (but some general principles apply to both, e.g., forms of the Chapman-Kolmogorov equation, etc).
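For reference, a sketch of the equation described above in one dimension, under the Itô interpretation. The symbols are notation assumed here rather than taken from the comment: \mu is the drift, \sigma the noise coefficient, and p(x,t) the density of X_t.

```latex
dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t

\frac{\partial p}{\partial t}(x,t)
  = -\frac{\partial}{\partial x}\big[\mu(x,t)\,p(x,t)\big]
  + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\big[\sigma^2(x,t)\,p(x,t)\big]
```

The single time derivative on the left and the second space derivative on the right are the "first order in time, second order in space" structure.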
FabHK 5 months ago

Certain simple stochastic differential equations can be solved explicitly analytically (just as some integrals and simple ordinary differential equations can be solved explicitly), for example the classic Black-Scholes equation. More complicated ones typically can't be solved in that way.
What one often wishes to have is the expectation of a function of a stochastic process at some point, and what can be shown is that this expectation obeys a certain (deterministic) partial differential equation. This then can be solved using numerical PDE solvers.
In higher dimensions, though, or if the process is highly path-dependent (not Markovian), one resorts to Monte Carlo simulation, which does indeed simulate "many possible unfolding of events".
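To make the Monte Carlo route concrete, here is a minimal sketch (not from the article) that applies the Euler-Maruyama scheme to geometric Brownian motion, dS = mu*S dt + sigma*S dW, and compares the simulated mean of S_T with the known closed form S_0*exp(mu*T). The function names and parameter values are made up for illustration.

```python
import math
import random


def simulate_gbm_terminal(s0, mu, sigma, t_end, n_steps, rng):
    """Euler-Maruyama path of dS = mu*S dt + sigma*S dW; returns S at t_end."""
    dt = t_end / n_steps
    s = s0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, N(0, dt)
        s += mu * s * dt + sigma * s * dw
    return s


def monte_carlo_mean(s0, mu, sigma, t_end, n_steps, n_paths, seed=0):
    """Average the terminal value over many independent simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_gbm_terminal(s0, mu, sigma, t_end, n_steps, rng)
    return total / n_paths


def exact_mean(s0, mu, t_end):
    """Closed-form E[S_T] for geometric Brownian motion."""
    return s0 * math.exp(mu * t_end)


if __name__ == "__main__":
    estimate = monte_carlo_mean(1.0, 0.05, 0.2, 1.0, n_steps=50, n_paths=20000)
    print("Monte Carlo:", estimate, "closed form:", exact_mean(1.0, 0.05, 1.0))
```

Here the closed form makes the simulation unnecessary; the point is only that the same loop still works when the payoff is path-dependent and no formula is available.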
LeonardoTolstoy 5 months ago

It has been a while since I studied along these lines (stochastic chemical reaction simulations in my case) but I think the answer is often yes, but not always (I don't think). A random walk for example will be a normal distribution (and you know the mean, and you know the variance is going to infinity), so I do think in that case you end up with an elegant analytical solution if I'm understanding correctly as the inputs can determine the function the variance follows through time.
But often no, you need to run a stochastic algorithm (e.g. Gillespie's algorithm in the case of simple stochastic chemical kinetics) as there will be no analytical solution.
Again it has been a while though.
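A minimal sketch of what a Gillespie-style simulation looks like, assuming the simplest possible system: a single decay reaction A -> (nothing) with per-molecule rate k. With only one reaction channel the algorithm reduces to drawing exponential waiting times. This toy case does have an analytical mean, n0*exp(-k*t), which is used only as a sanity check; the names below are made up for illustration.

```python
import math
import random


def gillespie_decay(n0, k, t_end, rng):
    """Simulate A -> (nothing) with per-molecule rate k; return the count at t_end."""
    t, n = 0.0, n0
    while n > 0:
        propensity = k * n                # total reaction rate in the current state
        t += rng.expovariate(propensity)  # exponential waiting time to the next event
        if t > t_end:
            break                         # next event falls after the time horizon
        n -= 1                            # one molecule decays
    return n


def mean_count(n0, k, t_end, runs, seed=0):
    """Average the final count over independent runs."""
    rng = random.Random(seed)
    return sum(gillespie_decay(n0, k, t_end, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    print("simulated:", mean_count(100, 1.0, 1.0, runs=2000))
    print("analytic: ", 100 * math.exp(-1.0))
```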
- yoyoma1234 5 months ago
  
  For normal distributions I think you do - Black-Scholes is an analytical solution to option pricing. Been a while since I studied stochastic calculus.
  I question why this is the second highest article on Hacker News currently. I can’t imagine many people reading this website are REALLY in this field or a related one, or whether it’s just signaling, like saying you have a copy of Knuth's books or that famous Lisp one.
  - PhilipRoman 5 months ago
    
    This is one of those archetypal submissions on HN: mathematics (preferably pure, using the word "calculus" outside of integrals/derivatives gives additional points), moderately high number of upvotes, very few comments. Pretty much the opposite of political posts, where everyone can "contribute" to the discussion.
  - magicalhippo 5 months ago
    
    I upvote so it sticks around longer, so it has a better chance of generating interesting comments.
    I also upvote because I find it interesting to learn about stuff I didn't know about. I might not understand it, but I do like the exposure regardless.
  - nh23423fefe 5 months ago
    
    I upvote good things even if i dont read because i dont want to spend all my energy reacting to trash politics posts. cut away bad, promote good
anvuong 5 months ago

Depends on what you want to know. If you want to get some trajectories then simulation of the stochastic differential equation is required. But if you just want to know the statistics of the paths, then in many cases you can write and try to solve the Fokker-Planck equation, which is a partial differential equation, to get the probability density of the process at each time.

paulfharrison 5 months ago

A further step is Langevin Dynamics, where the system has damped momentum, and the noise is inserted into the momentum. This can be used in molecular dynamics simulations, and it can also be used for Bayesian MCMC sampling.

Oddly, most mentions of Langevin Dynamics in relation to AI that I've seen omit the use of momentum, even though gradient descent with momentum is widely used in AI. To confuse matters further, "stochastic" is used to refer to approximating the gradient using a sub-sample of the data at each step. You can apply both forms of stochasticity at once if you want to!

zzazzdsa 5 months ago

The momentum analogue for Langevin is known as underdamped Langevin, which if you optimize the discretization scheme hard enough, converges faster than ordinary Langevin. As for your question, your guess is as good as mine, but I would guess that the nonconvexity of AI applications causes problems. Sampling is a hard enough problem already in the log-concave setting…
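For concreteness, the two dynamics being contrasted, written in one common normalization (unit mass and temperature). The notation is assumed here: U is the potential, i.e. the negative log-density being sampled, and \gamma is a friction coefficient.

```latex
% Overdamped ("ordinary") Langevin: noise acts directly on the position
dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t

% Underdamped Langevin: noise and friction act on the momentum V_t
dX_t = V_t\,dt
dV_t = -\nabla U(X_t)\,dt - \gamma V_t\,dt + \sqrt{2\gamma}\,dW_t
```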

graycat 5 months ago

Own favorite source on stochastic calculus:

     Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, 1971.

paulfharrison 5 months ago

Thanks for this. Despite the vintage this seems very clearly written, the introductory material on measure theory has already made it worthwhile for me.

EGreg 5 months ago

I remember studying stochastic calculus

And I remember noting that “quadratic variation” was calculated slightly differently from how variance (or standard deviation) is calculated in regular statistics. Off by one or squared or whatever. I made a note to eventually investigate why. Probably due to some stochastic volatility.

FabHK 5 months ago
There is the fact that the variance of the entire population is defined [0] as
```
  sum i=1..N (x_i - mu)^2 / N
```
while, given a sample of n iid [1] samples from a distribution, the best [2] estimate of the distribution variance is
```
  sum i=1..n (x_i - a )^2 / (n-1)
```
Note that we replaced the mean mu by the sample average a, [3] and divided by (n-1) instead of N.
[0] with the mean mu := sum x_i / N being the actual mean of the population
[1] independent and identically distributed
[2] best in the sense of being unbiased. It's a tedious, but not very difficult calculation to confirm that the expectation of that second expression (with n-1) is the population variance.
[3] with the sample average a := sum x_i / n being an estimate of the population mean
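A quick numerical check of the (n-1) claim, as a sketch with made-up function names: draw many small samples from a standard normal (true variance 1) and average both estimators. Under these assumptions the divide-by-n version should average about (n-1)/n and the divide-by-(n-1) version about 1.

```python
import random
import statistics


def biased_variance(xs):
    """Squared deviations from the sample average, divided by n."""
    a = sum(xs) / len(xs)
    return sum((x - a) ** 2 for x in xs) / len(xs)


def unbiased_variance(xs):
    """Same sum of squares, divided by n - 1."""
    a = sum(xs) / len(xs)
    return sum((x - a) ** 2 for x in xs) / (len(xs) - 1)


def average_estimates(n, trials, seed=0):
    """Average both estimators over many samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += biased_variance(xs)
        unbiased_total += unbiased_variance(xs)
    return biased_total / trials, unbiased_total / trials


if __name__ == "__main__":
    b, u = average_estimates(n=5, trials=20000)
    print("divide by n:    ", b)
    print("divide by n - 1:", u)
```

In the standard library these two correspond to `statistics.pvariance` (divide by n) and `statistics.variance` (divide by n-1).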
SeaGully 5 months ago

The other guy gives a solid explanation so don't use mine as a replacement or to assume the other is wrong.
To me there are two ways to approach the problem I think you are thinking of (sample variance I think).
(1) The sample variance depends on the sample mean which is sum(x_i) / n. Given the first n-1 of n samples, you would then know the final value (x_n = n * sample_mean - sum(x_i)_(n-1)) so at the very least n-1 could be understood as a "degrees of freedom". There are only n-1 degrees of freedom. Other higher sample moments can be roughly understood with the same degrees of freedom argument. This could be wrong though, it was just something I remember from somewhere.
(2) The more mathematically inclined way is that biased_sample_variance = sum((x_i - sum(x_i) / n)^2) / n. The mean of the biased_sample_variance (across many iterations of a set of samples N), is not the population variance, but (n - 1) / n * population_variance (i.e. it is biased). So you multiply the biased_sample_variance by (n / (n - 1)) which gives the unbiased sample_variance equation: sum((x_i - sum(x_i) / n)^2) / (n - 1). The math is rather fun in my opinion, once you get into the swing of things.
I sure do hope I understood your question correctly.

janalsncm 5 months ago

Here’s an example where I ran into this recently.

Let’s say we play a “game”. Draw a random number A between 0 and 1 (uniform distribution). Now draw a second number B from the same distribution. If A > B, draw B again (A remains). What is the average number of draws required? (In other words, what is the average “win streak” for A?)

The answer is infinity. The reason is, some portion of the time A will be extremely high and take millions of draws to beat.

drdeca 5 months ago

Showing the calculation you described:
If p is the value drawn for A, then each time B is drawn, the probability that B < A is p, and the probability that B >= A (which ends the streak) is (1-p). So the chance that B is drawn exactly n times is p^(n-1) (1-p) (a geometric distribution). The expected number of draws is then 1/(1-p). Then, E[draws] = E[E[draws|A=p]] = \int_0^1 E[draws|A=p] dp = \int_0^1 1/(1-p) dp, which diverges to infinity (as you said).
(I wasn’t doubting you, I just wanted to see the calculation.)
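A small simulation sketch of the game, with made-up function names. For a fixed A = a the streak ends on each draw with probability 1 - a, so the conditional mean is 1/(1 - a), which can be checked directly. The unconditional mean has no finite value to check against.

```python
import random


def draws_until_beaten(a, rng):
    """Number of draws of B until B >= a (A keeps winning while B < a)."""
    draws = 1
    while rng.random() < a:  # this draw of B lost to A, so draw again
        draws += 1
    return draws


def mean_draws_fixed_a(a, trials, seed=0):
    """Sample mean of the number of draws for a fixed value of A."""
    rng = random.Random(seed)
    return sum(draws_until_beaten(a, rng) for _ in range(trials)) / trials


def mean_draws_random_a(trials, seed=0):
    """Sample mean when A itself is drawn uniformly each game."""
    rng = random.Random(seed)
    return sum(draws_until_beaten(rng.random(), rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for a in (0.5, 0.9, 0.99):
        print(a, mean_draws_fixed_a(a, 100000), 1 / (1 - a))
    for trials in (10**3, 10**4, 10**5):
        print(trials, mean_draws_random_a(trials))
```

With A random, the sample mean is dominated by the occasional game where A lands very close to 1, so it tends not to settle down as the number of games grows.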
joncrocks 5 months ago

For anyone interested, I think this is an example of the https://en.wikipedia.org/wiki/St._Petersburg_paradox
RandomBK 5 months ago

The way the question was framed, it was ambiguous whether "draw again" only applied to B, or whether A would draw again as well. I'm assuming the 'infinity' answer applies only to the former case?
- janalsncm 5 months ago
  
  Sorry, we only draw B again.
zzazzdsa 5 months ago

Does this really require stochastic calculus to prove? This should just be a standard integration, based on the fact that the expected number of samples required for fixed A is 1/(1-A).

robwwilliams 5 months ago

Question for HN readers: We have defined about 50 spots (loci) in the mouse genome that contain DNA differences that modulate mortality rates. Most of them have complex age-dependent “actuarial” effects. We would like to predict age at death.

Would stochastic calculus be a useful approach in actuarial prediction of life expectancies of mice?

(And this is why I am pleased to see this high on HN.)

whatshisface 5 months ago

Stochastic calculus is like ordinary calculus in that it is most useful when one time is like another except for a few variables that describe a state, and least useful when one time is unlike another.
Because you have as many questions (loci) as you have segments that you can reasonably expect to divide time into (changing the time of death by 1/50th of a mouse lifespan would be impossible to detect unless I am wrong?), and because the time intervals are not that numerous, and also because you wouldn't really have a model for the interaction of the state variables and would be using model-free statistical methods, I think you would get all of the value there is to get out of noncontinuous methods.
evanfrommaxar 5 months ago

I would apply an L1-regularized regression where the variables are simple 0-1 for the presence of the gene. The L1-regularization helps you deal with the high-dimensionality of the problem.
https://en.wikipedia.org/wiki/Lasso_(statistics)
Since these are ages, I wouldn't assume an underlying Gaussian distribution. Making that change isn't as hard as you think.
https://en.wikipedia.org/wiki/Generalized_linear_model
As Always: Consult your friendly neighborhood statistician
etiam 5 months ago

I'm not prepared to say "no", and as has been noted already, it depends on the application, but from your description it seems to me more like a task for Bayesian statistics organized on graphs (the nodes & edges kind).
- btown 5 months ago
  
  And going beyond this: my layman's understanding of biology is that the way in which genes are expressed can be highly nonlinear and modulated by all sorts of different pathways. If you have some clarity on how these pathways work, probabilistic programming might be a helpful tool here in a Bayesian context.
  It's been a number of years since I've looked at these things, but https://www.theactuary.com/2024/04/04/bayesian-revolution and https://arxiv.org/abs/2310.14888 are recent articles that may be relevant.
joe_the_user 5 months ago

(Just spitballing)
I think stochastic calculus looks at a system whose output value is a smooth/real value. Basically, it is for modeling systems like random walks where there is a little bit of random up-and-down jumping in each interval. However, if you are basically looking at time versus dead-or-alive, your output is binary and time-of-death is really all the info you get, and you wouldn't need/want a random walk model, just a more ordinary statistical model. Maybe if there was some other variable besides dead-or-alive you were measuring or aware of, a stochastic model could help then (which is a bit like saying "if we had bacon, we could have bacon-and-eggs, if we had eggs").
Also, if what you're saying is you have 50*X bytes of information that all influence life expectancy, it sounds like a challenging problem. But also it's kind of tailor-made for neural networks; many discrete inputs versus a single smooth output. You might try a neural network and a linear model and see how much better the neural network is - then you could determine if more-complex-than-linear interactions were occurring.
bbminner 5 months ago

Just in case you missed it, https://en.m.wikipedia.org/wiki/Survival_analysis exists to answer specifically this question.
In more practical terms, if I were to approach this problem, I'd discretize it in time and apply classical ml to predict "chance to die during month X assuming you survived that long" and fit it to data - that'd be much easier to spot errors and potential issues with your data.
I'd go for the stochastic calculus or actual survival analysis only if you wanted to prove/draw a connection between some pre-existing mathematical property such as memorylessness and a physical/biological property of a system such as behavior of certain proteins (that'd be insanely cool, but rather hard, esp if data is limited). In my (very vague) understanding, that's what finance papers that use stochastic analysis do - they make a mathematical assumption about some universal mathematical property of a system (if markets were always near optimal with probability of deviation decaying as XYZ, the world economy would react this way to these things), and then prove that it actually fits the data.
Happy to chat more, sounds like a fun project :)
nextos 5 months ago

I was coming here to say this is a survival analysis problem, and thus a different branch of probability and statistics. However, you can also frame it as a stochastic process if you have extra epigenetic data that is associated to those 50 DNA loci or some genes they regulate.
For example, your DNA loci of interest could have a state (methylated or unmethylated). And you could come up with a stochastic process where death occurs when a function of methylation changes at those loci (e.g. a linear model) crosses a threshold (first passage in stochastic process jargon).
Omer Karin & Uri Alon have published a similar concept to explain how the decreased capacity of immune cells to remove senescent cells leads to a Gompertz-like law of longevity, something that originates from actuarial studies! Their model is simpler as they deal with a univariate problem [1].
[1] https://www.nature.com/articles/s41467-019-13192-4
paulfharrison 5 months ago

As others have said in various ways, start by fitting a survival model using glmnet.
That said, here are some folks trying to use SDEs to model cells, they even have a "dW" on their logo. This is a long way from predicting age of death, but it might eventually give insights into the exact mechanism. Also I think they're starting with bacteria and yeast, so mice might be a way off.
https://macsys.org/
robwwilliams 5 months ago

Thanks for the set of helpful replies. I’m going to be working through your suggestions over the next few months. The study is a follow-up to this work from 2 years ago:
https://pubmed.ncbi.nlm.nih.gov/36173858/
seanhunter 5 months ago

Can’t speak about mice, but stochastic calculus is used in modelling for life insurance for humans I believe.
eg https://www.soa.org/globalassets/assets/Files/static-pages/r...
- joe_the_user 5 months ago
  
  Your link doesn't demonstrate the use of stochastic calculus by life insurance companies or for life insurance. It's just an undergraduate curriculum for actuarial students (that they learn all this stuff doesn't imply that's what life insurance companies use).
- layer8 5 months ago
  
  This is rather https://en.m.wikipedia.org/wiki/Stochastic_modelling_(insuran...

whatshisface 5 months ago

Here's my understanding of Ito calculus if it helps anyone:

1. The only random process we understand initially is Brownian motion.

2. Luckily, we can change coordinates.

max_ 5 months ago

Thanks, could you expand more on 2?
- hrududuu 5 months ago
  
  Ito's formula/lemma is like the chain rule from calculus. It is a generalization, in that it uses a second order Taylor series expansion, whereas the chain rule only needs a first order expansion. Anyway, I think (2) is a reflection of this fact, and how the chain rule lets us compute dynamics of a derived process.
  I sort of disagree with (1), since Ito's lemma is most naturally applied to ~martingales, of which Brownian Motion is an important special case.
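  For reference, the one-dimensional statement, written for an Itô process dX_t = \mu_t dt + \sigma_t dW_t and a twice-differentiable f (notation assumed here, not from the comment above):

  ```latex
  df(X_t) = f'(X_t)\,dX_t + \frac{1}{2}\,f''(X_t)\,\sigma_t^2\,dt
  ```

  The first term is the ordinary chain rule; the second is the extra piece from the second-order Taylor term, which survives because (dW_t)^2 behaves like dt.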

ngriffiths 5 months ago

This is such a good model for how to write a beginner friendly introduction. Especially the motivation for the Ito lemma, with the dW^2 term remaining important even though it disappears in regular calculus, and the conversion to Stratonovich is really nice.

dmvdoug 5 months ago

Can someone please help me parse this sentence?

> Brownian motion and Itô calculare a notable example of fairly high-level mathematics that are applied to model the real world

What is “Itô calculare” supposed to have been? I am stumped. “Its calculation”?

luisfmh 5 months ago

Itô is the name of the type of calculus (https://en.wikipedia.org/wiki/It%C3%B4_calculus) and calculare I think is just the plural of calculus. So something like "all the itô calculus are notable examples of fairly high level mathematics ..."
- layer8 5 months ago
  
  The plural of calculus is calculi or calculuses. Calculare might be an autocorrection for a different language (https://en.wiktionary.org/wiki/calculare), though given that the author has a Korean name, it’s more likely just a weird typo.
- dmvdoug 5 months ago
  
  That makes so much more sense! Although the pedant in me wants to argue that calculus plural is “calculi”/“calculuses” (the dictionary gives me the latter, although I’ve never seen it in the wild myself, but I won’t pursue that because it’s beside the point!) Thanks for the help!
incognito124 5 months ago

It's a typo. "calculare" is supposed to be "calculus are"
FabHK 5 months ago

Typo.
-> and Itô calculus are a notable
ricoxicano 5 months ago

I think it's a reference to Itô Calculus
https://en.wikipedia.org/wiki/It%C3%B4_calculus
karpierz 5 months ago

Ito calculus - https://en.wikipedia.org/wiki/It%C3%B4_calculus
adgjlsfhk1 5 months ago

the article goes into the details in https://jiha-kim.github.io/posts/introduction-to-stochastic-... but the TLDR is it's a way to define integration of random walks.

eachro 5 months ago

For those in quant finance, how much of this is useful in your day to day?

mamonster 5 months ago

Day to day not so much, unless you are in structured products/exotics as a structurer, at which point yeah, it's pretty important.
That said, already at masters level internships you could get asked much harder questions than what this article touches on. I got asked to prove the Cameron-Martin theorem once, I found that to be extremely difficult in a job interview setting.
keithalewis 5 months ago

There is no need for it. Here is a simple replacement: https://keithalewis.github.io/math/um1.html.
mhh__ 5 months ago

It depends.
In a linear rates shop (i.e. not trading options), almost all of the effort goes to tuning the deterministic bit of this equation. Thousands upon thousands of lines of code to do a problem that most books don't even mention beyond giving the term a symbolic name!
And then if you do trade an option it's probably good enough to use an off the shelf model to work out your delta and so on.
If you're making markets or flogging exotics and structured products then you may indeed be wrangling this stuff all the time.

bowsamic 5 months ago

I had to study quantum stochastic calculus for my PhD. Really crazy because you get totally different results for the same mathematical expression compared to normal calculus

ta8645 5 months ago

Doesn't this mean that at least one of the results is wrong?
- antognini 5 months ago
  
  No, I think one of the fundamental insights of stochastic calculus is that the addition of noise to a process changes the trajectory in a non-trivial way.
  In finance, for instance, it leads to the concept of a "volatility tax." Naively, you might think that adding noise to the process shouldn't change the expected return, it would just add some noise to the overall return. But in fact adding volatility to the process has the effect of reducing the expected return compared to what you would have in the absence of volatility. (This is one of the applications of the result that the original article talks about in the Geometric Brownian Motion section.)
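  A sketch of where the effect shows up for geometric Brownian motion, using the usual notation (\mu for drift, \sigma for volatility; assumed here):

  ```latex
  dS_t = \mu S_t\,dt + \sigma S_t\,dW_t
  \quad\Longrightarrow\quad
  S_t = S_0 \exp\!\Big(\big(\mu - \tfrac{1}{2}\sigma^2\big)t + \sigma W_t\Big)

  \mathbb{E}[S_t] = S_0 e^{\mu t},
  \qquad
  \mathbb{E}\big[\log(S_t / S_0)\big] = \big(\mu - \tfrac{1}{2}\sigma^2\big)t
  ```

  So in this model the mean of S_t does not depend on \sigma, but the expected log (compound) growth rate is reduced by \sigma^2/2, which is the quantity usually meant by the "tax".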
  - crdrost 5 months ago
    
    Just to add to this, the reason that the things are different is, stochastics as a subject is trying to do calculus in the presence of noise, and what noise does is, it makes your function nondifferentiable. You would think that you cannot do calculus, without smooth curves! But you can, but we have to modify the chain rule and define exactly what we mean by integration etc.
    So the idea is “smooth curves do X, but non-smooth noisy curves do Y(χ),” where χ in some sense is the noise input into the system, and they aren't contradictory because Y(0) = X. (At least usually... I think chaos theory has some counterexamples where, like, the time t that you can predict a system’s results for is, in the presence of exactly 0 noise, t=∞, but in the limit of nonzero noise going to zero, it's some finite t=T.)
- bowsamic 5 months ago
  
  Kinda. The differential operator in quantum Ito calculus can be applied to mathematical objects that the normal differentials aren’t properly defined on, such as stochastic variables.

bytesandbits 5 months ago

Does it have applications in modeling decision-making?

tsunego 5 months ago

still wild to me that diffusion models are fast becoming the secret sauce behind ai image generation, but their roots are buried deep in stochastic calculus

who knew brownian motion would eventually help create cat memes?

ForceBru 5 months ago

Seems like a great article. Having some prior experience with stochastic calculus, I think I understand almost everything here. Any other good introductory materials?

seanhunter 5 months ago

I’ve been planning to study this in a bit although I have some background to cover first so haven’t got on to it. From what I’ve found, the youtube channel “Mathematical Toolbox” has some videos which are quite introductory but seem good. Some people also recommend the book “An Informal Introduction to Stochastic Calculus with Applications” by Calin as a good place to start. Then Klebaner “Introduction to Stochastic Calculus with Applications” and also Evans “An Introduction to Stochastic Differential Equations” are apparently very good but harder and more formal texts, but you need some analysis and measure theoretic probability background first. The Evans is the same Evans who wrote the definitive book about PDEs fwiw. Klebaner and Evans are apparently a lot harder than Calin though even though they are all called introductions.
