Science, culture, complexity

Tag: stochastic processes

  • What does a quantum Bayes’s rule look like?

    Bayes’s rule is one of the most fundamental principles in probability and statistics. It allows us to update our beliefs in the face of new evidence. In its simplest form, the rule tells us how to revise the probability of a hypothesis once new data becomes available.

    A standard way to teach it involves drawing coloured balls from a pouch: you start with some expectation (e.g. “there’s a 20% chance I’ll draw a blue ball”), then you update your belief depending on what you observe (“I’ve drawn a red ball, so the actual chance of drawing a blue ball is 10%”). While this example seems simple, the rule carries considerable weight: physicists and mathematicians have described it as the most consistent way to handle uncertainty in science, and it’s a central part of logic, decision theory, and indeed nearly every field of applied science.

    There are two well-known ways of arriving at Bayes’s rule. One is the axiomatic route, which treats probability as a set of logical rules and shows that Bayesian updating is the only way to preserve consistency. The other is variational, which demands that updates should stay as close as possible to prior beliefs while remaining consistent with new data. This latter view is known as the principle of minimum change. It captures the intuition that learning should be conservative: we shouldn’t alter our beliefs more than is necessary. This principle explains why Bayesian methods have become so effective in practical statistical inference: because they balance a respect for new data with loyalty to old information.

    A natural question arises here: can Bayes’s rule be extended into the quantum world?

    Quantum theory can be thought of as a noncommutative extension of probability theory. While there are good reasons to expect there should be a quantum analogue of Bayes’s rule, the field has for a long time struggled to identify a unique and universally accepted version. Instead, there are several competing proposals. One of them stands out: the Petz transpose map. This is a mathematical transformation that appears in many areas of quantum information theory, particularly in quantum error correction and statistical sufficiency. Some scholars have even argued that it’s the “correct” quantum Bayes’s rule. Still, the situation remains unsettled.

    In probability, the joint distribution is like a big table that lists the chances of every possible pair of events happening together. If you roll a die and flip a coin, the joint distribution specifies the probability of getting “heads and a 3”, “tails and a 5”, and so on. In this big table, you can also zoom out and just look at one part. For example, if you only care about the die, you can add up over all coin results to get the probability of each die face. Or if you only care about the coin, you can add up over all die results to get the probability of heads or tails. These zoomed-out views are called marginals.

    The classical Bayes’s rule doesn’t just update the zoomed-out views but the whole table — i.e. the entire joint distribution — so the connection between the two events also remains consistent with the new evidence.

    In the quantum version, the joint distribution isn’t a table of numbers but a mathematical object that records how the input and output of a quantum process are related. The point of the new study is that if you want a true quantum Bayes’s rule, you need to update that whole object, not just one part of it.

    A new study by Ge Bai, Francesco Buscemi, and Valerio Scarani in Physical Review Letters has taken just this step. In particular, they’ve presented a quantum version of the principle of minimum change by showing that when the measure of change is chosen to be quantum fidelity — a widely used measure of similarity between states — this optimisation leads to a unique solution. Equally remarkably, this solution coincided with the Petz transpose map in many important cases. As a result, the researchers have built a strong bridge between classical Bayesian updating, the minimum change principle, and a central tool of quantum information.

    The motivation for this new work isn’t only philosophical. If we’re to generalise Bayes’s rule to include quantum mechanics as well, we need to do so in a way that respects the structural constraints of quantum theory without breaking away from its classical roots.

    The researchers began by recalling how the minimum change principle works in classical probability. Instead of updating only a single marginal distribution, the principle works at the level of the joint input-output distribution. Updating then becomes an optimisation problem, i.e. finding the subsequent distribution that’s consistent with the new evidence but minimally different from the evidence from before.

    In ordinary probability, we talk about stochastic processes. These are rules that tell us how an input is turned into an output, with certain probabilities. For example if you put a coin into a vending machine, there might be a 90% chance you get a chips packet and a 10% chance you get nothing. This rule describes a stochastic process. This process can also be described with a joint distribution.

    In quantum physics, however, it’s tricky. The inputs and outputs aren’t just numbers or events but quantum states, which are described by wavefunctions or density matrices. This makes the maths much more complex. The resulting stochastic processes also become sequences of events called completely positive trace-preserving (CPTP) maps.

    A CPTP map is the most general kind of physical evolution allowed: it takes a quantum state and transforms it into another quantum state. And in the course of doing so, it needs to follow two rules: it shouldn’t yield any negative probabilities and it should ensure the total probability adds up to 1. That is, your chance of getting a chips packet shouldn’t be –90% nor should it be 90% plus a 20% chance of getting nothing.

    These complications mean that, while the joint distribution in classical Bayesian updating is a simple table, the one in quantum theory is more sophisticated. It uses two mathematical tools in particular. One is purification, a way to embed a mixed quantum state into a larger ‘pure’ state so that mathematicians can keep track of correlations. The other is Choi operators, a standard way of representing a CPTP map as a big matrix that encodes all possible input-output behaviour at once.

    Together, these tools play the role of the joint distribution in the quantum setting: they record the whole picture of how inputs and outputs are related.

    Now, how do you compare two processes, i.e. the actual forward process (input → output) and the guessed reverse process (output → input)?

    In quantum mechanics, one of the best measures of similarity is fidelity. It’s a number between 0 and 1. 0 means two processes are completely different and 1 means they’re exactly the same.

    In this context, the researchers’ problem statement was this: given a forward process, what reverse process is closest to it?

    To solve this, they looked over all possible reverse processes that obeyed the two rules, then they picked the one that maximised the fidelity, i.e. the CPTP map most similar to the forward process. This is the quantum version of applying the principle of minimum change.

    In the course of this process, the researchers found that in natural conditions, the Petz transpose map emerges as the quantum Bayes’s rule.

    In quantum mechanics, two objects (like matrices) commute if the order in which you apply them doesn’t matter. That is, A then B produces the same outcome as B then A. In physical terms, if two quantum states commute, they behave more like classical probabilities.

    The researchers found that when the CPTP map that takes an input and produces an output, called the forward channel, commutes with the new state, the updating process is nothing but the Petz transpose map.

    This is an important result for many reasons. Perhaps foremost is that it explains why the Petz map has shown up consistently across different parts of quantum information theory. It appears it isn’t just a useful tool but the natural consequence of the principle of minimum change applied in the quantum setting.

    The study also highlighted instances where the Petz transpose map isn’t optimal, specifically when the commutativity condition fails. In these situations, the optimal updating process depends more intricately on the new evidence. This subtlety departs clearly from classical Bayesian logic because in the quantum case, the structure of non-commutativity forces updates to depend non-linearly on the evidence (i.e. the scope of updating can be disproportionate to changes in evidence).

    Finally, the researchers have shown how their framework can recover special cases of practical importance. If some new evidence perfectly agrees with prior expectations, the forward and reverse processes become identical, mirroring the classical situation where Bayes’s rule simply reaffirms existing beliefs. Similarly, in contexts like quantum error correction, the Petz transpose map’s appearance is explained by its status as the optimal minimal-change reverse process.

    But the broader significance of this work lies in the way it unifies different strands of quantum information theory under a single conceptual roof. By proving that the Petz transpose map can be derived from the principle of minimum change, the study has provided a principled justification for its widespread use rather than being restricted to particular contexts. This fact has immediate consequences for quantum computing, where physicists are looking for ways to reverse the effects of noise on fragile quantum states. The Petz transpose map has long been known to do a good job of recovering information from these states after they’ve been affected by noise. Now that physicists know the map embodies the smallest update required to stay consistent with the observed outcomes, they may be able to design new recovery schemes that exploit the structure of minimal change more directly.

    The study may also open doors to extending Bayesian networks into the quantum regime. In classical probability, a Bayesian network provides a structured way to represent cause-effect relationships. By adapting the minimum change framework, scientists may be able to develop ‘quantum Bayesian networks’ where the way one updates their expectations of a particular outcome respects the peculiar constraints of CPTP maps. This could have applications in quantum machine learning and in the study of quantum causal models.

    There are also some open questions as well. For instance, the researchers have noted that if different measures of divergence other than fidelity are used, e.g. the Hilbert-Schmidt distance or quantum relative entropy, the resulting quantum Bayes’s rules may be different. This in turn indicates that there could be multiple valid updating rules, each suited to different contexts. Future research will need to map out these possibilities and determine which ones are most useful for particular applications.

    In all, the study provides both a conceptual advance and a technical tool. Conceptually, it shows how the spirit of Bayesian updating can carry over into the quantum world; technically, it provides a rigorous derivation of when and why the Petz transpose map is the optimal quantum Bayes’s rule. Taken together, the study’s finding strengthens the bridge between classical and quantum reasoning and offers a deeper understanding of how information is updated in a world where uncertainty is baked into reality rather than being due to an observer’s ignorance.

  • A not-so-random walk through random walks

    Though I’ve been interested of late with the idea of random walks, I was introduced to the concept when, more than two decades ago, I stumbled across Conway’s Game of Life, the cellular automaton built by John Conway in 1970. A cellular automaton is a grid of cells in which each cell has a value depending on the values of its neighbours. The automaton simulates the evolution of the grid as the cells’ values change dynamically.

    Langton’s ant was a popular instance of the simulator and one of my favourites, too. One 2001 conference paper described it as “a simple … system with a surprisingly complex behaviour.” Here, a (virtual) ant starts off at a random cell on the grid and moves randomly into one of the four neighbouring squares (diagonal squares aren’t accessible). There are three rules:

    (i) A cell can be either black or white in colour;

    (ii) If the square is white when the ant moves into it, the colour is flipped, and the ant turns 90º clockwise and moves forward;

    (iii) If the square is black, the colour is flipped, and the ant turns 90º counter-clockwise and moves forward.

    As the ant moves across the grid in this way, the first hundred or so steps produce a symmetric pattern before chaos ensues. For the next 9,900 or so steps, an image devoid of any patterns comes into view. But after around 10,000 steps, there’s magic: the ant suddenly enters into a repetitive 104-step pattern that it continues until the end of time. You can run your own simulation and check.

    The path of a Langton’s ant. The repetitive pattern after ~10,000 steps is the ‘highway’ growing at the bottom. The location of the ant is shown in red. Credit: Krwarobrody and Ferkel/Wikimedia Commons

    The march of the Langton’s ant before the repetitive portion has been described as a pseudorandom walk — a walk whose pattern appears random but whose next step is not quite random (because of the rules). In a truly random walk, the length of each step is fixed and the direction of each step is chosen at random from a fixed number of options.

    If it sounds simple, it is, but you might be forgiven for thinking it’s only a mathematical flight of fancy. Random walks have applications in numerous areas, including econometrics, finance, biology, chemistry, and quantum physics.

    The trajectory of a random walk after 25,000 steps. Credit: László Németh/Wikimedia Commons

    Specific variants of the random walk behave in ways that closely match the properties of some complex system evolving in time. For example, in a Gaussian random walk, the direction of each step is random and the length of each step is sampled randomly from a Gaussian distribution (the classic example of a bell curve). Experts use the evolution of this walk to evaluate the risk exposure of investment portfolios.

    The Lévy flight is a random walk with a small change: instead of the step length being determined by a random pick from the Gaussian distribution, it comes from any distribution with a heavy tail. One common example is the gamma distribution. Each such distribution can be tweaked with two parameters called κ (kappa) and θ (theta) to produce different plots on a graph, all with the same general properties. In the examples shown below, focus on the orange line (κ = 2, θ = 2): it shows a gamma distribution with a heavy tail.

    Various gamma distributions for different values of κ and θ. Credit: MarkSweep and Cburnett/Wikimedia Commons, CC BY-SA 3.0

    Put another way, the distribution has some large values but mostly small values. A Lévy flight is a random walk where the step length is sampled randomly from this distribution, and as a result has a few large steps and many small steps. Research has shown that the foraging path of animals looking for food that is scarce can be modelled as a Lévy flight: the large steps correspond to the long distances towards food sources that are located far apart and the short steps to finding food spread in a small area at each source.

    A Lévy flight simulated for 1,000 steps. Credit: PAR/Wikimedia Commons

    Perhaps the most famous ‘example’ of a random walk is Brownian motion; it isn’t a perfect example however. Brownian motion can describe, say, the path of a single atom over time in a gas of billions of atoms by using a Lévy process. Whereas a random walk proceeds in discrete steps, a Lévy process is continuous; they are in other respects the same. The motion itself refers to the atom’s journey in some time period, frequently bumping into other atoms (depending on the gas’s density) and shifting its path in random ways.

    The yellow circle depicts the motion of a larger particle in a container filled with smaller particles moving in random directions at different speeds. Credit: Francisco Esquembre, Fu-Kwun and lookang/Wikimedia Commons, CC BY-SA 3.0

    Brownian motion in particular uses a type of Lévy process called the Wiener process, where the path evolves according to the following rules:

    (i) Each increment of the process is independent of other (non-overlapping) increments;

    (ii) How much the process changes over a period of time depends only on the duration of the period;

    (iii) Increments in the process are randomly sampled from a Gaussian distribution;

    (iv) The process has a statistical mean equal to zero;

    (v) The process’s covariance between any two time points is equal to the lower variance at those two points (variance denotes how quickly the value of a variable is spreading out over time).

    The path of the atom in the gas follows a Wiener process and is thus Brownian motion. The Wiener process has a wealth of applications across both the pure and the applied sciences. Just to name one: say there is a small particle — e.g. an ion — trapped in a cell. It can’t escape the cell except through a small opening. The Wiener process, which models the Brownian motion of the ion through the cell, can be used to estimate the average amount of time the ion will need to reach the opening and escape.

    Like random walks, Wiener processes can also be tweaked to produce models for different conditions. One example is the Brownian bridge, which arises when a Wiener process is limited to appear at the start of an interval and disappear at the end, with the start and end points fixed. A different, more visual way to put this is in terms of a graph with two y-axes and one x-axis. Say the point 0 is the start of the interval on the left y-axis and 1 is the end of the interval on the right y-axis. A Wiener process in the interval [0, 1] will be a ‘bridge’ that connects 0 and 1 in a path that follows Brownian motion.

    A Brownian bridge pinned at the two endpoints of an interval. Credit: Zemyla/Wikimedia Commons, CC BY-SA 3.0

    By analogy, a random bridge in the interval [0, 1] will be a random walk based on the Gaussian distribution between 0 and 1; a gamma random bridge in the interval [0, 1] will be a random walk based on the gamma distribution between 0 and 1; and so on. (This said, a Wiener process and a random walk are distinct: a Wiener process will play out the same way if the underlying grid is rotated by an arbitrary angle but a random walk won’t.)

    It’s a wonder of mathematics that it can discern recurring behaviours in such highly noisy systems and with its finite tools distil from them glimpses into their future. According to a 2020 preprint paper on arXiv, “Various random-walk-related models can be applied in different fields, which is of great significance to downstream tasks such as link prediction, recommendation, computer vision, semi-supervised learning, and network embedding.”

    If some basic conditions are met, there are random walks out in the universe as well. In 2004, researchers estimated the Brownian velocity of the black hole at the Milky Way’s centre to be less than 1 km/s.

    For a more mathematical example, in a ‘conventional’ random walk, after N steps the walker’s distance from the origin will be comparable to the square root of N. Further, it takes on average S2 steps to travel a distance of S from the origin. For a long time, researchers believed this so-called S → S2 scaling law could model almost any process in which a physical entity was moving from one location to another. The law captured the notion of how much a given distribution would spread out over time.

    One of the earliest deviations from this law was fractals, where there is an S → Sβ law but such that β is always greater than 2, implying a greater amount of spread relative to the step length vis-à-vis random walks. (Factoid: a random walk on a fractal also gives rise to a fractal.)

    A Sierpinski triangle fractal. Credit: Beojan Stanislaus/Wikimedia Commons, CC BY-SA 3.0

    For yet another example, random walks have a famously deep connection to resistor networks: electric circuits where a bunch of resistors are connected in some configuration, plus a power source and a grounding. Researchers have found that the effective voltage between any two points in the circuit is proportional to the time a random-walker would take to travel between those two points for the first time.

    A schematic diagram of an electrical circuit where the challenge is to determine the resistance to the flow of an electric current at each of the in-between nodes. Source: math.ucla.edu

    The resistor model speaks to a beautiful thing random walks bring to light: the influence an underlying structure exerts on a stochastic process — one governed entirely by chance — playing out on that structure, its inherent geometry imposing unexpected limits on the randomness and keeping it from wheeling away into chaos. At each step the random walk makes an unpredictable choice but the big picture in which these steps are individual strokes is a picture of predictability, to some degree at least.

    Flip this conclusion on its head and an even more captivating notion emerges: that though two random walks may resemble each other in their statistical properties, they can still be very different journeys.