Alcove

Science, culture, complexity

Tag: unitarity

What does a quantum Bayes’s rule look like?

Bayes’s rule is one of the most fundamental principles in probability and statistics. It allows us to update our beliefs in the face of new evidence. In its simplest form, the rule tells us how to revise the probability of a hypothesis once new data becomes available.

A standard way to teach it involves drawing coloured balls from a pouch: you start with some expectation (e.g. “there’s a 20% chance I’ll draw a blue ball”), then you update your belief depending on what you observe (“I’ve drawn a red ball, so the actual chance of drawing a blue ball is 10%”). While this example seems simple, the rule carries considerable weight: physicists and mathematicians have described it as the most consistent way to handle uncertainty in science, and it’s a central part of logic, decision theory, and indeed nearly every field of applied science.
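
To make the update concrete, here is a minimal sketch in Python. The two-pouch setup, the ball counts, and the function name `bayes_update` are assumptions made up for illustration; they are not the numbers quoted above.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return the posterior P(h | data) from a prior P(h) and likelihoods P(data | h)."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(data), summed over hypotheses
    return {h: w / evidence for h, w in unnormalised.items()}

# Hypothetical setup: pouch A holds 3 blue + 1 red balls, pouch B holds 1 blue + 3 red.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}   # which pouch we think we hold
p_red = {"A": Fraction(1, 4), "B": Fraction(3, 4)}   # chance of drawing red from each

posterior = bayes_update(prior, p_red)  # belief after drawing one red ball
```

Under these assumed numbers, drawing a red ball lowers the belief in pouch A from 1/2 to 1/4.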

There are two well-known ways of arriving at Bayes’s rule. One is the axiomatic route, which treats probability as a set of logical rules and shows that Bayesian updating is the only way to preserve consistency. The other is variational, which demands that updates should stay as close as possible to prior beliefs while remaining consistent with new data. This latter view is known as the principle of minimum change. It captures the intuition that learning should be conservative: we shouldn’t alter our beliefs more than is necessary. This principle explains why Bayesian methods have become so effective in practical statistical inference: they balance a respect for new data with loyalty to old information.

A natural question arises here: can Bayes’s rule be extended into the quantum world?

Quantum theory can be thought of as a noncommutative extension of probability theory. While there are good reasons to expect there should be a quantum analogue of Bayes’s rule, the field has for a long time struggled to identify a unique and universally accepted version. Instead, there are several competing proposals. One of them stands out: the Petz transpose map. This is a mathematical transformation that appears in many areas of quantum information theory, particularly in quantum error correction and statistical sufficiency. Some scholars have even argued that it’s the “correct” quantum Bayes’s rule. Still, the situation remains unsettled.

In probability, the joint distribution is like a big table that lists the chances of every possible pair of events happening together. If you roll a die and flip a coin, the joint distribution specifies the probability of getting “heads and a 3”, “tails and a 5”, and so on. In this big table, you can also zoom out and just look at one part. For example, if you only care about the die, you can add up over all coin results to get the probability of each die face. Or if you only care about the coin, you can add up over all die results to get the probability of heads or tails. These zoomed-out views are called marginals.
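
The table and its zoomed-out views can be sketched in a few lines of Python. The example assumes a fair coin and a fair die that don't influence each other; the helper name `marginal` is hypothetical.

```python
from fractions import Fraction
from itertools import product

coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]

# Joint table: one entry per (coin, die) pair, each with probability 1/12.
joint = {(c, d): Fraction(1, 12) for c, d in product(coin, die)}

def marginal(joint, axis):
    """Sum the joint table over the other variable (axis 0 = coin, axis 1 = die)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

coin_marginal = marginal(joint, 0)  # adds up over all die results
die_marginal = marginal(joint, 1)   # adds up over all coin results
```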

The classical Bayes’s rule doesn’t just update the zoomed-out views but the whole table — i.e. the entire joint distribution — so the connection between the two events also remains consistent with the new evidence.

In the quantum version, the joint distribution isn’t a table of numbers but a mathematical object that records how the input and output of a quantum process are related. The point of the new study is that if you want a true quantum Bayes’s rule, you need to update that whole object, not just one part of it.

A new study by Ge Bai, Francesco Buscemi, and Valerio Scarani in Physical Review Letters has taken just this step. In particular, they’ve presented a quantum version of the principle of minimum change by showing that when the measure of change is chosen to be quantum fidelity (a widely used measure of similarity between states), this optimisation leads to a unique solution. Equally remarkably, this solution coincides with the Petz transpose map in many important cases. As a result, the researchers have built a strong bridge between classical Bayesian updating, the minimum change principle, and a central tool of quantum information.

The motivation for this new work isn’t only philosophical. If we’re to generalise Bayes’s rule to include quantum mechanics as well, we need to do so in a way that respects the structural constraints of quantum theory without breaking away from its classical roots.

The researchers began by recalling how the minimum change principle works in classical probability. Instead of updating only a single marginal distribution, the principle works at the level of the joint input-output distribution. Updating then becomes an optimisation problem, i.e. finding the new joint distribution that’s consistent with the new evidence but minimally different from the prior one.
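
A rough numerical sketch of this optimisation follows. It makes two assumptions that aren't in the study: all numbers are invented, and the measure of change is the Kullback-Leibler divergence. For that measure it's a standard result that the closest joint table keeps the old conditional probabilities P(x | y) and only rescales each output column to match the new evidence (often called Jeffrey's rule). The code checks this against random alternatives.

```python
import math
import random

# Hypothetical 2x2 joint table P[x][y], built from a prior over inputs x
# and a forward process P(y | x). All numbers are made up.
prior = [0.5, 0.5]
forward = [[0.9, 0.1],   # P(y | x=0)
           [0.2, 0.8]]   # P(y | x=1)
P = [[prior[x] * forward[x][y] for y in range(2)] for x in range(2)]

new_evidence = [0.3, 0.7]  # updated belief about the output y

def jeffrey_update(P, q):
    """Rescale each column y of the joint table so that its total becomes q[y]."""
    col = [sum(P[x][y] for x in range(2)) for y in range(2)]
    return [[P[x][y] / col[y] * q[y] for y in range(2)] for x in range(2)]

def kl(Q, P):
    """Kullback-Leibler divergence D(Q || P) between two joint tables."""
    return sum(Q[x][y] * math.log(Q[x][y] / P[x][y])
               for x in range(2) for y in range(2) if Q[x][y] > 0)

Q_star = jeffrey_update(P, new_evidence)

rng = random.Random(0)

def random_candidate(q):
    """A random joint table that also has output marginal q."""
    t = [rng.random() for _ in range(2)]  # share of column y assigned to x=0
    return [[t[y] * q[y] for y in range(2)],
            [(1 - t[y]) * q[y] for y in range(2)]]

best_random = min(kl(random_candidate(new_evidence), P) for _ in range(2000))
```

None of the random candidates should come out closer to the original table than `Q_star` does.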

In ordinary probability, we talk about stochastic processes. These are rules that tell us how an input is turned into an output, with certain probabilities. For example if you put a coin into a vending machine, there might be a 90% chance you get a chips packet and a 10% chance you get nothing. This rule describes a stochastic process. This process can also be described with a joint distribution.
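
As a small sketch, the vending machine can be written as a table of conditional probabilities. The "no coin" row and the 50/50 prior over inputs are assumptions added here so that a joint distribution can be formed; only the 90%/10% figures come from the example above.

```python
outputs = ["chips", "nothing"]

# Each row is P(output | input) and sums to 1.
machine = {
    "coin":    {"chips": 0.9, "nothing": 0.1},   # figures from the example
    "no coin": {"chips": 0.0, "nothing": 1.0},   # assumed for illustration
}
input_dist = {"coin": 0.5, "no coin": 0.5}        # assumed prior over inputs

# Joint distribution over (input, output) pairs, and the output marginal.
joint = {(i, o): input_dist[i] * machine[i][o] for i in machine for o in outputs}
output_dist = {o: sum(joint[(i, o)] for i in machine) for o in outputs}
```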

In quantum physics, however, it’s trickier. The inputs and outputs aren’t just numbers or events but quantum states, which are described by wavefunctions or density matrices. This makes the maths much more complex. The quantum counterparts of stochastic processes are called completely positive trace-preserving (CPTP) maps.

A CPTP map is the most general kind of physical evolution allowed: it takes a quantum state and transforms it into another quantum state. And in the course of doing so, it needs to follow two rules: it shouldn’t yield any negative probabilities and it should ensure the total probability adds up to 1. That is, your chance of getting a chips packet shouldn’t be –90% nor should it be 90% plus a 20% chance of getting nothing.
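
One common way to write down a CPTP map is as a set of Kraus operators. The sketch below uses NumPy and picks the qubit amplitude-damping channel purely as an example; the channel choice, the decay probability 0.3, and the function names are assumptions, not taken from the study.

```python
import numpy as np

def amplitude_damping(gamma):
    """Kraus operators of a qubit amplitude-damping channel with decay probability gamma."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [k0, k1]

def apply_channel(kraus, rho):
    """Apply the map rho -> sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

kraus = amplitude_damping(0.3)

# Rule 2 (total probability stays 1): sum_k K^dagger K equals the identity.
completeness = sum(k.conj().T @ k for k in kraus)

rho_in = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # an equal superposition
rho_out = apply_channel(kraus, rho_in)

# Rule 1 (no negative probabilities): the output has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(rho_out)
```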

These complications mean that, while the joint distribution in classical Bayesian updating is a simple table, the one in quantum theory is more sophisticated. It uses two mathematical tools in particular. One is purification, a way to embed a mixed quantum state into a larger ‘pure’ state so that mathematicians can keep track of correlations. The other is Choi operators, a standard way of representing a CPTP map as a big matrix that encodes all possible input-output behaviour at once.
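
Here is a minimal sketch of the second tool. It builds the Choi operator of a channel from its Kraus operators, using one common convention (the unnormalised sum over |i><j| tensor E(|i><j|)). The dephasing channel and the parameter p = 0.25 are assumptions chosen only to have something to compute.

```python
import numpy as np

def choi_operator(kraus, dim):
    """Build sum_ij |i><j| (x) E(|i><j|) for a channel given by square Kraus operators."""
    choi = np.zeros((dim * dim, dim * dim), dtype=complex)
    for i in range(dim):
        for j in range(dim):
            e_ij = np.zeros((dim, dim), dtype=complex)
            e_ij[i, j] = 1
            out = sum(k @ e_ij @ k.conj().T for k in kraus)
            choi += np.kron(e_ij, out)
    return choi

p = 0.25  # assumed dephasing probability
identity = np.eye(2, dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)
dephasing = [np.sqrt(1 - p) * identity, np.sqrt(p) * pauli_z]

choi = choi_operator(dephasing, 2)

# Complete positivity shows up as a Choi operator with no negative eigenvalues.
choi_eigenvalues = np.linalg.eigvalsh(choi)

# Trace preservation shows up as: tracing out the output leaves the identity.
input_marginal = np.trace(choi.reshape(2, 2, 2, 2), axis1=1, axis2=3)
```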

Together, these tools play the role of the joint distribution in the quantum setting: they record the whole picture of how inputs and outputs are related.

Now, how do you compare two processes, i.e. the actual forward process (input → output) and the guessed reverse process (output → input)?

In quantum mechanics, one of the best measures of similarity is fidelity. It’s a number between 0 and 1. 0 means two processes are completely different and 1 means they’re exactly the same.
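
For reference, here is a small sketch of the fidelity between two density matrices, using the convention F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2. The formula is defined for states; applying it to whole processes means applying it to the operators that represent them. The helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0, None)  # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """Fidelity (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2 between two density matrices."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    vals = np.clip(np.linalg.eigvalsh(inner), 0, None)
    return float(np.sum(np.sqrt(vals)) ** 2)

zero = np.array([[1, 0], [0, 0]], dtype=complex)          # the state |0>
one = np.array([[0, 0], [0, 1]], dtype=complex)           # the state |1>
plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # equal superposition

same = fidelity(zero, zero)      # identical states
opposite = fidelity(zero, one)   # perfectly distinguishable states
halfway = fidelity(zero, plus)   # partially overlapping states
```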

In this context, the researchers’ problem statement was this: given a forward process, what reverse process is closest to it?

To solve this, they looked over all possible reverse processes that obeyed the two rules, then they picked the one that maximised the fidelity, i.e. the CPTP map most similar to the forward process. This is the quantum version of applying the principle of minimum change.

In the course of this process, the researchers found that in natural conditions, the Petz transpose map emerges as the quantum Bayes’s rule.

In quantum mechanics, two objects (like matrices) commute if the order in which you apply them doesn’t matter. That is, A then B produces the same outcome as B then A. In physical terms, if two quantum states commute, they behave more like classical probabilities.
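
A quick check of this idea in NumPy: the commutator AB - BA is zero exactly when the order doesn't matter. The particular matrices are just convenient examples.

```python
import numpy as np

def commutator(a, b):
    """Return AB - BA; the zero matrix means A and B commute."""
    return a @ b - b @ a

pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Two diagonal density matrices: these behave like classical probability lists.
diag_a = np.diag([0.7, 0.3]).astype(complex)
diag_b = np.diag([0.2, 0.8]).astype(complex)

xz = commutator(pauli_x, pauli_z)  # non-zero: the order matters
dd = commutator(diag_a, diag_b)    # zero: the order doesn't matter
```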

The researchers found that when the CPTP map that takes an input and produces an output, called the forward channel, commutes with the new state, the updating process is nothing but the Petz transpose map.
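
The sketch below implements the Petz transpose map in its standard textbook form, R(sigma) = gamma^(1/2) E†( E(gamma)^(-1/2) sigma E(gamma)^(-1/2) ) gamma^(1/2), where gamma is the prior state and E the forward channel. It then checks the fully classical case, where everything is diagonal and so commutes: there the map should reproduce ordinary Bayes's rule. The likelihood table, the prior, and all function names are assumptions for illustration, and the code assumes the matrices involved are invertible.

```python
import numpy as np

def psd_power(m, p):
    """Raise a positive-definite Hermitian matrix to the real power p."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals**p) @ vecs.conj().T

def channel(kraus, rho):
    """Forward channel E(rho) = sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

def adjoint(kraus, x):
    """Adjoint map E†(X) = sum_k K^dagger X K."""
    return sum(k.conj().T @ x @ k for k in kraus)

def petz_map(kraus, gamma, sigma):
    """Petz transpose map of the channel with respect to the prior state gamma."""
    g_half = psd_power(gamma, 0.5)
    e_inv_half = psd_power(channel(kraus, gamma), -0.5)
    return g_half @ adjoint(kraus, e_inv_half @ sigma @ e_inv_half) @ g_half

# Classical test case: a stochastic matrix P(y | x) written as a quantum channel.
likelihood = np.array([[0.9, 0.1],    # P(y | x=0)
                       [0.2, 0.8]])   # P(y | x=1)
prior = np.array([0.5, 0.5])

kraus = []
for x in range(2):
    for y in range(2):
        k = np.zeros((2, 2), dtype=complex)
        k[y, x] = np.sqrt(likelihood[x, y])   # sqrt(P(y|x)) |y><x|
        kraus.append(k)

gamma = np.diag(prior).astype(complex)
observed_y0 = np.diag([1.0, 0.0]).astype(complex)   # evidence: the output was y = 0

petz_posterior = np.real(np.diag(petz_map(kraus, gamma, observed_y0)))
bayes_posterior = prior * likelihood[:, 0] / np.sum(prior * likelihood[:, 0])
```

In this diagonal example the two posteriors agree, which is the sense in which the Petz map reduces to Bayes's rule when nothing fails to commute.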

This is an important result for many reasons. Perhaps foremost is that it explains why the Petz map has shown up consistently across different parts of quantum information theory. It appears it isn’t just a useful tool but the natural consequence of the principle of minimum change applied in the quantum setting.

The study also highlighted instances where the Petz transpose map isn’t optimal, specifically when the commutativity condition fails. In these situations, the optimal updating process depends more intricately on the new evidence. This subtlety departs clearly from classical Bayesian logic because in the quantum case, the structure of non-commutativity forces updates to depend non-linearly on the evidence (i.e. the scope of updating can be disproportionate to changes in evidence).

Finally, the researchers have shown how their framework can recover special cases of practical importance. If some new evidence perfectly agrees with prior expectations, the forward and reverse processes become identical, mirroring the classical situation where Bayes’s rule simply reaffirms existing beliefs. Similarly, in contexts like quantum error correction, the Petz transpose map’s appearance is explained by its status as the optimal minimal-change reverse process.

But the broader significance of this work lies in the way it unifies different strands of quantum information theory under a single conceptual roof. By proving that the Petz transpose map can be derived from the principle of minimum change, the study has provided a principled justification for its widespread use, rather than one restricted to particular contexts. This fact has immediate consequences for quantum computing, where physicists are looking for ways to reverse the effects of noise on fragile quantum states. The Petz transpose map has long been known to do a good job of recovering information from these states after they’ve been affected by noise. Now that physicists know the map embodies the smallest update required to stay consistent with the observed outcomes, they may be able to design new recovery schemes that exploit the structure of minimal change more directly.

The study may also open doors to extending Bayesian networks into the quantum regime. In classical probability, a Bayesian network provides a structured way to represent cause-effect relationships. By adapting the minimum change framework, scientists may be able to develop ‘quantum Bayesian networks’ where the way one updates their expectations of a particular outcome respects the peculiar constraints of CPTP maps. This could have applications in quantum machine learning and in the study of quantum causal models.

There are also some open questions. For instance, the researchers have noted that if measures of divergence other than fidelity are used, e.g. the Hilbert-Schmidt distance or quantum relative entropy, the resulting quantum Bayes’s rules may be different. This in turn indicates that there could be multiple valid updating rules, each suited to different contexts. Future research will need to map out these possibilities and determine which ones are most useful for particular applications.

In all, the study provides both a conceptual advance and a technical tool. Conceptually, it shows how the spirit of Bayesian updating can carry over into the quantum world; technically, it provides a rigorous derivation of when and why the Petz transpose map is the optimal quantum Bayes’s rule. Taken together, the study’s finding strengthens the bridge between classical and quantum reasoning and offers a deeper understanding of how information is updated in a world where uncertainty is baked into reality rather than being due to an observer’s ignorance.

2025.10.04
Dispelling Maxwell’s demon

Maxwell’s demon is one of the most famous thought experiments in the history of physics, a puzzle first posed in the 1860s that continues to shape scientific debates to this day. I’ve struggled to make sense of it for years. Last week I had some time and decided to hunker down and figure it out, and I think I succeeded. The following post describes the fruits of my efforts.

At first sight, the Maxwell’s demon paradox seems odd because it presents a supernatural creature tampering with molecules of gas. But if you pare down the imagery and focus on the technological backdrop of the time of James Clerk Maxwell, who proposed it, a profoundly insightful probe of the second law of thermodynamics comes into view.

The thought experiment asks a simple question: if you had a way to measure and control molecules with perfect precision and at no cost, would you be able to make heat flow backwards, as if in an engine?

Picture a box of air divided into two halves by a partition. In the partition is a very small trapdoor. It has a hinge so it can swing open and shut. Now imagine a microscopic valve operator that can detect the speed of each gas molecule as it approaches the trapdoor, decide whether to open or close the door, and actuate the door accordingly.

The operator follows two simple rules: let fast molecules through from left to right and let slow molecules through from right to left. The temperature of a system is nothing but the average kinetic energy of its constituent particles. As the operator operates, over time the right side will heat up and the left side will cool down — thus producing a temperature gradient for free. Where there’s a temperature gradient, it’s possible to run a heat engine. (The internal combustion engine in fossil-fuel vehicles is a common example.)
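
A toy simulation of the operator's two rules, in plain Python. It's a deliberately idealised sketch: the units are arbitrary, the cutoff between "fast" and "slow" is assumed to be the median kinetic energy, and every molecule is assumed to reach the trapdoor exactly once.

```python
import random

rng = random.Random(42)
MASS = 1.0  # arbitrary units

def kinetic_energy():
    """Energy of one molecule with three Gaussian velocity components."""
    return 0.5 * MASS * sum(rng.gauss(0, 1) ** 2 for _ in range(3))

# Both halves start at the same temperature (same energy distribution).
left = [kinetic_energy() for _ in range(5000)]
right = [kinetic_energy() for _ in range(5000)]
everything = sorted(left + right)
threshold = everything[len(everything) // 2]  # assumed cutoff: the median energy

def mean(xs):
    return sum(xs) / len(xs)

before = (mean(left), mean(right))

# Rule 1: fast molecules pass from left to right.
# Rule 2: slow molecules pass from right to left.
new_left = [e for e in left if e <= threshold] + [e for e in right if e < threshold]
new_right = [e for e in right if e >= threshold] + [e for e in left if e > threshold]

after = (mean(new_left), mean(new_right))  # average energy, i.e. temperature, per side
```

The total energy is unchanged, but the right side ends up with a higher average kinetic energy than the left.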

A schematic diagram of the Maxwell’s demon thought experiment. Htkym (CC BY-SA)

But the possibility that this operator can detect and sort the molecules, thus creating the temperature gradient without consuming some energy of its own, seems to break the second law of thermodynamics. The second law states that the entropy of an isolated system never decreases over time, whereas the operator’s sorting would lower the entropy of the gas, violating the law. This was the Maxwell’s demon thought experiment, with the demon as a whimsical stand-in for the operator.

The paradox was made compelling by the silent assumption that the act of sorting the molecules could have no cost — i.e. that the imagined operator didn’t add energy to the system (the air in the box) but simply allowed molecules that are already in motion to pass one way and not the other. In this sense the operator acted like a valve or a one-way gate. Devices of this kind — including check valves, ratchets, and centrifugal governors — were already familiar in the 19th century. And scientists assumed that if they were scaled down to the molecular level, they’d be able to work without friction and thus separate hot and cold particles without drawing more energy to overcome that friction.

This detail is in fact the fulcrum of the paradox, and the thing that’d kept me all these years from actually understanding what the issue was. Maxwell et al. assumed that it was possible that an entity like this gate could exist: one that, without spending energy to do work (and thus increase entropy), could passively, effortlessly sort the molecules. Overall, the paradox stated that if such a sorting exercise really had no cost, the second law of thermodynamics would be violated.

The second law had been established only a few decades before Maxwell thought up this paradox. If entropy is taken to be a measure of disorder, the second law states that if a system is left to itself, heat will not spontaneously flow from cold to hot and whatever useful energy it holds will inevitably degrade into the random motion of its constituent particles. The second law is the reason why perpetual motion machines are impossible, why the engines in our cars and bikes can’t be 100% efficient, and why time flows in one specific direction (from past to future).

Yet Maxwell’s imagined operator seemed to be able to make heat flow backwards, sifting molecules so that order increases spontaneously. For many decades, this possibility challenged what physicists thought they knew about physics. While some brushed it off as a curiosity, others contended that the demon itself must expend some energy to operate the door and that this expense would restore the balance. However, Maxwell had been careful when he conceived the thought experiment: he specified that the trapdoor was small and moved without friction, so it could in principle operate in a negligible way. The real puzzle lay elsewhere.

In 1929, the Hungarian physicist Leó Szilard sharpened the problem by boiling it down to a single-particle machine. This so-called Szilard engine imagined one gas molecule in a box with a partition that could be inserted or removed. By observing on which side the molecule lay and then allowing it to push a piston, the operator could apparently extract work from a single particle at uniform temperature. Szilard showed that the key step was not the movement of the piston but the acquisition of information: knowing where the particle was. That is, Szilard reframed the paradox to be not about the molecules being sorted but about an observer making a measurement.
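
The standard textbook estimate of the work such an engine could extract can be written in one line. It assumes the single molecule behaves as an ideal gas (pV = k_B T) and expands slowly at constant temperature T from half the box to the full volume V; these symbols are introduced here for illustration.

```latex
W = \int_{V/2}^{V} p \,\mathrm{d}V'
  = \int_{V/2}^{V} \frac{k_B T}{V'} \,\mathrm{d}V'
  = k_B T \ln 2
```

So one bit of information (left or right) is worth at most k_B T ln 2 of work per cycle.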

(Aside: Szilard was played by Máté Haumann in the 2023 film Oppenheimer.)

A (low-res) visualisation of a Szilard engine. Its simplest form has only one atom (i.e. N = 1) pushing against a piston. Credit: P. Fraundorf (CC BY-SA)

The next clue to cracking the puzzle came in the mid-20th century from the growing field of information theory. In 1961, the German-American physicist Rolf Landauer proposed a principle that connected information and entropy directly. Landauer’s principle states that while it’s possible in principle to acquire information in a reversible way, i.e. in a way that can be undone, erasing information from a device with memory has a non-zero thermodynamic cost that can’t be avoided. That is, the act of resetting a memory register of one bit to a standard state generates a small amount of entropy (at least Boltzmann’s constant multiplied by the natural logarithm of two).
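
To get a feel for the size of this cost, the sketch below converts the entropy bound into a minimum amount of heat, k_B T ln 2 per erased bit. The temperature of 300 K and the function name are assumptions chosen for illustration.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K (exact in the 2019 SI)

def landauer_limit(temperature_kelvin, bits=1):
    """Minimum heat, in joules, dissipated when erasing `bits` bits at this temperature."""
    return bits * BOLTZMANN * temperature_kelvin * math.log(2)

per_bit_300k = landauer_limit(300.0)                  # roughly 2.9e-21 J
per_gigabyte_300k = landauer_limit(300.0, bits=8e9)   # erasing 8 billion bits
```

The bound is tiny compared with what today's hardware dissipates, which is why it matters as a matter of principle rather than as an engineering constraint for now.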

The American information theorist Charles H. Bennett later built on Landauer’s principle and argued that Maxwell’s demon could gather information and act on it — but in order to continue indefinitely, it’d have to erase or overwrite its memory. And that this act of resetting would generate exactly the entropy needed to compensate for the apparent decrease, ultimately preserving the second law of thermodynamics.

Taken together, Maxwell’s demon was defeated not by the mechanics of the trapdoor but by the thermodynamic cost of processing information. Specifically, the decrease in entropy as a result of the molecules being sorted by their speed is compensated for by the increase in entropy due to the operator’s rewriting or erasure of information about the molecules’ speed. Thus a paradox that’d begun as a challenge to thermodynamics ended up enriching it by showing information could be physical. It also revealed to scientists that entropy isn’t only a measure of disorder in matter and energy but is also linked to uncertainty and information.

Over time, Maxwell’s demon also became a fount of insight across multiple branches of physics. In classical thermodynamics, for example, entropy came to represent a measure of the probabilities that the system could exist in different combinations of microscopic states. That is, the probabilities referred to the likelihood that a given set of molecules could be arranged in one way instead of another. In statistical mechanics, Maxwell’s demon gave scientists a concrete way to think about fluctuations. In any small system, random fluctuations can reduce entropy for some time in a small portion. While the demon seemed to exploit these fluctuations, the laws of probability were found to ensure that on average, entropy would increase. So the demon became a metaphor for how selection based on microscopic knowledge could alter outcomes but also why such selection can’t be performed without paying a cost.
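
A short sketch of the counting behind these statements, assuming each of N molecules is independently equally likely to be in either half of the box. The function names are hypothetical.

```python
import math

def log_multiplicity(n_total, n_left):
    """Logarithm of the number of ways to place n_left of n_total molecules on the left."""
    return math.log(math.comb(n_total, n_left))

def probability_all_on_one_side(n_total):
    """Chance that a random fluctuation puts every molecule in the same half."""
    return 2 * 0.5 ** n_total

# An even split can be realised in far more ways than a lopsided one.
even_split = log_multiplicity(100, 50)
lopsided = log_multiplicity(100, 10)

# Spontaneous sorting is plausible for 2 molecules, hopeless for 100.
tiny_system = probability_all_on_one_side(2)
larger_system = probability_all_on_one_side(100)
```

Small systems fluctuate into ordered arrangements fairly often, while the odds collapse exponentially as the number of molecules grows.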

For information theorists and computer scientists, the demon was an early symbol of the deep ties between computation and thermodynamics. Landauer’s principle showed that erasing information imposes a minimum entropy cost — an insight that matters for how computer hardware should be designed. The principle also influenced debates about reversible computing, where the goal is to design logic gates that don’t ever erase information and thus approach zero energy dissipation. In other words, Maxwell’s demon foreshadowed modern questions about how energy-efficient computing could really be.

Even beyond physics, the demon has seeped into philosophy, biology, and social thought as a symbol of control and knowledge. In biology, the resemblance between the demon and enzymes that sort molecules has inspired metaphors about how life maintains order. In economics and social theory, the demon has been used to discuss the limits of surveillance and control. The lesson has been the same in every instance: that information is never free and that the act of using it imposes inescapable energy costs.

I’m particularly taken by the philosophy that animates the paradox. Maxwell’s demon was introduced as a way to dramatise the tension between the microscopic reversibility of physical laws and the macroscopic irreversibility encoded in the second law of thermodynamics. I found that a few questions in particular — whether the entropy increase due to the use of information is a matter of an observer’s ignorance (i.e. because the observer doesn’t know which particular microstate the system occupies at any given moment), whether information has physical significance, and whether the laws of nature really guarantee the irreversibility we observe — have become touchstones in the philosophy of physics.

In the mid-20th century, the Szilard engine became the focus of these debates because it refocused the second law from molecular dynamics to the cost of acquiring information. Later figures such as the French physicist Léon Brillouin and the Hungarian-Canadian physicist Dennis Gabor claimed that it’s impossible to measure something without spending energy. Critics however countered that these requirements stipulated the need for specific technologies that would in turn smuggle in some limitations — rather than stipulate the presence of a fundamental principle. That is to say, the debate among philosophers became whether Maxwell’s demon was prevented from breaking the second law by deep and hitherto hidden principles or by engineering challenges.

This gridlock was broken when physicists observed that even a demon-free machine must leave some physical trace of its interactions with the molecule. That is, any device that sorts particles will end up in different physical states depending on the outcome, and to complete a thermodynamic cycle those states must be reset. Here, the entropy is not due to the informational content but due to the logical structure of memory. Landauer solidified this with his principle that logically irreversible operations such as erasure carry a minimum thermodynamic cost. Bennett extended this by arguing that measurement can be performed reversibly but erasure can’t. The philosophical meaning of both these arguments is that entropy increase isn’t just about ignorance but also about parts of information processing being irreversible.

Credit: Cdd20

In the quantum domain, the philosophical puzzles became more intense. When an object is measured in quantum mechanics, it isn’t just about an observer updating the information they have about the object: the act of measuring also seems to alter the object’s quantum state. For example, in the Schrödinger’s cat thought experiment, opening the box to check on the cat also causes the cat to settle into one of two states: dead or alive. Quantum physicists have recreated Maxwell’s demon in new ways in order to check whether the second law continues to hold. And over the course of many experiments, they’ve concluded that indeed it does.

The second law didn’t break even when Maxwell’s demon could exploit phenomena that aren’t available in the classical domain, including quantum entanglement, superposition, and tunnelling. This was because, among other reasons, quantum mechanics has some restrictive rules of its own. For one, some physicists have tried to design “quantum demons” that use quantum entanglement between particles to sort them without expending energy. But these experiments have found that as soon as the demon tries to reset its memory and start again, it must erase the record of what happened before. This step destroys the advantage and the entropy cost returns. The overall result is that even a “quantum demon” gains nothing in the long run.

For another, the no-cloning theorem states that you can’t make a perfect copy of an unknown quantum state. If the demon could freely copy every quantum particle it measured, it could retain flawless records while still resetting its memory, thus avoiding the usual entropy cost. The theorem blocks this strategy by forbidding perfect duplication, ensuring that information can’t be ‘multiplied’ without limit. Similarly, the principle of unitarity implies that a system will always evolve in a way that preserves overall probabilities. As a result, quantum phenomena can’t selectively amplify certain outcomes while discarding others. For the demon, this means it can’t secretly limit the range of possible states the system can occupy into a smaller set where the system has lower entropy, because unitarity guarantees that the full spread of possibilities is preserved across time.
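
The unitarity point can be checked numerically. The sketch below builds a random unitary (via a QR decomposition, an assumption of convenience), applies it to a mixed state, and compares the von Neumann entropy before and after. The state and the function name are made up for illustration.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy -Tr(rho ln rho), computed from the eigenvalues of rho."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]  # zero eigenvalues contribute nothing
    return float(-np.sum(vals * np.log(vals)))

rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
unitary, _ = np.linalg.qr(raw)  # the Q factor of a QR decomposition is unitary

rho = np.diag([0.4, 0.3, 0.2, 0.1]).astype(complex)  # a mixed state with some entropy
evolved = unitary @ rho @ unitary.conj().T            # closed-system (unitary) evolution

entropy_before = von_neumann_entropy(rho)
entropy_after = von_neumann_entropy(evolved)
```

The two entropies agree to numerical precision: unitary evolution rearranges the state but can't squeeze it into a lower-entropy one.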

All these rules together prevent the demon from multiplying or rearranging quantum states in a way that would allow it to beat the second law.

Then again, these ‘blocks’ that prevent Maxwell’s demon from breaking the second law of thermodynamics in the quantum realm raise a puzzle of their own: is the second law of thermodynamics guaranteed no matter how we interpret quantum mechanics? ‘Interpreting quantum mechanics’ means to interpret what the rules of quantum mechanics say about reality, a topic I covered at length in a recent post. Some interpretations say that when we measure a quantum system, its wavefunction “collapses” to a definite outcome. Others say collapse never happens and that a measurement merely entangles the system with its environment, a process called decoherence. The Maxwell’s demon thought experiment thus forces the question: is the second law of thermodynamics safe in a particular interpretation of quantum mechanics or in all interpretations?

Credit: Amy Young/Unsplash

Landauer’s idea, that erasing information always carries a cost, also applies to quantum information. Even if Maxwell’s demon used qubits instead of bits, it won’t be able to escape the fact that to reuse its memory, it must erase the record, which will generate heat. But then the question becomes more subtle in quantum systems because qubits can be entangled with each other, and their delicate coherence — the special quantum link between quantum states — can be lost when information is processed. This means scientists need to carefully separate two different ideas of entropy: one based on what we as observers don’t know (our ignorance) and another based on what the quantum system itself has physically lost (by losing coherence).
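
One way to make this distinction concrete, offered as a sketch rather than as the definitive reading: compare the uncertainty about a measurement outcome (a Shannon entropy) with the entropy of the quantum state itself (the von Neumann entropy) before and after coherence is lost. The choice of state and the complete-dephasing model are assumptions.

```python
import numpy as np

def shannon(probs):
    """Shannon entropy of a list of outcome probabilities (natural log)."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 1e-12]
    return float(-np.sum(probs * np.log(probs)))

def von_neumann(rho):
    """Von Neumann entropy of a density matrix (natural log)."""
    return shannon(np.linalg.eigvalsh(rho))

def dephase(rho):
    """Complete dephasing: delete the off-diagonal (coherence) terms."""
    return np.diag(np.diag(rho))

plus = 0.5 * np.ones((2, 2), dtype=complex)  # equal superposition, fully coherent
dephased = dephase(plus)

# What an observer doesn't know about a 0/1 measurement: the same in both cases.
outcome_entropy_before = shannon(np.real(np.diag(plus)))
outcome_entropy_after = shannon(np.real(np.diag(dephased)))

# What the state itself has lost: zero before, ln 2 after coherence is gone.
state_entropy_before = von_neumann(plus)
state_entropy_after = von_neumann(dephased)
```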

The lesson is that the second law of thermodynamics doesn’t just guard the flow of energy. In the quantum realm it also governs the flow of information. Entropy increases not only because we lose track of details but also because the very act of erasing and resetting information, whether classical or quantum, forces a cost that no demon can avoid.

Then again, some philosophers and physicists have resisted the move to information altogether, arguing that ordinary statistical mechanics suffices to resolve the paradox. They’ve argued that any device designed to exploit fluctuations will be subject to its own fluctuations, and thus in aggregate no violation will have occurred. In this view, the second law is self-sufficient and doesn’t need the language of information, memory or knowledge to justify itself. This line of thought is attractive to those wary of anthropomorphising physics even if it also risks trivialising the demon. After all, the demon was designed to expose the gap between microscopic reversibility and macroscopic irreversibility, and simply declaring that “the averages work out” seems to bypass the conceptual tension.

Thus, the philosophical significance of Maxwell’s demon is that it forces us to clarify the nature of entropy and the second law. Is entropy tied to our knowledge/ignorance of microstates, or is it ontic, tied to the irreversibility of information processing and computation? If Landauer is right, handling information and conserving energy are ‘equally’ fundamental physical concepts. If the statistical purists are right, on the other hand, then information adds nothing to the physics and the demon was never a serious challenge. Quantum theory can further stir both pots by suggesting that entropy is closely linked to the act of measurement, to quantum entanglement, and to how quantum systems ‘collapse’ to classical ones by the process of decoherence. The demon debate therefore tests whether information is a physically primitive entity or a knowledge-based tool. Either way, however, Maxwell’s demon endures as a parable.

Ultimately, what makes Maxwell’s demon a gift that keeps giving is that it works on several levels. On the surface it’s a riddle about sorting molecules between two chambers. Dig a little deeper and it becomes a probe into the meaning of entropy. If you dig even further, it seems to be a bridge between matter and information. Just as the Schrödinger’s cat thought experiment dramatised the oddness of quantum superposition, Maxwell’s demon dramatised the subtleties of thermodynamics by invoking a fantastical entity. And while Schrödinger’s cat forces us to ask what it means for a macroscopic system to be in two states at once, Maxwell’s demon forces us to ask what it means to know something about a system and whether that knowledge can be used without consequence.

2025.09.19
