Alcove

Science, culture, complexity

Tag: Brian Wansink

Why do we trust scientists?

A friend of mine was recently asked to consider the possibility that facts can change. Since she brought her thoughts to me, I’ve been thinking about the different ways in which that’s possible. For one, there’s reality and then there’s our knowledge of reality; the two needn’t coincide. While a statement like ‘facts are facts’ could mean that reality doesn’t change, what we know about reality can still change. For example, our methods of acquiring information about reality may have been flawed before and be less flawed now, so what we know about reality, i.e. our facts, changes.

Second, some facts, in the political sphere and beyond, are institutional or conventional: they count as facts only by social consensus, and they can lose that status if they lose that consensus. Examples include money, laws, cultural norms such as ‘a red light means stop’ and, closer to science, the decision to treat p < 0.05 as a meaningful statistical threshold, the consensus among scientists on how to define units of measurement, and the convention of dividing time into years, months, and days.

A third interpretation is that misinformation or disinformation can get in the way of a person understanding which information is factual and which isn’t. The important thing here is there’s still a constituency of people — the scientists — for whom some information is factual even if for a different constituency that piece of information is not factual.

In the first interpretation, dispute arises within the expert community as evidence changes; in the second, dispute concerns a socially instituted status across society (including experts). That is, in the first case, scientists’ own knowledge of that information can be updated, sometimes drastically. In the second case, society (including scientists) may disagree over whether a convention should count as a fact for coordination. In the third case, expert consensus holds that the information is factual but segments of the publics reject it.

These three possibilities leave behind a practical question: when we (non-experts) can’t settle a question of factuality by ourselves, whether because the evidence is evolving, because we can’t agree on conventions, or because not everyone accepts the facts equally, on what grounds can we justifiably defer to experts? That is, what makes deference rational?

Epistemic dependence names a basic fact of modern life: for most of what we claim to know, we rely on other people’s testimony rather than our own inspection of the evidence. John Hardwig (the American philosopher best known for his “duty to die” argument) has contended that this dependence isn’t a defect but a rational and defensible strategy in complex societies. For instance, no individual can master the mathematics of cryptography, the molecular biology of vaccines, the econometrics of inflation, and the engineering of bridges, yet people still trust these fields and the recommendations of their practitioners in order to make their own decisions.

The challenge for non-experts isn’t to acquire this scientific knowledge (which is often impossible) but to develop reliable ways to distinguish trustworthy from untrustworthy sources of scientific knowledge and to design institutions that make accurate testimony likely and deception expensive. In short, Hardwig’s point is that epistemic responsibility typically involves, rather than rejects, responsible deference to scientists, with the ‘responsibility’ reinforced by the scientists’ track records, incentive structures, and the error-correcting mechanisms that operate in their work.

The German sociologist Max Weber’s typology of authority is relevant here because it helps structure deference. Weber drew lines between traditional authority, charismatic authority, and rational-legal authority. The authority of science aspires to the third because it’s less grounded in who speaks and more in the procedures by which statements are vetted. For instance, a research finding that survives peer review, replication attempts, and other forms of critical scrutiny post-publication bears an impersonal authority — one that doesn’t demand allegiance to a particular leader or a lineage.

This rational-legal form also defines how sanctions in science work. Retractions, loss of funding, and reputational damage follow codified rules and shared expectations of disclosure and transparency rather than serving as conduits for the wrath of a sovereign. The non-expert’s deference to scientific claims is thus a portable deference to procedures that the non-expert believes track the truth, rather than to the mere social prestige of scientists. The flip side is that the non-expert must constantly endeavour to maintain these procedures.

Further, when scientific procedures are politicised or when charismatic or traditional authorities claim jurisdiction over empirical questions, the basis for deference goes away. That is to say, appeals to ‘trust science’ work only to the extent that the rational-legal authority remains credible.

The sociology of expertise has refined these observations by describing how expertise is distributed and recognised. In particular, the sociologists Harry Collins and Robert Evans have distinguished between contributory expertise and interactional expertise. Contributory experts can produce and evaluate new knowledge within a field; they’re so called because their competence lies in their ability to contribute meaningfully to research. Interactional experts can’t contribute original work but they can speak the language of the field fluently enough to engage credibly with contributory experts.

Policymakers, journalists, and ethicists embedded in laboratories often need this interactional fluency to translate findings across domains and to interrogate claims without performing the experiments themselves. This distinction helps separate legitimate from irrational deference. A well-equipped non-expert or policymaker still can’t adjudicate between competing models in climate dynamics, say, but an interactional expert should be able to tell which disagreements are weak signals rather than mere noise and which are symptoms of deeper methodological divides.

(Aside: The idea isn’t unlike Bora Zivkovic’s concept of journalists as “temporary experts” because the topics they’re conversant with in the interactional sense can be transient, from anthropology this week to zoology the next. But for the purpose of this post, this nuance is redundant.)

Further, peer review, gatekeeping, and credentialing don’t only protect quality but also control who’s inside the conversation and who isn’t. These practices can devolve into exclusion and conservatism but they’re also useful to guard against diluting standards. In their paper, Collins and Evans proposed that the legitimacy of expert advice in public matters depends on both the technical adequacy of contributory experts and the social processes that connect them to decision-makers and the affected publics. And deference is both rational and democratic when those processes are transparent, include mechanisms for non-experts to challenge experts, and acknowledge uncertainty.

Robert Merton’s widely cited norms of communalism, universalism, disinterestedness, and organised scepticism underpin these arrangements. Communalism holds that scientific knowledge is a common resource and that results should be shared, methods disclosed, data made available, etc. Universalism requires claims to be evaluated by impersonal criteria and independent of the claimant’s identity or status. Disinterestedness expects scientists to subordinate their personal or financial incentives to the pursuit of truth and declare conflicts and design protections against bias. Organised scepticism institutionalises doubt in the form of peer review, replication studies, and methodological criticism.

Together, these Mertonian norms offer a sort of moral economy for the production of reliable beliefs, but the issue is that reality is almost always messier. Empirical studies often reveal ‘counter-norms’ and tensions, while competition for grants and prestige can incentivise scientists to chase hype (e.g. Brian Keating), salami-slice their results (e.g. Brian Wansink) or resort to p-hacking (e.g. Francesca Gino). Commercialisation and intellectual property regimes can restrain communalism. Social hierarchies can undermine universalism through the Matthew effect, where credit accrues to already eminent scientists. People can be insufficiently sceptical of research findings when they align with dominant paradigms or market interests.

The replication crisis in parts of psychology and biomedicine also revealed how structural incentives could produce a research literature high in statistical significance but low in reliability. Yet the very diagnosis of a replication crisis also illustrates the self-correcting aspiration of the Mertonian norms: attempts at reform in the form of registered reports, data-sharing mandates, stricter statistical thresholds, and post-publication review are simply forms of organised scepticism turned inward. The point isn’t that Merton’s norms are fully realised but that they set expectations against which research practice can be judged and corrected.

Taken together, epistemic dependence is unavoidable — and perhaps desirable. Authority rooted in rational-legal procedures can channel that dependence through institutions explicitly designed to reward truth and punish errors. In parallel, the sociology of expertise explains how technical competence is recognised, translated, and connected to publics while the Mertonian norms articulate the moral constraints that make the whole arrangement credible.

When this system in toto functions well, non-experts don’t need to track every inference in a paper to hold a justified belief: it’s enough that they trust a claim has been produced in conditions that make accuracy more likely than not and that there are durable pathways for them to detect and fix mistakes. Likewise, when the system falters because incentives have become misaligned, boundaries have hardened into dogma or norms are being honoured in the breach, deference ceases to be rational and starts to resemble a more reductive allegiance.

To be clear, the essence of scientific credibility is not so much the punishment of errors as transparency in the face of organised criticism. Sanctions against scientists are important to maintain their incentives to pay attention to errors, conduct replication studies, and disclose their methods and data, but punishment without openness can quickly become arbitrary. Second, rational deference is compatible with democratic debate about how expertise is mobilised in policy: a technically sound result can still be challenged on the grounds of the values and trade-offs it implies.

In practice, then, the non-expert’s trust is best anchored not in claims about the moral virtue of scientists or assertions that “science says something” but in the visibility of institutions that embody Mertonian norms, the availability of interactional experts who can translate and interrogate scientific knowledge, and the continuity of disciplinary mechanisms that correct errors in public view.

Axiomatically, deference to any “alternative system” of knowledge is indefensible when it asks for authority without submitting to the same procedures that justify deference to science. The problem isn’t the origin of a claim but how tests of its reliability are governed. When the so-called “Indian knowledge system” is advanced as an epistemic substitute, for instance, it grounds authority in identity, heritage, and scriptural precedence — all bases that don’t instantiate the mechanisms that make testimony trustworthy in complex domains, including public methods, reproducible tests, data disclosure, independent scrutiny, and routine exposure to organised criticism.

Scientific authority is portable because its procedures are impersonal, i.e. a result is credible irrespective of who produced it, provided it survives scrutiny. Alternative systems invert this logic by privileging who speaks — the text, the lineage, the nation — over how claims are vetted. This inversion erodes Mertonian communalism by restricting access to methods or sources to insider circles and blunts organised scepticism by classifying critical appraisal as disloyalty. Once criticism becomes pathologised in this way, incentives to detect and report error fade and testimony ceases to be a rational basis for belief.

2025.11.09
A gentle push over the cliff

From ‘Rotavirus vaccine: tortured data analyses raise false safety alarm’, The Hindu, June 22, 2024:

Slamming the recently published paper by Dr. Jacob Puliyel from the International Institute of Health Management Research, New Delhi, on rotavirus vaccine safety, microbiologist Dr. Gagandeep Kang says: “If you do 20 different analyses, one of them will appear significant. This is truly cherry picking data, cherry picking analysis, changing the data around, adjusting the data, not using the whole data in order to find something [that shows the vaccine is not safe].” Dr. Kang was the principal investigator of the rotavirus vaccine trials and the corresponding author of the 2020 paper in The New England Journal of Medicine, the data of which was used by Dr. Puliyel for his reanalysis.
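Kang’s arithmetic is easy to check. The sketch below assumes, for simplicity, that the analyses are independent and that each uses the conventional 0.05 threshold; repeated reanalyses of one dataset are usually correlated, so the real-world figure will differ. The function names are made up for illustration.

```python
# Kang's point in numbers: run k analyses on data with no real effect, each
# judged "significant" if p < alpha. Assumes the analyses are independent.


def expected_false_positives(k: int, alpha: float = 0.05) -> float:
    """Average number of spurious 'significant' results among k null analyses."""
    return k * alpha


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent null analyses gives p < alpha."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:>2} analyses: expect {expected_false_positives(k):.2f} false hits; "
              f"P(at least one) = {familywise_error_rate(k):.2f}")
```

With 20 analyses the expected number of false hits is exactly one, as Kang says, and the chance of getting at least one is about 64%. Tightening the threshold to 0.005 (`familywise_error_rate(20, 0.005)`) brings that chance down to roughly 10%.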

This is an important rebuttal. I haven’t seen Puliyel’s study but Bharat Biotech’s conduct during and since the COVID-19 pandemic, especially that of its executive chairman Krishna Ella, plus its attitude towards public scrutiny of its Covaxin vaccine has rendered any criticism of the company or its products very believable, even if such criticism is unwarranted, misguided, or just nonsense.

Puliyel’s study itself is a case in point: a quick search on Twitter reveals many strongly worded tweets, speaking to the availability of a mass of people that wants something to be true, and at the first appearance of even feeble evidence will seize on it. Of course The Hindu article found the evidence to not be feeble so much as contrived. Bharat Biotech isn’t “hiding” anything; Puliyel et al. aren’t “whistleblowers”.

The article doesn’t mention the name of the journal that published Puliyel’s paper: the International Journal of Risk and Safety in Medicine. It could have, because journals that don’t keep bad science out of the medical literature don’t just pollute the literature. By virtue of being journals, and in this case also claiming to be peer-reviewed, they allow the claims they publish to be amplified by unsuspecting users on social media platforms.

We saw something similar earlier this year in the political sphere when members of the Indian National Congress party and its allies as well as members of civil society cast doubt on electronic voting machines with little evidence, thus only undermining trust in the electoral process.

To be sure, we’ve cried ourselves hoarse about the importance of every reader being sceptical about what appears in scientific journals (even peer-reviewed) as much as news articles, but because it’s a behavioural and cultural change it’s going to take time. Journals need to do their bit, too, yet they won’t because who needs scruples when you can have profits?

The analytical methods Puliyel and his coauthor Brian Hooker reportedly employed in their new study are reminiscent of the work of Brian Wansink, who resigned from Cornell University five years ago this month after it concluded he’d committed scientific misconduct. In 2018, BuzzFeed published a deep-dive by Stephanie M. Lee on how the Wansink scandal was born. It gave the (well-referenced) impression that the scandal was a combination of a student’s relationship with a mentor renowned in her field of work and the mentor’s pursuit of headlines over science done properly. It’s hard to imagine Puliyel and Hooker were facing any kind of coercion, which leaves the headlines.

This isn’t hard to believe considering it’s the second study to have been published recently that took a shot at Bharat Biotech based on shoddy research. It sucks that it’s become so easy to push people over the cliff, and into the ravenous maw of a conspiracy theory, but it sucks more that some people will push others even when they know better.

2024.06.24
The not-so-obvious obvious

If your job requires you to pore through a dozen or two scientific papers every month – as mine does – you’ll start to notice a few every now and then couching a somewhat well-known fact in study-speak. I don’t mean scientific-speak, largely because there’s nothing wrong with trying to understand natural phenomena in the formalised language of science. However, there seems to be something iffy – often with humorous effect – about a statement like the following: “cutting emissions of ozone-forming gases offers a ‘unique opportunity’ to create a ‘natural climate solution’”¹ (source). Well… d’uh. This is study-speak: rephrasing mostly self-evident knowledge or truisms in unnecessarily formalised language, not infrequently in the style of research papers, without adding any new information but often introducing an element of doubt where there is likely to be none.

1. Caveat: These words were copied from a press release, so this could have been a case of the person composing the release being unaware of the study’s real significance. However, the words within single quotes are copied from the corresponding paper itself. That said, there have been some truly hilarious efforts to make sense of the obvious. For example, consider many of the winners of the Ig Nobel Prizes.

Of course, it always pays to be cautious, but where do you draw the line between caution and producing a scientific result simply because one is required to initiate a new course of action? For example, the Univ. of Exeter study, whose accompanying press release discussed the effect of “ozone-forming gases” on the climate, recommends cutting emissions of substances that combine in the lower atmosphere to form ozone, a three-atom form of oxygen (O₃) that is harmful to both humans and plants. But this idea is as non-“unique” as the corresponding solution that arises (letting plants live better) is “natural”.

However, it’s possible the study’s authors needed to quantify these emissions to understand the extent to which ambient ozone concentration interferes with our climatic goals, and to use their data to inform the design and implementation of corresponding interventions. Such outcomes aren’t always obvious but they are there – often because the necessarily incremental nature of most scientific research can cut both ways. The pursuit of the obvious isn’t always as straightforward as one might believe.

The Univ. of Exeter group may have accumulated sufficient and sufficiently significant evidence to support their conclusion, allowing themselves as well as others to build towards newer, and hopefully more novel, ideas. A ladder must have rungs at the bottom irrespective of how tall it is. But when the incremental sword cuts the other way, often due to perverse incentives that require scientists to publish as many papers as possible to secure professional success, things can get pretty nasty.

For example, the Cornell University consumer behaviour researcher Brian Wansink was known to advise his students to “slice” the data obtained from a few experiments in as many different ways as possible in search of interesting patterns. Many of the papers he published were later found to contain numerous irreproducible conclusions – i.e. Wansink had searched so hard for patterns that he’d found quite a few even when they really weren’t there. As the British economist Ronald Coase said, “If you torture the data long enough, it will confess to anything.”
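To see how slicing can conjure patterns out of nothing, here is a minimal simulation. It is a hypothetical setup, not Wansink’s data or procedure: each made-up dataset has an outcome that is pure noise, which is then split by 20 random yes/no traits, with each split tested for a difference in means. For brevity it uses a large-sample normal approximation instead of an exact t-test; all function names are illustrative.

```python
import math
import random
import statistics


def approx_two_sided_p(a, b):
    """Two-sided p-value for a difference in group means, using a large-sample
    normal (z) approximation rather than an exact t-test."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def slice_null_data(n_people=300, n_traits=20, seed=0):
    """Generate one hypothetical dataset in which the outcome is pure noise, then
    'slice' it by each of n_traits random yes/no traits and test every split."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    p_values = []
    for _ in range(n_traits):
        trait = [rng.random() < 0.5 for _ in range(n_people)]
        yes = [y for y, t in zip(outcome, trait) if t]
        no = [y for y, t in zip(outcome, trait) if not t]
        p_values.append(approx_two_sided_p(yes, no))
    return p_values


def share_with_a_finding(n_datasets=200, alpha=0.05, **kwargs):
    """Fraction of noise-only datasets in which at least one slice looks 'significant'."""
    hits = sum(
        min(slice_null_data(seed=s, **kwargs)) < alpha for s in range(n_datasets)
    )
    return hits / n_datasets


if __name__ == "__main__":
    print(f"Datasets with a 'finding': {share_with_a_finding():.0%}")
```

Since there is no real effect anywhere, one would expect roughly two in three of these datasets to yield at least one “significant” slice, in line with the familywise arithmetic for 20 tests at the 0.05 level.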

The dark side of incremental research, and the virtue of incremental research done right, stems from the fact that it’s deceptively difficult to ascertain the truth of a finding when the strength of the finding is expected to be so small that it really tests the notion of significance or so large – or so pronounced – that it transcends intuitive comprehension.

For an example of the former: among particle physicists, a result qualifies as a discovery only if the chance of it being a fluke is about 1 in 3.5 million. So the Large Hadron Collider (LHC), which was built to discover the Higgs boson, had to perform a vast number of proton-proton collisions, of which only a tiny fraction produce a Higgs boson that its detectors can observe and its computers can analyse, before the signal could attain this significance.
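The “1 in 3.5 million” figure is the “five sigma” convention: the probability that a standard normal variable lands at least five standard deviations above its mean, i.e. that background noise alone fluctuates that far upward. A short sketch using only the standard library (the function name is illustrative) reproduces it and sets it beside the looser thresholds common elsewhere.

```python
import math


def one_sided_tail(sigma: float) -> float:
    """P(Z >= sigma) for a standard normal Z: how often background noise alone
    would fluctuate at least `sigma` standard deviations upward."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


if __name__ == "__main__":
    for sigma in (1.645, 3.0, 5.0):
        p = one_sided_tail(sigma)
        print(f"{sigma} sigma: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

At 1.645 sigma the tail probability is about 0.05, the familiar threshold in many other fields; at five sigma it is about 2.9 × 10⁻⁷, or roughly 1 in 3.5 million.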

But while protons are abundantly available and the LHC can perform hundreds of millions of collisions per second, imagine undertaking an experiment that requires human participants to perform actions according to certain protocols. It’s never going to be possible to enrol billions of them for millions of hours to arrive at a rock-solid result. In such cases, researchers design experiments around very specific questions, with protocols that suppress, or even eliminate, interference, sources of doubt, and confounding variables, and that accentuate the effects of whatever action, decision or influence is being evaluated.

Such experiments often also require the use of sophisticated – but nonetheless well-understood – statistical methods to further eliminate the effects of undesirable phenomena from the data and, to the extent possible, leave behind information of good-enough quality to support or reject the hypotheses. In the course of navigating this winding path from observation to discovery, researchers are susceptible to, say, misapplying a technique, overlooking a confounder or – like Wansink – overanalysing the data so much that a weak effect masquerades as a strong one but only because it’s been submerged in a sea of even weaker effects.
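A related trap can be sketched in a few lines: when many weak effects are each measured with noise and only the most impressive-looking one is reported, that “winner” is usually inflated, because it won partly by luck (statisticians call this the winner’s curse). Everything below is a hypothetical setup with made-up numbers: 30 true effects between 0 and 0.05, each estimated with a standard error of 0.1.

```python
import random
import statistics


def best_looking_estimate(true_effects, standard_error, rng):
    """Measure every effect with noise, then keep only the largest estimate,
    as an analyst hunting for the 'most interesting' pattern might."""
    estimates = [t + rng.gauss(0, standard_error) for t in true_effects]
    best = max(range(len(estimates)), key=estimates.__getitem__)
    return true_effects[best], estimates[best]


def average_inflation(n_effects=30, standard_error=0.1, n_trials=2000, seed=0):
    """Mean amount by which the reported 'winner' overstates its true effect."""
    rng = random.Random(seed)
    # Hypothetical weak true effects, spread evenly from 0 up to just under 0.05.
    true_effects = [0.05 * i / n_effects for i in range(n_effects)]
    gaps = []
    for _ in range(n_trials):
        truth, reported = best_looking_estimate(true_effects, standard_error, rng)
        gaps.append(reported - truth)
    return statistics.fmean(gaps)


if __name__ == "__main__":
    print(f"Average overstatement with 30 effects: {average_inflation():.3f}")
    print(f"Average overstatement with 1 effect:   {average_inflation(n_effects=1):.3f}")
```

With a single pre-specified effect, the average overstatement is close to zero; with 30 candidates, the selected estimate typically overshoots its true value by several times the largest true effect in the set.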

Similar problems arise in experiments that require the use of models based on very large datasets, where researchers need to determine the relative contribution of each of thousands of causes to a given effect. The Univ. of Exeter study that determined ozone concentration in the lower atmosphere due to surface sources of different gases contains an example. The authors write in their paper (emphasis added):

We have provided the first assessment of the quantitative benefits to global and regional land ecosystem health from halving air pollutant emissions in the major source sectors. … Future large-scale changes in land cover [such as] conversion of forests to crops and/or afforestation, would alter the results. While we provide an evaluation of uncertainty based on the low and high ozone sensitivity parameters, there are several other uncertainties in the ozone damage model when applied at large-scale. More observations across a wider range of ozone concentrations and plant species are needed to improve the robustness of the results.

In effect, their data could be modified in future to reflect new information and/or methods, but in the meantime, and far from being a silly attempt at translating a claim into jargon-laden language, the study eliminates doubt to the extent possible with existing data and modelling techniques to ascertain something. And even in cases where this something is well known or already well understood, the validation of its existence could also serve to validate the methods the researchers employed to (re)discover it and – as mentioned before – generate data that is more likely to motivate political action than, say, demands from non-experts.

In fact, the American mathematician Marc Abrahams, better known for founding and awarding the Ig Nobel Prizes, identified this purpose of research as one of three possible reasons why people might try to “quantify the obvious” (source). The other two are being unaware of the obvious and, of course, wanting to disprove it.

2020.01.31