Roko’s Basilisk

Warning: this post is laden with jargon. Start with this story as a soft introduction to this convoluted way of thinking, then read this article to get up to speed. Read this last.

A strange but fascinating thought experiment emerged some time ago from the site Less Wrong, called Roko’s Basilisk. David Auerbach called it the “Most Terrifying Thought Experiment of All Time” in Slate. In essence, it is a secular version of Pascal’s Wager, with a twist of “The Game” mixed in (for those who do not wish to be exposed to a mental virus, do NOT click on the link for “The Game”).

I will not repeat the article here, as David does a great job of explaining the thought experiment, along with some introductory concepts on Timeless Decision Theory and acausal trade (in narrative form here).  Apparently some Less Wrong users suffered enough mental anguish that site founder Eliezer Yudkowsky banned discussion of it outright, calling it an “infohazard”, an action he later regretted because of the Streisand Effect.

The main problem with Roko’s Basilisk is that it seems to be specifically tailored for those who have adopted (and applied blindly) certain thinking tools. The foundation these tools rest on is built from layer upon layer of speculative, conjunctive reasoning. Jumping out of the system (JOOTSing, per Dennett) and examining the premises, it becomes obvious that the whole construction is a shaky house of cards. Ironically, the inability to grasp the minuscule joint probability of these interdependent, chained assumptions is an example of the scope insensitivity that Less Wrong strives to address, and is similar to Pascal’s Mugging. I prefer Alexander Kruel’s analysis (archived) to the one on RationalWiki.
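To put a rough number on that scope insensitivity, here is a minimal sketch; the 70% figure per assumption is purely illustrative and deliberately generous, not anyone’s actual estimate. Even granting each link in the chain a 70% chance of being right, eight chained links leave a joint probability under 6%.

```python
# A minimal sketch of how chained, conjunctive assumptions erode probability.
# The 70% per-assumption figure is purely illustrative, not anyone's estimate.
p_each = 0.70   # generously grant each hypothesis a 70% chance of being right
n_links = 8     # roughly the number of levels in the chain quoted below
p_joint = p_each ** n_links
print(f"Joint probability of all {n_links} assumptions: {p_joint:.3f}")  # ~0.058
```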

Kruel describes it quite nicely as follows (archived article is here):

A textbook example of what is wrong with New Rationalism is Roko’s basilisk. It relies on several chained ideas, each of which is itself speculative. Below is an incomplete breakdown.

Initial hypothesis 1 (Level 1): The human brain can be efficiently emulated on a digital computer.

Initial hypothesis 2 (Level 1): There exists, or will exist, a superhuman intelligence.

Initial hypothesis 3 (Level 1): The expected utility hypothesis is correct. Humans either are, or should become expected utility maximizers. And it is practically feasible for humans to maximize expected utility.

Initial hypothesis 4 (Level 1): Humans should care about what happens to copies of them, even if it occurs in a time or universe totally disconnected from this one.

Dependent hypothesis 1 (Level 2): At least one superhuman intelligence will deduce and adopt timeless decision theory, or a similar decision theory.

Dependent hypothesis 2 (Level 3): Agents who are causally separated can cooperate by simulating each other (Acausal trade).

Dependent hypothesis 3 (Level 4): A human being can meaningfully model a superintelligence in their brain.

Dependent hypothesis 4 (Level 5): At least one superhuman intelligence will want to acausally trade with human beings.

Dependent hypothesis 5 (Level 6): At least one superhuman intelligence will be able to obtain a copy of you that is good enough to draw action relevant conclusions about acausal deals.

Dependent hypothesis 6 (Level 7): People will build an evil god-emperor because the evil god-emperor will punish anyone who doesn’t help build it, but only if they read this sentence (Roko’s basilisk).

Final hypothesis (Level 8): The expected disutility of level 7 is large enough that it is rational to avoid learning about Roko’s basilisk.

Note how all of the initial hypotheses, although accepted by New Rationalists, are somewhat speculative and not established facts. The initial hypotheses are, however, all valid on their own. The problem starts when they begin making dependent hypotheses that rely on a number of unestablished initial hypotheses. The problem gets worse as the dependencies become even more fragile, with further conclusions drawn from hypotheses that are already N levels removed from established facts. But the biggest problem is that eventually action relevant conclusions are drawn and acted upon.

The problem is that logical implications can reach out indefinitely, and that humans are spectacularly bad at making such inferences. This is why the amount of empirical evidence required to accept a belief should be proportional to its distance from established facts.

It is much more probable that we’re going to make everything worse, or waste our time, than that we’re actually maximizing expected utility when trying to act on conjunctive, non-evidence-backed speculations, since such speculations are not only improbable but very likely based on fallacious reasoning.

As computationally bounded agents, we are forced to restrict ourselves to empirical evidence and falsifiable hypotheses. We need to discount certain obscure, low-probability hypotheses. Otherwise we will fall prey to our own shortcomings and inability to discern fantasy from reality.
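Kruel’s last point, that a bounded agent has to discount obscure low-probability hypotheses rather than feed them into expected-utility arithmetic, can be made concrete with a toy sketch. Every number below is invented purely for illustration: the point is that when the prior is this far removed from evidence, the “rational” verdict swings entirely on guesswork.

```python
# Toy illustration (all numbers invented): why plugging obscure, huge-stakes
# hypotheses into expected-utility arithmetic tells you nothing actionable.
disutility = 1e15       # harm assigned to the basilisk scenario (pure guess)
cost_of_acting = 1e3    # near-certain cost of "taking precautions" (time, anxiety)

# With zero empirical evidence, any of these priors is about equally defensible:
for p in (1e-9, 1e-12, 1e-15, 1e-18):
    ev_ignore = -p * disutility   # expected loss if the speculation were true
    ev_act = -cost_of_acting      # loss you pay for certain by acting on it
    verdict = "act" if ev_act > ev_ignore else "ignore"
    print(f"prior={p:.0e}  EV(ignore)={ev_ignore:,.3f}  EV(act)={ev_act:,.0f}  -> {verdict}")

# The verdict flips with an unmeasurable prior, so the calculation is dominated
# by guesswork. Hence the demand for evidence proportional to distance from
# established facts, and for discounting such hypotheses outright.
```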

Less Wrong seems to attract an audience that likely possesses above-average intelligence and reasoning ability, as reflected in the comments on the site. The episode is a reminder that intelligence does not preclude one from falling victim to mental traps, and the toughest trap to escape is of the self-dug variety. And although more nuanced and subtle, the God-like nature of the thought experiment suggests that vestigial superstition can rear its ugly head, even in the groups one would least expect.

Strangely enough, even though Roko’s Basilisk is expressly forbidden at Less Wrong, a similar thought experiment popped up and was discussed extensively here. Suppose that someone has developed an extremely advanced artificial intelligence that is “locked in a box”, with you as the “gatekeeper” (details and protocol here). You and the AI converse through a text terminal only, and you can decide, of your own free will, whether to let the AI out of the box and into the wild, the equivalent of opening Pandora’s Box. I admit that the AI makes a very compelling argument, if it ever comes to that point. Still, it sounds like a solution in search of a problem, or a dilemma constructed for dilemma’s sake. I consider it the futurist’s version of the philosophical Giant Robot with a Swampman twist (Dennett, Intuition Pumps and Other Tools for Thinking): philosophically interesting but of limited use in real life.

Thinking tools and thought experiments are useful, but they can go awry and be taken too far if not applied carefully.  Knowing folk psychology (theory of mind) does not mean that it is a good idea to go full meta and negotiate with yourself.  Some common sense can often keep one from burrowing into a philosophical existential depression.  After all, our brains and cognitive capacity are limited (Bounded Rationality), and the mere ability to imagine something does not make it any closer to reality than science fiction.