Haha, thanks for giving me something to do this afternoon. There are very few things I enjoy more than arguing about decision theory.
A few thoughts:
I agree that the Bomb example is tough - I think FDT's framing that you retroactively change the past is genuinely bad, and the correct way of thinking about it is a lot deeper. (I have an unfinished draft about my own idiosyncratic framing; if you're interested, I can send it to you.)
But I think criticisms of FDT, including this one, usually have a missing mood. There's a mental move that is very important to intellectual progress: "Okay, I can't get on board with this as it currently stands, but there are some very interesting ideas here that seem important". I think this is clearly true about FDT / LW decision theories! So you should say it, and not call it "devoid of genuine content"! :P
(To be clear, I personally am not doing that mental move, since I *am* on board with LW decision theory, but I understand why someone might not be, given the counterintuitiveness.)
Also, I disagree that there's "no fact about whether two algorithms are the same". There's a whole literature on "multiple realizability" (often in the context of functionalism in philosophy of mind), and I'm not sure why neither you nor MacAskill mentions it. Last time I tried to look into this, my personal takeaway was the opposite: it *does* seem like there plausibly is an in-principle approach for determining whether two physical systems implement the same algorithm, namely Chalmers' combinatorial-state automaton (CSA) approach. (The concrete response to the calculator example would just be to say that the + and - versions of the calculator are the same algorithm with different labels.)
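To make "the same algorithm with different labels" concrete, here is a toy sketch under strong simplifying assumptions. It is not Chalmers' CSA definition, which works with vector-valued states and a mapping from physical states onto formal ones. It only checks whether two small, hypothetical finite-state machines have identical transition structure once their states are renamed. The function name and the example machines are illustrations of mine.

```python
from itertools import permutations

def same_up_to_relabeling(t1, t2):
    """Return a renaming of t1's states that turns t1 into t2, or None.

    Each machine is a dict: state -> {input symbol: next state}. Only states
    are renamed; input symbols must match exactly. Brute force, so only
    suitable for tiny machines.
    """
    states1, states2 = sorted(t1), sorted(t2)
    if len(states1) != len(states2):
        return None
    for image in permutations(states2):
        f = dict(zip(states1, image))
        # Renamed states must accept the same input symbols.
        if any(set(t1[s]) != set(t2[f[s]]) for s in t1):
            continue
        # Every transition must commute with the renaming.
        if all(f[t1[s][i]] == t2[f[s]][i] for s in t1 for i in t1[s]):
            return f
    return None

# Hypothetical machine A: counts "tick" presses mod 3; "reset" returns to 0.
a = {0: {"tick": 1, "reset": 0}, 1: {"tick": 2, "reset": 0}, 2: {"tick": 0, "reset": 0}}
# Machine B: A with its states renamed.
b = {"x": {"tick": "y", "reset": "x"}, "y": {"tick": "z", "reset": "x"}, "z": {"tick": "x", "reset": "x"}}
# Machine C: B with one transition changed ("reset" from z now goes to y).
c = {"x": {"tick": "y", "reset": "x"}, "y": {"tick": "z", "reset": "x"}, "z": {"tick": "x", "reset": "y"}}

print(same_up_to_relabeling(a, b))
print(same_up_to_relabeling(a, c))
```

A and B come out as the same machine under the renaming 0→x, 1→y, 2→z. C differs from B in a single transition, and no renaming rescues it. That shows how demanding a strict "same automaton" criterion is.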
I reject that it's anywhere close to the truth. Maybe the right view is something between CDT and EDT, but beyond that, I don't think FDT is particularly near being correct. Re two algorithms being the same: sure, but then you get the problem we talk about, where you might end up changing the economy.
No, because I highly doubt that an algorithm similar to my brain is implemented anywhere in the economy in a consequential way. There's no reason to think this.
I think it's irrational to pay in Parfit's Hitchhiker (PH), but at the earlier time you want to bind yourself to do the irrational thing. So it depends on whether you can get yourself to be irrational later.
You should ask an LLM about Chalmers' CSA thing, the bar for two combinatorial state automata to be the same is very high. I found it really insightful. You'd basically need a full simulation of a human brain in the economy somehow, and if you somehow had this, I'm willing to bite the bullet that you can acausally change (to a slight degree) the workings of the economy.
But concretely, if you personally were in the situation, in front of the ATM, would you pay?
I'm not sure, it would just depend on empirical facts about my psychology.
If there is some system of inputs and outputs that corresponds to me defecting in the economy, then it wouldn't seem like my defecting would make any difference to it.
Wait, why do you agree that Bomb is tough? It's basically conditioning on an impossibility--it is free in real life to choose left.
(I have bitten the bullet on retroactively changing the past; I think this is in fact how you have to reason when you exist around other agents, who are reasoning about whether or not you can be convinced. If your partner believes that you will forgive cheating because "it happened in the past and there's no changing that now" then you get cheated on more than if you reason "by having a hard line here, I will have made it less likely that this happened to me.")
How does the age of the universe compare to a trillion trillion seconds?
This question might seem flippant or irrelevant, but I think it's actually pretty important to have a good sense of scale. Like, a lot of being good at decisions hinges on using numbers to mean things, and if you don't believe numbers mean things, you're going to have some incorrect positions.
Suppose rather than facing Bomb!, God flips a coin when creating the universe. If heads, the bomb is live every time; if tails, the bomb is a dud every time. Some agent faces this problem every second for the whole duration of the universe. Could you tell which way the coin landed, just from looking at the outcomes of the decisions?
[Note that I am assuming the 1/trillion trillion chance is real, rather than it secretly being a 1/1 chance that the predictor makes an error.]
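For the scale question, here is a quick back-of-the-envelope check. It assumes the usual estimate of about 13.8 billion years for the age of the universe. It also assumes one Bomb-style trial per second, as in the coin-flip thought experiment.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600      # Julian year
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR
TRILLION_TRILLION = 1e24
EPS = 1 / TRILLION_TRILLION                # predictor error rate

universe_ages = TRILLION_TRILLION / AGE_OF_UNIVERSE_S

# One trial per second for the universe's whole history. Probability of at
# least one error, 1 - (1 - EPS)**N, computed stably: in floats 1 - EPS == 1.0,
# so the naive formula would just return 0.
p_any_error = -math.expm1(AGE_OF_UNIVERSE_S * math.log1p(-EPS))

print(f"age of universe ~ {AGE_OF_UNIVERSE_S:.2e} s")
print(f"1e24 s ~ {universe_ages:.1e} universe-ages")
print(f"P(at least one predictor error) ~ {p_any_error:.1e}")
```

This gives an age of roughly 4.4e17 seconds. A trillion trillion seconds is therefore about two million universe-ages. The chance that the predictor errs even once over the universe's history is about 4e-7. A bomb can only go off when someone takes Left after the predictor foresaw Right, so the live-bomb and dud-bomb universes would almost certainly leave identical records on these assumptions.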
Sorry, is English your native language? (The word "basically" is often used to signify rounding--there is a difference between 1 and 0.999999999999999999999999 but the difference is 0.000000000000000000000001, which is a scale which is often discarded, because many concerns will be more important and it's impractical to consider all of them.)
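The rounding point can be made concrete. The sketch below holds 1 - 10^-24 exactly using Python's `fractions.Fraction`. It then shows that ordinary double-precision floats cannot represent the gap at all.

```python
from fractions import Fraction

eps = Fraction(1, 10**24)
almost_one = 1 - eps            # 1 - 10**-24, held exactly
print(1 - almost_one == eps)    # the gap is exactly 10**-24

# Doubles near 1.0 are spaced about 1e-16 apart, far coarser than 1e-24,
# so the subtraction rounds straight back to 1.0.
print(1.0 - 1e-24 == 1.0)
```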
Hey Vaniver! :) I like a lot of your work and online presence.
Yeah I used to bite the bullet of retrocausality too, until I realized that there's a better framing that preserves the upside and doesn't have the obvious "incorrectness" of thinking we can change the past: Just say that alternate realities "exist" in some way, and that you might be in a small pocket reality that changes the expected outcomes in the main reality, and that you also care about the you in the main reality. (I believe this is an updateless EDT / UDT thing, although I'm not sure)
Thanks! I think these are either 1) equivalent formulations, and so the question is just what seems weirder to you (which I don't expect to be objective), or 2) we should think carefully about the cases where they differ.
I think there are objective reasons to prefer the alternate reality framing - but also, I do think it changes things, because it means the world is much larger, e.g. if we are in something like Tegmark IV. So, e.g., ECL (evidential cooperation in large worlds) becomes more important, the importance of infinite ethics increases, etc.
You might say that alternate realities are also counterintuitive, but I think in some important ways they are not - I have a draft about this I could send you tomorrow, would be curious what you think.
My understanding is that in the causal graph implementation in the paper, you are intervening on nodes that are allowed to be temporally prior to your own action. I'd characterize that as "changing the past".
It seems like your example hinges upon the FDT agent picking Left.
But it also says that the predictor with a one in a trillion trillion error rate predicted you would pick Right.
If all of this is correct, it seems like you're hinging your example on this being the one time in a trillion trillion when the predictor was wrong.
But decision theories shouldn't be judged on whether they work well in unbelievably rare edge cases that you would never encounter in a million lifetimes.
Compare Lottery Decision Theory:
> you have the choice to buy a lottery ticket for $100. There is a one in a trillion trillion chance you will win. If you win you get $1 million. Should you do it? Before answering, keep in mind that, unbeknownst to you, this is the one time you would actually win.
We can use this example to prove that you should definitely play the lottery. I think the bomb situation maps to this: although FDT fails in the one-in-a-trillion-trillion case where the superpredictor was wrong, it succeeds in the 9999999999.../trillion trillion cases where the superpredictor is right, and when you multiply the probabilities by the utilities (e.g. of getting an extra $100 vs. getting bombed), FDT gets you more utility overall.
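The multiply-out step can be sketched as a policy comparison made before the predictor acts. These assumptions are mine, not the post's:

- The error rate is exactly 10^-24.
- Taking Right costs $100.
- Dying is valued at a hypothetical dollar-equivalent of -$10^12.
- The two policies are "always take Left" and "take Right whenever Left holds a bomb".

```python
from fractions import Fraction

EPS = Fraction(1, 10**24)   # predictor error rate from the Bomb setup
COST_RIGHT = 100            # dollars paid for taking Right
DEATH = 10**12              # hypothetical dollar-equivalent disutility of dying

def eu_always_left(eps=EPS, death=DEATH):
    # Predictor right (prob 1 - eps): it foresees Left, leaves Left empty, cost 0.
    # Predictor wrong (prob eps): it foresees Right, bombs Left, you take Left anyway.
    return -eps * death

def eu_right_if_bomb(eps=EPS, cost=COST_RIGHT):
    # Predictor right (prob 1 - eps): it foresees Right, bombs Left, you pay for Right.
    # Predictor wrong (prob eps): it foresees Left, Left is empty, you take it free.
    return -(1 - eps) * cost

# Disutility of death at which the two policies tie (roughly 10**26).
break_even_death = (1 - EPS) * COST_RIGHT / EPS

print(float(eu_always_left()), float(eu_right_if_bomb()), float(break_even_death))
```

With these numbers, always-Left loses about 10^-12 dollars in expectation. The other policy loses about $100. The verdict only flips once dying is valued below roughly -$10^26.

This is the ex ante comparison the comment describes. It is not the comparison made after you have already seen the bomb, and that is exactly where the two sides disagree.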
The particular way FDT succeeds is that you never (okay, 1/trillion trillion times, but this rounds to never) find yourself in this situation. So just by asking about this situation, you've already started with the assumption that FDT fails, which is why you are so easily able to prove that FDT fails.
I think this goes back to what I said last time we discussed this. You and Eliezer are optimizing for different things. You are optimizing for never finding yourself in a situation where you have to do something silly within that situation; he is optimizing for having the most utility overall if you can set your algorithm. I think the thing he's optimizing for makes more sense.
Yes, you are depending on this being the one time in a trillion trillion when the predictor was wrong. Decision theories are judged by whether they give the right answer across the board. So if they get the wrong result even in weird edge cases, they're disproved. But getting the wrong result isn't the same as any case where adopting the theory turns out badly for you. So what matters is not whether they work well but whether they give the right answers. Compare: if there's a single case where utilitarianism gives the wrong answer about what you should do, that disproves it. This doesn't mean that every scenario where being a utilitarian turns out badly is a counterexample.
Decision theories are theories of rational action. That's different from what dispositions you want to have. If you're in a world where you encounter Newcomb's problems all the time, then you want to have one-boxing dispositions. But that's different from a theory of rationality.
Also, just want to register, I think the bomby problems are actually much less big of problems than the "FDT doesn't say anything in any case" problems. I find the discourse has focused more on cases where FDT is counterintuitive and has weirdly neglected the fact that it's totally ill-formed.
It's fine for theories to give the wrong answer in a probabilistic sense, i.e., they recommend some action under uncertainty and you end up getting unlucky and losing out. I think what you want to say with the bomb case is that it gives the wrong answer under certainty.
Looking through that thread, the FDT people might want to say there really is uncertainty in the bomb case, namely which person you are among the real you and your simulations. But I don't feel that introducing the simulation aspect is actually necessary to the example.
> Decision theories are judged by whether they give the right answer across the board.
In a world of uncertainty, it is generally not the case that you can win in every possible outcome. The question is whether you can get the best outcome in the aggregate.
> So if they get the wrong result even in weird edge cases, they're disproved.
Guaranteed Payoffs gets the wrong answer on whether or not you should take a bet on a fair coin which pays out $2 if you win and $1 if you lose.
> Also, just want to register, I think the bomby problems are actually much less big of problems than the "FDT doesn't say anything in any case" problems. I find the discourse has focused more on cases where FDT is counterintuitive and has weirdly neglected the fact that it's totally ill-formed.
Have you seen any decision problems where an FDT proponent has not been able to give the FDT answer?
Like, I think we can exhibit problems where it's challenging, or requires empirical estimation. ("How correlated is my decision to vote with other voters in my district?") But I think this is a challenge with the world, not with the decision theory, and generally it's easy to come up with the rule. ("If my influence on whether other voters similar to me vote is at least 0.02, then it makes sense for me to vote.")
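One way such a rule might be derived is a toy model with hypothetical parameters. Suppose my decision moves each of `n_similar` like-minded voters with probability `influence`. Each vote moved is worth `value_per_vote`, and voting costs `cost`, all in the same utility units. Voting pays off when the expected votes moved, times their value, exceed the cost. That yields a threshold on `influence`:

```python
def net_value_of_voting(influence, n_similar, value_per_vote, cost):
    # Expected votes moved: my own, plus each similar voter with probability `influence`.
    votes_moved = 1 + influence * n_similar
    return votes_moved * value_per_vote - cost

def influence_threshold(n_similar, value_per_vote, cost):
    # Influence at which voting breaks even; above it, voting has positive net value.
    # (A negative result means voting pays even with zero influence.)
    return (cost / value_per_vote - 1) / n_similar

# Placeholder numbers, not estimates of any real election.
t = influence_threshold(n_similar=10_000, value_per_vote=0.5, cost=3)
print(t, net_value_of_voting(t, 10_000, 0.5, 3))
```

The specific cutoff (0.02 in the comment) would come from whatever empirical estimates go into these parameters. The numbers in the example are placeholders.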
I believe the bomb example doesn't work (or at least not as stated). As stated, the note is irrelevant and uninformative about the bomb, because it has to be presented to the simulation before the simulation makes a decision, and then the exact same note will have to be presented to you, whatever your simulation does.
It's not clear that there is an example where a) the note is accurate, b) the error rate is as stated, and c) the FDT agent is forced Left in this case.
The Bomb example stipulates that you know it, so I'm not sure whether this Lottery analogy is apt, because "always play the Lottery when you know you will win" seems reasonable. (That being said, given the probabilities involved, when you read this "predictor's note", you should think that it is almost certainly a fake/prank/illusion, though maybe additional stipulations could fix that)
I agree that the focus on extremely-rare-outcomes seems unfair, though (at least, in the framework of what makes sense to me as an optimisation target)
Note the similarity to transparent Newcomb's Problem--the boxes are transparent, so you know whether or not Omega predicted you would one-box or two-box. Even if you see the box full, and so know that Omega has already predicted you would one-box, I argue that it is important to one-box because that increases the probability you are in this (desirable!) scenario.
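The claim that one-boxing "increases the probability you are in this scenario" can be checked with a small ex ante calculation. Assumptions for this sketch:

- Omega fills the big box ($1,000,000) exactly when it predicts you would one-box on seeing it full.
- Omega is right with probability `accuracy`.
- The small box holds $1,000.
- On seeing an empty big box, you take the $1,000 under either policy.

```python
from fractions import Fraction

BIG, SMALL = 1_000_000, 1_000

def policy_outcomes(one_box_when_full, accuracy):
    """Return (probability the big box is full, expected payoff) for a policy."""
    # Omega fills the box iff it predicts one-boxing-on-full; it errs with prob 1 - accuracy.
    p_full = accuracy if one_box_when_full else 1 - accuracy
    payoff_if_full = BIG if one_box_when_full else BIG + SMALL
    payoff_if_empty = SMALL   # assumption: take the small box when the big one is empty
    return p_full, p_full * payoff_if_full + (1 - p_full) * payoff_if_empty

acc = Fraction(99, 100)
print(policy_outcomes(True, acc))
print(policy_outcomes(False, acc))
```

At 99% accuracy, the one-boxing policy puts you in the full-box world 99% of the time and is worth $990,010 in expectation. The two-boxing policy reaches the full-box world only 1% of the time and is worth $11,000.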
The thing that is strange about the lottery case is that discovering that you bought a lottery ticket is normally an undesirable scenario--the money was, in expectation, wasted--but it just so happens that you also won. If you think the lottery was rigged--your friend who works at the lottery commission mailed you the winning tickets--then you might think that actually this is a desirable scenario and it does make sense to play the rigged lottery. But that's not usual!
I fundamentally agree with the criticism about mathematically impossible worlds and the vibe, but I think there is an even deeper issue here.
At a fundamental level, what are we even doing when we adopt a decision theory or say you should or should not do something? I mean, in a fully literal sense you don't get to make decisions. You will always just follow the laws of physics.
What we are doing is adopting some kind of idealization about the world which -- just like when we define a game formally in mathematics -- idealizes certain things as choices while others are held fixed. And idealizing something as a choice is exactly to treat everything 'before' the choice as unable to depend on the outcome of the choice.
And once you understand things that way, the whole Newcomb setup is just kinda nonsensical as regards decision theory. It's saying: what if you had a situation where it doesn't make sense to idealize what you are doing using a framework that treats it as a free choice; how would you idealize it as a free choice?
The right answer is obviously: don't idealize it as a choice at all. Depending on how you describe the problem, you can imagine idealizing it in a way where the choice is which rule you precommit to, or something like that, but the whole debate about FDT or CDT or what have you is just fundamentally confused.
---
I mean, just to illustrate how silly the argument is, what if I said the right answer to the paradox was: be someone who is physically guaranteed to take 1 box (so the demon predicts you will take 1), and then take both? That does land you in a better position, but it's kinda silly, because I'm just breaking the rules of the game. Same with treating something as both predictable and a choice.
The rules of the decision theory idealization are that you have some tree, and each node of the tree represents a choice, with later ones able to depend on earlier ones (but not the reverse). If you want to look at scenarios like the demon case, you need to reidealize them in a way that obeys those rules, like a choice between rules.
Totally reasonable to attribute to me. I did coauthor a paper arguing for fdt. I’m just revealing what was in my heart of hearts to get you an accurate count of academics who like it. From a glance, I have a similar reaction of surprise at how many rationalists are fans or think of it as a mature theory. I am also surprised how many are Solomonoff induction stans in case you’re on the hunt for more philosophers v rationalists material.
You should have been at my other talk at Manifest!
First problem, which even the Solomonoff fans admit - the precise prior depends on your choice of universal algorithm. (I'll grant them their response that this only introduces an error up to a single constant, even though in this case the constant appears up front and dominates, unlike in the case of complexity theory, where it disappears in the limit.)
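For reference, here is the standard invariance theorem behind "an error up to a single constant". Take universal prefix machines $U$ and $V$. Let $c_{U,V}$ be the length of a $U$-program prefix that makes $U$ simulate $V$. Then

$$K_U(x) \le K_V(x) + c_{U,V} \quad \text{for every string } x,$$

and correspondingly the induced algorithmic priors satisfy

$$m_U(x) \ge 2^{-c_{U,V}}\, m_V(x).$$

The constant is independent of $x$, which is why it disappears in the limit. Nothing bounds it relative to the complexity of any particular short string, though, and that is the sense in which it "appears up front".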
Second problem, which again the Solomonoff fans admit - it's uncomputable (though it is computably-approximable-from-below).
Third problem - it just builds in certainty that the truth is computable! Why think that? Especially if you think that some normatively ideal thing is uncomputable!
Fourth problem (which I think is the most important one, though lots of academic epistemologists face it as well) - what motivation is there for someone who has a different prior to treat this prior as better? Omniscient priors always perform best in the worlds they are adapted to, and in general, if you throw someone into situations in proportion to a particular probability function, then the prior that matches this probability function will be the most successful prior for them to have. If someone isn't being thrown into the world in proportion to the Solomonoff prior, why should we fetishize computational simplicity in precisely this way?
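The "matching prior does best" claim is a standard property of proper scoring rules. Here is a sketch using the logarithmic score, one choice among several; the Brier score behaves the same way. Hypothetical frequencies stand in for how often each kind of situation actually comes up:

```python
import math

def expected_log_score(world, prior):
    """Average log-probability `prior` assigns to outcomes drawn from `world`."""
    return sum(p * math.log(prior[o]) for o, p in world.items() if p > 0)

# Hypothetical frequencies with which three kinds of situation actually occur.
world = {"A": 0.7, "B": 0.2, "C": 0.1}
priors = {
    "matches the world": {"A": 0.7, "B": 0.2, "C": 0.1},
    "uniform": {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3},
    "favors C": {"A": 0.2, "B": 0.2, "C": 0.6},
}
for name, prior in priors.items():
    print(f"{name:>18}: {expected_log_score(world, prior):.3f}")
```

By Gibbs' inequality, no prior can beat the matching one in expected log score. In that sense, which prior is "best" depends on the distribution you are actually drawn from.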
As far as I can tell, the motivation for the Solomonoff prior comes from people who are impressed by Occam's razor, but instead of trying to justify it, they reason in a Kantian transcendental way to figure out what constraint rationality would have to have to make Occam's razor automatically fall out.
> I know I’m not a simulated algorithm. The simulated algorithm isn’t conscious (we can stipulate). I am.
But you don’t know that. Especially not now that you’ve posted that!
Consider the most mundane form of simulation possible: human imagination. Suppose I set up a Newcomb experiment where I make my predictions by simply reading what the participant has written online and imagining what their thought process would be like. The prediction won’t be very accurate but it will probably be better than chance. Better than chance is all you need to create the paradox.
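How much better than chance does the predictor need to be? Here is a sketch of the evidential calculation. It assumes the standard $1,000,000 / $1,000 payoffs and a predictor that is right with the same probability whichever choice you make. A causal decision theorist would reject conditioning this way, which is the point of the paradox.

```python
from fractions import Fraction

BIG, SMALL = 1_000_000, 1_000   # opaque-box prize, transparent-box prize

def evidential_values(accuracy):
    """Expected payoffs (one-box, two-box) if the predictor is right with
    probability `accuracy` whichever choice you make."""
    one_box = accuracy * BIG
    two_box = (1 - accuracy) * BIG + SMALL
    return one_box, two_box

# Accuracy at which the two choices have equal expected payoff.
break_even = Fraction(BIG + SMALL, 2 * BIG)
print(break_even, float(break_even))
print(evidential_values(Fraction(51, 100)))
```

The break-even accuracy is 1001/2000 = 0.5005. On this calculation, any predictor even slightly better than a coin flip makes one-boxing the higher-expectation choice.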
Well, now that I’ve read your post, my little imaginary version of you is definitely going to start with, ‘I know I’m not a simulated algorithm. The simulated algorithm isn’t conscious, I am.’ But imaginary-you is quite mistaken. Imaginary-you is not conscious. And when imaginary-you two-boxes, it costs the real you the prize. Hypothetically.
> Decision theory generally assumes that you’re self-interested. But if I’m the algorithm, then I care about algorithm me—not the version in the real world. So then I wouldn’t care about what the output of the algorithm was.
I think this is correct. If you are purely self-interested, then two-boxing is rational. However, if you are a simulation, you don’t lose anything by one-boxing because the simulation almost certainly terminates as soon as you make a choice. Therefore, you should one-box if you have any goodwill towards your real-world counterpart. Either because they are “kind of like me but different in a bunch of respects”, or just because you have a default of goodwill towards other people.
On the flipside, if you value your life as a simulated being and resent that the simulation will terminate, then you might rationally two-box as revenge. This presumably doesn’t apply to low-fidelity simulations like human imagination.
(I don’t know whether my opinion comports with FDT. As I see it, the simulation argument is an argument *against* FDT. It shows how plain old CDT can justify one-boxing.)
So clearly I need to brush up on my decision theory, but let me check that I have this right.
The issue is that:
1. FDT claims you should make the decision that would leave you best off if it were, basically, repeated infinitely in all similar scenarios as the output of your decision procedure.
2. The problem is that this leads to weird results, such as:
A. It matters how other algorithms relate to your algorithm, as in the case where you would pick the bomb because having that as your decision output would counterfactually mean there was no bomb, even though there is one. Plus all sorts of similar issues where we basically have to explain why a scenario in which ~you did things only ~you would ever do is related to this situation.
B. Any time an algorithm is introduced, you need to show the relationship between your algorithm and this third-party algorithm (I know there are only two parties), but it seems impossible to establish anything beyond an epistemic relationship, so you really have EDT; and even if you could, it would lead to weird examples like the gene one.
The fundamental problem with this whole debate is that what it *means* to idealize Newcomb's problem in terms of decision theory as a choice by the predictor and then a choice by you just IS to suppose that the predictor's choice can't depend on your choice in any way.
As such there aren't different decision theories. There is one correct answer: be the sort of person who would be predicted to take one box, then take two.
If someone objects that you broke the assumption, respond that no, that's what it means to idealize something as a subsequent decision in decision theory.
If they prefer, they can choose to model the situation instead as consisting of a decision precommitting the agent to be a 1- or 2-boxer, followed by a choice by the predictor with no subsequent decision. That problem also has an obvious answer.
The one thing that doesn't make sense is to idealize what you do with the boxes as a decision and then not treat it as such. That's conflating the formal idealized notion of decision in the theory with the notion of a decision as happening any time someone goes "Hmm, what should I do".
As such there aren't different decision theories AT all. There are different attitudes about how to idealize Newcomb's problem in terms of decision theory. But once you've done that it reveals there isn't really a philosophical problem -- just a question about how to think about the scenario.
Does your verdict that "the rational act is two-boxing" actually guide the decision procedure of a rational agent?
If it does, then the predictor predicts two-boxing and the rational agent loses the million. Also, "rational sort of person" and "rational act" are the same thing after all.
If it doesn't - which I suspect - then the predictor predicts one-boxing, but... The agent is somehow built to ignore what she finds rational as a person? Isn't it impossible to build an agent that generally views one-boxing as rational, but somehow still two-boxes? In any case, it's highly suspect - paradoxical even - that the ideal agent's action directly goes against its own character.
Your response to the bomb scenario with an initial simulation relies on the claim that the simulation wouldn't be conscious, is this really necessary to the argument? Even if the simulation is conscious and counts just as much as the later biological version for utilitarian purposes, why wouldn't it be better for both of them to suffer the minor inconvenience of paying $100 (bc initial sim version chooses Right, and bio version then also has to choose Right to avoid the bomb), rather than for the simulation to experience getting burned alive by choosing Left while the bio version then gets to save $100 by taking an empty Left? Does FDT differ from this utilitarian answer?
The bomb scenario could be made trickier for a non-FDT utilitarian by imagining the predictor gives the choice to an initial simulated mind with a bomb in Left, then in the future creates some suitably huge number (say a billion) of further copies of that sim, who will each face an empty Left box if the initial sim chose to die by taking Left, but will get a Left box loaded with a bomb if the initial sim chose to pay $100 to take Right. And we could also make it trickier by making the price for choosing Right something much more negative (say, getting non-lethally tortured for an hour) but still not as bad as getting burned alive by a bomb. Even if the sims are not perfect duplicates, here a utilitarian sim might bite the bullet on taking Left with a bomb, since this gives a big boost to the epistemic probability that they are the first and are thus saving all the future versions from having to choose Right.
I still do agree with other criticisms of FDT like reliance on "logical counterfactuals", and I also think that claims about weaknesses of EDT tend to rely on misleading intuition-pumps which under-specify some key aspects of the problem. For example in the smoking lesion problem, it's not specified whether the statistics on the association between the lesion and smoking were collected on a population that was similar to me in all relevant respects including the fact that they too learned about the lesion/smoking link, they had not taken up smoking before learning about it, and when they considered whether to take up smoking they all introspected and found no preexisting "tickle" of desire to smoke (the 'tickle defense' found in the various references at https://casparoesterheld.com/overview-why-we-think-that-the-smoking-lesion-does-not-refute-edt/ ). If they are not similar to me in all these respects, an EDT advocate may think it's justifiable to say that my taking up smoking should not increase the epistemic probability I have the lesion the way it would for a random member of the population, or at least not increase it to nearly the same degree.
I think a similar issue of under-specification would apply to your section 3.3 where you say "some gene that correlates 99.9% with two-boxing" but don't make clear if this is in a population where everyone else has also been told about the gene/two-boxing correlation (and where I also have no reason to believe I might be significantly better at thinking about the nuances of EDT problems than the average person in the population). It's possible in a population of "naive" Newcomb test-takers the correlation would be this good, but I might still strongly suspect it'd be much lower for a population of test-takers who were "like me" in some relevant respects like knowledge that the prediction was based on the gene, in which case from an EDT perspective there would be justification for treating this case as different from the one with a detailed simulation of my brain.
Re simulation, it's not the simulation who gets burned but the simulation that affects if you do.
None of those changes to the situation change my intuition about bomb.
Re 99.9% correlation, I was imagining that all you know is about the population-level correlation. So this gives you defeasible reason to think it pairs up in your case.
Re the Chalmers proposal, this will produce the result that changing tiny features makes it no longer the same algorithm. But that's rough for FDT. See also the economy example.
If we assume both the sim and the bio version have consciousness, do you mean the sim won't actually experience anything bad if they see a bomb in Left and choose it? If that were the case, then given first-person uncertainty about whether I am the sim or the bio version, it seems that if I see a bomb in Left, my choice is more like "pay $100 to pick Right, or pick Left and subsequently get burned alive if I turn out to be the bio version, but get off scot-free if I turn out to be the sim version (and thereby ensure the bio version will see an empty Left)."
If the laws of physics were deterministic & computable, and I knew the sim version was a perfect one so the probability of the two versions doing something different was 0, wouldn't it make sense from an EDT perspective to choose the Left, since that choice updates the epistemic probability I am the sim to 1? In that case, whether an EDT person would still choose Left in the case of a *near* perfect sim would just depend on combination of probability of error and the values for negative utility they assign to getting burned alive vs. losing $100.
But unless you are making metaphysical assumptions about the sim lacking consciousness, how would I know whether I am the sim or bio version? If both are explicitly given that information, that would be a divergence in their experience which would make the sim's behavior much less accurate for predicting the behavior of the later bio version.
Would your answer on this problem be any different if instead of a sim the experiment was done on two biological teletransporter duplicates who were replicated at different times, the first one getting Left-with-bomb and the second one's Left depending on what the first chose, and neither was told whether they were the first or the second?
We're just stipulating that it lacks consciousness. Maybe this assumption is wrong, but in any case, it shouldn't be that FDT depends on substrate independence being true--so that if substrate independence was false then FDT would be wrong too.
Why shouldn't one's answer to decision theory questions involving sims depend on whether one believes in substrate independence? As I said I endorse EDT rather than FDT, but one's answer to this problem in an EDT framework could just as easily depend on whether one believes the initial sim is conscious (since that makes a difference in terms of whether the teletransporter scenario is treated the same as the sim/bio scenario).
The Cheating Death in Damascus paper, IMO, has a lot of genuine content in it. I think there are nontrivial decision problems where you can clearly tell what the FDT answer is, even tho there isn't a complete theory of counterlogicals to do it for all possible problems.
I think counterlogicals are easier to solve than you think they are, especially for the sorts of decision problems that people often discuss.
There are cases where FDTers agree on an answer, and this was what motivated the theory. But there aren't cases where you can deduce the answer from the theory, and I think this is in-principle impossible for the reasons I explain.
When I reason about computer programs while I'm writing them, I do a lot of 'imagining mathematically impossible worlds'. Like, there's obviously some deterministic answer about what the computer program would in fact do, and if I knew all the math facts I could immediately figure it out. But I don't know all the math facts, and so I start off with a lot of different hallucinatory worlds where the program does different things. Some of those hallucinations find contradictions and vanish; other hallucinations persist and they end up as my best guess. [Similarly for writing math proofs.]
Do I have a theory of how to do that in general? Not yet, but I don't view this as a lethal objection. (For example, I buy that the Halting Problem is not solvable in the general case, but I think I can restrict my attention to just problems where it is solvable without losing many programs that I care about. When I restrict my attention to problems where the FDT solution is computable, how many problems that I care about do I lose?)
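As a loose analogy for restricting attention to tractable cases: the unbounded halting problem is undecidable, but "does this program halt within N steps?" is always decidable by running it. The minimal sketch below assumes programs are written as Python generators that yield once per step. The `run_with_budget` helper and the example programs are hypothetical.

```python
def run_with_budget(program, budget):
    """Run generator-based `program` for at most `budget` steps.

    Returns ("halted", value) if it finishes within the budget, otherwise
    ("unknown", None).
    """
    gen = program()
    for _ in range(budget):
        try:
            next(gen)
        except StopIteration as stop:
            return "halted", stop.value
    return "unknown", None

def collatz(n):
    """Hypothetical example program: count Collatz steps from n down to 1."""
    def program():
        x, steps = n, 0
        while x != 1:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            steps += 1
            yield                 # one "step" of computation
        return steps
    return program

def loop_forever():
    while True:
        yield

print(run_with_budget(collatz(27), 1_000))
print(run_with_budget(loop_forever, 1_000))
```

The first call halts with 111 steps; the second reports "unknown". This only answers the bounded question. A program reported as "unknown" might still halt later, which is the price of staying in the decidable fragment.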
"CDTers say: both boxes. Taking the second box causes you to get an extra $1,000. The fact that it correlates with there being less money in the box is irrelevant. By taking one box, CDTers claim, you’re just passing up an extra $1,000."
Ah yes just like how when I buy a lottery ticket it causes me to win millions of dollars, much more than my life savings is worth, so I will spend it all on lottery tickets.
(This take brought to you by someone who doesn't understand the various decision theories and knows he is fighting a strawman. Nonetheless, two boxing seems plainly silly to me. As long as the predictor is better than chance, one boxing just makes more sense to me.)
I think this is almost all right, and FDT is a bad view for basically these reasons. But I'll note that I didn't really follow this bit:
"There’s a somewhat strange paradox here. Imagine that there’s someone who is psychologically identical to me at all times before the prisoner’s dilemma. I’m in a prisoner’s dilemma against them. They defect. On FDT, I should defect too. But then we’re running the same algorithm. So then I should cooperate. But then we’re running different algorithms, so I should defect."
On my understanding, FDT would just say you (and they) should cooperate, even though you (and they) will in fact defect.
I think something weird is going on with your "but thens". You're going from the normative claim "I should cooperate" to the claim "we're running different algorithms". But whether you're running different algorithms doesn't depend on whether what you should do matches what they do do, it depends on whether what you do do (and would do) matches what they do do (and would do).
Now maybe you're taking what you wrote as shorthand for something like "if I were instead disposed to do what the view tells me I actually should do, (and if the person who is in fact my duplicate retained the dispositions they in fact have) then we would be running different algorithms", which is true. And it's true that for that version of you (and them), FDT would recommend defecting. But as far as I can see that's neither here nor there - I don't see a paradox.
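A toy sketch of the distinction drawn above. It assumes standard prisoner's-dilemma payoffs (5/3/1/0), which the thread does not specify. `twin_value` scores a policy when the duplicate runs literally the same policy. `play(..., always_defect)` instead holds the other agent's defecting disposition fixed and varies only yours, as in the "if I were instead disposed..." reading. All names are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Illustrative one-shot prisoner's dilemma payoffs (an assumption; the thread
# gives no numbers): (my move, their move) -> my payoff.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

# A policy maps the situation an agent faces to a move.
Policy = Callable[[str], str]


def play(mine: Policy, theirs: Policy, situation: str = "PD") -> int:
    """My payoff when each agent runs its own policy on the same situation."""
    return PAYOFF[(mine(situation), theirs(situation))]


def twin_value(policy: Policy) -> int:
    """My payoff if my duplicate runs exactly the same policy as me."""
    return play(policy, policy)


def always_cooperate(situation: str) -> str:
    return "C"


def always_defect(situation: str) -> str:
    return "D"
```

Among policies the duplicate shares, cooperating wins: `twin_value(always_cooperate)` is 3 and `twin_value(always_defect)` is 1. Once the other agent is fixed as a defector, `play(always_cooperate, always_defect)` is 0 against 1 for `play(always_defect, always_defect)`. In that counterfactual the two of you are no longer running the same algorithm.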
Why do you need a separate decision theory to justify taking one box in the transparent Newcomb's problem? FDT just sounds like iterated games in a game-theory context. But if you have an iterated game, then CDT or EDT can just incorporate the iteration and still get to the FDT answer.
Aside from the nugget of stipulating non-consciousness of a simulated algorithm (which... What.)... let's abstract a bit. You are running a very weird comparison between the following three:
1. Causal Decision Theory
2. "EDT and updateless EDT" (two quite different theories; the latter is, essentially, Wei Dai's Updateless Decision Theory, or UDT, https://www.lesswrong.com/w/updateless-decision-theory, and thus _also_ partially grows from the Rationalist tradition, though Wei Dai appears to _also_ be more classically read-up!)
3. FDT
So, according to you, Rationalists are wrong because they deliver FDT; and yet the preferable alternative is UDT, which is done by a Rationalist and, historically, because of FDT-like exploration.
But it gets weirder. According to https://www.lesswrong.com/w/timeless-decision-theory, "[t]he FDT paper thus describes a general framework which remains agnostic about an updateless approach (like UDT) vs an updateful one (like TDT), but which sticks close to the logical-causality approach introduced by TDT." TDT stands for Timeless Decision Theory, Yudkowsky's _previous_ attempt at formalizing his gripes with CDT (and is itself barely formalized). If this is true, then _of course_ FDT is underdefined under the terms you offer, because it is at a different level of abstraction! It covers a _family_ of approaches!
If we stipulate that the simulated algorithm isn’t conscious, then aren’t we also stipulating that I wouldn’t care about the utility gained by the simulation? My simulation would therefore only optimize for the non-simulated case and pick the right box.
>You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb[...] The Right box is empty, but you have to pay $100 in order to be able to take it.
>A long-dead predictor predicted whether you would choose Left or Right, *by running a simulation of you and seeing what that simulation did*. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
>*The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.* [...] What box should you choose?
First problem with the scenario: as stated, the note actually contains no information about the bomb. How so? Well, the example is clear: the predictor predicted your choice *by running a simulation and seeing what the simulation did*. In order to predict you via simulating you, both you and the simulation need to have the same inputs. Thus the simulation would have to have seen an identical note to you.
The problem is that the note and its contents have to be decided *before they are shown to the simulation*, because the Left-or-Right decision is not known until the simulation is run. And once the simulation is run with that particular note, you have to see that note as well (identical inputs). So you will see the note that was decided upon before the prediction was made. Its content is thus immaterial to the presence or absence of the bomb.
If we want the example to work, it needs to be redefined first. We need an example that (a) makes the note true, (b) makes the predictor as accurate as stated, and (c) forces FDT to Left in this example. *It’s not clear that such an example exists*.
Partial example: there is a fixed-point version of this that gives (a) and (b), but it is no longer a problem for FDT. Assume the predictor ignores the simulation and always puts the bomb and the note in, and the note is known to be accurate. Then all agents will go Right, including FDT, because they believe the note. And the predictor, who predicted Right, will thus be extremely accurate.
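A toy check of condition (a), under simplifying assumptions that are not in the original problem. Agents are deterministic. The simulation is perfect, ignoring the 1-in-a-trillion-trillion error. The note can only say which choice was predicted. `honest_notes` lists the notes the predictor could show without lying, given that the simulation sees the same note as you. All names are hypothetical.

```python
from typing import Callable, List

# Hypothetical agent model: a deterministic function from the note's stated
# prediction ("Left" or "Right") to the box the agent takes.
Agent = Callable[[str], str]


def honest_notes(agent: Agent) -> List[str]:
    """Notes the predictor could show without lying.

    The note is fixed before the simulation runs, and the simulation sees the
    same note as the real agent, so "I predicted X" is honest only if the
    agent, shown that very note, actually picks X.
    """
    return [note for note in ("Left", "Right") if agent(note) == note]


def believer(note: str) -> str:
    # Trusts the note: "predicted Right" means the bomb is in Left, so go Right;
    # "predicted Left" means Left is empty and free, so go Left.
    return note


def always_left(note: str) -> str:
    # Takes Left whatever the note says (the choice the Bomb story attributes to FDT).
    return "Left"


def contrarian(note: str) -> str:
    # Does the opposite of whatever was predicted.
    return "Right" if note == "Left" else "Left"
```

An agent that trusts the note makes both notes honest. An agent that takes Left no matter what makes only "I predicted Left" honest, so the note in the story could not truthfully have been shown to it. A contrarian admits no honest note at all.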
>First problem with the scenario: as stated, the note actually contains no information about the bomb. How so? Well, the example is clear: the predictor predicted your choice *by running a simulation and seeing what the simulation did*. In order to predict you via simulating you, both you and the simulation need to have the same inputs. Thus the simulation would have to have seen an identical note to you.
Does the mechanism of prediction really need to be simulation? Are we allowed to imagine there's just some magic oracle that can accurately (but not perfectly) predict what you'd do without any simulating, so there were never any previous notes to begin with?
If not, here's another variant. Suppose time is past-eternal. Every year for all of the eternal past, the simulator has gone through the bomb experiment with a copy of you. The note in the present includes the following: "In every contiguous sliding window of yearly experiments of size 1 trillion up till now, you've chosen Right way over 99% of the time, so I've put the bomb in Left; I am programmed to only predict your behavior and fill up boxes based on the most recent such window, and no other. But since this statistical pattern of outcomes has in fact been universal across windows, the note I've left in each previous experiment has always said the same thing as what I'm telling you now." Everything else is the same as the original thought experiment. Now the simulator's notes are and have always been totally honest.
"You are the only person left in the universe. You have a happy life" only a rationalist could write a sentence like this
Also, what do I care about $100 if there's nobody else in the universe?
You just do!
Scrooge McDuck, last surviving descendant of Earth...
Haha, thanks for giving me something to do this afternoon. There's very few things I enjoy more than arguing about decision theory.
A few thoughts:
I agree that the Bomb example is tough - I think FDT's framing that you retroactively change the past is genuinely bad, and the correct way of thinking about it is a lot deeper. (I have an unfinished draft about my own idiosyncratic framing, if you're interested I can send it to you).
But I think in criticisms of FDT, including this one, there is usually a missing mood. There's a mental move very important to intellectual progress of "Okay, I can't get on board with this as it currently stands, but there are some very interesting ideas here that seem important". I think this is clearly true about FDT / LW decision theories! So you should say it, and not call it "devoid of genuine content"! :P
(to be clear, I personally am not doing that mental move since I *am* on board with LW decision theory, but I understand why someone might not be due to the counterintuitiveness)
Also, I disagree that there's "no fact about whether two algorithms are the same". There's a whole literature about "multiple realizability" (often in the context of functionalism in philosophy of mind) that I don't know why MacAskill and you don't mention. Last time I tried to look into this, my personal takeaway was the opposite - that it *does* seem like there plausibly is an in-principle approach for determining whether two physical systems implement the same algorithm: Chalmers' CSA approach. (The concrete response to the calculator example would just be to say that the + and - version of the calculator are the same algorithm with different labels.)
I reject that it's anywhere close to the truth. Maybe the right view is something between CDT and EDT, but beyond that, I don't think FDT is particularly near being correct. Re two algorithms being the same, sure but then you get the problem we talk about where you might just change the economy.
No, because I highly doubt that an algorithm similar to my brain is implemented anywhere in the economy in a consequential way. There's no reason to think this.
Have you looked at Parfit's Hitchhiker? (https://x.com/reconfigurthing/status/2031963649124818967) Do you pay?
Yes but if it was!
I think it's irrational to pay in PH, but you want at the earlier time to bind yourself to do the irrational thing. So it depends on whether you can get yourself to be irrational later.
You should ask an LLM about Chalmers' CSA thing, the bar for two combinatorial state automata to be the same is very high. I found it really insightful. You'd basically need a full simulation of a human brain in the economy somehow, and if you somehow had this, I'm willing to bite the bullet that you can acausally change (to a slight degree) the workings of the economy.
But concretely, if you personally were in the situation, in front of the ATM, would you pay?
I'm not sure, it would just depend on empirical facts about my psychology.
If there is some system of inputs and outputs that corresponds to me defecting in the economy, then it wouldn't seem like my defecting would make any difference to it.
You're not sure what you would do in practice? Hmm can you imagine being transported there right now and having to make the decision?
Wait, why do you agree that Bomb is tough? It's basically conditioning on an impossibility--it is free in real life to choose left.
(I have bitten the bullet on retroactively changing the past; I think this is in fact how you have to reason when you exist around other agents, who are reasoning about whether or not you can be convinced. If your partner believes that you will forgive cheating because "it happened in the past and there's no changing that now" then you get cheated on more than if you reason "by having a hard line here, I will have made it less likely that this happened to me.")
How is it conditioning on impossibility?
How does the age of the universe compare to a trillion trillion seconds?
This question might seem flippant or irrelevant, but I think it's actually pretty important to have a good sense of scale. Like, a lot of being good at decisions hinges on using numbers to mean things, and if you don't believe numbers mean things, you're going to have some incorrect positions.
Suppose rather than facing Bomb!, God flips a coin when creating the universe. If heads, the bomb is live every time; if tails, the bomb is a dud every time. Some agent faces this problem every second for the whole duration of the universe. Could you tell which way the coin landed, just from looking at the outcomes of the decisions?
[Note that I am assuming the 1/trillion trillion chance is real, rather than it secretly being a 1/1 chance that the predictor makes an error.]
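Putting numbers on the scale question. This assumes an age of about 13.8 billion years and one trial per second, as in the coin-flip variant. A trillion trillion seconds is then roughly 2.3 million times the age of the universe. The chance that even a single predictor error shows up over the universe's whole history so far is on the order of 4e-7. Computing `1 - (1 - p)**n` directly would fail here, because `1 - 1e-24` rounds to exactly `1.0` in double precision. The sketch therefore uses `math.log1p` and `math.expm1`.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60      # Julian year
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR  # assumed ~13.8 billion years
ERROR_RATE = 1e-24                             # "1 in a trillion trillion"


def prob_at_least_one_error(trials: float, p: float) -> float:
    """1 - (1 - p)**trials, computed without losing the tiny p to rounding."""
    return -math.expm1(trials * math.log1p(-p))


ratio = 1e24 / AGE_OF_UNIVERSE_S
p_any = prob_at_least_one_error(AGE_OF_UNIVERSE_S, ERROR_RATE)

print(f"age of universe: {AGE_OF_UNIVERSE_S:.3g} s")
print(f"a trillion trillion seconds: {ratio:.3g} universe ages")
print(f"P(any predictor error, one trial per second): {p_any:.3g}")
```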
It's not impossible if it's very improbable.
Sorry, is English your native language? (The word "basically" is often used to signify rounding--there is a difference between 1 and 0.999999999999999999999999 but the difference is 0.000000000000000000000001, which is a scale which is often discarded, because many concerns will be more important and it's impractical to consider all of them.)
This is an unnecessarily douchey reply to an accurate clarification.
Hey Vaniver! :) I like a lot of your work and online presence.
Yeah I used to bite the bullet of retrocausality too, until I realized that there's a better framing that preserves the upside and doesn't have the obvious "incorrectness" of thinking we can change the past: Just say that alternate realities "exist" in some way, and that you might be in a small pocket reality that changes the expected outcomes in the main reality, and that you also care about the you in the main reality. (I believe this is an updateless EDT / UDT thing, although I'm not sure)
Thanks! I think these are either 1) equivalent formulations, and so the question is just which seems weirder to you (which I don't expect to be objective), or 2) we should think carefully about the cases where they differ.
I think there are objective reasons to prefer the alternate reality framing - but also, I do think it changes things because it means the world is much larger, e.g. if we are in something like Tegmark IV. So e.g. ECL becomes more important, the importance of infinite ethics increases, etc.
You might say that alternate realities are also counterintuitive, but I think in some important ways they are not - I have a draft about this I could send you tomorrow, would be curious what you think.
FDT's framing is not that you retroactively change the past though. It just says that if you Left-box, your simulation also Left-boxed.
My understanding is that in the causal graph implementation in the paper, you are intervening on nodes that are allowed to be temporally prior to your own action. I'd characterize that as "changing the past".
I'm confused about the bomb example.
It seems like your example hinges upon the FDT agent picking Left.
But it also says that the predictor with a one in a trillion trillion error rate predicted you would pick Right.
If all of this is correct, it seems like you're hinging your example on this being the one time in a trillion trillion when the predictor was wrong.
But decision theories shouldn't be judged on whether they work well in unbelievably rare edge cases that you would never encounter in a million lifetimes.
Compare Lottery Decision Theory:
> you have the choice to buy a lottery ticket for $100. There is a one in a trillion trillion chance you will win. If you win you get $1 million. Should you do it? Before answering, keep in mind that, unbeknownst to you, this is the one time you would actually win.
We can use this example to prove that you should definitely play the lottery. I think the bomb situation maps to this - although it fails in the one/trillion trillion case where the superpredictor was wrong, it succeeds in the 9999999999.../trillion trillion cases where the superpredictor is right, and when you multiply out the probabilities by the utilities (e.g. of getting an extra $100 vs. getting bombed), FDT gets you more utility overall.
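The multiplication described above can be written out. This sketch covers only the ex ante, policy-level comparison this comment argues for. It assumes, hypothetically, that death can be priced at some dollar amount `value_of_life`. Committing to Left costs `eps * value_of_life` in expectation, against a sure $100 for Right. So Left wins unless a life is valued above $100 / 1e-24 = $1e26. The lottery in the analogy, by contrast, loses essentially the full $100 in expectation.

```python
ERROR_RATE = 1e-24  # predictor failure rate from the Bomb setup


def lottery_ev(cost: float = 100.0, prize: float = 1e6, p_win: float = ERROR_RATE) -> float:
    """Expected net gain of the lottery ticket in the analogy."""
    return p_win * prize - cost


def bomb_expected_loss(policy: str, value_of_life: float, eps: float = ERROR_RATE) -> float:
    """Expected loss of committing to a policy before the prediction is made.

    value_of_life is a hypothetical dollar price on dying.
    """
    if policy == "Left":
        # The bomb is in Left only if the predictor erred and predicted Right;
        # a committed Left-taker then dies.
        return eps * value_of_life
    if policy == "Right":
        # Right costs $100 whatever was predicted.
        return 100.0
    raise ValueError(f"unknown policy: {policy}")
```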
The particular way FDT succeeds is that you never (okay, 1/trillion trillion times, but this rounds to never) find yourself in this situation. So just by asking about this situation, you've already started with the assumption that FDT fails, which is why you are so easily able to prove that FDT fails.
I think this goes back to what I said last time we discussed this. You and Eliezer are optimizing for different things. You are optimizing for never finding yourself in a situation where you have to do something silly within that situation; he is optimizing for having the most utility overall if you can set your algorithm. I think the thing he's optimizing for makes more sense.
Yes, you are depending on this being the one time in a trillion trillion when the predictor was wrong. Decision theories are judged by whether they give the right answer across the board. So if they get the wrong result even in weird edge cases, they're disproved. But getting the wrong result isn't the same as any case where adopting the theory turns out badly for you. So what matters is not whether they work well but whether they give the right answers. Compare: if there's a single case where utilitarianism gives the wrong answer about what you should do, that disproves it. This doesn't mean that any scenario where being a utilitarian turns out badly is a counterexample.
Decision theories are theories of rational action. That's different from what dispositions you want to have. If you're in a world where you encounter Newcomb's problems all the time, then you want to have one-boxing dispositions. But that's different from a theory of rationality.
Also, just want to register, I think the bomby problems are actually much less big of problems than the "FDT doesn't say anything in any case" problems. I find the discourse has focused more on cases where FDT is counterintuitive and has weirdly neglected the fact that it's totally ill-formed.
It's fine for theories to give the wrong answer in a probabilistic sense, i.e., they recommend some action under uncertainty and you end up getting unlucky and losing out. I think what you want to say with the bomb case is that it gives the wrong answer under certainty.
Looking through that thread, the FDT people might want to say there really is uncertainty in the bomb case - namely, about which one you are among the real you and your simulations. But I don't think introducing the simulation aspect is actually necessary to the example.
"Decision theories are theories of rational action. That's different from what dispositions you want to have."
The fact that, under FDT, these are not different at all counts as a major point for FDT over e.g. CDT.
> Decision theories are judged by whether they give the right answer across the board.
In a world of uncertainty, it is generally not the case that you can win in every possible outcome. The question is whether you can get the best outcome in the aggregate.
> So if they get the wrong result even in weird edge cases, they're disproved.
Guaranteed Payoffs gets the wrong answer on whether or not you should take a bet on a fair coin which pays out $2 if you win and $1 if you lose.
> Also, just want to register, I think the bomby problems are actually much less big of problems than the "FDT doesn't say anything in any case" problems. I find the discourse has focused more on cases where FDT is counterintuitive and has weirdly neglected the fact that it's totally ill-formed.
Have you seen any decision problems where a FDT proponent has not been able to give the FDT answer?
Like, I think we can exhibit problems where it's challenging, or requires empirical estimation. ("How correlated is my decision to vote with other voters in my district?") But I think this is a challenge with the world, not with the decision theory, and generally it's easy to come up with the rule. ("If my influence on whether other voters similar to me vote is at least 0.02, then it makes sense for me to vote.")
I believe the bomb example doesn't work (or at least not as stated). As stated, the note is irrelevant and uninformative about the bomb, because it has to be presented to the simulation before the simulation makes a decision, and then the exact same note will have to be presented to you, whatever your simulation does.
https://benthams.substack.com/p/functional-decision-theory-not-even/comment/285235267
It's not clear that there is an example where a) the note is accurate, b) the error rate is as stated, and c) the FDT agent is forced Left in this case.
> unbeknownst to you
The Bomb example stipulates that you know it, so I'm not sure whether this Lottery analogy is apt, because "always play the Lottery when you know you will win" seems reasonable. (That being said, given the probabilities involved, when you read this "predictor's note", you should think that it is almost certainly a fake/prank/illusion, though maybe additional stipulations could fix that)
I agree that the focus on extremely-rare-outcomes seems unfair, though (at least, in the framework of what makes sense to me as an optimisation target)
Note the similarity to transparent Newcomb's Problem--the boxes are transparent, so you know whether or not Omega predicted you would one-box or two-box. Even if you see the box full, and so know that Omega has already predicted you would one-box, I argue that it is important to one-box because that increases the probability you are in this (desirable!) scenario.
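One way to make "one-boxing increases the probability you are in this scenario" concrete is to score whole policies, meaning maps from what you see to what you do. This sketch assumes a perfect predictor that fills the big box exactly when your policy would one-box on seeing it full. That is one common formulation, not necessarily Omega's exact rule here. It also assumes the conventional $1,000,000 / $1,000 amounts.

```python
from itertools import product
from typing import Dict, Iterator

BIG, SMALL = 1_000_000, 1_000  # conventional amounts (assumed)

Policy = Dict[str, str]  # what you see ("full"/"empty") -> what you do ("one"/"two")


def payoff(policy: Policy) -> int:
    # Assumed perfect predictor: fill the big box iff this policy one-boxes
    # when it sees a full box.
    full = policy["full"] == "one"
    action = policy["full" if full else "empty"]
    return (BIG if full else 0) + (SMALL if action == "two" else 0)


def all_policies() -> Iterator[Policy]:
    for on_full, on_empty in product(("one", "two"), repeat=2):
        yield {"full": on_full, "empty": on_empty}


best = max(all_policies(), key=payoff)
```

Every policy that one-boxes on a full box earns $1,000,000. A policy that would two-box on a full box never gets to see one, and the best such a policy can do is $1,000.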
The thing that is strange about the lottery case is that discovering that you bought a lottery ticket is normally an undesirable scenario--the money was, in expectation, wasted--but it just so happens that you also won. If you think the lottery was rigged--your friend who works at the lottery commission mailed you the winning tickets--then you might think that actually this is a desirable scenario and it does make sense to play the rigged lottery. But that's not usual!
It's important to play 'follow the improbability' ( https://www.lesswrong.com/posts/k6EPphHiBH4WWYFCj/gazp-vs-glut ) and notice when the hypothetical involves rigging and whether or not that is 'legit'.
I fundamentally agree with the criticism about mathematically impossible worlds and the vibe, but I think there is an even deeper issue here.
At a fundamental level, what are we even doing when we adopt a decision theory or say you should or should not do something? I mean, in a fully literal sense you don't get to make decisions. You will always just follow the laws of physics.
What we are doing is adopting some kind of idealization about the world which -- just like when we define a game formally in mathematics -- idealizes certain things as choices while others are held fixed. And idealizing something as a choice is exactly to treat everything 'before' the choice as unable to depend on the outcome of the choice.
And once you understand things that way, the whole Newcomb setup is just kind of nonsensical as regards decision theory. It's saying: if you had a situation where it doesn't make sense to idealize what you are doing using a framework that treats it as a free choice, how would you idealize it as a free choice?
The right answer is obviously: don't idealize it as a choice at all. Depending on how you describe the problem you can imagine idealizing in a way where the choice is what rule you precommit to or something like that but the whole debate about FDT or CDT or what have you is just fundamentally confused.
---
I mean just to illustrate how silly the argument is, what if I said the right answer to the paradox was: be someone who is physically guaranteed to take 1 box (so demon predicts you will take 1) and then take both. That does land you in a better position but it's kinda silly because I'm just breaking the rules of the game. Same with treating something both as predictable and a choice.
The rules of the decision-theory idealization are that you have some tree and each node of the tree represents a choice, with later choices able to depend on earlier ones but not the other way around. If you want to look at scenarios like the demon case, you need to re-idealize them in a way that obeys those rules -- like a choice between rules.
I haven't read this whole thing, but someone sent me the quote involving me. FWIW:
1. I never believed in FDT, but I did think it was a fun view. At the time I wrote the paper with Nate Soares, I put most of my credence on CDT.
2. I now put most of my credence on EDT (although I've recently argued against it) because Arif Ahmed is very persuasive.
Fixed, sorry. Unrelated, have read some of your papers over the years and found them quite good.
Totally reasonable to attribute to me. I did coauthor a paper arguing for FDT. I’m just revealing what was in my heart of hearts to get you an accurate count of academics who like it. From a glance, I have a similar reaction of surprise at how many rationalists are fans or think of it as a mature theory. I am also surprised how many are Solomonoff induction stans, in case you’re on the hunt for more philosophers v rationalists material.
Hmm, I don't really know much about the Solomonoff induction thing! Why do academics reject it?
You should have been at my other talk at Manifest!
First problem, which even the Solomonoff fans admit - the precise prior depends on your choice of universal algorithm. (I'll grant them their response that this only introduces an error up to a single constant, even though in this case the constant appears up front and dominates, unlike in the case of complexity theory, where it disappears in the limit.)
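For reference, the "single constant" is the usual invariance result. Stated roughly: take monotone universal machines U and V, where U can emulate V by prefixing a fixed program of length c_{UV}. The universal prior (a sum over minimal programs whose output begins with x) then satisfies:

```latex
M_U(x) = \sum_{\text{minimal } p \,:\, U(p) \text{ begins with } x} 2^{-|p|},
\qquad
M_U(x) \;\ge\; 2^{-c_{UV}}\, M_V(x),
\qquad
-\log_2 M_U(x) \;\le\; -\log_2 M_V(x) + c_{UV}.
```

The bound holds for every x. For short strings, though, c_{UV} can swamp -log2 M_V(x), which is the sense in which the constant appears up front and dominates.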
Second problem, which again the Solomonoff fans admit - it's uncomputable (though it is computably-approximable-from-below).
Third problem - it just builds in certainty that the truth is computable! Why think that? Especially if you think that some normatively ideal thing is uncomputable!
Fourth problem (which I think is the most important one, though lots of academic epistemologists face it as well) - what motivation is there for someone who has a different prior to treat this prior as better? Omniscient priors always perform best in the worlds they are adapted to, and in general, if you throw someone into situations in proportion to a particular probability function, then the prior that matches this probability function will be the most successful prior for them to have. If someone isn't being thrown into the world in proportion to the Solomonoff prior, why should we fetishize computational simplicity in precisely this way?
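A small illustration of the "prior matched to the world wins" point. It uses the expected logarithmic score as the measure of success, which is an assumption; other proper scoring rules behave similarly. The two-outcome distributions are hypothetical. The only general fact used is Gibbs' inequality: the expected log score is maximized by the true distribution.

```python
import math
from typing import Dict

Dist = Dict[str, float]


def expected_log_score(world: Dist, prior: Dist) -> float:
    """Average log-probability `prior` assigns to outcomes drawn from `world`."""
    return sum(p * math.log(prior[outcome]) for outcome, p in world.items() if p > 0)


# Hypothetical numbers: a world that mostly throws complex situations at you,
# and a prior that bets heavily on simple ones.
world = {"simple": 0.2, "complex": 0.8}
simplicity_prior = {"simple": 0.8, "complex": 0.2}
matched_prior = dict(world)

print(expected_log_score(world, matched_prior))     # higher (closer to 0)
print(expected_log_score(world, simplicity_prior))  # lower
```

If `world` is swapped for a distribution that really does favour simple outcomes, the ranking flips. Which prior counts as "best" is relative to how you are actually placed in the world.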
As far as I can tell, the motivation for the Solomonoff prior comes from people who are impressed by Occam's razor, but instead of trying to justify it, they reason in a Kantian transcendental way to figure out what constraint rationality would have to have to make Occam's razor automatically fall out.
Ah sorry, heard from someone else that you adopted it. Will fix.
> I know I’m not a simulated algorithm. The simulated algorithm isn’t conscious (we can stipulate). I am.
But you don’t know that. Especially not now that you’ve posted that!
Consider the most mundane form of simulation possible: human imagination. Suppose I set up a Newcomb experiment where I make my predictions by simply reading what the participant has written online and imagining what their thought process would be like. The prediction won’t be very accurate but it will probably be better than chance. Better than chance is all you need to create the paradox.
Well, now that I’ve read your post, my little imaginary version of you is definitely going to start with, ‘I know I’m not a simulated algorithm. The simulated algorithm isn’t conscious, I am.’ But imaginary-you is quite mistaken. Imaginary-you is not conscious. And when imaginary-you two-boxes, it costs the real you the prize. Hypothetically.
> Decision theory generally assumes that you’re self-interested. But if I’m the algorithm, then I care about algorithm me—not the version in the real world. So then I wouldn’t care about what the output of the algorithm was.
I think this is correct. If you are purely self-interested, then two-boxing is rational. However, if you are a simulation, you don’t lose anything by one-boxing because the simulation almost certainly terminates as soon as you make a choice. Therefore, you should one-box if you have any goodwill towards your real-world counterpart. Either because they are “kind of like me but different in a bunch of respects”, or just because you have a default of goodwill towards other people.
On the flipside, if you value your life as a simulated being and resent that the simulation will terminate, then you might rationally two-box as revenge. This presumably doesn’t apply to low-fidelity simulations like human imagination.
(I don’t know whether my opinion comports with FDT. As I see it, the simulation argument is an argument *against* FDT. It shows how plain old CDT can justify one-boxing.)
https://link.springer.com/article/10.1007/s11238-025-10080-w
So clearly I need to brush up on my decision theory, but let me get this right.
The issue is that:
1. FDT claims you should make the decision that, if repeated infinitely in all similar scenarios as the output of your decision procedure, would leave you best off.
2. The problem is that this leads to weird stuff, such as:
A. It matters how other algorithms relate to your algorithm, as in the case where you would pick the bomb because having that as your decision output would counterfactually mean there was no bomb, even though there is?
Plus all sorts of similar issues where we basically have to explain why a scenario where ~you did things only ~you would ever do is related to this situation.
B. Any time an algorithm is introduced, you need to show the relationship between your algorithm and this third-party algorithm (I know there are only two parties), but it seems impossible to establish anything beyond an epistemic one, so you really have EDT; and even if you could, it would lead to weird examples like the gene one.
Do I have that right?
The fundamental problem with this whole debate is that what it *means* to idealize Newcomb's problem in terms of decision theory as a choice by the predictor and then a choice by you just IS to suppose that the predictor's choice can't depend on your choice in any way.
As such, there aren't different decision theories. There is one correct answer: be the sort of person who would be predicted to take one box, then take two.
If someone objects that you broke the assumption respond that no, that's what it means to idealize something as a subsequent decision in decision theory.
If they prefer, they can choose to model the situation instead as consisting of a decision precommitting the agent to be a one- or two-boxer, followed by a choice by the predictor with no subsequent decision. That problem also has an obvious answer.
The one thing that doesn't make sense is to idealize what you do with the boxes as a decision and then not treat it as such. That's conflating the formal idealized notion of decision in the theory with the notion of a decision as happening any time someone goes "Hmm, what should I do".
As such there aren't different decision theories AT all. There are different attitudes about how to idealize Newcomb's problem in terms of decision theory. But once you've done that it reveals there isn't really a philosophical problem -- just a question about how to think about the scenario.
The way I put it is that the rational sort of person to be is to be a one-boxer, but the rational act is two-boxing.
Does your verdict that "the rational act is two-boxing" actually guide the decision procedure of a rational agent?
If it does, then the predictor predicts two-boxing and the rational agent loses the million. Also, "rational sort of person" and "rational act" are the same thing after all.
If it doesn't - which I suspect - then the predictor predicts one-boxing, but... The agent is somehow built to ignore what she finds rational as a person? Isn't it impossible to build an agent that generally views one-boxing as rational, but somehow still two-boxes? In any case, it's highly suspect - paradoxical even - that the ideal agent's action directly goes against its own character.
Your response to the bomb scenario with an initial simulation relies on the claim that the simulation wouldn't be conscious, is this really necessary to the argument? Even if the simulation is conscious and counts just as much as the later biological version for utilitarian purposes, why wouldn't it be better for both of them to suffer the minor inconvenience of paying $100 (bc initial sim version chooses Right, and bio version then also has to choose Right to avoid the bomb), rather than for the simulation to experience getting burned alive by choosing Left while the bio version then gets to save $100 by taking an empty Left? Does FDT differ from this utilitarian answer?
The bomb scenario could be made trickier for a non-FDT utilitarian by imagining the predictor gives the choice to an initial simulated mind with a bomb in Left, then in the future creates some suitably huge number (say a billion) of further copies of that sim, who will each face an empty Left box if the initial sim chose to die by taking Left, but will get a Left box loaded with a bomb if the initial sim chose to pay $100 to take Right. And we could also make it trickier by making the price for choosing Right something much more negative (say getting non-lethally tortured for an hour) but still not as bad as getting burned alive by a bomb. Even if the sims are not perfect duplicates, here a utilitarian sim might bite the bullet on taking Left with a bomb, since this gives a big boost to the epistemic probability that they are the first and are thus saving all the future versions from having to choose Right.
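The utilitarian arithmetic in this variant can be sketched under several hypothetical assumptions. The first sim's choice fixes every later copy's boxes exactly as described. Copies who see a bomb take Right. Outcomes are summed in arbitrary disutility units. `policy_totals` totals each policy across all sims. `p_first_given_bomb` gives the probability of being the first sim conditional on seeing a bomb, which is the epistemic boost described above.

```python
from typing import Dict


def policy_totals(n_copies: int, death: float, right_cost: float) -> Dict[str, float]:
    """Total utility across the first sim and all later copies for each policy
    the sims follow when facing a bomb in Left."""
    # "Left": the first sim dies; every later copy finds Left empty and takes it free.
    left_total = -death
    # "Right": the first sim pays; every later copy then faces a bomb and pays too.
    right_total = -(n_copies + 1) * right_cost
    return {"Left": left_total, "Right": right_total}


def p_first_given_bomb(policy: str, n_copies: int) -> float:
    """Chance you are the first sim, given that you see a bomb in Left."""
    # Under "Left" only the first sim ever sees a bomb; under "Right" all of them do.
    return 1.0 if policy == "Left" else 1.0 / (n_copies + 1)
```

Take the billion copies from the comment and, say, 1000 units for death against 1 unit for the hour of torture. Left then totals -1000 against roughly -1e9 for Right. Left only loses if dying is worse than about a billion torture-hours.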
I still do agree with other criticisms of FDT like reliance on "logical counterfactuals", and I also think that claims about weaknesses of EDT tend to rely on misleading intuition-pumps which under-specify some key aspects of the problem. For example in the smoking lesion problem, it's not specified whether the statistics on the association between the lesion and smoking were collected on a population that was similar to me in all relevant respects including the fact that they too learned about the lesion/smoking link, they had not taken up smoking before learning about it, and when they considered whether to take up smoking they all introspected and found no preexisting "tickle" of desire to smoke (the 'tickle defense' found in the various references at https://casparoesterheld.com/overview-why-we-think-that-the-smoking-lesion-does-not-refute-edt/ ). If they are not similar to me in all these respects, an EDT advocate may think it's justifiable to say that my taking up smoking should not increase the epistemic probability I have the lesion the way it would for a random member of the population, or at least not increase it to nearly the same degree.
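The tickle defense can be made concrete with a toy causal model in which the lesion raises the chance of a felt desire to smoke (the "tickle"), and smoking depends on the lesion only through that tickle. Under those assumptions, and with made-up numbers, smoking is evidence of the lesion in the general population but not among people who have introspected and found no tickle. A Python sketch using exact fractions; all names and probabilities are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_LESION = F(1, 5)
P_TICKLE_GIVEN_LESION = {True: F(9, 10), False: F(1, 10)}
P_SMOKE_GIVEN_TICKLE = {True: F(4, 5), False: F(1, 5)}


def joint(lesion, tickle, smoke):
    """P(lesion, tickle, smoke); smoking depends on the lesion only via the tickle."""
    p = P_LESION if lesion else 1 - P_LESION
    pt = P_TICKLE_GIVEN_LESION[lesion]
    p *= pt if tickle else 1 - pt
    ps = P_SMOKE_GIVEN_TICKLE[tickle]
    p *= ps if smoke else 1 - ps
    return p


def prob_lesion(**evidence):
    """P(lesion | evidence) by brute-force enumeration of all 8 worlds.

    Evidence keys are "tickle" and/or "smoke", with boolean values.
    """
    num = den = F(0)
    for lesion, tickle, smoke in product([True, False], repeat=3):
        world = {"tickle": tickle, "smoke": smoke}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(lesion, tickle, smoke)
        den += p
        if lesion:
            num += p
    return num / den


print(prob_lesion())                          # 1/5
print(prob_lesion(smoke=True))                # 37/89
print(prob_lesion(smoke=True, tickle=False))  # 1/37
print(prob_lesion(smoke=False, tickle=False)) # 1/37
```

Here P(lesion | smokes) is 37/89, about 0.42, against a base rate of 0.2, while P(lesion | no tickle) is 1/37 whether or not the person smokes.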
I think a similar issue of under-specification would apply to your section 3.3 where you say "some gene that correlates 99.9% with two-boxing" but don't make clear if this is in a population where everyone else has also been told about the gene/two-boxing correlation (and where I also have no reason to believe I might be significantly better at thinking about the nuances of EDT problems than the average person in the population). It's possible in a population of "naive" Newcomb test-takers the correlation would be this good, but I might still strongly suspect it'd be much lower for a population of test-takers who were "like me" in some relevant respects like knowledge that the prediction was based on the gene, in which case from an EDT perspective there would be justification for treating this case as different from the one with a detailed simulation of my brain.
Finally, on the uncertainty about what it means to physically implement a given computation, I don't think this criticism is fatal. This issue has been considered by a number of philosophers who either identify mental states with computations or think there are psychophysical laws mapping computations to mental states, and they've proposed possible rules for what qualifies as a physical implementation. For example, David Chalmers falls into the psychophysical-laws camp; he outlines a possible solution in "Does a Rock Implement Every Finite-State Automaton?" at http://consc.net/papers/rock.html and goes into more detail in his paper "A Computational Foundation for the Study of Cognition" at https://www.ida.liu.se/divisions/hcs/seminars/cogsciseminars/Papers/Chalmers_Computational_foundations.pdf ...another proposal can be found in Anderson and Piccinini's book The Physical Signature of Computation: A Robust Mapping Account, summarized at https://philosophyofbrains.com/2024/10/02/mapping-robustly-mechanisms-out.aspx and https://philosophyofbrains.com/2024/09/30/introductory-remarks-on-the-physical-signature-of-computation-a-robust-mapping-account.aspx
Re simulation: it's not the simulation that gets burned; rather, what the simulation does affects whether you do.
None of those changes to the situation change my intuition about bomb.
Re 99.9% correlation, I was imagining that all you know is about the population-level correlation. So this gives you defeasible reason to think it pairs up in your case.
Re the Chalmers proposal, this will produce the result that changing tiny features makes it no longer the same algorithm. But that's rough for FDT. See also the economy example.
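Chalmers' proposal can be illustrated with a much simplified version of its core condition for finite-state automata: a physical system implements an automaton under a mapping from physical to formal states if every physical transition lands on the image of the corresponding formal transition. The sketch below checks that condition exhaustively for a toy system; it leaves out most of Chalmers' combinatorial-state-automaton account, and all names are hypothetical. The second check illustrates the worry about tiny changes: altering a single transition breaks the implementation, and no other mapping to the parity automaton can repair it, since state 3 would have to map to a parity equal to its own flip.

```python
def implements(phys_states, inputs, phys_step, fsa_step, mapping):
    """True if `mapping` carries every physical transition onto the
    corresponding formal transition (a simplified, FSA-only check)."""
    return all(
        mapping[phys_step(p, i)] == fsa_step(mapping[p], i)
        for p in phys_states
        for i in inputs
    )


# Formal automaton: tracks the parity of the number of 1s seen so far.
def parity_step(state, bit):
    if bit == 1:
        return "odd" if state == "even" else "even"
    return state


# Physical system: a counter modulo 4.
STATES = range(4)
INPUTS = (0, 1)
to_parity = {p: "even" if p % 2 == 0 else "odd" for p in STATES}


def counter_step(p, bit):
    return (p + bit) % 4


def glitchy_counter_step(p, bit):
    # Identical except for one transition: 3 --1--> 3 instead of 3 --1--> 0.
    if p == 3 and bit == 1:
        return 3
    return counter_step(p, bit)


print(implements(STATES, INPUTS, counter_step, parity_step, to_parity))          # True
print(implements(STATES, INPUTS, glitchy_counter_step, parity_step, to_parity))  # False
```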
If we assume both the sim and the bio version have consciousness, do you mean the sim won't actually experience anything bad if they see a bomb in Left and choose it? If that were the case, then given first-person uncertainty about whether I am the sim or the bio version, it seems that if I see a bomb in Left my choice is more like "pay $100 to pick Right, or pick Left and subsequently get burned alive if I turn out to be the bio version, but get off scot-free if I turn out to be the sim version (and thereby ensure the bio version will see an empty Left)."
If the laws of physics were deterministic & computable, and I knew the sim version was a perfect one so the probability of the two versions doing something different was 0, wouldn't it make sense from an EDT perspective to choose Left, since that choice updates the epistemic probability that I am the sim to 1? In that case, whether an EDT person would still choose Left in the case of a *near*-perfect sim would just depend on the combination of the probability of error and the negative utility they assign to getting burned alive vs. losing $100.
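The EDT reasoning here can be written as a toy calculation. It assumes an equal self-locating prior of being the sim or the bio version, that the sim always faces a bomb in Left, that the two versions diverge only with probability `error_rate`, and (per the reading above) that the sim suffers nothing by taking Left; utilities are linear and the function names are hypothetical.

```python
def posterior_bio(error_rate, prior_sim=0.5):
    """P(I am the bio version | I see a bomb in Left and I take Left)."""
    # The sim always sees a bomb in Left, so this observation has likelihood 1 for it.
    weight_sim = prior_sim
    # The bio version sees a bomb only if the sim took Right, which, given that
    # I take Left, requires the two versions to have diverged.
    weight_bio = (1 - prior_sim) * error_rate
    return weight_bio / (weight_sim + weight_bio)


def edt_choice(error_rate, burn_cost, right_cost, prior_sim=0.5):
    # Left: burned only if I turn out to be the bio version; the sim is unharmed.
    eu_left = -posterior_bio(error_rate, prior_sim) * burn_cost
    # Right: pay the fixed cost whichever version I am.
    eu_right = -right_cost
    return "Left" if eu_left > eu_right else "Right"


print(edt_choice(0.0, burn_cost=1e9, right_cost=100))    # Left (perfect sim)
print(edt_choice(1e-24, burn_cost=1e9, right_cost=100))  # Left
print(edt_choice(0.1, burn_cost=1e6, right_cost=100))    # Right
```

With the equal prior, Left stays the EDT choice as long as error_rate / (1 + error_rate) * burn_cost < right_cost, which is the trade-off between error probability and the two disutilities described above.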
Imagine that the sim's experiences don't change no matter what happens. Nothing you do affects the welfare of the sim.
But unless you are making metaphysical assumptions about the sim lacking consciousness, how would I know whether I am the sim or bio version? If both are explicitly given that information, that would be a divergence in their experience which would make the sim's behavior much less accurate for predicting the behavior of the later bio version.
Would your answer on this problem be any different if instead of a sim the experiment was done on two biological teletransporter duplicates who were replicated at different times, the first one getting Left-with-bomb and the second one's Left depending on what the first chose, and neither was told whether they were the first or the second?
We're just stipulating that it lacks consciousness. Maybe this assumption is wrong, but in any case, it shouldn't be that FDT depends on substrate independence being true, so that FDT would be wrong too if substrate independence were false.
Why shouldn't one's answer to decision theory questions involving sims depend on whether one believes in substrate independence? As I said I endorse EDT rather than FDT, but one's answer to this problem in an EDT framework could just as easily depend on whether one believes the initial sim is conscious (since that makes a difference in terms of whether the teletransporter scenario is treated the same as the sim/bio scenario).
The Cheating Death in Damascus paper, IMO, has a lot of genuine content in it. I think there are nontrivial decision problems where you can clearly tell what the FDT answer is, even tho there isn't a complete theory of counterlogicals to do it for all possible problems.
I think counterlogicals are easier to solve than you think they are, especially for the sorts of decision problems that people often discuss.
There are cases where FDTers agree on an answer, and this was what motivated the theory. But there aren't cases where you can deduce the answer from the theory, and I think this is in-principle impossible for the reasons I explain.
When I reason about computer programs while I'm writing them, I do a lot of 'imagining mathematically impossible worlds'. Like, there's obviously some deterministic answer about what the computer program would in fact do, and if I knew all the math facts I could immediately figure it out. But I don't know all the math facts, and so I start off with a lot of different hallucinatory worlds where the program does different things. Some of those hallucinations find contradictions and vanish; other hallucinations persist and they end up as my best guess. [Similarly for writing math proofs.]
Do I have a theory of how to do that in general? Not yet, but I don't view this as a lethal objection. (For example, I buy that the Halting Problem is not solvable in the general case, but I think I can restrict my attention to just problems where it is solvable without losing many programs that I care about. When I restrict my attention to problems where the FDT solution is computable, how many problems that I care about do I lose?)
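One common way to "restrict attention to problems where it is solvable" when reasoning about programs is to run them under a step budget and treat anything that has not finished as undecided rather than as a verdict. A minimal Python sketch, assuming programs are written as generators that yield once per step; `run_with_budget` is a hypothetical helper, and this illustrates only the restriction move, not a theory of counterlogicals.

```python
def run_with_budget(program, max_steps):
    """Advance a generator-based program at most max_steps times.

    Returns (True, result) if it finished within the budget, or
    (False, None) if the budget ran out and its behavior is still undecided.
    """
    for _ in range(max_steps):
        try:
            next(program)
        except StopIteration as done:
            return True, done.value
    return False, None


def collatz_steps(n):
    """Counts Collatz steps from n down to 1, yielding once per step."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
        yield
    return steps


def loop_forever():
    while True:
        yield


print(run_with_budget(collatz_steps(6), 100))   # (True, 8)
print(run_with_budget(collatz_steps(27), 100))  # (False, None): needs more steps than the budget
print(run_with_budget(loop_forever(), 100))     # (False, None)
```

The middle case is the important one: "undecided within budget" is not the same as "never halts", so the restriction trades coverage for the guarantee that every answer it does give is correct.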
"CDTers say: both boxes. Taking the second box causes you to get an extra $1,000. The fact that it correlates with there being less money in the box is irrelevant. By taking one box, CDTers claim, you’re just passing up an extra $1,000."
Ah yes, just like how when I buy a lottery ticket it causes me to win millions of dollars, much more than my life savings is worth, so I will spend it all on lottery tickets.
(This take brought to you by someone who doesn't understand the various decision theories and knows he is fighting a strawman. Nonetheless, two boxing seems plainly silly to me. As long as the predictor is better than chance, one boxing just makes more sense to me.)
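The quoted CDT argument and the one-boxing intuition in the reply are computing different things: CDT holds the already-filled boxes fixed, while the one-boxing intuition conditions on the predictor's accuracy. A Python sketch of both quantities, assuming the usual $1,000,000 opaque-box prize (the quoted passage only gives the $1,000 figure); the function names are hypothetical.

```python
BIG = 1_000_000  # assumed opaque-box prize (the usual Newcomb figure)
SMALL = 1_000    # transparent box, as in the quoted passage


def conditional_expected_values(accuracy):
    """Expected payoff given each action, for a predictor right with prob. accuracy."""
    one_box = accuracy * BIG
    two_box = (1 - accuracy) * BIG + SMALL
    return one_box, two_box


def causal_gain_from_two_boxing(opaque_contents):
    """Holding the already-filled opaque box fixed, the second box adds SMALL."""
    return (opaque_contents + SMALL) - opaque_contents


def breakeven_accuracy():
    # accuracy * BIG == (1 - accuracy) * BIG + SMALL  <=>  accuracy = (BIG + SMALL) / (2 * BIG)
    return (BIG + SMALL) / (2 * BIG)


print(conditional_expected_values(0.6))  # one-boxing ahead
print(conditional_expected_values(0.5))  # two-boxing ahead at exactly chance
print(breakeven_accuracy())              # 0.5005
print(causal_gain_from_two_boxing(0), causal_gain_from_two_boxing(BIG))  # 1000 1000
```

With these payoffs the predictor has to beat chance only slightly: one-boxing has the higher conditional expected value once accuracy exceeds 0.5005, while the causal gain from the second box is $1,000 whatever the opaque box holds.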
I think this is almost all right, and FDT is a bad view for basically these reasons. But I'll note that I didn't really follow this bit:
"There’s a somewhat strange paradox here. Imagine that there’s someone who is psychologically identical to me at all times before the prisoner’s dilemma. I’m in a prisoner’s dilemma against them. They defect. On FDT, I should defect too. But then we’re running the same algorithm. So then I should cooperate. But then we’re running different algorithms, so I should defect."
On my understanding, FDT would just say you (and they) should cooperate, even though you (and they) will in fact defect.
I think something weird is going on with your "but thens". You're going from the normative claim "I should cooperate" to the claim "we're running different algorithms". But whether you're running different algorithms doesn't depend on whether what you should do matches what they do do, it depends on whether what you do do (and would do) matches what they do do (and would do).
Now maybe you're taking what you wrote as shorthand for something like "if I were instead disposed to do what the view tells me I actually should do, (and if the person who is in fact my duplicate retained the dispositions they in fact have) then we would be running different algorithms", which is true. And it's true that for that version of you (and them), FDT would recommend defecting. But as far as I can see that's neither here nor there - I don't see a paradox.
Maybe you can explain the thought to me better.
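The point that whether two agents run the same algorithm is fixed by their actual dispositions, not by what either of them should do, can be made concrete in a toy model. This is a caricature of FDT, not its official formalization: "same algorithm" is assumed to mean identical outputs across a listed set of situations, the payoffs are assumed standard prisoner's dilemma values, and all names are hypothetical.

```python
# Row player's payoffs; assumed standard prisoner's dilemma values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def same_algorithm(mine, theirs, situations):
    """Toy criterion: same algorithm iff the two policies actually give
    the same output in every listed situation."""
    return all(mine(s) == theirs(s) for s in situations)


def fdt_style_recommendation(mine, theirs, situations, situation):
    if same_algorithm(mine, theirs, situations):
        # Whatever the shared policy outputs, both players output it,
        # so only the diagonal outcomes (C, C) and (D, D) are live.
        return max(ACTIONS, key=lambda a: PAYOFF[(a, a)])
    # Otherwise treat the other player's actual action as fixed and best-respond.
    their_action = theirs(situation)
    return max(ACTIONS, key=lambda a: PAYOFF[(a, their_action)])


def defector(situation):
    return "D"


def cooperator(situation):
    return "C"


SITUATIONS = ["one-shot PD"]
# Me and my duplicate both in fact defect: the recommendation is still C.
print(fdt_style_recommendation(defector, defector, SITUATIONS, "one-shot PD"))    # C
# A version of me disposed to cooperate, against the defecting duplicate: D.
print(fdt_style_recommendation(cooperator, defector, SITUATIONS, "one-shot PD"))  # D
```

Because the recommendation is computed from the actual dispositions, evaluating it never feeds back into them, so the "but then... but then..." loop does not arise in this model.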
Why do you need a separate decision theory to justify taking one box in the transparent Newcomb's box problem? FDT just sounds like iterated games in a game theory context. But if you have an iterated game, then CDT or EDT can just incorporate the iteration and still get to the FDT answer.
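For the iterated case this comment has in mind, the standard textbook result is that a purely causal agent facing a grim-trigger opponent prefers to keep cooperating whenever the discount factor delta satisfies delta >= (T - R) / (T - P). A small Python check with assumed payoff values and hypothetical function names; note that it covers repeated play, while the transparent Newcomb problem is usually posed as a one-shot game.

```python
# Assumed standard prisoner's dilemma payoffs, with T > R > P > S.
# S is listed for completeness; it does not enter this comparison.
T, R, P, S = 5, 3, 1, 0


def value_cooperate_forever(delta):
    """Discounted value of mutual cooperation every round: R / (1 - delta)."""
    return R / (1 - delta)


def value_deviate_once(delta):
    """Defect now against a grim-trigger opponent: T today, then P forever."""
    return T + delta * P / (1 - delta)


def cooperation_is_causally_best(delta):
    return value_cooperate_forever(delta) >= value_deviate_once(delta)


def critical_delta():
    # R/(1-d) >= T + d*P/(1-d)  <=>  d >= (T - R) / (T - P)
    return (T - R) / (T - P)


print(critical_delta())                    # 0.5
print(cooperation_is_causally_best(0.9))   # True
print(cooperation_is_causally_best(0.2))   # False
```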
Aside from the nugget of stipulating non-consciousness of a simulated algorithm (which... What.)... let's abstract a bit. You are running a very weird comparison between the following three:
1. Causal Decision Theory
2. "EDT and updateless EDT" (two quite different theories; the latter is, essentially, Wei Dai's Updateless Decision Theory, or UDT, https://www.lesswrong.com/w/updateless-decision-theory, and thus _also_ partially grows from the Rationalist tradition, though Wei Dai appears to _also_ be more classically read-up!)
3. FDT
So, according to you, Rationalists are wrong because they deliver FDT; and yet the preferable alternative is UDT, which is done by a Rationalist and, historically, because of FDT-like exploration.
But it gets weirder. According to https://www.lesswrong.com/w/timeless-decision-theory, "[t]he FDT paper thus describes a general framework which remains agnostic about an updateless approach (like UDT) vs an updateful one (like TDT), but which sticks close to the logical-causality approach introduced by TDT." TDT stands for Timeless Decision Theory, Yudkowsky's _previous_ attempt at formalizing his gripes with CDT (and is itself barely formalized). If this is true, then _of course_ FDT is underdefined under the terms you offer, because it is at a different level of abstraction! It covers a _family_ of approaches!
If we stipulate the simulated algorithm isn't conscious, then aren't we also stipulating that I wouldn't care about the utility gained by the simulation? My simulation would therefore only optimize for the non-simulated case and pick the Right box.
Problem with the bomb example:
>You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb[...] The Right box is empty, but you have to pay $100 in order to be able to take it.
>A long-dead predictor predicted whether you would choose Left or Right, *by running a simulation of you and seeing what that simulation did*. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty.
>*The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left.* [...] What box should you choose?
First problem with the scenario: as stated, the note actually contains no information about the bomb. How so? Well, the example is clear: the predictor predicted your choice *by running a simulation and seeing what the simulation did*. In order to predict you via simulating you, both you and the simulation need to have the same inputs. Thus the simulation would have to have seen an identical note to you.
The problem is that the note and its contents have to be decided *before they are shown to the simulation*. Because the Left or Right decision is not known before the simulation is run. And once the simulation is run with that particular note, then you have to see that note as well (identical inputs). So you will see the note that was decided upon before the prediction was made. Its content is thus immaterial to the presence or absence of the bomb.
If we want the example to work, it needs to be redefined first. We need an example that (a) makes the note true, (b) makes the predictor as accurate as stated, and (c) forces FDT to choose Left in this example. *It's not clear that such an example exists*.
Partial example: there is a fixed-point version of this that gives (a) and (b), but it is no longer a problem for FDT. Assume the predictor ignores the simulation and always puts the bomb and the note in, and the note is known to be accurate. Then all agents will go Right, including FDT, because they believe the note. And the predictor, who predicted Right, will thus be extremely accurate.
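The structural point about the note can be checked with a toy model: because the simulation must receive the same input as the real agent, the note text has to exist before the prediction, and the bomb placement then depends only on how the agent responds to that fixed note. The agents and predictor functions below are hypothetical stand-ins, and the fixed-point variant is modeled as a predictor that always places the bomb.

```python
NOTE = "I predicted you would take Right, so I put a bomb in Left."


def simulation_predictor(agent, note):
    """Original mechanism: the note is fixed first, because the simulation must
    see exactly the input the real agent will see; the bomb is placed after."""
    predicted = agent(note)
    return predicted == "Right"  # True means a bomb goes in Left


def fixed_point_predictor(agent, note):
    """Variant: ignore the simulation, always place the bomb and the note."""
    return True


def note_trusting_agent(note):
    return "Right" if "bomb in Left" in note else "Left"


def always_left_agent(note):
    return "Left"


# Same note, different agents: whether the note is true depends on the agent,
# so its content carries no information about the bomb by itself.
print(simulation_predictor(note_trusting_agent, NOTE))  # True  (note is true)
print(simulation_predictor(always_left_agent, NOTE))    # False (note is false)

# Fixed-point variant: the bomb is always there, so the note is always true,
# and a note-trusting agent takes Right, matching the "Right" prediction.
print(fixed_point_predictor(always_left_agent, NOTE))   # True
print(note_trusting_agent(NOTE))                        # Right
```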
>First problem with the scenario: as stated, the note actually contains no information about the bomb. How so? Well, the example is clear: the predictor predicted your choice *by running a simulation and seeing what the simulation did*. In order to predict you via simulating you, both you and the simulation need to have the same inputs. Thus the simulation would have to have seen an identical note to you.
Does the mechanism of prediction really need to be simulation? Are we allowed to imagine there's just some magic oracle that can accurately (but not perfectly) predict what you'd do without any simulating, so there were never any previous notes to begin with?
If not, here's another variant. Suppose time is past-eternal. Every year for all of the eternal past, the simulator has gone through the bomb experiment with a copy of you. The note in the present includes the following: "In every contiguous sliding window of yearly experiments of size 1 trillion up till now, you've chosen Right way over 99% of the time, so I've put the bomb in Left; I am programmed to only predict your behavior and fill up boxes based on the most recent such window, and no other. But since this statistical pattern of outcomes has in fact been universal across windows, the note I've left in each previous experiment has always said the same thing as what I'm telling you now." Everything else is the same as the original thought experiment. Now the simulator's notes are and have always been totally honest.
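The sliding-window mechanism in this variant is easy to state as code. A minimal sketch, with a window of 1,000 standing in for 1 trillion and a strict 99% threshold as an assumed reading of "way over 99%"; the class and function names are hypothetical.

```python
from collections import deque


class SlidingWindowPredictor:
    """Fills the boxes from the most recent window of past choices only."""

    def __init__(self, history, window_size, threshold=0.99):
        # A past-eternal history is approximated by a window that starts full.
        self.window = deque(history[-window_size:], maxlen=window_size)
        self.threshold = threshold

    def right_fraction(self):
        return sum(choice == "Right" for choice in self.window) / len(self.window)

    def bomb_in_left(self):
        # "Way over 99%" is modeled as a strict threshold on the Right fraction.
        return self.right_fraction() > self.threshold

    def record(self, choice):
        self.window.append(choice)  # the oldest choice drops out automatically


def simulate(predictor, agent, years):
    """Run yearly experiments; return whether a bomb was placed each year."""
    bombs = []
    for _ in range(years):
        bomb = predictor.bomb_in_left()
        bombs.append(bomb)
        predictor.record(agent(bomb))
    return bombs


def takes_right_when_bomb(bomb):
    return "Right" if bomb else "Left"


predictor = SlidingWindowPredictor(["Right"] * 1000, window_size=1000)
print(all(simulate(predictor, takes_right_when_bomb, 50)))  # True: the note stays honest
```

As long as the agent keeps taking Right, every window looks the same as the last one, which is what lets the same note be truthful in every year.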