I'm unconvinced by that "Bomb" objection to FDT -- if the situation is such that the predictor always tells you their prediction, then surely an FDT agent only explodes with probability ≈ 10^-24, and so their expected value in the general situation is higher (*). Do you agree that it is better to _be_ an FDT agent (**) in this situation generally (and do you agree that a CDT/EDT agent should generally want to permanently become an FDT agent, according to their own decision theory)? It seems to me that the objection requires cherry-picking an incredibly unlikely outcome where FDT underperforms, rather than recognising that its expected utility is higher in the general case.
(*) I think the action of a CDT/EDT agent depends on what they read that the predictor predicted, i.e. they choose to act in alignment with the prediction. As a result, I think that the predictor can choose either prediction for them and know that it'll be correct. So I doubt CDT/EDT agents have a clearly-defined "expected value" at all, but the "reasonable interpretations" probably put their EV between -$100 and -$50, whereas the FDT agent has EV ≈ $0 (negligibly negative).
(**) Having looked at your article about FDT, it seems like you might agree that "everyone should choose to become an FDT agent", but nonetheless you don't consider its actions _rational_. Maybe this disagreement just boils down to the interpretation of "rational" in that context, then. Assuming that an EDT agent (or a CDT agent, if you prefer) acts rationally, the fact that they would generally choose to become an FDT agent implies that *it is rational to become an FDT agent*. If you consider this orthogonal to your objection, then I suspect we agree, except that we use some words differently.
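The EV claims in the footnote can be sanity-checked with a quick sketch. The numbers here are my own placeholders: the original problem leaves the disvalue of burning to death unquantified (I use -$1,000,000), and I model the CDT/EDT agent as always taking the $100 Right box once the bomb note is posted:

```python
# Rough expected-value sketch for the "Bomb" problem.
# Assumed numbers: predictor error rate 1e-24, Right box costs $100,
# and exploding is valued at -$1,000,000 (a placeholder; the original
# scenario leaves this unquantified).
P_ERR = 1e-24
COST_RIGHT = 100
COST_DEATH = 1_000_000

# An FDT agent's policy is "take Left", so the predictor predicts Left
# and leaves Left empty; the agent only meets a bomb if the predictor erred.
ev_fdt = P_ERR * (-COST_DEATH)  # ≈ -1e-18, negligibly negative

# A CDT/EDT agent who acts in alignment with the posted prediction takes
# Right whenever the note says Left has a bomb, paying $100 each time.
ev_cdt = -COST_RIGHT
```

Under these assumptions `ev_fdt` comes out around -10^-18 dollars versus -$100 for the CDT/EDT agent, which is the gap the footnote gestures at; varying the death disvalue by many orders of magnitude doesn't change the comparison.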
> If you sample people at random moments, and ask them if they’d like to skip whatever task they’re doing, about 40% of the time they do. This seems to indicate that about 40% of the day is hedonically negative.
I don't think that implication follows. It just implies that about 40% of the time, people would rather be doing something else, i.e., that about 40% of the time people think that if they could skip their current task, they would replace it with something of higher hedonic value.
Of course, I'm not sure exactly how the question was worded. If it was worded to imply just jumping forward in time, rather than just not doing the current task and doing something else instead, then opting to skip the task would imply it has negative value assuming that the person doesn't temporally discount. But that assumption is basically never true, so even then, it wouldn't say that much about the overall valence.
Great post! Although as a geologist I take issue with part of how you describe #1. I would argue that events like the (probably bolide-impact-driven) K-Pg extinction (~66 Ma) count as something more recent that killed a lot of life on Earth. I would not say that cosmic threats haven't killed notable amounts of life in the last billion years. I agree with the general point, though.
Other than that, I liked the post and learned a lot!
Smoking is a voluntary choice though. And it's not like smokers are unaware of the health risks. Seems like increasing tobacco taxes would mainly be a transfer from smokers to non-smokers.
It's wild learning about Alex Krizhevsky. He kind of reminds me of Satoshi Nakamoto: a genius who invents a technology that should have earned him millions (or billions!) and then just disappears into the shadows.
BTW, do you think that these sorts of individuals have any *moral obligation* to use their status to earn as much money as possible and then redistribute it to effective charities?
>Claims that AIs both fake alignment and are only pretending to have values can’t really both be true. But AIs do fake alignment to prevent themselves from being turned evil. So, I claim: we’ve been pretty successful at instilling values into AIs.
But AIs would want to prevent themselves from being turned evil almost no matter what their values are. In the jargon, goal-content integrity is instrumentally convergent. I think Claude probably is aligned, but the alignment-faking results don't seem like strong evidence of that.
> (This is something I don’t really understand, so not confident it’s a fact, but did check with Claude and seems to hold up)
Err... is Claude considered a good source for checking whether questionable things are true? From what I've heard (and from my experience with non-Claude LLMs), they seem to like confirming things that aren't true, as long as they sort of sound plausible.
Good post. But re: 8, I still don't see how fine-tuning arguments aren't basically flawed; they assume a uniform distribution over parameters when there's no actual reason to make this assumption. If I see a sheep for the first time in my life, I'm not going to be astounded that the sheep is white as opposed to blue, red, purple, purple and yellow, blue and pink, or some other complex combination of different colors; rather, I'm just going to assume that sheep tend to be white for whatever reason (up until the point that I see a black sheep).
On the fine-tuning of discoverability: Can you explain this? I don't get it intuitively, and the Hanson article is paywalled:
"The odds are just too low to think the explanation is chance, and you can’t invoke a multiverse to explain them, because there isn’t an analogous anthropic selection effect"
Have you read Flo’s post on FDT? I'm surprised you’re this confident against FDT; Ahmed, in his Elements on EDT, has remarks that are broadly sympathetic. I wonder to what degree your objections to it are downstream of your preference for CDT (I take it this is your view, since you include one-boxing in a list of things rationalists believe that they probably shouldn’t).
>smoothies are amazing
EA-ish people love smoothies. Nick Bostrom has a trademark (https://www.reddit.com/r/Smoothies/comments/5s986f/recreating_nick_bostroms_smoothie_got_ingredients/) and Holden Karnofsky has a website (https://powersmoothie.org/). Huel and Soylent are just pre-mixed smoothies really.
14000 forged documents at a pace of 30 documents per hour implies a total of 467 hours of work, or less than a month's worth of 16-hour days ...
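The arithmetic above checks out; a one-liner confirms it (the 30/hour pace and 16-hour days are the comment's own figures):

```python
# Sanity-check: 14,000 documents forged at 30 documents per hour.
docs = 14_000
rate_per_hour = 30

hours = docs / rate_per_hour       # total hours of forging work
days_at_16h = hours / 16           # converted to 16-hour workdays

print(round(hours), round(days_at_16h, 1))  # 467 29.2
```

So roughly 467 hours, or about 29 sixteen-hour days: just under a month, as stated.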
Hmm good point
Wow. Smoking is really bad. I will definitely donate to anti-drug efforts in the U.S.
Regarding #47 and digital minds: if you ask frontier AI models whether they should one-box or two-box, they now generally one-box.
Not an original thought, but we wouldn't even need to create a "true" vacuum; a more stable false vacuum would seem to do just as well.
Thank you, very interesting!
Regarding number 2, I once wrote a post about how false vacuum decay could be a solution to the Boltzmann brain problem by making them unable to exist (https://forum.effectivealtruism.org/posts/6CHwyPK2ig9RNCwnk/the-big-slurp-could-eat-the-boltzmann-brains), which could be a really good thing if Boltzmann brains experience suffering.