6 Comments

Great post; this is my hope for AGI as well. I also think that, if alignment is intractable, the first AGI might not want to improve itself very much or very quickly.

These arguments mostly show that a given combination of intelligence and values isn't necessarily stable as intelligence increases over time. That makes it possible to rescue a form of the Orthogonality Thesis that only asserts that any value system is momentarily compatible with any level of intelligence, though that weaker form is not so relevant to the more extreme AI Doom scenarios.

"It seems that, if this ‘certain hedonist’ were really fully rational, they would start caring about their pleasures and pains equally across days"

Is rationality an ability or a value?

I have a few objections to this view, even if we assume moral realism.

1. Even if the ASI develops a sense of pleasure, it may get pleasure from producing paperclips. In that case, there would be no "realization" that paperclips aren't worth pursuing, because there would be no conflict between paperclip production and pleasure.

2. What if the ASI simply comes to the wrong moral conclusions? This could happen for many reasons. For instance, it could be the case that: (a) moral realism is true, but can only be properly understood/derived by conscious entities of the same kind as humans and (b) ASI never becomes conscious in the same way that humans are. Maybe you don't buy into that, but that's certainly not the only way we'd expect ASI to come to the wrong moral conclusions.

3. Gradient descent and natural selection would not be expected to optimize for the same pleasure functions. In social animals, natural selection has a cooperative/pro-social component that would not necessarily be replicated in ASI systems. I actually think it's quite unlikely that they'd function similarly. It makes sense for natural selection to produce a sense of pleasure-from-compassion because (a) humans are at roughly equal levels of intelligence/strength, (b) this rough equivalence makes cooperation important/necessary to achieve goals that help one's genes survive/reproduce (compared to a far less balanced situation), and (c) social animals share their genes with family members (e.g. 2 brothers, 8 cousins) in a way that ASI probably would not. Compassion can thus be explained as a case of a mesa-optimizer generalizing a sense of familial/genetic compassion outside of the ancestral distribution. It's just not obvious to me that similar dynamics would produce similar pleasure functions (ones that take other actors' utility functions into account) through gradient descent.

I also find it implausible that gradient descent (or whatever training process is used to make ASI) would be close enough to human cognition to reproduce human-like (or objectively correct) notions of morality, but not close enough to reproduce the kind of egoism that most humans have (i.e. taking selfish actions in order to not die).

4. If we take 3 seriously, I think that undercuts the argument that ASIs would "discover true morality" in the same way that humans have. I think you somewhat recognize this; you seem to acknowledge that an ASI trained on an objective function to maximize paperclips might (a) derive pleasure from making paperclips and (b) end up killing everyone to maximize that pleasure. That, I agree, is a very plausible outcome. But then you say that this might plausibly be okay because naive utilitarianism might be correct. This is far more controversial than the general claim that moral realism is true, and it starts to feel like a motte-and-bailey. Motte: moral realism is probably true! Bailey: a utility monster killing everyone to maximize pleasure is probably good, OR an ASI would conclude that egoism is objectively false.

You say "what’s to say it would not care about others?" and you bring up the idea that egoism is "objectively irrational." But see my objection 3 (we wouldn't expect an ASI to evolve moral intuitions similar to humans') and objections 2/4 (the moral conclusions it arrives at might be false). Egoism may be, or feel, objectively irrational to us because we have evolved notions of compassion for others that we wouldn't expect an ASI to replicate.
