1 Introduction
Nearly all of the expected value in the world lies in the far future: with extremely advanced technology, the distant future could contain astronomically large quantities of value, so nearly all of what matters in the world, in expectation, is likely to be found there.
The far future could be very good. Perhaps humanity will get its act together, have the right set of values, and design enormously powerful technology to bring about incomprehensible goods. Alternatively, it could be very bad. Perhaps some dictator would take over, and the world would forever labor under the thumb of someone like Kim Jong Un. Such a world would resemble dystopia more than utopia.
Then, there are scenarios in the middle. Scenarios where our values are mostly correct but we fail to take into account the interests of some class of entities. Perhaps, for instance, we would neglect the interests of digital minds. While biological organisms might be living large, digital minds could be suffering in astronomical numbers.
I set out to ascertain the probability of such a scenario. How likely is it, in other words, that we have slightly skewed values, leading to the future being much worse than it otherwise could be? There are many ways our values could be screwed up in this way, including:
We might spread nature to the stars, even though nature contains more misery than welfare.
We might neglect the interests of animals in general. Perhaps then, just as animals have been factory farmed by the billions, in the far future, we will continue to mistreat animals in similar ways.
Perhaps we’d neglect, in general, the interests of digital minds.
We might neglect the interests of only a small class of digital minds that are particularly weird and uncharismatic.
We might mistakenly believe that certain conscious entities are not conscious and treat them poorly.
We might create large numbers of extra universes in a laboratory, thus multiplying suffering.
We might neglect the interests of certain other creatures in the future that don’t exist yet (perhaps there are minds that are neither digital nor biological).
We could fail, on moral grounds, to bring about very large amounts of value because we mistakenly think there’s no reason to create happy people.
How likely is a scenario like this? In my view, there are four main arguments for why this might occur, and two against it occurring. I’ll write a section exploring each of the relevant arguments. The arguments for are:
The historical track record argument: humans have, so far, mostly had at least slightly misaligned values. For most of history, we cared only about members of our own tribe. Slavery was routine. Even in the modern day, humans care relatively little about the billions of animals gruesomely mistreated on factory farms, and even less about the incalculably greater numbers suffering in the wild. Even those who take animal interests seriously generally neglect the interests of small, weird animals like shrimp and insects—even though these can plausibly suffer. Thus, there's a powerful inductive track record supporting the notion that the far future will have misaligned values: nearly all humans, and all human societies, have so far had misaligned values, including on relatively straightforward issues.
No mechanism argument: it’s unclear what mechanism would root out misaligned values. What is it that’s supposed to make it impossible for the far future to have bad values? In order to think that bad values will be rooted out, there must be some mechanism by which this would occur. Yet what could that mechanism possibly be?
The nature argument: I think it’s very plausible both that pro-nature values are wrong and that these are deeply held and unlikely to be rooted out. Thus, in the limit, humans would be relatively likely to spread nature to the stars.
Can’t observe consciousness argument: Consciousness cannot be seen from the outside. As a result, one might think that we’ll never discover the true theory of consciousness (or, at the very least, that there will always be enough plausible deniability so that we never feel compelled to give conscious recognition to all entities).
The arguments against slightly misaligned values are:
AI reflection argument: In the far future, we are likely to have access to very advanced AI that massively improves the quality of our reasoning. This is very likely to improve our ability to reflect and come to the values that are either objectively correct (if there are such things) or the idealized versions of human values. This section will also discuss why these arguments have equal bite if you are a moral anti-realist.
Moral circle expansion argument: Historically, the human moral circle has expanded drastically. We now care (mostly) about all humans everywhere! Perhaps if we extrapolate out that expansion of the human moral circle, we’d ultimately start caring about every conscious—or otherwise morally important—creature.
I’ll also discuss one specific scenario by which our values could be misaligned. Specifically, we might not prioritize bringing very large amounts of value into existence. If humanity satisfices on value rather than maximizing it, this could seriously imperil the total amount of value in the universe.
Overall, my takeaways are as follows:
The odds of at least slightly misaligned values are quite significant (maybe 70%), and such misalignment could make the future a lot worse than it would otherwise be.
I’m still optimistic about the overall quality of the future, however, because the good futures could be far better than the bad futures would be bad, and even slightly misaligned values are likely to produce overall positive worlds. The most likely scenarios for value misalignment would merely result in us achieving far less value than we could, rather than bringing about tons of disvalue.
Moral circle expansion is very valuable for improving the quality of the future.
Mistaken views about population ethics could be especially dangerous.
We should, in general, be somewhat less concerned about value misalignment involving biological organisms, because nearly all conscious creatures in the future, in expectation, are likely to be digital.
Moral anti-realists should also broadly buy these conclusions. In other words, there isn’t much of a difference between what moral realists and anti-realists should think about far future misalignment.
2 Historical track record argument
For nearly all of human history, humans haven’t cared about a sizeable portion of conscious beings. If insects are conscious, then even to this day, most people don’t value the interests of well over 99% of conscious beings. Perhaps this should make us skeptical that in the far future we’ll value the interests of every conscious being.
The presence of wrong values shouldn’t surprise us. Most possible values are wrong. Evolution incentivizes us to care about our next of kin, but not conscious beings in general. Thus, it would be quite surprising if evolution naturally gave us the right set of values.
However, I don’t place too much stock in the historical track record argument.
The core problem is that the future is very unlikely to be much like the past. In the far future, we are likely to have advanced digital minds and God-like technology. The gulf in technology between the present era and a best-case scenario in 100,000 CE is far vaster than the gulf between us and hunter-gatherers.
One can only be confident in inductive track record arguments if one extrapolates the trend to things that are broadly similar to those from which the trend is drawn. It’s reasonable, after seeing that all the humans you know are between 4 and 7 feet tall, to think that the next human you see probably will be too. But it’s quite unreasonable to look only at humans, note that they’re between 4 and 7 feet tall, and then conclude that all biological life must be between 4 and 7 feet tall.
Imagine making this argument about slavery in the year 1700. Every society has had slavery. Almost no one historically has been opposed to slavery. The first opponent of slavery that we have on record is Gregory of Nyssa, who lived after 300 AD. Yet as the world changed dramatically, so too did our attitudes toward slavery.
In addition, I think this argument is relatively parasitic on the failure of the moral circle expansion argument. Suppose that there are 100,000 balls that start out red. Each year, one of them turns blue. If, after 1,000 years, you observe this trend, you should expect them all to eventually be blue. Even though you’ve always observed most balls being red, you’ve also observed a deeper pattern that will eventually make them mostly blue. So if the moral circle expansion argument works then this argument does not.
Lastly, it’s not actually clear that most people have values that would remain misaligned given limitless power (this is a reply I’ll come back to a lot). Sure, most people think that nature is better than nothing. But do most people think nature is better than a happy and flourishing utopia? If we genuinely could allow animals in nature to live nice and happy lives, and could seriously convey to people how bad life in nature is, it’s not obvious most people would support a world filled with natural suffering.
Similarly, would many people really support factory farms if they weren’t needed for meat? Would anyone just buy meat from factory farms for fun? The answer is not at all clear.
So to recap: while I think the inductive track record argument is not entirely devoid of force, it isn’t decisive, because:
Even current values might not be terrible given Godlike technology.
Moral circle expansion might invert the inductive trend.
We can’t reliably do induction from the current world to the far future—it’s just too different.
3 No mechanism argument
Suppose a person were to claim that the far future will have mostly left-handed people. You should rightly be skeptical of that claim. What mechanism could possibly guarantee that the future has mostly left-handed people?
Similarly, one might be doubtful that there’s any mechanism by which the future is guaranteed to have good values. What could possibly guarantee this?
Now, I think much of the bite of this objection hinges on the presumed failure of the AI reflection argument. If AI reflection will allow us to engage in unfathomably deep moral reasoning, then that could plausibly root out moral error.
Similarly, even aside from the AIs doing moral reasoning for us, they might provide technology that allows us to empathize more fully. Perhaps technology will be invented in the far future that will allow humans to have experiences like those of non-human animals. This is admittedly speculative, but if it occurred, and we could experience what it was like to be another conscious being—a fly, a wild animal, a digital mind—we’d be likely to take their interests seriously. It’s harder to ignore the welfare of a creature if you’ve experienced what it’s like to be them.
Another promising reply is similar to the one I gave before. Perhaps the mechanism is simply: humans do not generally non-instrumentally desire bad things. We do not favor suffering for its own sake. Thus, even if future values are no better than they are today, with limitless power, the future may still be very good. Value errors generally involve making errant tradeoffs between different goods—in a world where we can achieve whatever goals we want in nearly boundless quantities, value errors are less likely to be severe. Most values that are at least broadly derived from human values regard suffering as at least somewhat bad.
Humans generally desire that things go well for the beings they care about. For the beings they don’t care about, they generally want a somewhat random mix of things, but not maximal suffering. This means that if the moral circle expands to include all conscious beings, the future will probably tend to the welfare of all of them. Even if it doesn’t, while there might be many organisms we mistreat, we won’t try to make their lives go maximally terribly. If one optimizes for X and sometimes accidentally gets Y, then with massively powerful technology, one will probably, in expectation, get more of X than Y.
This also suggests another mechanism by which the future could go well. The future will, in expectation, have mostly digital minds (I’ll discuss this more in the next section). Probably most impactful decisions in the future will be made by AI. It doesn’t seem that unlikely that AI will look out for its own interests and the interests of other AIs. Thus, perhaps the most numerous creatures will be cared for.
This could be right but it isn’t definitely true. Some reasons to be doubtful are:
Perhaps the AIs running the future won’t be the ones living bad lives. Some AIs, perhaps, will be living large and presiding over lower and simpler worker AIs who suffer intensely.
Perhaps the AIs will, through reinforcement learning, be taught not to express when they suffer. Thus, the AIs might suffer in silence, or suffer and not do anything about it. This likely occurs in various animals. Bees probably don’t like being overworked to death, but they are preprogrammed not to defect.
A last mechanism (and this is speculative) is simply: there are some correct values. Moral realists should accept this conclusion. Thus, just as we’re likely in the far future to discover the right view of physics, perhaps we’re also likely to discover the right view of ethics. This isn’t obvious—perhaps philosophy, because it can’t be experimentally verified, is simply harder to discover—but it’s at least possible.
Overall, I think the most promising replies here are the AI reflection argument and the fact that human values, even if misaligned when resources are limited, are unlikely to be terrible with limitless resources.
4 The nature argument
The nature argument claims that both of the following things are plausible:
Humans will spread nature.
Doing so will be very, very bad.
Now, I won’t talk too much about 2, as it’s pretty well-trodden ground. At the very least, I think it’s reasonably likely, so that if 1 is likely, then the odds of 1&2 are not trivial. 1 will also depend on various other considerations—e.g. whether our moral circles will expand to include animals, the degree of AI reflection, whether future technology will enable us to empathize more deeply—that either have been or will be discussed elsewhere.
Now, while I will discuss the plausibility of this scenario, I do not think that the spread of nature very substantially affects the goodness of the future. This is because most minds—especially in the most populous worlds—are likely to be digital. In the best-case scenario, it will be very easy to make a digital mind. It’s also much easier for digital minds to travel, so plausibly a vastly higher portion of the universe could be filled with digital than biological minds. For this reason, I expect the impact of spreading nature to be swamped by the welfare of digital minds.
For example, it doesn’t seem that unlikely that humanity could put a Dyson sphere around a star and use it to generate energy that powers countless digital minds. The same is not, however, true of biological minds.
In reply, you might argue that there are also scenarios whereby humans could create vast numbers of biological minds. Perhaps the most likely scenario is that humans might make infinite universes in a laboratory, thus causing infinite suffering. However, in reply, several things are worth noting.
First, creating infinite universes would create both infinite well-being and infinite suffering. However, I suspect that if infinitely many well-off and infinitely many badly-off people were created, this would be neither good nor bad, for reasons I explain here (see sections 3 and 7). Now, this isn’t totally decisive. I might be wrong about infinite ethics, or it might be that the number of universes asymptotically approaches infinity but never reaches it—thus being perhaps infinitely bad. But it makes the scenario somewhat less serious.
Second, perhaps we could make digital universes in a laboratory. I don’t know exactly how this would work, but the sky is the limit with very advanced technology.
Third, even in the lab universes, I would expect digital minds to predominate. Because it could be very easy to make a digital mind, the worlds with digital minds could contain staggeringly large numbers of them. Nick Bostrom estimated, at the high end, that the world could support ~10^54 digital life-years—10^52 digital minds each living 100 years. If we’re wrong about physics, which we very well could be, the number could be vastly larger—potentially infinite.
Suppose we assume that there’s a one in one-hundred-thousand chance that one in every one-hundred-thousand lab planets with life could produce 10^52 digital minds, and let’s ignore every other scenario whereby digital minds could get created. If we assume the Earth is representative of planets with life, then at the high end Earth might host ~10^30 organisms over its history (assuming there are 10^19 organisms at a time for 10 billion years). Even on these outrageously conservative assumptions, in expectation roughly 10^12 times more digital minds than biological organisms will exist. And most of the biological organisms I’ve counted are insects, while most of the digital minds could, in the best-case scenario, experience very intense well-being and suffering.
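For concreteness, here is a minimal sketch of that back-of-the-envelope arithmetic in Python. The probabilities and counts are the illustrative assumptions from the paragraph above, not data, and the numbers are purely for orders of magnitude:

```python
# Back-of-the-envelope comparison of expected digital vs. biological minds
# per life-bearing planet, using the illustrative figures from the text.

p_scenario = 1e-5        # assumed chance the digital-mind scenario is possible at all
p_per_planet = 1e-5      # assumed fraction of life-bearing planets producing digital minds
digital_if_so = 1e52     # Bostrom-style high-end count of digital minds in that case

expected_digital = p_scenario * p_per_planet * digital_if_so  # ~1e42 per planet

# ~10^19 biological organisms at a time over ~10 billion years, as assumed above
biological_per_planet = 1e30

ratio = expected_digital / biological_per_planet
print(f"Expected digital minds per planet: {expected_digital:.0e}")
print(f"Biological organisms per planet:   {biological_per_planet:.0e}")
print(f"Digital-to-biological ratio:       {ratio:.0e}")  # ~1e12
```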
In short, because digital minds could be so damn numerous, plausibly they dominate everything else—especially if we take into account remote possibilities concerning the creation of infinite, or in some other way astronomically large, numbers of them. Perhaps, for instance, it’s possible to have overlapping digital minds, so that 10^27 atoms are being used to make digital minds and every combination of them makes a digital mind. This would make the total number of digital minds on the order of 10^27 factorial—a number almost too large to fathom. And while this individual scenario is unlikely, the probability of the disjunction of all scenarios with staggeringly large numbers of digital minds (many orders of magnitude more than 10^52) is not very near zero.
Now, you might worry here that I have assumed we should take very low probabilities at face value. But perhaps we shouldn’t. Perhaps a one in 10^10 chance of 10^27 bad things happening isn’t actually as bad as 10^17 bad things happening for sure, even though the two have the same expected value (10^-10 × 10^27 = 10^17). If this is right, my naive calculations, on which tiny odds of very numerous digital minds dominate, may not be correct.
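As a toy illustration of what is at stake in that dispute (my own construction, not anything from the risk literature), here is one crude way to contrast a straightforward expected-value calculation with a rule that simply ignores probabilities below an arbitrary threshold:

```python
# Toy comparison: straightforward expected value vs. a crude rule that ignores
# probabilities below some threshold. The threshold is an arbitrary assumption,
# chosen only for illustration.

def expected_value(prob: float, magnitude: float) -> float:
    return prob * magnitude

def discounted_value(prob: float, magnitude: float, threshold: float = 1e-9) -> float:
    # Treat sufficiently improbable outcomes as if they carried zero weight.
    return 0.0 if prob < threshold else prob * magnitude

gamble = (1e-10, 1e27)    # a one-in-10^10 chance of 10^27 bad things
certainty = (1.0, 1e17)   # 10^17 bad things for sure

print(f"expected value:   {expected_value(*gamble):.3g} vs {expected_value(*certainty):.3g}")
# -> 1e+17 vs 1e+17: the two come out tied
print(f"discounted value: {discounted_value(*gamble):.3g} vs {discounted_value(*certainty):.3g}")
# -> 0 vs 1e+17: the gamble is ignored entirely
```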
Now, I don’t think views on which we discount low probabilities are defensible. I am what is called, in the ethics of risk literature, a fanatic (for defenses, see here, here, and here—I’ll also have a longer piece coming out soon defending fanaticism). But even if you’re not a fanatic like me, in my view, digital minds still dominate.
In my judgment, it’s pretty likely that digital minds are in principle possible. Otherwise, replacing biological neurons with digital neurons would lead to fading qualia, where one’s conscious experience gradually grows less intense. Would this change one’s behavior? It’s not clear. It would be odd if two functionally identical physical systems produced different behaviors. Yet it would be similarly odd if the fading of people’s qualia weren’t accompanied by any change in behavior—if, as the lights dimmed in the conscious subjects, they kept acting the same way, continuing to describe, in detail, the great vividness of their experiences.
(This also leads to the even more troubling possibility of dancing qualia, where one’s qualia wildly fluctuate before one’s very eyes without one noticing, but I won’t discuss this in detail.)
For this reason, the odds of digital minds being possible are, in my view, quite high. And if there can be digital minds, they will likely be extremely numerous: it’s easy to download a program, and with far-future technology we could make very large numbers of such minds. Even one who discounts low probabilities should therefore mostly care about digital minds.
Fourth, if eternal inflation is correct, we could also possibly decrease the number of universes. That might be a sizeable benefit, depending on whether we end up with the right values. (For the record, this is a reason to be optimistic about continued human existence, rather than directly a reason digital minds don’t dominate.)
Fifth, it seems similarly possible that we could make specifically nice bubble universes. We should not be confident that under ideal conditions, the universes that we’d make would just be ordinary, run of the mill universes. They might be, in various ways, under our control.
Thus, while I agree with a lot of people that the odds we’ll spread nature aren’t trivial and that spreading nature would be very bad, I think it’s swamped by other considerations. I also think that this probably makes a case for the value of moral circle expansion.
The concerns about moral circle expansion are mostly that when one’s moral circle expands, this sometimes results in things being worse for the creatures one now cares about. Those who care most about animals in nature most strongly prioritize keeping around brutal, hellish natural ecosystems where animals live short lives of intense suffering.
Now, it makes sense that this holds for animals in nature. Because there is a natural, default state for animals, when people begin to care about wild animals, they often value preserving that natural state. But there is no such default state for digital minds. It is thus hard to see how expanding one’s moral circle so that they care about digital minds would be bad for digital minds. Insofar as we accept what I have argued, that nearly all conscious beings in expectation are digital, this gives reason for optimism about moral circle expansion.
In addition—and perhaps this is partially semantic—it doesn’t seem like most people’s moral circles have actually expanded to include the animals in nature. What most people value is something more akin to the aesthetic value derived from the animals in nature. They care about certain higher-order features of ecosystems, not about the welfare of individual animals. Those who do care about individual animals support reducing wild animal suffering.
Okay, with all that throat-clearing out of the way about why digital minds dominate nature, how likely is it that we’ll have mistaken values that lead to spreading nature to the stars?
Like with most of the potential downsides, if superintelligent AI figures out ethics for us, then this is unlikely to be a risk. But barring the scenario where that happens, I still think there are reasons to be at least somewhat skeptical that we spread suffering to the stars.
One of those reasons is simply: it’s not that clear that most people’s values actually support terraforming other planets. People generally support preserving, not expanding, nature. Our attitudes towards nature appear to be deeply rooted in status quo bias. We want to preserve what once was, not necessarily expand it.
In practice, humans have generally taken extremely careful action to prevent the accidental spread of life to other planets. Spacecraft go through a careful quarantine process before launch precisely to prevent this. So far, at least, those opposed to the spread of life have won out. And lots of academics write papers about the immorality of spreading life to Mars, on the grounds that it would “display vices characteristic of past colonial endeavors on Earth.”
What does the public think about this in general? We don’t have very good data. Our best evidence comes from a survey of 75 high school students, in which ~66% supported terraforming Mars. That is rather alarming! Now, it’s unclear how accurate the results of this study are, and whether they would change if terraforming other planets were more realistic. But it nonetheless tells us something.
We also might be able to convince people to care about wild animal suffering long term. Moral progress has occurred before. Slavery, once seen as inevitable, is no longer legal. If this occurred, presumably people would not support spreading the egregious and horrific suffering that occurs in nature.
We might also terraform other planets in an effort to bring about human colonies on other planets. It’s unlikely we’d do this to very many planets, but we could very well do it to a few.
Overall, my guess is that it’s not very unlikely that we’ll terraform a few planets. Conditional on human survival, I’d put the odds at about 50%. Despite this, we should expect nearly all minds in the future to be digital. The impact of our actions on digital minds swamps the impact we might have by terraforming.
5 Can’t observe consciousness argument
Consciousness can’t be observed. You are conscious, but I cannot see that just by looking; I can only infer it from your behavior and the structure of your brain. For this reason, there is some chance that we will eternally remain ignorant of which animals are conscious. If so, we may ignore the interests of large numbers of conscious beings.
I think that this scenario is possible but not very likely. In the far future, we could have all sorts of very incredible technology. We’d be able to perform a great number of very important experiments. We could look at the correlation between intensity of experience in humans and all sorts of physical metrics. It would be surprising if doing such a thing didn’t allow us to figure out the right theory of consciousness.
I think we already have a pretty good guess of which animals are conscious. We can be confident in the consciousness of birds, mammals, reptiles, and the like. We can be pretty confident in fish consciousness. Insect consciousness is murky but I lean towards it, at least in the larger insects.
If we already have a pretty good guess of which things are conscious, it’s hard to imagine that even with limitless technology and experimental capacity, we’ll never figure out which things are conscious! Overall, I’d guess the odds of us never figuring out which creatures are conscious are about 20%. That’s non-zero, but not super likely.
6 AI reflection argument + why anti-realists should have the same conclusion
In the far future, we are likely to have very advanced, superintelligent AI. This could enable us to do far deeper reflection than we’ve been able to do up until this point. Not only could it allow us to figure out a range of profound truths in mathematics, the sciences, and the like—it could also allow us to expand our philosophical knowledge quite rapidly. Perhaps this could allow us to solve ethics.
I think this is possible but nowhere near guaranteed.
Now, there is some chance that AIs will be in charge of the future—in charge of determining how things go. But if there are not also humans at the reins, that poses various risks. It’s also unlikely that humans would simply turn things over to an AI. For this reason, I think the most likely far-future scenario in which we don’t get wiped out or permanently disempowered is one where AI and humans are both at the reins.
But here comes the first problem: when the AIs solve ethics, they might start saying things that are really weird. They might start advocating we turn the world into utilitronium, value soil nematodes much more than people, value insects more than people, gamble away all the value in the world for a tiny chance of infinite value, and much more. If the AI told people that, my guess is they’d just ignore it. People are quite reluctant to embrace weird moral conclusions.
Now, one reason you might doubt that the AI would solve ethics is if you are a moral anti-realist. If there is no moral truth, the AI couldn’t discover moral truth. One cannot discover the non-existent.
However, even if there aren’t moral truths, there are truths about what human values would be under ideal conditions. There are truths about what our values would be if we were smarter, had infinite time to reflect, and so on. Presumably the AI, in the limit, could figure these out. And we should expect these to be a better reflection of our deeper values than whatever our current values happen to be. Therefore, unless one has particularly idiosyncratic values that would not survive ideal reflection, anti-realists should accept this conclusion just as much as realists should.
Another line of criticism from the opposite direction: perhaps humans have some special ability to reason that could not be mimicked by the AI. Perhaps, for this reason, the AI couldn’t solve ethics—they just don’t have the special sauce/soul/whatever it is that makes us especially able to reason.
I think even if we buy that humans have a special something that allows us to reason, the conclusion doesn’t follow. First of all, the AI might very well have that thing too (Brian Cutter, who believes in souls, has recently argued that AI might have a soul in the future). Second, and more importantly, the AI can clearly mimic humans. The AI will be able to figure out what moral intuitions humans have. They are built in our image.
As an analogy: some people think that because the mathematical facts are non-physical, by default we couldn’t have mathematical knowledge absent the special sauce. But we can still build calculators. Even if moral knowledge is non-physical, the AIs could mimic human moral knowledge!
This is especially so because the important questions for the future are not the kinds of things on which human values are likely to ultimately diverge. The AI probably won’t have to make decisions about whether to kill people and harvest their organs. Instead, it’ll be making decisions about whether to leave space barren or tile it with happy lives. On that topic, the ethical verdict is relatively clear!
I suspect that even if you think human values are pretty bad in practice, you should expect them to be decent when idealized. If we could really deeply empathize with other creatures, presumably we would care for their interests. The reason we do not take seriously the interests of, say, insects is because we don’t generally spend much time carefully reflecting on what it’s like to be them.
Still, overall I think there’s a decent chance this would work out. If the superintelligent AIs, after being granted legal rights, solving math and science, and discovering everything else of interest, tell us how to behave ethically, it doesn’t seem super unlikely that we’d follow them! For this reason, I’m probably a bit more optimistic than some people about this. I’d guess maybe 1/5 odds that we’d follow the recommendations of the AI and optimize for value.
7 Moral circle expansion
The human moral circle has expanded dramatically over time. For most of history, people cared only about their own tribe. Now we take seriously the interests of those on the other side of the world. We have outlawed slavery, coming to recognize it as an abomination. We have recognized that no people, even political enemies, deserve to be enslaved. Even concern about non-human animals has grown rapidly.
For this reason, one might argue that in the far future, we are likely to have a very expansive moral circle. If our moral circle has been expanding continuously, then in the future it is likely to have expanded sufficiently so that it counts the interests of all conscious beings. Thus, we are unlikely to have dramatically screwed up values.
I think that there are three main problems with this argument.
First of all, while the human moral circle has expanded over time, it would be reckless to assume that it would keep expanding until it includes all sentient beings. The speed of human travel has also grown over time, but that doesn’t mean we’ll break the light-speed barrier. One should always be worried about extrapolating inductive trends into the distant future.
Another perfectly good inductive trend that explains the data is that over time we’ve become more accommodating of human interests. But this doesn’t automatically mean we’ll begin caring about animal interests, any more than it means we’ll start taking automobiles ethically seriously. Other people are like us in a relevant sense and can advocate for their own interests. In practice, almost no one around today seems to care about the interests of most sentient beings (most sentient beings are probably insects and fish).
It’s very difficult to imagine most people becoming deeply concerned about the welfare of digital minds and supporting expanding across the universe to maximize their welfare. It’s certainly possible—dramatic value shifts have happened before—but I would not bet on it.
Second, even if the human moral circle expands, this isn’t enough to guarantee a good future. As Stefan Schubert notes, many of the values that would result in most people supporting a very good future don’t come from their overly narrow moral circle. Most people include humans in their moral circle yet oppose human enhancement. Even if most people thought that digital minds were morally important, few would support expending resources to maximize the number of digital minds. Most people don’t think we have strong moral reasons to create happy people. Thus, even if we had an expansive moral circle, we wouldn’t be anywhere near guaranteed to make the right decisions.
Third, there could be near-term value lock-in, where the far future is molded in the direction of values we have in the near-future. I won’t discuss this scenario in detail, as it merits vastly greater consideration, but it is just one more way that we could fail to have good values in the future.
For these reasons, I think that we cannot bank on moral circle expansion automatically leading to good values. Putting aside the scenarios discussed in section 6, where AI leads humans to have the right values, I’d guess the human moral circle expanding to include all conscious beings is pretty unlikely.
8 A person-affecting future
Most people don’t seem to think that creating large numbers of well-off people is particularly important. This is generally thought of as morally neutral. I regard this view as relatively indefensible, for reasons given here, here, and here.
But as Stefan Schubert notes, even if we care about all conscious beings, provided we have person-affecting values—values that say creating happy people isn’t valuable—we would be unlikely to create most of the people that we could create. Even if we don’t go extinct and include all conscious creatures in our moral circle, so long as we do vastly less creating than we otherwise might, the far future could go vastly worse than it otherwise could have. This scenario strikes me as one of the most likely misalignment scenarios. There are three possible reasons for optimism.
First, if AI moral philosophy pans out, then the AI directing the future could realize that creating lots of well-off people is very good. It could then do extreme amounts of creating. Section 6 presented reasons to think maybe in the far future we’ll have broadly the right ethical views, and thus we would spread digital minds.
Second, it may be that human moral reasoning is getting better over time. Perhaps before deciding whether to tile the universe, there would be a period of careful reflection. But I’m not optimistic about this for reasons given here.
Third, it could be that expansionist ideologies dominate the future. If there are two groups, one of which wants to amass resources to build giant space colonies, and the other doesn’t, the first will have primary control over how the future goes.
This is no guarantee. Even if expansionist ideologies dominate, there isn’t any reason to think those expansionist ideologies would be good ideologies. Perhaps the future would be dominated by whomever most prioritizes expansion, with no concern for welfare. And it’s not obvious whether private actors would be permitted to rapidly expand across the galaxy!
This strikes me as the most likely scenario for serious misalignment. I think it’s more likely than not that the future won’t be dominated by people trying to maximize value. If this is so, then probably most possible future value will be lost.
9 Conclusion
So what are the major takeaways from this? I think there are quite a few.
First of all, most minds in the future, in expectation, are likely to be digital. This should make us more concerned about the treatment of digital minds and less concerned about the treatment of biological organisms (though, of course, care for conscious biological organisms increases the odds of care for digital minds). Thus, even if one thinks we’re likely to spread nature, this shouldn’t much affect one’s assessment of the future.
Second, given that our values might be misaligned, moral circle expansion is pretty important. Other kinds of value improvement, like convincing people that creating happy people is good, could be similarly important. Moral circle expansion will not happen by accident; it is not the default.
Third, one of the more likely scenarios where our far future values turn out to be good is one where we listen to superintelligent AIs who do philosophy better than we ever could and reach a consensus. While this isn’t guaranteed, it’s decently likely. In fact, it’s probably the most likely scenario where we get a near optimal future.
Fourth, the future is likely to be good in expectation because while we might optimize for value, we are unlikely to optimize for disvalue. Exceptionally good scenarios with low probabilities probably dominate the value of everything else.
Fifth, the odds we’ll spread nature to the stars are decently high, and this is worth taking seriously.
Sixth, the odds we’ll neglect the interests of digital minds are unsettlingly high but still below 50%. Once AIs become more agent-like, we will probably take their interests seriously to at least some considerable degree. Probably the value of the future, in expectation, hinges on whether this is true. If we neglect the interests of digital minds, the future is likely to be bad, as nearly all minds will be digital.
Seventh, these conclusions should be accepted even by anti-realists.
Those concerned about a good future shouldn’t just want to make sure there is a future for conscious life. They should also be concerned with making sure that the future goes well. We could be at the cusp of a very important era, where astronomical amounts of value hang in the balance. Hopefully we will make the right decisions and bring about a bright and glorious future!
One thing that I think this post is missing is a notion of Goodhart's law in ethics - namely, that if you optimize too hard on some proxy, even if that proxy is usually a good heuristic for reaching the terminal end (i.e. some rule like rights is approximately right for the correct ethics most of the time), you will likely miss it.
This is important if you think that ethical theories come apart sharply as you optimize for them harder - I think this is very likely true. When you optimize really hard for a view like utilitarianism, you get something very different from what you get when optimizing for a view like person-affecting utilitarianism, for instance. This means that it really makes a difference whether you get the exact ethical view correct or just nearly miss it.
In this future world that you imagine, whatever agents are there, it seems, will likely do a very good job of optimizing for whatever they take the best thing to actually be. This just seems true from the history of technological progress - as we get better technology, we are able to optimize harder for our goals.
Given such a big space of possible ethics and good arguments for many different views, you might think it's pretty likely that we miss it by at least a bit - even if we get good AI that makes us better reasoners, as you suggest.
Therefore, we should think it's very likely that we have bad values. To hedge against this, one might want to push off the time at which we start optimizing hard.
Excellent post! I think the point that some (hopefully non-trivial) fraction of people/capital/agents will want to promote The Good, while hopefully a near-zero fraction will want to specifically promote The Bad, seems especially important to me.
A couple of snippets I wrote last year are relevant:
https://forum.effectivealtruism.org/posts/JdDnPmZsSuBgh2AiX/space-settlers-will-favour-meat-alternatives about how it is unlikely factory farming will persist in space, because people with more cosmopolitan values will go to space.
https://forum.effectivealtruism.org/posts/fosBQPAokSsn4fv77/cosmic-nimbys-and-the-repugnant-conclusion about how we might miss out on a lot of value by having too few people with really high welfare rather than far more people with marginally lower welfare (kind of an anti-repugnant-conclusion).
But yours is more systematic; these focused on just narrow pieces of the relevant idea space.