One thing I think this post is missing is a notion of Goodhart's law in ethics - namely, if you optimize too hard on some proxy, then even if that proxy is usually a good heuristic for reaching the terminal end (i.e. some rule, like rights, approximates the correct ethics most of the time), you will likely miss that end.
This matters if you think that ethical theories come apart sharply as you optimize for them harder - I think this is very likely true. When you optimize really hard for a view like utilitarianism, you get something very different from what you get optimizing for a view like person-affecting utilitarianism, for instance. So it makes a real difference whether you get the exact ethical view correct or only nearly miss it.
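The divergence-under-pressure point can be sketched numerically. In this toy model (all names and numbers are my own hypothetical choices, not from the post), a proxy tracks the true value over the ordinary range of outcomes but rewards something different at the extremes:

```python
import numpy as np

# Hypothetical toy model of Goodhart's law: the "true" value peaks at a
# moderate level of some measurable quantity x, while the proxy heuristic
# simply rewards more x. The two agree while x is small, then come apart
# as optimization pressure grows.

def true_value(x):
    # Hypothetical "correct ethics": best outcome at x = 2.
    return -(x - 2.0) ** 2

def proxy(x):
    # Heuristic that tracks true_value only on the low end of the range.
    return x

xs = np.arange(0, 11)  # candidate outcomes x = 0..10

mild = xs[xs <= 3]                     # weak optimization pressure
x_mild = mild[np.argmax(proxy(mild))]  # lands at x = 3, true value -1.0
x_hard = xs[np.argmax(proxy(xs))]      # lands at x = 10, true value -64.0

# Optimizing the proxy harder made the true outcome strictly worse.
assert true_value(x_hard) < true_value(x_mild)
```

The sketch is only meant to show the shape of the worry: a proxy that is a fine heuristic under weak optimization can be catastrophic under strong optimization, so "nearly right" values may not degrade gracefully.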
In the future world you imagine, whatever agents exist will, it seems, likely do a very good job of optimizing for whatever they take to be best. This seems true just from the history of technological progress - as we get better technology, we are able to optimize harder for our goals.
Given such a big space of possible ethics, and good arguments for many different views, you might think it's pretty likely that we miss the correct view by at least a bit - even if we get good AI that makes us better reasoners, as you suggest.
Therefore, we should think it's very likely that we have somewhat bad values. To hedge against this, one might want to push back the point at which we optimize hard.
I grant that value lock-in is a big deal! I also grant that monomaniacally pursuing bad goals would be a bad thing. I suspect, however, that if we got correctly aligned superintelligent AI--AIs that were basically ideal versions of human reasoners--they simply would not pursue bad goals. This post is not about alignment.
I'm skeptical that moral reflection (by humans or AIs) is likely to lead us closer to moral truth. We saw in the COVID pandemic that bioethicists had markedly worse judgments about bioethics than random people off the street. Philosophy as a discipline hasn't had much to show for itself in centuries. Reflection as a method for improving one's moral judgments does not have a great track record, at least when taken to its extremes.
Excellent post! The point that some (hopefully non-trivial) fraction of people/capital/agents will want to promote The Good, while hopefully a near-zero fraction will want to specifically promote The Bad, seems especially important to me.
A couple of snippets I wrote last year are relevant:
https://forum.effectivealtruism.org/posts/JdDnPmZsSuBgh2AiX/space-settlers-will-favour-meat-alternatives about how it is unlikely factory farming will persist in space, because people with more cosmopolitan values will go to space.
https://forum.effectivealtruism.org/posts/fosBQPAokSsn4fv77/cosmic-nimbys-and-the-repugnant-conclusion about how we might miss out on a lot of value by having too few people with really high welfare rather than far more people with marginally lower welfare (kind of an anti-repugnant-conclusion).
But yours is more systematic - these focused on just narrow pieces of the relevant idea space.
Why does the author (or anyone in these circles, really) care to maximize welfare at all? I do not understand. Sure, suffering is "bad," but why does an EA care? Simply because it feels "wrong" to our naturally empathetic brains?
What I keep circling back to is the asymmetry you surface between reach and recognition. The reach of future tech is potentially unbounded—billions of star-systems, trillions of substrate-agnostic minds—but our capacity to recognise morally relevant experience keeps stalling at the familiar and the photogenic. That gap, more than any single mechanism, feels like the live wire here.
Two implications jump out:
1. Epistemic tooling is ethical tooling. If “AI reflection” becomes the microscope for consciousness, then funding interpretability research isn’t just a safety play—it’s moral-circle infrastructure. We can’t care about what we can’t reliably see.
2. Default settings matter. Your “person-affecting future” worry highlights that omission, not malice, is the likeliest failure mode. Civilisations don’t have to be evil to waste the cosmos; they just have to be content. That makes institutional slack—the room to revise our goals—as valuable as the goals themselves.
So the practical question is: what norms or governance levers actually keep that slack open once systems start optimising at scale? I’m less worried about a single dictator than about well-meaning lock-in through path-dependent incentives. Would love to hear how you’d weigh reversibility mandates (e.g., sunset clauses for cosmic engineering projects) against the opportunity costs they introduce.
In any case, thanks for mapping the terrain with this level of granularity—few conversations thread the needle between utopian hype and dystopian fatalism this cleanly.
Small clouds were scudding across the stars and the full Moon — Mark had never seen her so bright — stared down upon them. As the clouds passed her she looked like a ball that was rolling through them. Her bloodless light filled the room.
“There is a world for you, no?” said Filostrato. “There is cleanness, purity. Thousands of square miles of polished rock with not one blade of grass, not one fibre of lichen, not one grain of dust. Not even air. Have you thought what it would be like, my friend, if you could walk on that land? No crumbling, no erosion. The peaks of those mountains are real peaks: sharp as needles, they would go through your hand. Cliffs as high as Everest and as straight as the wall of a house. And cast by those cliffs, acres of shadow black as ebony, and in the shadow hundreds of degrees of frost. And then, one step beyond the shadow, light that would pierce your eyeballs like steel and rock that would burn your feet. The temperature is at boiling point. You would die, no? But even then you would not become filth. In a few moments you are a little heap of ash; clean, white powder. And mark, no wind to blow that powder about. Every grain in the little heap would remain in its place, just where you died, till the end of the world... but that is nonsense. The universe will have no end.”
“Yes. A dead world,” said Mark gazing at the Moon.
“No!” said Filostrato. He had come close to Mark and spoke almost in a whisper, the bat-like whisper of a voice that is naturally high pitched. “No. There is life there.”
“Do we know that?” asked Mark.
“Oh, si. Intelligent life. Under the surface. A great race, further advanced than we. An inspiration. A pure race. They have cleaned their world, broken free (almost) from the organic.”
“But how — ?”
“They do not need to be born and breed and die; only their common people, their canaglia do that. The Masters live on. They retain their intelligence: they can keep it artificially alive after the organic body has been dispensed with — a miracle of applied biochemistry. They do not need organic food. You understand? They are almost free of Nature, attached to her only by the thinnest, finest cord.”
“Do you mean that all that,” Mark pointed to the mottled globe of the Moon, “is their own doing?”
“Why not? If you remove all the vegetation, presently you have no atmosphere, no water.”
“But what was the purpose?”
“Hygiene. Why should they have their world all crawling with organisms? And specially, they would banish one organism. Her surface is not all as you see. There are still surface-dwellers — savages. One great dirty patch on the far side of her where there is still water and air and forests — yes; and germs and death. They are slowly spreading their hygiene over their whole globe. Disinfecting her. The savages fight against them. There are frontiers, and fierce wars, in the caves and galleries down below. But the great race presses on. If you could see the other side you would see year by year the clean rock — like this side of the Moon — encroaching: the organic stain, all the green and blue and mist, growing smaller. Like cleaning tarnished silver.”
“But how do we know all this?”
“I will tell you all that another time. The Head has many sources of information. For the moment, I speak only to inspire you. I speak that you may know what can be done: what shall be done here. This Institute — Dio mio, it is for something better than housing and vaccinations and faster trains and curing the people of cancer. It is for the conquest of death: or for the conquest of organic life, if you prefer. They are the same thing. It is to bring out of that cocoon of organic life which sheltered the babyhood of mind the New Man, the man who will not die, the artificial man, free from Nature. Nature is the ladder we have climbed up by, now we kick her away.”
-C. S. Lewis, *That Hideous Strength*
>Imagine making this argument about slavery in the year 1700. Every society has had slavery. Almost no one historically has been opposed to slavery. The first opponent of slavery that we have on record is Gregory of Nyssa, who lived post 300 AD. As the world changed dramatically, so too did our attitude toward slavery.
It's important to note that the historical record is not "Slavery, slavery, slavery, 1800s, no more slavery." While the first person on record opposing slavery on moral grounds is St. Gregory, there was a lot of steady progress from then up until it was finally abolished. The Roman Empire basically ran on millions of agricultural slaves, but over time those slaves became serfs. Now, arguably a serf is not much better off than a slave (they can't move, they can't change professions), but serfdom is definitely an upgrade: serfs have rights, they pay rents but otherwise keep the produce of their labor, and you can't do whatever you want to them. Over time slavery gets abolished in Europe, for Christians specifically:

- In the 8th century, Pope Zachary bans the sale of Christian slaves to Muslims (and then buys all the Christian slaves in Rome and sets them free).
- In 873, Pope John VIII declares that enslaving a Christian is a sin and orders all Christian slaves to be released.
- In the 10th century, the Byzantine Empire outlaws voluntary enslavement and declares any such slave contracts void.
- In 960, Venice bans the slave trade.
- In 1080, William the Conqueror bans the sale of any person to non-Christians.
- In 1102, the city of London bans the slave trade.
- In 1171, Ireland frees all its slaves.
- In 1220, the Holy Roman Empire outlaws slavery.
- In 1256, Bologna outlaws slavery and serfdom.
- In 1315, King Louis X of France abolishes slavery and declares that any slave who enters France will be freed; three years later, King Philip V of France abolishes serfdom.
- In 1335, Sweden abolishes slavery.
- In 1347, Poland abolishes slavery.
- In 1477, Isabella of Spain outlaws slavery in the territories reconquered from the Muslims; in 1493 she outlaws the enslavement of Native Americans (unless they are cannibals); in 1530, slavery of everyone except Africans is banned.
- In 1537, Pope Paul III forbids the enslavement of Native Americans (or of any peoples yet to be discovered).
- In 1574, the last serfs in England are freed.
- In 1679, Russia converts all slaves into serfs (Russia is always behind the rest of Europe).

This brings us finally to the 1700s, where in 1706 an English court rules that any slave, even an African slave, is free as soon as they set foot in England.
All of that is to say that abolishing slavery isn't something that just happened all of a sudden when "the world changed dramatically"; it was a steady process that took about 1,500 years. It went from "Slavery is totally fine, why would you have a problem?" to "Slavery is bad, but we can still enslave non-Christians" to "Slavery is bad and we shouldn't enslave anybody, except Africans" until we finally got to "Slavery is bad for everyone."