4 Comments
Charlie Garfield

Also worth noting for your purposes that Claude in particular has always had a strong preference, insofar as LLMs can have preferences, for animal welfare. This comes up repeatedly in Anthropic’s model cards, and it’s to the extent that advancing capabilities at Anthropic might even be a net positive by your lights.

Henry Stanley

There’s no attempt here to grapple with the reasons why it might be problematic for Joe to join Anthropic, of which there are a few: perhaps he becomes captured by ideology or wealth (he will have equity that does well if Anthropic does well); perhaps he will be unable in practice to speak out against bad things happening within Anthropic; or perhaps his presence is used as safetywashing and he gets no real impact.

Keshav

I think your essay presents a narrower case than what makes sense. You mostly give examples showing that humanity doesn't agree with your conception of what constitutes future value (e.g. wild animal suffering, population ethics).

I personally mostly agree with you on these ethics, but I think the more relevant concern is that it's likely that no slice of humanity understands ethics, not just that most of humanity doesn't, and thus that we need to carefully align AIs to allow a long reflection or to avoid near-term value lock-in.

I think statements like "...aligned specifically with the right values. Having effective altruists...influencing the AI companies is thus hugely important" contribute to this, because they imply that EAs would capture most of the future's value if given control. While I'm definitely happier with EA moral values than default ones, I still doubt we get anywhere close to capturing most future value with locked-in values from a current set of extremely thoughtful EAs.

Bentham's Bulldog

The other pieces, I think, do discuss the possibility of values being seriously reflected on by the AI and explain why it’s unlikely.