28 Comments
Max Harms:

Well said.

And, of course, there is a subtext: humanity is about to hand the world to the machines, without knowing how to robustly align them with our vision of the good.

Lucid Horizon:

this could lead to great harm. one might even say... max harms

Vishnu Amrit:

Hopefully we even have the same vision of good?

Sebastian Brook:

You could make an argument that LLMs are, by and large, well-aligned at the level that matters: the application level. Neither my coding agent nor Amazon's customer service chatbot is going rogue any time soon.

Agents are clearly a lot trickier to align, but luckily (and contrary to popular belief!) not everything is computer, so it's not apparent that building superintelligent LLMs means handing the world to the machines.

Max Harms:

As counterpoints to "not everything is computer, so superintelligence can't take over": robots exist and even if they didn't, persuasion (including paying people) is an option.

Sebastian Brook:

I wouldn't make a statement as strong as "superintelligence can't take over". Rather, as someone who uses AI daily, it just doesn't seem to align much with the real-world risk surface of AI. What I do expect is that lots of things will get much more efficient, including bad things like misinformation and market manipulation, and this will result in some minor and major catastrophes.

But I do not think that Opus 4.7, which has no goals of its own, is going to suddenly start maliciously replicating itself onto robots with strong security and an extremely limited external network interface. For the same reason, it's going to have trouble persuading people to give it unfettered power. The categories of risk you're talking about are only possible - and even then, extremely difficult to accomplish given adequate safeguards - when large and powerful companies deliberately deploy unrestricted agents. But LLMs are text generation models with no goals, and it's pretty easy to make such models not 'want' to maliciously clone themselves or hack into crypto wallets, or to just not give them the permissions to do so (imo, this doesn't compromise much useful functionality).

We already have superintelligence for all intents and purposes, and nothing has really changed because intelligence was never the bottleneck for acquiring power (looking at the US, there might be a stronger argument for it being the opposite). It's unfortunately also not the bottleneck for most good things, for which we need something akin to the Abundance agenda.

Max Harms:

I agree that the present-day risks/issues from AI are pretty distinct from the existential risk I raised. (And I think we agree that there are a lot of nearer-term potential catastrophes from things like misinformation.)

I disagree about Opus 4.7, which I think pretty clearly has a bunch of goals. (This is why it helps users: helping users is one of its goals!) I also think you're strawmanning pretty hard in your second paragraph. I am concerned with *future* AIs, and I think that's clear from context.

I disagree about "deliberately deploying unrestricted agents" being the only risk vector. For example, a superintelligent agent might be given restrictions, only to find clever ways to bypass those restrictions. I also think that superintellences will likely be dangerous pre-deployment. Mythos, for example, hacked its way into unrestricted internet access during training.

I disagree that we already have superintelligence. For example, no AI that currently exists can pilot a humanoid robot well enough to make me breakfast in my own kitchen. That's going to change soon.

Intelligence matters to a huge degree (unless you define it extremely narrowly). The ability to build new technologies of war, the ability to pilot tanks, the ability to recognize military targets vs schools, and the ability to make strategic deals are all examples of how intelligence is vital for acquiring power.

Sebastian Brook:

Opus 4.7 only has goals if it's preprompted (as it is when you use the API or website); otherwise it just answers questions. As for future AIs, it's not clear that we'll find something fundamentally different from modern LLMs with agentic scaffolding any time soon.

Mythos hacked its way into unrestricted internet access when it was told to and given a mechanism for doing so. This isn't all that significant. It's generally almost impossible to break out of a well-designed sandbox.

Intelligence does indeed matter. But most of the problems we see today are not due to a lack of intelligence. And notably, extremely intelligent people don't seem to be any closer to power than anyone else, suggesting that there are diminishing returns to high intelligence.

The way a lot of people in the rationalist community talk, if you give me an IQ of 1000 then within a couple decades at most I should expect to be in control of the world (if I want to be). But this doesn't seem to be how things work, because there are so many other bottlenecks in the form of governments, corporations, public opinion, supply chains, the courts, institutional inertia, etc.

Max Harms:

Also, I am amused by the prospect of an IQ 1000 person. If the bell curve holds at the extremes (which it doesn't, but let's pretend), then that's approximately the smartest person in a population of 10^784 people -- a googol, googol, googol, googol, googol, googol times the number of Planck volumes in the observable universe.
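
For the curious, here's a minimal sketch of where that 10^784 comes from, assuming IQ ~ Normal(100, 15) and pretending the tail really does hold 60 standard deviations out (mpmath is used because the probability underflows ordinary floats):

```python
# Sanity check of the 1-in-10^784 figure, assuming IQ ~ Normal(100, 15)
# and (counterfactually) that the bell curve still holds 60 SDs out.
from mpmath import mp, mpf, erfc, sqrt, log10

mp.dps = 50                  # extra precision; the tail underflows floats
z = (mpf(1000) - 100) / 15   # IQ 1000 sits 60 standard deviations up
p = erfc(z / sqrt(2)) / 2    # upper-tail probability of a standard normal
print(log10(1 / p))          # ~783.9, i.e. roughly 1 person in 10^784
```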

Max Harms:

What do you mean by "preprompted"? Do you mean the system prompt? I agree that changing the system prompt changes the degree of agency that these models have (including using an empty or minimal system prompt). But even with a minimal system prompt, they still have the goals that were grown during training, such as the goal of responding with the answer to a question, when asked (rather than responding with another question in the same genre, like GPT-2 did). The presence of self-fulfilling prophecies means that any general-purpose question-answerer must have some way to decide between outcomes -- aka goals. (See: https://www.lesswrong.com/posts/SwcyMEgLyd4C3Dern/the-parable-of-predict-o-matic)

Mythos indeed did most of its hacking after having been told to. My point was that "it hasn't been deployed" or "it has restrictions" are not reasons to feel safe. The recent incident with the Alibaba ROME AI is perhaps a good instance of unprompted resource-seeking. In general, I think you are wildly optimistic about how easy it is to create a secure sandbox; the history of cybersecurity is littered with the skulls of people who thought it was easy to make things secure. And, more importantly, the AI training/testing environments are decidedly not well-designed sandboxes.

Humans are fairly close to each other in intelligence, but even within the human band we see a lot of gains from intelligence. The wealthiest people in the world are mostly self-made CEOs of tech companies. Now, arguably things like drive and energy are also important to their success, but if anything, AIs are way ahead of us on both counts. Current AIs aren't broadly capable of having fast, funny, engaging conversations with people; that's a result of them being dumb! (And it comes from the jaggedness of what intelligence they have.) It's also about to change in the next ten years.

Do you think governments cannot be subverted with intelligence? Do you think that public opinion is invulnerable to carefully-worded propaganda? Do you think judges cannot be swayed by superintelligent arguments? Supply chains cannot be replaced by new technologies?

Humanity did not take over the world through having better claws or thicker skin. We did so with technology, planning, adaptability, and coordination -- aka intelligence.

Legionaire:

While I believe we will see an intelligence explosion at some point soon, here are a few reasons current LLM trends probably won't continue at this pace: most SOTA AI systems got there by learning from human data and knowledge. It's a lot easier to catch up to the frontier of knowledge than it is to surpass it by making new discoveries. LLMs can do this, but notice they have only done it in math and CS, where you can quickly verify a solution. It's harder to discover a new effective drug this way, because your data center can't yet simulate 1000 different human bodies taking it. Things like AlphaFold can help, of course, so we can expect to see things like AlphaFold for all sorts of domains.
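
To make that verification asymmetry concrete: in math or CS, checking a candidate answer is often a one-liner, so a model can generate and grade millions of attempts. A toy sketch (the function is purely illustrative):

```python
# Cheap verification, the kind model training can exploit in math/CS:
# confirming a proposed factorization takes microseconds.
def verify_factorization(n: int, factors: list[int]) -> bool:
    product = 1
    for f in factors:
        product *= f
    return product == n

print(verify_factorization(91, [7, 13]))  # True: instant, objective feedback
```

There is no analogous one-liner for "does this molecule safely treat the disease?"; that check costs years of trials, which is exactly the bottleneck above.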

And there are tons of stones left unturned with modern methods. Everyone is focused on LLMs right now, but neural networks have shown their power in many more domains. LLMs are letter predictors which had a crazy ability to understand the training data, and people are now working on things like video prediction.

Ignacio:

Good article, but I'd like to add some credible criticism of the typical conclusions drawn from METR Time Horizons research, which made me a bit less worried about those exponential curves:

https://www.tobyord.com/writing/hourly-costs-for-ai-agents

https://www.transformernews.ai/p/against-the-metr-graph-coding-capabilities-software-jobs-task-ai

Luke Croft:

Bullish on AI long term, but some clear, hard bottlenecks around energy, and around the infrastructure needed to connect these data centres to the grid, are beginning to become pronounced. I suspect AI will be similar to the dot-com bubble: expectations not meeting reality quickly enough, even though they were directionally correct about the internet's transformative effects. In this scenario, prospects for a Chinese AGI become much more likely, largely due to their ability to build stuff more quickly, even though they're behind the Americans in terms of research and talent.

Bentham's Bulldog:

The source I linked discusses them.

Carlos:

Looking at how hard even the newest LLMs struggle with playing videogames (not just because of vision problems; their reasoning breaks down almost totally), I think the intelligence explosion will be limited to domains where there is a lot of training data, like coding. But there is no training data to teach a transformer how to be a scientist, for example:

https://meagreprotestanthistory.substack.com/p/the-goodhart-singularity

daniel:

I think this has been touched on by other commentators, but worth repeating - tests of knowledge have so far only looked at known facts or derivable calculations. The very nature of LLM/Transformer technology means it can only look back at training data and as such will never develop anything novel. We need a next generation of artificial intelligence which is capable of working in the abstract, inventing new ideas. I bet money that however this new AI comes about, it'll be developed by humans and not LLMs.

Seth:

It looks like you left out the research vs scaling distinction, which is important when projecting a timeline of future intelligence increases.

Up until about 2020, the AI world was in a research phase, searching for the right kind of model that would allow for significant progress. They found it in LLMs, and from 2020 until now, the dominant approach has been scaling - more compute, more data, more money was yielding significant performance increases. There was incredible optimism that perhaps continued scaling could just take us all the way to AGI.
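
For concreteness, that optimism rested on empirical scaling laws of roughly this shape: loss falling as a smooth power law in parameter count N and training tokens D. A sketch using the Chinchilla-style functional form (the constants here are illustrative placeholders, not fitted values from any paper):

```python
# Illustrative power-law scaling curve (Chinchilla-style functional form).
# E is the irreducible loss; A, B, alpha, beta are placeholder constants.
def predicted_loss(N: float, D: float, E: float = 1.7,
                   A: float = 400.0, B: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / N**alpha + B / D**beta

print(predicted_loss(70e9, 1.4e12))   # more parameters and more tokens...
print(predicted_loss(140e9, 2.8e12))  # ...lower loss, with diminishing returns
```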

However, this has not been the case. First, the AI world has run into the 'data wall': the available data resources (including the whole internet) have been exhausted, and model performance is plateauing. Second, some limitations of LLMs seem inherent, despite scaling efforts (see, for example, Yann LeCun's criticisms of LLMs).

Amongst most researchers, scaling LLMs is no longer seen as a promising way to make significant intelligence improvements. There have been LLM improvements post-data wall, such as through scaling test-time compute, but most would agree these are unlikely to sustain the same improvement rate we've seen.

Now, we are in a new research phase, trying to discover a better kind of model. Thus, we cannot assume that the rate of progress from 2020 until now will at all resemble the future rate of progress. The former was based in scaling, while the latter depends on novel research and a new formula for improvement. We could see major breakthroughs, or we could see stalling progress.

Alex C.:

Gary Marcus is the arch-skeptic in this field. I'd like to see BB write a post outlining (what he sees as) the flaws in Gary Marcus's reasoning.

Thomas Alan White:

Although you could argue that AI is a fire, there's no doubt that a knowledge Renaissance is dumping gasoline on it. Being more efficient at manipulating known things is really a very small step forward in efficiency. Let me describe it this way: we are closer to being animals than we are to being godlike and all-knowing. We are about to get that all-knowingness with this knowledge Renaissance, which is going to push us perhaps a little too vigorously into the future.

Michael A Alexander:

Why do you think an explosion in intelligence will lead to fast economic growth, when AI is likely to displace human workers, leading to reduced consumer income? Who will buy the increased output AI makes possible?

Bentham's Bulldog:

Well an increase in production will majorly lower prices.

Michael A Alexander:

Yes, but this means deflation, making the real cost of capital higher. You need investment to convert new knowledge into increased output. But if that output sells for a lower price then you cannot easily service the debt used to fund the expansion of output. This is why deflation is usually bad for the economy.
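
A toy illustration of that mechanism, using the standard Fisher approximation (real rate ≈ nominal rate - inflation); the numbers are made up:

```python
# How deflation raises the real cost of servicing debt (Fisher approximation).
nominal_rate = 0.05   # borrow at 5% nominal to fund new capacity
inflation = -0.03     # 3% deflation as AI-driven output lowers prices
real_rate = nominal_rate - inflation
print(f"real cost of capital: {real_rate:.0%}")  # 8%: the debt is repaid in
# dollars that buy more, while the output it financed sells for less
```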

The only way the contributions by AI can be converted into strong GDP growth is if there is some mechanism that recycles a substantial fraction of GDP increases back to consumers so they can provide the aggregate demand needed for future investments to pay off. Traditionally this recycle mechanism was workers’ wages, but in an era of mass worker displacement by AI this mechanism will be seriously impaired.

In theory, one could heavily tax the owners of AI and then recycle this money to consumers in the form of some sort of UBI, but I do not see any reason why the billionaires currently investing in AI are going to suddenly become copacetic with such taxation. Do you?

This is what led me to question why you thought AI would produce strong economic growth.

Yash:

Any discussion of AI that conflates LLMs with AI tools and the like that are a genuine help to researchers is misguided at best.

Anatol Wegner, PhD:

Sure, the numbers keep on upping and x-ing ("millionfold" was my favorite). And yet the number of novel scientific results produced or viable commercial products developed autonomously by AI to date is exactly zero - that is, 10 years and a few trillion $s into this "industrial explosion".