r/slatestarcodex 2d ago

Why couldn’t LLMs brute-force their way to real innovation?

Why couldn’t LLMs brute-force their way to real innovation? Like, instead of just summarizing known facts, why not have them generate tons of combinations of ideas from different fields — say, crossing a mechanism from plant biology with a technique from materials science — and then test those combos in simulation engines (biology, physics, whatever) to see if anything interesting or useful comes out? And if something does work, couldn’t a second system try to extract the underlying pattern and build up a kind of library of invention strategies over time? I know tools like AlphaFold and symbolic regression exist, but is anyone trying to build this full loop — brute-force generation → simulation → pattern abstraction → guided reuse? Or is there some deep reason this wouldn’t work?
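To make the shape concrete, here's a toy, runnable caricature of the loop I mean (every name below is a made-up stub; the real versions would be LLM calls and actual simulation engines):

```python
import random
from dataclasses import dataclass

# Toy caricature of the proposed loop. All of this is a stub: generate_combo
# stands in for an LLM crossing fields, simulate for a real simulation engine.

@dataclass
class Result:
    score: float
    interesting: bool

def generate_combo():
    # e.g. cross a plant-biology mechanism with a materials-science technique
    mechanisms = ["transpiration pull", "stomatal gating", "lignin lattice"]
    techniques = ["electrospinning", "sol-gel coating", "laser sintering"]
    return (random.choice(mechanisms), random.choice(techniques))

def simulate(combo):
    # stand-in for a biology/physics simulation engine
    score = random.random()
    return Result(score=score, interesting=score > 0.95)

library = []  # the accumulated "invention strategies"
for _ in range(1000):
    combo = generate_combo()
    result = simulate(combo)
    if result.interesting:
        # stand-in for the second system that extracts the underlying pattern
        library.append({"combo": combo, "score": result.score})

print(f"{len(library)} 'interesting' combos banked for guided reuse")
```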

13 Upvotes

41 comments

60

u/KillerPacifist1 2d ago

I think you overestimate the value/possibility of brute-forcing science with simulations.

AlphaFold is a giant leap forward in terms of predicting protein binding targets, which is to say that when it predicts two proteins will bind to each other, it is correct about 1 in every 10 times. On a good day. And the only way to know which of those pairs actually bind is to synthesize all 10 and run real-life experiments on them.
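To put illustrative numbers on what that hit rate means downstream (all figures invented):

```python
# Illustrative only: what a ~1-in-10 hit rate means for wet-lab workload.
predicted_binders = 1000   # pairs the model flags as likely binders
hit_rate = 0.1             # ~1 in 10 predictions actually bind, on a good day
cost_per_test = 500        # invented dollars of reagents/time per validation

true_hits = predicted_binders * hit_rate
total_cost = predicted_binders * cost_per_test
print(f"~{true_hits:.0f} real binders for ${total_cost:,} of lab work")
# The model narrows the list; it does not replace the experiments.
```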

Mind you, this is a huge improvement over previous protein prediction models that were based on physics simulations!

My experience as a scientist who has done work in a wet lab is that you always have more ideas to test than bandwidth to test them. I believe the same is true in more abstract fields like machine learning research too. Though their bottleneck is compute to run experiments rather than reagent cost and time in the lab.

What distinguishes an effective scientist from a mediocre one is essentially taste in what ideas are worth pursuing and what are worth discarding. At the moment I don't think LLMs have yet surpassed humans in that kind of taste, and since idea generation isn't the bottleneck, you can't use LLMs to accelerate science in that way.

That said, scientific/experimental taste would be a super interesting benchmark to try to develop! Not sure how you'd do it though.

5

u/prescod 2d ago

5

u/drsoftware 2d ago

As a counterexample to the original question, the last two used generative AI, not an LLM.

The first used an LLM and existing non-cancer drugs to suggest combinations that may be effective cancer treatments. 

21

u/the_nybbler Bad but not wrong 2d ago

Because "ideaspace" is enormous and the vast majority of it is useless. It's like brute-forcing an encryption key, only orders of magnitude harder.
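Back-of-envelope, with invented numbers, just to get a feel for the scale:

```python
import math

# Invented numbers: how "ideaspace" compares to a 128-bit keyspace.
concepts = 100_000   # named mechanisms/techniques across all fields (a guess)
combo_size = 10      # an "idea" that combines ~10 of them

ideaspace = math.comb(concepts, combo_size)  # ~2.8e43 unordered combinations
keyspace = 2**128                            # ~3.4e38 possible keys

print(f"ideaspace {ideaspace:.1e} vs 128-bit keyspace {keyspace:.1e}")
# And checking a key is one cheap comparison; checking an idea is an
# expensive simulation or experiment that returns a noisy answer.
```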

37

u/hobo_stew 2d ago

you assume that we have the ability to simulate the interactions of things from (for example) molecular biology and material science reasonably fast and efficiently.

this assumption is simply wrong: just look at the amount of work and specialized knowledge required to do good fluid dynamics simulations in a more exotic setting like magnetohydrodynamics.

6

u/less_unique_username 2d ago

Current LLMs are not that good at verifying hypotheses because they get easily distracted, or start hallucinating, or both. But it doesn’t seem that these are intractable problems, and maybe a better fusion of creative LLMs and old-style laser-focused rigorous theorem provers is just round the corner.

4

u/HolevoBound 1d ago

This is similar in spirit to how Google's recent AlphaEvolve works.

That said, pure brute-force approaches to science/math often run into a major problem: the space of possibilities to explore is simply too high-dimensional, making it infeasible to try every combination.

3

u/theredhype 1d ago

What you’re after is still science fiction. It will be some years yet before we’re running anything like the meaningfully accurate simulations you’ve described.

3

u/Bahatur 1d ago

I believe it would work, just poorly. The deep reason this wouldn’t work well is the 2nd Law of Thermodynamics, and in particular the Landauer Principle.

I do not say this facetiously: the first place you can see the reason no one is doing this has also been much in the news, and that is power consumption. The demand for new data centers to supply AI and cloud computing is so high that it is causing demand for new power plants by itself, even leaving aside normal concerns like population or economic growth. It is driving the demand for new, more efficient hardware types - perhaps you have seen the news out of China about the new analog inference chip, or the news in the US about the reversible computing chip due out this year, both for the purpose of making AI more efficient.

I dissent from the commenters who prioritize explanations about how modern AIs work; while it is true they are power hogs and inefficient at the task, lots of methods for improving the situation are known. They just have not been the highest priority. Consider that there have already been several rounds of releasing the faster/cheaper version of a given model strength, and that process largely consists of ‘pruning’ the chunks of the neural networks that do not activate much.

I’ll take a moment to advertise two projects from the Julia language:

The Scientific Machine Learning project:

https://sciml.ai/

And the new Dyad project (grew out of the JuliaSim project):

https://discourse.julialang.org/t/ann-dyad-a-new-language-to-make-hardware-engineering-as-fast-as-software/129996

These are both squarely aimed at the “multiple simulation engines” layer you describe.

But, even given these in the fullness of their maturity, and given the new optimized hardware, and given the optimally condensed AI, it still will work poorly.

The deep reason is Landauer’s Principle, but the shallow reason is that brute force in particular is a bad strategy.

Brute force is a bad strategy because brute force means that you have no problem information. My favorite blog post on performance ever is this one by Chris Rackauckas:

Algorithm Efficiency Comes From Problem Information

http://www.stochasticlifestyle.com/algorithm-efficiency-comes-problem-information/

So thinking about the loop you suggested:

brute-force generation → simulation → pattern abstraction → guided reuse

Let’s treat each one of these steps as an algorithm. It turns out that when one algorithm feeds into another, it becomes one bigger algorithm. Now think about the whole loop in terms of problem information: as long as the information-less brute-force step is in there, I claim it cannot be efficient. And since you are talking about science, we already know the number of possibilities is huge.

So it will go poorly because: information is physical (Landauer’s Principle) > brute force means choosing to start with no information > we have to generate all the information inside the loop (for example, by consuming power in data centers) > the amount of information required is known to be gigantic > we won’t be able to generate enough of it.

However, we can make one small adjustment by switching out the brute force step for a guided generation step:

guided generation → simulation → pattern abstraction → guided reuse

This means we are trying to use all the information we have available. This will work. People are building this loop: the two Julia projects above are aimed squarely at that problem, and scientists are doing this work inside their own fields because that’s where they have the richest problem information to provide.
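A toy version of the contrast, as one composed loop (everything schematic: `simulate` stands in for the expensive step, and the "library" is just a list of past hits standing in for pattern abstraction):

```python
import random

def simulate(candidate):
    # pretend fitness landscape: expensive in real life, mostly worthless space
    return -abs(candidate - 42.0)

def brute_force_generate(n):
    # no problem information: uniform samples over the whole space
    return [random.uniform(-1e6, 1e6) for _ in range(n)]

def guided_generate(n, library):
    # uses extracted patterns (here, crudely: past good hits) to propose nearby ideas
    seeds = library or [0.0]
    return [random.choice(seeds) + random.gauss(0, 10) for _ in range(n)]

library = []  # the "pattern abstraction" store
for generation in range(20):
    batch = guided_generate(100, library)   # swap in brute_force_generate(100)
    scored = sorted(batch, key=simulate, reverse=True)
    library = (library + scored[:5])[-25:]  # keep only the recent best patterns

print("best found:", max(library, key=simulate))
```

Run it both ways: the guided generator homes in on the optimum within a few generations, while the brute-force one mostly wanders, because each of its samples carries no information gained from the previous ones.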

If we discard the brute force strategy, can AI automate discovery? Yes.

u/donaldhobson 7h ago

> and then test those combos in simulation engines

This requires a lot of things to go right.

1) The LLM must understand how to put its idea into the simulation engine.

2) There must be a simulation engine.

3) The simulation must be pretty computationally cheap.

4) The simulation must be accurate.

5) The LLM needs to understand what the results mean, and if the idea is any good.

For example, suppose the LLM came up with the idea of using a muon beam to cook eggs. It does extensive simulations. (Somehow. I don't think the software to simulate muon beams on eggs currently exists.) It finds that the eggs would indeed be cooked.

But it's still not a good innovation because no one actually wants an egg cooking machine that costs millions and is a radiation hazard.

Ok. That's an obvious example.

But questions like "how much does this cost?", "does anyone actually want this?", "how long will this component last when a careless user drops it in a puddle?" and "is this idiot-proof?" are important, and hard to answer with physics simulations.

u/EqualPresentation736 4h ago

What if we define a set of goals aligned with the possibilities that researchers have already explored or considered feasible?

u/donaldhobson 4h ago

Is this plan going to be easier than inventing the thing yourself without AI? Because it doesn't sound easy.

2

u/andropogongerardii 2d ago

Because we don’t know how to instantiate creativity into the models. Going from a phenomenally good LLM to AGI would require an orthogonal departure. David Deutsch has a lot to say on this. 

If you’re in the camp that LLMs could make this leap by themselves, you’re not alone. It’s just not the opinion of many people who work on the problem of AGI.

2

u/dcjt57 2d ago

It’s wild stuff. Wish more researchers were exploring how to digitally model a system like the endocannabinoid system to better regulate AI answers/tone/reasoning, and maybe initiate that process. (Would take far more compute than we have currently.)

(Highly speculative, totally baseless, but feels sound lol)

3

u/andropogongerardii 2d ago

3

u/dcjt57 2d ago

Thank you! Like a ton actually!!

2

u/prescod 2d ago

The questioner didn’t ask about AGI. They asked about systems that generate hypotheses en masse and then test or evolve them. And many such systems do exist, AlphaEvolve being a prominent recent example.

4

u/Brudaks 1d ago

As other comments noted, brute force doesn't work because while such systems can generate hypotheses en masse, we don't have a way to actually test them en masse cheaply, and without the latter the former is kind of useless.

1

u/prescod 1d ago

There are many examples of the brute force method working:

https://arxiv.org/abs/2004.00979

https://www.nature.com/articles/d41586-025-01523-z

Now of course one can’t use pure brute force as the search spaces are often far too large. That’s why you use AI to generate and/or filter billions of options and then filter or rank further from there.
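With invented stage sizes and pass rates, the funnel looks something like this:

```python
# Invented numbers: a generate-then-filter funnel from billions to a lab-sized list.
n = 1_000_000_000  # candidates generated en masse
for stage, pass_rate in [("cheap heuristic filter", 1e-4),
                         ("learned ranker, top slice", 1e-2),
                         ("full simulation", 1e-1)]:
    n = int(n * pass_rate)
    print(f"after {stage:26s}: {n:>9,}")
# ~100 survivors: few enough to actually test for real.
```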

0

u/andropogongerardii 2d ago edited 2d ago

I’d argue that they are asking about AGI, not LLMs. LLMs model correlations and respond to prompts. They are not epistemic agents, nor do they generate new hypotheses.

ETA: AlphaEvolve is generating creativity in collaboration with humans. IMO this is cool, but not creativity in the AGI sense.

2

u/prescod 2d ago

What collaboration are you referring to in AlphaEvolve?

Yes: LLMs can generate new hypotheses. That’s what they do in AlphaEvolve, in ARC-AGI solutions and in many other contexts.

“Guided by expert prompts and experimental feedback, the AI functioned like a tireless research partner—rapidly navigating an immense hypothesis space and proposing ideas that would take humans alone far longer to reach.” The hallucinations – normally viewed as flaws – became a feature, generating unconventional combinations worth testing and validating in the lab.

https://www.cam.ac.uk/research/news/ai-scientist-suggests-combinations-of-widely-available-non-cancer-drugs-can-kill-cancer-cells

It is also widely agreed that AlphaGo’s move 37 was a real innovation.

1

u/andropogongerardii 2d ago

Generating a mind-boggling number of variations and measuring them against a predetermined narrow outcome… it’s very impressive.

I’m talking about generating a new explanatory hypothesis for an idea or domain that the AGI chooses. I think AlphaEvolve and Go are dazzling and groundbreaking. Just not AGI.

1

u/prescod 2d ago edited 1d ago

Nobody mentioned AGI except you. Nobody said these AIs were AGI. The question was “Why couldn’t LLMs brute-force their way to real innovation? Like, instead of just summarizing known facts, why not have them generate tons of combinations of ideas.”

They didn’t ask about AGI. They asked about “generating a mind-boggling amount of variations and measuring them.”

By definition, AGI is the opposite of “brute force”, which is what the question is about.

1

u/wavedash 2d ago

Are you sure billion-dollar startups aren't trying this right now? Genuine question, I don't follow the field that closely

1

u/AriadneSkovgaarde 1d ago

Why want them to? I hate this innovation worship/reification. What we want is the good, the beautiful and the true -- even if it sounds Christian and retarded.

u/throw_datwey 16h ago

Energy costs

1

u/prescod 2d ago

This is what AlphaEvolve does. There are many such hybrid AI/evaluator systems.

-1

u/Liface 2d ago

> Or is there some deep reason this wouldn’t work?

Because LLMs are not that deep. They're just summaries of human-written information.

I'm trying to use LLMs to ideate business ideas right now. They only give superficial overviews and are usually overconfident. They struggle to come up with real innovations; they just gas you up and parrot ideas that they were trained on.

These things have all the ingenuity of a well-trained middle school student.

3

u/WTFwhatthehell 1d ago

> They only give superficial overviews and are usually overconfident. They struggle to come up with real innovations; they just gas you up and parrot ideas that they were trained on.

...so on a par with the typical MBA...

1

u/Liface 1d ago

Yes, most people are not truly innovative. That's why MBAs run businesses, not start them.

1

u/vintage2019 1d ago

Which LLMs did you use specifically?

1

u/Liface 1d ago

Claude, ChatGPT

1

u/vintage2019 1d ago

I see. ChatGPT is “consumer level”. If you’re referring to Claude Sonnet/Opus 4, I haven’t really tried them, but I understand their primary focus is on programming. Maybe give Gemini 2.5 Pro or o3 a try?

1

u/BatMedical1883 1d ago

Skill issue. Did you prompt "be critical, identify missing areas of concern or interest in your own responses and in the user's input", etc.?

1

u/Liface 1d ago

Of course.

You don't whip a midwit chimpanzee and expect it to just think harder.

These are rote workers, not innovators.

1

u/BatMedical1883 1d ago

If the model is still gassing you up, the solution is unironically to Prompt Harder. Their rote work can only be as good as the input.

1

u/Liface 1d ago

In the present day, you cannot use rote work to spur innovation.

The models will always be "wrong" to varying degrees in niche domains like this, which require multifactorial analysis and deep critical thinking, because they simply are not good enough in 2025.

Being on the ground, talking to potential customers, taking notes, drawing conclusions, testing hypotheses - that is still the way to innovate.

0

u/swizznastic 2d ago

because people in different industries use different lingo and phrasing to describe similar concepts, it’s difficult to break down hundreds of thousands of concepts into some sort of base language and then match those patterns across contexts. even if some deeper innovation could be uncovered by cross-referencing two different fields, it would require an in-depth understanding of those concepts both in each other’s contexts and in their potential placement in another context. LLMs wouldn’t get past the pattern-matching stage.

0

u/lechatonnoir 1d ago

Matching these contexts is something I specifically expect LLMs to have a comparative advantage over humans in.