Can LLMs Generate Novel Research Ideas? (arxiv.org)
50 points by kmdupree on Sept 12, 2024 | 81 comments


An LLM is like a well read college student with a nearly photographic memory that sometimes mixes things up.

It's great for bouncing ideas off of and getting feedback on them. And yeah, it might produce "novel ideas" by mixing and matching existing ideas, but LLMs will never create truly novel ideas. Not in their current form.

The paper didn't really answer the question sadly: their conclusion was just that humans rate LLM answers as more novel than human ones, but less feasible.


Let's be real though. 99% of people 99% of the time are also not coming up with novel ideas.

LLMs seem to mostly be limited right now by the fact that they're always losing context on new conversations and their interactions don't rewire their neural nets. Hard to come up with a new idea when your brain gets reset with every conversation.


I agree, but I have tried many times to get an LLM to intersect two ideas in a way that would be novel, and it cannot do this at all.

We shouldn't expect the stochastic parrot to be able to do this though and it is unfair to the stochastic parrot.

It is like expecting a real parrot to say words it has never heard before.

No one asks that of a real parrot because we don't anthropomorphize a real parrot like we do the LLM.


> Hard to come up with a new idea when your brain gets reset with every conversation.

Haha I love this picture!


You are trivially incorrect.

LLMs (unless used in deterministic mode, which you shouldn't anyway) will eventually generate all ideas for the same reason as 1_000_000 monkeys on typewriters will eventually generate War and Peace. The question is only how soon in practice this will happen.
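A minimal sketch of the distinction drawn above, in Python, assuming a toy four-token vocabulary with invented logits (not taken from any real model): greedy decoding ("deterministic mode") always returns the top token, while softmax sampling at any temperature above zero leaves every token with a nonzero probability. The function names are illustrative, not any library's API.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(vocab, logits):
    """Deterministic mode: always pick the highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]

def sample(vocab, logits, temperature, rng):
    """Stochastic mode: draw one token in proportion to its probability."""
    return rng.choices(vocab, weights=softmax(logits, temperature), k=1)[0]

# Hypothetical toy vocabulary and logits, for illustration only.
vocab = ["coffee", "tea", "punch card", "zeppelin"]
logits = [4.0, 3.0, 2.0, -2.0]

print(softmax(logits))        # all four probabilities are positive
print(greedy(vocab, logits))  # coffee, every time
rng = random.Random(0)
print([sample(vocab, logits, 1.5, rng) for _ in range(5)])
```

A positive probability is not the same as a practical chance: the unlikely token here sits well under 1% per draw, which is the "how soon in practice" part of the argument.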

Well, I am nearly certain that 1_000_000 LLMs (or rather, 1_000_000 streams of generation using a few LLMs) will do better than 1_000_000 monkeys on typewriters.

The same 1_000_000 LLMs could do better than 1_000_000 average humans, but we don't know yet.


> And yeah, it might produce "novel ideas" by mixing and matching existing ideas, but LLMs will never create truly novel ideas. Not in their current form.

Can you give a historic example of a human creating "truly novel ideas" that is not the product of mixing and matching existing ideas?


This argument can be applied recursively. For example: the information in the human genome is just information from the environment transferred into the genome by evolutionary learning. Ultimately you get back to some kind of first cause argument where everything goes back to God or whatever natural process created information in the universe. Either nothing new is actually new or everything new is new.

In the end it becomes moot. A novel rearrangement of existing ideas that does something new or different is creativity.

Can LLMs do that? I think they can to a limited extent, but not as well as humans. Is it something we get with scale or does it require a fundamental architectural innovation? Don't know.


> Can you give a historic example of a human creating "truly novel ideas" that is not the product of mixing and matching existing ideas?

The invention of PCR comes to mind:

> During a symposium held for centenarian Albert Hofmann, Hofmann said Mullis had told him that LSD had "helped him develop the polymerase chain reaction that helps amplify specific DNA sequences".

https://en.m.wikipedia.org/wiki/Kary_Mullis


https://www.iflscience.com/lsd-dna-pcr-the-strange-origins-o... would argue that it was exactly mixing and matching at play here, but that the component parts wouldn't necessarily have been called to mind without the hallucination.



How about the Transformer models that power LLMs? That was pretty novel.


I mean, it depends on how pedantic you want to be. I'd say e=mc2 was pretty novel, but yes, it was based on existing math concepts.

But I don't think an LLM could ever come up with something like that.


> I'd say e=mc2 was pretty novel

Well, the rest mass notion in special relativity is just a natural derivation. It appears once you have the "real" ideas in place. And those aren't really "existing math" at all. It was a pure physics idea at root: "the universe's laws don't change if you are in motion" (or alternate framings like "you can't tell if you're moving from inside a moving box").

Well, it turns out that if you try to construct such a theory, you end up needing some different (pre-existing) math to help describe it. But the idea isn't math at all.

Then the question is "Can LLMs propose new well-framed, evocative theories like relativity", and the answer is sort of open. In point of fact human beings, to first approximation, can't do this either!


Considering the fact that we know the names of the individuals who came up with new ideas in physics, it's extremely rare. If every other college kid was in that category, then we could call it "human ability".


Relevant to this discussion is the fact that if an LLM can’t come up with that, it wouldn’t be due to the inability to mix and match to form novel ideas, but something else, and that something else hasn’t been clearly articulated yet.


I think every major idea was truly novel at some point. Even an idea like "I will write down the things that happened, so when I am dead, others will know what has happened from my writings" was novel at some point.

Which means that an LLM can probably technically come up with novel ideas, since novel ideas aren't some special category of things, it's just that LLMs are not very good at it.


Why? Just a gut feeling?


Perhaps it's specific to present-day LLMs and things will improve in the future - but when I ask my favourite LLM to suggest creative, novel marketing options for a coffee shop it suggests a tenth-drink-free punch card. Which is pretty much the least novel answer imaginable.

It's such a banal suggestion it makes me think there could be a tension between the requirement to be creative, and the requirement that the next token be among the most probable.
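One way to make that suspected tension concrete: many samplers apply top-k truncation, keeping only the k most probable candidates before sampling. A minimal sketch, treating each marketing idea as a single "token" for simplicity and using invented probabilities: with a small k, the unusual option is excluded outright, however many times you sample.

```python
def top_k_filter(probs, k):
    """Keep the k most probable options, renormalise, and zero out the rest."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k])
    total = sum(kept.values())
    return {token: (p / total if token in kept else 0.0) for token, p in probs.items()}

# Hypothetical next-"token" probabilities for a coffee-shop marketing prompt.
probs = {
    "punch card": 0.55,
    "happy hour": 0.25,
    "loyalty app": 0.15,
    "latte-art poetry night": 0.05,
}

for k in (1, 2, 4):
    filtered = top_k_filter(probs, k)
    reachable = [t for t, p in filtered.items() if p > 0]
    print(k, reachable)
```

Real decoders combine this with temperature, top-p, and the prompt itself, so this only illustrates the commenter's hypothesis; it does not show that any particular LLM behaves this way.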


I mean if that's how you're prompting it that's your problem. What are marketing options? Do you mean campaign? Campaign to do what? Increase repeat business? Get new customers? Expand into a new market?

"Hey make me a cool like marketing thing or whatever" isn't going to work.


What does `e=mc2` mean?


https://en.wikipedia.org/wiki/Mass%E2%80%93energy_equivalenc...

> Mass–energy equivalence states that all objects having mass, or massive objects, have a corresponding intrinsic energy, even when they are stationary. In the rest frame of an object, where by definition it is motionless and so has no momentum, the mass and energy are equal or they differ only by a constant factor, the speed of light squared (c²).


Thanks.

Now go check the "History" section of that article; which, by the way, takes up about half of it.


Alright.

I'm trying to come up with other examples, but as the parent said, it then depends on how pedantic we want to be. Take Poincaré's spacetime: he'd still be working from two pre-existing ideas, "space" and "time", but the idea of combining the two is (feels?) quite novel and unexpected.

Going back some more, the notion of "space" (or that of "time") feels more primitive, less explainable in terms of other notions.


Yes, but the whole "combine things and see if they make other thing make sense" seems to me like a procedure that may be reproduced by pattern matching at a very large scale.


Creativity is one of the most problematic concepts in psychology. I am right now reading

https://www.amazon.com/Sounds-Bell-Jar-Psychotic-Authors/dp/...

where the authors (a psychologist and two literary critics) carefully tease apart the connection between creativity and psychosis. That connection is of course problematic, because insanity mostly gets in the way of being creative, and working through it leads to a much more serious definition of what "being creative" really means than one usually finds. (One thing they point out is that a third-rate artist (Andy Warhol?) can become quite prominent if they are good at marketing their work.)

People who are religious will make a theological argument to the effect that "God gave you the power to create when he created you" or "You can be creative because God is inside you".

Atheists may dismiss these arguments out of hand but it's a mistake to do so because of

https://en.wikipedia.org/wiki/Ontological_argument

in the sense that "God" can be defined as "the reason why there is something instead of nothing", which need have no relation to the image of some old patriarch on a throne. If we are made "in his image", we should consider the image revealed in a microscope, which shows that we are based on cybernetic principles that apply to the individual cell as well as to the whole organism, and how those principles apply to the evolution of language and culture as they do to our genetic endowment.

(Insofar as God can delegate his creative ability to you, can't you further delegate it?)

A vulgar version of this is Roger Penrose's "I can solve math problems because I am a thetan", where he claims to be exempt from the problems that Gödel, Tarski, and Turing warned you about; but since there is nothing complete or consistent about Roger Penrose, these don't apply (he can't solve Collatz and neither can an OT VIII!)

Muddy thinkers may reject the existence or relevance of God or not explicitly believe they are "a spirit in the material world" but often think there is something uniquely human about creativity (can other animals be creative?) but I sense that the ghost of the arguments above is behind that thinking.

In this transcript I get Python to create something that was never seen before and will never be seen again:

   >>> from uuid import uuid4
   >>> uuid4()
   UUID('21205a92-2611-4710-b120-4a94f5ccf2d9')
which is by no means interesting; real creativity involves creating something that is useful and/or expressive using certain resources and subject to some system of constraints. Insofar as some task is repeatable, creativity is involved in the creation of some process and/or system that makes the task repeatable.

As Edison put it, “Genius is one percent inspiration and ninety-nine percent perspiration,” so it is not so interesting that the LLM can generate novel (yuck, I hate that word; "novel" is the first word I delete when I have to squash a long paper title to fit into 80 characters) research ideas. I'll be impressed when it can fill out a grant application that gets funded.


I'd be curious what an LLM would be able to accomplish if it was given all pre-modern texts we have access to. Would it be able to come up with ideas we know as modern (political theories, philosophical ideas, scientific theories, etc.)?

I have a hard time believing that, if we fed an LLM all prehistoric speech uttered by humans, it would never escape the paradigms of those people, no matter what. If that is the case, then would relying on LLMs just get us stuck in our own paradigms and prevent true "progress"?


LLMs are just large language models; they're not capable of thinking or coming up with novel ideas. They could enable people to do so by providing food for brainstorming.


Anytime I see an argument of the form X is just Y an alarm bell goes off in my head. While it is descriptive of the components, these arguments ignore the possibility of emergent complexity. Sometimes things are more than just the sum of their parts.

The sun is just a bunch of hydrogen.

A computer is just a bunch of transistors.

Humans are just a bunch of cells.

An artificial neural network is just a bunch of matrix multiplications.

I personally think LLMs are extremely limited and overhyped. But this form of argument seems incorrect to me since it can be used to argue that LLMs cannot do things we already know they can do.


Probably overhyped from a business and media perspective. From a technological perspective, LLMs are the first thing we've ever built that can rocket past a Turing test.

LLMs are like mentally challenged autistic human beings. The fact that we even built such a thing is a milestone in humanity.


> "...they're not capable of thinking or coming up with novel ideas."

This claim is unfalsifiable given common definitions of "thinking" and "coming up with novel ideas".


No, it is very much demonstrably false.

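Here's a simple program, which doesn't even use an LLM, that will trivially enumerate all ideas (broken UTF-8 handling omitted for brevity):

  using System.Numerics; using static System.Console; using static System.Text.Encoding;
  for (var numeral = new BigInteger(0); ; numeral++) // count forever, printing each number's bytes as text
    WriteLine(UTF8.GetString(numeral.ToByteArray()));

For any given idea in English, LLMs will certainly get there faster.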
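To put the brute-force loop above in perspective, here is a worked check in Python rather than the original C#. For printable ASCII text, the loop prints a given string when the counter reaches the little-endian integer formed by that string's bytes, which is what `BigInteger.ToByteArray()` returns for non-negative values whose top byte is below 0x80. The helper name `steps_to_reach` is hypothetical.

```python
def steps_to_reach(text: str) -> int:
    """Counter value at which the brute-force loop prints `text`.

    Assumes printable ASCII, so the last byte is below 0x80 and
    BigInteger's little-endian encoding needs no extra sign byte.
    """
    data = text.encode("ascii")
    return int.from_bytes(data, "little")

print(steps_to_reach("hi"))  # 26984
# A 50-character ASCII sentence sits somewhere below 256**50 steps:
print(len(str(256 ** 50)))   # 121 digits
```

Two characters take tens of thousands of iterations; a single sentence is already on the order of 10**120.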


But it is verifiable. We could quibble over "who judges novelty," but I bet if there were regular examples of it doing so, and there were some community agreement the ideas were indeed suitably novel, we'd pretty quickly shout "existence proof!" and be done.


An LLM could come up with an infinite amount of new theories and ideas, of which >99% would be meaningless, if given enough time and energy.

Humans' ability lies in guiding the prompt (of AIs, and of the mind) toward what is worth knowing and which hypotheses are worth testing.

Choosing an action from a countably infinite set of actions is the general framework of free will.


I think the amazing thing about the LLM is that it's not a random text generator so it's not 99% meaningless.

I give it only 50% meaningless.


With nearly unlimited energy, if you set up an indefinite while-loop asking it to continue to expand and make new theories on the same token vector it would approach >99% meaninglessness.


Yeah that's because the only thing changing is some random seed. It's like looking at f(0) = 1 and f(0.00001) = 1.00001 and saying that the function can't produce anything novel because the answer is always less than 2. Hint: try f(99999).

Of course it's all meaningless because most of the input and output is virtually identical. Vary the input heavily and then you will approach 50%. Of course this is assuming the token vector is truly random in terms of subject matter.


It’s not just a random seed though, this would be attempting to squeeze everything out of the LLM network with all the seeds :)


If you wanted to squeeze everything out of the network it's not just varying the seeds. It's varying the token vector.


Agreed, I doubt the full squeeze would be 50% correct on new hypotheses though. Would be a fun PhD thesis if I went back to grad school :D


I've wondered that too. If we trained an LLM on only the scientific content that predates Einstein could it come up with general relativity?

How much coaxing would it require to get it there? What would that process be like?

Can we learn from that and get an LLM trained on all scientific material pre Einstein and post to discover new physics stuff from what we learn from that process?


>> Can we learn from that and get an LLM trained on all scientific material pre Einstein and post to discover new physics stuff from what we learn from that process?

The parts of an LLM that teach it language are not disconnected from the parts that teach it facts. Good luck teaching it language with only pre-Einstein data.


Why can't you just omit all modern content related to physics? There's lots of training data that doesn't mention any of Einstein's work.


Maybe you could, but any influence these concepts had on general culture might still leak in.


To me that's an interesting feature, not necessarily a bug.

How much would have to leak in to guide the LLM to the right conclusions? Is it a matter of quantity or quality? Can we successfully eliminate all leakage?
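A minimal sketch of the naive approach, assuming a hypothetical keyword blocklist and three invented documents: filtering on explicit mentions removes the obvious text but lets a paraphrase of the same idea through, which is exactly the kind of leakage being asked about.

```python
import re

# Hypothetical blocklist; a real filter would need far more than names and jargon.
BLOCKLIST = ["einstein", "relativity", "spacetime", "e=mc"]
PATTERN = re.compile("|".join(re.escape(term) for term in BLOCKLIST), re.IGNORECASE)

def keep(document: str) -> bool:
    """Return True if the document mentions none of the blocked terms."""
    return PATTERN.search(document) is None

corpus = [
    "Einstein's theory of relativity changed physics.",
    "The price of bread rose sharply this year.",
    "Clocks on fast-moving trains tick more slowly than clocks at rest.",
]

filtered = [doc for doc in corpus if keep(doc)]
for doc in filtered:
    print(doc)
```

The third document survives the filter even though it states the core idea. Catching it would require filtering on meaning rather than on strings, which is a much harder problem.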


i just read this paper on arxiv and im still trying to wrap my head around the implications so the authors got over 100 nlp researchers to write down novel research ideas and then had them review ideas generated by a large language model (llm) without knowing which ones were human generated and which ones were from the llm and the results are pretty fascinating the llm generated ideas were actually judged to be more novel than the human ones p value < 0.05 but slightly weaker in terms of feasibility i mean whats the point of having a novel idea if its not feasible right but still its pretty cool that the llm can come up with stuff that humans havent thought of before

and the authors are saying that this study highlights some of the open problems in building research agents that can generate novel ideas like the llm was bad at self evaluation it couldnt tell which of its own ideas were good or bad and they also found that the llm generated ideas that were too similar to each other lacking diversity i mean thats not surprising right llms are trained on huge datasets but theyre still just pattern recognition machines they dont really understand the context or the implications of what theyre generating

but heres the thing novelty is hard to judge even for experts i mean how do you even define novelty is it just something that nobody has thought of before or is it something that challenges our current understanding of the world and the authors are proposing a follow up study where they actually have researchers execute these ideas into full projects to see if the novelty and feasibility judgements actually translate into meaningful differences in research outcomes which is a great idea i mean thats the only way we can really know if these llms are useful for accelerating scientific discovery or not

anyway im rambling on now but i just think this is a really interesting area of research and im excited to see where it goes can we really use llms to accelerate scientific discovery and what are the limitations of these models and how can we overcome them etc etc


Your output would benefit from proper punctuation. Are you using some kind of speech-to-text application? Certainly looks like it.

Interesting comment nevertheless, if not somewhat difficult to parse.


It's a brilliant idea, but LLMs are predicated upon the massive data of the internet.

That being said, you could take a pre-2018 dataset and test it for discoveries/insights circa 2019 or 2020 and see how well it performs.
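The proposed experiment amounts to a temporal holdout. A minimal sketch, assuming each document carries a publication date (the `Doc` type and the sample records are hypothetical): everything before the cutoff is eligible for training, and everything after is held out as the set of "discoveries" to test against.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Doc:
    published: date
    text: str

def temporal_split(docs, cutoff):
    """Split documents into (training, held_out) around a cutoff date."""
    train = [d for d in docs if d.published < cutoff]
    held_out = [d for d in docs if d.published >= cutoff]
    return train, held_out

# Invented records, for illustration only.
docs = [
    Doc(date(2016, 5, 1), "older paper A"),
    Doc(date(2017, 11, 20), "older paper B"),
    Doc(date(2019, 3, 14), "newer paper C"),
    Doc(date(2020, 8, 2), "newer paper D"),
]

train, held_out = temporal_split(docs, cutoff=date(2018, 1, 1))
print([d.text for d in train])
print([d.text for d in held_out])
```

The date filter is the easy part; the hard part is judging whether the model's output actually matches what appeared in the held-out set.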


Is this how our simulation started?


Even if they can't, individuals are prone to inspiration from an LLM's attempt. Worth giving it a shot at least.


Strongly agree.

Back in high school, I put together a program called DreamPool that would just literally pick out a few nouns from a gigantic dictionary file and bubble them up in a little graphic of a well, and then I would sit quietly and spin those concepts around attempting to connect them together.
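A minimal sketch of the DreamPool idea as described, assuming a plain word list (the short inline list stands in for the "gigantic dictionary file", and the function name `dream_pool` is hypothetical): draw a few distinct words at random and leave the connecting to the human.

```python
import random

def dream_pool(words, n=3, seed=None):
    """Return n distinct words chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(words, n)

# Stand-in for a large dictionary file of nouns.
nouns = ["umbrella", "fan", "lighthouse", "glacier", "violin",
         "compass", "beehive", "telescope", "anchor", "lantern"]

print(dream_pool(nouns, n=3))
```

Passing a `seed` makes a draw reproducible; leaving it as `None` gives a fresh combination each run.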

LLMs are like a version of this on steroids and the potential as a tool for augmentation is huge.


DreamPool sounds very cool, almost like brain exercise!


Even if we've essentially created a rubber duck that is slightly easier to engage with, that sounds like a win.


This is true, I find LLMs to be the best mental lubricant ever created. Cures writers block and gets me out of creative and technical jams all the time.


I'm outlining a fantasy story I want to write, I've found using image generation (leonardo.ai) very helpful in nailing down what the world looks like and how it'll work, I've actually changed my mind on a few things based on what the AI generated.

Plus honestly it's just genuinely fun! I've resisted moving to one of their paid packages just yet because I think I'd spend all day on it!


Devil's advocate: writer's block and jams are symptoms, not bugs to fix.


Could you elaborate? Symptoms of what?


Lots of things, I think: lack of understanding, lack of motivation, the typical suspects. I think we look for easy solutions, like using LLMs (though not exclusively) to do busywork when we shouldn't be doing the busywork in the first place. Busywork is just one example; in creative work it could also be the extensive, time-intensive exploration required to advance. In jazz, for instance, spending hundreds if not thousands of hours on scales and arpeggios all over the instrument is a non-negotiable expense for accessing fluidity.

So while I think LLMs are great fun and I don’t judge anyone using them, personally wanting to use one for a difficult problem is a flag that I should allow more time to think before going ahead.


Not OP, but much writing advice with which I'm familiar, and my own experience, both suggest these are symptoms of an error made earlier and not yet recognized.

If I can't figure out where the plot could possibly go from here, go back and look for where I sent it off the rails earlier such that there's nowhere to take it now; if I can't find dialogue or action that fits, go back and find where I put the character in a situation they'd never get themselves in, or miswrote them to respond to it in a way they never would. Stuff like that, especially once it's had you stuck too long for inspiration latency still to be tenable as the cause.


Yep, and in general I think The Matrix got it right.

When discussing Trinity and Neo's visions, the Oracle states: "We can never see past the choices we don't understand."


Or have yet to realize that we made, yeah.


Good advice, thanks!


Writer's traffic congestion


For some reason my mind maps concepts from chess onto the real world and the other way around. The novel AI ideas in chess are usually combinations of several... well... bad ideas. People usually take on one bad idea at a time and try to make it work. If they succeed it is quite surprising because objectively it was a bad idea.

Our thinking is a lot less error-prone than that of LLMs, but we have to study for years to absorb prior art that LLMs mostly receive at birth by osmosis. An LLM won't be able to take on the truly stupid ideas, but it can combine large numbers of the somewhat stupid ones.

Like a chess position with lots of possible moves followed by lots of possible responses.


Taking a step back. Define "novel".

I have the idea for an induced draft umbrella. Stick a fan at the top under an opening.

Is that idea novel? I haven't seen it anywhere, it's just something I came up with. But it's not entirely novel, I'm just borrowing the concept of a fan, and an umbrella.

I don't feel this is entirely out of scope for what an LLM could describe in words?


Could rolling dice with matching phrases generate novel research ideas? (ok the metaphor is slightly off, but not a lot)


Maybe yes.

If you give it phrases from, say, the last five years' worth of published research papers, and it combines phrases at random and spits out the words, yes, in that there will be some interesting research ideas, and maybe even some that a human would not have come up with.

Unfortunately, they'll be buried in a blizzard of stuff that no human would come up with, because they are totally meaningless combinations. Finding the good ones is the issue - and it's not an issue that LLMs can currently solve.


Looks like @sama's o1 is killing PhD science questions. https://x.com/sama/status/1834283100639297910?t=iCRehNoBofMP...


Would you be able to tell if it did? There could be an obscure document in the training set that contains the idea. It seems like a very hard problem to definitively detect whether a concept came from the training set.


A single document doesn’t contribute much to the loss during base training. LLMs can absolutely memorize text if it’s duplicated in the training set, not so much if the document in question is obscure.
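One rough way to probe the question above, assuming you can search the training corpus (or a sample of it): measure what fraction of a generated idea's word n-grams also appear in any single corpus document. This is a hypothetical sketch with toy strings, and a low overlap does not prove novelty, since an idea can be paraphrased.

```python
def ngrams(text, n=3):
    """Set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def max_overlap(candidate, corpus, n=3):
    """Highest fraction of the candidate's n-grams found in any single corpus document."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return max((len(cand & ngrams(doc, n)) / len(cand) for doc in corpus), default=0.0)

# Toy stand-in for a training corpus.
corpus = [
    "put a fan at the top of an umbrella to pull air upward",
    "a punch card gives a free tenth drink to regular customers",
]

print(max_overlap("a punch card gives a free tenth drink", corpus))            # 1.0
print(max_overlap("train a model only on texts written before 1900", corpus))  # 0.0
```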


The difference between novelty and hallucination is feasibility. With integrated critique and feasibility checks you can eventually map the hallucination space into novelty.
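The generate-then-critique loop described here can be sketched as follows. Everything in it is a hypothetical stand-in: `propose` recombines phrases at random (in the spirit of the dice comment above) instead of calling an LLM, and `feasibility` is a placeholder scoring rule; in practice both would be model calls or human review.

```python
import random

def propose(phrases, rng, k=2):
    """Hypothetical generator: join k randomly chosen phrases into one 'idea'."""
    return " + ".join(rng.sample(phrases, k))

def feasibility(idea, known_feasible):
    """Hypothetical critic: fraction of the idea's parts that are on a vetted list."""
    parts = idea.split(" + ")
    return sum(part in known_feasible for part in parts) / len(parts)

def generate_and_filter(phrases, known_feasible, n_candidates, threshold, seed=0):
    """Generate many candidates and keep those the critic scores at or above threshold."""
    rng = random.Random(seed)
    candidates = {propose(phrases, rng) for _ in range(n_candidates)}
    return sorted(c for c in candidates if feasibility(c, known_feasible) >= threshold)

phrases = ["sparse attention", "curriculum learning", "quantum tunneling",
           "retrieval augmentation", "medieval heraldry"]
known_feasible = {"sparse attention", "curriculum learning", "retrieval augmentation"}

for idea in generate_and_filter(phrases, known_feasible, n_candidates=50, threshold=1.0):
    print(idea)
```

The quality of the result depends almost entirely on the critic; a crude one like this simply throws away anything unfamiliar, novel or not.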


Yes.

LLMs have helped me generate 1 novel discovery: that every top 10 programming language has a single creator (https://pldb.io/blog/aSingleCreator.html).

They also helped me generate this map yesterday (https://pldb.io/blog/whereInnovation.html), which is the most comprehensive map of programming language creation and software innovation ever created.


How do you define single creator? The person who typed the first keystroke?

It seems pretty silly to claim that Algol 60 has thirteen creators. Sure, Wikipedia lists that many names in some table, but that is not really a significant statistic, now is it? Java 22 probably has hundreds of people who contributed to it.

Fun exercise though :)


> How do you define single creator?

The creators define how many creators there were.

It's all open source and anyone can fix any mistakes by updating one line (for example, here's the creators entry for Python: https://github.com/breck7/pldb/blob/b8ae74253733e4aa0fb57d26...).

We generally don't have any disputes but there's the occasional error and always open to pull requests to fix those.

But you do bring up a good point about contributors/maintainers. Adding that data is definitely on the priority list, but might be a 2025/2026 thing.


Apache is credited with things it didn't initiate but accepted after another company gave them up (e.g., Cassandra).


shuf -n 3 /usr/dict/words will also sometimes generate novel research ideas.


Harmony Korine, the poet (done at SXSW 2010 with GordonandtheWhale.com)

https://youtu.be/PqvAmlnJTfU?si=C8M7IGnJ_iYJE-8_


[flagged]


There once was a user online,

Who thought AI verse wasn't fine.

"These LLMs can't reason!"

He cried with displeasure,

"Their rhymes are just data, not mine!"

-

He typed with furious might,

Certain robots can't versify right.

But with each posted screed,

The irony grew indeed -

His arguments lacked the insight.

-

For while he debated with zeal,

AIs composed with appeal.

They reasoned through meter,

Made metaphors sweeter,

And rhymed with poetic ideal.


No.

Perhaps we can use LLMs to invalidate patents. Want to check a patent? Just download an LLM trained before the patent's priority date, then ask it to produce the work. If it succeeds, you have invalidated the patent because the work was not novel.


Huh.

Regardless of folks' opinions on the actual plan, does this idea imply an interesting question?

In some sense all of the ideas in a book already exist. Before I read the book, they haven’t been put through a process of being interpreted by my eyeballs and ingested into my brain. But, they do already exist.

Do the ideas in an LLM latent space (or whatever) already exist? They haven’t been read or interpreted yet, but the sentient cognition that goes into creating the idea has already happened.

What does it mean for an idea to exist anyway?



Conversely, you may ask: do we really want to grant patents for ideas that can be generated simply by asking an LLM?



