Fake HN titles generated by GPT-3 (filippo.io)
95 points by tmlee on Nov 26, 2022 | 79 comments


Ask HN: Is there another HN beginning with a “h”?$

Show HN: Daily XLSX to-do list with attached spreadsheet as material*

Banning JavaScript from web pages is bad for the user

Has Google become too social?

SQLite development from scratch from scratch

Armor-piercing lasers are not shooting lasers but missiles

We made a public blockchain off-chain

Apple sued for pricing user data against provider who did not provide refunds

100% Embarrassing Haskell Builds

Heroku Compose is not fit for purpose

Pain Enhancer

The distance between reality and fantasy grows ever smaller.


My favorites so far:

"Ask HN: Why do brands fail to realize that consumers want snacks?"

"Ask HN: Why does everyone assume everyone is “too technical?”"

"Ask HN: HN is now running on GCP. Why?"

"Incest" (i'm not kidding)

"Tell HN: Switzerland is replacing its central bank with a peer to bank banking company"

"Starlink—Smart RFID Hotspot"

"Kryptos: A robot written in Rust who needs to talk to humans"

"Cloudflare was hacked and replaced with an NSA spyware-run Proxy"

"I poked a hole in a wall and sent the image through a digital lens"


“Zxyn – A C escape language”


Apparently it’s not even the most realistic it could be! From the footer:

> We used Ada because titles generated by the larger Curie model are barely distinguishable from the real thing, despite being original.


This has the disconcerting effect of making the real HN titles look like they're generated by GPT-3.


Eventually we will get AI based title analysers that simulate the conditions to predict and optimise where your title will land on the front-page listing order.


I think a case could be made that ML is automated growth hacking.


Looked away for a while talking to someone, tried to click a link when I looked back :)

Also unrelated, but a good one: "Tether issue: under $1B and all of the tether tokens suspended". This would definitely get my click :)


I absolutely want to read some of these. There needs to be a way for people to post and rank the best matching articles that actually exist. I dream of the day that generative AI is repurposed to act as a more effective search engine.


I didn’t need to look away to click a link. Spoiler: it doesn’t work, nor do the comments


"Glassdoor: Employees have the right to be anonymous, but the CEO doesn't"

Hard to argue with that logic!


Personal favourite:

Mark Zuckerberg: “Haters are supporters; hate is a solution”


I came across this one: "Why I Love Vapor (2018)"


Mine was “Elon Musk’s armies are about to begin the slaughter”


It’s not GPT3 it’s just posting from the future!


Oh god, I sense this thing is trying to make us write essays to back up the titles.

My favorite so far: "Why isn't Linux perfect?"


I got "Why encryption doesn’t work", which does feel like an essay that should exist, if it doesn't already.


"Ask HN: Is HN worth caring about?" is by far the best one I've gotten.

"James Webb Telescope Recall" was pretty amusing as well.


> The Impossible History of Soil Temperature in the Earth’s Atmosphere

This one seems too good to be AI


This is good, and reminds me of the "Fake Paper Generator" (SCIgen) that some MIT students made using context-free grammars in 2005: https://pdos.csail.mit.edu/archive/scigen/

Unfortunately it doesn't work anymore, but the paper titles it generated were all plausible yet ridiculous CS titles, e.g. "Rooter: A Methodology for the Typical Unification of Access Points and Redundancy".
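For anyone curious how the context-free grammar trick works, here's a toy sketch: each nonterminal maps to a list of productions, one is picked at random, and everything not in the grammar is emitted as literal text. The grammar below is made up for illustration (only "Rooter" and the "Access Points and Redundancy" phrase come from the example title); it is not SCIgen's actual grammar.

```python
import random

# Toy grammar, invented for illustration; NOT SCIgen's real rules.
# Keys are nonterminals; any string that is not a key is a terminal.
GRAMMAR = {
    "TITLE": [["SYSNAME", ": A ", "NOUN", " for the ", "ADJ", " ", "GOAL"]],
    "SYSNAME": [["Rooter"], ["Thud"], ["Bauble"]],
    "NOUN": [["Methodology"], ["Framework"], ["Case Study"]],
    "ADJ": [["Typical"], ["Robust"], ["Probabilistic"]],
    "GOAL": [["Unification of Access Points and Redundancy"],
             ["Emulation of Lambda Calculus"]],
}


def expand(symbol, grammar, rng):
    """Recursively expand a symbol by picking one of its productions at random."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    production = rng.choice(grammar[symbol])
    return "".join(expand(part, grammar, rng) for part in production)


def generate_title(grammar=GRAMMAR, seed=None):
    """Generate one title; passing a seed makes the output reproducible."""
    return expand("TITLE", grammar, random.Random(seed))
```

Calling `print(generate_title())` gives something like "Thud: A Framework for the Robust Emulation of Lambda Calculus". A real generator like SCIgen just has a much bigger grammar.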


I got "Install the NuGet package manager on a Mac" which I'm still not sure is ridiculously infeasible or the kind of hack somebody might actually manage to pull off. Definitely HN-worthy if they manage it!


It's actually super easy! NuGet.exe has had support for Mono as long as I can remember :)

In fact, I think you can even install NuGet via Homebrew.


TIL :)


The real trick here is to use this site as a source of inspiration for legitimate blog posts.


Is there a dataset of HN titles? This made me want to fiddle with this, but step one is to get the data, and I don't want to crawl HN if the data has already been collected.


There are a few sources. There's the official API [0], the Algolia search API [1], and the BigQuery dataset which is pretty up to date [2].

I used the Algolia search API, it has extremely generous rate limits and page limits.

[0]: https://github.com/HackerNews/API [1]: https://hn.algolia.com/api [2]: https://console.cloud.google.com/bigquery?p=bigquery-public-...
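A minimal sketch of pulling story titles from the Algolia API, assuming the `search_by_date` endpoint with `tags=story`, `hitsPerPage` and a `numericFilters=created_at_i<...` cursor to walk backwards in time (parameter names as I remember them from the public docs at [1]; double-check before relying on them). The page size of 100 is an arbitrary choice, not a documented limit.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; see the Algolia HN API docs linked as [1].
ALGOLIA_SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"


def build_url(before_ts=None, hits_per_page=100):
    """Build a search URL for stories, optionally older than a Unix timestamp."""
    params = {"tags": "story", "hitsPerPage": str(hits_per_page)}
    if before_ts is not None:
        params["numericFilters"] = f"created_at_i<{before_ts}"
    return ALGOLIA_SEARCH_BY_DATE + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Return (title, points, num_comments) tuples from a decoded response."""
    rows = []
    for hit in response.get("hits", []):
        if hit.get("title"):
            rows.append((hit["title"],
                         hit.get("points") or 0,
                         hit.get("num_comments") or 0))
    return rows


def fetch_titles(n_pages=3):
    """Page backwards through stories, newest first (needs network access)."""
    titles, before = [], None
    for _ in range(n_pages):
        with urllib.request.urlopen(build_url(before)) as resp:
            data = json.load(resp)
        hits = data.get("hits", [])
        if not hits:
            break
        titles.extend(extract_titles(data))
        # Next page: everything strictly older than the oldest hit seen so far.
        before = min(h["created_at_i"] for h in hits)
    return titles
```

Using the timestamp as a cursor sidesteps any cap on page numbers, at the cost of possibly skipping stories that share the exact boundary second.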


There's an API[0] but it's frustratingly limited in capabilities (albeit not rate-limited). You'll have to iterate over all post IDs, download each post as JSON and get the titles that way.

There's also a Google dataset but I don't know the URL for it or if it's up to date.

[0]https://github.com/HackerNews/API


> frustratingly limited in capabilities

What's missing precisely? Seems to be good enough for every use I could think of.

One time I even downloaded every single item from it with a threaded fetcher of, I think, 16 threads, iterating from 1 up to the latest ID, and it was done in about 2 hours.
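A rough sketch of that kind of threaded crawl against the official API, assuming the `maxitem.json` and `item/<id>.json` endpoints from [0] and the 16-worker pool mentioned above. Retries, backoff and resuming are left out; the `fetch` parameter is a hypothetical hook so the logic can be exercised without the network.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """GET {API}/{path}.json and decode it (returns None for missing items)."""
    with urllib.request.urlopen(f"{API}/{path}.json", timeout=30) as resp:
        return json.load(resp)


def story_titles(start, stop, fetch=fetch_json, workers=16):
    """Fetch items start..stop-1 in parallel; return {id: title} for stories."""
    titles = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        items = pool.map(lambda i: fetch(f"item/{i}"), range(start, stop))
        for item in items:
            # Deleted or missing items come back as null or without a title.
            if item and item.get("type") == "story" and item.get("title"):
                titles[item["id"]] = item["title"]
    return titles


def crawl_all(chunk=10_000, workers=16):
    """Walk from 1 up to the current max item ID, one chunk at a time."""
    max_id = fetch_json("maxitem")
    titles = {}
    for start in range(1, max_id + 1, chunk):
        stop = min(start + chunk, max_id + 1)
        titles.update(story_titles(start, stop, workers=workers))
    return titles
```

Chunking keeps the executor from queuing tens of millions of pending futures at once, and gives natural checkpoints if you want to add resuming.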


No ability to directly download threads with a single request, for one, or query it like a database to sort or filter results, exclude unwanted fields, etc.


Those things should be trivial to achieve with most general-purpose languages, as the API is so simple. No need for pagination or other things: just request items by ID recursively and you get the full thing, then filter/select whatever you want.

Pseudo-code to show how simple it would be:

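    const API = "https://hacker-news.firebaseio.com/v0";

    // Fetch an item, then recursively fetch its replies (listed in `kids`).
    async function get_thread(id) {
      const item = await (await fetch(`${API}/item/${id}.json`)).json();
      if (item && item.kids) {
        // One HTTP request per child; siblings are fetched in parallel.
        item.fetched_children = await Promise.all(item.kids.map(get_thread));
      }
      return item;
    }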
(untested, but you don't really need more than that, besides checking if the item was deleted)
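The "filter/select whatever you want" step afterwards could look like this in Python, assuming each item carries its replies in a `fetched_children` list as in the snippet above. The default field selection is just an example. Deleted or dead items are skipped, but their replies are still kept, since a deleted comment can have live children.

```python
def flatten_thread(item, fields=("id", "by", "text"), depth=0):
    """Depth-first walk of a fetched thread.

    Returns one dict per visible item with only the selected fields plus
    its nesting depth; deleted/dead items are dropped but their children
    are still visited.
    """
    if not item:
        return []
    rows = []
    if not (item.get("deleted") or item.get("dead")):
        row = {f: item.get(f) for f in fields}
        row["depth"] = depth
        rows.append(row)
    for child in item.get("fetched_children") or []:
        rows.extend(flatten_thread(child, fields, depth + 1))
    return rows
```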


Yes, and I've done it. But you still wind up having to make a separate request for each item, which makes building threads incredibly slow. It's also a waste of time if you're filtering out anything, because you still have to make the request and download the item to filter it out.

Which is why it would be preferable for the API itself to support these features.


Can you somehow fine-tune GPT-3 on a dataset? I just assumed the OP generated them using a prompt like "top hacker news threads" or something like that.


"Redtube’s Mark Twain Is a Scientologist Now"

Now I totally want to read that article. Please let the AI write it:)


It averages around $0.0005 per request according to the footer. Could this end up costing the author quite a bit due to HN traffic? Also, what's stopping bad actors from writing a script to continuously fetch the page?

I wonder if some sort of caching might help lower costs.
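A minimal sketch of the kind of caching meant here: serve the same generated batch to everyone for `ttl` seconds and only call the model again once it expires. `generate` stands in for whatever function actually calls the API (hypothetical), and this version is not thread-safe; a real server would want a lock around `get`.

```python
import time


class TTLCache:
    """Hold one generated value and regenerate it at most once per `ttl` seconds."""

    def __init__(self, generate, ttl=60.0, clock=time.monotonic):
        self._generate = generate  # hypothetical: the function that calls the model
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires = float("-inf")  # forces generation on the first get()

    def get(self):
        now = self._clock()
        if now >= self._expires:
            self._value = self._generate()
            self._expires = now + self._ttl
        return self._value
```

With a 60-second TTL that's at most 1,440 generations a day, or about $0.72 at $0.0005 each, no matter how many people hit refresh. It would of course defeat the "fresh batch on every load" part of the joke.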


It's part of the bit.

There's intentionally no caching, every batch is warm from the AI oven.

HN usually drives around 10k visits, so organically it's going to be well within my Saturday night budget. If someone decides to hammer it, well, the OpenAI account has a hard limit of $20/month. It will live until it's killed I guess.


> HN usually drives around 10k visits

You have to account for this not being a regular visit, but rather (I guess) 5-50 "visits" per visit, as people mash F5 to get more titles.

> OpenAI account has a hard limit of $20/month

That makes me feel better about the couple refreshes I did myself :).
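Plugging in the footer's figure of roughly $0.0005 per request, and assuming every page load or refresh is exactly one request, the numbers in the two comments above work out like this:

```latex
\[ \frac{\$20}{\$0.0005 / \text{request}} = 40{,}000 \text{ requests per month} \]
\[ 10{,}000 \text{ visits} \times (5 \text{ to } 50) = 50{,}000 \text{ to } 500{,}000 \text{ requests} \approx \$25 \text{ to } \$250 \]
```

So under those assumptions even the low estimate runs into the $20 hard limit, which is exactly what caps the damage.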


Nothing. And in fact that’s exactly what happened to me. Some fella from HN spawned like 45 simultaneous wgets in a loop to cause maximum financial damage. All of a sudden we see Firebase’s cost graph go vertical.

It happened after I mentioned “just be kind, please! Theoretically this could cost a lot of money.”

So there’s at least one person who will do exactly this just for fun.

Firebase customer support was super cool about it, but it still knocked us off the paid tier.
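One common defence against a single client hammering a paid endpoint is a per-client token bucket. The sketch below is generic, not something Firebase or the OP's site provides: it allows `rate` requests per second per client ID (e.g. an IP address, an assumption here) with bursts up to `capacity`. It won't stop someone spreading requests over many IPs, and the `buckets` dict grows without bound unless old entries are evicted.

```python
import time


class TokenBucket:
    """Per-client rate limiter: `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}  # client_id -> (tokens, last_seen_timestamp)

    def allow(self, client_id):
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to the time elapsed, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False
```

Requests for which `allow(ip)` returns False would get an HTTP 429 instead of triggering a paid API call, so 45 looping wgets from one machine would cost roughly `rate` calls per second instead of 45.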


>All of a sudden we see Firebase’s cost graph go vertical.

think of all the money you have saved by not having to hire a hundred engineers to maintain your website's infrastructure!


Microsoft considered killing the startup community, says CEO

Tesla’s strange new CEO: ‘This is not the right place for me to be successful’

Puerto Rico is receiving $6

Earth’s largest cloud is the remains of a comet (2019)

A deep dive into C++ features that make your life harder

Windows 10 is basically iOS without the app drawer

Never trust a trusted set of data (2013)

The Rust Tax

Terabot: A remote-controlled toy train that trains riders to do your work

4-Hour Workweek Passes UNH’s Students

Why does Brexit happen?

NASA Launches High-Definition to Broadcast Its Infomercials

I made a living building software that no one needs

Scientists have created a new kind of blood clot

Two weeks without Instagram

Life on the ocean floor and how humans inhabit it

Taxing robots is immoral, right? Because it’s a) easier, and b) it would vastly reduce human labor

Why are people so glad to have their memories wiped?

Does the US lead the world in online harassment?

A 50-year-old man shops for lamb chops on the Internet for the first time

They said it was impossible to dig for cobblestones without being coated with mud

--

Ok, that's it. Time to abandon HN and go over to this site.

Actually, scratch that - I'm just moving to the universe this bot is from.


Hurricane Florence will likely “peel off” the US Atlantic Coast, US Coast Guard spokesman says

Getting the right amount of memories right – the culture of memory buffing in bats

The art of lithium-ion dynamite (1995)

We built the game that makes it harder to punch people

I posted this on Hacker News, and now it's being cut off from my website (2021)

Show HN: I built an acrylic web interface that resembles my laptop

Ask HN: How come not other languages?

80-bit integers get twice as big

EBay fired me for praying during my contract negotiations

Oregon Community College Students with Common Hyperparasites (2018)

New body cameras that didn’t see a shooting

Show HN: Awk – The ultimate search engine

The Complete Code of the IWK-01 Space Gun

Write It Down: Probably

NASA's Hubble observatory detects earliest known appearance of crude grease on objects around the world

Bitcoin collapses below one BTC

Show HN: I made a web app for making to-do lists

Hollywood is turning to Russia as a model of “positive cultural integration”

--

Ok it's just interviewing for The Onion at this point.


Beware, that universe might be closer to the singularity than this one:

Google Chrome's 'enhanced' pop-up blocker opens an emergency emergency_auth package after public disclosure


Yes,

Google CEO Jeff Weiner's Quest to Turn the AI Resource Engine Into an Effective Leader

is a very worrying development.


"Ask HN: I'm working full time on my blog and still have not found a sustainable way to make money"


Ask HN: How do you limit your son's software interest?

Show HN: A tool to quickly improve your skills

Amazing stuff.


@rachelbythebay's HN spoof is still my personal favorite: https://rachelbythebay.com/fun/hrand/


I want the links clickable and AI generated discussion threads! Please! Please!!


It reminds me of Subreddit Simulator


"French technical community in Bastogne after a brush with German Army"

"US sets 10-minute work hours as government turns to technology, not people"

"China Urges First Time Travelers to Avoid Underwear Sales"


On mobile, after clicking on the link and getting distracted for a second, I could not tell that I was not on HN! I even tried to click multiple links until I reached the bottom of the page and remembered!

Good job with this!


Right. The only reason I didn't get confused for long is because I use a userstyle to render HN in dark mode, and this one displayed "plain". But my immediate reaction wasn't "hey, that's fake HN", but "hey, why is this HN tab rendering in default style?!".

Good job, OP!


Would love to see it go a step further and do the same thing for comment threads as well. Think I could possibly lose hours on a site like that rather than minutes. This was a lot of fun as well though.


The various subreddit simulator subreddits are a decent amusement.


"SpaceX is thinking of a rocket rail system"

This is something I can support


> Show HN: Rhapsody.fm – Send and receive messages using Discord

What an invention!


If only there was a way


"What Microsoft needs to do to become a large tech company 46 points by a machine learning model | hide | 24 comments"


I refuse to believe this one wasn’t snuck into the outputs by a human:

> Tesla Co-CEO says the autonomous vehicle ‘is for chancers’


got another one about this Tesla Co-CEO

> Tesla Co-CEO is trying to raise $800M for T4, but details are sketchy


Well now I need to know more about the "Mystery of the frozen frog latin speaker (2012)"


"Pornhub was around for 18 years before people started thinking about it [video]"


Pick the good ones, and write blogs on them, then post on HN ... Profit!


Not fair. There are some articles on there I really want to read.


Top title: “The baby I was supposed to deliver on twitter”.

Man this is good.


Superb and timely. Although "Silicon Valley’s single biggest problem is that it’s the clearest way of doing things" is just garbled.


First warp drive test successfully conducted – US engineers

Nice


> "Apple is so petty, it may murder your kid"

> Git as a platform for mentalillness

What!?

> Show HN: Open-Source PyXMP Cryptographic Algorithm

I'd love to see that!


That's wrong on so many levels.

"ToddlerPorn.com is a website for parents to help their toddlers get off porn"


"Ask HN: What's your strategy for winning the lottery?"

"Computer Science at MIT Does Not Exist."

Still other ones I would use as writing prompts for a blog. It would be about things an ML model produced because it thought this was what you would think would be popular. (I defy any ML model to craft a more painful sentence.)

"The first hour of a movie is often worth the whole movie"

"Transhumanist manifesto: explore my visions for the future"

"Fill up on technology. Technology freaks out. Tell your kids to build a crisis framework"

These are amazing.


This is incredible and hilarious


It'd be so fun if you could generate comments for each of the threads as well.


“Progress on addressing the massive crisis in electrical chargers”

I’m glad to hear it!


When there is no new content on HN, I'm gonna refresh this one.


“The experience of a young black man in a white prison (2020)“


My favourite was "Crowdfunding Scam - Monero"


Interested in the process for fine-tuning GPT-3 for this


Super easy, just took 10k titles/comments/points from the Algolia API, formatted them as JSON Lines like the following with jq, and fed them to the very well built and documented openai CLI.

    {"prompt": "A plausible Hacker News title:", "completion": " The Feynman Lectures on Physics (1964) (280 points, 62 comments) END"}

The space at the beginning of the completion is for tokenizing, and the END token is for use as a stop token in the generations.
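For anyone who'd rather not use jq, a rough Python stand-in for that formatting step, plus the reverse step of cutting a generation at the END stop token and splitting it back into title, points and comments, might look like this. The prompt string and line layout are copied from the example above; the function names and the regex are my own assumptions, not the author's code.

```python
import json
import re

PROMPT = "A plausible Hacker News title:"
STOP = " END"


def to_jsonl_line(title, points, comments):
    """One fine-tuning example: leading space in the completion, END as stop token."""
    completion = f" {title} ({points} points, {comments} comments){STOP}"
    return json.dumps({"prompt": PROMPT, "completion": completion})


def write_jsonl(rows, path):
    """Write (title, points, comments) tuples as a JSON Lines file."""
    with open(path, "w", encoding="utf-8") as f:
        for title, points, comments in rows:
            f.write(to_jsonl_line(title, points, comments) + "\n")


# Greedy title match, so titles containing their own "(...)" still parse.
COMPLETION_RE = re.compile(
    r"^\s*(?P<title>.*) \((?P<points>\d+) points, (?P<comments>\d+) comments\)\s*$"
)


def parse_completion(text):
    """Cut a generation at the stop token and split it into its parts."""
    text = text.split(STOP, 1)[0]
    m = COMPLETION_RE.match(text)
    if not m:
        return None
    return m["title"], int(m["points"]), int(m["comments"])
```

`json.dumps` uses `", "` and `": "` separators by default, so each line comes out in the same shape as the example.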


Thanks!


A Caveman’s Guide to C++ (2009)

I am crying.


Where’s “Tiny X written in Y”?



