Hacker News

Actually seems absurdly simple now, but sometime last year I was trying to figure out what I'd need to tow my daughter's car cross country with my truck: what are the trailer/dolly options, what do they cost, can my truck actually tow the combined weight, etc.

I started out prompting ChatGPT kinda how I would with Google, one small prompt at a time, asking about various details. But after one or two of those I just tried "I want to tow a car of make A with my truck model B, from point C to point D, what are my options?" And it wrote me a report with comparison tables and computed towing weights and other details for different options.

At that point, I was like "Oh. This is different. And it's just the beginning."
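For what it's worth, the weight part of a report like that boils down to simple arithmetic you can check yourself. A minimal sketch, assuming made-up numbers throughout (the tow rating, car weight, carrier weights, and the 90% safety margin are all hypothetical, not from any real spec sheet), and ignoring things like tongue weight and payload:

```python
def can_tow(tow_rating_lb, car_weight_lb, carrier_weight_lb, margin=0.9):
    """Check a towed load against a tow rating.

    The towed load is approximated as car + carrier (dolly or trailer).
    `margin` is an assumed safety factor, not an official figure.
    Returns (ok, total_lb).
    """
    total = car_weight_lb + carrier_weight_lb
    return total <= tow_rating_lb * margin, total


# Hypothetical figures for illustration only; look up the real ones
# in the owner's manual and the rental company's spec sheet.
TOW_RATING_LB = 7000
CAR_WEIGHT_LB = 3300
carriers = {"tow dolly": 750, "car trailer": 2200}

for name, carrier_lb in carriers.items():
    ok, total = can_tow(TOW_RATING_LB, CAR_WEIGHT_LB, carrier_lb)
    print(f"{name}: {total} lb towed, {'OK' if ok else 'over the margin'}")
```

Running the same check by hand against the manufacturer's numbers is a cheap way to verify whatever the model reports.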


Similarly, I used gen AI to review a real estate purchase. I provided Zillow listing photos, serial numbers of all appliances, the electric panel, and photos of a few additional, not-pictured areas that I took during the walk-through.

I prompted the AI to write a report as if it were a home inspector, and it actually did a better job and identified some issues the paid $750 inspector missed.


From pictures alone? What are some examples?


It noticed a flooding area due to low grass by the walkout door. It noticed mixed 15A and 20A receptacles on the same circuit. It noticed warped siding and recalled circuit breakers still in use.

15A and 20A receptacles on the same circuit sounds fine as long as it's a 20A circuit? And how could it tell which outlet is on which circuit?

It can’t, but it’s read reports before so it sure can simulate an answer.

What, the Zillow listing of your home doesn't have pictures of mixed 15A and 20A receptacles on the same circuit that an AI caught but that an inspector missed?

Is that what you're telling us??


Good thing you didn't want to wash the car on your way.

It very plausibly might have been totally wrong.

Out of laziness I several times asked Claude and ChatGPT each for some torque figures and other simple, hard data related to my dirt bike. They often got it completely wrong, but were full of confidence every time. I never trust LLMs with hard data unless I RAG the PDF into the context, and even then it's sketchy.


Dates matter. Questions I asked about my Mazda a year ago that got total hallucinations were answered very well this year. To me it feels like the early days of computing. What was not possible one year became possible when a new generation CPU or GPU came out, and you have to consistently re-evaluate your expectations or else you'll miss the things that others are discovering with fresh eyes.

I made this personal 'benchmark' of odd and strange questions a few years back when this took off, and I would keep re-running these questions whenever some big news came out about a new model, also going back and forth between the different companies to see where they all stood. (Obviously with clean cache/new accounts.)

10 questions: it went from only getting past question 3-4 (in 2023), to reaching the last question while still hallucinating (last year), to providing sources pulled from really obscure books (this year).

For example, one of the harder questions was about the transition of a particular 30 second portion of a background song used in a 30+ year old Bond film that was only played once in the entire film. It went from totally making up nonsense, to accurately describing the music theory definition of the transition (called a 'stinger'), to also explaining why it was done in that particular scene of the film and providing sources from a snippet of an unrelated interview with the composer explaining his mindset at the time.

Maybe this isn't considered a real benchmark as it's not reproducible, but for a 'personal benchmark' I came away impressed. I would encourage everyone to define their own benchmarks and 'tests' and to consistently challenge the models to see if there are any meaningful improvements. Now I treat the AI as something to stay skeptical of but also to always consider what it proposes as an answer (i.e. don't ever dismiss it outright). I sometimes wonder if this is slowly messing up my biases, and maybe that's what Altman, Amodei and others want.
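If anyone wants to try the same thing, a personal benchmark can be as small as a list of questions plus a pass/fail check for each. A rough sketch, assuming a hypothetical `ask` function that wraps whichever model you're testing (here it's a stand-in that returns canned text, so no real API is called), and with example questions and checks invented for illustration:

```python
from datetime import date


def run_benchmark(ask, questions):
    """Run each (prompt, check) pair through `ask` and tally the results.

    ask:       callable taking a prompt string and returning an answer string
    questions: list of (prompt, check) where check(answer) returns True/False
    """
    results = [(prompt, check(ask(prompt))) for prompt, check in questions]
    passed = sum(ok for _, ok in results)
    return {
        "date": date.today().isoformat(),  # so runs can be compared over time
        "passed": passed,
        "total": len(results),
        "results": results,
    }


# Stand-in for a real model call (hypothetical; swap in your own wrapper).
def fake_model(prompt):
    if "transition" in prompt:
        return "A stinger is a short musical accent."
    return "I don't know."


# Example questions with deliberately crude keyword checks.
questions = [
    ("What is this kind of musical transition called?",
     lambda a: "stinger" in a.lower()),
    ("What is the torque spec for the rear axle nut?",
     lambda a: "nm" in a.lower()),
]

report = run_benchmark(fake_model, questions)
print(f"{report['passed']} of {report['total']} passed on {report['date']}")
```

Keyword checks are crude, so for the harder questions you'd still grade the answers by hand. The point is just keeping the same questions and a dated record per run.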


Hard numbers, no. Even for high level concepts and theory you need to triangulate: prompt from different angles, across different models, and figure out what overlaps to build a mental model that's, even then, roughly 80% correct. It's better than Google, but the information isn't free.
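The "figure out what overlaps" step can be sketched as a simple vote count. This assumes you've already boiled each model's answer down to a set of short, comparable claims by hand (that's the hard part and isn't shown), and the `min_votes` threshold and the example claims are just illustrative:

```python
from collections import Counter


def overlapping_claims(answers, min_votes=2):
    """Keep only claims that appear in at least `min_votes` answers.

    answers: list of sets of normalized claim strings, one set per
             model or per prompt variation.
    """
    votes = Counter(claim for claims in answers for claim in set(claims))
    return {claim for claim, n in votes.items() if n >= min_votes}


# Hypothetical claims extracted from three different models' answers.
answers = [
    {"needs a 20A breaker", "uses 12 AWG wire"},
    {"uses 12 AWG wire", "requires GFCI"},
    {"uses 12 AWG wire", "needs a 20A breaker"},
]

print(sorted(overlapping_claims(answers)))
```

Agreement between models isn't proof, since they can share the same wrong training data, so this only narrows down what's worth checking against a real source.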

It wasn’t wrong, though, in my case.

Fascinating; you used a non-deterministic tool - one that disclaims its own accuracy - to calculate critical information that could result in serious damages or physical injury? Did you like, double-check the results?

One must imagine how many claims have been denied by insurance companies for doing something like this...



