Hacker News | new | past | comments | ask | show | jobs | submit | anonfunction's comments

I've been doing bias and misaligned behavior research, creating custom private eval suites to test and compare models. Claude Opus 4.7 is heavily biased and presents clear regulatory and reputational risk.

It seems the initial product footprint tries to sidestep this problem by not giving the agents control over whom to lend to or which applications to approve. Even so, I think it's quite an optimistic read on their end. Happy to share reports with anyone who's interested (montana@latentevals.com), especially if you work at a frontier model lab and are interested in plugging my evals into your RL systems!


Slightly related: I used Opus 4.6 to help me write marketing copy and ideas for my baby-naming app. It understood the vibe I was going for (elation at discovery, curiosity, shared experiences), while 4.7 instantly wanted to pit the couples against each other (really highlighting the he said/she said), and the marketing copy went from "find a name easier" to "Our new feature is great. You're welcome." I can't get it to drop the snarky sass no matter how much I change CLAUDE.md, the brand voice, etc.

All I did was upgrade Claude Code and use the new model. It most definitely exhibits misaligned behavior (compared to 4.6).


I tried Opus 4.7 for two days before I started beginning every session with "claude --model claude-opus-4-6".

I assume that 4.6 will become unavailable at some point, but I hope not any time soon. 4.7 hit usage limits faster, didn't do anything obviously better, and had more annoying behaviors in other aspects. I don't know if this is strictly a model issue or if there are also problems with how it's harnessed through Claude Code. I'm not willing to spend more time digging into it until I'm forced to.


Join me in a petition for them to open-source 4.6 as their first open model! It'll be like Gemma 4, but good enough for all the coding we do.

Nobody is using LLMs to make lending decisions. They are using LLMs to read, extract, and audit the supporting documents that feed into normal, well-tested, compliant, rules-based underwriting systems. And firms A/B test against humans doing the same work. The outcomes you are looking for are metrics like faster results delivered back to customers, with fewer mistakes, less fraud, and better compliance than a comparable human-only process.
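To make the division of labor concrete, here's a minimal sketch of that architecture. Everything here is hypothetical (the field names, the DTI threshold, and the stubbed extraction step are all made up for illustration); the point is that the model only produces structured fields, while the decision stays in deterministic, auditable code:

    # Hypothetical sketch: the LLM only extracts fields from documents;
    # the lending decision stays in a deterministic rules engine.

    def extract_fields(document_text: str) -> dict:
        # In production this would be an LLM call; here a stub that
        # parses "key: value" lines stands in for the extraction step.
        fields = {}
        for line in document_text.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip().lower()] = value.strip()
        return fields

    def underwrite(fields: dict) -> str:
        # Rules-based, well-tested decision logic -- no model involved,
        # so every decision can be replayed and audited.
        income = float(fields.get("monthly income", 0))
        debt = float(fields.get("monthly debt", 0))
        dti = debt / income if income else 1.0
        return "approve" if dti < 0.43 else "refer to human underwriter"

    doc = "Monthly income: 6000\nMonthly debt: 1800"
    decision = underwrite(extract_fields(doc))
    print(decision)

A/B testing against humans then just means running the same documents through human extraction and comparing the downstream metrics (turnaround time, error rate, fraud catch rate).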

Author here, I built this because my agents could not use TUI or interactive CLI software. Think puppeteer for TUIs.

If you've ever asked Claude Code, codex, opencode, etc. to make logical commits from a bunch of changes, you might have seen them struggle, moving files around or deleting files, when a simple `git add -p` would have let them work through the changes interactively. ht (the binary name) lets them spawn their own terminal and drive it, similar to puppeteer: sending keys, viewing a snapshot of the terminal, and making a decision.
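To illustrate the underlying idea (this is a generic stdlib sketch, not ht's actual API or CLI): spawn an interactive program attached to a pseudo-terminal, send it keystrokes, and read back what the screen would show. The `cat` example is just a stand-in for any interactive TUI like `git add -p`:

    import os
    import pty
    import select

    def run_in_pty(argv, keys: bytes) -> bytes:
        """Spawn argv in a pseudo-terminal, send keys, collect output."""
        pid, fd = pty.fork()
        if pid == 0:
            # Child: exec the interactive program inside the pty.
            os.execvp(argv[0], argv)
        os.write(fd, keys)  # drive the program by sending keystrokes
        chunks = []
        while True:
            readable, _, _ = select.select([fd], [], [], 2.0)
            if not readable:
                break
            try:
                data = os.read(fd, 1024)
            except OSError:  # child exited, pty slave closed
                break
            if not data:
                break
            chunks.append(data)
        os.waitpid(pid, 0)
        return b"".join(chunks)

    # "hello\n" is echoed by the pty and by cat; Ctrl-D (\x04) ends stdin.
    out = run_in_pty(["cat"], b"hello\n\x04")
    print(out)

An agent using this pattern would loop: read the current screen, decide on the next keystroke (e.g. `y`/`n`/`s` at a `git add -p` hunk prompt), send it, and repeat.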

It also solves some other use cases, from CI/CD to making nice demos (it can export images, or the asciicast format for making videos of the session).
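For context, asciicast (v2) is just newline-delimited JSON: one header line, then one event per line as a `[time, type, data]` tuple, where type `"o"` means terminal output. A minimal sketch of constructing one by hand (the recorded commands here are made up):

    import json

    # asciicast v2: one JSON header line, then one JSON event per line.
    header = {"version": 2, "width": 80, "height": 24}
    events = [
        [0.0, "o", "$ echo hi\r\n"],  # "o" = data written to the terminal
        [0.5, "o", "hi\r\n"],
    ]
    cast = "\n".join([json.dumps(header)] + [json.dumps(e) for e in events])
    print(cast)

Save that as a `.cast` file and players like asciinema can render it as a video of the session.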

It's powered by libghostty, so it supports everything a modern terminal can do (full color, mouse clicks, etc.), and I've had a lot of fun letting Claude run wild and even play games like nethack.

Happy to answer questions, and of course let me know if you run into any issues!


Fourth thing in a row they've announced that I wanted to try and couldn't.

Previous comment with the prior 3: https://news.ycombinator.com/item?id=47794419


Start designing at claude.ai/design.

That link is redirecting me to https://claude.ai/404, anyone else?


I have the same issue: it redirects to https://claude.ai/login?returnTo=%2Fdesign, then back, then redirects to https://claude.ai/login?returnTo=%2Fdesign again, looping...


It works for me. Check again?


Seems they jumped the gun releasing this without a claude code update?

     /model claude-opus-4.7
      ⎿  Model 'claude-opus-4.7' not found



claude-opus-4-7


It seems they nerf it, then release a new version with the previous power. That way they can do this forever without actually making another step-function model release.


Same. If we're punished for being on the highest tier... what is Anthropic even doing?


You're not; it just wasn't released yet. Update to 111 and you'll see it (I'm on Max 20, and I do).

Heck, mine just automatically set it to 4.7 and xhigh effort (also a new feature?)


Thanks, I was already on the latest claude code, I just restarted it and now it's showing 4.7 and xhigh.

xhigh was mentioned in the release post; it's the new default, sitting between high and max.


     /model claude-opus-4.7
      ⎿  Model 'claude-opus-4.7' not found
Just love that I'm paying $200 for model features they announce that I can't use!

Related features that were announced I have yet to be able to use:

    $ claude --enable-auto-mode 
    auto mode is unavailable for your plan

    $ claude
    /memory 
    Auto-dream: on · /dream to run
    Unknown skill: dream


I think that was a typo on my end: it's "/model claude-opus-4-7", not "/model claude-opus-4.7".


That sets it to Opus 4:

    /model claude-opus-4.7
     ⎿  Model 'claude-opus-4.7' not found

    /model claude-opus-4-7
     ⎿  Set model to Opus 4

    /model
     ⎿  Set model to Opus 4.6 (1M context) (default)


A little off topic, but did Anthropic distill from an older OpenAI model? All of a sudden, over the last few days, I'm getting a ton of em dashes in Claude Code responses!


It gets really fun when you prompt them to update their BOT.md, also they get access to previous run results so they continually "learn" from mistakes or investigate changes.

