eclark's comments

Be careful with initial impressions of metrics. We humans have a heavy tendency to anchor to our first judgments or impressions. We see a win and assume the win is long-term, has no downsides, and was caused by the new information or change.

Combine that with the Hawthorne effect, and new business or health initiatives can look great simply because participants notice the change and notice the increased attention. However, many human patterns tend to regress to the mean.

Personally, I have seen this a lot with developer tools and DevOps. A new SEV/incident/disaster happens, and everyone rushes to create or onboard a tool that would have helped. Around the office everyone raves about it and is sure it will fix everything. The number of commits goes up, or the number of SEVs in an area decreases for a while, because people are paying attention. Then the tool starts to slow down or fall out of use. It has rough edges that weren't seen, or scenarios that were supposed to be supported never get fully integrated. Eventually the patterns regress, but now with more tools and more complexity.

- https://pmc.ncbi.nlm.nih.gov/articles/PMC1936999/

- https://arxiv.org/abs/2102.12893


> We as humans have a heavy tenancy to anchor to our first judgments or impression.

One of my lifelong guiding quotes: "The first principle is that you must not fool yourself, and you are the easiest person to fool." - Richard P. Feynman

> We see a win and assume the win is long term, with no downsides, and dependent on the new information/change.

Not me. I've had a hard life and I've worked incredibly hard to get here. I'm a little more loss-averse and focus on what can go wrong, not what went right. It's far too easy for us to become complacent. All in all I'm not your average CIO at all. I'm extremely technical, got my experience as an IT consultant for years and learned business by doing. Since moving from consultancy to employed life, I took the time to get several certifications and even did an MBA about a decade ago.


No, it's far from trivial, for three reasons.

First is the hidden information: you don't know your opponents' holdings; that is to say, everyone in the game has a different information set.

The second is that there's a variable number of players in the game at any time. Heads-up games are close to solved. Mid-ring games have seen some decent attempts. Full ring with 9 players is hard, and academic papers on it are sparse.

The third is the number of potential actions. In no-limit games there are many potential actions, since you can bet in small decimal increments of a big blind. Betting 4.4 big blinds could be correct and profitable, while betting 4.9 big blinds could be losing, so there's a lot to explore.
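To get a feel for the scale, here is a toy count of distinct raise sizes, assuming a hypothetical 100-chip big blind, a 1,000,000-chip stack, and whole-chip betting (all numbers are illustrative, not from any real game):

```python
# Illustrative assumptions only: 100-chip big blind, 1,000,000-chip stack,
# and a simplified minimum raise to 2 big blinds.
big_blind = 100
stack = 1_000_000
min_raise_to = 2 * big_blind

# Every whole-chip raise amount between the min raise and all-in is a
# distinct legal action, on top of fold/call/check.
n_raise_sizes = stack - min_raise_to + 1
print(n_raise_sizes)  # 999801
```

Even with this coarse model, a single decision point has nearly a million distinct raise sizes, which is why solvers bucket bet sizes.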


Text-trained LLMs are likely not a good solution for optimal play; just as in chess, the position changes too much, there's too much to explore, and too much accuracy is needed.

CFR is still the best approach. However, as in chess, we need a network that can help evaluate the position. Unlike chess, the hard part isn't knowing a value; it's knowing what the current game position is. For that, we need something unique.
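For readers unfamiliar with CFR: the core update it runs at each information set is regret matching. A minimal sketch, with invented regret values:

```python
# Regret matching: play each action in proportion to its positive
# cumulative regret; fall back to uniform when no regret is positive.
def regret_matching(cum_regrets: list[float]) -> list[float]:
    positives = [max(r, 0.0) for r in cum_regrets]
    total = sum(positives)
    n = len(cum_regrets)
    return [p / total for p in positives] if total > 0 else [1.0 / n] * n

print(regret_matching([3.0, -1.0, 1.0]))  # [0.75, 0.0, 0.25]
```

CFR iterates this update over the whole game tree; the hard part in poker, as noted above, is representing the position the regrets attach to.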

I'm pretty convinced that this is solvable. I've been working on rs-poker for quite a while. Right now we have a whole multi-handed arena implemented and a multi-threaded counterfactual-regret framework with no memory fragmentation and good cache coherency.

With BERT and some clever sequence encoding we can create a powerful agent. If anyone is interested, my email is: elliott.neil.clark@gmail.com


They would need to lie, which they can't currently do. Our current best approximation of optimal play involves ranges: thinking of your hand as any one of a number of holdings, imagining the combinations you could have, and deciding what you would do with each. That process of exploration by imagination doesn't work with an eager LLM using a huge encoded context.


I don't think this analysis matches the underlying implementation.

The width of the models is typically wide enough to "explore" many possible actions, score them, and let the sampler pick the next action based on the weights. (Whether a given trained parameter set will be any good at it, is a different question.)
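That mechanic can be sketched in a few lines: score candidate actions (logits), softmax them into weights, and let a sampler pick. The logits here are invented for illustration:

```python
import math
import random

# Invented action logits standing in for a model's scores.
logits = {"fold": 0.2, "call": 1.5, "raise": 0.9}

def softmax_sample(logits: dict[str, float], rng: random.Random) -> str:
    # Softmax the logits into weights, then sample an action.
    z = sum(math.exp(v) for v in logits.values())
    actions = list(logits)
    weights = [math.exp(logits[a]) / z for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

print(softmax_sample(logits, random.Random(1)))
```

Whether a trained parameter set assigns good scores is, as said above, a separate question.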

The number of attention heads for the context is similarly quite high.

And, as a matter of mechanics, the core neuron formulation (a dot product of the inputs followed by a non-linearity) excels at working with ranges.


No, the widths are not wide enough to explore. The number of possible game states can easily explode beyond the number of atoms in the universe, especially with deep stacks and small big blinds.

For example, consider computing the counterfactual tree for a 9-way preflop. 9 players each have up to 6 different times they can be asked to act (seat 0 bets 1, seat 1 min-raises, seat 2 calls, back to seat 0 who min-raises, seat 1 calls, seat 2 min-raises, etc.). Each of those actions can be check, fold, bet the min, raise the min (starting blinds of 100 are already pretty high), raise one more than the min, raise two more than the min, ... up to raising all-in (with up to a million chips).

That's (1,000,000.00 - 999,900.00) ^ (6 times per round) ^ (9 players), and that's just for preflop. Then come the flop, turn, river, and showdown. Now imagine that we also have to simulate which cards the players have and the order the cards come out on the streets (that greatly changes the value of the pot).
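The explosion is easy to sketch numerically. These figures are coarse assumptions for illustration, not exact game counts:

```python
# Rough upper bound on the preflop action tree described above.
actions_per_decision = 10_000   # bucketed bet sizes (already a huge simplification)
decisions_per_player = 6        # times a seat can be asked to act preflop
players = 9

upper_bound = actions_per_decision ** (decisions_per_player * players)
digits = len(str(upper_bound)) - 1
print(f"~10^{digits} preflop action sequences")  # ~10^216
```

Even with bet sizes bucketed four orders of magnitude below reality, the count dwarfs the roughly 10^80 atoms in the observable universe.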

As for LLMs being great at range stats, I would point you to the latest research from UChicago: text-trained LLMs are horrible at multiplication. Try getting any of them to multiply any non-regular number by e or pi. https://computerscience.uchicago.edu/news/why-cant-powerful-...

Don't get me wrong, though. Masked attention and sequence-based context models are going to be critical to machines solving hidden-information problems like this. Large language models trained on the web crawl and the stack with text input will not be those models, though.


Why would they need to lie? Where's the lying in Poker?

(Ignore for a moment that LLMs can lie just fine.)

What you are describing is exploring a range of counterfactuals. That's not lying.


Early-game bluffs are essentially lies that you tell through the rest of the streets. In order to keep your opponents from knowing when you have premium starting hands, it's required to play some ranges, sometimes as if they were a different range. E.g., 10% of the time I will bluff and act as if I have AK, KK, AA, or QQ. On the next street I need to continue that; otherwise it becomes unprofitable (opponents would only need to wait one bet to know if I am bluffing). I have to evolve the lie as well: if cards come out that make my story more or less likely/profitable/possible, then I need to adjust the lie, not revert to the truth or the opponent's truth.
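The mixed-strategy part can be sketched as a tiny policy. The 10% frequency is from the example above; the labels and seed are invented:

```python
import random

# Toy mixed strategy: bluff a fixed 10% of the time, playing a weak
# range exactly as if it were the premium one.
def choose_line(rng: random.Random) -> str:
    return "bluff-as-premium" if rng.random() < 0.10 else "play-hand-honestly"

rng = random.Random(7)
trials = 100_000
freq = sum(choose_line(rng) == "bluff-as-premium" for _ in range(trials)) / trials
print(freq)  # close to 0.10
```

Note that the policy itself can be public knowledge; the point of the 10% mix is that any single action stays ambiguous.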

To see that LLMs aren't capable of this, I present all of the prompt jailbreaks that rely on repeated admonitions. And that makes sense if you think about the training data. There's not a lot of human writing that takes a fact and then confidently asserts the opposite as data mounts.

LLMs produce the most likely response from the input embeddings. Almost always, the easiest response is a next token that agrees with the other tokens in the sequence. The problem in poker is that a good amount of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

Also, notice that I'm careful to say LLMs and not generalize to all attention-head + MLP models, as attention with softmax and dot product is a good universal function approximator. Instead, it's the large-language-model part that makes the models a poor fit for poker. Human text doesn't have a latent space that's written about enough and thoroughly enough to have poker solved in there.


I wouldn't call a bluff a lie, in the sense that you can honestly tell anyone who asks about your general policy around bluffing, and that would not diminish how well your bluffs work. Contrast that with lying, where going around saying "Oh, yeah, I tend to lie around 10% of the time" would backfire quite a bit.

In game theory, the point of bluffing is not so much to make money from your bluff directly, but to mask when you are playing a genuinely good hand.

> [...] it's required to play some ranges, sometimes as if they were a different range; [...]

Why the mental gymnastics? Just say what the optimal play for 'some ranges' is, and then play that. The extra indirection in explanation might be useful for human intuition, but I'm not sure the machine needs that dressing up.

> LLMs produce the most likely response from the input embeddings. [...]

If I wanted to have my LLM play poker, I would ask it to suggest probabilities for what to play next and then sample from those, instead of using the LLM's next-token sampler to directly pick the action.
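A sketch of that setup, with made-up probabilities standing in for the model's output:

```python
import random

# Invented action probabilities, as if suggested by the model.
action_probs = {"fold": 0.15, "call": 0.55, "raise": 0.30}

def sample_action(probs: dict[str, float], rng: random.Random) -> str:
    # Sample externally, rather than letting next-token decoding pick.
    actions, weights = zip(*probs.items())
    return rng.choices(actions, weights=list(weights), k=1)[0]

print(sample_action(action_probs, random.Random(42)))
```

This keeps the randomization outside the model, so the mixed strategy is actually mixed rather than whatever the decoder's temperature happens to do.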

(But I'm not sure that's what the original article is doing.)

> The problem in poker is that a good amount of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

> Human text doesn't have a latent space that's written about enough and thoroughly enough to have poker solved in there.

I agree with both. Though it's still a fun exercise to pit contemporary off-the-shelf LLMs against each other here.

And perhaps add a purpose-built poker bot to the mix as a benchmark. Also try with and without access to an external random sampler (like I suggested above), or with and without the ability to run freshly written Python code.


>They would need to lie, which they can't currently do

They lie better than most people lol.


I am the author/maintainer of rs-poker ( https://github.com/elliottneilclark/rs-poker ). I've been working on algorithmic poker for quite a while. This isn't the way to do it. LLMs would need to be able to do math, lie, and be random. None of which are they currently capable.

We know how to compute the best moves in poker (it's computationally challenging; the more choices and players are present, the harder it gets, which is why most attempts only even try heads-up).

With all that said, I do think there's a way to use attention and BERT to solve poker (when trained on non-text sequences). We need a better corpus of games and some training time on unique models. If anyone is interested, my email is elliott.neil.clark @ gmail.com


Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

E.g., given a small code-execution environment, it could use a secure random generator to pick between options, and it could use a calculator for whatever math it decides it can't do 'mentally'. And LLMs are very capable of deception already, even more so when the RL training target encourages it.
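For the randomness part, delegating to a cryptographically secure source is a one-liner; the option names here are just illustrative:

```python
import secrets

# Delegate the random choice to a CSPRNG instead of having the model
# "choose randomly" in text, which it cannot actually do.
def secure_pick(options: list[str]) -> str:
    return options[secrets.randbelow(len(options))]

print(secure_pick(["fold", "call", "raise"]))
```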

I'm not sure why you couldn't train an LLM to play poker quite well with a relatively simple training harness.


> Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

I think an RL environment is needed to solve poker with an ML model. I also think that, as with chess, you need the model to do some approximate work. General-purpose LLMs trained on a text corpus are bad at math, bad at accuracy, and struggle to stay on task while exploring.

So a purpose-built model with a purpose-built exploring harness is likely needed. I've built the basis of an RL-like environment and the basis of learning agents in Rust for poker. Next steps to come.


> None of which are they currently capable

what makes you say this? modern LLMs (the top players in this leaderboard) are typically equipped with the ability to execute arbitrary Python and regularly do math + random generations.

I agree it's not an efficient mechanism by any means, but I think a fine-tuned LLM could play near GTO for almost all hands in a small ring setting


To play GTO you currently need to play hand ranges. (For example, when looking at a hand I would think: I could have AKs-ATs or QQ-99, and he/she could have JT-98s or 99-44, so my next move will act like I have strength and they don't, because the board doesn't contain any low cards.) We have to do this because you can't always bet 4x pot when you have aces; otherwise opponents will always know your hand strength directly.
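A toy expansion of the pair-range shorthand used above. Real range parsing (as in rs-poker) also handles suitedness and gaps; this sketch only covers pocket pairs:

```python
RANKS = "23456789TJQKA"  # low to high

def expand_pair_range(spec: str) -> list[str]:
    # "QQ-99" -> every pocket pair from QQ down to 99.
    hi, lo = spec.split("-")
    hi_i, lo_i = RANKS.index(hi[0]), RANKS.index(lo[0])
    return [RANKS[i] * 2 for i in range(hi_i, lo_i - 1, -1)]

print(expand_pair_range("QQ-99"))  # ['QQ', 'JJ', 'TT', '99']
```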

LLMs aren't capable of this deception. They can't be told that they hold one thing, pretend that they hold something else, and then revert to ground truth. Their eager nature with large contexts leads to them getting confused.

On top of that, there's a lot of precise math. In no-limit the bets are not capped, so you can bet 9.2 big blinds in a spot. That could be profitable because your opponents will call and lose (e.g., the players willing to pay that sometimes have hands that you can beat). However, betting 9.8 big blinds might be enough to scare off the good hands. So there's a lot of probability math with multiplication.

Deep math with multiplication and accuracy is not the forte of LLMs.
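A toy expected-value comparison showing how a slightly larger bet can earn less. The call probabilities and equities are invented for illustration, not solver output:

```python
# EV of a bet: (villain folds) -> win the pot;
# (villain calls) -> realize equity in the bigger pot, risk the bet.
def bet_ev(bet_bb: float, p_call: float, equity_when_called: float, pot_bb: float) -> float:
    win_fold = (1 - p_call) * pot_bb
    win_call = p_call * (
        equity_when_called * (pot_bb + bet_bb)
        - (1 - equity_when_called) * bet_bb
    )
    return win_fold + win_call

# Assumed: the bigger bet folds out more of the hands that would pay us off.
print(bet_ev(9.2, p_call=0.40, equity_when_called=0.60, pot_bb=10.0))  # 9.136
print(bet_ev(9.8, p_call=0.25, equity_when_called=0.55, pot_bb=10.0))  # 9.12
```

With these made-up inputs, the 9.8-big-blind bet earns slightly less than the 9.2-big-blind bet, illustrating why sizing requires precise multiplication across a range of villain responses.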


Agreed. I tried it on a simple game of exchanging colored tokens from a small set of recipes, challenging it to start with two red and end up with four white, for instance. It failed. It would make one or two correct moves, then either hallucinate a recipe, hallucinate the resulting set of tiles after a move, or just declare itself done!


If you could, theoretically, make an LLM that could actually excel at poker, would that mean it is good at lying to people?


> lie

LLMs are capable of lying. ChatGPT / gpt-5 is RL'd not to lie to you, but a base model RL'd to lie would happily do it.


I think 'Batteries Included' would interest you, then. Like this, it's installable on AWS. It's a whole platform: PaaS + AI + more, built on open source. Kubernetes is at the core, but with tons of automation and UI. Dev environments are Kubernetes in Docker (Kind-based).

- https://github.com/batteries-included/batteries-included/ - https://www.batteriesincl.com/


> The feature is integrated with DJ software and hardware platforms AlphaTheta

They called out AlphaTheta, so here's hoping that it is. That would make my decision to move off of Spotify for personal streaming even easier.


While I was at FB (it wasn't Meta then), I saw what a superpower the infrastructure there is. Product engineers build things at scale in days. While I was there, I got to be tech lead for several different teams (2x distributed DBs, 1x Dev Efficiency, 1x Ads), some of which are called out by name here.

Shout out to the HBase and ZippyDB teams! This is the first public acknowledgment that ZippyDB was converged upon.

It's also super cool to see the Developer Efficiency pushes called out. 10,000 services pushed daily, or on every commit, is so impressive.

When I left FB, I couldn't find anything close. So, I'm building the infra that I was missing as a startup. Batteries Included. https://www.batteriesincl.com/ https://github.com/batteries-included/batteries-included/


Good luck to you. Maybe you’ll be the next StatSig.


Thanks! They have built an impressive business and tool.


I work on a startup where the entire self-hosted SaaS is permissively licensed.

https://github.com/batteries-included/batteries-included https://www.batteriesincl.com/ https://www.batteriesincl.com/LICENSE-1.0

I started the company because I wanted to give smaller enterprises the infrastructure team that FAANG companies have. Most of the best infrastructure is open source but too complicated to use or maintain. So we've built a full platform that runs on any Kubernetes cluster, giving a company push-button infrastructure with everything built on open source. You get Heroku with single sign-on and no CLI needed, or a full RAG stack with model hosting on your EKS cluster.

Since most of the services and projects we're building on top of are open source, we wanted to give the code to the world while staying sustainable as a team in the long term. I had also been part of Cloudera, and I saw the havoc that open core wreaked on the long-term success of Hadoop. So I wanted something different for licensing. We ended up with a license that somewhat resembles the FSL but fixes its major (in my opinion) problem: we don't use a competing-use clause, instead opting for a total-install-size requirement.

I'm happy to chat with anyone about this; my email is in my profile. Good luck, and I hope it works for you.


I was at a conference where this was presented by John https://www.amazon.com/Journey-Profound-Knowledge-Altered-In...

It’s a fun little eye-opener that starts conversations. I wish more of those conversations ended up moving decision makers.

