One nice recent development was Ollama's support for MLX optimization on Mac hardware. It's still rough around the edges, though: it's not yet obvious how to tell whether the model you're running actually uses it.
Similarly, none of our comments actually exist as language on Hacker News—just numerical values from the ASCII table. We're deluding each other into thinking we're using language.
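The "just numbers" point can be made literal. A minimal sketch (the word "language" is an arbitrary example) showing that a comment is stored and transmitted as code points, not as language:

```python
# Text as ASCII code points: the word "language" never exists on the wire
# as language, only as these numbers.
codes = [ord(ch) for ch in "language"]
print(codes)  # [108, 97, 110, 103, 117, 97, 103, 101]

# Decoding the same numbers round-trips back to the text we read.
print(bytes(codes).decode("ascii"))  # language
```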
I believe it's reasonably clear that our thought processes generally occur outside of language. We do use language during explicit reasoning, but most thinking occurs heuristically. It's on par with the thinking of animals that don't use language but exhibit complex behavior.
It's not clear to me how well that maps onto LLMs. Our wetware predates language and isn't derived from it; language is built on top. LLMs, by contrast, are derived from language. I think that means the intermediate layers are very different from brain neurons, but I don't know. It's eerie how well the former emulates the latter.
There’s an interesting thing there that I believe varies from person to person. My understanding is that some people think in a more symbolic/heuristic way, while others rely very heavily on their inner monologue to make sense of things. (I am in the latter camp, and I only have a single core language processor, so I pretty much cannot come up with coherent thoughts while concentrating on what someone else is saying.)
Even more interesting, and getting off on a bit of a tangent, there is also a mode that I use for revealing emotions that I don’t have words for (alexithymia): I open up a text editor, stare off into space, and let my fingers type without “observing” the stream of words coming out. I then go back and read what I “wrote” and often end up understanding how I’m feeling much better than I did. It’s weird.
Edit: also, playing with local models through e.g. llama.cpp in “thinking mode” is super fascinating for me. The “thought process” that comes out before the real answer often feels pretty familiar when I reflect on my own inner monologue, although sometimes it’s frustrating because I can see where their “thinking” went off the rails and want to correct it.
Sounds like you're betting that the performance users experience today will be the same as the performance they'll expect tomorrow. I wouldn't take that bet.
You mean that if you were Anthropic, you'd build the data centers on every continent? Can you explain your reasoning?
We're talking about billions of dollars of extra capex if you take the "build them everywhere" side of the bet instead of the "build them in the cheapest possible place" side. You'd have to be really sure that you need the data center to be somewhere uneconomical. If you build in the cheap place, it's a safe bet that you'll always have enough latency-insensitive workloads to fill it; my side of the bet only goes wrong if we transition almost entirely to latency-sensitive workloads, which I doubt. The other side goes wrong if we don't see a dramatic uptick in latency-sensitive inference workloads. As another comment pointed out, voice agents are the one genuinely latency-sensitive cloud inference workload we have right now. Such workloads exist, but they're a slim percentage so far.
I believe I'm taking the safe bet that lets Anthropic make hay while the sun shines without risking a major misstep. Nothing stops them from using their own data centers for cheap slow "base load" while still using cloud partners for less common specialized needs. I just can't see why they would build the international data centers to reduce cloud partner costs on latency-sensitive workloads before those workloads actually show up in significant numbers.
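A back-of-envelope calculation shows why geography only matters for the latency-sensitive slice. This is a rough sketch: the distances and the ~200,000 km/s signal speed in fiber are approximations, and real routes add routing and queuing delay on top of these floors.

```python
# Ideal minimum round-trip times over long-haul fiber. Light travels at
# roughly 2/3 of c in optical fiber, i.e. ~200,000 km/s; distances below
# are rough illustrative figures, not measured route lengths.

SPEED_IN_FIBER_KM_S = 200_000

def rtt_ms(distance_km: float) -> float:
    """Physics-floor round-trip time in ms, ignoring routing/queuing."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_S * 1000

for route, km in [("same metro", 100),
                  ("US coast to coast", 4_000),
                  ("US to Europe", 6_000),
                  ("US to Australia", 15_000)]:
    print(f"{route:>18}: ~{rtt_ms(km):4.0f} ms minimum RTT")
```

Even the worst intercontinental floor (~150 ms round trip) only matters to workloads with tight interactive budgets, like voice; a batch or "base load" job never notices it.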
I suspect that the touch bar served its likely real purpose: to ship an ARM CPU with a secure enclave in the machines so that we could have Touch ID without needing to wait for Apple Silicon. Everything other than that was gravy, an interesting experiment.
Yeah, congress forces the military to contract out to companies in enough congressional districts to secure passage of the legislation. We basically force these companies into byzantine and inefficient supply chains because we treat it all as a jobs program.
Maybe in the future, but with the current models I found the constantly accessible memories to be an impediment. I don't want models to record and repeat mistakes or suboptimal strategies.
Gemini (just in the browser) has been really bad about conflating a bunch of similar projects. It remembers "oh, you have a home server that does XYZ", so my new home server that's doing ZYX instead must be the same system.
Is it lifting him up? My point is that it's certainly irrelevant. My assumption, then, is that it's supposed to be surprising. 'Hobbyist spends 20 years on their hobby' isn't that surprising, even if the hobby is interesting; instead of letting the story stand on that interest, they're attempting to add 'shock and awe'.
https://ollama.com/blog/mlx