I first learned about tree-sitter a couple of months back when I started looking at what was inside the NPM folder for Claude. It's a really cool library.
One of the things it made me think about: when editing large markdown files via code agents, would it be more efficient to convert the document from markdown to a DOM (or JSON) and back again for the purposes of editing?
The theory being that agents are always asking me for permission to use sed in bash to edit markdown files -- could tree-sitter do the same thing using its code-editing capabilities? And would that difference be materially impactful?
Could I lower the token cost of writing an extensive plan by choosing a format that allows me to use tree-sitter?
I really haven't explored that much yet since I've been working on other things but it was more just one of those things that make you go hmmmmm... maybe someone else knows :)
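To make the idea concrete, here's a toy stdlib-only sketch (no tree-sitter involved, and purely my own illustration) of what structural editing of markdown could look like: split a document into heading-keyed sections, patch one, and re-emit. The hypothetical agent then only needs to emit the changed section rather than a sed command over the whole file.

```python
import re

def split_sections(md: str) -> list[tuple[str, str]]:
    """Split a markdown doc into (heading, body) pairs.
    Text before the first heading gets the pseudo-heading ''."""
    sections = []
    heading, body = "", []
    for line in md.splitlines():
        if re.match(r"#{1,6} ", line):
            sections.append((heading, "\n".join(body)))
            heading, body = line, []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body)))
    return sections

def replace_section(md: str, heading: str, new_body: str) -> str:
    """Swap the body under `heading`, leaving every other section untouched."""
    out = []
    for h, b in split_sections(md):
        if h:
            out.append(h)
        out.append(new_body if h == heading else b)
    return "\n".join(p for p in out if p)

doc = "# Plan\nintro\n## Step 1\nold details\n## Step 2\nmore"
print(replace_section(doc, "## Step 1", "new details"))
```

A real tree-sitter grammar would give you a proper concrete syntax tree (and incremental reparsing) instead of this regex split, but the token-cost argument is the same either way: address a node, replace a node.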
oooh ... I wonder if it would be sufficient to just write a markdown grammar for TS?! I should ask my AI what it thinks... I'm sure it'll tell me I'm absolutely right and a very good and smart boy.
this is really good. it aligns closely with what I've been writing on the same subject (but is way better :) )
nice share. thanks.
OH! I commented before I realized it was a product pitch ... literal seconds before she introduced ACE.
If I'm honest, the idea of a collaborative agentic coding interface is pretty novel and interesting. Still impressed with the article; just the nature of my appreciation has changed.
It is just a product pitch with beta access (I don't have access to it). But I also agree that the direction is good. I think once everyone running the 12-terminal agent setup emerges from the void, teamwork processes will need a rethink. So it's interesting to follow this and other projects. What is yours?
> Feels like a fundamental bottleneck for production agent systems, so would love to compare how you're thinking about the latency vs accuracy tradeoff.
I'm really not focusing on latency right now. My short term goal is to prove the thesis that `ail` can improve same-model performance on SWEBench Pro vs. their own published results.
Can I run swebp with GLM-4.6 and get a score better than their published `68.20` https://www.swebench.com/?
The argument is that the latency right now just isn't the part we should worry about. If we're reducing the time to code something from ~6 weeks to 1 hour... then does it really matter that we add another 30 minutes of tool calls if we get it 100% right vs. 80% right?
so - my approach is still being built and I'm still very hand-wavy around how it is going to come together, but effectively I'm building pipelines of prompts. Rather than running our LLM sequences as long-running sessions where the entire context gets loaded on every turn (a recipe for rot), we unlock the ability to introduce a thinking layer between each step of the process.
So before each turn is sent to the LLM we (potentially) run a local process to assemble a bespoke context of only what is required for that specific turn.
If a tool call is not going to be needed on that turn, we don't include it in the system prompt for that round.
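A minimal sketch of that per-turn filtering, with a hypothetical tool registry and a deliberately naive keyword heuristic standing in for whatever real relevance check would actually be used:

```python
# Hypothetical tool registry: name -> (schema snippet, trigger keywords).
# The keyword match is a placeholder for a real relevance model.
TOOLS = {
    "read_file":  ({"name": "read_file"},  {"read", "open", "show", "file"}),
    "run_tests":  ({"name": "run_tests"},  {"test", "pytest", "ci"}),
    "web_search": ({"name": "web_search"}, {"search", "lookup", "docs"}),
}

def tools_for_turn(prompt: str) -> list[dict]:
    """Return only the tool schemas whose trigger words appear in the prompt."""
    words = set(prompt.lower().split())
    return [schema for schema, triggers in TOOLS.values() if words & triggers]

def assemble_context(prompt: str, base_system: str) -> dict:
    """Build a bespoke request: base instructions plus only the needed tools."""
    return {
        "system": base_system,
        "tools": tools_for_turn(prompt),
        "messages": [{"role": "user", "content": prompt}],
    }

ctx = assemble_context("please run the test suite", "You are a coding agent.")
print([t["name"] for t in ctx["tools"]])  # → ['run_tests']
```

The point isn't the matching heuristic -- it's that the request sent each turn is assembled fresh, so irrelevant tool schemas never take up context.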
I'm still formalizing the spec at the moment and think I'm about six months to a year out before I have a full human ready UI running.
Essentially I'm trying to build an artificial neocortex and frontal lobe to provide a complete layer of Executive Function that operates on top of our agents - like Claude Code (or whatever else).
I'm basing the roadmap on about 100 years of cognitive science. We've legitimately had names for all these failure modes (in humans) since the 1960s. We have observations of what we're witnessing in agents going back to 1848.
this is a pretty important piece and the research backs you up. Moving that context out of your system prompt dynamically is going to help reduce the lost-in-the-middle effect. Context rots almost immediately. I've got a project being built to address this directly as well, but I'm still very early days.
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638
Hey! This looks a lot like what I'm working on, from a slightly different angle. I think you're on the right track. In fact, cortex as a name is perfect since you're effectively building the executive function layer for search and selection. I also think rust is the right language to go with.
I'm going to do a deeper read of your work in a bit. I'd love it if you took a look at my theory of artificial cognition, The YAML of the Mind https://alexchesser.medium.com/the-yaml-of-the-mind-8a4f945a..., dropped in to the `ail` project, and let me know what you think.
I just have to get the kids to school and I'll pop back into cortex later
Hey folks, I wrote this. If you're interested in the concepts or pressure testing the ideas a little deeper, please feel free to comment here or reach out directly.
I appreciate that it's pretty long -- feel free to point your LLMs at it.
I'm writing the tool as proof of the spec. Still very much a pre-alpha phase, but I do have a working POC, in that I can specify a series of prompts in my YAML language and execute the chain of commands in a local agent.
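A toy sketch of that chain-of-prompts execution -- a plain Python list stands in for the YAML spec, and a stub stands in for the real model call, since the chain logic is the point here:

```python
# Stub standing in for a real LLM call; the chain mechanics are the point.
def call_llm(prompt: str) -> str:
    return f"<llm output for: {prompt}>"

# A chain spec: each step templates the previous step's output into its prompt.
# In the real project this would be loaded from the YAML language instead.
chain = [
    {"name": "summarize", "prompt": "Summarize the ticket: {input}"},
    {"name": "plan",      "prompt": "Write a plan from this summary: {input}"},
    {"name": "implement", "prompt": "Implement this plan: {input}"},
]

def run_chain(steps, user_input: str) -> str:
    """Execute steps sequentially, feeding each output into the next prompt.
    Each call sees only its own bespoke prompt, not the whole session history."""
    current = user_input
    for step in steps:
        current = call_llm(step["prompt"].format(input=current))
    return current

result = run_chain(chain, "fix the login bug")
```

Because each step builds its prompt from scratch, a "thinking layer" (filtering, summarizing, validating) can slot in between any two steps without the whole history riding along.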
One of the "key steps" that I plan on designing is specifically an invocation interceptor. My underlying theory is that we would take whatever random prose our human minds come up with and pass it through a prompt refinement engine:
> Clean up the following prompt in order to convert the user's intent
> into a structured prompt optimized for working with an LLM
> Be sure to follow appropriate modern standards based on current
> prompt engineering research. For example, limit the use of persona
> assignment in order to reduce hallucinations.
> If the user is asking for multiple actions, break the prompt
> into appropriate steps (etc...)
That interceptor would then forward the well-structured, intent-parsed prompt to the LLM. I could really see a step where we say "take the crap I just said and turn it into CodeSpeak".
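A sketch of that interceptor as a thin wrapper. `refine` here is a stub -- in practice it would be a model call carrying the refinement prompt above; the offline stand-in just normalizes whitespace so the sketch stays runnable:

```python
REFINEMENT_PROMPT = (
    "Clean up the following prompt in order to convert the user's intent "
    "into a structured prompt optimized for working with an LLM. "
    "If the user is asking for multiple actions, break it into steps.\n\n{raw}"
)

def refine(raw: str) -> str:
    """Stub refinement step. In practice this would send
    REFINEMENT_PROMPT.format(raw=raw) to a model; here we just
    normalize whitespace so the sketch runs offline."""
    return " ".join(raw.split())

def intercept(raw_prompt: str, forward):
    """Pass human prose through refinement before the real agent sees it."""
    return forward(refine(raw_prompt))

# A stand-in downstream agent call.
answer = intercept("  umm  so like   fix the login   thing?  ",
                   lambda p: f"AGENT<{p}>")
print(answer)  # → AGENT<umm so like fix the login thing?>
```

The shape matters more than the stub: the interceptor sits between human and agent, so swapping the refinement logic never touches either side.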
What a fantastic tool. I'll definitely do a deep dive into this.
that's a bit of a meta discussion and it'd probably reveal some super interesting things about how tech culture has changed in the last ~15 years.
I've been on HN since 2010 (lost the password to my first account, alexc04) and I recall a time when it felt like every second article on the front page was a bold, directive pronouncement, or something just aggressively certain of its own correctness.
Like "STOP USING BASH" or "JQUERY IS STUPID" - not in all caps of course, but it created an unpleasant air and tone (IMO; again, this is like 16 years ago now so I may have some memory degradation)
Things like donglegate got real traction here among the anti-woke crew. There have been times when the Venn diagram of 4chan and Hacker News felt like it had a lot more overlap. I've even bowed out of discussion for years at a time, or developed an avoidance reaction to HN's toxic discussion culture.
IMO it has been a LOT better in more recent years, but I also don't dive as deep as I used to.
ANYWAYS - my point is I would be really interested to see a sentiment analysis of HN headlines over the years to try and map out the cultural epochs of the community.
When has HN swayed more into the toxic and how has it swayed back and forth as a pendulum over time? (or even has it?)
I wonder what other people's perspective is of how the culture here has changed over time. I truly think it feels a lot more supportive than it used to.