I think that'd be cool, but I'd say that Claude Code/Codex is often used for this exact thing and they do a decent job of it (at least in my experience with R). Usually once I've kind of wrapped up my model or data work I'll just ask "okay, now organize this so it makes sense", and it usually does a great job at organizing the helpers, etc.
Not that TIOBE or PYPL are the end-all-be-all, but R was in the Top 10 for the first time since 2020 and PYPL has it at #4. A lot of people use R in 2026 because it's still great for data science work, the "tidy" ecosystem is still fantastic for working with data, and it's caught up to Python in almost every way when it comes to putting models into production. Both are great "orchestrator" languages, and I've put both into production on sites that get hundreds of thousands of hits a day.
In my opinion, RStudio is still the best data science IDE and it's not even close. I've been using Positron a bit more lately just for Claude Code reasons, as I prefer having the pane itself rather than using the terminal, but man it's really tough to shake RStudio. Even with the work put into configuring VSCode to get it close, it still always feels a bit janky.
Emacs + ESS is superior IMO. RStudio has a bunch of frills I don't care about and doesn't let me configure files as I'd like. ESS showing the function signature in the minibuffer to me is the killer feature. Wish I could get that for EVERYTHING.
Yeah, once you move on to legitimate business evaluation metrics (where Precision@k or Recall@k don't actually fit your business model without modification), GPTs just seem to suffer without context, and hey, knowing the context is part of what gives a data scientist their value.
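For anyone unfamiliar with the baseline metrics being modified here, a minimal sketch of the standard Precision@k / Recall@k definitions (item names are made up):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0

recommended = ["a", "b", "c", "d", "e"]  # ranked output of some recommender
relevant = {"b", "d", "x"}               # what the user actually engaged with
precision_at_k(recommended, relevant, 3)  # 1/3: only "b" in the top 3 is relevant
recall_at_k(recommended, relevant, 3)     # 1/3: one of three relevant items surfaced
```

The "without modification" point is that a real business objective (revenue, retention, session length) usually weights these hits unevenly, which is exactly the context a model doesn't have.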
I still feel like they are in an incredible position when it comes to AI because of their hardware integration/advantage across all of their devices. I think they see immense value in getting things on-device and not having to rely on any of these other companies.
When it comes to AI, there's ~5 trillion dollars of datacenter revenue Apple could be competing for, but isn't. That's not good.
Now, maybe it would be justifiable if there were great local AI experiences on iPhone, or an easy $5 trillion to be made elsewhere. Until then, Apple is bleeding money hand-over-fist by refusing to sign the CUDA UNIX drivers and sell the rackmount Mac as a cutting-edge TSMC inference box. The Grace superchip is absolutely eating Apple's ARM lunch right now.
From what I saw in the latest "language" surveys or whatever, R does seem to be making a slight comeback. I was actually surprised at its place above Ruby, iirc. Again, not that these surveys are the end-all-be-all, but I've also started to see a lot more data science postings that have R or Python as a requirement, where I feel like for a few years it was ALL Python.
Yes, I've absolutely noticed this. I feel like I can always tell when something is up when it starts trying to do WAY more things than normal. Like I can give it a few functions and ask for some updates, and it just goes through like 6 rounds of thinking, creating 6 new files, assuming that I want to write changes to a database, etc.
My advice from someone who has built recommendation systems: Now comes the hard part! It seems like a lot of the feedback here is that it's operating pretty heavily like a content-based system, which is fine. But this is where you can probably start evaluating on other metrics like serendipity, novelty, etc. One of the best things I did for recommender systems in production is having different ones for different purposes, then aggregating them together into a final. Have a heavy content-based one to keep people in the rabbit hole. Have a heavy graph-based one to try and traverse and find new stuff. Have one that is heavily tuned on a specific metric for a specific purpose. Hell, throw in a pure TF-IDF/BM25/SPLADE based one.
The real trick of rec systems is that people want to be recommended things differently. Having multiple systems that you can weigh differently per user is one way to achieve that; usually one algorithm can't quite do that effectively.
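The per-user blending described above can be sketched as a simple weighted score merge (system names, scores, and weights here are all hypothetical, and a real system would normalize scores across recommenders first):

```python
def blend_recommenders(scores_by_system, weights):
    """Combine per-item scores from several recommenders using
    per-user weights, and return items ranked by blended score."""
    blended = {}
    for name, scores in scores_by_system.items():
        w = weights.get(name, 0.0)
        for item, score in scores.items():
            blended[item] = blended.get(item, 0.0) + w * score
    return sorted(blended, key=blended.get, reverse=True)

# Hypothetical per-item scores from three component systems
scores = {
    "content": {"a": 0.9, "b": 0.2},
    "graph":   {"b": 0.8, "c": 0.7},
    "bm25":    {"c": 0.4, "a": 0.1},
}
# This user leans toward exploration, so the graph recommender is weighted up
user_weights = {"content": 0.2, "graph": 0.6, "bm25": 0.2}
blend_recommenders(scores, user_weights)  # ['b', 'c', 'a']
```

Shifting `user_weights` toward "content" for a different user re-ranks the same candidate pool without retraining anything, which is the appeal of keeping the systems separate.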
Speaking of TF-IDF, I once added it "after" the recommendations to downscore items that were too popular and tended to be recommended too much/with too many other items (think Beatles/iPhone), and inversely for more niche items. It might be too costly to do depending on how you generate the recommendations, though.
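The post-hoc downscoring described above could look something like an IDF-style rerank over how often each item has recently appeared in recommendation lists (counts and item names below are invented for illustration):

```python
import math

def idf_rerank(scores, exposure_counts, total_lists):
    """Rescale item scores by an IDF-style factor based on how often
    each item appeared in recent recommendation lists: ubiquitous
    items get penalized, niche items get boosted."""
    adjusted = {}
    for item, score in scores.items():
        n = exposure_counts.get(item, 0)
        idf = math.log((1 + total_lists) / (1 + n))
        adjusted[item] = score * idf
    return adjusted

# Hypothetical: "beatles" showed up in 900 of the last 1000 lists,
# "niche_band" in only 10
scores = {"beatles": 0.9, "niche_band": 0.5}
counts = {"beatles": 900, "niche_band": 10}
reranked = idf_rerank(scores, counts, total_lists=1000)
# The niche item now outranks the ubiquitous one despite a lower raw score
```

The cost concern in the comment is real: this needs exposure counts over all candidate items, which is cheap if you already log impressions but expensive if you'd have to compute co-occurrence on the fly.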
I have also anecdotally noticed it starting to do things consistently that it never used to do. One thing in particular was that even while working on a project where it knows I use OpenAI/Claude/Grok interchangeably through their APIs for fallback reasons, and knew that for my particular purpose, OpenAI was the default, it started forcing Claude into EVERYTHING. That's not necessarily surprising to me, but it had honestly never been an issue when I presented code to it that was by default using GPT.