Hacker News | sciences44's comments

I wanted to test the new Qwen 3.5 Small models (released March 2) for a structured output task. I fine-tuned the 0.8B, 2B and 4B on text-to-SQL using LoRA on a Mac (64 GB, MLX), and added Mistral-Nemo 12B as a baseline.

The 2B beat the 12B by 19 percentage points (50% vs 31% semantic accuracy). My hypothesis is that the larger model is "too smart": it computes the answer mentally and outputs "42" instead of writing SQL. 81% of the 12B's errors were plain numbers.
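For anyone wondering how numbers like these can be computed, here is a minimal sketch. It assumes "semantic accuracy" means the predicted query returns the same rows as the reference query when both run against the same database. That definition, the `score` helper, and the toy `users` table are illustrative assumptions, not the repo's actual evaluation code.

```python
import re
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows sorted (order-insensitive), or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)


def is_plain_number(text):
    """True when the model answered with a bare number such as '42' instead of SQL."""
    return re.fullmatch(r"\s*-?\d+(\.\d+)?\s*", text) is not None


def score(conn, examples):
    """examples is a list of (model_output, reference_sql) pairs (hypothetical format)."""
    correct = 0
    plain_number_errors = 0
    for predicted, reference in examples:
        expected = run_query(conn, reference)
        got = run_query(conn, predicted)  # a bare number is not valid SQL, so this is None
        if got is not None and got == expected:
            correct += 1
        elif is_plain_number(predicted):
            plain_number_errors += 1
    errors = len(examples) - correct
    return {
        "accuracy": correct / len(examples),
        "plain_number_share": plain_number_errors / errors if errors else 0.0,
    }


# Toy database and three model outputs: one equivalent query, one bare number, one wrong query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE users(id INTEGER, age INTEGER);"
    "INSERT INTO users VALUES (1, 30), (2, 40), (3, 50);"
)
examples = [
    ("SELECT COUNT(*) FROM users WHERE age >= 40", "SELECT COUNT(*) FROM users WHERE age > 35"),
    ("2", "SELECT COUNT(*) FROM users WHERE age > 35"),
    ("SELECT COUNT(*) FROM users", "SELECT COUNT(*) FROM users WHERE age > 35"),
]
result = score(conn, examples)
print(result)
```

Note that the bare "2" is the right answer to the question but still counts as an error, because the task is to produce SQL.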

Everything runs locally, zero cloud compute. The repo has scripts, data and full results to reproduce it.


Interesting subject, thank you! I have a cluster of 2 Orange Pis (16 GB RAM each) plus a Raspberry Pi. I think it's high time to get them back on my desk. I never got very far with the setup because writing the Ansible playbooks took so long, but with Claude Code it's worth another try. So thanks for the article; it makes me want to dust the cluster off!


Hey HN! Creator here.

What is meshii?

An orchestration layer for 3D mesh generation models (Trellis 1/2, PartPack). Built it because I'm working with a small game studio (Peakeey - English learning game) where our 3D artists can't produce assets fast enough.

Technical approach:

Modular architecture that abstracts different AI models behind a unified interface. Switch between Microsoft's Trellis and NVIDIA's PartPack with config changes, not code rewrites. Handles input processing, model execution, and export normalization.
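To make "config changes, not code rewrites" concrete, here is a rough sketch of that kind of interface. Every name in it (`MeshBackend`, `BACKENDS`, `generate_mesh`, the `Mesh` fields) is hypothetical and chosen for illustration; it is not meshii's actual API, and the backends return placeholder geometry instead of calling the real models.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Mesh:
    """Normalized output that every backend must produce (hypothetical shape)."""
    vertices: list
    faces: list
    source_model: str


class MeshBackend(ABC):
    """Common interface that hides model-specific details."""
    name = ""

    @abstractmethod
    def generate(self, image_path: str) -> Mesh:
        ...


BACKENDS = {}


def register(cls):
    """Class decorator that makes a backend selectable by its name."""
    BACKENDS[cls.name] = cls
    return cls


@register
class TrellisBackend(MeshBackend):
    name = "trellis"

    def generate(self, image_path):
        # Placeholder: a real backend would run the Trellis model here.
        return Mesh([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)], self.name)


@register
class PartPackBackend(MeshBackend):
    name = "partpack"

    def generate(self, image_path):
        # Placeholder: a real backend would run the PartPack model here.
        return Mesh([(0, 0, 0), (1, 0, 0), (0, 0, 1)], [(0, 1, 2)], self.name)


def generate_mesh(config, image_path):
    """Pick the backend named in the config and run it."""
    backend = BACKENDS[config["model"]]()
    return backend.generate(image_path)


# Switching models is a one-line config change.
mesh = generate_mesh({"model": "trellis"}, "chair.png")
print(mesh.source_model)
```

Callers only ever see `Mesh`, so export code can be written once and reused for every backend.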

Current status:

Alpha. It works but output quality needs improvement (as you'll see in the examples). Biggest challenge: the models themselves produce inconsistent topology - post-processing can't fix fundamentally poor generation.

Why I'm sharing:

1. Looking for feedback on mesh generation workflows
2. Want to hear from game devs about what makes this useful vs. just running models directly
3. Open to contributors who know more about mesh processing than I do

Stack: Python, supports multiple export formats (FBX, OBJ, GLB).

Happy to answer technical questions!


Love the originality here - makes you curious to explore more.

Solid technical execution too. Well done!


Really interesting workflow, Addy! Two questions about your approach:

1. *Spec → Implementation process:* How do you ensure Claude actually completes everything in the spec and finds the optimal path? Do you:
   - Document every step in extreme detail upfront (spec as single source of truth)?
   - Use an agentic framework that lets Claude self-guide through implementation?
   - Iteratively validate each step with human checkpoints?

2. *Tool comparison:* Have you experimented with GitHub Copilot vs Claude Code vs Cursor? What made you settle on your current stack?

I'm working on multi-step AI pipelines (3D mesh generation with validation stages), and I find that LLMs often skip edge cases or take suboptimal paths when given too much autonomy.

Curious if you've built any scaffolding/guardrails to keep the LLM on track, or if your spec writing has evolved to be more "agent-friendly"?

The balance between human specification and agent autonomy seems like the key challenge going into 2026, especially if agent-written code is going to ship to production.


This is fascinating! I'm working on using AI to accelerate game development workflows, but I've never used it to actually play a game.

Curious how Claude handles the game logic constraints in RollerCoaster Tycoon?

Also, what's your iteration loop like? Do you find yourself spending more time prompting or debugging the generated code?

Great work - it is really exciting.


Claude knows the game pretty well! It can struggle with nitty-gritty stuff like placing a path on a specific tile. I tried to make the CLI errors semantically useful: they explain why an operation failed, not just that it failed.
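As a rough illustration of errors that say why something failed, here is a small sketch. The `place_path` command, the tile map, and the messages are all made up for the example; this is not the project's actual CLI.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    ok: bool
    message: str


def place_path(grid, x, y):
    """Try to place a path tile; grid maps (x, y) to what occupies the tile (hypothetical model)."""
    if (x, y) not in grid:
        return CommandResult(
            False, f"Cannot place path at ({x}, {y}): tile is outside the park boundary."
        )
    occupant = grid[(x, y)]
    if occupant != "empty":
        return CommandResult(
            False,
            f"Cannot place path at ({x}, {y}): tile is occupied by {occupant}. "
            "Pick an empty tile or remove the occupant first.",
        )
    grid[(x, y)] = "path"
    return CommandResult(True, f"Placed path at ({x}, {y}).")


grid = {(0, 0): "empty", (1, 0): "ride"}
for x, y in [(1, 0), (5, 5), (0, 0)]:
    print(place_path(grid, x, y).message)
```

The point is that the failure message carries enough context for the model to pick a different tile on its next attempt, where a bare "failed" would leave it guessing.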


Interesting, thanks for the explanation! Good luck with the project.

