I really like all the compress-to-X links at the bottom and the convert from X to Y tools. Especially the Discord one with presets for different target file sizes based on Discord's subscription tiers.
I've been using server-based (online, upload required) tools for this sort of stuff, but am now going to be using this.
Pretty cool find considering I have no need for a full-fledged video editor right now, and was just checking this out for fun.
I'm honestly unsure what could be improved at this point.
Consistency? So it fails less often?
Based on the released images (especially the one "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").
Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.
I experimented with procedural generation of Waldo-style scavenger images using Flux models, with rather disappointing (if unsurprising) results.
If you asked me what I expected, since this one has "thinking", it'd be that it would think to generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit".
I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
> I'm honestly unsure what could be improved at this point.
That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
Here's one I just came up with:
A Mercator projection of Earth where the land/oceans are inverted (i.e. land = ocean and ocean = land).
So I guess while "realism" (or believability) is really good now, prompt adherence has much room for improvement.
(Though to put it another way, realism has always been "solved" if the model gets to output whatever it wants as long as it looks realistic. The difference is that failures now look less like a malfunction and more like an inattentive human mistake or oversight, so even when the model gets it wrong, it's hard to tell it's wrong without knowing what the prompt was.)
> it's hard to tell it's wrong without knowing what the prompt was.
Yeah this is actually a huge point of frustration on reddit where lots of people post their "impressive generative images" but fail to disclose the prompts so the audience is only able to evaluate realism/fidelity and not how faithfully the model actually followed the prompt.
Xenon is very rare too and currently without substitute for certain medical applications, but more interestingly it produces psychoactive effects that could shed light on stuff no other substance apparently can: https://pmc.ncbi.nlm.nih.gov/articles/PMC11203236/
There's two trillion kilograms of it in the atmosphere. People sometimes get confused because it's one of the rarest elements in the Earth's crust, but that's because it floats away.
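For what it's worth, the two-trillion figure roughly checks out from commonly cited numbers. A back-of-envelope sketch, assuming ~87 ppb xenon by volume, a total atmospheric mass of ~5.15e18 kg, and molar masses of 28.97 g/mol (mean, dry air) and 131.29 g/mol (Xe):

```python
# Back-of-envelope: total xenon mass in Earth's atmosphere.
XE_VOLUME_FRACTION = 87e-9      # ~87 ppb by volume (i.e. mole fraction)
ATMOSPHERE_MASS_KG = 5.15e18    # commonly cited total atmospheric mass
MOLAR_MASS_AIR = 28.97          # g/mol, mean for dry air
MOLAR_MASS_XE = 131.29          # g/mol

# Volume fraction is a mole fraction, so convert to a mass fraction
# via the ratio of molar masses, then scale by total atmospheric mass.
mass_fraction = XE_VOLUME_FRACTION * MOLAR_MASS_XE / MOLAR_MASS_AIR
xenon_kg = mass_fraction * ATMOSPHERE_MASS_KG
print(f"{xenon_kg:.2e} kg")  # ~2e12 kg, i.e. about two trillion kilograms
```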
The video was really good, and the UI looks fun too.
If I understand correctly, is this not as useful for frameworkless HTML/CSS/JS development? When you make edits using the browser's built-in devtools, they can and do modify the actual CSS files (in memory, of course), which you can then copy over to replace yours entirely (assuming no build/bundling step as well).
If so and this allows you to use any framework and still have that kind of workflow, that's fantastic. Half the reason I don't like using frameworks is because I lose the built-in WYSIWYG editor functionality. Guess I'd still lose the usefulness of the built-in js debugger, tho :(
Yeah, I mean you can basically achieve this setup even with frameworks if you're using stylesheets, but the copy/pasting and finding the right spot in the source code is usually the pain. With this you just press apply (or enable auto-apply) and your agent gets to work. You can also edit the content, add/remove/reorder elements, etc. I don't know how good the browser dev tools are at writing all that back, though.
When you said Agent and AI, I thought there would be some way for us to resize or move elements, and have the agent figure out the right properties to change (whether it's margin, padding, top, left, and on the wrapper or whatever) ideally in a way that's cleanest WRT surrounding/existing CSS.
But I can see the more deterministic nature of the current offering being a plus too since there's no worry about the agent doing things you didn't "approve" or in the "wrong way"
I agree the original poster exaggerated it. But generally models indeed have stopped growing at around 1-1.5 trillion parameters, at least for the last couple of years.
>Even now, I don't know if parameter count stopped mattering or just matters less
Models in the 20b-100b range are already very capable when it comes to basic knowledge, reasoning, etc. Improvements to architectures and training recipes have decreased the required parameter count considerably (current 8b models can easily beat the 175b GPT-3 from 3 years ago in many domains). What increasing the parameter count currently gives you is better memorization, i.e. better world knowledge without having to consult external knowledge bases via, say, RAG. For example, Qwen3.5 can one-shot compilable code, reason, etc., but can't remember the exact API calls of many libraries, while Sonnet 4.6 can.

I think what we need is to split models into two parts: a "reasoner" and a "knowledge base". The reasoner could be pretty static with infrequent updates, while the knowledge base is the part that needs continuous updates (and trillions of parameters). Maybe we could have a system where a reasoner chooses different knowledge bases on demand.
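As a toy sketch of that reasoner/knowledge-base split: everything below is hypothetical (a real system would route with a learned model and retrieve with embeddings, not keyword lookup), but it shows the shape of the idea, where a static reasoner consults whichever swappable KB a router picks:

```python
# Hypothetical knowledge bases, each independently updatable without
# retraining the "reasoner" that consumes them.
KNOWLEDGE_BASES = {
    "python_stdlib": {"list.sort": "sorts in place and returns None"},
    "chemistry": {"helium": "second-lightest element, inert"},
}

def route(query: str) -> str:
    """Stand-in for a learned router that picks a knowledge base on demand."""
    return "python_stdlib" if "sort" in query else "chemistry"

def retrieve(query: str) -> list[str]:
    """The static reasoner consults only the KB the router selected."""
    kb = KNOWLEDGE_BASES[route(query)]
    return [fact for key, fact in kb.items() if key in query]

print(retrieve("what does list.sort return?"))
# ['sorts in place and returns None']
```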
Yea, webapps (even PWAs) still can't compete with native apps when it comes to responsiveness, but I still don't know why. I've yet to see even a demo PWA that passes the "native Turing test", where I can't tell whether it's a native app or not.
Even native apps that were built with cross-platform frameworks feel a bit "off" sometimes.
Can't relate. Except for Google Maps and Docs, I can't think of a native app that couldn't be a WebView. Hell, most of them are anyway!
The worst kind is French banking apps or the IBKR app: many features are native, but then, because of some weird tech debt or incompetent tech leadership, they'll sometimes show you web pages in a shitty, slow, completely different (UI-wise) built-in WebView for mundane tasks like downloading a PDF statement.
WASM apps get around this for the most part, but there are so many more layers between the app and the hardware for web apps compared to native, plus it's JavaScript. And a lot of the cross-platform frameworks use a JavaScript bridge, so that becomes the bottleneck. Kotlin/Compose Multiplatform is fast on everything.
I feel like it's because everyone involved other than the user benefits from running native instead of as a webapp. The phone OS companies get their cut of apps distributed through their stores, and the app developers get better access to your data to resell. Apple in particular has been really hostile to webapps.
I'd like to second that I wish a lot of those existed.
For 1), 60fps is another good one.
It seems Youtube also removed "sort by upload date", if I'm not mistaken. The closest we can get now is the "uploaded today" filter, but it's not the same, since it still seems to prioritize popularity over recency, surfacing mostly second-hand sources or popular "reactions" to the primary-source videos (which also exist on Youtube!) that I'm actually looking for.
Edit: IIRC they even used to have an "uploaded in last hour" filter, but I'm not sure. Can anyone confirm this?
I was hoping to finally see an advanced time filter so I could do something like "over 2 minutes" but it seems you've only got the same ones Youtube has (< 4 minutes, 4-20, and > 20).
If it's an opaqueness restriction with the API or something, I'd like to suggest at least letting us combine the provided ones, so I could do something like (4-20) && (> 20) to get "over 4 minutes", which doesn't exist on Youtube but seems pretty useful.
Another thing that would be useful is filter-by-channel since the search function within Youtube for searching a channel's uploads (using the search button on a channel's page) is a significantly nerfed version of their usual search function.
I wish I could select "between 2.5 and 6 minutes". That search can translate to 2 queries to YouTube (<4 and 4-20), then the results can be combined and pruned to keep only those between 2.5 and 6. To get enough videos if there aren't enough after pruning, we could access the 2nd, 3rd and so on pages from the results. But I doubt YouTube will like 6 searches in a row.
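That combine-and-prune step is simple once the two bucket queries have been fetched. A sketch, assuming each result carries a duration in seconds (the dicts below are made-up placeholders, not real API responses):

```python
def prune_by_duration(results, min_s, max_s):
    """Keep only videos whose duration (in seconds) falls in [min_s, max_s]."""
    return [v for v in results if min_s <= v["duration_s"] <= max_s]

# Placeholder results standing in for the two Youtube buckets
# (<4 min and 4-20 min) fetched as separate queries.
short_bucket = [{"id": "a", "duration_s": 90}, {"id": "b", "duration_s": 200}]
mid_bucket = [{"id": "c", "duration_s": 300}, {"id": "d", "duration_s": 900}]

# Combine both buckets, then keep only 2.5-6 minutes (150-360 seconds).
combined = prune_by_duration(short_bucket + mid_bucket, 150, 360)
print([v["id"] for v in combined])  # ['b', 'c']
```

If pruning leaves too few results, the caller would page deeper into each bucket, which is where the "6 searches in a row" concern comes in.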
Yea and I'm doubtful we'll see a service willing to do their own post-processing per-query while also being at the whim of Youtube's API (official or not).
Ultimately, I would like these features to come to Youtube itself since there's a lot of nice features built into it that would be hard for a third-party to replicate without permission (such as playing videos inline on hover, with captions).
I doubt it will ever happen. This is Google after all, not a small company we can hope will get it right after a while. They've left the search parameters shitty for years. Google, arguably the most advanced search company ever, can't make an efficient filter for custom time ranges? They obviously can, but, as other comments have noted, they seem to think that good search is an anti-feature.
I looked into Youtube's search filters recently, and the length-range option is stored as an enum (as base64-encoded protobuf in the sp query parameter), so it doesn't look like it's possible to set it to specific values.
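You can poke at that parameter yourself with a schema-less protobuf wire-format reader. A minimal sketch (the field numbers Youtube uses are internal and undocumented, so this only exposes the raw structure; the sample blob below is synthetic, not a real sp value):

```python
import base64  # a real sp value would first go through base64.urlsafe_b64decode

def read_varint(buf, i):
    """Decode a protobuf varint starting at index i; return (value, next_index)."""
    result, shift = 0, 0
    while True:
        b = buf[i]
        result |= (b & 0x7F) << shift
        i += 1
        if not (b & 0x80):
            return result, i
        shift += 7

def parse_fields(buf):
    """List (field_number, wire_type, value) triples from a protobuf blob.
    Handles only varint (0) and length-delimited (2) wire types."""
    i, fields = 0, []
    while i < len(buf):
        key, i = read_varint(buf, i)
        field, wtype = key >> 3, key & 0x7
        if wtype == 0:
            val, i = read_varint(buf, i)
        elif wtype == 2:
            length, i = read_varint(buf, i)
            val = buf[i:i + length]
            i += length
        else:
            break  # other wire types not needed for this sketch
        fields.append((field, wtype, val))
    return fields

# Synthetic payload: field 1 = varint 150, field 2 = bytes b"hi".
blob = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
print(parse_fields(blob))  # [(1, 0, 150), (2, 2, b'hi')]
```

An enum-valued filter would show up as one of the varint fields, which is consistent with it being a fixed set of buckets rather than a free-form range.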
> And now, with the assistance of AI, I can go much farther in 10 hours and deliver a more complex project. But that means that someone else trying to replicate this execution is still going to need around 10 hours to replicate it.
The blog post does touch upon this. The key difference, I believe, is that compute scales in a way "meat-heads" don't: if the other person has 100x the capital to throw at it, they could do the same 10-hour thing in 10 minutes.
Basically, what I got from it was that innovation has never been truly scalable enough to create the "dark forest", since hiring more and more engineers saturates quickly. But if/when innovation does become scalable (or crosses some scalability threshold) via AI, that could trigger a "dark forest" scenario.