I will agree - async Rust on an operating system isn’t all that impressive - it’s a lot easier to just have well-defined tasks and manually spawn threads to do the work.

However, in embedded rust async functions are amazing! Combine it with a scheduler like rtic or embassy, and now hardware abstractions are completely taken care of. Serial port? Just two layers of abstraction and you have a DMA system that shoves bytes out UART as fast as you can create them. And your terminal thread will only occupy as much time as it needs to generate the bytes and spit them out, no spin locking or waiting for a status register to report ready.
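
For a feel of what that looks like, here's a rough sketch in the embassy style; `DmaUartTx` and `next_report` are hypothetical stand-ins, not any real HAL's API:

```rust
// Sketch only: DmaUartTx and next_report are made-up names standing in
// for a DMA-backed UART driver and whatever produces your bytes.
#[embassy_executor::task]
async fn terminal_task(mut tx: DmaUartTx) {
    loop {
        let line = next_report(); // CPU time spent only generating bytes
        // Hand the buffer to the DMA engine and suspend; the executor
        // runs other tasks until the transfer-complete interrupt wakes
        // this one. No spin loops, no polling a status register.
        tx.write_all(line.as_bytes()).await;
    }
}
```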


Despite my panning of async elsewhere on this thread, I agree with you here. Embassy is a thing of beauty and a great use of Rust's async. Much of my embedded career was bogged down managing a pile of state machines. With async/await and embassy, that just goes away.

You’re getting to the core of the issue. No other language has tried to serve both embedded systems and general-purpose computing with one design. Async Rust tried to, and came up with a solution that is not great for the majority of programmers writing Rust.

I wish to God that the Rust library devs would admit to this fact - say that async Rust should stay for embedded runtime use cases, but that we shouldn’t be forcing async across the majority of general-purpose computing libraries. It’s just not a pleasant experience to write or read. And it really doesn’t give any performance benefits.


I write reams of async Rust for a living, and completely disagree with this characterization. The concurrency primitives in the futures crate are able to elegantly model the large majority of places where concurrency is needed, and they are nicely composable.

More than once, we have wanted to improve the performance of some path and been able to lift the sequential model into a stream, evaluated concurrently with some max buffer size. From there, converting to true parallel execution is just a matter of wrapping the looped futures in Tasks.

Obviously just sprinkling an async on it isn’t going to make anything faster (it just converts your function into a state-machine generator that then needs to be driven to completion). But being able to easily and progressively move code from sequential to concurrent to parallel execution makes for significant performance gains.
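
To make that concrete, here's a minimal sketch of the progression using the futures and tokio crates; `fetch_one` is a hypothetical stand-in for the per-item work:

```rust
use futures::stream::{self, StreamExt};

// Hypothetical stand-in for whatever the sequential loop was doing.
async fn fetch_one(id: u32) -> u32 {
    id * 2
}

// The sequential loop lifted into a stream: up to 8 futures in flight,
// all polled concurrently on the current task.
async fn concurrent() -> Vec<u32> {
    stream::iter(0u32..100)
        .map(fetch_one)
        .buffer_unordered(8)
        .collect::<Vec<u32>>()
        .await
}

// True parallelism: wrap each future in a task so the multithreaded
// runtime can run them on different threads (this is what introduces
// the Send + 'static requirements).
async fn parallel() -> Vec<u32> {
    stream::iter(0u32..100)
        .map(|id| tokio::spawn(fetch_one(id)))
        .buffer_unordered(8)
        .map(|joined| joined.expect("task panicked"))
        .collect::<Vec<u32>>()
        .await
}
```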


I completely disagree. Having to make sure every little function is Send + Sync + 'static even if it doesn't need it is fucking hell. Concurrent code with plain kernel threads is so much easier to write and read.

If you just want to build a normal backend service, you can't escape async libraries. Wrapping the async functions with `block_on` is not ideal; I'd rather just have access to standard sync primitives that don't require me to bring an entire async runtime into the system.
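
For concreteness, this is the kind of wrapping I mean (a minimal sketch; `fetch_from_lib` stands in for some async library call you can't avoid):

```rust
use futures::executor::block_on;

// Hypothetical async library call we can't escape.
async fn fetch_from_lib() -> u32 {
    42
}

// A plain sync function driving the future to completion inline.
fn sync_caller() -> u32 {
    block_on(fetch_from_lib())
}

fn main() {
    assert_eq!(sync_caller(), 42);
}
```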

My ultimate point is: I would be happy if async stayed in its own world. But the fact is async has completely polluted the Rust library landscape and you can't escape it. I'm working on a project that I hope will show Rust users that async isn't needed for performant backend services, and that the code can be written much more simply without it.


You don’t have to make everything Send/Sync if you don’t need to. Use tokio’s local runtime and spawn_local(), or use one of the other async runtimes.
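
A minimal sketch of what that looks like, assuming tokio with the rt feature enabled; note the Rc, which the multithreaded spawn() would reject:

```rust
use std::rc::Rc;
use tokio::task::LocalSet;

fn main() {
    let rt = tokio::runtime::Builder::new_current_thread()
        .build()
        .unwrap();
    let local = LocalSet::new();
    local.block_on(&rt, async {
        // Rc is !Send, so this future could never be handed to
        // tokio::spawn(), but spawn_local doesn't care.
        let shared = Rc::new(41);
        let shared2 = shared.clone();
        let handle = tokio::task::spawn_local(async move { *shared2 + 1 });
        assert_eq!(handle.await.unwrap(), 42);
    });
}
```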

You also don’t need to spawn() futures to await them. Spawn enables parallelism on the multithreaded runtime, holding join handles, etc. If all you need is to execute concurrent code, though, the various combinators and functions in the futures crate let you do so without having hard requirements on Send/Sync. The large majority of the concurrent code I write uses nothing specific from the tokio crate, including spawn.

As is often the case in rust, the compiler is also telling you the correct thing. If you’re using the multithreaded runtime and spawning, your code may execute in another thread, so it has to be Send/Sync, and since the ownership of the future is transferred to the executor, it must also be ‘static.


You literally can't use one of the other async runtimes, because the current state of async/await does not allow library authors to easily write for multiple runtimes - the libraries were written with one runtime in mind, and that is tokio. And if you're pulling in library methods, you're still stuck with the signatures they specify.

All of your arguments are just mental workarounds trying to justify how fucked the rust ecosystem is for traditional backend services.

The project I'm working on is specific to making traditional kernel threads faster (150-200 nanosecond context switches, compared to 1500-2000 nanoseconds for normal kernel threads). It requires a user-space scheduler, but you can swap schedulers out without any changes to how you write Rust. In my testing, it's not only faster than async Rust but also much easier to write. I hope it convinces people like you, who are hell-bent on defending the current state of async Rust, that there are better paradigms and we don't have to be locked in to shitty, verbose concurrent code.


You’re moving the goalposts, and seem to have a vested interest in this that I, frankly, don’t.

Send/Sync/static is not needed on tokio’s local runtime, which doesn’t require any adjustments to your libraries.

Passing data between threads requires Send/Sync/static, except for certain cases like scoped threads, so making OS threads faster doesn’t seem to solve that issue like using a local runtime would.
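
For example, the scoped-threads case looks like this (a minimal sketch using std::thread::scope, stable since Rust 1.63): the spawned thread borrows local data with no 'static bound, because the scope guarantees it joins first.

```rust
fn main() {
    let data = vec![1, 2, 3];
    let total: i32 = std::thread::scope(|s| {
        // Borrows `data` from the enclosing stack frame: no 'static
        // needed, since the scope joins all threads before returning.
        // (Send is still required on what crosses the thread boundary.)
        let handle = s.spawn(|| data.iter().sum::<i32>());
        handle.join().unwrap()
    });
    assert_eq!(total, 6);
}
```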

Many async libraries (though certainly not all) are runtime-independent. If your library doesn’t have to spawn, it is easy to write runtime-independent code. I would like to see some spawn traits brought into std to make it easier to write libraries that have to spawn, though.
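
To illustrate, a library function like this one (a trivial sketch using only the futures crate) has no opinion about which executor drives it:

```rust
use futures::future;
use std::future::Future;

// Runs two futures concurrently on whatever task polls this one.
// Nothing here names a runtime, so tokio, smol, or a bare block_on
// can all drive it unchanged.
pub async fn fetch_both<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    future::join(a, b).await
}
```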

I’ll always try new ways of doing things, but you are making the assumption that the way you feel is the way everyone feels, and totally dismissing the opinions of those who don’t. It puts me off of whatever solution you might be proposing, since you clearly don’t have the empathy to understand the full range of positions of the people whose problems you’re ostensibly trying to solve.

I’m not trying to convince you the way you feel is wrong, but you are wrong that everyone thinks writing async code is miserable. There are times where it’s hard, or where the compiler emits confusing messages about async closures being not generic enough, but on the whole I enjoy writing async rust, so shoot me.


I haven’t moved any goalposts - I’ve coded async Rust and it is miserable compared to normal Rust with threads. That has been my point, and it is why I started down this project.

My entire goal is to show that the same server written with pre-async hyper is nicer and more performant than one written with post-async hyper. I hope to show you in just a few days.


09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0


That's an illegal number mate. Straight to the slammer!

(for those missing out: https://en.wikipedia.org/wiki/AACS_encryption_key_controvers...)


Thank you so much, this is a beautiful rabbit hole to go down


plz stop! my hddvds...


Patents and licensing, usually

https://news.ycombinator.com/item?id=39543291


Ah so HDMI is one part of it, that's really unfortunate. Thank you for this insight.



After wrecking the suspension of 2 e-scooters downtown from craptastic roads, I can say nothing of value would be lost from avoiding downtown ATX.


I see 15g CO2(eq)/km in the bottom left-hand corner


Terminating HDCP is difficult; you’d have to downgrade it to HDCP 1.4 and then have a 1.4 ‘compliant’ device on the end acting as a dummy monitor. If you need anything newer than HDCP 1.4, it’s likely not possible.


I did a teardown of this Monoprice dongle: https://tomverbeure.github.io/2023/11/26/Monoprice-Blackbird....

It terminates as an HDCP 2.0 endpoint and converts to HDCP 1.4. You’d still need an HDCP 1.4 sink to make it work though.


I'm using the Monoprice multiviewer. It negotiates HDCP without a display attached. Other than being a bit big and expensive, and being unable to strip HDCP, it's a good solution.

I found the same device in generic packaging on AliExpress, but haven't had the chance to order that version, yet.

There are lots of professional SDI converters and such, but they are either $3k+ or "call for price".


That was written by you?

I don't agree with this section:

> The HDCP converter simply announces itself as a final video endpoint… yet still repeats the content to its output port. Without a very expensive HDMI protocol analyzer, we can’t check if the source is tagging the content as type 0 or type 1, but there is no reason now to think that it’s not type 1.

There's no magic in the HDMI protocol that says type 1 vs type 0. It's just another HDCP message over DDC, but it is only sent to repeaters. In this case, since the HDCP repeater is lying and claiming not to be a repeater, it isn't sent the StreamID Type information.


You’re probably right.


Great teardown. Can these things remove HDCP altogether? It seems like if it can report that the sink is HDCP 2.x, then it can do so even if it isn't compliant at all, right? That would mean it streams encrypted content to something that still has to do the decryption. These devices seem like they'd be underpowered to do that in real time at 18 Gb/s.


I assume the silicon can do it, but it’s not exposed to the user, because that would almost certainly be a license violation.


Try reading a 40+ page document with track changes enabled (and 100+ changes) - it pins a full CPU core for 5 seconds when you go to the next page!


Gray text on a black background is an awful colour choice for this website


I got black text on a white background


The site uses the CSS prefers-color-scheme media query to see whether your system has a light or dark theme selected, and chooses the colors based on that.


Indeed. The off-white headline color should also be applied to the body text.


I tried Portal RTX on a 9070 XT and got 20 FPS at full resolution (no frame generation). There are no driver limitations, but I have no idea what the expected FPS is


yikes that's dismal, I wonder what a 5070 gets


Depends on whether you count real or fake frames, and whether it fits in what little VRAM Nvidia gives their captive customer base.


To be more precise, four CPUs - two ARM and two RISC-V. There is just a mux for the instruction and data buses - see chapter 3 of the datasheet: https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.p...

It’s space-inefficient as half of the CPUs are shut down, but architecturally it’s all on the same bus.


> It’s space-inefficient as half of the CPUs are shut down

In practice it doesn't matter very much for a design like this. The die is already limited to a certain minimum size to provide enough perimeter for wire-bonding pads for all of the pins, so they can fill up the middle with whatever they want.


They should have filled it with more SRAM instead - 520KB is far too little.


What difference would the extra 16 KiB or whatever instead of the 2 RISC-V cores make? If 520 KB is far too little for you, you're likely better off adding an 8 MiB PSRAM chip.


Just 16KB? Couldn’t a lot more be fitted?

PSRAM has huge latency.


SRAM takes up a tremendous amount of space compared to logic. Usually at least six transistors per bit, plus passives, plus management logic.


SRAM is big in gate count: typically 6 transistors per bit.

The i386, a 32-bit chip already dragging around a couple of generations of legacy architecture, came in at 275,000 transistors. I would imagine the Hazard3 would be quite a bit more efficient in transistor usage due to its architecture.

16 KB is 16,384 bytes × 8 bits per byte × 6 transistors per bit = 786,432 transistors.


It was the first CPU on my desk! 80386SX 25MHz.

(that one was only 32-bit internally, with a 16-bit external bus)


Thanks for the explanations - was not aware.

…vertically stacking a slab of SRAM above or beneath the CPU die does come to mind ;)


This is way too expensive for something like a microcontroller. AMD calls this 3D V-Cache and uses it on their top end SKUs.


But doesn't the ESP32-S3-WROOM have some large on-chip RAM?

For the Pico, say, something in the line of the approach taken by many smartphone SoCs that package memory and processor together.


The ESP32-S3 has 512 KB of SRAM, and the RP2350 has 520 KB of SRAM. The ESP32-S3-WROOM does indeed come in configurations with additional PSRAM, but that would be comparing apples and pears. The WROOM is an entire module complete with program flash, PSRAM, crystal oscillator etc. It comes in a much larger footprint than the actual ESP32-S3, and it is entirely conceivable that one could create a similar module with the same amount of PSRAM using the RP2350.

Furthermore, the added RAM in both cases is indeed PSRAM. That being said, the ESP32-S3 supports octal PSRAM, not just quad PSRAM, which does make a difference for the throughput.


> "some"

And go cellphone style: Package-on-Package or Multi-Chip Module of some sort.

Wouldn't the massive increase in capabilities from adding 8MB-16MB of closely-integrated, fast RAM far outweigh the modest price increase for many applications that are currently memory-constrained on the Pico?


> But doesn't the ESP32-S3-WROOM have some large on-chip RAM?

They use the same PSRAM chips with the relatively bad latency you complained about higher up in the thread. There are boards, like those from Pimoroni, that even have them on the PCB from the factory.

> For the Pico, say, something in the line of the approach taken by many smartphone SoCs that package memory and processor together.

What for? This only saves you PCB space; the latency is not going to be affected by it. There probably won't be enough people ordering those to justify the additional inventory overhead of (at least) 2 more SKUs.


I believe there's already a separate Flash die in the same package. Probably not possible to add yet another die for DRAM.

(for various chemistry reasons, it's much more efficient to manufacture Flash, DRAM, and regular logic on separate wafers with different processing)


It may be technically space-inefficient, but they only added the RISC-V cores because they had area to spare. It didn't cost them much.


Source for the RISC-V cores being essentially free (Luke Wren is the creator of the RISC-V core design used):

> The final die size would likely have been exactly the same with the Hazard3 removed, as std cell logic is compressible, and there is some rounding on the die dimensions due to constraints on the pad ring design.

https://nitter.space/wren6991/status/1821582405188350417


Funny thing is that it cost them more than you might think. It was the ability to switch to the RISC-V cores which made it (much) easier to glitch. See the "Hazardous threes" exploit [1]

[1] https://www.raspberrypi.com/news/security-through-transparen...


I wonder if they're using the same die for one or more microprocessor products that are RISC-V-only or ARM-only? They could be binning dies that fail testing on one or the other cores that way. Such a product might be getting sold under an entirely different brand name too.


They're not currently doing that but there is a documented way to permanently disable the ARM cores, so they could sell a cheaper RISC-V-only version of the same silicon if there's enough demand to justify another SKU.


That may be the plan for the future. Right now, this is a hedge / leverage against negotiations with ARM. For developers looking to test their code against a new architecture and compare it to known good code/behavior, it doesn’t get any easier than rebooting into the other core!


I find this whole concept remarkable, and somewhat puzzling.

I've seen the same (ARM + RISC-V cores) even at larger scales before (the Milk-V Duo, at around 1 GHz). But how is this economical? Is die space that cheap? Could you not market the same thing as a quad-core with just minor design changes, or would that be too hard for power budget/bus bandwidth reasons?


SRAM is very area intensive. What you're asking for is very greedy. The RISC-V core they are using is absolutely tiny.


That's also a good point. The big Milk-V systems I mentioned use external DRAM, but cache might still be a die-space issue (I'd assume it's always shared completely between the ARM/RISC-V cores, and would need to be scaled up for true multicore operation).

But I'm still amazed that this is a thing, and you can apparently just throw a full core for a different architecture on a microcontroller at basically no cost :O


Two things:

1) It needs a certain perimeter to allow all the pins to go from the silicon to the package, which mandates a certain-sized, square-ish die.

2) Only the cores are duplicated (and some switching logic is added).

So yes, there is enough space to just add another two cores without any worries, since they don't need more IO or pins or memory or anything.

