Perhaps this is silly reasoning, but my first impression is that a good JIT requires a fast compiler, so Go meets that requirement better than Rust. Rust obviously has a more advanced/featureful compiler, but in a JIT for Ruby the optimizations will mostly target short stretches of code (no heavy math or long loops), so the JIT could look for places where the code uses vectors, maps, or lists of same-typed elements to avoid having to determine the dynamic type at runtime. That is, my impression is that in web-related workloads the greater benefit comes from detecting parts of the code that can be transformed into statically typed code and then applying a fast compiler to that part.
But perhaps Rust was chosen for its safety model rather than for its performance, in which case I agree. Still, I think a variant of Go with borrow/ownership semantics for variables could be the best track for building a great YJIT.
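For what it's worth, here's a toy Go sketch of that specialization idea (the `value` struct and both functions are hypothetical illustrations, not anything from YJIT): an interpreter-style loop has to check a type tag on every element, while the specialized path a JIT could emit, once it has observed that only ints flow through, skips the checks entirely.

```go
package main

import "fmt"

// value mimics a dynamic-language value: a type tag plus payload.
type value struct {
	tag int // 0 = int, 1 = string
	i   int
	s   string
}

// sumDynamic checks the tag on every element (interpreter-style).
func sumDynamic(vals []value) (int, bool) {
	total := 0
	for _, v := range vals {
		if v.tag != 0 {
			return 0, false // unexpected type: bail out to the slow path
		}
		total += v.i
	}
	return total, true
}

// sumInts is the specialized path: after observing that the array only
// ever holds ints, the hot loop runs with no tag checks at all.
func sumInts(ints []int) int {
	total := 0
	for _, n := range ints {
		total += n
	}
	return total
}

func main() {
	vals := []value{{tag: 0, i: 1}, {tag: 0, i: 2}, {tag: 0, i: 3}}
	if total, ok := sumDynamic(vals); ok {
		fmt.Println(total) // 6
	}
	fmt.Println(sumInts([]int{1, 2, 3})) // 6
}
```

The guard-then-bail shape is the key part: the specialized code is only valid while the observed type assumption holds.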
As far as I understand it, the JIT compiler is written in Rust; that doesn't mean it produces Rust. The output of the JIT can be assembly, C, or something else that compiles fast and runs fast.
I haven't had a look at the YJIT design, but your reasoning:
> detecting parts of code [...] and then apply a fast compiler to that part
applies to the previous JIT (MJIT) design, which created C snippets and compiled them on the fly. I don't think this is the case for YJIT, although I'm not 100% sure.
It's certainly not the case for TenderJIT, for example, which builds assembly code for the bytecode instructions and (optionally) invokes the original interpreter functions.
First, a string object is a small object with a pointer to the actual character sequence; the tokenizer produces the same number of these in both approaches. Second, in the new approach that character sequence is part of the original input: all string objects point to slices of the same "backing store", unlike the old approach, where each string object had its own memory holding a copy of a small part of the input.
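A minimal Go sketch of the slicing approach (a hypothetical tokenizer splitting on spaces, for simplicity): each token is just a (pointer, length) view into the input, so no character data is copied.

```go
package main

import "fmt"

// tokenize returns tokens as slices of the original input: no per-token
// allocation of character data, only headers pointing into the shared
// backing store (the input string itself).
func tokenize(input string) []string {
	var tokens []string
	start := -1
	for i, c := range input {
		if c == ' ' {
			if start >= 0 {
				tokens = append(tokens, input[start:i]) // slice, not copy
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		tokens = append(tokens, input[start:])
	}
	return tokens
}

func main() {
	fmt.Println(tokenize("foo bar baz")) // [foo bar baz]
}
```

In contrast, the old approach would be `append(tokens, string([]byte(input[start:i])))` or similar, allocating a fresh copy per token.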
Two questions: first, how cache-friendly is that backing store? And second, how is the backing store updated when the strings are garbage-collected?
At first sight it seems that a stack-allocated byte sequence can be easily reclaimed, but a sequence allocated in the backing store is neither easily handled nor cache-friendly.
The general problem seems to be the dynamic dispatch of methods, which Go does via fat pointers, so the author avoids the dynamic dispatch by modifying the code.
It's the original input buffer. It's less fragmented than creating small copies for every token. The backing store stays alive as long as a single token still points into it. That's a bit inefficient if there's a lot of whitespace, comments, etc. in it, or if you're only interested in a very small portion of the tokens, but it doesn't have the overhead of a memory block per string (which is at least 16 bytes). So it depends, I guess.
I don't think dynamic dispatch comes into play in Go here. The strings are represented in the same way either way.