I wonder if there is a path where a model can be trained for variable reasoning-layer reuse, deciding at the token level how many times to traverse the reasoning blocks. It would be much like today's adjustable reasoning levels, except that it would only repeat the thinking circuits internally instead of generating full reasoning chains as output.
While the author mentioned multiple passes through the block didn't help in this instance, I can't help but wonder if it would work if it were built in during training.
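To make the idea a bit more concrete, here is a rough toy sketch of per-token variable reuse of a shared block, loosely in the spirit of adaptive-computation-time style halting. Everything in it is an assumption for illustration: `shared_block`, `halt_probability`, and `adaptive_passes` are hypothetical names, the "block" is a trivial stand-in for real layers, and the halting weights are hand-picked where a trained model would learn them. It is not what the author did.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_block(h):
    """Toy stand-in for the reused reasoning block: a residual nonlinear update."""
    return [x + 0.5 * math.tanh(x) for x in h]


def halt_probability(h, halt_w=1.0, halt_b=-2.0):
    """Hypothetical halting head: maps a token state to a stop probability.

    halt_w and halt_b are placeholder constants; in a trained model they
    would be learned parameters.
    """
    mean = sum(h) / len(h)
    return sigmoid(halt_w * mean + halt_b)


def adaptive_passes(tokens, max_passes=8, threshold=0.99):
    """Run each token state through the same block a variable number of times.

    A token stops looping once its accumulated halting probability reaches
    the threshold, or after max_passes. Every token gets at least one pass.
    Returns the final states and the number of passes each token used.
    """
    outputs, counts = [], []
    for h in tokens:
        cumulative = 0.0
        passes = 0
        for _ in range(max_passes):
            h = shared_block(h)  # same weights reused on every pass
            passes += 1
            cumulative += halt_probability(h)
            if cumulative >= threshold:
                break
        outputs.append(h)
        counts.append(passes)
    return outputs, counts


if __name__ == "__main__":
    # Two toy token states; which one "needs" more passes is arbitrary here.
    states, passes_used = adaptive_passes([[3.0, 3.0], [-3.0, -3.0]])
    print(passes_used)
```

The point is only the control flow: the loop count is chosen per token by a halting signal rather than fixed for the whole sequence. Training it end to end would need a differentiable treatment of the halting decision, which this sketch leaves out.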