Usually it simply means repeating the loop body N times so that the number of loop iterations is reduced to 1/N of the original.
For example, if you know that the iteration count is divisible by 4, you could do something like:
    int unrolledN = n / 4;
    for (int unrolledI = 0; unrolledI < unrolledN; unrolledI++) {
        int i = unrolledI * 4;
        // loop body...
        i++;
        // loop body...
        i++;
        // loop body...
        i++;
        // loop body...
    }
This wouldn't really offer any advantage over the plain loop, though. Next you'd need to reorganize the loop body so that, e.g., the memory reads for all four iterations occur at the start of the unrolled loop. This kind of optimization can offer significant performance gains because you get more control over what the CPU is doing within the loop, but it also depends greatly on the target platform. Even different x86 processors can behave very differently in this respect, so unrolling can easily become a pessimization.
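To make the reorganization step concrete, here is a minimal sketch in C. The function names (`scale_plain`, `scale_unrolled4`) and the loop body (multiplying each element by a constant) are made up for illustration; they are not from the original question. It assumes, as above, that `n` is divisible by 4. Whether the unrolled version is actually faster depends on the compiler and the CPU, so measure before keeping it.

```c
#include <stddef.h>

/* Plain loop: one element per iteration. */
void scale_plain(const float *src, float *dst, size_t n, float k) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * k;
    }
}

/* Unrolled by 4, with the loop body reorganized.
 * Assumption: n is divisible by 4 (no remainder handling here). */
void scale_unrolled4(const float *src, float *dst, size_t n, float k) {
    for (size_t i = 0; i < n; i += 4) {
        /* All four memory reads are grouped at the start... */
        float a = src[i];
        float b = src[i + 1];
        float c = src[i + 2];
        float d = src[i + 3];

        /* ...followed by the arithmetic and the writes. */
        dst[i]     = a * k;
        dst[i + 1] = b * k;
        dst[i + 2] = c * k;
        dst[i + 3] = d * k;
    }
}
```

Both functions produce the same output; only the order of operations inside the loop differs. An optimizing compiler may well do this transformation (or a better one) on the plain loop by itself, which is one more reason to benchmark rather than assume.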