The biggest difference you'll notice in practice with ICC is that it is much more likely to unroll a loop and transform the body to use SSE instructions when you ask it to. If you have dense numeric code and are neither using ICC nor hand-unrolling with GCC intrinsics, you're probably leaving performance on the table.
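For anyone who hasn't done the hand-unrolled version before, here is a rough sketch of what it can look like. The function name saxpy_sse and the choice of operation (y += a*x) are just mine for illustration, and it assumes an x86 target with SSE available. It uses unaligned loads so the caller doesn't have to guarantee alignment.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics, available on x86 with GCC and ICC */

/* Hypothetical example: y[i] += a * x[i], four floats per iteration. */
void saxpy_sse(size_t n, float a, const float *x, float *y)
{
    const __m128 va = _mm_set1_ps(a);   /* broadcast a into all four lanes */
    size_t i = 0;

    /* Vector body: unaligned loads and stores, four elements at a time. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_storeu_ps(y + i, vy);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

If your arrays are known to be 16-byte aligned you could switch to _mm_load_ps and _mm_store_ps, but measure before assuming it matters.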
But there's still no free lunch. For example, as of about two months ago, ICC will unroll loops whose increment is "i++" but will not unroll loops whose increment is "i+=1". You still need some insight, and you still need to look at the output assembly.
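If you want to check this kind of thing on your own compiler version, a minimal test case might look like the following. The function names are made up for the comparison; the two bodies are identical and only the spelling of the increment differs. Compile with optimization on and -S, then compare the generated assembly for the two functions.

```c
/* Hypothetical test case: same loop body, increment written as i++. */
void scale_postinc(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* Same loop, increment written as i += 1. */
void scale_pluseq(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; i += 1)
        dst[i] = k * src[i];
}
```

The two functions must compute the same result, so any difference in the assembly comes from the optimizer rather than from the language.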
Interesting. Another thing I noticed is that code runs faster if floating point and integer arithmetic instructions are interleaved rather than "blocked" together.
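To make "interleaved" versus "blocked" concrete, here is one way the two shapes might look at the source level. This is my own reading of the distinction, the function names and the arithmetic are invented for the sketch, and the optimizer is free to reschedule either version, so check the assembly before drawing conclusions about what actually ran.

```c
#include <stddef.h>
#include <stdint.h>

/* "Blocked": all of the float work first, then all of the integer work. */
void stats_blocked(const float *f, const uint32_t *u, size_t n,
                   float *fsum, uint32_t *usum)
{
    float fs = 0.0f;
    uint32_t us = 0;
    for (size_t i = 0; i < n; ++i)
        fs += f[i] * 0.5f;
    for (size_t i = 0; i < n; ++i)
        us += u[i] * 3u;
    *fsum = fs;
    *usum = us;
}

/* "Interleaved": float and integer operations alternate in one loop. */
void stats_interleaved(const float *f, const uint32_t *u, size_t n,
                       float *fsum, uint32_t *usum)
{
    float fs = 0.0f;
    uint32_t us = 0;
    for (size_t i = 0; i < n; ++i) {
        fs += f[i] * 0.5f;   /* floating-point multiply and add */
        us += u[i] * 3u;     /* independent integer multiply and add */
    }
    *fsum = fs;
    *usum = us;
}
```

Both versions accumulate in the same order, so they return identical results and any timing difference comes from scheduling and memory access patterns.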