If there is that much of a performance difference among generic allocators, it means you need allocators optimized for your program's semantics (unless performance is not actually that important in the end).
Mostly agreed. Going from the standard library allocator to something like jemalloc or tcmalloc will give you around 5-10% wins, which can be significant, but the differences between those generic allocators seem small. I recently wrote a slab allocator for a custom data type and got speedups of 100% over malloc.
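For anyone unfamiliar with the idea, here is a rough sketch of what a fixed-size slab allocator can look like in C. This is not the code behind the numbers above: the `node` type, the `slab_*` function names, and the fixed capacity are all made up for illustration. The point is only that allocation and free become a couple of pointer moves on a free list instead of a general-purpose malloc call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size object; stands in for the "custom data type". */
typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* A free slot reuses its own storage to hold the next-free pointer. */
typedef union slot {
    union slot *next_free;
    node obj;
} slot;

typedef struct slab {
    slot *slots;      /* one contiguous block from a single malloc */
    slot *free_list;  /* head of the intrusive free list */
    size_t capacity;  /* number of slots in the block */
} slab;

/* Returns 0 on success, -1 on bad capacity or allocation failure. */
static int slab_init(slab *s, size_t capacity) {
    if (capacity == 0 || capacity > SIZE_MAX / sizeof(slot))
        return -1;
    s->slots = malloc(capacity * sizeof(slot));
    if (s->slots == NULL)
        return -1;
    s->capacity = capacity;
    /* Thread every slot onto the free list, in address order. */
    for (size_t i = 0; i + 1 < capacity; i++)
        s->slots[i].next_free = &s->slots[i + 1];
    s->slots[capacity - 1].next_free = NULL;
    s->free_list = s->slots;
    return 0;
}

/* O(1): pop the head of the free list. Returns NULL when exhausted
 * (this sketch does not grow; a real one might chain more slabs). */
static node *slab_alloc(slab *s) {
    slot *sl = s->free_list;
    if (sl == NULL)
        return NULL;
    s->free_list = sl->next_free;
    return &sl->obj;
}

/* O(1): push the slot back onto the free list. */
static void slab_free(slab *s, node *n) {
    slot *sl = (slot *)n;  /* obj is a union member, so same address */
    assert(sl >= s->slots && sl < s->slots + s->capacity);
    sl->next_free = s->free_list;
    s->free_list = sl;
}

static void slab_destroy(slab *s) {
    free(s->slots);
    s->slots = NULL;
    s->free_list = NULL;
    s->capacity = 0;
}
```

Because every object has the same size, there are no size classes, no per-allocation headers, and no fragmentation within the slab. The sketch is single-threaded and does no double-free detection, so treat it as a starting point rather than a drop-in replacement.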