It's got nothing to do with pipelining
I'm not talking about the processor pipeline that completes instructions; I'm referring to the practice of inserting a register in the middle of a combinational circuit to split its critical path into two parts. Both are called pipelining, so they are easy to confuse.
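To make the hardware sense of the word concrete, here is a minimal Python sketch (a cycle-level model, not HDL, and not taken from any real design) of an 8-bit shift-and-add multiplier split into two stages by one pipeline register. The names `partial_sum` and `simulate`, the 8-bit width and the split point are hypothetical choices for illustration. Each stage only sums half of the partial products, so in hardware each stage would have roughly half the combinational depth; the cost is one extra cycle of latency, while a new pair of operands can still enter every cycle.

```python
# Hypothetical two-stage pipelined 8-bit multiplier, modeled cycle by cycle.
# Stage 1 sums the partial products for the low 4 bits of b,
# stage 2 adds the partial products for the high 4 bits.
WIDTH = 8
HALF = WIDTH // 2


def partial_sum(a, b, lo, hi):
    """Sum of the partial products a << i for every set bit i of b in [lo, hi)."""
    return sum(a << i for i in range(lo, hi) if (b >> i) & 1)


def simulate(pairs):
    """Feed one (a, b) pair per cycle; return (cycle, product) for each result."""
    reg = None  # the pipeline register; None models a bubble
    results = []
    inputs = list(pairs) + [None]  # one extra cycle to drain the pipeline
    for cycle, inp in enumerate(inputs):
        # Stage 2 uses what stage 1 latched into the register last cycle.
        if reg is not None:
            a, b, s = reg
            results.append((cycle, s + partial_sum(a, b, HALF, WIDTH)))
        # Stage 1 handles the new input; its result is latched at the clock edge.
        if inp is not None:
            a, b = inp
            reg = (a, b, partial_sum(a, b, 0, HALF))
        else:
            reg = None
    return results


print(simulate([(3, 5), (200, 255), (17, 0)]))
```

Every product comes out exactly one cycle after its operands went in, and one product completes per cycle once the pipeline is full.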
Every integer op will block a processor's integer execution port until it's finished, whether it's a multiplication or addition and that's because they're all using the same addition circuitry
This is also not precise. What you are describing happens because, internally, either the core can issue only one operation at a time, so there is only one port, or, on a "Tomasulo-like" architecture, operations have to be dispatched and they share queues, entries or ports (whatever you want to call them). Still, they do not go to the same unit. There would be no point in that: as I said, it would be too slow.
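The difference between sharing a dispatch port and sharing a unit can be shown with a toy timing model. This is a minimal Python sketch under assumed numbers, not a description of any specific core: one op is dispatched per cycle, the adder takes 1 cycle, the multiplier takes 3 cycles and is assumed to be pipelined, and the ops are assumed to be independent. The function name `completion_cycles` and the `LATENCY` table are hypothetical.

```python
# Hypothetical latencies: 1 cycle for the adder, 3 for a pipelined multiplier.
LATENCY = {"add": 1, "mul": 3}


def completion_cycles(ops, shared_unit):
    """Dispatch independent ops through a single port, one per cycle.

    ops is a list of (name, kind) tuples with kind "add" or "mul".
    If shared_unit is True, every op runs on the same unit, so dispatch
    stalls until the previous op has finished. Otherwise each kind has its
    own unit and only the dispatch port is shared.
    Returns {name: cycle in which the op completes}.
    """
    done = {}
    cycle = 0
    busy_until = 0  # only meaningful in the shared-unit model
    for name, kind in ops:
        if shared_unit:
            cycle = max(cycle, busy_until)  # wait for the single unit
            busy_until = cycle + LATENCY[kind]
        done[name] = cycle + LATENCY[kind]
        cycle += 1  # the port accepts at most one op per cycle
    return done


ops = [("m1", "mul"), ("a1", "add"), ("a2", "add")]
print(completion_cycles(ops, shared_unit=False))
print(completion_cycles(ops, shared_unit=True))
```

With separate units, the add dispatched right after the multiply finishes first (cycle 2 versus cycle 3); only in the shared-unit model does the multiply hold up everything behind it.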
To give a practical reference, consider this, which is one of the simplest open-source architectures available. Here is where the adder unit is defined, while this points to this, which is where the multiplier is defined.
And this architecture targets very small devices, yet it still has two separate units for these two operations.
Moreover, that multiplier takes more than one cycle (three, in fact) to complete. This is not because a multiplication inherently needs three cycles: done with a single adder, a 32-bit multiplication would need 32 cycles of additions. Three cycles is simply the speed trade-off its designers chose.
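To see where the 32 cycles come from, here is a minimal Python sketch of the textbook sequential shift-and-add multiplier for unsigned operands, which reuses one adder once per cycle. It is a generic illustration, not the multiplier used by the architecture mentioned above; the function name and the `width` parameter are hypothetical.

```python
def shift_add_multiply(a, b, width=32):
    """Multiply two unsigned `width`-bit integers using a single adder.

    Each loop iteration models one clock cycle: check the lowest bit of the
    multiplier, conditionally add the shifted multiplicand to the accumulator,
    then shift both operands. Returns (product, cycles).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    acc = 0
    cycles = 0
    for _ in range(width):
        if b & 1:
            acc += a  # the single addition performed in this cycle
        a <<= 1  # the next partial product is the multiplicand shifted left
        b >>= 1  # move on to the next multiplier bit
        cycles += 1
    return acc, cycles


print(shift_add_multiply(123456, 789))
```

The loop always runs `width` iterations, so a 32-bit product costs 32 cycles. A fully combinational array multiplier does all of those additions in a single cycle at the price of a much longer critical path, and a design can sit anywhere between these two extremes.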