I managed to find a GT 1030 GDDR5 for about 25 bucks, so I got it for testing purposes. It fits nicely in the bottom PCI-E x4 slot (connected to the chipset).
I've only tested Batman: Arkham City so far (4K, max settings, 4x MSAA).
When I run PhysX on the 4070 alone, the GPU is underutilized in all PhysX-heavy scenes in the benchmark, dropping as low as 60%.
When I put PhysX on the 1030, the 4070 is fully utilized in most PhysX-heavy scenes; only the scene with shattered glass shows lower utilization.
The utilization of the 1030 never goes above 50%. I'm not sure how relevant that is; I assume that PhysX calculations don't use many parts of the GPU (like TMUs, ROPs, etc.).
There definitely seems to be some kind of bottleneck, since the 4070 is underutilized when it's the only active card (and the glass scene is always the lowest-performing one, including when PhysX runs on the CPU).
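If anyone wants to log the utilization of both cards during the benchmark instead of eyeballing an overlay, here is a rough Python sketch. It assumes nvidia-smi is on the PATH and polls it with its CSV query options; the function names (parse_utilization, sample_utilization, log_utilization) are made up for illustration, and this is not how I took the numbers above. As far as I know, utilization.gpu reports the share of time in which at least one kernel was running, not how much of the chip was busy, so a low figure on the 1030 doesn't say much about which units PhysX uses.

```python
import subprocess
import time

# nvidia-smi query: one CSV line per GPU, e.g. "0, <name>, <percent>"
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the index first and the utilization last, so a
        # comma inside the GPU name would not break the parsing.
        index, rest = line.split(",", 1)
        name, util = rest.rsplit(",", 1)
        util = util.strip()
        rows.append({
            "index": int(index.strip()),
            "name": name.strip(),
            # Some GPUs/drivers report a non-numeric placeholder here.
            "util_percent": int(util) if util.isdigit() else None,
        })
    return rows


def sample_utilization():
    """Run nvidia-smi once and return the parsed per-GPU utilization."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


def log_utilization(samples=10, interval_s=1.0):
    """Print one line per GPU every interval_s seconds, samples times."""
    for _ in range(samples):
        for gpu in sample_utilization():
            print(f"GPU {gpu['index']} ({gpu['name']}): {gpu['util_percent']}%")
        time.sleep(interval_s)
```

Calling log_utilization(samples=120, interval_s=0.5) in a second window while the benchmark runs would give a per-card trace for about a minute, which makes it easier to line up the dips with specific scenes.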
I'd be interested to see the results with a 5070 Ti; I wonder whether there would be a similar bottleneck of some kind. Maybe I'll get to find out later this year.