cores are high-bandwidth bus masters. Making a crossbar that supports 5 high-bandwidth masters (4x core + DMA) is likely harder, larger, and higher power than one that supports 3.
It's actually 10 masters (I + D for each of the 4 cores, plus DMA read and DMA write) versus 6 masters. Or you could pre-arbitrate each pair of I ports and each pair of D ports. But even then the timing impact is unpalatable.
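
To make the port arithmetic concrete, here is a rough Python sketch of the counting. The function names and the `cores_per_arbiter` parameter are made up for illustration, and it assumes one I port and one D port per core plus separate DMA read and write ports, as described above. The crosspoint estimate assumes a full crossbar and takes a slave count that is not given in this thread.

```python
from math import ceil

def master_ports(cores, dma_ports=2, cores_per_arbiter=1):
    """Count crossbar master ports (hypothetical helper).

    Assumes each core has one I port and one D port, and that cores
    sharing a pre-arbiter present a single merged I port and a single
    merged D port to the crossbar.
    """
    groups = ceil(cores / cores_per_arbiter)
    return 2 * groups + dma_ports

def crosspoints(masters, slaves):
    """Rough size proxy: a full crossbar has one crosspoint per master/slave pair."""
    return masters * slaves

print(master_ports(4))                       # 4 cores, no pre-arbitration
print(master_ports(2))                       # 2 cores, no pre-arbitration
print(master_ports(4, cores_per_arbiter=2))  # 4 cores, I and D ports pre-arbitrated in pairs
```

With these assumptions the three cases come out to 10, 6, and 6 master ports. Pre-arbitrating in pairs gets the port count back down, but the extra arbitration stage sits in front of the crossbar, which is where the timing cost comes from.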