I am currently working on a project converting some numerical linear algebra from Fortran 90 into Python, and was frankly shocked at how much more performant some relatively naive Fortran is, even compared to using NumPy/SciPy for the matrix operations. Orders and orders of magnitude faster. No, I still don't totally understand why.
I would absolutely expect Python to perform orders of magnitude slower. Even optimal Python calling NumPy goes through multiple levels of abstraction, copying data around. Fortran is raw operations, compiled, with the data packed together.
I would strongly suspect that you are misusing Numpy in some way.
Python is horrifically slow, everyone knows that, but used correctly NumPy can level the playing field a great deal. I've seen lots of Python/NumPy written using nested loops, and that is always going to be really, really slow.
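To make the point concrete, here's a hypothetical illustration of the same matrix-vector product written both ways. The vectorized version dispatches to compiled BLAS in one call; the nested-loop version pays Python interpreter overhead on every element, which is where the huge slowdowns come from:

```python
import numpy as np

def matvec_loops(A, x):
    # Pure-Python nested loops: every iteration goes through the interpreter.
    n, m = A.shape
    y = np.zeros(n)
    for i in range(n):
        for j in range(m):
            y[i] += A[i, j] * x[j]
    return y

def matvec_vectorized(A, x):
    # One call into compiled code (BLAS under the hood).
    return A @ x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
x = rng.standard_normal(200)
assert np.allclose(matvec_loops(A, x), matvec_vectorized(A, x))
```

Both give the same answer; timing them (e.g. with `%timeit`) shows the loop version losing by a factor in the hundreds or more as the matrix grows.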
This is somewhat of a link dump, since the code is definitely too hard to interpret for anyone unfamiliar with it, but this is me converting some fairly performant Fortran code to Python with Numba at only a 1-2% performance penalty.
You're probably doing some unnecessary data movement. Using a modern profiler like scalene might point out where some improvements are possible: https://github.com/plasma-umass/scalene
There are lots of avoidable copies in NumPy, and even if you manage to get rid of those, the Python interpretation overhead isn't amortized until your matrix dimensions are in the several hundreds at least.
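A hedged sketch of what "avoidable copies" means in practice: an expression like `a * x + y` allocates a fresh temporary array for each operation, while ufuncs with the `out=` parameter reuse a preallocated buffer:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.5
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)

# Allocating version: `a * x` builds one temporary, `+ y` builds another.
out_naive = a * x + y

# Buffer-reusing version: write intermediates into one preallocated array.
tmp = np.empty_like(x)
np.multiply(x, a, out=tmp)   # tmp = a * x, written in place
np.add(tmp, y, out=tmp)      # tmp = a * x + y, still the same buffer

assert np.allclose(out_naive, tmp)
```

For large arrays this saves both allocation time and memory-bandwidth traffic, though as the comment notes, the per-call interpreter overhead still dominates when the arrays are small.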