You can blame whoever invented the word "if", as soon as you can branch based on data you can just write an interpreter that turns data into instructions, no matter the architecture.
FWIW, you can make software that runs on Harvard architecture chips. They feature distinct address spaces for ROM and RAM. It's been a while, but it's how Atmel/Microchip AVR micros work.
Not really, most of these configuration as code systems are not executed directly on the CPU but rather interpreted in which case a separate data-only memory would not stop anyone.