It's necessary for arbitrary information processing if you can forget and have no way to "unforget".
A model can decide to forget something that turns out to be important for some future prediction. A human can go back and re-read/listen etc, A transformer is always re-reading but a RNN can't and is fucked.
If the networks are to ever be a path to a closer to general intelligence, they will anyway need to be able to ask for context to be repeated, or to have separate storage where they can "choose" to replay it themselves. So this problem likely has to be solved another way anyway, both for transformers and for RNNs.
For a transformer, context is already always being repeated every token. They can fetch information that became useful anytime they want. I don't see what problem there is to solve here.
That's just because we twisted it's arm. One could for example feed the reversed input after, ie abc|cba where | is a special token. That would allow it to react to any part of the message.
A model can decide to forget something that turns out to be important for some future prediction. A human can go back and re-read/listen etc, A transformer is always re-reading but a RNN can't and is fucked.