However, multi-armed bandit algorithms are highly useful in practice. They are a special case of RL: essentially RL with a single state.
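To make the "single state" point concrete, here is a minimal epsilon-greedy bandit sketch in Python. The three payout rates and the `pull` helper are made up purely for illustration:

```python
import random

def epsilon_greedy_bandit(pull_arm, n_arms, n_rounds=10_000, epsilon=0.1):
    """Minimal epsilon-greedy bandit: one state, n_arms actions."""
    counts = [0] * n_arms      # how many times each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])   # exploit
        reward = pull_arm(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean
        total_reward += reward
    return values, total_reward

# Toy environment: three arms with payout rates the agent doesn't know.
true_rates = [0.2, 0.5, 0.7]
pull = lambda arm: 1.0 if random.random() < true_rates[arm] else 0.0
estimates, total = epsilon_greedy_bandit(pull, n_arms=3)
print(estimates, total)
```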
There are even extensions of applied bandit algorithms toward "true RL", e.g. recommender systems that want to take user history into account.
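As a rough illustration of how history can enter, here is a LinUCB-style contextual bandit sketch where user history is folded into the context vector. The feature layout and class names here are hypothetical, not any particular production recommender:

```python
import numpy as np

class LinUCBArm:
    """One arm of a LinUCB contextual bandit (per-arm ridge regression)."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)     # accumulates X^T X + I
        self.b = np.zeros(dim)   # accumulates X^T y
        self.alpha = alpha

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def recommend(arms, context):
    """Pick the item with the highest upper confidence bound for this context."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(context))

# The context could encode user history, e.g. counts of recently clicked
# categories (hypothetical features -- the point is only that history
# enters through the context vector).
arms = [LinUCBArm(dim=4) for _ in range(3)]
context = np.array([0.0, 2.0, 1.0, 0.0])
chosen = recommend(arms, context)
arms[chosen].update(context, reward=1.0)   # observed click
```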
This is the place to look for real-world applications of RL.
Also, RL uses importance-sampling estimators of the gradient (and of policy values more generally); these sometimes show up in other applications, though not framed as "RL".
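The importance-weighting trick is easiest to see in its off-policy evaluation form, which uses the same mechanism as the gradient estimators: reweight logged rewards by the ratio of target-policy to logging-policy action probabilities. The logged data below is a made-up toy, but the estimator itself is standard:

```python
import numpy as np

def is_estimate(logged_rewards, logging_probs, target_probs):
    """Importance-sampling estimate of a target policy's value from logged data.

    For each logged round i, the behaviour policy took its action with
    probability logging_probs[i] and observed logged_rewards[i];
    target_probs[i] is the probability the *new* policy would have taken
    that same action.
    """
    weights = target_probs / logging_probs
    return np.mean(weights * logged_rewards)

# Toy logs: a uniform-random logging policy over 2 actions,
# where action 1 pays off ~0.7 of the time and action 0 ~0.3.
rng = np.random.default_rng(0)
n = 10_000
actions = rng.integers(0, 2, size=n)
rewards = (rng.random(n) < np.where(actions == 1, 0.7, 0.3)).astype(float)
logging_probs = np.full(n, 0.5)

# Target policy: always take action 1, so its probability of the logged
# action is 1 where action == 1 and 0 elsewhere.
target_probs = (actions == 1).astype(float)
print(is_estimate(rewards, logging_probs, target_probs))  # approx. 0.7
```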