What does “expected reward over an infinite horizon” mean in a Markov Decision Process?
a) The total reward over a finite number of steps
b) The expected cumulative reward over an unbounded number of future steps, accounting for all future decisions
c) The reward for the initial step
d) The reward for a single decision
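
To make the quantity concrete, here is a minimal Python sketch. It assumes the common discounted criterion, in which a reward received t steps ahead is weighted by gamma**t with 0 <= gamma < 1, and it fixes the decisions in advance so that only the state transitions are random. The two-state chain and the names `evaluate_policy`, `P`, `r`, and `gamma` are illustrative assumptions, not part of the question.

```python
def evaluate_policy(P, r, gamma, tol=1e-10, max_iters=100_000):
    """Approximate the expected discounted reward over an infinite horizon.

    P[s][s2] : probability of moving from state s to state s2 (hypothetical chain)
    r[s]     : reward collected in state s (hypothetical values)
    gamma    : discount factor, assumed to satisfy 0 <= gamma < 1

    Repeatedly applies v(s) = r(s) + gamma * sum_s2 P[s][s2] * v(s2)
    until the largest change between sweeps drops below tol.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [
            r[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Illustrative two-state chain: from either state, move to state 0 or 1
# with equal probability; only state 0 pays a reward.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9

values = evaluate_policy(P, r, gamma)
print(values)
```

Each entry of `values` estimates the expected sum of discounted rewards when starting from that state and continuing indefinitely. No step is ever the last one, so the estimate depends on every future transition rather than on a single step or a fixed number of steps.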