How is the “reward matrix” used in Markov Decision Processes?
a) To determine the transition probabilities between states
b) To assign a reward to each possible state-action pair
c) To calculate the expected return time
d) To create a Markov Chain graph