Markov Decision Process (1960s, Howard et al.): Unlocking The Astonishing [S* 23]
- Steven Willers
- Jan 1, 2024
- 5 min read
THE INDIVIDUAL WHO WAS ASKED TO FIND ROUTES
Shortly after World War 2, in 1949, a US Air Force reconnaissance flight picked up samples of radioactive material: short-lived isotopes that don't last more than a month, which could only have come from a nuclear explosion. But the US had not performed a nuclear bomb test after Trinity (the Manhattan Project), so the only place the sample could have come from was Russia (the Soviet Union). That meant not just a weapons race, but a real chance the weapon would be used. This led to John von Neumann, the person who founded game theory; he was hired to find a way to work out a strategy for winning this deadly "game".

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don't have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. If he testifies against his partner, he will go free while the partner will get three years in prison on the main charge. Oh, yes, there is a catch ... If both prisoners testify against each other, both will be sentenced to two years in jail. The prisoners are given a little time to think this over, but in no case may either learn what the other has decided until he has irrevocably made his decision. Each is informed that the other prisoner is being offered the very same deal. Each prisoner is concerned only with his own welfare—with minimizing his own prison sentence.

The reason I want to tell you this story on New Year's Eve is that it is such an interesting and delightful one: it is the best-known example of game theory. Game theory in machine learning involves modeling strategic interactions between multiple decision-making agents. It helps analyze scenarios where the outcome of one agent's actions depends on the actions of others. This framework is often used in reinforcement learning and multi-agent systems to understand how intelligent agents can make optimal decisions in dynamic and competitive environments.
But I am not here to tell you about the game itself, because I always save the last story for the end of the blog. I am here to tell you that the agents in game theory make decisions through a remarkable process called the Markov Decision Process. All of the material I mention here is taken from Dynamic Programming and Markov Processes (book by Ronald A. Howard) and Artificial Intelligence: A Modern Approach (by Peter Norvig and Stuart J. Russell).
What is a Decision Process (Decision Tree)?
Apart from game theory, decision processes in machines play an important role in various areas, such as AI design, autonomous vehicles (e.g., deciding how to overtake), image classification, and the arrangement of data in artificial intelligence.
A simple example of a decision process, making a decision on whether or not you will wait for a table at a restaurant, is given below:

In a decision tree, the decision process refers to the series of choices made at each node to arrive at a final decision or outcome. The tree is traversed from the root node down to the leaves.
In the above example, each node is shown in purple. Each internal node represents a decision based on a specific feature or attribute, and the branches emanating from that node represent the possible outcomes or choices. This process continues until a leaf node is reached, which signifies the final decision or classification.
The image shows, in green, each case where the individual will wait for a table, and in red each case where they will not.
At each decision node, the algorithm evaluates a feature and makes a Boolean decision (e.g., true[1] or false[0]). This process continues recursively, creating a tree structure where each path from the root to a leaf node represents a unique decision-making path based on the input features.
The decision process in a decision tree involves traversing the tree by making decisions at each node, ultimately leading to a final decision or prediction based on the input features.
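The traversal described above can be sketched in code. The following is a minimal, hand-written version of the "wait for a table" tree; the features (patrons, estimated wait, hunger) come from the example, but the exact thresholds are illustrative assumptions rather than the precise tree from the book.

```python
# A minimal sketch of the decision process described above: traversing a
# hand-written decision tree for the "wait for a table" example.
# Feature names and thresholds are illustrative assumptions.

def will_wait(patrons, wait_estimate, hungry):
    """Walk the tree: each 'if' is an internal decision node,
    each 'return' is a leaf (the final decision)."""
    if patrons == "none":
        return False              # leaf: no one there, don't wait
    if patrons == "some":
        return True               # leaf: a short queue, wait
    # patrons == "full": look at the estimated waiting time (minutes)
    if wait_estimate > 60:
        return False              # leaf: too long, don't wait
    if wait_estimate <= 30:
        return True               # leaf: short enough, wait
    # between 30 and 60 minutes: wait only if hungry
    return hungry

print(will_wait("some", 0, False))   # True
print(will_wait("full", 45, True))   # True
print(will_wait("full", 90, True))   # False
```

Each call follows one root-to-leaf path, so every combination of input features maps to exactly one Boolean outcome, just as described above.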
Markov Decision Process
A Markov Decision Process (MDP) is a mathematical framework used to model decision-making in situations where an agent interacts with an environment over time. It is named after the Russian mathematician Andrey Markov. Here are key components of an MDP:
1. States (S): These represent the different situations or configurations that the system can be in. The system evolves over time, transitioning from one state to another based on the agent's actions and the inherent dynamics of the environment.
2. Actions (A): These are the possible decisions or moves that the agent can make. The set of actions available to the agent depends on the current state.
3. Transition Probabilities (P): This function defines the probability of transitioning from one state to another given a particular action. It represents the dynamics of the system and captures the uncertainty in the transitions.
4. Rewards (R): At each state, the agent receives a numerical reward based on the action taken and the resulting state. The goal of the agent is typically to maximize the cumulative reward over time.
5. Policy (π): A policy is a strategy or a mapping from states to actions, indicating the decision the agent should make in each state. The objective is to find an optimal policy that maximizes the expected cumulative reward.
6. Discount Factor (γ): This factor accounts for the agent's preference for immediate rewards over delayed ones. It determines the importance of future rewards in the decision-making process.
The dynamics of an MDP can be succinctly expressed using the Bellman equation, which relates the value of a state to the expected sum of discounted future rewards.
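In symbols, the Bellman (optimality) equation relates the value of a state to the best action's expected immediate reward plus the discounted value of the state that follows:

```latex
V(s) = \max_{a \in A} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \, V(s') \right]
```

Here P, R, and γ are the transition probabilities, rewards, and discount factor from the list above.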
Solving an MDP involves finding the optimal policy, often achieved through iterative methods like value iteration or policy iteration. Reinforcement learning algorithms often leverage MDPs to train agents to make decisions in complex environments.
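As a sketch of how value iteration solves an MDP, here is a minimal example on a hypothetical two-state MDP. The states, actions, transition probabilities, rewards, and discount factor below are all illustrative assumptions, not taken from the book:

```python
# Value iteration on a tiny, made-up two-state MDP.
# P[(s, a)] -> list of (next_state, probability); R[(s, a)] -> immediate reward.

states = ["s0", "s1"]
actions = ["stay", "move"]
gamma = 0.9  # discount factor

P = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "move"): [("s1", 0.8), ("s0", 0.2)],  # moving sometimes fails
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "move"): [("s0", 1.0)],
}
R = {("s0", "stay"): 0.0, ("s0", "move"): 0.0,
     ("s1", "stay"): 1.0, ("s1", "move"): 0.0}  # reward only for staying in s1

V = {s: 0.0 for s in states}
for _ in range(200):  # repeat the Bellman update until the values settle
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                for a in actions)
         for s in states}

# The optimal policy is greedy with respect to the converged values:
policy = {s: max(actions,
                 key=lambda a: R[(s, a)] + gamma *
                     sum(p * V[s2] for s2, p in P[(s, a)]))
          for s in states}
print(policy)  # {'s0': 'move', 's1': 'stay'}
```

The agent learns to move toward the rewarding state and then stay there, which is exactly the optimal policy the iterative methods mentioned above are designed to find.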

Story of The Process
The term "Markov Decision Process" (MDP) is named after the Russian mathematician Andrey Markov, a pioneer in the field of probability theory and stochastic processes. I was astonished when I heard the fact that the MDP was developed and used professionally in the United States, even though it is named after a Russian mathematician. His work laid the foundation for the development of Markov processes, which are essential components in the study of decision processes, including MDPs.
Advantages of MDPs
The MDP framework has been widely used in many fields, particularly operations research, artificial intelligence, and reinforcement learning.
Back to the Prisoner's Dilemma
However, to find the best strategy in the Prisoner's Dilemma, computer scientists wrote a bunch of programs; some of them actually worked well, while some even lost badly. The main task is worth remembering: twelve programs with different strategies were entered, and the best-known strategy is Tit for Tat. The computer scientists ran a simulation of 200 rounds, five times over (1,000 rounds in total), and among the twelve strategies Tit for Tat won. However, if you want to find the actual code of the game, you can look it up.
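To give a feel for how such a tournament round works, here is a minimal sketch of an iterated Prisoner's Dilemma in which Tit for Tat plays against one other illustrative strategy. The payoffs are the prison sentences from the story above (so lower totals are better); the opponent strategy is an assumption for illustration, not one of the actual tournament entries.

```python
# Iterated Prisoner's Dilemma sketch in the spirit of the tournament above.
# "C" = stay silent (cooperate with your partner), "D" = testify (defect).
# Payoffs are years in prison from the story: lower is better.
YEARS = {("C", "C"): 1, ("C", "D"): 3, ("D", "C"): 0, ("D", "D"): 2}

def tit_for_tat(my_history, their_history):
    """Cooperate first, then simply copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    """An illustrative opponent that testifies every single round."""
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b = [], []
    years_a = years_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        years_a += YEARS[(a, b)]
        years_b += YEARS[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return years_a, years_b

print(play(tit_for_tat, tit_for_tat))    # (200, 200): mutual cooperation
print(play(tit_for_tat, always_defect))  # (401, 398): only the first round is lost
```

Against itself, Tit for Tat cooperates forever; against a pure defector it is exploited only once and then matches the defection, which is why it accumulated such good totals across the whole tournament.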
This article offers a glimpse into the captivating world of MDPs and decision trees. I admit that this article doesn't cover every relevant detail about MDPs, but that is the story for next Saturday. It will be a journey that spans the intricacies of artificial neurons, the power of hidden layers, and the transformative potential of these technologies.
-Thanks for your extreme Patience and Precious Time



