A theoretical model of measure-valued Markov processes simulating the divergent thinking of man
Wang Zhen-Zhen, Xing Han-Cheng, Zhang Zhi-Zheng, Ni Qing-Jian
DOI: https://doi.org/10.3321/j.issn:0469-5097.2008.02.006
2008-01-01
Abstract: This paper presents a model called the measure-valued Markov decision process (MVMDP). In this model, the agent's understanding of its environment is represented by the mathematical notion of a measure. The agent chooses its optimal action according to this measure and thereby obtains its optimal policy. We therefore present an algorithm for finding the optimal policy of an MVMDP, which can also be regarded as an approximate optimal-policy algorithm for partially observed Markov decision processes (POMDPs). The model generalizes the partially observed Markov decision process; that is, a POMDP is a particular case of an MVMDP. Nevertheless, our approach is essentially different from other work on POMDPs, in two respects.

First, the central idea of general POMDP methods is to transform a partially observable Markov decision problem on a physical state space into a regular Markov decision problem (MDP) on the corresponding belief-state space, and all such work identifies the belief state with a probability distribution over the state space. Consequently, most POMDP models built on this idea concentrate on algorithms of various kinds for finding the optimal policy and on refinements of existing techniques. Our work, by contrast, is not based on the transformation between a POMDP on a physical state space and an MDP on a belief-state space. Instead, we take a measure on the state space, a more general notion than a belief state, as the new object of study, and the Markov decision problem we discuss takes place on the space of such measures. In this way we obtain a measure-valued Markov decision process.

Second, the MVMDP, which is based on recent theory of measure-valued branching processes in modern probability, reflects an important characteristic of the human mind: people think about problems and choose their optimal actions in a context in which all possible states are grasped (i.e., they are able to measure the state space appropriately). In other words, in many cases, when the solutions to a problem have not yet emerged, or even before the problem itself can be explicitly expressed, human thinking does not move from one definite point to another over time, as logical reasoning does. Instead, it evolves through changes in its overall grasp of the problem, so it can proceed creatively from "area" to "area". This "area" is what we call a "measure", and it reflects people's understanding of their environment. For this reason, the human mind obeys the laws of quantum mechanics; that is, it exists in a probabilistic manner. In our model this phenomenon is embodied in the evolution of a random variable taking values in the space of measures on the state space.

In summary, we think the MVMDP can not only deepen the understanding of MDPs and POMDPs but also provide a new paradigm for studying the characteristics of the human mind. Just as people could build aircraft only after they had deeply understood aerodynamics, we can advance the study of AI only when we have deeply understood the essence of the human mind. The MVMDP is an initial effort toward that deeper understanding.
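The claim that a POMDP is a particular case of an MVMDP can be illustrated on a finite state space. The sketch below is not the authors' MVMDP dynamics (the abstract says these rest on measure-valued branching processes, whose details are not given here). It shows only the standard POMDP Bayes filter, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s), written as an update on an unnormalized finite measure. Dividing by the total mass recovers the usual belief state, which is the measure of total mass 1. The names `update_measure`, `normalize`, and `belief_update`, and the two-state numbers in the usage example, are hypothetical and chosen for illustration.

```python
from typing import Dict

# Hypothetical representation: a finite measure on a finite state space,
# mapping each state to a non-negative mass (total mass need not be 1).
Measure = Dict[str, float]
# Transition kernel for one fixed action: from_state -> {to_state: probability}.
Kernel = Dict[str, Dict[str, float]]


def update_measure(mu: Measure, trans: Kernel, obs_lik: Dict[str, float]) -> Measure:
    """Unnormalized update for one (action, observation) pair:
    mu'(s') = obs_lik[s'] * sum_s trans[s][s'] * mu(s).
    obs_lik[s'] stands for O(o | s', a) and must list every reachable state.
    The result is a general finite measure; its mass is not rescaled."""
    pushed: Measure = {s2: 0.0 for s2 in obs_lik}
    for s, mass in mu.items():
        for s2, p in trans[s].items():
            pushed[s2] += p * mass  # push mass forward through the kernel
    # Weight each state by the likelihood of the received observation.
    return {s2: obs_lik[s2] * m for s2, m in pushed.items()}


def normalize(mu: Measure) -> Measure:
    """Divide by the total mass, turning a finite measure into a probability distribution."""
    total = sum(mu.values())
    if total <= 0:
        raise ValueError("measure has zero total mass")
    return {s: m / total for s, m in mu.items()}


def belief_update(b: Measure, trans: Kernel, obs_lik: Dict[str, float]) -> Measure:
    """Standard POMDP belief update, obtained as the normalized measure update."""
    return normalize(update_measure(b, trans, obs_lik))


if __name__ == "__main__":
    # Illustrative two-state example (numbers are assumptions, not from the paper).
    b = {"left": 0.5, "right": 0.5}
    listen = {"left": {"left": 1.0}, "right": {"right": 1.0}}  # action leaves the state unchanged
    hear_left = {"left": 0.85, "right": 0.15}                  # O(o = "hear left" | s')
    print(update_measure(b, listen, hear_left))  # unnormalized measure
    print(belief_update(b, listen, hear_left))   # probability distribution
```

Because `update_measure` is linear in the measure, scaling the initial measure by any positive constant leaves the normalized result unchanged. In this sketch, then, the belief state is just the normalized view of a more general measure, which is one concrete reading of "a POMDP is a particular case of an MVMDP".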