Sample Learning Complexity

 

Lets say that a 0-level agent does not have perfect knowledge (i.e. its Ki(fi(w)) does not match the oracle M(w) function), then we know that some or all of it's tex2html_wrap_inline1191 mappings must be wrong and need to be learned. If the agent is using some form of supervised learning (i.e. where a teacher tells it which action to take each time), then it is trying to learn one of |Ai||W| possible 0-level models. If instead it is using some form of reinforcement learning, where it gets a reward (positive or negative) after every action, then it is trying to learn one of tex2html_wrap_inline1195 possible models, where R is the set of rewards it gets (tex2html_wrap_inline1199). This means that, if the agent is being taught which actions are better, then it just needs to learn the mapping from state w to action a. While, if it gets a reward for each action in each state, then it needs to learn the mapping from state-action (w,a) pairs to their rewards in order to determine which actions lead to the highest reward.

If, on the other hand, a 1-level agent is wrong, then the problem could be either in its tex2html_wrap_inline1123, or in its KiKj(fij(w)). An interesting case is where we assume that the former knowledge is already known by the agent. This can happen in MASs where the designer knows what the agent should do given what all the other agents will do. So, assuming that tex2html_wrap_inline1123 is always correct, we have KiKj(fij(w)) as the only source of the discrepancy. Since agents can observe each other's actions in all states, we can assume that they learn this knowledge using some form of supervised learning (i.e. the observed agent is the teacher because it ``tells'' others what it does in each w). Therefore, in learning this knowledge an agent will be picking from a set of |Aj||W| possible models.

It should be intuitive (assuming tex2html_wrap_inline1069) that the learning problem that the 1-level agent has is the same magnitude as the one the 0-level agent using supervised learning has, but smaller than the reinforcement 0-level agent's problem. However, we can make this a bit more formal by noticing that we can use the size of the hypothesis (or concept) space to determine the sample complexity of the learning problem. This give us a rough idea of the number of examples that a PAC-learning algorithm would have to see before reaching an acceptable hypothesis (i.e. model).

We first define the error, at any given time, of agent i's action function gi(w), as:
 displaymath1468

where D is the distribution from which world states w are drawn, and gi(w) returns the action that agent i will take in state w, given its current knowledge (i.e. all the tex2html_wrap_inline1235 models). We also let tex2html_wrap_inline1237 be the upper bound we wish to set on the probability that i has a bad model, i.e. one with tex2html_wrap_inline1241. The sample complexity is bounded from above by m, whose standard definition from computational learning theory is:


 displaymath1470

where |H| is the size of the hypothesis (i.e. model) space. Given these equations, we can plug in values for one particularly interesting case, and we get an interesting result.


 theorem212
Proof We saw before that |H|=|A||W| for the 1-level agent, and tex2html_wrap_inline1255 for the 0-level with reinforcement-based learning. Using Equation 2 we can determine that the 1-level agent's sample complexity will be less than the 0-level reinforcement agent as long as |R| > |A|1/|A|, which is always true because tex2html_wrap_inline1199 and |A| > 0.

 
 table224

Table: Size of the hypothesis spaces |H| for learning the different sets of knowledge, depending on whether the agent uses supervised or reinforcement learning. Ai is the set of actions and Ri is the set of rewards for agent i, n is the number of agents, and W the set of possible world states.

This theorem tells us that, in these cases, the 1-level will have better models, on average, than the 0-level agent. In fact, we can calculate the size of the hypothesis space |H| for all the different types of knowledge, as seen in Table 2. This table, along with Equation 2, can be used to determine the sample complexity of learning the different types of knowledge for any agent that uses any form of supervised or reinforcement learning. In this way, we can compare two agents to determine which one will have the more accurate models, on average. Please note that some of these complexities are independent of the number of agents (n). We can do this because we assume that all actions are seen by all agents so an agent can build tex2html_wrap_inline1315 models of all other agents in parallel, and assume everyone else can do the same. However, the actual computational costs will increase linearly with each agent, since the agent will need to maintain a separate model for each other agent. The sample complexities rely on the assumption that, between each action, there is enough time for the agent to update its models.

A designer of an agent for a MAS can consult Table 2 to determine how long his agent will take to learn accurate models, given different combinations of implemented versus learned knowledge, and supervised versus reinforcement learning algorithms. However, we can further refine this table by noticing that if a designer has, for example, tex2html_wrap_inline1123 knowledge he can actually apply this knowledge when building a 0-level agent. The use of this knowledge will result in a reduction in the size of the hypothesis space for the 0-level agent.

The reduction can be accomplished by looking at the tex2html_wrap_inline1123 knowledge and determining which tex2html_wrap_inline1191 pairings are impossible and eliminating these from the hypothesis space of the 0-level modeler. That is, for all tex2html_wrap_inline1323 and tex2html_wrap_inline1067, eliminate from the table of all possible mappings all the tex2html_wrap_inline1191 mappings for which:

  1. There does not exist an tex2html_wrap_inline1329 such that tex2html_wrap_inline1123 and tex2html_wrap_inline1125, i.e. the action ai is never taken in state w, regardless of what the others do.
  2. For all tex2html_wrap_inline1329 it is true that tex2html_wrap_inline1123 and tex2html_wrap_inline1125, i.e. the agent takes the same action ai in w no matter what the others do.

After their application, we are left with a new table tex2html_wrap_inline1349 with tex2html_wrap_inline1351, and each tex2html_wrap_inline1353 has a set Awi associated with it. We can then determine that, if the new 0-level modeler uses supervised learning, the size of its hypothesis space will be tex2html_wrap_inline1357. While, if it uses reinforcement learning, its hypothesis size will be |Ri||Ti|. Table 3(a) summarizes the size of the hypothesis spaces for learning the different types of knowledge given that the designer uses the tex2html_wrap_inline1123 knowledge to reduce the hypothesis spaces of other types of knowledge.

 
 table316

Table: Size of the hypothesis spaces |H| for learning the different types of knowledge. The (a) columns assume the designer already has tex2html_wrap_inline1123 knowledge, while in (b) he also has KiKj(fij(w)). If knowledge is known then |H|=0.

Similarly, if the designer also has the knowledge tex2html_wrap_inline1139, he creates a reduced table Tj for all other agents. The new hypothesis spaces will then be given by Table 3(b).

For example, a designer for our example market economy MAS can quickly realize that he knows what price his agent should bid given the bids of all others and the probabilities that the buyer will pick each bid. That is, the designer has tex2html_wrap_inline1123 knowledge. He also can determine that in a market economy he can not implement a 0-level supervised learning agent because, even after the fact, it is impossible for a 0-level to determine what it should have bid. Therefore, using Theorem 2, the designer will choose to implement a 1-level supervised learning agent and not a 0-level reinforcement learning agent. More complicated situations would be dealt with in a similar way using Table 3.


Jose M. Vidal
jmvidal@umich.edu
Thu Apr 24 15:00:31 EDT 1997