The Pond
Overview
The Pond is a simulator for studying learning agents. It is an
object-oriented Java framework for building experiments involving
agents, food, and inanimate objects. The system was constructed
to study learning in environments that have impediments to both
learning and survival and to study methods
of agent cooperation in an environment where agents explore unknown
territory.
The key features of the Pond are:
- Object-oriented design that allows new types of environments,
agents, food, and inanimate objects to be created
- Virtual or grid location of Pond objects
- Genetic algorithm (GA) that can be used to tune learning parameters
- Reinforcement learning agent
- PondProject class that provides a convenient way to build simulation runs
- Scaled arithmetic for learning parameters
Architecture
The primary classes in the system are:
- Pond - The class that performs the simulation
- PondObject - Superclass for agents, food, and inanimate objects
- Agent - The superclass for learning agents
- AgentQ - A Q-learning agent that implements
the QLearner interface
- Veggie - The superclass for food. Has color and calorie
attributes
- GrowingVeggie - Food that requires a growth delay after being eaten
- Other - The superclass of inanimate objects. Has a color attribute
- PondEnvironment - This class defines how a single simulation step will be carried out and handles PondObject interactions
- LittlePond - This class implements an environment with changing attributes over time
- PondGA - This class implements a genetic algorithm where a simulation run with a set of heterogeneous agents corresponds to a single GA population
- PondGA2 - This class implements a genetic algorithm where each agent in a population corresponds to a single simulation run
- PondProject - This class is used to contain all the simulation parameters and object classes of PondObjects for a simulation or GA run
A PondProject specifies the following:
- pondType - "Virtual" or "Grid". In virtual mode, pond objects are selected randomly for interaction. One can think of this mode as implementing a pond where objects float into each other and interact. In grid mode, pond objects have a location in two dimensions. Grid mode has not been completed yet.
- runType - "Simulate" or "GA". In simulation mode, a single simulation run is performed. In GA mode, a genetic algorithm performs a number of simulation runs to optimize some learning characteristic.
- Simulation parameters - These parameters set the maximum number
of iterations and how many times the simulation is run.
- GA parameters - These parameters specify the GA object class and parameters such as population size and mutation rate.
- Pond object parameters - These parameters specify the object class, number, and initial parameters for PondObjects in the simulation.
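As a rough illustration of how the class hierarchy and virtual mode described above could fit together, here is a minimal Java sketch. It is not the Pond's actual source: the class names follow the list above, but every field, constructor, and method (interactWith, eat, tick, step), and the VirtualPond class itself, are assumptions made for this example. The agent here always eats whatever it meets, whereas a learning agent would decide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: names mirror the class list above, but all members
// are assumptions for illustration, not the Pond's real API.
abstract class PondObject {
    // Called when the environment brings this object together with another.
    abstract void interactWith(PondObject other);
}

class Veggie extends PondObject {
    final String color;
    final int calories;   // may be negative for harmful food
    Veggie(String color, int calories) {
        this.color = color;
        this.calories = calories;
    }
    // Calories handed to an agent that eats this veggie.
    int eat() { return calories; }
    @Override
    void interactWith(PondObject other) { /* food is passive */ }
}

class GrowingVeggie extends Veggie {
    private final int growthDelay;     // ticks needed to regrow (assumed unit)
    private int stepsUntilGrown = 0;
    GrowingVeggie(String color, int calories, int growthDelay) {
        super(color, calories);
        this.growthDelay = growthDelay;
    }
    void tick() { if (stepsUntilGrown > 0) stepsUntilGrown--; }
    @Override
    int eat() {
        if (stepsUntilGrown > 0) return 0;   // no nutritional effect while regrowing
        stepsUntilGrown = growthDelay;
        return calories;
    }
}

class Other extends PondObject {
    final String color;
    Other(String color) { this.color = color; }
    @Override
    void interactWith(PondObject other) { /* inanimate */ }
}

class Agent extends PondObject {
    int energy;
    Agent(int energy) { this.energy = energy; }
    @Override
    void interactWith(PondObject other) {
        // This simplified agent eats every veggie it meets.
        if (other instanceof Veggie) energy += ((Veggie) other).eat();
    }
}

class VirtualPond {
    final List<PondObject> objects = new ArrayList<>();
    private final Random rng;
    VirtualPond(long seed) { this.rng = new Random(seed); }

    // One virtual-mode step: advance regrowth, then pick two distinct
    // objects at random and let them interact in both directions.
    void step() {
        for (PondObject o : objects) {
            if (o instanceof GrowingVeggie) ((GrowingVeggie) o).tick();
        }
        if (objects.size() < 2) return;
        int i = rng.nextInt(objects.size());
        int j = rng.nextInt(objects.size() - 1);
        if (j >= i) j++;   // skip index i so the pair is distinct
        PondObject a = objects.get(i);
        PondObject b = objects.get(j);
        a.interactWith(b);
        b.interactWith(a);
    }
}
```

Because virtual mode needs no coordinates, a step reduces to choosing a random pair. A grid mode would presumably replace only the pair selection with a location-based rule.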
Survival Experiments
The Pond was used to study reinforcement learning agents in an environment with impediments to survival and learning. In these experiments, there are Q-learning agents (class AgentQ), red veggies, and blue veggies. In the static version of the Pond, the red veggies give nutrients and the blue veggies remove nutrients when they are eaten by agents. In the dynamic version, the roles of blue and red veggies change over time.
Agents learn to make survival decisions based on a reward they generate internally from their energy level. Agents have access only to a 4-level health value, while making moves and eating veggies affect a finer-grained energy level. Learning is also affected by the way veggies behave after they have been eaten: both types of veggies have no nutritional effect on agents that eat them during the growth period that follows. Because of the low-resolution feedback the agents receive and the veggie growth delay, the agent reward system is nondeterministic. However, because the feedback is statistically related to the effect of eating the two types of veggies, the agents can still learn which veggies to eat.
Another impediment to learning and survival is that there is not enough food to support the initial number of agents.
In the dynamic version of the Pond, where the roles of the red and blue veggies change, learning is more difficult, but the agents can still learn to survive.
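To make the learning setup above more concrete, the following Java sketch shows a tabular Q-learner whose state is a 4-level health value derived from a finer-grained energy level. It is only an illustration, not the AgentQ class: the energy range (MAX_ENERGY), the reward rule (change in health level), and all names are assumptions. The update rule itself is the standard Q-learning update.

```java
import java.util.Random;

// Hypothetical sketch of a Q-learner with coarse health feedback.
// QSketch, MAX_ENERGY, and the reward rule are assumptions, not Pond code.
class QSketch {
    static final int HEALTH_LEVELS = 4;   // the agent only sees 4 health levels
    static final int MAX_ENERGY = 100;    // assumed size of the fine-grained energy scale

    final double[][] q;                   // q[healthLevel][action]
    private final double alpha;           // learning rate
    private final double gamma;           // discount factor
    private final double epsilon;         // exploration probability
    private final Random rng;

    QSketch(int actions, double alpha, double gamma, double epsilon, long seed) {
        this.q = new double[HEALTH_LEVELS][actions];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
        this.rng = new Random(seed);
    }

    // Quantize fine-grained energy into one of the 4 health levels.
    static int healthLevel(int energy) {
        int clamped = Math.max(0, Math.min(MAX_ENERGY - 1, energy));
        return clamped * HEALTH_LEVELS / MAX_ENERGY;
    }

    // Assumed internal reward: the change in observed health level.
    static double reward(int energyBefore, int energyAfter) {
        return healthLevel(energyAfter) - healthLevel(energyBefore);
    }

    // Epsilon-greedy action selection.
    int chooseAction(int state) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q[state].length);
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;
    }

    // Standard Q-learning update:
    // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int state, int action, double r, int nextState) {
        double bestNext = q[nextState][0];
        for (double v : q[nextState]) bestNext = Math.max(bestNext, v);
        q[state][action] += alpha * (r + gamma * bestNext - q[state][action]);
    }
}
```

Under these assumptions, eating a veggie that moves energy from 30 to 35 yields a reward of 0 because the health level does not change, while a move from 24 to 26 yields a reward of 1. The same action therefore produces different rewards at different times, which is one way the low-resolution feedback makes the reward signal noisy.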
The paper, Tuning Q-Learning Parameters with a Genetic Algorithm, describes a set of Pond experiments in both static and dynamic environments. A GA was used to tune Q-learning parameters to increase the survival rate of agents.
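The general idea of tuning Q-learning parameters with a GA can be sketched as follows. This is not the PondGA or PondGA2 implementation and not the method from the paper: the ParamGA class, the three-gene encoding (learning rate, discount factor, exploration rate), tournament selection, uniform crossover, Gaussian mutation, and elitism are all assumptions chosen for brevity. The fitness function is supplied by the caller; in the Pond it would presumably be derived from agent survival in a simulation run.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical GA sketch; operators and encoding are assumptions,
// not the Pond's PondGA or PondGA2 classes.
class ParamGA {
    private final int populationSize;
    private final double mutationRate;   // per-gene mutation probability
    private final Random rng;

    ParamGA(int populationSize, double mutationRate, long seed) {
        this.populationSize = populationSize;
        this.mutationRate = mutationRate;
        this.rng = new Random(seed);
    }

    // Returns the best {alpha, gamma, epsilon} vector evaluated so far;
    // every gene stays in [0, 1]. Higher fitness is better.
    double[] evolve(ToDoubleFunction<double[]> fitness, int generations) {
        double[][] pop = new double[populationSize][3];
        for (double[] individual : pop) {
            for (int k = 0; k < individual.length; k++) individual[k] = rng.nextDouble();
        }
        double[] best = null;
        double bestFit = Double.NEGATIVE_INFINITY;
        for (int g = 0; g <= generations; g++) {
            // Evaluate the current population and remember the best individual.
            double[] fit = new double[populationSize];
            for (int i = 0; i < populationSize; i++) {
                fit[i] = fitness.applyAsDouble(pop[i]);
                if (fit[i] > bestFit) {
                    bestFit = fit[i];
                    best = pop[i].clone();
                }
            }
            if (g == generations) break;   // final population is evaluated but not bred
            double[][] next = new double[populationSize][];
            next[0] = best.clone();        // elitism: carry the best forward unchanged
            for (int i = 1; i < populationSize; i++) {
                double[] a = pop[tournament(fit)];
                double[] b = pop[tournament(fit)];
                double[] child = new double[a.length];
                for (int k = 0; k < child.length; k++) {
                    child[k] = rng.nextBoolean() ? a[k] : b[k];   // uniform crossover
                    if (rng.nextDouble() < mutationRate) {
                        double mutated = child[k] + 0.1 * rng.nextGaussian();
                        child[k] = Math.max(0.0, Math.min(1.0, mutated));
                    }
                }
                next[i] = child;
            }
            pop = next;
        }
        return best;
    }

    // Binary tournament: the fitter of two random individuals wins.
    private int tournament(double[] fit) {
        int i = rng.nextInt(fit.length);
        int j = rng.nextInt(fit.length);
        return fit[i] >= fit[j] ? i : j;
    }
}
```

Scoring each individual with its own evaluation is closer in spirit to the PondGA2 description above. A PondGA-style variant would instead place the whole population in one simulation run as heterogeneous agents and score them together.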
Future Direction
To complete the Pond simulator, grid location mode must be
implemented. Virtual mode reduces the complexity of agents
because they do not have to plan their moves in a coordinate
system. However, grid mode will allow more flexibility and the
ability to study agents that plan physical moves.
Once grid mode is completed, I plan to use the Pond to study cooperating agents that explore an unknown terrain. In particular, I want to investigate ways in which the outcome of exploratory actions by one agent can be shared with other agents in a way that leads to quicker learning.
This site © copyright 2004 by Ben E. Cline.
8/2004