The key features of the Pond are
- Object-oriented design that allows new types of environments,
agents, food, and inanimate objects to be created
- Virtual or grid location of Pond objects
- Genetic algorithm (GA) that can be used to tune learning parameters
- Reinforcement learning agent
- PondProject class that provides a convenient way to build simulation runs
- Scaled arithmetic for learning parameters
The main classes of the Pond are
- Pond - The class that performs the simulation
- PondObject - Superclass for agents, food, and inanimate objects
- Agent - The superclass for learning agents
- AgentQ - A Q-learning agent that implements
the QLearner interface
- Veggie - The superclass for food. Has color and calorie attributes
- GrowingVeggie - Food that requires a growth delay after being eaten
- Other - The superclass of inanimate objects. Has a color attribute
- PondEnvironment - This class defines how a single simulation step will be carried out and handles PondObject interactions
- LittlePond - This class implements an environment with changing attributes over time
- PondGA - This class implements a genetic algorithm where a simulation run with a set of heterogeneous agents corresponds to a single GA population
- PondGA2 - This class implements a genetic algorithm where each agent in a population corresponds to a single simulation run
- PondProject - This class is used to contain all the simulation parameters and object classes of PondObjects for a simulation or GA run
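The relationships among these classes can be pictured with the skeleton below. It is only a sketch based on the descriptions above; the field and method names are assumptions, not the actual Pond source.

```java
// Illustrative skeleton of the Pond class hierarchy.  The class
// relationships follow the list above; field and method names are
// assumptions, not the actual Pond source.
interface QLearner {
    void update(int state, int action, double reward, int nextState);
}

abstract class PondObject {
    protected int color;                     // visible to agents during interactions
}

abstract class Agent extends PondObject {
    protected double energy;                 // fine-grained internal energy level
    // Agents see only a coarse 4-level health value when choosing actions.
    public abstract int chooseAction(int healthLevel);
}

class AgentQ extends Agent implements QLearner {
    public int chooseAction(int healthLevel) { return 0; }                        // stub
    public void update(int state, int action, double reward, int nextState) { }   // stub
}

class Veggie extends PondObject {
    protected int calories;                  // energy an agent gains by eating this veggie
}

class GrowingVeggie extends Veggie {
    protected int growthDelay;               // steps with no nutritional value after being eaten
}

class Other extends PondObject {
    // Inanimate object; only the inherited color attribute.
}
```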
The parameters of a PondProject include
- pondType - "Virtual" or "Grid". In virtual mode, pond objects are selected randomly for interaction; one can think of this mode as a pond in which objects float into each other and interact. In grid mode, pond objects have locations in two dimensions. Grid mode has not been completed yet.
- runType - "Simulate" or "GA". In simulation mode, a single simulation run is performed. In GA mode, a genetic algorithm performs a number of simulation runs to optimize some learning characteristic.
- Simulation parameters - These parameters set the maximum number
of iterations and how many times the simulation is run.
- GA parameters - These parameters specify the GA object class and parameters such as population size and mutation rate.
- Pond object parameters - These parameters specify the object class, number, and initial parameters for PondObjects in the simulation.
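As a hypothetical example, a virtual-mode simulation run might be assembled along the following lines. The setter and method names here are assumptions for illustration; the actual PondProject API may differ.

```java
// Hypothetical sketch of building a simulation run with PondProject.
// The method names and parameter values are illustrative assumptions only.
class RunPondExample {
    public static void main(String[] args) {
        PondProject project = new PondProject();
        project.setPondType("Virtual");                  // "Virtual" or "Grid"
        project.setRunType("Simulate");                  // "Simulate" or "GA"
        project.setMaxIterations(1000);                  // simulation parameters
        project.setRunCount(10);
        project.addObjectClass(AgentQ.class, 20);        // pond object parameters:
        project.addObjectClass(Veggie.class, 15);        //   object class and count
        project.addObjectClass(GrowingVeggie.class, 15);
        project.run();
    }
}
```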
Agents learn to make survival decisions from a reward they generate internally based on their energy level. Agents have access only to a 4-level health value when choosing actions, while moving and eating veggies affect a finer-grained energy level. Learning is also affected by the way veggies behave after having been eaten. Both types of veggies have no nutritional effect on agents that eat them during the growth period that follows their being eaten. The effect of the low-resolution feedback the agents receive and of the veggie growth delay is that the agent reward system is nondeterministic. However, because the feedback is statistically related to the effect of eating the two types of veggies, the agents can learn which veggies to eat. Another impediment to learning and survival is that there is not enough food to support the initial number of agents.
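A minimal sketch of the kind of tabular Q-learning update an AgentQ might perform under these conditions is shown below, using the coarse health value as the state and an internally computed, energy-based reward. The action set, learning rate, and other constants are illustrative assumptions, not the values used in the Pond.

```java
// Minimal sketch of a tabular Q-learning agent in the spirit of AgentQ.
// The state is the coarse 4-level health value; the reward is derived
// internally from the change in the agent's finer-grained energy.
// ALPHA, GAMMA, EPSILON, and the action set are illustrative assumptions.
import java.util.Random;

class QLearnSketch {
    static final int HEALTH_LEVELS = 4;     // agents only observe 4 health levels
    static final int ACTIONS = 3;           // e.g. eat red veggie, eat blue veggie, move
    static final double ALPHA = 0.1;        // learning rate
    static final double GAMMA = 0.9;        // discount factor
    static final double EPSILON = 0.1;      // exploration rate

    double[][] q = new double[HEALTH_LEVELS][ACTIONS];
    Random rng = new Random();

    int chooseAction(int health) {
        if (rng.nextDouble() < EPSILON) {
            return rng.nextInt(ACTIONS);     // explore
        }
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) {
            if (q[health][a] > q[health][best]) best = a;
        }
        return best;                         // exploit
    }

    // The reward is computed from the change in internal energy, so the same
    // (state, action) pair can yield different rewards: the feedback the
    // agent sees is nondeterministic but statistically related to the
    // value of the veggie eaten.
    void update(int health, int action, double reward, int nextHealth) {
        double best = q[nextHealth][0];
        for (int a = 1; a < ACTIONS; a++) best = Math.max(best, q[nextHealth][a]);
        q[health][action] += ALPHA * (reward + GAMMA * best - q[health][action]);
    }
}
```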
In the dynamic version of the Pond where the roles of the red and blue veggies change, learning is more difficult, but the agents can learn to survive.
The paper, Tuning Q-Learning Parameters with a Genetic Algorithm, describes a set of Pond experiments in both static and dynamic environments. A GA was used to tune Q-learning parameters to increase the survival rate of agents.
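The overall shape of such a GA run can be sketched as follows: each individual encodes a set of Q-learning parameters, and its fitness is the survival rate observed in a simulation run using those parameters. The encoding, operators, and settings below are illustrative assumptions, not the setup from the paper.

```java
// Rough sketch of tuning Q-learning parameters with a GA, in the spirit
// of the PondGA classes.  Each individual is (alpha, gamma, epsilon);
// fitness is the agent survival rate from a simulation run with those
// parameters.  All values and operators are illustrative assumptions.
import java.util.Random;

class GaTuneSketch {
    static Random rng = new Random();

    // Placeholder for a full Pond simulation run; returns a survival rate.
    static double runSimulation(double alpha, double gamma, double epsilon) {
        return rng.nextDouble();             // stand-in for the real fitness evaluation
    }

    public static void main(String[] args) {
        int popSize = 20, generations = 50;
        double mutationRate = 0.1;
        double[][] pop = new double[popSize][3];          // (alpha, gamma, epsilon)
        for (double[] ind : pop)
            for (int g = 0; g < 3; g++) ind[g] = rng.nextDouble();

        for (int gen = 0; gen < generations; gen++) {
            double[] fitness = new double[popSize];
            for (int i = 0; i < popSize; i++)
                fitness[i] = runSimulation(pop[i][0], pop[i][1], pop[i][2]);

            int best = 0;
            for (int i = 1; i < popSize; i++) if (fitness[i] > fitness[best]) best = i;
            System.out.printf("gen %d best survival %.2f (alpha=%.2f gamma=%.2f eps=%.2f)%n",
                    gen, fitness[best], pop[best][0], pop[best][1], pop[best][2]);

            double[][] next = new double[popSize][3];
            for (int i = 0; i < popSize; i++) {
                double[] a = tournament(pop, fitness);
                double[] b = tournament(pop, fitness);
                for (int g = 0; g < 3; g++) {
                    next[i][g] = rng.nextBoolean() ? a[g] : b[g];     // uniform crossover
                    if (rng.nextDouble() < mutationRate)
                        next[i][g] = clamp(next[i][g] + rng.nextGaussian() * 0.05);
                }
            }
            pop = next;
        }
    }

    static double[] tournament(double[][] pop, double[] fit) {
        int i = rng.nextInt(pop.length), j = rng.nextInt(pop.length);
        return fit[i] >= fit[j] ? pop[i] : pop[j];
    }

    static double clamp(double x) { return Math.max(0.0, Math.min(1.0, x)); }
}
```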
Once grid mode is completed, I plan to use the Pond to study cooperating agents that explore an unknown terrain. In particular, I want to investigate ways in which the outcome of exploratory actions by one agent can be shared with other agents in a way that leads to quicker learning.