The new RepeaterBot site, which describes a different approach, is now here.
VHF and UHF radios are limited to a large degree to line-of-site communications. To provide communications in a hilly region, radio repeaters can be used to pick up a signal from some source A and retransmit it to a receiver B that is located in some area blocked from receiving signals directly from A. Repeaters are typically placed on a mountain or tower; however, a repeater can be placed near the end of a ridge to provide communications to and from radios on both sides of the ridge as illustrated below.
Using autonomous robots to place repeaters would be useful in a number of applications. For example, robots exploring the landscape of a distant planet could share information and strategy over their radio links. In hilly terrain, any particular robot might not be able to hear all the other robots due to signal blockage. However, intelligent robots could position themselves to facilitate exploring while still maintaining a wide coverage, repeater-based network.
My initial attempt to solve this problem uses Q-learning, a reinforcement learning technique¹. The robots can sense their immediate surrounding and know which beacons and robots they can hear. Based on this input from the environment, the robots can internally generate a reward for each action they take. The robots are trained on three mazes, and then attempt to solve the repeater placement problem for the three training mazes and two new mazes.
Although convergence is difficult in this setup (see Tuning Q-Learning Parameters with a Genetic Algorithm²), the Q-learning approach did partially succeed. With careful tuning of learning and reward parameters, the robots consistently converged to a solution for the three training mazes. Only occasionally did the trained robots solve one or both of the additional two mazes.
See RepeaterBot Q-Learning Notes³ for more details on the approach. Evolution of Meta-parameters in Reinforcement Learning 4 is a master's thesis on using a GA to evolve q-learning parameters.
Click here to view solutions to all mazes from one RepeaterBot run.
The algorithm was able to solve three or more mazes 80% of the time. (With different tuning, a 90% rate was attained but with less probability of finding a solution to all five mazes.) Solutions to four mazes were found about 25% of the time, and all mazes were solved about 10% of the time. Tuning also affected the average number of trails needed to solve the mazes.
The 2 beacon, 2 robot problem is a simple version of the full repeater placement problem. In a real environment, one can envision terrain that would require many robots. The simple version of the problem does not address a reward system that would work in the general case.
The simple approach does not use robot cooperation. But, in a system where robots can communicate over their radio links, valuable information could be passed among robots that would potentially help with repeater placement and exploration.
By using this analogy to ants, trail-following techniques could be used by robots to build repeater chains in a manner that ants use to return food to their nest. Robots losing repeater contact could retrace their trails until they were again in repeater range of other robots and then transmit their coordinates and path to recruit other robots to follow them. Or, groups of robots could explore a path together. When radio communications with other sections of a maze were lost, one of the robots could retrace their trail to try to reinstate communications with other robots. Or, a more strict ant-like simulation could be used where the robots explore and return to their "nest". The trails of all the exploring robots could be used to construct a map of the terrain that could be used to plan repeater placement. But, in the case of RepeaterBots, returning to the "nest" is inefficient as long as location information can be broadcast to other robots.
There are a number of challenges associated with this approach to the repeater placement problem. First, trail retracing will disrupt exploration until more robots can be enlisted to provide intermediate communications. In the case where more robots are needed to provide communications than are available, some mediation is needed between the goals of exploring and communicating with other robots.
Second, because there are multiple robots moving independently, trail retracing might not immediately restore full radio coverage. So, there is a challenge to build robust controllers that can handle situations where communications are lost indefinitely or where other robots are lost due to mechanical or environmental conditions.
An appealing feature of the artificial life approach to the repeater placement problem is that it should be easily applicable to a large number of exploring robots because they can have relatively simple controllers that cause the robots to explore and follow "pheromone" trails. I believe this approach will scale well as the number of robots and the size of the area to be explored increase.
The ALIFE version of the RepeaterBot system is currently under development.
¹ Sutton, R. and Barto, A. (1998). Reinforcement Learning. MIT Press, Cambridge, MA.
² Tuning Q-Learning Parameters with a Genetic Algorithm
³ RepeaterBot Q-Learning Notes
4 Eriksson, A. 2002. Evolution of Meta-parameters in Reinforcement Learning. Master's Thesis.
|This site © copyright 2004 by
Ben E. Cline. Contact info: