Use of evolution of data and software to create coherent behaviour in a non-centralized colony organism

Menno Rubingh
June 2001

Contents

1 - Introduction
    1.1 - Topic, and purpose, of this text
    1.2 - Types of feedback, and autonomous systems
    1.3 - The simulated swarm-bot system used to demonstrate the mechanism in operation
2 - Description of the simulated swarm-bot system
    2.1 - Overview over basic setup and low-level processes
    2.2 - Detailed description of automatic gripper action
    2.3 - Detailed description of pheromone emission, propagation, and sensing
    2.4 - High-level processes and global setup of the simulation
Simulation results
Discussion
Software

1 - Introduction

1.1 - Topic, and purpose, of this text

This text is about how the mechanism of evolutionary propagation and competition can be used to make cooperative behaviour emerge in a set of similar entities that interact with each other. Examples of such sets of entities in which the individuals cooperate in such a way that the set as a whole becomes a coherent entity of its own, are: a collection of cells forming a multicellular organism; an ant or other social insect colony; and the collection of neuron cells in an animal or human brain.

In the last example, the information processing that leads to coherent behaviour of the animal is distributed in a decentralized way over many neuron cells, each of which only performs an extremely simple operation, and interacts only with a relatively small set of other neuron cells, mostly located in its close physical neighbourhood. The case of the brain cells parallels the case of a social insect colony, in which each individual insect only performs very simple tasks, interacts only locally, and is not aware of the goals and tasks of the colony as a whole.

The question is by what mechanisms such cooperative behaviour can arise in a set of simple entities that interact only locally, have no "consciousness" or "awareness" of the goals or state of the colony/group as a whole, and have information processing and physical capabilities that are significantly smaller and significantly simpler than that of the colony/group as a whole. That is: how relatively complex behaviour can arise in a group of interacting simple entities in such a way that the group exhibits coherent behaviour as if it were an organism in its own right, without there existing any central "homunculus" in which the "intelligence" of the group is concentrated.

This text presents one such mechanism. The mechanism is demonstrated to work in practice, in the example case of a simulated ant-colony-like system. The mechanism presented consists of evolution of data and executable software that are communicated between the "ants". The data and software present in an individual ant affect the behaviour of that ant. The replicators that undergo the process of evolution are these data and software items, which live inside the "ants"; the ants are their vessels or carriers.

For maximum clarity in communicating the principle of the mechanism, the example system that is presented is deliberately kept at the minimum possible complexity. It is felt, however, that the same mechanism should also work in much more complex systems.



1.2 - Types of feedback, and autonomous systems

In this subsection, we look at how the evolutionary mechanism presented in this text compares with more "traditional" types of adaptive systems. The individual ants in the simulated ant colony presented in this text could be said to "learn" to behave in a cooperative way. The way in which this learning/adaptive mechanism is set up is, however, different from that of traditional adaptive systems.

Any adaptive system needs feedback, a "figure of merit" (FOM) that tells the adaptive system how good or bad a solution s_i is that it has generated. On the basis of this FOM, the adaptive system can then adapt the solutions it generates. Most adaptive systems are used in a situation where a human judge is responsible for generating the feedback:

                       FOM
        +---------------<------------+
        |                            |
    adaptive system              human judge
        |                            |
        +--------------->------------+
                       s_i
This is the case e.g. in a genetic algorithm where a human has programmed the automated function that computes the FOM on the basis of a solution s_i generated by mutation, crossover, etc., of a population of solutions maintained inside the adaptive system.
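
For concreteness, the following minimal sketch (in C, the language of the program sources listed at the end of this text; all names in it are purely illustrative and not taken from those sources) shows this first kind of feedback loop: a human-written function fom() plays the role of the judge, and the adaptive system simply keeps whatever candidate solution the FOM rates best.

#include <stdlib.h>

/* Hypothetical human-programmed figure-of-merit: the human designer
   decides what counts as a good solution s_i.  Here: closeness to 42. */
static double fom(double s)
{
    return -(s - 42.0) * (s - 42.0);
}

/* Minimal adaptive system driven by the explicit FOM feedback:
   mutate the current best solution and keep it if the judge approves. */
double adapt(int iterations)
{
    double best = 0.0;
    for (int i = 0; i < iterations; ++i) {
        double candidate = best + ((double)rand() / RAND_MAX - 0.5);
        if (fom(candidate) > fom(best))      /* feedback from the "judge" */
            best = candidate;
    }
    return best;
}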

In a biological animal, there is also a similar feedback, but there is no human judge. Instead, the animal is an autonomous entity that interacts with its natural environment, and it is this natural environment that generates the feedback:

             "implicit" feedback
        +---------------<------------+
        |                            |
   autonomous AI system           environment
        |                            |
        +--------------->------------+
                  behaviour
This kind of feedback could be said to be more "implicit" than in the case above: successful behaviour of the animal results in optimal propagation/survival of the animal. It is essential to see that in this case the mechanism of evolution is inherently present in the feedback mechanism itself: we regard the animal primarily as a "replicating entity", and the feedback-generating mechanism as the environment that the replicator is trying to exploit.

This second type of feedback is what is used in the evolutionary mechanism in this text. In the process of evolution of the replicating entities (= the data and software items that are communicated between the simulated "ants"), there is no figure-of-merit feedback function that is explicitly specified or programmed by a human operator; instead, the replicating entities are left free to propagate into and interact with the environment in which they live (= the interacting ants), and the feedback is formed simply by how well any replicator proves to propagate and compete with other replicators in that environment.

The point-of-view of traditional adaptive systems with a human-specified FOM function is therefore different from the evolutionary point-of-view presented in this text.

The above two alternative points-of-view each highlight a different type of mechanism. One is the "traditional" type of adaptation of an entity capable of learning to a human-given FOM feedback function; the other is "free" evolution of an entity living inside a group of physical organisms, in which the feedback is formed by how well the replicating entity propagates in the environment formed by these interacting physical organisms.

In the second type of mechanism, the basic players are the data and software items that are communicated between the "ants"; but nevertheless the effect of the mechanism is that the ant colony, seen as a single organism, exhibits "learning" behaviour -- namely, the ant colony learns how to behave coherently internally, that is, how to make use of its constituent parts in such a way that the parts cooperate towards the whole. Since there is no human responsible for giving feedback to the underlying evolutionary process of replicating data and software items, there is also no human feedback-generator needed to guide the "learning" behaviour of the colony (the colony learning to behave internally consistently and cooperatively) that emerges as a result of this underlying mechanism.

Therefore, the evolutionary mechanism presented here is felt to be relevant to the topic of designing autonomous systems -- systems that should adapt to a given environment and learn to survive in it autonomously, without a human operator being involved.

These two mechanisms, adaptation to human-generated feedback and the autonomous adaptive behaviour emerging from an "uncontrolled" evolutionary process, are in principle not mutually exclusive: most real-life intelligent human-designed systems would probably be a mix of both types of mechanism operating at the same time. "Intelligent" adaptive systems might usefully be ranged on a gradual scale between the two feedback types. In the activity of designing AI systems, reformulating a system in terms of another system at another point on this scale would seem to be an important kind of engineering activity, involving various kinds of trade-offs. For example, to make a given AI system more autonomous, we shift it along this scale and change the mix of the two mechanisms into one that contains a stronger degree of the "implicit-feedback" evolutionary component.

This view would seem to be also relevant to the broader subject of metaheuristics [Ref.: Dorigo _Metaheuristics Network_ text, "workprogramme_applicants.pdf"], because it points out a general type of "design dimension", or "degree of freedom", in designs of "intelligent" systems.

I have a feeling that it is one of the characteristics of the field of metaheuristics that it operates in the grey overlap area *in between* these two extreme types. Strictly speaking, there is always a mix of the two: a human-designed genetic algorithm has to perform well in human society, and is thereby also an autonomous system that gets feedback from how well it propagates itself in its environment of human minds and libraries. A swarm-bot that is as autonomous as we can make it, and that is set free to run around unsupervised in a natural environment without direct human intervention, is still created by a human; and no doubt, if the autonomous swarm-bot behaves excessively, humans will step in and affect the situation, thereby acting as the "human judge" in the first diagram.

In a more strongly autonomous system, the fact that evolution is inherent in the feedback mechanism means that the "goal" of the system is the survival and "prosperity" (= survival/propagation power) of the system. For example, an ant colony gathers food and building materials for the "goal" of furthering the survival/propagation of the colony. (Note however that neither the ants nor the colony as a whole need have any "awareness" of this.) This seems to be exactly what we want in an artificial swarm-bot: we would like it to figure out autonomously that gathering energy (e.g. new batteries) and materials (e.g. new sensor parts, or replacements for broken parts) from its environment is good for the swarm-bot.



1.3 - The simulated swarm-bot system used to demonstrate the mechanism in operation

I am at this moment creating a very simple (minimally complex) simulation of a swarm-bot that demonstrates this second, "implicit", type of feedback at work. The simulation has "ants" that learn to form connected chains. The goal of forming chains is not pre-programmed, and there is no explicit function that assesses the ant or swarm-bot behaviour and explicitly generates FOM feedback numbers. Instead, the ants communicate their programming to each other; the programming that is most successful in getting propagated and in overwriting other programs is the one that survives. The chain-forming "goal" is implicitly programmed into the system only by arranging things so that ants can swap or communicate their programming only when they are connected to each other. The ants thus need to connect to make their programming propagate; i.e. the ants that seek to connect are the ones that have their programming propagated, which is why the chain-forming behaviour spreads and takes over in the whole swarm.

The essential point in all this is that the "goal" that we humans have with the swarm-bot, e.g. the goal that it forms chains or other geometric forms, must be translated into a goal of maximum propagation of some kind of essential entity that "lives inside" the swarm-bot, for example propagation of programming, of specific data values, etc. Only when our human goal is reformulated as such an evolutionary propagation of something inherent in the swarm-bot, can the autonomous type of adaptation, with the "implicit" feedback directly from the swarm-bot's environment, take off.

A swarm-bot is a collection of simple robots, here called bots, in which the individual bots cooperate in such a way that the collection of bots exhibits coherent and "purposeful" behaviour when looked upon as an organism in its own right, similarly as an ant colony behaves as a single coherent organism. [Ref: Dorigo, file "Swarm-bots.descr.x.applicants.pdf"] [Ref: Dorigo & al, _Swarm Intelligence_] The idea of a swarm-bot is obviously inspired by social insects such as ants or bees.

Goal(s) of this text, and of the program/simulation presented here. ...

The individual bots have no knowledge or consciousness of the goals of the colony (swarm-bot) as a whole. Each individual bot interacts not with the swarm as a whole, but instead only with its own immediate surroundings. E.g. it interacts with bots it happens to meet physically and is in physical contact with; or it leaves marks in the environment that are then sensed by other bots that happen to pass along the mark; or it broadcasts pheromone and senses the pheromone emitted by other bots. The interesting point in swarm-bots is that coherent behaviour of the swarm arises from decentralized and local interactions between the individual bots; that is, that coherent behaviour of an entity made up from many small parts can arise without there being present any centralized coordinator.

... The program is intended as a deliberately minimally complex example demonstrating how the process of evolutionary propagation and competition can be used as a mechanism to create coherent behaviour in a swarm-bot; that is, how evolution can be used as a mechanism to make the bots "cooperate" in such a way that, looked at from a higher level, there emerges coherent and "purposeful" behaviour of the swarm-bot. Also, the aim is to show (1) how such "cooperation" can arise through mechanisms that are not driven by any will in the bots to cooperate but instead by "survival-of-the-fittest" evolutionary competition, and (2) how such coherent behaviour of a more complex entity (the swarm-bot) can emerge from its being built up from parts that are only relatively "dumb", automatic actors without any knowledge of the whole.

A key aspect in making evolution work in this situation, and for the purposes described above, is that the bots serve as vessels, i.e. as a substrate, in which the evolution of the replicating entities can play itself out. The replicating entities that undergo, or act out, the process of evolution are in this case not the physical bots themselves or other tangible physical things, but pieces of data and/or software that are stored in, and communicated between, the bots. For the mechanism to be able to result in a change of behaviour of the bots (towards more "cooperative" behaviour), it is of course essential that the replicating entities affect the behaviour of the bots; hence the replicating entities, while they are communicated as data, must in each bot also act (among other things) as a "software program" that affects the behaviour of the bot.

Seen from the perspective of the propagation of the replicating entities, the "cooperative" behaviour of the bots that emerges as a result of the evolution of these replicators will always be such that the evolution-end-product behaviour is more favourable to robust survival (resistance against being overwritten or wiped out) and speedy propagation (copying into other bots) of these replicators than competing behaviours are.
Behaviour that makes the bots communicate
The replicators that elbow out the other replicators,



2 - Description of the simulated swarm-bot system

2.1 - Overview over basic setup and low-level processes

The bots live in a 2-dimensional chessboard world. Each bot occupies one square, and each square either is empty or is occupied by exactly one bot. The number of bots is fixed; during the simulation no bots are deleted and no new bots are created. A bot has a definite front side, and can face in four directions: east, south, west, and north. Each square on the chessboard is regarded as having four neighbours, namely the squares with which it has an edge in common; squares that only touch at a corner are not regarded as neighbours. Each bot has a unique integer id number that is indelibly imprinted into it, and each bot knows its own id number.

Each bot can make the following five elementary moves:

n "No operation": do nothing
l Turn left 90 degrees (but remain on the same square)
r Turn right 90 degrees (but remain on the same square)
f Move forward one square
s Communicate data values into the bot I'm clamping, that is: overwrite that bot's data values with my own.
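
As a rough sketch only (the actual data layout in botconnect1.c may well differ), the state carried by each bot and the five elementary moves could be represented in C as follows; the two values ic and bv are described in the remainder of this section and in section 2.4.

/* Facing direction of a bot. */
typedef enum { EAST, SOUTH, WEST, NORTH } Direction;

/* The five elementary moves. */
typedef enum {
    MOVE_N,   /* 'n': do nothing                                   */
    MOVE_L,   /* 'l': turn left 90 degrees                         */
    MOVE_R,   /* 'r': turn right 90 degrees                        */
    MOVE_F,   /* 'f': move forward one square                      */
    MOVE_S    /* 's': overwrite the clamped bot's data with my own */
} Move;

typedef struct Bot {
    int        id;          /* fixed, unique id number                      */
    int        x, y;        /* square currently occupied on the chessboard  */
    Direction  facing;      /* east, south, west or north                   */
    int        ic;          /* communicated integer value; defaults to id   */
    double     bv;          /* "brain value": probability of strategy q(s)  */
    struct Bot *clamping;   /* bot this bot is clamping, or NULL            */
    struct Bot *clamped_by; /* bot clamping this bot, or NULL               */
} Bot;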

Each bot has at its front a "gripper" with which it can grasp and clamp on to the bot in the square in front of it. A bot can be clamped from any neighbouring square and from any direction, however a bot can only be clamped by at most one bot at the same time. When a bot X is clamping another bot Y, a communication link from X to Y is established, through which bot X can communicate things into bot Y. This is the only direct, 1-to-1 communication that is possible between bots.

.     .     .     .     .     .     .
.     .     .     .     .     .     .
                    `                
                     3--             
.     .     .     .     .     .     .
                     |               
                     2               
.     .     .     .     .     .     .
.     .     .     .     .     .     .
The figure above schematically shows a chessboard of 6 by 4 squares, with two bots on it. The dots show the positions of the corners of the squares. The body of each bot is represented schematically by the integer id number of the bot. The direction in which a bot is facing, and the gripper at the front of each bot, are shown by a short line ( -- , | ) projecting from its body. Bot 2 is facing north, and is clamping bot 3, which is facing east. When a bot is being clamped by another bot, a small mark ( ` ) is drawn at its upper left.

There is one other communication mechanism between bots, namely: each bot can emit pheromone (scent), which is "broadcast" around, and which can be sensed by other bots. The pheromone spreads infinitely rapidly into all empty squares, and decreases in concentration as the distance from the pheromone emitter increases.

Each bot contains a small memory in which the following two data values (variables) are stored:

ic   An integer value.  By default it is equal to the bot's own fixed id number; it steers the automatic gripper and pheromone machinery (sections 2.2 and 2.3).
bv   A number between 0 and 1, the bot's "brain value".  It determines how likely the bot is to choose the aggressive high-level strategy q(s) over the passive strategy q(n) (section 2.4).

These two data values are what is communicated in the elementary action 's': a bot executing action 's' simply overwrites its own ic and bv values into the bot that it is clamping.

The ic of each bot, as long as it is not actively overwritten by another bot, "by default" is equal to the fixed integer id number of the bot. Each bot's ic is initialized with the bot's id number at the beginning of each simulation run.
        The only way in which the ic value of any bot Y can be changed to another value than this default value is when another bot X clamps onto Y and communicates its ic value into it. A bot Y that is being clamped is a passive receiver of whatever ic value is written into it, and as long as bot Y remains clamped it simply stores and remembers unchanged any current ic value that it has. In that way, ic values can be communicated along the whole length of a chain of bots, in the direction from the bot at the "base" (the bot that clamps another but is not clamped itself) to the bot at the "head" of the chain (the bot that is clamped but doesn't clamp another bot).
        However, when the gripper of bot X releases the bot Y that up till then it was clamped to, then the ic value of bot Y immediately reverts to its "default" value, the fixed id number of bot Y. The effect of this is therefore that two bots can only have the same ic values as long as they are part of the same connected chain.

Pheromone emission and sensing and the clamping action are automatic, "unconscious" low-level processes in the bots. The gripper and the pheromone emission/sensing machinery operate continuously, in parallel to the bot's execution, in each turn, of one of the five elementary moves { n, l, r, f, s }.
        The value ic in the bot's memory affects the operation both of the bot's pheromone emission and sensing, and of its gripper. The automatic action of the gripper and of the pheromone emitting/sensing machinery is such that it facilitates a bot's behaviour of going to find, and clamping on to, the nearest non-clamped bot with an ic different from its own. That is, the way in which ic affects these automatic low-level processes in a bot assists the process of propagation of the bot's ic value into other bots. In general, the firm presence of such low-level facilitation of the propagation of the replicators seems a necessary precondition for propagation (and therefore evolution) of these replicators to be able to take off. (Low-level processes need to evolve before higher-level processes, which make use of the low-level processes in their propagation, can evolve.)


2.2 - Detailed description of automatic gripper action

A complete and detailed description of the automatic gripper action is as follows:
        When a bot X at any time finds another bot Y in front of it that is not already clamped by another bot and that has another ic value than itself, then the gripper of X automatically clamps on to Y.
        The clamping is a connection that only serves to establish a direct data communication link between the bots; it is not a physically strong connection that e.g. keeps the clamped bot immobile. Functionally, the clamping action is no more than one bot plugging in the connector of a data communication cable into the receptor connector of another bot. Each bot has only one receptor connector; when this is occupied, no other bot can clamp on to it.
        A bot Y that is clamped by another bot X can still execute every action. A clamped bot that turns but stays on the same square stays clamped.
        When bot X is clamping bot Y, and either of the bots then moves in such a way that bot Y is no longer in front of bot X -- for example, when Y moves forward or when X turns 90 degrees --, then this automatically breaks the clamping connection.
        Barring un-clamping resulting from such physical movement, there is only one situation in which the clamping connection is actively broken, namely: When bot X is clamping bot Y, and when the ic value of bot X is equal to the id number of bot Y, then bot X un-clamps itself from Y.
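
In code, these automatic clamping and un-clamping rules might look roughly as follows. This is a sketch only, assuming the Bot structure sketched in section 2.1 and a hypothetical helper bot_in_front() that returns the bot on the square the gripper currently faces (or NULL); neither name is taken from the actual program sources.

Bot *bot_in_front(const Bot *x);   /* hypothetical helper, defined elsewhere */

/* Automatic gripper action of bot x, run continuously alongside the
   five elementary moves. */
void gripper_step(Bot *x)
{
    Bot *y = bot_in_front(x);

    if (x->clamping != NULL) {
        /* Un-clamp if physical movement broke the arrangement, or if my
           ic equals the clamped bot's own id ("rowing upstream"). */
        if (y != x->clamping || x->ic == x->clamping->id) {
            Bot *old = x->clamping;
            x->clamping = NULL;
            old->clamped_by = NULL;
            old->ic = old->id;   /* released bot's ic reverts to its default */
        }
    } else if (y != NULL && y->clamped_by == NULL && y->ic != x->ic) {
        /* Un-clamped bot with a different ic in front: clamp on to it,
           establishing the one-way communication link from x to y. */
        x->clamping = y;
        y->clamped_by = x;
    }
}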

The rationale behind these automatic clamping and un-clamping actions of the gripper is as follows. The only function of the gripper of bot X is to communicate the data values of X into the bot Y that it's clamping. That is: the gripper is an aggressive tool, serving to propagate (overwrite) the ic value of bot X into other bots, thus making more copies of the replicator ic.
        The gripper of bot X is an autonomous sub-unit of the bot that is automatically "attracted" by a bot that appears to have different data values than bot X: in such a case, the gripper recognizes a target that can be overwritten, and actively clamps on to the target bot. The gripper of bot X is not attracted by a bot with the same ic as bot X, for spending energy in executing a clamping action only makes sense if the target bot presents new room for expansion of the replicator ic, i.e. if it doesn't already have an ic identical to that of bot X.
        Once a clamping connection is made, the gripper, in the absence of significant "repulsion", normally just passively stays connected where it is. When the gripper of bot X finds itself connected to a bot Y with the same ic as X, then the gripper feels neither attracted nor repulsed by that same ic value.
        However, when the gripper sees that bot X and bot Y have the same ic values, then the gripper looks a level deeper, and examines the "default" values of the ic of both bots, i.e. the id numbers of the bots. If the gripper of bot X then finds that the ic value that it is its purpose to communicate into Y is identical to the default ic value of bot Y, then the gripper recognizes that it is trying to propagate an ic value into the very bot from which that ic value originated in the first place, and that it is therefore working against the grain of propagating the value ic into the world: it recognizes that it is "rowing upstream" into the source of that ic value. When the gripper sees this situation, it feels repulsed, and immediately un-clamps.


2.3 - Detailed description of pheromone emission, propagation, and sensing

The integer value ic stored in each bot is used in pheromone emission and pheromone sensing. The pheromone emitted by a bot is of a typical kind unique to its ic value. (One could say that the bot broadcasts pheromone on "wavelength" ic.)
        A bot senses the pheromone concentration in each of the empty squares neighbouring it, in the following way: a bot is insensitive to the pheromone emitted by bots with the same ic value as itself; that is, it is insensitive to the pheromone that it emits itself. (This makes sense, since a sensor that picks up the signals that it emits itself is not of much use.) For the bots with an ic other than its own, each bot senses their pheromones indiscriminately bunched together; i.e. it senses, in each of the empty squares neighbouring it, only the total superposed pheromone concentration due to all bots with ic values different from its own.
        The pheromone emitter in a bot is always on, except in bots that are being clamped, in which pheromone emission is automatically turned off for as long as the bot is clamped.
        The result of this is that a bot can tell from its pheromone sense which of its empty neighbour squares is the one nearest to the nearest non-clamped bot with another ic value than its own -- that is, where to move in order to find the nearest non-clamped bot with different ic.
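
The precise falloff law of the pheromone concentration is not essential to the mechanism. As an illustration only, the sketch below assumes a 1/(1+d) decay with Manhattan distance d, and it ignores the restriction of the spreading to empty squares; the actual phmap1.c code may handle both points differently, and the function name is hypothetical.

#include <stdlib.h>   /* abs */

/* Total pheromone concentration sensed by bot `self` in the square (x, y).
   Only non-clamped bots with an ic different from self's contribute;
   the falloff law is an assumption of this sketch. */
double sensed_pheromone(const Bot *self, int x, int y,
                        const Bot *bots, int nbots)
{
    double total = 0.0;
    for (int i = 0; i < nbots; ++i) {
        const Bot *b = &bots[i];
        if (b == self)              continue;
        if (b->ic == self->ic)      continue;   /* blind to its own "wavelength" */
        if (b->clamped_by != NULL)  continue;   /* clamped bots emit nothing     */
        int d = abs(b->x - x) + abs(b->y - y);  /* Manhattan distance            */
        total += 1.0 / (1.0 + d);               /* assumed falloff with distance */
    }
    return total;
}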



2.4 - High-level processes and global setup of the simulation

A bot's process of choosing, in each turn, which of the five elementary moves { n, l, r, f, s } to execute, is largely automated as well, in such a way that if the bot is not currently clamping another bot, it automatically chooses the action that moves it towards the highest pheromone concentration it senses. This mostly hard-coded selection process is as follows: (The bot that is executing the selection process and the selected action is referred to as "I")

if ( I'm clamping another bot ) 
	{
	Execute action 'n' (do nothing) 
	or action 's' (communicate my data values into the clamped bot).
	}
else 
	{
	Inspect the pheromone concentration in the "valid" neighbour 
	squares.  "Valid" means: empty and inside the chessboard. 
	Choose and execute one of the four elementary actions { n, l, r, f }: 
	- If none of the neighbouring squares is "valid", or if the bot 
	  senses no pheromone at all in any of the valid neighbouring squares, 
	  then execute action 'n' (do nothing). 
	- Otherwise, if the square ahead is valid and has a pheromone 
	  concentration not less than the maximum of the pheromone 
	  concentration in all the valid neighbour squares, then execute 
	  action 'f' (move forward). 
	- Otherwise, execute one of the actions 'l' or 'r' (turn 90 degrees 
	  left or right), such that the front of the bot is turned towards the 
	  higher pheromone concentration.  If it is not possible to choose 
	  between turning either left or right, then choose randomly one of 
	  'l' and 'r'. 
	}

Within this mostly hard-coded process of choosing which of the five elementary actions to execute, there is only one degree of freedom left, one choice that is still open to the bot, namely the choice between executing either action 'n' or action 's' in the case that the bot is clamping another bot. This gives rise to the following two "high-level strategies" between which the bot can choose:

q(n)   Follow the automatic pheromone-seeking behaviour; when clamping another bot, execute action 'n' (do nothing).
q(s)   Follow the automatic pheromone-seeking behaviour; when clamping another bot, execute action 's' (communicate my data values into the clamped bot).

Both of these strategies make a bot X seek the nearest pheromone-emitting other bot if bot X is not currently clamping another bot, but keep bot X clamped to its current "target" if it is currently clamping another bot. The only difference between the two strategies is the action the bot executes when it is clamping another bot: strategy q(n) never communicates anything into the other bot; strategy q(s) is a maximally aggressive strategy in which the bot communicates its data values into the other bot as much as possible.

( Note, from the pheromone-seeking behaviour in the "else"-branch of the high-level decision process described above, that when all bots have the same ic value and therefore emit the same type of pheromone, so that none of the bots senses any pheromone, no bot will move or turn from its current position/orientation, even when it is not clamping another bot. Another, more obvious, situation in which none of the bots will physically move or turn is when all bots are clamping another bot. Either of these cases happening means that a static situation is reached in which none of the bots will move or turn any more. )

Note how the sensor input into the bot, and the way in which this sensor input affects or influences the behaviour chosen by the bot, are hidden away inside these high-level strategies. On the level of choosing between the two strategies q(n) and q(s), the bot is oblivious of the sensor input. All that a bot does in each turn is to choose and execute one of these two high-level strategies. For example, choosing strategy q(n) means that the bot issues to its lower-level processes the command: "seek maximum pheromone, but remain inactive while I'm clamping another bot". It is left to (delegated to) these lower-level processes to actually gather and examine the sensor inputs needed to execute this command.

The bot chooses between the two strategies q(n) and q(s) on the basis of the number bv in its memory. The number bv is the likelihood of the bot choosing strategy q(s) over q(n). For small bv, the bot most often chooses strategy q(n); for large bv, the bot most often chooses strategy q(s). For bv = 0, the bot always chooses q(n); for bv = 1, the bot always chooses strategy q(s). The number bv therefore determines how "agressively" the bot behaves as regards copying of its data values into other bots.
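
The choice itself is then nothing more than drawing a uniform random number and comparing it with bv, roughly as in the following sketch (names are illustrative only):

#include <stdlib.h>

typedef enum { STRATEGY_QN, STRATEGY_QS } Strategy;

/* With probability bv choose the aggressive strategy q(s),
   otherwise the passive strategy q(n). */
Strategy choose_strategy(double bv)
{
    double u = (double)rand() / RAND_MAX;   /* uniform in [0, 1] */
    return (u < bv) ? STRATEGY_QS : STRATEGY_QN;
}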

Since the value bv controls the behaviour of the bot, the number bv acts as the "programming" of the bot. The number bv can be seen as the representation of how the top-level, relatively most intelligent, decision process is wired in the bot; that is, as the contents of the "brain" of the bot (bv stands for "brain value"). The one number bv is the complete recipe or program in a bot's brain for how to behave, in the same way as the characteristics of, and connections between, the neurons in an animal brain form a complete recipe for how the animal behaves (where "behaviour" = how the bot or animal responds to external stimuli).

Although the number bv is an extremely minimal example of a program, functionally it is still a program, a piece of software. The number bv is also software in the sense that it is executable code (i.e. a command) that is not hard-wired, but is mutable in the course of the simulation. When a bot X copies its ic and bv values into another bot Y, bot X thereby changes the programming of bot Y. The number bv is therefore like a "meme" [Memetics refs], a rule of thumb or precept that affects behaviour and that is communicated between humans or between members of any other species capable of "cultural transmission".


The organization of the course of the simulation as a whole is as follows. The bots make discrete moves, one after the other. The simulation proceeds in discrete turns. In each turn, one random bot is selected, and this bot then makes one move. This move consists of the bot choosing and executing one of the strategies q(n) or q(s).
        The random selection of the bot that is executed in each turn is such that in each turn every bot has the same probability of being selected; that is: on average, every bot is executed equally frequently. This way of executing the simulation is used to mimic simultaneous and "parallel" operation of the bots in a real-life situation, where each bot is continuously active and doesn't synchronize its movements with any other bot.
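
A sketch of one such simulation turn, assuming the sketches above and a hypothetical function execute_strategy() that carries out the decision process described at the start of this subsection:

#include <stdlib.h>

void execute_strategy(Bot *b, Strategy s);   /* hypothetical, defined elsewhere */

/* One turn of the simulation: select one bot uniformly at random
   (every bot equally likely on average) and let it choose and
   execute one of the two high-level strategies. */
void simulation_turn(Bot *bots, int nbots)
{
    Bot *b = &bots[rand() % nbots];
    execute_strategy(b, choose_strategy(b->bv));
}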



Simulation results

Run simulations until no bot moves anymore, register this final situation.

Initialize with random bot positions (no 2 bots on same square) and random ic and bv, where bv in range [0, bvmax].

Compare simulations with different bvmax. Results: low bvmax --> no chains, only pairs. High bvmax --> all bots in same chain.
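
A sketch of the experiment driver implied by these notes, with all helper functions hypothetical, and with ic initialized to the bot's own id number as described in section 2.1:

#include <stdlib.h>

Bot  *place_bots_randomly(int nbots);            /* hypothetical: no two bots on one square */
int   is_static(const Bot *bots, int nbots);     /* hypothetical: no bot moves any more     */
void  report_chains(const Bot *bots, int nbots); /* hypothetical: e.g. chain lengths        */

/* One simulation run for a given bvmax: random initial positions and
   orientations, bv drawn uniformly from [0, bvmax], run until the
   swarm has become static, then register the final situation. */
void run_experiment(double bvmax, int nbots)
{
    Bot *bots = place_bots_randomly(nbots);
    for (int i = 0; i < nbots; ++i) {
        bots[i].ic = bots[i].id;
        bots[i].bv = bvmax * (double)rand() / RAND_MAX;
    }
    while (!is_static(bots, nbots))
        simulation_turn(bots, nbots);
    report_chains(bots, nbots);
    free(bots);
}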



[

Discussion

...

]



Software

botconnect1.c
cl_botconnect1
( anttest5.c )
byte.h
bool.h
phmap1.c
phmap1.h
int2char.c
int2char.h
rand.c
rand.h