Sometimes people ask: “with your minds-everywhere framework, you might as well say the weather is intentional too!”. The assumption is that 1) these things can be decided from an armchair (by logic alone), and 2) this would be an unacceptable implication of a theory (i.e., we can decide in advance, by definition, that whatever else, systems like the weather cannot be anywhere on the cognitive spectrum).
My answer is: “I don’t know, have you ever tried training it? We won’t know until we do.”
I think it is essential that we don’t make assumptions or just have feelings about where things belong on the spectrum of persuadability. We need to do experiments. Fortunately, the tools of behavior science can be applied outside of their normal domain, of brainy animals behaving in 3-dimensional space. Thus, the field of diverse intelligence is emerging, to help us get better at recognizing, predicting, building, and relating to unconventional systems with degrees of intelligence (competencies at solving problems in some space). And one of the earliest lessons it teaches is to have humility about our ability to recognize the beginnings of minds in novel embodiments. This post is about a new paper (in preprint form – not yet peer-reviewed), which explores that idea in silico.
One important aspect of diverse intelligence research is basal cognition: looking for minimal systems that show early, simple versions of intelligent behavior: learning, memory, problem-solving, decision-making, etc. This is crucial not only to understand the evolutionary origin of our own complex cognition, but also for AI and synthetic morphology which seeks to create novel systems of various degrees of agency. Really it’s a fundamental part of the biggest question of all: what are the necessary and sufficient ingredients for forming a mind?
Most such work is done in biological model systems such as slime molds, bacteria, or single cells. The problem with any biological system, no matter how “simple”, is that there are always more mechanisms to be discovered. That is, if you see an unexpected interesting behavior, it is always possible that it is explicitly baked in and implemented by some mechanism you just haven’t found yet. For this reason, some people are studying the behavior of simple physical systems, like inorganic chemical droplets.
Being interested in surprising competencies in unexpected substrates, I tried to think of the most outrageous example I could. Now, we did something like this once before – in two papers on Boolean and continuous gene-regulatory networks, we showed that even very simple network models had surprising proto-cognitive behaviors – several different kinds of learning including Pavlovian conditioning. But I wanted something even simpler and more minimal.
I landed on the idea of using sorting algorithms – these are short (<10 lines of code), classic algorithms for sorting a mixed up string of numbers so that they end up in perfect order. The nice things about these are that: 1) they are very simple and transparent – you can see all the code in one glance, there is no place for undiscovered mechanisms, 2) they’ve been studied by generations of computer science students for decades, and everyone thinks they know what these algorithms can do, 3) they are totally deterministic – no randomness or any oracle elements, and 4) the process of sorting a bunch of elements that begin scrambled, into an invariant order, bears an uncanny resemblance to the rearrangements that morphogenetic remodeling can accomplish to repair, for example, the tadpole face.
Taining Zhang, Adam Goldstein, and I studied these because I wanted the maximal surprise value. I hypothesize that intelligence (agential behavior implementing problem-solving) is present at the lowest levels in our Universe, not requiring brains or even life per se. Specifically, I am not interested in emergence of mere complexity. That is easy; simple cellular automata (CAs) such as the Game of Life enable huge complexity to come from very simple rules, as do the fractals emerging from short generative formulas. But what such cellular automata and fractals do not offer is goal-seeking behavior – they just roll forward in an open-loop fashion regardless of what happens (although, I must say that I am no longer convinced that we have plumbed the depths in those systems – after what we saw in this study, I am motivated to start looking for goal-directed closed-loop activity in CA’s as well, who knows). What I am after is very minimal systems in which some degree of problem-solving capacity emerges, and specifically, capacity that is not specifically baked in by an explicit mechanism. It is a related set of questions to those studied by Stuart Kauffman in his inspiring books.
We used the methods of TAME, which focuses on perturbative interventions to gauge an agent’s behavior in some problem space, to ask what these sorting algorithms can do. In other words, we treated them as a new kind of animal and characterized how they traverse the sorting space, and what they do when presented with barriers.
We made 2 small but fruitful changes to the algorithms. First, instead of a top-down God’s-eye view where an omniscient single algorithm controls how each element is moved, we went bottom up, with a distributed approach: each number (we call them cells, in a 1-dimensional “tissue” array) has its own ability to carry out the algorithms and its own local preferences about what neighbors it wants to see to its left and its right. So this is now (like in biology) a parallel process in which each cell exercises its own agenda, using the standard sorting algorithm as a policy for how to get to the right location by swapping with neighbors until everything is sorted. Thus, we use a cell’s eye view, not a global top-down view. The first result was that it turns out that bottom-up control (with no global knowledge) works perfectly well to solve sorting tasks.
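To make the cell’s-eye view concrete, here is a minimal sketch of what a bottom-up, distributed bubble sort might look like. This is my own illustrative toy, not the paper’s actual code; the function name and the random visiting order are assumptions, chosen to emphasize that no global controller decides who moves when.

```python
import random

def cell_view_bubble_sort(arr):
    """Each cell locally swaps with its right neighbor when they are out of
    order. There is no omniscient controller: we visit cells in a random
    order and let each one act on its purely local view, repeating until
    no cell wants to move anymore."""
    arr = list(arr)
    moved = True
    while moved:
        moved = False
        # visit cells in a random order to emphasize the lack of a global plan
        for i in random.sample(range(len(arr) - 1), len(arr) - 1):
            if arr[i] > arr[i + 1]:  # local preference: smaller neighbor on my left
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                moved = True
    return arr

print(cell_view_bubble_sort([5, 2, 9, 1, 7]))  # → [1, 2, 5, 7, 9]
```

Even with randomized, purely local decisions, the collective reliably reaches the fully sorted state, which is the first result described above.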
The second change is that we implemented “unreliable computing”. In standard sorting algorithms, it is assumed that when the algorithm issues a command for two cells to switch positions, they do it. There is no notion of failure, and the algorithm never checks to see if it worked. We introduced the concept of “broken cells” – ones that either lack initiative to move themselves, or are so broken they cannot be moved by other cells either.
Note that we did not change anything else – we didn’t add code to enable cells to see how well they are doing, whether a move worked or not, or any other embellishment. They’re still short, deterministic algorithms with nothing added to make them any smarter. Everything described below is emergent, and surprising (and hadn’t, to our knowledge, been discovered despite the very wide use of these algorithms). The details of our work (currently in review) are described in this preprint.
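Here is a hypothetical sketch of the broken-cell idea. This is not the paper’s actual move rule; I am assuming a selection-style variant (non-adjacent swaps) so that the barrier can be routed around, and names like `sweep` and `frozen_values` are my own.

```python
def sortedness(arr):
    """Fraction of adjacent pairs already in numerical order (1.0 = sorted)."""
    return sum(a <= b for a, b in zip(arr, arr[1:])) / (len(arr) - 1)

def sweep(arr, frozen_values):
    """One pass of a cell-view, selection-style move rule: each movable cell
    swaps with the first smaller movable cell to its right. Broken cells
    never initiate a move and never accept one."""
    moved = False
    for i in range(len(arr)):
        if arr[i] in frozen_values:
            continue  # broken: lacks the initiative to move itself
        for j in range(i + 1, len(arr)):
            if arr[j] not in frozen_values and arr[j] < arr[i]:
                arr[i], arr[j] = arr[j], arr[i]  # non-adjacent swap jumps the barrier
                moved = True
                break
    return moved

cells = [4, 2, 9, 1, 3]
while sweep(cells, frozen_values={9}):  # the cell carrying 9 is broken
    pass
print(cells)  # → [1, 2, 9, 3, 4]: movable cells end up ordered around the barrier
```

Note that the algorithm never checks whether its swaps succeeded; the movable cells nonetheless end up in order on either side of the immovable one.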
We did many experiments to understand the properties of this system (see the preprint for details); I want to focus on just a few specific ones here. Our two changes allowed us to study aspects that have not been studied before. By introducing the notion of broken cells, we get to ask how this system fares under perturbation; this is crucial because, as William James said, the measure of intelligence is whether you can achieve the same goal by different means, when surprising circumstances get in your way. What do these algorithms do when they are proceeding on their way through sorting space, moving around as needed, and then BAM – they come upon a barrier – a cell that just won’t move? The algorithm itself makes no provision for this scenario. It turns out that not only can they still complete the task, they do it via delayed gratification: the ability to temporarily get further from their goal in order to do better later. This is illustrated in William James’ example of the magnets vs. Romeo and Juliet, shown here in this diagram by Jeremy Guay:
While humans, many animals, and autonomous vehicles can get around obstacles to get to their goal, magnets for example will not – they are simply trying to minimize energy and thus can’t even temporarily go against the gradient, pull further apart, to then go around the barrier to meet each other and get even closer together. The ability to do some degree of delayed gratification is one component of intelligence.
Where would a simple sorting algorithm fit on this scale? As emphasized in my TAME framework, you cannot guess in advance, just by knowing the components of the system (which in this case are perfectly known – the algorithm is there for all to see): you have to do experiments. And when we did the experiments, we found that the algorithms not only can get around obstacles by temporarily disordering the string further (a weird thing for a deterministic sorting algorithm to do!), but they do more of this when greater numbers of defective cells are introduced (showing that it’s a contingent, contextual response to their unexpected situation, not just random back-pedaling that routinely happens no matter what).
Another thing that we were able to do, because in our system it is each cell that runs an algorithm, is study chimeric scenarios in which half the cells run one algorithm and half the cells run another. This is actually a critical issue in biology because the field currently has no formalisms for predicting what happens to the anatomy when cells of different genetic make-up (and different policies for action that lead to each species’ target morphology) are combined into a single whole organism. For example, as illustrated below from this review, what head shape will be built in a flatworm that contains stem cells that normally build a flat-headed planarian and ones that normally build a round-headed one? Will one of the shapes be dominant, or maybe an intermediate shape, or perhaps it will just keep morphing, since neither set of cells will ever get to the stop criterion (a completed target morphology of the relevant species)?
We created arrays of mixed-up numbers, where half the numbers belonged to cells executing one algorithm, and half of them executed a different algorithm. The assignment of algotype (a word coined by Adam Goldstein, parallel to genotype and phenotype, indicating the overall behavioral tendencies resulting from a specific algorithm) to each numbered cell was totally random. Crucially, the algorithm didn’t have any explicit notion of this – the standard sorting algorithm doesn’t have any meta-properties that allow it to know what kind of algorithm it is running or what its neighboring cells are running. Its algotype is purely something that is known to us, as 3rd-person external observers of the process. But it guides the cells’ behavior and the decisions they make about when and where to move in their quest to have properly sorted neighbors. The basic result is that chimeric strings sort just fine – the cells don’t all need to be using the same policies for the collective to get to its endpoint in sequence space.
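A chimeric string can be sketched as follows. This is a hypothetical toy, not the paper’s implementation: I am assuming a “bubble” algotype that only swaps with its immediate right neighbor and a “selection” algotype that swaps with the smallest cell anywhere to its right; each cell carries its algotype with it as it moves.

```python
import random

def bubble_move(arr, i):
    """'Bubble' algotype: cell i swaps with its immediate right neighbor if out of order."""
    if i + 1 < len(arr) and arr[i][0] > arr[i + 1][0]:
        arr[i], arr[i + 1] = arr[i + 1], arr[i]
        return True
    return False

def selection_move(arr, i):
    """'Selection' algotype: cell i swaps with the smallest cell to its right."""
    if i + 1 >= len(arr):
        return False
    j = min(range(i + 1, len(arr)), key=lambda k: arr[k][0])
    if arr[j][0] < arr[i][0]:
        arr[i], arr[j] = arr[j], arr[i]
        return True
    return False

MOVES = {"bubble": bubble_move, "selection": selection_move}

def chimeric_sweep(cells):
    """Let each cell act according to its own algotype; stop after the first move."""
    for i in range(len(cells)):
        algo = cells[i][1]  # the algotype travels with the cell
        if MOVES[algo](cells, i):
            return True
    return False

# chimeric string: each cell gets a value and a randomly assigned algotype
random.seed(1)
cells = [(v, random.choice(["bubble", "selection"])) for v in random.sample(range(20), 10)]
while chimeric_sweep(cells):
    pass
print([v for v, _ in cells])  # values end up fully sorted despite mixed policies
```

Both move rules strictly reduce the number of inversions, so the mixed collective is guaranteed to reach the sorted state even though no cell knows (or could know) which policy its neighbors are running.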
We then asked a weird question. What would the spatial distribution of algotypes within a given string look like, during the sorting process (its journey through sequence space)? This is akin to a traditional morphological analysis of an embryo during its journey through morphospace – an anatomical/histological scan of what kinds of cells are placed where. We thus defined a quantity called “clustering” which simply described how likely it is that your neighbor is the same algotype as you. So, we knew that at the beginning of the process, the clustering had to be 50% (because the assignment of algotype to position in the array was random). And we also knew that at the very end, when all the numbers have been put into their final correct positions, it would also have to be 50% because there was no relationship between algotype assignment and position in the final numerical order (we assigned it randomly). But what did it look like between those two endpoints, while the algorithm was working hard to do its thing?
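The clustering measure as described is simple to compute: the fraction of adjacent pairs that share an algotype. A minimal sketch (my own function name; the exact statistic used in the paper may differ in detail):

```python
import random

def clustering(algotypes):
    """Fraction of adjacent cell pairs that share an algotype.
    With two algotypes assigned at random, this sits near 0.5."""
    pairs = list(zip(algotypes, algotypes[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

# random assignment of two algotypes ("A"/"B") to a 1000-cell string
random.seed(0)
tags = [random.choice("AB") for _ in range(1000)]
print(clustering(tags))  # hovers near 0.5 for a random assignment
```

This makes the two fixed endpoints of the journey concrete: any random assignment scores about 0.5, so any sustained excursion above that during sorting is a real, measurable effect.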
Amazingly, what they did during that time was cluster significantly. For any pairing of two algorithms, cells of the same algotype tended to hang out together, until the cruel hand of the sorting imperative pulled them apart again (as the numerical ordering must win in the end, since the algorithms are guaranteed to establish sort order eventually). Take a moment to take stock. These simple systems not only have the ability to solve their task despite a novel situation (barriers in their space), they use delayed gratification to do it, and they exhibit a specific behavior that maximizes a meta-property (clustering of algotypes) that is implemented nowhere in the algorithms themselves. I was frankly shocked to see this, even though a gut feeling caused me to plan the experiment in the first place.
Here’s an example. The blue line shows the progress of the sorting. The faint pink line is a negative control to make sure our code isn’t doing something wonky – it wobbles around the usual 50% when we’re not actually combining 2 different algorithms. The bright red line is the aggregation index – the tendency of each kind of cell to cluster, while it can, with those of its own kind (and the haze around it represents the standard deviation across 100 experiments). You can see that here it goes above 0.6 – a highly statistically significant effect.
This inherent tendency of cells to travel, even for a time, with behaviorally-defined kin is perhaps relevant to a couple of other concepts. First is the Hebbian idea of fire-together-wire-together: could it be a more general property of things to associate with those that behave like them? Second, one interesting thing about copies of you is that they are more predictable than random features of the environment. Chris Fields and I proposed a model of multicellularity based on this idea – cells keep their progeny close because it’s a kind of bulwark against the unpredictability of the outside world. Could this tendency to cluster with similar algotypes be due to a kind of Fristonian surprise minimization, in which Hebbian behavior arises due to an emergent drive to minimize uncertainty of your local microenvironment? This remains to be tested in this model, but it seems reasonable that agents with the same personalities (behavioral algorithms, or algotypes) would be comforting and predictable, to have as neighbors.
One last result to note. Given that these algorithms have a cryptic goal – to cluster with their own kind – how strong is it, really? In our case, we inevitably suppress their ability to pursue this cryptic goal by demanding (via the explicit algorithm) that the numbers get sorted – it is impossible, under the standard system, to do both – keep algotypes segregated and sort the numbers – because it’s 50% likely that the number any cell wants next to it happens to have the wrong algotype. This limits how much clustering they can do. What we then did, to let them flex their inherent behaviors a bit more, was simply allow duplicate numbers in the string. That way, for example, a run of 5’s such as 555555 can occur between the 4’s and the 6’s, satisfying the algorithm’s need to sort on numerical value while also allowing as much clustering as it wants (for example, the left half of the run of 5’s can all be of algorithm 1 type, while the right half can all be of algorithm 2 type – plenty of clustering with its own kind within each set of repeated digits). When we did that, the clustering did in fact rise, revealing that the explicit sorting criterion was indeed suppressing their innate drive to cluster.
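The arithmetic of why duplicates relieve the tension is easy to see directly. In this toy illustration (my own, with an assumed adjacent-pair clustering measure), both strings below are perfectly sorted on value, but only the one with repeated values has any room left for algotypes to cluster:

```python
def clustering(tags):
    """Fraction of adjacent cell pairs that share an algotype."""
    pairs = list(zip(tags, tags[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

# cells as (value, algotype); both strings are perfectly sorted on value
no_dupes = [(1, "A"), (2, "B"), (3, "A"), (4, "B")]
dupes    = [(5, "A"), (5, "A"), (5, "B"), (5, "B")]

print(clustering([t for _, t in no_dupes]))  # → 0.0: sorting can force algotypes apart
print(clustering([t for _, t in dupes]))     # ≈ 0.67: duplicates leave room to cluster
```

With distinct values, the final positions are fully dictated by the numbers, so clustering is at the mercy of the random algotype assignment; runs of equal values are the one degree of freedom the sort leaves open, and that is exactly where the cryptic clustering preference shows itself.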
There is something to be said here about goals and where they come from, in general. Where do animals’ goals come from? Evolution? Where do Xenobots’ and Anthrobots’ goals come from? They have never been specifically selected for, they’re brand new (and we do not yet know their goals – that is very actively being researched). Where do humans’ goals come from? Human children’s come from their parents (environment) and some built-in circuitry; how about the adults’ – both average, and genius-level adults – how do they get their goals? And where do goals arise – which animals (down to amoeba and bacteria and the networks within them) have them and how do they scale? I think these algorithms are teaching us something. They have derived goals – the goal of having a sorted string is given to them by us; they inherit them (i.e., second-hand goals) by virtue of the algorithm we designed. But they also seem to have intrinsic goals (clustering, and who knows what else that we didn’t think to look for) which come from … I’m not sure where. I have some ideas which I will write about soon. At the very least, if these simple things can have cryptic goals intrinsic to their behavior and not to the algorithm, just think what kind of latent dreams we could find in more complex systems (even algorithmic ones, before we even get to biologicals that might use quantum dynamics and other things not captured in common digital algorithms).
I wonder if we can think of the unexpected clustering behavior of these algorithms as a kind of “subconscious” influence over their behavior (which is otherwise controlled by the policies explicitly implemented by the algorithm). In this paper, we uncovered cryptic drives and motivations for behavior (the algotype of your neighbor), which are not apparent to the agents (the algorithms give the cells no way to query the algotype of their neighbor or themselves). Is what we did here – determining the hidden causes for behavior – a kind of proto-psychoanalysis, in which at least the external observer gets to find out why the agent does what it does (even if they are too simple to take up that insight themselves, as we hope a human psychoanalytic subject would)? And what of the psychological stress (perhaps not visible in this simple system, or maybe we just don’t know how to measure it?) of having your explicit goals (numerical sorting) be in conflict with your implicit goals (clustering) – explicit goals which, by the way, inevitably win? I’m doing my best not to feel nano-bad about the existential futility of their plight.
Fortunately, I know two amazing people with whom to discuss this sort of thing, both having expertise in psychoanalysis and basal cognition: Mark Solms and Karl Friston. I’ve got talks scheduled with them in the next month; I’ll put up the video links once we’ve had a chance to discuss it. It’s entirely possible that my thinking on all of this will change, after we do more experiments and try to define quantitative metrics of stress, implicit sensing of large-scale behavioral policies, and motivation in these systems.
To summarize: these basic algorithms have unexpected competencies to solve the problem we explicitly designed them for (in sorting space), and also apparently have behaviors (maximizing algotype clustering – a meta-property in morphogenetic space) that we had no idea about until we looked for them. My suspicion (which we are now testing) is that this may be fairly general, and that once we look, many (most? all?) algorithms will prove to have unexpected tendencies and capabilities. I think the continuity we see in development and evolution is far deeper than we realize. As the diverse intelligence field increasingly finds forms of learning, decision-making, goal-directed activity, and other emergent competencies in minimal unconventional substrates, some who want cognition to be a magically unique property of advanced brains will say “that’s not really what we meant by these terms”. Listen closely – can you hear the screech of the goal-posts being moved?
Featured image was made by DALL-E. Planarian schematic was made by Daniel Lobo.