A Parallel Genetic Algorithm for solving the Quadratic Graph-Matching Problem

Matthew C Scott



In this paper I present a summary of my current research on the development of a system which attempts to find the optimal matching of two general directed graphs. The problem is intended to model a parallel processing environment in which one graph represents the data dependencies between a system of concurrent processes, and the other graph represents the parallel hardware topology upon which we wish to distribute these processes. Besides its obvious practical applications, it also provides a basis for analysis of some theoretical aspects of Genetic Algorithms. In particular:


  1. Consideration of epistasis and the preservation of dependency knowledge through the phenotype->genotype mapping. The graph-matching problem provides a perfect model for such analysis in that the dependencies we wish our GAs to discover are simply the weights of the edges between nodes of the graphs. In other models such dependencies are generally unknown; it seems that any such model, if partitionable into finite elements, would be reduced to such a graph anyway in order to define the inherent dependencies.


  2. Consideration of the complexity of graph problems and the degree to which local search may be used in conjunction with a global GA. Here I am mostly concerned with the topology of the problem and the behaviour of the GA in the space. Deceptive graph-matching problems are developed, some customized simulated-annealing paradigms are exploited, and the use of speciation and nearest-neighbor mating is examined. The local search employs a GA which partitions the graphs in order to reduce the space and avoid premature convergence.


Keywords: Asynchronous Parallel Genetic Algorithms, Load Balancing, Connection Optimization, K-way Graph Partitioning, Autonomous Genetic Algorithms

(Outline)

(A) Introduction:

(I) Description of GAs.

(II) Problem of Epistasis.

(a) Reordering operators: condensed description of GARF.

(b) Weighting of dependencies: concise, intelligible description of partitioning the dependency space according to degrees of density (as described in a prior paper).

(c) Masking: using gradient information.

(d) Decomposing the problem for local search: parametrize a multidimensional space into multiple, fairly independent functions. Attack each function with independent GA populations. Surviving strings from each population provide information that may be incorporated by higher-level GAs. For example, the K-way partitioning problem returns sets of nodes with a greater-than-average degree of connectivity (note that this equates to minimization of connectivity between sets). Apply a similar partitioning to the program topology. A higher-level GA may then determine a matching between the set-graph of the hardware and that of the software. The resulting strings, representing the matching of partition sets, may then have the entire process applied to each pair of matched sets, and so on recursively until the remaining sets are single processors with assigned programs.
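The recursive decomposition in (d) can be expressed as a skeleton; here partition() and match_sets() are hypothetical placeholders standing in for the K-way partitioning GA and the higher-level set-matching GA, so this is a sketch of the control flow only:

```python
# Skeleton of the recursive multilevel matching scheme described above.
# partition() and match_sets() are hypothetical placeholders for the
# K-way partitioning GA and the higher-level set-matching GA.

def multilevel_match(hw_nodes, sw_nodes, partition, match_sets):
    # Base case: a single processor receives all remaining programs.
    if len(hw_nodes) == 1:
        return {program: hw_nodes[0] for program in sw_nodes}
    # Partition both graphs into densely connected sets, then let a
    # higher-level GA determine a matching between the set-graphs.
    hw_sets = partition(hw_nodes)
    sw_sets = partition(sw_nodes)
    assignment = {}
    for hw_set, sw_set in match_sets(hw_sets, sw_sets):
        # Recurse on each matched pair of sets until the remaining
        # sets are single processors with assigned programs.
        assignment.update(
            multilevel_match(hw_set, sw_set, partition, match_sets))
    return assignment

# Trivial stand-ins for demonstration: split each node list in half
# and match the resulting halves positionally.
def halves(nodes):
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]

mapping = multilevel_match([0, 1, 2, 3], ['a', 'b', 'c', 'd'], halves, zip)
```

With real partitioning and matching GAs plugged in, the recursion bottoms out exactly as described: singleton processor sets with their assigned programs.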

(III) Relation to Neural Nets. (a) Simulated annealing and the Boltzmann machine. (b) Constrained mean-field nets for graph matching [Simic91, Neural Computation 3, 268-281]: objective function, effective cost.

(c) Genetic Neural Networks.

(B) Analysis: (I) Graph analysis of related problems: (a) TSP solutions; K-opt local optimization; edge representation vs. node representation. (b) Job Shop Scheduling (JSS). (1) Show how JSS may be reduced to a tiling or matching problem. (2) Show how directed graph matching relates to JSS if one breaks the software dependency net into Hamiltonian paths.

(c) K-way graph partitioning GAs and algorithms; neighbors-only (hardware) reproduction. (d) Graph coloring: allowing temporarily invalid colorings to smooth the search space. The autonomous GA system does this implicitly.

(II) Degrees of complexity, GA-hardness. Discuss the proportionality between local search and blind global GA search for the following questions (in consideration of the problem space): What factors are amenable to local search? Given a graph with average degree K, i.e., (n-1)p for edge probability p, what should the population size, rate of convergence, and mutation rate be?

Testbed: (a) Find a match of two simple isomorphic graphs. (b) Randomly weight the edges (still isomorphic, but this reduces the possibility of many symmetric matches).

(c) Let the graphs be non-isomorphic.

(d) Allow the graphs to have unequal numbers of nodes (allow one-to-many pairings).

(e) Incorporate data dependencies between programs (directed graph).

(f) Let the GA system try to create one-way loops in the processor graph in order to reduce conflict and the possibility of deadlock.

(g) Allow nodes to be of different sizes. For example, let the processor nodes have sizes (computational speed/local memory) between 1 and 3 such that they may accommodate that many standard program nodes. The program nodes may also have variable sizes (integer <= max processor size).

(h?) Let the hardware topology be unknown. This option, of course, does not provide the ability to do local search as in the multilevel system above, but (I believe) it is still possible given autonomous GAs. (Expound upon the dynamic autonomous GA system.)

(C) Implementations: (I) Multilevel GA system: as described above but with specific details.

(II) Large populations of autonomous GAs: Each GA species represents a program. Each member has knowledge only of the addresses of a subset of the members of the other species with which its own species/program must communicate. A node determines the relative fitness of its position on a particular processor by sending messages to the nodes with which it (being a representative of the real program) must communicate. By measuring reply times, the node may determine the 'direction' of attraction. That is, it would migrate to a neighboring processor such that one of the reply times would be reduced (which one? Hmmm). It may not actually know the direction of attraction, but it can tell after migrating whether the situation has improved, and thus backtrack if necessary. Note that this sort of memory provides a simple gradient with which it may guide future movements.

There should be a limit on the number of nodes allowed on a processor; otherwise the whole system would eventually converge upon one. Let each processor keep a population count. A node's fitness/comfort on a processor will have a factor inversely proportional to the population size on that processor. Thus a remote, lone processor with one connection to the net may not provide a good mapping for any of the program nodes, but it will probably still be occupied, as it has much processing power to lend. Likewise, a node will have a hard time immigrating to an overpopulated processor.

If a processor has a large complement of a particular species, it is safe to assume that this is an optimal position for them. But it would not hurt to take a number of the representative members and send them off into the wild unknown: they may find a position in which they have better fitness. They will still be pulling on the same nodes of other species (those with which their representative program must communicate) as the original cluster is, so their new position must be relatively good if they are to survive.
There needs to be some kind of function by which a node gives up, dies, and reincarnates itself somewhere else. Thus it seems that between competition periods there should be a reporting period in which each processor determines the number and average fitness of each species resident in it and reports this to a central processor. The central processor would keep a table for the current and last generation in which it tabulates this data. Statistics may be computed and broadcast back to all the processors. Thus a particular node, knowing its own history of fitness in a particular area, would be able to determine whether it had stagnated in a less-than-optimal area.

Besides migrating, local search may be executed by pairs of nodes swapping pointers to nodes of other species. If one considers the attraction that a node has towards the nodes of the other species to which it holds pointers, then one can conceive of these attractions combining to form an acceleration which is the sum of the vector components. Swapping nodes changes the acceleration (direction and degree of attraction). After letting the system run for a while, attractors will form. Using the techniques of simulated annealing, the energy of the system is gradually reduced by removing less fit nodes and reducing the average number of nodes that each processor may accommodate. Eventually we have a fairly optimal mapping. This mapping may be saved and the system run again to find another mapping. After N runs we either choose the best (according to an effective energy/objective-function measurement) or toss the saved optimal mappings back into the system for possible further improvements. [Include defense of the definition of this system as a GA]
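The node fitness/comfort measure described above, attraction derived from reply times, damped by a factor tied to the processor's population, might be sketched as follows. The exact functional form (reciprocal reply times, linear crowding penalty) and the capacity constant are assumptions for illustration, not prescriptions from the text:

```python
# Hedged sketch of a node's "comfort" on a processor: attraction grows
# with faster replies from its communication partners, and is damped by
# a factor tied to the host processor's population count. The exact
# functional form and the capacity constant are assumptions.

def node_comfort(reply_times, population, capacity=10):
    # Faster replies (smaller round-trip times) mean better placement.
    attraction = sum(1.0 / t for t in reply_times)
    # Crowding penalty: comfort shrinks as the processor fills up,
    # making immigration to an overpopulated processor unattractive.
    crowding = min(population, capacity) / capacity
    return attraction * (1.0 - crowding)
```

A node compares its comfort before and after migrating; an improvement is kept, otherwise it backtracks, giving the simple remembered gradient described above.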

( D) Results:

( E) Conclusions:

Review of the Quadratic Assignment Problem

The Quadratic Assignment Problem of order n is the search for a mapping of n activities to n locations. It was first described by Koopmans and Beckmann (1957). Originally it was devised to solve a job-shop scheduling problem in which items at each station were transferred periodically to other stations, with the transfer cost varying according to the item type. Thus, given a limited number of stations, assign the items to them so as to minimize the total average cost of transfers.

D = [d_ih] = matrix of distances (location i to location h)

F = [f_jk] = matrix of flows (item j to item k)

C = [c_ij] = matrix of assignment costs (item i to location j)

Now, simply find a permutation φ: i -> φ(i), a particular assignment of item j = φ(i) to station i (i = 1, ..., n), that minimizes the total flow cost. As the cost matrix C is easily applied separately, it is not presented in the equations.

min z = Σ_{i,h=1..n} d_ih f_φ(i)φ(h)

This is reformulated as a quadratic function by introducing a permutation matrix X of dimension n×n, such that x_ij = 1 if item j is assigned to station i and 0 otherwise. Thus:

min z = Σ_{i,j=1..n} Σ_{h,k=1..n} d_ih f_jk x_ij x_hk

where the x_ij are such that

Σ_{i=1..n} x_ij <= 1 (j = 1, ..., n) : each item is assigned to at most one station

Σ_{j=1..n} x_ij = 1 (i = 1, ..., n) : every station receives exactly one item

each x_ij ∈ {0, 1}
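For small instances the objective can be evaluated directly for a candidate permutation and minimized by exhaustive search; a minimal sketch, where the D and F matrices are illustrative examples rather than data from the paper:

```python
from itertools import permutations

# Evaluate the QAP objective z = sum d_ih * f_phi(i)phi(h) for a
# candidate permutation phi. D and F below are small illustrative
# examples, not data from the paper.

def qap_cost(D, F, phi):
    n = len(D)
    return sum(D[i][h] * F[phi[i]][phi[h]]
               for i in range(n) for h in range(n))

D = [[0, 1, 2],   # distances between stations
     [1, 0, 1],
     [2, 1, 0]]
F = [[0, 5, 0],   # flows between items
     [5, 0, 3],
     [0, 3, 0]]

# Exhaustive search over all permutations is feasible only for tiny n;
# branch-and-bound or a GA takes over for realistic sizes.
best = min(permutations(range(len(D))), key=lambda p: qap_cost(D, F, p))
```

Here the optimum places the heavy flow pair on the closest pair of stations, which is exactly the behaviour the quadratic objective rewards.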

Figure 3. QAP, D,F,C example


This formulation is feasible for up to about n = 20. Typically a branch-and-bound method is employed to search the permutations on X, and the problem is known to be NP-hard by transformation from the Traveling Salesman Problem. We will employ this algorithm below in the development of various autonomous agents.

Review of Genetic Algorithms for Combinatorial Optimization

The methods of evolutionary computation were first proposed by [J. H. Holland (1962)], and the first published reference to Genetic Algorithms is [J. H. Holland (1973)]. Borrowing from the mechanics of natural selection and genetics, Genetic Algorithms perform objective-function optimization through crossover reproduction techniques and survival-of-the-fittest selection algorithms. Basically the algorithm entails:

  1. devise a string representation of the function to be optimized (the genotype),
  2. create a population of these strings using an initially random distribution,
  3. evaluate the strings under the objective function (as a phenotype),
  4. choose a percentage of the most fit strings for propagation into the next generation,
  5. invoke crossover reproduction between randomly selected pairs of the surviving strings until the population is restored (applying mutations with some small probability),
  6. repeat from step 3.
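The six steps above can be sketched as a minimal GA. The "one-max" objective (count of 1-bits), the string length, and all parameter values are illustrative assumptions, not part of the Q-GMP system itself:

```python
import random

# Minimal sketch of the six-step GA loop above, applied to a toy
# "one-max" objective. All parameters are illustrative assumptions.

def fitness(bits):
    # Step 3: evaluate a string under the objective function.
    return sum(bits)

def crossover(a, b):
    # Step 5: single-point crossover between two parent strings.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(bits, rate=0.01):
    # Step 5 (parenthetical): flip each bit with small probability.
    return [1 - b if random.random() < rate else b for b in bits]

def ga(pop_size=50, length=32, generations=100, keep=25):
    # Step 2: initially random population of bit strings (step 1:
    # the genotype here is simply the bit list itself).
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step 4: keep the most fit fraction for propagation.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:keep]
        # Step 5: restore the population by crossover plus mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        pop = parents + children  # step 6: repeat from evaluation
    return max(pop, key=fitness)
```

Because the surviving parents are carried over unchanged, the best fitness in the population never decreases across generations.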

The primary advantages of GAs are their simplicity, robustness, and ability to exploit implicit parallelism. "Furthermore, they are not fundamentally limited by restrictive assumptions about the search space (assumptions concerning continuity, existence of derivatives, unimodality, and other matters)" [David E. Goldberg (1989)].

The phenomenon of implicit parallelism comes from the fact that throughout the population of strings there are recurring patterns (schemata) that are each evaluated independently but whose combined probability of propagation leads to exponentially increasing samples of the observed best. One may envision a hidden network connecting every set of equivalent schemata such that each schema has not only knowledge of its strength with respect to the other schemata on its string, but also probabilistic knowledge of its performance with any other string whose configuration lies somewhere between it and its cousin strings in the search space (see figure below). This implicit parallelism yields on the order of n³ schema evaluations even though there are only n string evaluations [David E. Goldberg (1989)].

Figure 4. Implicit Parallelism in a GA population.

One characteristic that GAs share with other combinatorial search methods is the potential to become trapped in local minima. As usual, a Monte-Carlo (simulated annealing) method is used to enable the GA to escape. This is generally done by introducing new randomly generated strings or by increasing the probability of mutations. Also, a recurring question is how best to vary the proportion of exploration vs. exploitation. Exploitation in a GA is best achieved by having the very highest-rated strings reproduce.
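One common way to realize the exploration-to-exploitation shift just described is an annealing-style schedule on the mutation probability: high early, decaying toward a small floor. The constants here are assumptions for illustration, not values from the text:

```python
import math

# Illustrative annealing-style schedule for the mutation probability:
# high early (exploration), decaying exponentially toward a floor
# (exploitation). The constants are assumptions, not values from
# the text.

def mutation_rate(generation, start=0.20, floor=0.01, decay=0.05):
    return floor + (start - floor) * math.exp(-decay * generation)
```

A schedule like this would replace the fixed rate in a plain GA loop, so that early generations explore widely while late generations mostly exploit the current best strings.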

Reducing Network Latency: GAs, QAP and the GMP.

A logical way to reduce network latency between two communicating processes is simply to move them closer together. Thus the first project referenced, "Parallel Distributed Genetic Algorithms for the Quadratic Graph Matching Problem" [Scott (1990)], involves the traditional problem of load balancing in an amorphous distributed-processing environment, i.e., a graph-matching problem. As this problem is transformable to the Job-Shop Scheduling Problem, it was logical to use the aforementioned quadratic function of the mapping of jobs to shops. Since this problem typically has many sub-optimal solutions, a simulated-annealing algorithm was naturally employed. And, naturally again, a Genetic-Algorithm search methodology provides a simple, robust, and easily distributed means of competitive hill climbing; the other classical gradient-scaling functions do not have its flexibility of information sharing between permutations of solutions. This system was completed using the Parallel C++ language of [Dennis Gannon (1990)]. The code was limited to a symmetric Alliant FX8 64-node machine, and thus the graphs tested were not tied to any real-time dynamic system. It is expected that it will easily port to Java and thus execute on a truly heterogeneous/amorphous dynamic network topology. It is also expected that intelligent "shepherd" agents will execute this graph-mapping function on small (local) partitions of a distributed environment in the forthcoming development.

Figure 5. PN->PE QAP and Results

The intent of the devised GA Q-GMP (Quadratic Graph Mapping Problem) system was to address the problem of process migration in parallel distributed systems in order to maximize data throughput. But several of the problems in such a scheme include (from [Craig Partridge (1993)]):

  • a) How to migrate the control of a process,
  • b) How to migrate the binary information, instructions and data associated with the process,
  • c) How to support heterogeneity and
  • d) When and where to migrate.

Here we are not concerned with the details of the implementation (a-c), but rather just the theoretical component of combinatorial optimization (d).

First we define the graph of a distributed software system and a similar distributed parallel-processing hardware system. Each processor element (pe) has a capacity k, which we will illustrate as the relative size of the node's circle. Likewise, each pair of processors has a certain bandwidth (bw) between them, which we will display as weighted undirected lines. The software system consists of software elements (se) and data weights (dw). Call the hardware graph H and the software graph S. Then

H = {Vpe,Ebw} and S = {Vse,Edw}.

The representation of the GA string is just a list, each element being the pe to which the se of the same index will be assigned.

GA1 = (pe4 pe3 pe1 pe2 pe5)

Note that if two se (se_i, se_j) that have dw_ij > 0 are assigned to a pair of pe (pe_i, pe_j) with bw_ij = 0, then the quadratic evaluation will simply give that string no additional fitness for the contribution of that matching. To prevent such a constraint violation from surviving too long, the punishment may be increased over time. Also note that if there are more se than pe (likely with high-granularity problems), then it makes sense to break each pe into many sub-pe, with all the sub-pe fully connected and having maximal bandwidth between them. The capacity k_i of pe_i is split evenly among its sub-pe. If the capacity of every pe is normalized to the range {1..10}, and each is broken into 10 sub-pe, then the se must be subdivided by the same normalization factor so that one unit of se may be assigned to one unit of pe. With this methodology, the GA need only perform vanilla crossover and then vector dot products on the matched edges in order to evaluate the strings.
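The string evaluation just described, a quadratic sum of data weights times bandwidths over matched edges, might be sketched as follows; the matrices and the exact fitness form are illustrative assumptions:

```python
# Sketch of the quadratic string evaluation described above: string[i]
# is the pe to which se i is assigned, dw is the software data-weight
# matrix, bw the hardware bandwidth matrix. A matched edge contributes
# dw * bw, so a dw > 0 edge mapped onto a bw = 0 link adds nothing,
# as noted in the text. The matrices are illustrative assumptions.

def string_fitness(string, dw, bw):
    n = len(string)
    return sum(dw[i][j] * bw[string[i]][string[j]]
               for i in range(n) for j in range(n) if i != j)

bw = [[0, 2, 0],   # bandwidth between pe pairs
      [2, 0, 1],
      [0, 1, 0]]
dw = [[0, 4, 1],   # data weights between se pairs
      [4, 0, 0],
      [1, 0, 0]]

# Assigning se0->pe1, se1->pe0, se2->pe2 puts the heavy dw edge
# (se0, se1) on the widest link while (se0, se2) still gets bandwidth.
fit = string_fitness([1, 0, 2], dw, bw)
```

A standard crossover on these lists, followed by this evaluation, is all the GA needs, which is the point made above.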

Note that the QAP for load balancing is also the basis of flow-based routing: a static algorithm that accounts for both topology and load. In this method a traffic matrix F_ij, a line-capacity matrix C_ij, and a topology matrix T_ij are used, and the mean packet delay is determined from queueing theory. Its drawbacks are that it needs a priori knowledge of the subnet topology, and that it is not dynamic, requiring offline calculation due to its O(n^2) complexity. But it will be shown below that intelligent agents can execute this algorithm on small local topologies, inducing global emergent optimization.


This is the maximum-fitness match on a particular node of the 64 running. Since the best fit migrate, the eventual solution cannot be worse than this.

Note the spike drop near the end of the run. As the system converges on a solution, bold mutations are tested for possible 'MDP' hops to better solution spaces.


Emile Aarts, Jan Korst: Simulated Annealing and Boltzmann Machines

Christos H. Papadimitriou, Kenneth Steiglitz: Combinatorial Optimization: Algorithms and Complexity.

Lawrence Davis: Genetic Algorithms and Simulated Annealing.

James A McHugh: Algorithmic Graph Theory.

Gregory Rawlins: Foundations of Genetic Algorithms.

G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, D. Walker: Solving Problems on Concurrent Processors.

Peter D. Simic: Constrained Nets for Graph Matching and Other Quadratic Assignment Problems. Neural Computation, Volume 3, Number 2, pp. 268-281.

Heinz Mhlenbein: Evolution in Time and Space - The Parallel Genetic Algorithm. -Distributed mating. Local hill climbing. The Parallel Genetic Algorithm as a Function Optimizer. -Superlinear Speedup, late local hill climbing, subpopulations

Weber, Depew, Smith: Entropy, Information, and Evolution.

Josef Hofbauer, Karl Sigmund: The Theory of Evolution and Dynamical Systems.

David E. Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning.

Richard Dawkins: The Selfish Gene. The Blind Watchmaker.

David E Goldberg, Clayton L Bridges: An Analysis of a Reordering Operator on a GA-Hard Problem.

Dale Schuurmans, Jonathan Schaeffer: Representational Difficulties with Classifier Systems.

Sugato Bagchi, Serdar Uckun, Yutaka Miyabe, Kazuhiko Kawamura: Exploring Problem Specific Operators for Job Shop Scheduling

Gregor Von Laszewski: Intelligent Structural Operators for the K-way Graph Partitioning Problem.

D. Whitley, K. Mathias, P. Fitzhorn: Delta Coding: An Iterative Search Strategy for Genetic Algorithms.

Robert J. Collins, David R. Jefferson: Selection in Massively Parallel Genetic Algorithms. (Graph partitioning; local beats panmictic reproduction.)

Jay N. Bhuyan: Genetic Algorithm for Clustering with an Ordered Representation.

Donald R. Jones and Mark A. Beltramo: Solving Partitioning Problems with Genetic Algorithms.

Nashat Mansour, Geoffrey C. Fox: A Hybrid Genetic Algorithm for Task Allocation in Multicomputers.


Matthew Scott
Fri Oct 17 16:26:52 MST 1997